SECTION 13 REGRESSION ANALYSIS (REGRESS)

REGRESS permits the estimation of single equation models under various conditions.

SECTION 13.1 LINEAR REGRESSION ANALYSIS (REGR)

REGR estimates the parameters of the model

Y(i) = b(1)*X(i,1) + b(2)*X(i,2) + . . . + b(k)*X(i,k) + u(i)

where i indexes observations, b(1), ..., b(k) are the parameters to be estimated, Y is the dependent variable, and the matrix X contains (by columns) the independent variables. u(i) is an error term that has zero mean and is uncorrelated with the regressors. The regression is computed by executing

CALL REGR(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER)

where all real variables should be declared REAL*8 and where

B = vector of computed regression coefficients (of length k)

X = n x k matrix of exogenous variables (n observations on k
variables)

Y = vector of dependent variable observations of length n

STD = vector of estimated standard deviations of regression
coefficients (of length k)

T = vector of t-values for regression coefficients (of length k)
[T(j)=B(j)/STD(j)]

XBAR = vector of mean values for exogenous variables (of length
k)

YBAR = mean value of dependent variable

RSQ = R-squared (unadjusted)

VAR = residual variance (sum of squares of residuals divided by
n)

NDIM = ndim, the first dimension of X in the calling program

N = n, the variable containing the number of observations

K = k, the variable containing the number of independent
variables

1. In the present version the number of exogenous variables k must be less than or equal to 30.

2. If there is a constant term in the regression (as is normally the case) the user should fill the first column of the X matrix with the number 1.0 before calling REGR.

3. Because ndim may be greater than n, X may be dimensioned with more rows than there are observations.

4. The user may also include COMMON/BREGR0/EPSTOL in the MAIN program and reset EPSTOL, the tolerance level used in matrix inversion. The default value of EPSTOL is 1.D-15.

5. Error codes are as follows:

IER = -3 k exceeds 30

IER = -67 The matrix X'X is singular

IER = -73 The covariance matrix is not positive definite

IER = -74 The Y variable is constant over all observations
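The quantities returned by REGR can be illustrated with a short pure-Python sketch. This is not the GQOPT Fortran code: `invert` and `regr` are hypothetical names, and the scaling used for STD is an assumption. The sketch uses the unbiased estimate SSR/(n-k), while VAR follows the definition above (SSR/n). The `tol` argument plays the role of EPSTOL.

```python
# Pure-Python sketch of the OLS quantities listed for REGR (illustration only).
# Assumption: STD uses SSR/(n-k); REGR's own convention is not stated above.


def invert(a, tol=1e-15):
    """Gauss-Jordan inversion with partial pivoting; tol acts like EPSTOL."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < tol:
            raise ValueError("X'X is singular")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def regr(x, y):
    """OLS of y on the columns of x (x[i][0] = 1.0 for a constant term)."""
    n, k = len(x), len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    b = [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    resid = [y[i] - sum(b[a] * x[i][a] for a in range(k)) for i in range(n)]
    ssr = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    if sst == 0.0:
        raise ValueError("Y is constant over all observations")
    rsq = 1.0 - ssr / sst          # unadjusted R-squared (assumes a constant term)
    var = ssr / n                  # residual variance as defined for VAR
    s2 = ssr / (n - k)             # assumption: unbiased estimate used for STD
    std = [(s2 * inv[a][a]) ** 0.5 for a in range(k)]
    t = [b[a] / std[a] for a in range(k)]   # T(j) = B(j)/STD(j)
    xbar = [sum(x[i][a] for i in range(n)) / n for a in range(k)]
    return b, std, t, xbar, ybar, rsq, var
```

For example, with a column of 1.0's and x = 1, 2, 3, 4, y = 1, 3, 2, 4, the sketch gives B close to (0.5, 0.8), RSQ = 0.64 and VAR = 0.45.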


SECTION 13.2 LINEAR REGRESSIONS WITH AUTOCORRELATED ERROR TERMS (AUTOC)

If the error term in the linear regression model of Section 13.1 obeys

u(i) = r*u(i-1) + e(i)

where the e(i) are iid normal with mean zero and variance sigma squared, the parameters, including r, can be estimated by maximum likelihood. The user should include the following in the MAIN program:

      COMMON/USERR1/Y(n)
      COMMON/USERR2/X(ndim,k)
      COMMON/BAUTOC/NDIM,N,K
C     method is the name of the optimization algorithm, e.g., GRADX
      EXTERNAL AUTOC,method
C     The next three statements set the values of the variables
      NDIM=ndim
      N=n
      K=k
C     Statements to read the data into X and Y, to set up the
C     necessary parameters for OPT, and to read in starting values for B
      CALL OPT(B,NP,F,method,ITERL,MAX,IER,ACC,AUTOC,ALABEL)

where

NDIM = ndim, the first dimension of the X matrix in the MAIN
program

N = n, the number of observations which must be .LE. ndim

K = k, the maximum number of parameters to be estimated in the
equation, not counting the error variance and r.

SUBROUTINE AUTOC computes the LOGlikelihood in condensed form (condensed with respect to the error variance), i.e., the potential number of parameters is k+1: the k regression coefficients plus r. If the optimization is NOT with respect to a PROPER SUBSET of the full set of parameters (see Section 1.11), the user must set NP=K+1 and assign the autocorrelation coefficient to the last location in the parameter vector (dimensioned NP). If the optimization IS done with respect to a subset of parameters, NP is unchanged. For example, suppose the equation has 6 potential parameters, but in the current run we wish to set parameters 2 and 4 to zero and not optimize with respect to them. Then NP is still 7 (6 potential parameters plus the autocorrelation coefficient) and the autocorrelation coefficient is still assigned to the seventh location in the parameter vector, but NPE=5 and the IPRM array contains the elements 1 3 5 6 7 2 4 (i.e., we optimize with respect to the 5 parameters 1, 3, 5, 6, 7 and hold parameters 2 and 4 constant).

NOTE: The number of columns in X may, of course, be larger than k; i.e., there may be unused columns in X.
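To make the condensed likelihood and the parameter layout concrete, here is a Python sketch. It assumes the exact-likelihood form for stationary AR(1) errors, with the first observation weighted by (1-r^2) as in the Prais-Winsten transformation. AUTOC may use a different form (for example, conditioning on the first observation), so treat this as an illustration only. `autoc_loglik` and `expand` are hypothetical names. `expand` mimics the IPRM convention described above: the first NPE entries of IPRM are the positions being optimized.

```python
import math


def autoc_loglik(theta, x, y):
    """Condensed AR(1) loglikelihood; theta = [b(1), ..., b(k), r] (r last)."""
    k = len(x[0])
    b, r = theta[:k], theta[k]
    if abs(r) >= 1.0:
        return -math.inf  # assumption: restrict to the stationary region |r| < 1
    n = len(y)
    u = [y[i] - sum(b[j] * x[i][j] for j in range(k)) for i in range(n)]
    # Prais-Winsten weighting of the first observation, then quasi-differences.
    s = (1.0 - r * r) * u[0] ** 2
    s += sum((u[i] - r * u[i - 1]) ** 2 for i in range(1, n))
    sigma2 = s / n  # ML estimate of sigma squared given b and r
    return (-0.5 * n * (math.log(2.0 * math.pi) + 1.0 + math.log(sigma2))
            + 0.5 * math.log(1.0 - r * r))


def expand(free, held, iprm, npe):
    """Hypothetical helper: build the full NP-vector from the NPE free values.

    iprm holds 1-based positions; iprm[:npe] are optimized, the remaining
    positions keep their values from `held`."""
    full = list(held)
    for value, pos in zip(free, iprm[:npe]):
        full[pos - 1] = value
    return full
```

With IPRM = 1 3 5 6 7 2 4 and NPE = 5, `expand` places the five free values in positions 1, 3, 5, 6 and 7, leaving parameters 2 and 4 at their held values (zero in the example above) and r in position 7.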

SECTION 13.3 LINEAR REGRESSION WITH HETEROSCEDASTIC ERROR TERMS

The model to be estimated is the following:

Y(i) = b(1)*X(i,1) + b(2)*X(i,2) + ... + b(k1)*X(i,k1) + u(i)

where u(i) is distributed normally with mean zero and variance given by

var(u(i)) = s^{2}*(1. + b(k1+1)*Z(i,1) + ... + b(k1+k2)*Z(i,k2))

where s^{2} and b(k1+1),...,b(k1+k2) are additional
parameters. As before, if there is a constant term in the
regression, the user must fill the first column of the X-matrix
with 1.0's. The user may place into the Z-matrix any variables
that he/she thinks affect the variance; in particular, some
columns of Z may duplicate columns of X. The estimation is
by maximum likelihood, and a call to OPTOUT produces the
asymptotic covariance matrix of the estimates (see Section 1.2).
The likelihood function is condensed with respect to the
parameter s^{2}; hence the total number of parameters estimated
is k1+k2. A likelihood ratio test of homoscedasticity is easily
performed. First estimate the model with respect to all
parameters. Then set the last k2 parameters equal to zero,
"PERM" them out (see Section 1.11) and re-estimate. Twice the
difference between the two maximized loglikelihood values is the
test statistic, which is asymptotically chi-square with k2
degrees of freedom. Note that it is the LOGlikelihood which is
being maximized. In addition to the usual statements required to
run a GQOPT optimization, the MAIN program must contain the
following:

      COMMON/USERH1/Y(n)
      COMMON/USERH2/X(ndim,k1)
      COMMON/USERH3/Z(ndim,k2)
      COMMON/BHETR0/NDIM,N,K1,K2
      EXTERNAL HETER1,method
      NDIM=ndim
      N=n
      K1=k1
      K2=k2
C     Statements to read the data into X, Z, and Y
      CALL OPT(B,NP,F,method,ITERL,MAX,IER,ACC,HETER1,ALABEL)

where

NDIM = ndim, the first dimension of X and Z in the calling
program

N = n, the number of observations (.LE. ndim)

K1 = k1, the number of coefficients in the equation

K2 = k2, the number of coefficients in the expression for the
variance.

NOTE: The number of columns in X and/or Z may, of course, be larger than k1 or k2 respectively; i.e., they may include columns that are not used in a particular run.
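The condensed loglikelihood and the homoscedasticity test can be sketched in Python as follows. This follows from the normal density with var(u(i)) = s^{2}*h(i), where h(i) = 1 + b(k1+1)*Z(i,1) + ... + b(k1+k2)*Z(i,k2), after substituting the ML estimate of s^{2}. It is an illustration, not HETER1 itself. `heter_loglik` and `lr_statistic` are hypothetical names. Returning minus infinity when some h(i) <= 0 is an assumption about how an invalid variance is treated.

```python
import math


def heter_loglik(theta, x, z, y):
    """Loglikelihood condensed w.r.t. s^2; theta = [b(1..k1), b(k1+1..k1+k2)]."""
    k1, k2 = len(x[0]), len(z[0])
    b, g = theta[:k1], theta[k1:k1 + k2]
    n = len(y)
    h = [1.0 + sum(g[j] * z[i][j] for j in range(k2)) for i in range(n)]
    if min(h) <= 0.0:
        return -math.inf  # assumption: non-positive variances are rejected
    u = [y[i] - sum(b[j] * x[i][j] for j in range(k1)) for i in range(n)]
    s2 = sum(u[i] ** 2 / h[i] for i in range(n)) / n  # ML estimate of s^2
    return (-0.5 * n * (math.log(2.0 * math.pi) + 1.0 + math.log(s2))
            - 0.5 * sum(math.log(v) for v in h))


def lr_statistic(loglik_full, loglik_restricted):
    """Twice the loglikelihood difference; compare with chi-square(k2)."""
    return 2.0 * (loglik_full - loglik_restricted)
```

When the last k2 parameters are zero, h(i) = 1 for every i and the function reduces to the condensed loglikelihood of the homoscedastic model. That is the restricted value entering the likelihood ratio test described above.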

SECTION 13.4 HYPOTHESIS TESTING IN LINEAR REGRESSIONS

The routines in this section permit the testing of linear restrictions on the regression coefficients. If the restrictions are only of the type in which certain coefficients are hypothesized to have specific values, the regression subject to these restrictions can be obtained by using

CALL RESRG2(B,X,Y,R,NDIM,N,K,IR,VAR,IER)

where

Input variables:

X = REAL*8 array of dimension ndim x k, containing n
observations (n.LE.ndim) on k variables (the first column should
normally be filled with 1.0's to account for the constant term
in the regression)

Y = REAL*8 array with n elements, containing the dependent variable

R = LOGICAL*1 array of k elements, set equal to .TRUE. if the
corresponding element of the coefficient vector B is restricted
by the hypothesis, and equal to .FALSE. otherwise

NDIM = the first dimension of the X array

N = the variable containing the number of observations n

K = the variable containing the number of independent variables

B = REAL*8 array of k elements; those elements corresponding to
elements of R which have been set equal to .TRUE. should contain
the hypothesized value for that coefficient. The remaining
elements of B are irrelevant.

Output variables:

B = the elements of B corresponding to those elements of R
which have been set equal to .FALSE. will contain the coefficient
estimates subject to the restrictions

VAR = REAL*8, the restricted sum of squares of residuals

IER = error code, 0 for normal return

IER = -2 if X'X is singular
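For restrictions of this simple form, the restricted estimates can be obtained by moving the hypothesized terms to the left-hand side and regressing the adjusted dependent variable on the remaining columns of X. Below is a minimal Python sketch of that computation. It assumes the result agrees with what RESRG2 returns, since the routine's internal method is not described here. `resrg2` and `solve` are hypothetical names, and a raised ValueError stands in for IER = -2.

```python
def solve(a, rhs, tol=1e-15):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < tol:
            raise ValueError("X'X is singular")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    sol = [0.0] * n
    for c in range(n - 1, -1, -1):
        sol[c] = (m[c][n] - sum(m[c][j] * sol[j] for j in range(c + 1, n))) / m[c][c]
    return sol


def resrg2(b, x, y, r):
    """Hold b[j] fixed where r[j] is True; re-estimate the rest by OLS.

    Returns (restricted coefficients, restricted sum of squared residuals)."""
    n, k = len(x), len(x[0])
    free = [j for j in range(k) if not r[j]]
    # Subtract the hypothesized part of the fit from the dependent variable.
    ystar = [y[i] - sum(b[j] * x[i][j] for j in range(k) if r[j]) for i in range(n)]
    xtx = [[sum(x[i][a] * x[i][c] for i in range(n)) for c in free] for a in free]
    xty = [sum(x[i][a] * ystar[i] for i in range(n)) for a in free]
    est = solve(xtx, xty) if free else []
    out = list(b)
    for j, v in zip(free, est):
        out[j] = v
    resid = [y[i] - sum(out[j] * x[i][j] for j in range(k)) for i in range(n)]
    return out, sum(e * e for e in resid)
```

For example, with a constant and x = 1, 2, 3, 4, y = 1, 3, 2, 4, fixing the slope at 1.0 gives an intercept of 0 and a restricted sum of squares of 2.0. The value initially stored in the unrestricted position of B is ignored.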

If the restrictions are exclusively of this type, i.e., particular coefficients are hypothesized to take particular values, the hypothesis representing these restrictions can be tested by using

CALL FTEST(B,C,X,Y,R,STD,T,XBAR,F,W,LAGR,LRAT,NDIM,N,K,IER)

where

Input variables:

R = LOGICAL*1 array of k elements, with elements corresponding
to coefficients to be restricted set equal to .TRUE., with the
others set equal to .FALSE.

C = REAL*8 coefficient array of k elements; with those
corresponding to elements of R set equal to .TRUE. being set to
the hypothesized value

X = REAL*8 array of dimension ndim x k, containing n observations
(n.LE.ndim) on k variables (the first column should normally be
filled with 1.0's to account for the constant term in the
regression)

Y = REAL*8 array with n elements, containing the dependent variable

STD = REAL*8 array of k elements--scratch storage

T = REAL*8 array of k elements--scratch storage

XBAR = REAL*8 array of k elements--scratch storage

NDIM = the first dimension of the X array

N = the variable containing the number of observations n

K = the variable containing the number of independent variables

Output variables:

B = REAL*8 array of k elements, will contain the unrestricted
least squares estimates

C = REAL*8 array of k elements, will contain the restricted least
squares estimates

F = the F-statistic with numerator degrees of freedom equal to
the number of restrictions and denominator degrees of freedom
equal to n-k (REAL*8)

W = the Wald statistic (REAL*8)

LAGR = the Lagrange Multiplier test statistic (REAL*8)

LRAT = the likelihood ratio test statistic (REAL*8)

IER = error codes---same as in REGR

If there are general linear restrictions, the hypothesis can be tested by using

CALL RESRG1(BH,BT,X,Y,RM,RV,STD,T,F,W,LAGR,LRAT,NDIM,N,K,IR,IER)

where

Input variables:

X = as above

Y = as above

RM = REAL*8 matrix of dimension ir x k containing the
coefficients that express the restrictions as in the matrix
equation (RM)b = RV, where b is the coefficient vector to be
restricted and RV is an ir-vector of constants

RV = REAL*8 vector of constants of length ir

NDIM = as above

N = as above

K = as above

IR = variable containing the number ir, the number of
restrictions, which is also the first dimension of RM and the length of RV

Output variables:

BH = REAL*8 vector of length k containing the unrestricted
coefficient estimates

BT = REAL*8 vector of length k containing the restricted
estimates

STD = REAL*8 vector of length k containing the standard errors of
BH

T = REAL*8 vector of length k containing the t-values

F = REAL*8 variable containing the F-test value

W = REAL*8 variable containing the Wald Statistic

LAGR = REAL*8 variable containing the Lagrange multiplier test
statistic

LRAT = REAL*8 variable containing the likelihood ratio test
statistic

IER = as above
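The restricted estimates under (RM)b = RV can be computed from the unrestricted ones with the standard restricted least squares formula BT = BH - (X'X)^{-1} RM' [RM (X'X)^{-1} RM']^{-1} (RM BH - RV). The Python sketch below implements that formula. `restricted_ls`, `invert`, `matmul` and `transpose` are hypothetical helpers. It is an assumption that RESRG1 computes BT this way rather than by some numerically different route.

```python
def invert(a, tol=1e-15):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < tol:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def restricted_ls(x, y, rm, rv):
    """Return (BH, BT): unrestricted and restricted estimates under RM*b = RV."""
    xt = transpose(x)
    xtx_inv = invert(matmul(xt, x))
    bh = [row[0] for row in matmul(xtx_inv, matmul(xt, [[v] for v in y]))]
    # Discrepancy RM*BH - RV (length ir).
    d = [sum(rm[i][j] * bh[j] for j in range(len(bh))) - rv[i] for i in range(len(rv))]
    a = matmul(xtx_inv, transpose(rm))        # (X'X)^-1 RM'           (k x ir)
    m_inv = invert(matmul(rm, a))             # [RM (X'X)^-1 RM']^-1   (ir x ir)
    adj = matmul(a, matmul(m_inv, [[v] for v in d]))
    bt = [bh[j] - adj[j][0] for j in range(len(bh))]
    return bh, bt
```

A single restriction RM = (0 1), RV = 1 reproduces the case handled by RESRG2, in which the slope is fixed at 1.0. A restriction such as b(1) + b(2) = 2 is of the general type that only RESRG1 handles.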

SECTION 13.5 RECURSIVE RESIDUALS

This section permits the computation of linear unbiased regression residuals with a scalar covariance matrix. See Theorems 35, 36 and 37 in Elementary Regression Theory-Part 2.

The call is

CALL RECURS(B,X,Y,U,V,NDIM,N,K,IER)

where B, X, Y, NDIM, N, K have exactly the same meaning as in Section 13.1, and where U and V must be REAL*8 arrays dimensioned U(N), V(N). U will contain the recursive residuals (since there are only n-k such residuals, the first k elements will be zero), and V will contain the ordinary least squares residuals. IER is an error return with the same meaning as in Section 13.1.

Note that the sum of squares of the ordinary and recursive residuals is the same; see the test program XRECURS.FOR.
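The recursive residuals can be sketched with the usual definition. For each observation t after the first k, fit OLS to observations 1, ..., t-1. Then scale the one-step-ahead prediction error by sqrt(1 + x(t)'(X'X)^{-1} x(t)), where X'X is formed from those t-1 observations. The Python sketch below follows the storage convention described above, with the first k entries of U set to zero. `recurs`, `ols` and `invert` are hypothetical names. It refits from scratch at each step rather than updating recursively, which is simpler but slower than RECURS may be.

```python
import math


def invert(a, tol=1e-15):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < tol:
            raise ValueError("X'X is singular")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(x, y):
    """Return the OLS coefficients and (X'X)^-1 for the rows supplied."""
    k = len(x[0])
    xtx = [[sum(r[a] * r[c] for r in x) for c in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(x, y)) for a in range(k)]
    inv = invert(xtx)
    return [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)], inv


def recurs(x, y):
    """Return (U, V): recursive residuals (first k zero) and OLS residuals."""
    n, k = len(x), len(x[0])
    u = [0.0] * n
    for t in range(k, n):  # 0-based t: fit on observations 0..t-1
        b, inv = ols(x[:t], y[:t])
        xt = x[t]
        pred = sum(b[j] * xt[j] for j in range(k))
        q = sum(xt[a] * inv[a][c] * xt[c] for a in range(k) for c in range(k))
        u[t] = (y[t] - pred) / math.sqrt(1.0 + q)
    b, _ = ols(x, y)
    v = [y[i] - sum(b[j] * x[i][j] for j in range(k)) for i in range(n)]
    return u, v
```

For the small data set x = 1, 2, 3, 4 (with a constant), y = 1, 3, 2, 4, the squared recursive residuals are 1.5 and 0.3. Their sum, 1.8, equals the OLS sum of squared residuals, as the note above states.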

SECTION 13.6 JACKKNIFE REGRESSIONS

See Sunil K. Sapra, "A Jackknife Maximum Likelihood Estimator for the Probit Model," Applied Economics Letters, 9(2002), 73-74.
The jackknife estimator is computed by omitting each of the n observations in turn and computing the regression estimates from the remaining n-1 observations. If *B(i)* denotes the
regression estimate with the ith observation omitted, *B(.)* the
quantity *(1/n)ΣB(i)*, and *B* the regression estimate based on all observations, the bias-corrected jackknife
regression estimate is

n*B - (n-1)*B(.)

and it is computed by calling

CALL REGRJACK(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER,BSTOR,SSTOR,TSTOR,RSTOR)

where all arrays should be REAL*8 and where

X, Y, NDIM, N, K and IER have the same meanings as in Section 13.1

B and STD will contain the jackknife estimates and the jackknife standard deviations, respectively, and should be dimensioned as in Section 13.1

XBAR, YBAR, RSQ, VAR are used only internally (but XBAR must be dimensioned as in Section 13.1)

BSTOR = dimensioned BSTOR(0:N,K), contains the individual regression estimates, with row 0, i.e., BSTOR(0,1),...,BSTOR(0,K), containing the estimates with no observation omitted

SSTOR = dimensioned SSTOR(0:N,K) contains the individual standard deviations

TSTOR = dimensioned TSTOR(0:N,K) contains the individual t-values

RSTOR = dimensioned RSTOR(0:N) contains the individual R^{2}
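The jackknife computation described above can be sketched in Python as follows. The bias-corrected estimate is n*B - (n-1)*B(.). For the standard deviations, the sketch uses the common jackknife formula sqrt(((n-1)/n) Σ (B(i) - B(.))^2). That formula is an assumption; REGRJACK's exact definition of STD is not stated here. `jackknife`, `ols` and `invert` are hypothetical names.

```python
import math


def invert(a, tol=1e-15):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < tol:
            raise ValueError("X'X is singular")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(x, y):
    k = len(x[0])
    xtx = [[sum(r[a] * r[c] for r in x) for c in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(x, y)) for a in range(k)]
    inv = invert(xtx)
    return [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]


def jackknife(x, y):
    """Return (bias-corrected estimates, jackknife std devs, B, [B(i)])."""
    n, k = len(x), len(x[0])
    b_full = ols(x, y)
    # B(i): estimates with observation i omitted.
    b_omit = [ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    b_dot = [sum(bi[j] for bi in b_omit) / n for j in range(k)]
    b_jack = [n * b_full[j] - (n - 1) * b_dot[j] for j in range(k)]
    std = [math.sqrt((n - 1) / n * sum((bi[j] - b_dot[j]) ** 2 for bi in b_omit))
           for j in range(k)]
    return b_jack, std, b_full, b_omit
```

The list of B(i) plays the role of rows 1 through N of BSTOR. When the data lie exactly on a line, every B(i) equals B, so the jackknife estimate equals B and the jackknife standard deviations are zero.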

SECTION 13.7 PRINCIPAL COMPONENTS

See J. Johnston, *Econometric Methods*, 2nd ed.; New York: McGraw-Hill, 1963.

This section computes the principal components of k vectors of length n stored in a REAL*8 matrix X of dimensions n by k. The call to the subroutine is

CALL PRINCOMP(X,XX,P,N,K,PROP,Z)

where all arguments should be REAL*8 and where

X = n by k input array of the k variables (input)

XX = k by k matrix which will contain as output the eigenvalues of X'X on the main diagonal in descending order (output)

P = k by k matrix of corresponding eigenvectors stored as columns (output)

N = n (input)

K = k (input)

PROP = an array dimensioned PROP(K) which will contain the proportions of total variation contributed by the principal components (output)

Z = an array dimensioned Z(N,K), which will contain the principal components (output).
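The outputs of PRINCOMP can be illustrated with a Python sketch. It diagonalizes X'X with the cyclic Jacobi eigenvalue method and sorts the eigenvalues in descending order. It then sets PROP(j) to eigenvalue j divided by the sum of the eigenvalues and forms Z = X P. The sketch uses X exactly as supplied, with no centering or scaling, matching the description in terms of X'X. Whether PRINCOMP uses Jacobi rotations or another eigen-solver is not stated, so this is only an illustration. `jacobi_eigen` and `princomp` are hypothetical names.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p,q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in range(n):  # A <- A J (columns p and q)
                    amp, amq = a[m][p], a[m][q]
                    a[m][p], a[m][q] = c * amp - s * amq, s * amp + c * amq
                for m in range(n):  # A <- J' A (rows p and q)
                    apm, aqm = a[p][m], a[q][m]
                    a[p][m], a[q][m] = c * apm - s * aqm, s * apm + c * aqm
                for m in range(n):  # V <- V J accumulates the eigenvectors
                    vmp, vmq = v[m][p], v[m][q]
                    v[m][p], v[m][q] = c * vmp - s * vmq, s * vmp + c * vmq
    return [a[i][i] for i in range(n)], v


def princomp(x):
    """Return (eigenvalues descending, P, PROP, Z) for the n x k list x."""
    n, k = len(x), len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    vals, vecs = jacobi_eigen(xtx)
    order = sorted(range(k), key=lambda j: vals[j], reverse=True)
    eig = [vals[j] for j in order]
    p = [[vecs[r][j] for j in order] for r in range(k)]
    total = sum(eig)
    prop = [e / total for e in eig]
    z = [[sum(x[i][a] * p[a][c] for a in range(k)) for c in range(k)] for i in range(n)]
    return eig, p, prop, z
```

Because P is orthogonal, Z'Z is diagonal with the eigenvalues on the diagonal, so the principal components are mutually orthogonal. Column j of Z has a sum of squares equal to the jth eigenvalue.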