SECTION 13 REGRESSION ANALYSIS (REGRESS)
REGRESS permits the estimation of single equation models under various conditions.
SECTION 13.1 LINEAR REGRESSION ANALYSIS (REGR)
REGR estimates the parameters of the model
Y(i) = b(1)*X(i,1) + b(2)*X(i,2) + . . . + b(k)*X(i,k) + u(i)
where i indexes observations, b(1) ... b(k) are parameters to be estimated, Y is the dependent variable and the matrix X contains (by columns) the independent variables. u(i) is an error term with zero mean and uncorrelated with the regressors. The regression computation is performed by executing
CALL REGR(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER)
where all real variables should be declared REAL*8 and where
B = vector of computed regression coefficients (of length k)
X = n x k matrix of exogenous variables (n observations on k
variables)
Y = vector of dependent variable observations of length n
STD = vector of estimated standard deviations of regression
coefficients (of length k)
T = vector of t-values for regression coefficients (of length k)
[T(j)=B(j)/STD(j)]
XBAR = vector of mean values for exogenous variables (of length
k)
YBAR = mean value of dependent variable
RSQ = R-squared (unadjusted)
VAR = residual variance (sum of squares of residuals divided by
n)
NDIM = ndim, the first dimension of X in the calling program
N = n, the variable containing the number of observations
K = k, the variable containing the number of independent
variables
1. In the present version the number of exogenous variables k must be less than or equal to 30.
2. If there is a constant term in the regression (as is normally the case) the user should fill the first column of the X matrix with the number 1.0 before calling REGR.
3. Since ndim can be greater than n, the number of rows in X may exceed the number of observations.
4. The user may also include in the MAIN program COMMON/BREGR0/EPSTOL and reset EPSTOL, a tolerance level for matrix inversion. Default value of EPSTOL is 1.D-15.
5. Error codes are as follows:
IER = -3 k exceeds 30
IER = -67 The matrix X'X is singular
IER = -73 The covariance matrix is not positive definite
IER = -74 The Y variable is constant over all observations
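The quantities returned by REGR can be reproduced outside FORTRAN for checking purposes. The following Python sketch is not part of GQOPT and makes no claim about how REGR is coded; the function names invert and ols and the argument tol are hypothetical. VAR is computed as defined above (residual sum of squares divided by n). The divisor n-k used for the standard deviations is an assumption, since the text does not state which divisor REGR uses.

```python
import math

def invert(a, tol=1e-15):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(a)
    # Augment [A | I] and reduce the left half to the identity.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) <= tol:
            raise ValueError("X'X is singular to working tolerance")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def ols(x, y):
    """Least squares for Y = X*b + u; x is a list of n rows of length k."""
    n, k = len(x), len(x[0])
    xtx = [[sum(r[a] * r[b] for r in x) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(x, y)) for a in range(k)]
    inv = invert(xtx)
    b = [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    rss = sum((v - sum(r[a] * b[a] for a in range(k))) ** 2
              for r, v in zip(x, y))
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    s2 = rss / (n - k)  # ASSUMPTION: n-k divisor for the standard deviations
    std = [math.sqrt(s2 * inv[a][a]) for a in range(k)]
    return {
        "B": b,
        "STD": std,
        "T": [b[a] / std[a] for a in range(k)],
        "XBAR": [sum(r[a] for r in x) / n for a in range(k)],
        "YBAR": ybar,
        "RSQ": 1.0 - rss / tss,   # unadjusted R-squared
        "VAR": rss / n,           # as defined in the text
    }

# First column of X is filled with 1.0 for the constant term (see note 2).
x = [[1.0, float(v)] for v in (1, 2, 3, 4, 5)]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = ols(x, y)
print(fit["B"], fit["RSQ"], fit["VAR"])
```

The singularity check plays a role similar to that of a tolerance for matrix inversion, but whether EPSTOL is applied in this way inside REGR is not stated in the text.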
SECTION 13.2 LINEAR REGRESSIONS WITH AUTOCORRELATED ERROR TERMS (AUTOC)
If the error term in the linear regression model of Section 13.1 obeys
u(i) = r*u(i-1) + e(i)
where e(i) are iid normal with mean zero and variance sigma squared, the parameters including r can be estimated by maximum likelihood. The user should include in the MAIN program the following:
COMMON/USERR1/Y(n)
COMMON/USERR2/X(ndim,k)
COMMON/BAUTOC/NDIM,N,K
EXTERNAL AUTOC,method          (for method e.g., GRADX)
C next three statements set values of variables
NDIM=ndim
N=n
K=k
C statements to read in the data into X and Y, to set up the
C necessary parameters for OPT and to read in starting values for B
CALL OPT(B,NP,F,method,ITERL,MAX,IER,ACC,AUTOC,ALABEL)
where
NDIM = ndim, the first dimension of the X matrix in the MAIN
program
N = n, the number of observations which must be .LE. ndim
K = k, the maximum number of parameters to be estimated in the
equation, not counting the error variance and r.
SUBROUTINE AUTOC computes the LOGlikelihood in condensed form, i.e., the potential number of parameters is k+1 (the k regression coefficients plus r).
If the optimization is NOT with respect to a PROPER SUBSET of the full set of parameters (see Section 1.11), the user must set NP=K+1 and assign the autocorrelation coefficient to the last location in the vector of parameters (dimensioned NP).
If optimization IS done with respect to a subset of parameters (e.g., the equation itself has 6 potential parameters, but in the current run we wish to assign zero to parameters 2 and 4 and not optimize with respect to these), NP would still be 7 (6 potential parameters plus the autocorrelation coefficient) and the autocorrelation coefficient would still be assigned to the seventh location in the parameter vector, but NPE=5 and the IPRM array would contain the elements 1 3 5 6 7 2 4 (i.e., we optimize with respect to the 5 parameters 1 3 5 6 7 and hold constant parameters 2 and 4).
NOTE: The number of columns in X may, of course, be larger than k; i.e., there may be unused columns in X.
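The bookkeeping for NP, NPE and IPRM in the example above can be checked with a few lines of Python. This is only an illustration of the ordering shown in the text (optimized parameters first, fixed parameters last); the function name build_iprm is hypothetical and Section 1.11 remains the authority on how IPRM must be filled.

```python
def build_iprm(np_total, fixed):
    """Return (NPE, IPRM) with 1-based parameter numbers.

    Free parameters are listed first in ascending order, followed by the
    parameters held constant, as in the example in the text.
    """
    fixed = sorted(set(fixed))
    if any(j < 1 or j > np_total for j in fixed):
        raise ValueError("fixed parameter number out of range")
    free = [j for j in range(1, np_total + 1) if j not in fixed]
    return len(free), free + fixed

k = 6              # potential regression coefficients
np_total = k + 1   # plus the autocorrelation coefficient in location k+1
npe, iprm = build_iprm(np_total, fixed=[2, 4])
print(np_total, npe, iprm)
```

With k=6 and parameters 2 and 4 held at zero, this reproduces NP=7, NPE=5 and the IPRM ordering 1 3 5 6 7 2 4.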
SECTION 13.3 LINEAR REGRESSION WITH HETEROSCEDASTIC ERROR TERMS
The model to be estimated is the following:
Y(i) = b(1)*X(i,1) + b(2)*X(i,2) + ... + b(k1)*X(i,k1) + u(i)
where u(i) is distributed normally with mean zero and variance given by
var(u(i)) = s2*(1. + b(k1+1)*Z(i,1) + ... + b(k1+k2)*Z(i,k2))
where s2 and b(k1+1),...,b(k1+k2) are additional parameters. As before, if there is a constant term in the regression, the user must fill the first column of the X-matrix with 1.0's. The user may place into the Z-matrix any variables thought to affect the variance; in particular, certain columns of Z may contain certain columns of X.
The estimation is by maximum likelihood, and a call to OPTOUT produces the asymptotic covariance matrix of the estimates (see Section 1.2). The likelihood function is condensed with respect to the parameter s2. Hence the total number of parameters estimated is k1+k2.
A likelihood ratio test of homoscedasticity is easily performed. First estimate the model with respect to all parameters. Then set the last k2 parameters equal to zero and "PERM" them out (see Section 1.11). Twice the difference between the two maximized values is the appropriate test statistic (Chi square with k2 degrees of freedom). Note that it is the LOGlikelihood which is being maximized.
In addition to the usual statements required to run a GQOPT optimization, the MAIN program must contain the following:
COMMON/USERH1/Y(n)
COMMON/USERH2/X(ndim,k1)
COMMON/USERH3/Z(ndim,k2)
COMMON/BHETR0/NDIM,N,K1,K2
EXTERNAL HETER1,method
NDIM=ndim
N=n
K1=k1
K2=k2
C Must read in data into X, Z, and Y
CALL OPT(B,NP,F,method,ITERL,MAX,IER,ACC,HETER1,ALABEL)
where
NDIM = ndim, the first dimension of X and Z in the calling
program
N = n, the number of observations (.LE. ndim)
K1 = k1, the number of coefficients in the equation
K2 = k2, the number of coefficients in the expression for the
variance.
NOTE: The number of columns in X and/or Z may, of course, be larger than k1 or k2 respectively; i.e., they may include columns that are not used in a particular run.
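The meaning of "condensed with respect to s2" can be made concrete. For given regression coefficients and variance coefficients, write w(i) = 1 + b(k1+1)*Z(i,1) + ... + b(k1+k2)*Z(i,k2); the value of s2 that maximizes the normal LOGlikelihood is then (1/n) times the sum of u(i)**2/w(i), and substituting it back leaves a function of the k1+k2 remaining parameters only. The Python sketch below follows this textbook derivation. It is not the code of HETER1, whose treatment of additive constants is not documented here, and all function and variable names are hypothetical.

```python
import math

def weights_and_residuals(beta, gamma, x, z, y):
    """Return lists of w(i) = 1 + z(i)'gamma and u(i) = y(i) - x(i)'beta."""
    w = [1.0 + sum(g * v for g, v in zip(gamma, zi)) for zi in z]
    if min(w) <= 0.0:
        raise ValueError("variance expression must be positive")
    u = [yi - sum(b * v for b, v in zip(beta, xi)) for xi, yi in zip(x, y)]
    return w, u

def full_loglik(beta, gamma, s2, x, z, y):
    """Normal log-likelihood with var(u(i)) = s2 * w(i)."""
    w, u = weights_and_residuals(beta, gamma, x, z, y)
    return sum(-0.5 * (math.log(2.0 * math.pi * s2 * wi) + ui * ui / (s2 * wi))
               for wi, ui in zip(w, u))

def condensed_loglik(beta, gamma, x, z, y):
    """Log-likelihood after maximizing analytically over s2."""
    w, u = weights_and_residuals(beta, gamma, x, z, y)
    n = len(y)
    s2_hat = sum(ui * ui / wi for wi, ui in zip(w, u)) / n
    value = (-0.5 * n * (math.log(2.0 * math.pi) + 1.0 + math.log(s2_hat))
             - 0.5 * sum(math.log(wi) for wi in w))
    return value, s2_hat

# Small made-up data set, for illustration only.
x = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
z = [[1.0], [2.0], [3.0], [4.0]]
y = [1.2, 1.9, 3.4, 3.8]
beta, gamma = [0.2, 0.95], [0.5]

value, s2_hat = condensed_loglik(beta, gamma, x, z, y)
print(value, s2_hat)
```

Evaluating the full LOGlikelihood at s2_hat gives the same number as the condensed function, and any other value of s2 gives a smaller one.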
SECTION 13.4 HYPOTHESIS TESTING IN LINEAR REGRESSIONS
The routines in this section permit the testing of linear restrictions on the regression coefficients. If the restrictions are only of the type in which certain coefficients are hypothesized to have specific values, the regression subject to these restrictions can be obtained by using
CALL RESRG2(B,X,Y,R,NDIM,N,K,IR,VAR,IER)
where
Input variables:
X = REAL*8 array of dimension ndim x k, containing n
observations (n.LE.ndim) on k variables (the first column should
normally be set equal to 1.0's to account for the constant term
in the regression)
Y = REAL*8 array with n elements, containing the dependent variable
R = LOGICAL*1 array of k elements, set equal to .TRUE. if the
corresponding element of the coefficient vector B is restricted
by the hypothesis, and equal to .FALSE. otherwise
NDIM = the first dimension of the X array
N = the variable containing the number of observations n
K = the variable containing the number of independent variables
B = REAL*8 array of k elements; those elements corresponding to
elements of R which have been set equal to .TRUE. should contain
the hypothesized value for that coefficient. The remaining
elements of B are irrelevant.
Output variables:
B = the elements of B corresponding to those elements of R
which have been set equal to .FALSE. will contain the coefficient
estimates subject to the restrictions
VAR = REAL*8, the restricted sum of squares of residuals
IER = error code, 0 for normal return
IER = -2 if X'X is singular
If the restrictions are as above, i.e., exclusively of the type such that particular coefficients are hypothesized to have particular values, the hypothesis representing these restrictions can be tested by using
CALL FTEST(B,C,X,Y,R,STD,T,XBAR,F,W,LAGR,LRAT,NDIM,N,K,IER)
where
Input variables:
R = LOGICAL*1 array of k elements, with elements corresponding
to coefficients to be restricted set equal to .TRUE., with the
others set equal to .FALSE.
C = REAL*8 coefficient array of k elements; with those
corresponding to elements of R set equal to .TRUE. being set to
the hypothesized value
X = REAL*8 array of dimension ndim x k, containing n observations
(n.LE.ndim) on k variables (the first column should normally be
set equal to 1.0's to account for the constant term in the
regression)
Y = REAL*8 array with n elements, containing the dependent variable
STD = REAL*8 array of k elements--scratch storage
T = REAL*8 array of k elements--scratch storage
XBAR = REAL*8 array of k elements--scratch storage
NDIM = the first dimension of the X array
N = the variable containing the number of observations n
K = the variable containing the number of independent variables
Output variables:
B = REAL*8 array of k elements, will contain the unrestricted
least squares estimates
C = REAL*8 array of k elements, will contain the restricted least
squares estimates
F = the F-statistic with numerator degrees of freedom equal to
the number of restrictions and denominator degrees of freedom
equal to n-k (REAL*8)
W = the Wald statistic (REAL*8)
LAGR = the Lagrange Multiplier test statistic (REAL*8)
LRAT = the likelihood ratio test statistic (REAL*8)
IER = error codes---same as in REGR
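For a linear regression with normal errors, the four statistics can all be written in terms of the restricted and unrestricted sums of squared residuals. The Python sketch below uses the usual textbook forms; whether FTEST scales W, LAGR and LRAT in exactly this way (for example, with n rather than n-k) is an assumption, and the function name and the numerical inputs are hypothetical.

```python
import math

def restriction_tests(rss_r, rss_u, n, k, q):
    """Textbook test statistics for q linear restrictions.

    rss_r, rss_u: restricted and unrestricted residual sums of squares
    n, k: number of observations and of regressors
    """
    diff = rss_r - rss_u
    return {
        "F": (diff / q) / (rss_u / (n - k)),   # F(q, n-k)
        "W": n * diff / rss_u,                 # Wald
        "LRAT": n * math.log(rss_r / rss_u),   # likelihood ratio
        "LAGR": n * diff / rss_r,              # Lagrange multiplier
    }

# Made-up inputs for illustration only.
stats = restriction_tests(rss_r=12.0, rss_u=10.0, n=50, k=4, q=2)
print(stats)
```

With these forms the three asymptotic statistics always satisfy W >= LRAT >= LAGR, which provides a quick consistency check on any output.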
If there are general linear restrictions, the hypothesis can be tested by using
CALL RESRG1(BH,BT,X,Y,RM,RV,STD,T,F,W,LAGR,LRAT,NDIM,N,K,IR,IER)
where
Input variables:
X = as above
Y = as above
RM = REAL*8 matrix of dimension ir x k containing the
coefficients that express the restrictions as in the matrix
equation (RM)b = RV, where b is the coefficient vector to be
restricted and RV is an ir-vector of constants
RV = REAL*8 vector of constants of length ir
NDIM = as above
N = as above
K = as above
IR = variable containing the number ir, the number of
restrictions, which is also the first dimension of RV
Output variables:
BH = REAL*8 vector of length k containing the unrestricted
coefficient estimates
BT = REAL*8 vector of length k containing the restricted
estimates
STD = REAL*8 vector of length k containing the standard errors of
BH
T = REAL*8 vector of length k containing the t-values
F = REAL*8 variable containing the F-test value
W = REAL*8 variable containing the Wald Statistic
LAGR = REAL*8 variable containing the Lagrange multiplier test
statistic
LRAT = REAL*8 variable containing the likelihood ratio test
statistic
IER = as above
SECTION 13.5 RECURSIVE RESIDUALS
This section permits the computation of linear, unbiased regression residuals with scalar covariance matrix. See Theorems 35, 36 and 37 in Elementary Regression Theory−Part 2.
The call is
CALL RECURS(B,X,Y,U,V,NDIM,N,K,IER)
Note that the sum of squares of the ordinary and recursive residuals is the same; see the test program XRECURS.FOR.
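The equality of the two sums of squares can be verified independently of RECURS. The Python sketch below computes recursive residuals for a two-regressor model by the standard updating formulas, starting from the first k observations, and compares their sum of squares with that of the ordinary least squares residuals. It is an illustration under these assumptions, not the algorithm of RECURS; all names are hypothetical and the data are made up.

```python
import math

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols2(x, y):
    """Least squares coefficients for two regressors."""
    xtx = [[sum(r[a] * r[b] for r in x) for b in range(2)] for a in range(2)]
    xty = [sum(r[a] * v for r, v in zip(x, y)) for a in range(2)]
    p = inv2(xtx)
    return [p[a][0] * xty[0] + p[a][1] * xty[1] for a in range(2)], p

def recursive_residuals(x, y):
    """Return the n-k recursive residuals for a model with k = 2."""
    k = 2
    b, p = ols2(x[:k], y[:k])          # start from the first k observations
    w = []
    for xt, yt in zip(x[k:], y[k:]):
        px = [p[a][0] * xt[0] + p[a][1] * xt[1] for a in range(k)]
        d = 1.0 + xt[0] * px[0] + xt[1] * px[1]
        err = yt - (xt[0] * b[0] + xt[1] * b[1])   # one-step prediction error
        w.append(err / math.sqrt(d))
        # Update coefficients and (X'X)-inverse to include observation t.
        b = [b[a] + px[a] * err / d for a in range(k)]
        p = [[p[a][c] - px[a] * px[c] / d for c in range(k)] for a in range(k)]
    return w

x = [[1.0, float(v)] for v in (1, 2, 3, 4, 5, 6)]
y = [1.0, 2.7, 2.9, 4.2, 4.8, 6.3]

w = recursive_residuals(x, y)
b_full, _ = ols2(x, y)
rss = sum((v - (r[0] * b_full[0] + r[1] * b_full[1])) ** 2 for r, v in zip(x, y))
print(sum(e * e for e in w), rss)
```

There are n-k recursive residuals against n ordinary residuals, yet the two printed sums of squares agree up to rounding.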
SECTION 13.6 JACKKNIFE REGRESSIONS
See Sunil K. Sapra, "A Jackknife Maximum Likelihood Estimator for the Probit Model," Applied Economics Letters, 9(2002), 73-74. The jackknife estimator is computed by omitting each of the n observations in turn and recomputing the regression estimates. If B(i) denotes the regression estimate with the ith observation omitted, B(.) the quantity (1/n)ΣB(i), and B the regression estimate based on all observations, the bias-corrected jackknife regression estimate is
n*B - (n-1)*B(.)
The call is
CALL REGRJACK(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER,BSTOR,SSTOR,TSTOR,RSTOR)
where all arrays should be REAL*8 and where
X, Y, NDIM, N, K and IER have the same meanings as in Section 13.1
B and STD respectively will contain the jackknife estimates and the jackknife standard deviations and should be dimensioned as in Section 13.1
XBAR,YBAR,RSQ,VAR are used only internally (but XBAR must be dimensioned as in Section 13.1)
BSTOR = dimensioned BSTOR(0:N,K) contains the individual regression estimates, with row 0, BSTOR(0,.), containing the estimates with no data omission
SSTOR = dimensioned SSTOR(0:N,K) contains the individual standard deviations
TSTOR = dimensioned TSTOR(0:N,K) contains the individual t-values
RSTOR = dimensioned RSTOR(0:N) contains the individual R-squared values
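The leave-one-out scheme can be sketched in Python for a two-regressor model. The bias-corrected estimate is formed in the standard way as n*B - (n-1)*B(.), with B, B(i) and B(.) as defined above. The list bstor mimics the layout of BSTOR (row 0 holds the full-sample estimates), but the code is only an illustration and not the implementation of REGRJACK; the jackknife standard deviations are omitted because their exact definition is not given in the text. All names are hypothetical and the data are made up.

```python
def ols2(x, y):
    """Least squares for two regressors via the normal equations."""
    s11 = sum(r[0] * r[0] for r in x)
    s12 = sum(r[0] * r[1] for r in x)
    s22 = sum(r[1] * r[1] for r in x)
    t1 = sum(r[0] * v for r, v in zip(x, y))
    t2 = sum(r[1] * v for r, v in zip(x, y))
    det = s11 * s22 - s12 * s12
    return [(s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]

def jackknife(x, y):
    """Return (bstor, b_dot, b_jack) for a two-regressor regression."""
    n = len(y)
    bstor = [ols2(x, y)]                       # row 0: no observation omitted
    for i in range(n):                         # rows 1..n: observation i omitted
        bstor.append(ols2(x[:i] + x[i + 1:], y[:i] + y[i + 1:]))
    b_dot = [sum(row[j] for row in bstor[1:]) / n for j in range(2)]
    b_jack = [n * bstor[0][j] - (n - 1) * b_dot[j] for j in range(2)]
    return bstor, b_dot, b_jack

x = [[1.0, float(v)] for v in (1, 2, 3, 4, 5, 6)]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.2]
bstor, b_dot, b_jack = jackknife(x, y)
print(bstor[0], b_jack)
```

When the data lie exactly on a line, every leave-one-out estimate equals the full-sample estimate, so the jackknife estimate coincides with it; this is a convenient check on the arithmetic.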
SECTION 13.7 PRINCIPAL COMPONENTS
See J. Johnston, Econometric Methods, 2nd ed.; New York: McGraw-Hill, 1963.
This section computes the principal components of k vectors of length n stored in a REAL*8 matrix X of dimensions n by k. The call to the subroutine is
CALL PRINCOMP(X,XX,P,N,K,PROP,Z)
where all arguments should be REAL*8 and where
X = n by k array of the k variables (input)
XX = k by k matrix which will contain as output the eigenvalues of X'X on the main diagonal in descending order (output)
P = k by k matrix of corresponding eigenvectors stored as columns (output)
N = n (input)
K = k (input)
PROP = an array dimensioned PROP(K) which will contain the proportions of total variation contributed by the principal components (output)
Z = an array dimensioned Z(N,K), which will contain the principal components (output).
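The relation among the outputs can be illustrated for k=2, where the eigenvalues and eigenvectors of X'X have a closed form. In the Python sketch below the principal components are taken as Z = X*P and the proportions as each eigenvalue divided by the sum of the eigenvalues. Whether PRINCOMP centers or scales the columns of X before forming X'X is not stated above, so the sketch uses X exactly as given; the function name is hypothetical and the data are made up.

```python
import math

def principal_components_2(x):
    """Principal components of an n x 2 data matrix from X'X (closed form)."""
    a = sum(r[0] * r[0] for r in x)
    b = sum(r[0] * r[1] for r in x)
    c = sum(r[1] * r[1] for r in x)
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    eig = [mid + rad, mid - rad]               # descending order
    if b == 0.0:
        # X'X is already diagonal: eigenvectors are the coordinate axes.
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vecs = []
        for lam in eig:
            norm = math.hypot(b, lam - a)      # (b, lam - a) solves (X'X - lam*I)v = 0
            vecs.append((b / norm, (lam - a) / norm))
    # Eigenvectors stored as columns of P.
    p = [[vecs[0][0], vecs[1][0]], [vecs[0][1], vecs[1][1]]]
    z = [[r[0] * p[0][j] + r[1] * p[1][j] for j in range(2)] for r in x]
    total = eig[0] + eig[1]
    prop = [lam / total for lam in eig]
    return eig, p, prop, z

x = [[2.0, 1.9], [1.0, 1.2], [-1.0, -0.8], [-2.0, -2.3]]
eig, p, prop, z = principal_components_2(x)
ztz = [[sum(r[i] * r[j] for r in z) for j in range(2)] for i in range(2)]
print(eig, prop)
```

The matrix Z'Z computed at the end is diagonal with the eigenvalues on the diagonal, i.e., the principal components are mutually orthogonal and their sums of squares are the eigenvalues.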