SECTION 13 REGRESSION ANALYSIS (REGRESS)

REGRESS permits the estimation of single equation models under various conditions.

SECTION 13.1 LINEAR REGRESSION ANALYSIS (REGR)

REGR estimates the parameters of the model

        Y(i) = b(1)*X(i,1) + b(2)*X(i,2) + . . . + b(k)*X(i,k) + u(i) 

where i indexes observations, b(1) ... b(k) are parameters to be estimated, Y is the dependent variable and the matrix X contains (by columns) the independent variables. u(i) is an error term with zero mean and uncorrelated with the regressors. The regression computation is performed by executing

      CALL REGR(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER) 

where all real variables should be declared REAL*8 and where

B = vector of computed regression coefficients (of length k)
X = n x k matrix of exogenous variables (n observations on k variables)
Y = vector of dependent variable observations of length n
STD = vector of estimated standard deviations of regression coefficients (of length k)
T = vector of t-values for regression coefficients (of length k) [T(j)=B(j)/STD(j)]
XBAR = vector of mean values for exogenous variables (of length k)
YBAR = mean value of dependent variable
RSQ = R-squared (unadjusted)
VAR = residual variance (sum of squares of residuals divided by n)
NDIM = ndim, the first dimension of X in the calling program
N = n, the variable containing the number of observations
K = k, the variable containing the number of independent variables
IER = error return code (0 on normal return; see the error codes below)

1. In the present version the number of exogenous variables k must be less than or equal to 30.

2. If there is a constant term in the regression (as is normally the case) the user should fill the first column of the X matrix with the number 1.0 before calling REGR.

3. Since ndim can be greater than n, the number of rows in X may exceed the number of observations.

4. The user may also include in the MAIN program COMMON/BREGR0/EPSTOL and reset EPSTOL, a tolerance level for matrix inversion. Default value of EPSTOL is 1.D-15.

5. Error codes are as follows:

IER = -3 k exceeds 30
IER = -67 The matrix X'X is singular
IER = -73 The covariance matrix is not positive definite
IER = -74 The Y variable is constant over all observations
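
As an illustration, the following is a minimal sketch of a complete calling program (the program name and the data are purely illustrative; the GQOPT library is assumed to be linked):

      PROGRAM XREGR1
      IMPLICIT REAL*8 (A-H,O-Z)
C     regress Y on a constant and a trend, 5 observations;
C     ndim=10 exceeds n=5 to show that X may have unused rows
      DIMENSION B(2),X(10,2),Y(10),STD(2),T(2),XBAR(2)
      DATA (Y(I),I=1,5)/2.D0,4.D0,5.D0,8.D0,9.D0/
      NDIM=10
      N=5
      K=2
C     first column of X set to 1.0 for the constant term
      DO 10 I=1,N
         X(I,1)=1.D0
         X(I,2)=DBLE(I)
   10 CONTINUE
      CALL REGR(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER)
      IF (IER.NE.0) WRITE(6,*) 'ERROR, IER=',IER
      WRITE(6,*) 'B   =',(B(J),J=1,K)
      WRITE(6,*) 'STD =',(STD(J),J=1,K)
      WRITE(6,*) 'RSQ =',RSQ
      END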


SECTION 13.2 LINEAR REGRESSIONS WITH AUTOCORRELATED ERROR TERMS (AUTOC)

If the error term in the linear regression model of Section 13.1 obeys

                            u(i) = r*u(i-1) + e(i) 

where e(i) are iid normal with mean zero and variance sigma squared, the parameters, including r, can be estimated by maximum likelihood. The user should include the following in the MAIN program:

      COMMON/USERR1/Y(n)
      COMMON/USERR2/X(ndim,k)
      COMMON/BAUTOC/NDIM,N,K
      EXTERNAL AUTOC,method (for method e.g., GRADX)
C next three statements set values of variables
      NDIM=ndim
      N=n
      K=k
C statements to read in the data into X and Y, to set up the
C necessary parameters
C for OPT and to read in starting values for B
      CALL OPT(B,NP,F,method,ITERL,MAX,IER,ACC,AUTOC,ALABEL) 
where 

NDIM = ndim, the first dimension of the X matrix in the MAIN program
N = n, the number of observations which must be .LE. ndim
K = k, the maximum number of parameters to be estimated in the equation, not counting the error variance and r.

SUBROUTINE AUTOC computes the LOGlikelihood in condensed form, i.e., the potential number of parameters is k+1. If the optimization is NOT with respect to a PROPER SUBSET of a full set of parameters (see Section 1.11), then the user must set NP=K+1 and assign the autocorrelation coefficient to the last location in the vector of parameters (dimensioned NP).

If optimization IS done with respect to a subset of parameters (e.g., the equation itself has 6 potential parameters, but in the current run we wish to assign zero to parameters 2 and 4 and not optimize with respect to these), NP would still be 7 (6 potential parameters plus the autocorrelation coefficient) and the autocorrelation coefficient would still be assigned to the seventh location in the parameter vector, but NPE=5 and the IPRM array would contain the elements 1 3 5 6 7 2 4 (i.e., we optimize with respect to the 5 parameters 1, 3, 5, 6, 7 and hold parameters 2 and 4 constant), as sketched below. NOTE: The number of columns in X may, of course, be larger than k; i.e., there may be unused columns in X.
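
As a concrete sketch of this bookkeeping for the 6-parameter example just given (the mechanism by which NPE and IPRM are actually communicated to the optimizer is the one described in Section 1.11; the fragment below only shows the values involved):

      INTEGER IPRM(7)
      REAL*8 B(7)
      DATA IPRM/1,3,5,6,7,2,4/
C     NP counts all 7 locations: 6 potential equation
C     coefficients plus the autocorrelation coefficient r,
C     which must occupy the last (7th) location of B
      NP=7
C     optimize only over parameters 1,3,5,6,7
      NPE=5
C     parameters 2 and 4 are held fixed at zero
      B(2)=0.D0
      B(4)=0.D0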

SECTION 13.3 LINEAR REGRESSION WITH HETEROSCEDASTIC ERROR TERMS

The model to be estimated is the following:

       Y(i) = b(1)*X(i,1) + b(2)*X(i,2) + ... + b(k1)*X(i,k1) + u(i) 

where u(i) is distributed normally with mean zero and variance given by

        var(u(i)) = s2*(1. + b(k1+1)*Z(i,1) + ... + b(k1+k2)*Z(i,k2)) 

where s2 and b(k1+1),...,b(k1+k2) are additional parameters. As before, if there is a constant term in the regression, the user must fill the first column of the X-matrix with 1.0's. The user may place any variables thought to affect the variance into the Z-matrix; in particular, certain columns of Z may duplicate certain columns of X. The estimation is by maximum likelihood, and a call to OPTOUT produces the asymptotic covariance matrix of the estimates (see Section 1.2). The likelihood function is condensed with respect to the parameter s2; hence the total number of parameters estimated is k1+k2. Note that it is the LOGlikelihood which is being maximized.

A likelihood ratio test of homoscedasticity is easily performed. First estimate the model with respect to all parameters. Then set the last k2 parameters equal to zero and "PERM" them out (see Section 1.11). Twice the difference between the two loglikelihood values is the appropriate test statistic (chi-square with k2 degrees of freedom); a sketch is given at the end of this section.

In addition to the usual statements required to run a GQOPT optimization, the MAIN program must contain the following:

      COMMON/USERH1/Y(n)
      COMMON/USERH2/X(ndim,k1)
      COMMON/USERH3/Z(ndim,k2)
      COMMON/BHETR0/NDIM,N,K1,K2
      EXTERNAL HETER1,method
      NDIM=ndim
      N=n
      K1=k1
      K2=k2
C Must read in data into X, Z, and Y
      CALL OPT(B,NP,F,method,ITERL,MAX,IER,ACC,HETER1,ALABEL) 

where

NDIM = ndim, the first dimension of X and Z in the calling program
N = n, the number of observations (.LE. ndim)
K1 = k1, the number of coefficients in the equation
K2 = k2, the number of coefficients in the expression for the variance.

NOTE: The number of columns in X and/or Z may, of course, be larger than k1 or k2 respectively; i.e., they may certainly include columns that are not used in a particular run.
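
As a sketch of the likelihood ratio test described above (FU and FR are hypothetical variable names for the maximized loglikelihood values returned by the unrestricted and restricted runs of OPT):

      REAL*8 FU,FR,CHISQ
C     FU = loglikelihood, all k1+k2 parameters free
C     FR = loglikelihood, the k2 variance parameters
C          "PERMed" out at zero (see Section 1.11)
      CHISQ=2.D0*(FU-FR)
C     reject homoscedasticity if CHISQ exceeds the chi-square
C     critical value with k2 degrees of freedom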

SECTION 13.4 HYPOTHESIS TESTING IN LINEAR REGRESSIONS

The routines in this section permit the testing of linear restrictions on the regression coefficients. If the restrictions are only of the type in which certain coefficients are hypothesized to have specific values, the regression subject to these restrictions can be obtained by using

      CALL RESRG2(B,X,Y,R,NDIM,N,K,IR,VAR,IER)

where

Input variables:

X = REAL*8 array of dimension ndim x k, containing n observations (n.LE.ndim) on k variables (the first column should normally be set equal to 1.0's to account for the constant term in the regression)
Y = REAL*8 array with n elements, containing the dependent variable
R = LOGICAL*1 array of k elements, set equal to .TRUE. if the corresponding element of the coefficient vector B is restricted by the hypothesis, and equal to .FALSE. otherwise
NDIM = the first dimension of the X array
N = the variable containing the number of observations n
K = the variable containing the number of independent variables
B = REAL*8 array of k elements; those elements corresponding to elements of R which have been set equal to .TRUE. should contain the hypothesized value for that coefficient. The remaining elements of B are irrelevant.

Output variables:

B = the elements of B corresponding to those elements of R which have been set equal to .FALSE. will contain the coefficient estimates subject to the restrictions
VAR = REAL*8, the restricted sum of squares of residuals
IER = error code: 0 for normal return, -2 if X'X is singular
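
For example, to impose the restrictions b(2)=0 and b(3)=1 in a regression with k=4 variables, one might proceed roughly as follows (a sketch; X, Y, NDIM and N are assumed to have been set up as in Section 13.1, and the role of IR, which is not described in the list above, is taken here to be the number of restrictions, as in RESRG1 below):

      LOGICAL*1 R(4)
      REAL*8 B(4),VAR
      DATA R/.FALSE.,.TRUE.,.TRUE.,.FALSE./
C     hypothesized values for the restricted coefficients
      B(2)=0.D0
      B(3)=1.D0
      K=4
C     number of restrictions (an assumption; see above)
      IR=2
      CALL RESRG2(B,X,Y,R,NDIM,N,K,IR,VAR,IER)
C     B(1) and B(4) now contain the restricted estimates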

If the restrictions are as above, i.e., exclusively of the type such that particular coefficients are hypothesized to have particular values, the hypothesis representing these restrictions can be tested by using

      CALL FTEST(B,C,X,Y,R,STD,T,XBAR,F,W,LAGR,LRAT,NDIM,N,K,IER) 

where

Input variables:

R = LOGICAL*1 array of k elements, with elements corresponding to coefficients to be restricted set equal to .TRUE., with the others set equal to .FALSE.
C = REAL*8 coefficient array of k elements; those elements corresponding to elements of R set equal to .TRUE. should be set to the hypothesized values
X = REAL*8 array of dimension ndim x k, containing n observations (n.LE.ndim) on k variables (the first column should normally be set equal to 1.0's to account for the constant term in the regression)
Y = REAL*8 array with n elements, containing the dependent variable
STD = REAL*8 array of k elements--scratch storage
T = REAL*8 array of k elements--scratch storage
XBAR = REAL*8 array of k elements--scratch storage
NDIM = the first dimension of the X array
N = the variable containing the number of observations n
K = the variable containing the number of independent variables

Output variables:

B = REAL*8 array of k elements, will contain the unrestricted least squares estimates
C = REAL*8 array of k elements, will contain the restricted least squares estimates
F = the F-statistic with numerator degrees of freedom equal to the number of restrictions and denominator degrees of freedom equal to n-k (REAL*8)
W = the Wald statistic (REAL*8)
LAGR = the Lagrange Multiplier test statistic (REAL*8)
LRAT = the likelihood ratio test statistic (REAL*8)
IER = error codes---same as in REGR
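
A sketch of testing the single hypothesis b(3)=1 in a k=3 regression (X, Y, NDIM and N as above; STD, T and XBAR serve only as scratch storage):

      LOGICAL*1 R(3)
      REAL*8 B(3),C(3),STD(3),T(3),XBAR(3),F,W,LAGR,LRAT
      DATA R/.FALSE.,.FALSE.,.TRUE./
C     the hypothesized value goes into the restricted position
C     of C; the unrestricted positions of C are irrelevant
      C(3)=1.D0
      K=3
      CALL FTEST(B,C,X,Y,R,STD,T,XBAR,F,W,LAGR,LRAT,NDIM,N,K,IER)
C     F, W, LAGR and LRAT now contain the four test statistics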

If there are general linear restrictions, the hypothesis can be tested by using

      CALL RESRG1(BH,BT,X,Y,RM,RV,STD,T,F,W,LAGR,LRAT,NDIM,N,K,IR,IER) 

where

Input variables:

X = as above
Y = as above
RM = REAL*8 matrix of dimension ir x k containing the coefficients that express the restrictions as in the matrix equation (RM)b = RV, where b is the coefficient vector to be restricted and RV is an ir-vector of constants
RV = REAL*8 vector of constants of length ir
NDIM = as above
N = as above
K = as above
IR = variable containing the number ir, the number of restrictions, which is also the first dimension of RM

Output variables:

BH = REAL*8 vector of length k containing the unrestricted coefficient estimates
BT = REAL*8 vector of length k containing the restricted estimates
STD = REAL*8 vector of length k containing the standard errors of BH
T = REAL*8 vector of length k containing the t-values
F = REAL*8 variable containing the F-test value
W = REAL*8 variable containing the Wald Statistic
LAGR = REAL*8 variable containing the Lagrange multiplier test statistic
LRAT = REAL*8 variable containing the likelihood ratio test statistic
IER = as above
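
For example, the single general restriction b(1)+b(2)=1 in a k=3 regression corresponds to a 1 x 3 matrix RM = (1 1 0) and RV = (1), i.e. (a sketch; X, Y, NDIM, N and K as above):

      REAL*8 RM(1,3),RV(1),BH(3),BT(3),STD(3),T(3)
      REAL*8 F,W,LAGR,LRAT
      IR=1
      RM(1,1)=1.D0
      RM(1,2)=1.D0
      RM(1,3)=0.D0
      RV(1)=1.D0
      CALL RESRG1(BH,BT,X,Y,RM,RV,STD,T,F,W,LAGR,LRAT,NDIM,N,K,
     1            IR,IER)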

SECTION 13.5 RECURSIVE RESIDUALS

This section permits the computation of linear, unbiased regression residuals with scalar covariance matrix. See Theorems 35, 36 and 37 in Elementary Regression Theory, Part 2.

The call is

     CALL RECURS(B,X,Y,U,V,NDIM,N,K,IER)

where B, X, Y, NDIM, N, K have exactly the same meaning as in Section 13.1, and where U and V must be REAL*8 arrays dimensioned U(N), V(N). U will contain the recursive residuals (since there are only n-k such residuals, the first k will be zero), and V will contain the ordinary least squares residuals. IER is an error return with the same meaning as in Section 13.1.

Note that the sum of squares of the ordinary and recursive residuals is the same; see the test program XRECURS.FOR.
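
A sketch of that check (declarations of B, X, Y, U and V as described above):

      REAL*8 SSU,SSV
      CALL RECURS(B,X,Y,U,V,NDIM,N,K,IER)
      SSU=0.D0
      SSV=0.D0
      DO 20 I=1,N
         SSU=SSU+U(I)**2
         SSV=SSV+V(I)**2
   20 CONTINUE
C     SSU and SSV should agree to rounding error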

SECTION 13.6 JACKKNIFE REGRESSIONS

See Sunil K. Sapra, "A Jackknife Maximum Likelihood Estimator for the Probit Model," Applied Economics Letters, 9(2002), 73-74. The jackknife estimator is computed by omitting each of the n observations in turn and computing the regression estimates. If B(i) denotes the regression estimate with the ith observation omitted, B(.) the quantity (1/n)ΣB(i), and B the regression estimate based on all observations, the bias-corrected jackknife regression estimate is

nB - (n-1)B(.).

Call the jackknife estimator as
      CALL REGRJACK(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER,BSTOR,SSTOR,TSTOR,RSTOR)

where all arrays should be REAL*8 and where

X, Y, NDIM, N, K and IER have the same meanings as in Section 13.1
B and STD respectively will contain the jackknife estimates and the jackknife standard deviations and should be dimensioned as in Section 13.1
XBAR,YBAR,RSQ,VAR are used only internally (but XBAR must be dimensioned as in Section 13.1)
BSTOR = dimensioned BSTOR(0:N,K), contains the individual regression estimates, with row zero containing the estimates computed with no observation omitted
SSTOR = dimensioned SSTOR(0:N,K), contains the individual standard deviations
TSTOR = dimensioned TSTOR(0:N,K), contains the individual t-values
RSTOR = dimensioned RSTOR(0:N), contains the individual R-squared values
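
A dimensioning sketch for n=100 observations and k=5 variables (NOBS and KVAR are illustrative parameter names; note the zero lower bound on the first dimension of the storage arrays):

      PARAMETER (NOBS=100,KVAR=5)
      REAL*8 B(KVAR),X(NOBS,KVAR),Y(NOBS),STD(KVAR),T(KVAR)
      REAL*8 XBAR(KVAR),YBAR,RSQ,VAR
      REAL*8 BSTOR(0:NOBS,KVAR),SSTOR(0:NOBS,KVAR)
      REAL*8 TSTOR(0:NOBS,KVAR),RSTOR(0:NOBS)
      NDIM=NOBS
      N=NOBS
      K=KVAR
C     statements to read the data into X and Y
      CALL REGRJACK(B,X,Y,STD,T,XBAR,YBAR,RSQ,VAR,NDIM,N,K,IER,
     1              BSTOR,SSTOR,TSTOR,RSTOR)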

SECTION 13.7 PRINCIPAL COMPONENTS

See J. Johnston, Econometric Methods, 2nd ed.; New York: McGraw-Hill, 1963.

This section computes the principal components of k vectors of length n stored in a REAL*8 matrix X of dimensions n by k. The call to the subroutine is

      CALL PRINCOMP(X,XX,P,N,K,PROP,Z)

where all arguments should be REAL*8 and where

X = n by k array containing the k variables (input)
XX = k by k matrix which will contain as output the eigenvalues of X'X on the main diagonal in descending order (output)
P = k by k matrix of corresponding eigenvectors stored as columns (output)
N = n (input)
K = k (input)
PROP = an array dimensioned PROP(K) which will contain the proportions of total variation contributed by the principal components (output)
Z = an array dimensioned Z(N,K), which will contain the principal components (output).
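
A sketch of a call for n=50 observations on k=3 variables (NOBS and KVAR are illustrative names; since PRINCOMP has no separate NDIM argument, X is dimensioned exactly n by k here):

      PARAMETER (NOBS=50,KVAR=3)
      REAL*8 X(NOBS,KVAR),XX(KVAR,KVAR),P(KVAR,KVAR)
      REAL*8 PROP(KVAR),Z(NOBS,KVAR)
      N=NOBS
      K=KVAR
C     statements to read the data into X
      CALL PRINCOMP(X,XX,P,N,K,PROP,Z)
C     XX(J,J) holds the Jth largest eigenvalue of X'X, column J
C     of P the corresponding eigenvector, and column J of Z the
C     Jth principal component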
