SECTION 15

STATISTICAL ROUTINES SECTION

15.1 CUMULATIVE SAMPLE DISTRIBUTION (CUMDIS)

In order to obtain the sample cumulative distribution of a set of numbers in the array A, issue the command

      CALL CUMDIS(A,P,N)

where

A = an array dimensioned at least N (double precision)
P = an array dimensioned at least N (double precision)
N = an integer giving the number of data points in A

Upon returning from CUMDIS, the data in A will have been sorted in ascending order. The array P will contain the relative frequencies; i.e., P(I) is the relative frequency of observations with values less than or equal to P(I). Error messages: TOO FEW DATA indicates that N was less than or equal to 1. The message is written only if IPT or JPT are nonzero (see Section 1.1).

SECTION 15.2 SORTING (SORT1, SORT2, SORT3, SORT4)

To sort data stored in an array A, dimensioned at least N, issue the command

      CALL SORT1(A,N) 

where

A = contains the data to be sorted (REAL*8)
N = number of observations to be sorted

Upon returning from SORT1, A will contain the data sorted into ascending order.

If the array A is single precision (REAL*4), use the subroutine SORT1A instead.

To sort a REAL*4 array A into ascending order and to perform the same permutations on a REAL*4 array B, issue the command

      CALL SORT2(A,B,N) 

where

A = contains the data to be sorted (REAL*4)
B = data to be permuted parallel with A (REAL*4)
N = number of observations to be sorted.

Thus, if A contains 3.,4.,2.,1., and B contains 4.,3.,2.,1., after invoking SORT2 A will contain 1.,2.,3.,4., and B will contain 1.,2.,4.,3.

To sort a REAL*8 array A into ascending order and to perform the same permutations on a INTEGER*4 array IP, issue the command

      CALL SORT3(A,IP,N) 

where

A = contains the data to be sorted (REAL*8)
IP = data to be permuted parallel with A (INTEGER*4)
N = number of observations to be sorted.

If the array A is single precision (REAL*4), use the subroutine SORT3A instead.

SORT4(A,IP,N) is identical with SORT3, but sorts A into descending order.

SECTION 15.3 SHAPIRO-WILK STATISTIC (SHPWLK)

In order to test the null hypothesis that a sample of n observations (2 .LE. n .LE. 50) comes from a normal distribution, using the Shapiro-Wilk statistic (see Shapiro, S. S. and M. B. Wilk, "An Analysis of Variance Test for Normality (Complete Samples)," Biometrika, 52 (1965), 591-611).

      CALL SHPWLK(X,N,SW,VAR) 

where

X is an array containing N observations,
SW is the Shapiro-Wilk statistic and
VAR is the sample variance.

Before issuing the call above, you should, as usual, CALL DFLT in order to initialize IPT and JPT for controlling the print options. If these are not turned off, the significance levels for SW are also given. All variables and arrays are double precision.

SECTION 15.4 CHI SQUARED DISTRIBUTION (CHISPR)

To compute the probability that a random variable is Chi-squared distributed with DF degrees of freedom, use

      CALL CHISPR(X,DF,P,DENS,IER) 

where all floating point quantities are single precision and where

X = Value of the Chi square statistic (input)
DF = Degrees of freedom (input)
P = Probability that a Chi squared variate is less than or equal to X (output)
DENS = The density of the Chi square distribution at X (output)
IER = Error return with 0 = normal termination, -1= invalid input, invalid output (P.LT.0 or P.GT.1 or T1 has failed to converge). P in these cases is set to + or - 3.E38.(output)

SECTION 15.5 SMIRNOV LIMIT PROBABILITY (SMIRNV)

See Kendall and Stuart, The Advanced Theory of Statistics, Vol.2 or Siegel, S. and N. J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, 1988, pp. 51-55. To perform the Kolmogorov-Smirnov one-sample test comparing a sample of data with a hypothesized distribution, compute

                             D = max |F(X) - S(X)| 

where X ranges over the sample points, F(X) is the value of the hypothesized cumulative distribution and S(X) is the corresponding value of the sample cumulative distribution. Then Y=DSQRT(SNGL(N))*D CALL SMIRNV(Y,P) where N = sample size P = the (output) asymptotic probability that the discrepancy between the two cumulative distributions is greater than the observed one. Y and P are single precision numbers.

SECTION 15.6 SPEARMAN'S RHO

See Siegel, S. and N. J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, 1988, pp.235-244 or Kendall, M. G., Rank Correlation Methods, Charles Griffin, 1962. This routine computes the Spearman rank correlation coefficient by the following call

      CALL SPEAR1(X,Y,N,RHO) 

where

X = array containing the ranks of the x-series (REAL*4)
Y = array containing the ranks of the Y-series (REAL*4)
N = the number of observations in X and Y
RHO = the rank correlation coefficient (REAL*4)
Printing of error messages is controlled by the COMMON/BPRINT/ block.

If some observations are tied, the ranks assigned should be the average ranks applicable to those observations (e.g., if first two observations in the original series of x's have the same value, the corresponding entries in the X-array should be 1.5, 1.5). The rank correlation coefficient computation makes the apporpriate adjustment for tied observations.

SECTION 15.7 KENDALL'S TAU

See references under 15.6. Kendall's Tau is computed by the call

      CALL TAUK(X,Y,N,TAU,Z) 

where

X = array containing the ranks on the x-series (REAL*4)
Y = array containing the ranks on the y-series (REAL*4)
N = number of observations in X and Y
TAU= the computed Tau coefficient (REAL*4)
Z = the asymptotic normal test statistic for testing the significance of TAU (REAL*4)
Printing of error messages is controlled by the COMMON/BPRINT/ block.

Tied observations are to be handled as in 15.6.

SECTION 15.8 KENDALL'S PARTIAL TAU

For references see 15.6. Kendall's partial tau coefficient is computed by

      CALL TAU3(X,Y,Z,N,TAU) 

where

X = array of ranks on the x-series (REAL*4)
Y = array of ranks on the y-series (REAL*4)
Z = array of ranks on the z-series (REAL*4)
N = the number of observations in X, Y, and Z Must be .LE. 1000
TAU = the partial tau coefficient tauxy.z Printing of error messages is controlled by the COMMON/BPRINT/ block.

Tied observations are handled as in 15.6.

SECTION 15.9 KENDALL'S W COEFFICIENT OF CONCORDANCE.

See Siegel, S. and N. J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, 1988, pp.262-271. If M judges rank N items, W measures the degree of agreement among them. It is computed by the following:

      CALL KENCORD(RK,M,N,S,W,CHISQ) 

where

RK = M by N array of ranks (REAL*4)
M = number of judges (rows in RK)
N = number of items (columns in RK, must be less than 101)
S = sum of squared deviations of column sums from their mean (needed for small sample test) (REAL*4)
W = Kendall's W coefficient of concordance (REAL*4)
CHISQ = Chi squared statistic for testing null hypothesis of no agreement (N-1 d.f.) (REAL*4)

SECTION 15.10 KOLMOGOROV-SMIRNOV 1-SAMPLE TEST

See Morris H. DeGroot, Probability and Statistics, 1989. Given a sample of n data x, with cumulative sample distribution F(x), and a hypothesized distribution function G(x), the Kolmogorov- Smirnov statistic tests the null hypothesis that the x-data were generated by G(x). The test is based on the statistic sup|F(x)-G(x)| and is computed by the following:

      CALL KLSMR1(X,B,P,N,SIG,D,D1,CRIT,DIST,IER) 

where

X = a REAL*8 array, dimensioned N, containing the data
B = a REAL*8 array, dimensioned N, which will contain the hypothesized theoretical probabilities
P = a REAL*8 array, dimensioned N, which will contain the cumulative sample distribution of x
N = the number of data points
SIG = the significance level requested (e.g. 0.05)
D = the largest absolute difference between F(x) and G(x)
D1 = the Kolmogorov-Smirnov Statistic (DSQRT(N)*D)
CRIT = the critical value of D1
DIST = the name of a REAL*8 function which computes the hypothesized cumulative probabilities for each element of x. DIST must be declared in an EXTERNAL statement in the calling program
IER = error return (0 for normal return, -1 if SIG.LT.0)

SECTION 15.11 KOLMOGOROV-SMIRNOV 2-SAMPLE TEST

See Morris H. DeGroot, Probability and Statistics, 1989. Given two samples, a sample of n x's and m y's, the test tests the null hypothesis that the samples have been drawn from the same distribution. Letting F(x) and G(y) be the two sample distribution functions, the test is based on

                                sup|F(x)-G(y)|

and is computed by

      CALL KLSMR2(X,Y,PX,PY,WORK,IWORK,N,M,NPM,SIG,D,D1,CRIT,IER)

where

X = a REAL*8 array, dimensioned N, containing the x-sample
Y = a REAL*8 array, dimensioned M, containing the y-sample
PX = a REAL*8 array (work space), dimensioned N
PY = a REAL*8 array (work space), dimensioned M
WORK = a REAL*8 array (work space), dimensioned NPM x 3
IWORK = an INTEGER array (work space), dimensioned NPM
N = number of x-data
M = number of y-data
NPM = N+M, set in the calling routine
SIG = the significance level requested (e.g., 0.05)
D = the largest absolute difference between F(x) and G(x)
D1 = the Kolmogorov-Smirnov statistic (DSQRT(N*M/(N+M))*D)
CRIT = the critical value for D1
IER = error return (0 for normal return, -1 if SIG.LT.0)

SECTION 15.12 WILCOXON-MANN-WHITNEY TEST

To test the hypothesis that two independent populations have been drawn from the same distribution. See S. Siegel and N. J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw Hill, 1988, 128-144. To call this procedure use

      CALL WMW(A,B,C,D,N1,N2,M,RT,SUM,TX,XM,S,Z)

where

Input variables:

A = REAL*4 array of dimension N1 containing sample 1
B = REAL*4 array of dimension N2 containing sample 2
C = REAL*4 array of dimension M (work space)
D = REAL*4 array of dimension M (work space)
N1 = INTEGER*4 variable---sample size of sample 1
N2 = INTEGER*4 variable---sample size of sample 2
M = INTEGER*4 variable to be set by user equal to N1+N2
RT = LOGICAL*1 variable to be set by user; = .TRUE. if we are to find probabilities in the left tail = .FALSE. if we are to find right tail probabilities

Output variables:

SUM = sum of the ranks for sample 1 (REAL*4)
TX = correction for ties (REAL*4)
XM = the expected value of SUM (REAL*4)
S = the standard deviation (REAL*4)
Z = the normal test statistic

SECTION 15.13 CONTINGENCY TABLE CHI SQUARE

This procedure computes the Chi square statistic when the data are frequencies in discrete categories in order to determine the significance of differences between two independent groups. See Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, 1988, pp. 111-124. To call this procedure use

      CALL CONTNG(A,B,TR,TC,M,N,CHISQ,P,IER)

where

Input variables:

A = an INTEGER*4 array with m rowns and n columns containing the frequencies of occurrences in the (ij)th cell M = m, the number of rows in A
N = n, the number of columns in A

Output variables

B = a REAL*4 array with M rows and N columns of expected frequencies
TR = an INTEGER*4 array of row-sums of A
TC = an INTEGER*4 array of column-sums of A
CHISQ= a REAL*4 variable containing the computed Chi square value
P = a REAL*4 variable containing the probability of a Chi square value higher than the observed one
IER = an error return, normally 0
IER = -100 if any expected frequency is less than 1
IER = -101 if more than 20 percent of expected frequencies are less than 5
IER = -102 if it is a 2x2 table with total frequency less than 21
IER = -103 if it is a 2x2 table, total frequency is between 20 and 21 and any expected frequency is less than 5.

 

SECTION 15.14 FRIEDMAN ANALYSIS OF VARIANCE TEST (FRIED)

See S. Siegel and J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw Hill, 1988, pp. 174-180.

The test tests the null hypothesis that the sums of the ranks assigned by m judges to n objects, summed for each object, are the same. The test is carried out by

      CALL FRIED(RK,RKS,TEMP,N,M,CHISQ,PROB) 

where

RK = a REAL*4 array dimensioned RK(M,N) containing the ranks assigned by the M judges to the N objects
RKS= a REAL*4 array dimensioned RK(N), which will contain the rank sums
TEMP= a REAL*4 array dimensioned TEMP(N) (scratch array)
N = the number of objects ranked
M = the number of judges doing the ranking
CHISQ = the Chi Squared value produced by the test
PROB = the probability that under the null hypothesis of identical ranks a chi squared value of the magnitude obtained or higher could be obtained.

Tied ranks are permitted.

Return to

|Beginning|SHELL|