SECTION 15

STATISTICAL ROUTINES SECTION

15.1 CUMULATIVE SAMPLE DISTRIBUTION (CUMDIS)

In order to obtain the sample cumulative distribution of a set of numbers in the array A, issue the command

CALL CUMDIS(A,P,N)

where

A = an array dimensioned at least N (double precision)

P = an array dimensioned at least N (double precision)

N = an integer giving the number of data points in A

Upon returning from CUMDIS, the data in A will have been sorted in ascending order. The array P will contain the relative frequencies; i.e., P(I) is the relative frequency of observations with values less than or equal to P(I). Error messages: TOO FEW DATA indicates that N was less than or equal to 1. The message is written only if IPT or JPT are nonzero (see Section 1.1).

SECTION 15.2 SORTING (SORT1, SORT2, SORT3, SORT4)

To sort data stored in an array A, dimensioned at least N, issue the command

CALL SORT1(A,N)

where

A = contains the data to be sorted (REAL*8)

N = number of observations to be sorted

Upon returning from SORT1, A will contain the data sorted into ascending order.

If the array A is single precision (REAL*4), use the subroutine SORT1A instead.

To sort a REAL*4 array A into ascending order and to perform the same permutations on a REAL*4 array B, issue the command

CALL SORT2(A,B,N)

where

A = contains the data to be sorted (REAL*4)

B = data to be permuted parallel with A (REAL*4)

N = number of observations to be sorted.

Thus, if A contains 3.,4.,2.,1., and B contains 4.,3.,2.,1., after invoking SORT2 A will contain 1.,2.,3.,4., and B will contain 1.,2.,4.,3.

To sort a REAL*8 array A into ascending order and to perform the same permutations on a INTEGER*4 array IP, issue the command

CALL SORT3(A,IP,N)

where

A = contains the data to be sorted (REAL*8)

IP = data to be permuted parallel with A (INTEGER*4)

N = number of observations to be sorted.

If the array A is single precision (REAL*4), use the subroutine SORT3A instead.

SORT4(A,IP,N) is identical with SORT3, but sorts A into descending order.

SECTION 15.3 SHAPIRO-WILK STATISTIC (SHPWLK)

In order to test the null hypothesis that a sample of n
observations (2 .LE. n .LE. 50) comes from a normal distribution,
using the Shapiro-Wilk statistic (see Shapiro, S. S. and M. B.
Wilk, "An Analysis of Variance Test for Normality (Complete
Samples)," *Biometrika*, 52 (1965), 591-611).

CALL SHPWLK(X,N,SW,VAR)

where

X is an array containing N observations,

SW is the Shapiro-Wilk statistic and

VAR is the sample variance.

Before issuing the call above, you should, as usual, CALL DFLT in order to initialize IPT and JPT for controlling the print options. If these are not turned off, the significance levels for SW are also given. All variables and arrays are double precision.

SECTION 15.4 CHI SQUARED DISTRIBUTION (CHISPR)

To compute the probability that a random variable is Chi-squared distributed with DF degrees of freedom, use

CALL CHISPR(X,DF,P,DENS,IER)

where all floating point quantities are single precision and where

X = Value of the Chi square statistic (input)

DF = Degrees of freedom (input)

P = Probability that a Chi squared variate is less than or equal
to X (output)

DENS = The density of the Chi square distribution at X (output)

IER = Error return with 0 = normal termination, -1= invalid
input, invalid output (P.LT.0 or P.GT.1 or T1 has failed to
converge). P in these cases is set to + or - 3.E38.(output)

SECTION 15.5 SMIRNOV LIMIT PROBABILITY (SMIRNV)

See Kendall and Stuart, *The Advanced Theory of Statistics*,
Vol.2 or Siegel, S. and N. J. Castellan, *Nonparametric
Statistics for the Behavioral Sciences*, McGraw-Hill, 1988,
pp. 51-55. To perform the Kolmogorov-Smirnov one-sample test
comparing a sample of data with a hypothesized distribution,
compute

D = max |F(X) - S(X)|

where X ranges over the sample points, F(X) is the value of the hypothesized cumulative distribution and S(X) is the corresponding value of the sample cumulative distribution. Then Y=DSQRT(SNGL(N))*D CALL SMIRNV(Y,P) where N = sample size P = the (output) asymptotic probability that the discrepancy between the two cumulative distributions is greater than the observed one. Y and P are single precision numbers.

See Siegel, S. and N. J. Castellan, *Nonparametric
Statistics for the Behavioral Sciences*, McGraw-Hill, 1988,
pp.235-244 or Kendall, M. G., *Rank Correlation Methods*,
Charles Griffin, 1962. This routine computes the Spearman rank
correlation coefficient by the following call

CALL SPEAR1(X,Y,N,RHO)

where

X = array containing the ranks of the x-series (REAL*4)

Y = array containing the ranks of the Y-series (REAL*4)

N = the number of observations in X and Y

RHO = the rank correlation coefficient (REAL*4)

Printing of error messages is controlled by the COMMON/BPRINT/
block.

If some observations are tied, the ranks assigned should be the average ranks applicable to those observations (e.g., if first two observations in the original series of x's have the same value, the corresponding entries in the X-array should be 1.5, 1.5). The rank correlation coefficient computation makes the apporpriate adjustment for tied observations.

See references under 15.6. Kendall's Tau is computed by the call

CALL TAUK(X,Y,N,TAU,Z)

where

X = array containing the ranks on the x-series (REAL*4)

Y = array containing the ranks on the y-series (REAL*4)

N = number of observations in X and Y

TAU= the computed Tau coefficient (REAL*4)

Z = the asymptotic normal test statistic for testing the
significance of TAU (REAL*4)

Printing of error messages is controlled by the COMMON/BPRINT/
block.

Tied observations are to be handled as in 15.6.

SECTION 15.8 KENDALL'S PARTIAL TAU

For references see 15.6. Kendall's partial tau coefficient is computed by

CALL TAU3(X,Y,Z,N,TAU)

where

X = array of ranks on the x-series (REAL*4)

Y = array of ranks on the y-series (REAL*4)

Z = array of ranks on the z-series (REAL*4)

N = the number of observations in X, Y, and Z Must be .LE. 1000

TAU = the partial tau coefficient tauxy.z Printing of error
messages is controlled by the COMMON/BPRINT/ block.

Tied observations are handled as in 15.6.

SECTION 15.9 KENDALL'S W COEFFICIENT OF CONCORDANCE.

See Siegel, S. and N. J. Castellan, *Nonparametric
Statistics for the Behavioral Sciences*, McGraw-Hill, 1988,
pp.262-271. If M judges rank N items, W measures the degree of
agreement among them. It is computed by the following:

CALL KENCORD(RK,M,N,S,W,CHISQ)

where

RK = M by N array of ranks (REAL*4)

M = number of judges (rows in RK)

N = number of items (columns in RK, must be less than 101)

S = sum of squared deviations of column sums from their mean
(needed for small sample test) (REAL*4)

W = Kendall's W coefficient of concordance (REAL*4)

CHISQ = Chi squared statistic for testing null hypothesis of no
agreement (N-1 d.f.) (REAL*4)

SECTION 15.10 KOLMOGOROV-SMIRNOV 1-SAMPLE TEST

See Morris H. DeGroot, *Probability and Statistics*,
1989. Given a sample of n data x, with cumulative sample
distribution F(x), and a hypothesized distribution function G(x),
the Kolmogorov- Smirnov statistic tests the null hypothesis that
the x-data were generated by G(x). The test is based on the
statistic sup|F(x)-G(x)| and is computed by the following:

CALL KLSMR1(X,B,P,N,SIG,D,D1,CRIT,DIST,IER)

where

X = a REAL*8 array, dimensioned N, containing the data

B = a REAL*8 array, dimensioned N, which will contain the
hypothesized theoretical probabilities

P = a REAL*8 array, dimensioned N, which will contain the
cumulative sample distribution of x

N = the number of data points

SIG = the significance level requested (e.g. 0.05)

D = the largest absolute difference between F(x) and G(x)

D1 = the Kolmogorov-Smirnov Statistic (DSQRT(N)*D)

CRIT = the critical value of D1

DIST = the name of a REAL*8 function which computes the
hypothesized cumulative probabilities for each element of x. DIST
must be declared in an EXTERNAL statement in the calling program

IER = error return (0 for normal return, -1 if SIG.LT.0)

SECTION 15.11 KOLMOGOROV-SMIRNOV 2-SAMPLE TEST

See Morris H. DeGroot, *Probability
and Statistics*, 1989. Given two samples, a sample of
n x's and m y's, the test tests the null hypothesis that the
samples have been drawn from the same distribution. Letting F(x)
and G(y) be the two sample distribution functions, the test is
based on

sup|F(x)-G(y)|

and is computed by

CALL KLSMR2(X,Y,PX,PY,WORK,IWORK,N,M,NPM,SIG,D,D1,CRIT,IER)

where

X = a REAL*8 array, dimensioned N, containing the x-sample

Y = a REAL*8 array, dimensioned M, containing the y-sample

PX = a REAL*8 array (work space), dimensioned N

PY = a REAL*8 array (work space), dimensioned M

WORK = a REAL*8 array (work space), dimensioned NPM x 3

IWORK = an INTEGER array (work space), dimensioned NPM

N = number of x-data

M = number of y-data

NPM = N+M, set in the calling routine

SIG = the significance level requested (e.g., 0.05)

D = the largest absolute difference between F(x) and G(x)

D1 = the Kolmogorov-Smirnov statistic (DSQRT(N*M/(N+M))*D)

CRIT = the critical value for D1

IER = error return (0 for normal return, -1 if SIG.LT.0)

SECTION 15.12 WILCOXON-MANN-WHITNEY TEST

To test the hypothesis that two independent populations have
been drawn from the same distribution. See S. Siegel and N. J.
Castellan, *Nonparametric Statistics for the Behavioral
Sciences*, McGraw Hill, 1988, 128-144. To call this procedure
use

CALL WMW(A,B,C,D,N1,N2,M,RT,SUM,TX,XM,S,Z)

where

Input variables:

A = REAL*4 array of dimension N1 containing sample 1

B = REAL*4 array of dimension N2 containing sample 2

C = REAL*4 array of dimension M (work space)

D = REAL*4 array of dimension M (work space)

N1 = INTEGER*4 variable---sample size of sample 1

N2 = INTEGER*4 variable---sample size of sample 2

M = INTEGER*4 variable to be set by user equal to N1+N2

RT = LOGICAL*1 variable to be set by user; = .TRUE. if we are to
find probabilities in the left tail = .FALSE. if we are to find
right tail probabilities

Output variables:

SUM = sum of the ranks for sample 1 (REAL*4)

TX = correction for ties (REAL*4)

XM = the expected value of SUM (REAL*4)

S = the standard deviation (REAL*4)

Z = the normal test statistic

SECTION 15.13 CONTINGENCY TABLE CHI SQUARE

This procedure computes the Chi square statistic when the data
are frequencies in discrete categories in order to determine the
significance of differences between two independent groups. See
Siegel and Castellan, *Nonparametric Statistics for the
Behavioral Sciences*, 1988, pp. 111-124. To call this
procedure use

CALL CONTNG(A,B,TR,TC,M,N,CHISQ,P,IER)

where

Input variables:

A = an INTEGER*4 array with m rowns and n columns containing
the frequencies of occurrences in the (ij)th cell M = m, the
number of rows in A

N = n, the number of columns in A

Output variables

B = a REAL*4 array with M rows and N columns of expected frequencies

TR = an INTEGER*4 array of row-sums of A

TC = an INTEGER*4 array of column-sums of A

CHISQ= a REAL*4 variable containing the computed Chi square value

P = a REAL*4 variable containing the probability of a Chi square
value higher than the observed one

IER = an error return, normally 0

IER = -100 if any expected frequency is less than 1

IER = -101 if more than 20 percent of expected frequencies are
less than 5

IER = -102 if it is a 2x2 table with total frequency less than 21

IER = -103 if it is a 2x2 table, total frequency is between 20
and 21 and any expected frequency is less than 5.

SECTION 15.14 FRIEDMAN ANALYSIS OF VARIANCE TEST (FRIED)

See S. Siegel and J. Castellan, *Nonparametric Statistics
for the Behavioral Sciences*, McGraw Hill, 1988, pp. 174-180.

The test tests the null hypothesis that the sums of the ranks assigned by m judges to n objects, summed for each object, are the same. The test is carried out by

CALL FRIED(RK,RKS,TEMP,N,M,CHISQ,PROB)

where

RK = a REAL*4 array dimensioned RK(M,N) containing the ranks
assigned by the M judges to the N objects

RKS= a REAL*4 array
dimensioned RK(N), which will contain the rank sums

TEMP= a REAL*4 array dimensioned TEMP(N) (scratch array)

N = the number of objects ranked

M = the number of judges doing the ranking

CHISQ = the Chi Squared value produced by the test

PROB = the probability that under the null hypothesis of
identical ranks a chi squared value of the magnitude obtained or
higher could be obtained.

Tied ranks are permitted.

Return to