CHISQUARE observedValuesVector expectedValuesVector resultVariable

Replaces the contents of the result variable with the chi-square statistic computed from the first two vectors. The chi-square statistic is the sum of N terms, where N is the length of the longer of the two input vectors (normally both vectors have the same length). Each term is the square of the difference between an element of the observed values vector and the corresponding element of the expected values vector, divided by that expected value.
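Written as a formula, with $O_i$ and $E_i$ (notation introduced here for illustration) standing for the $i$-th elements of the observed and expected values vectors, the statistic is:

$$\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i}$$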

If the input vectors are of different lengths, the shorter vector will be "extended" to the length of the longer by repeating its last element as many times as necessary. This extension is only done internally and does not change the actual size or content of the shorter vector.
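The extension rule can be sketched in Python as follows. This is an illustration of the documented behavior, not the Resampling Stats implementation; the function names `extend_to` and `chisquare` are hypothetical. Only local copies are extended, so the caller's lists keep their original size and contents.

```python
def extend_to(values, length):
    """Hypothetical helper: repeat the last element until `values` reaches `length`."""
    values = list(values)  # work on a copy so the caller's list is untouched
    if not values:
        raise ValueError("cannot extend an empty vector")
    return values + [values[-1]] * (length - len(values))


def chisquare(observed, expected):
    """Chi-square statistic; the shorter vector is extended internally (sketch)."""
    n = max(len(observed), len(expected))
    obs = extend_to(observed, n)
    exp = extend_to(expected, n)
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


observed = [10, 12, 14]
expected = [12]  # treated internally as [12, 12, 12]
print(chisquare(observed, expected))  # about 0.667
print(expected)  # still [12]
```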

If any element is a "missing" value (represented by "." or "NaN"), the result of CHISQUARE is NaN. To avoid this, CLEAN the input vector(s) before applying the CHISQUARE command.
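The following Python sketch shows why a single missing value poisons the result: any NaN term makes the whole sum NaN. The helper `drop_missing_pairs` is hypothetical; it drops a position from both vectors whenever either value is missing, so the remaining observed and expected values stay aligned. This is an assumption made for illustration and is not necessarily how CLEAN itself behaves. The sketch also assumes equal-length inputs.

```python
import math


def chisquare(observed, expected):
    # A NaN in any position makes its term, and therefore the sum, NaN.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def drop_missing_pairs(observed, expected):
    """Hypothetical helper: keep only positions where both values are present."""
    kept = [(o, e) for o, e in zip(observed, expected)
            if not (math.isnan(o) or math.isnan(e))]
    return [o for o, _ in kept], [e for _, e in kept]


observed = [30, float("nan"), 16]
expected = [25.2, 37.2, 12.6]
print(chisquare(observed, expected))  # nan

clean_obs, clean_exp = drop_missing_pairs(observed, expected)
print(chisquare(clean_obs, clean_exp))  # statistic over the two complete pairs
```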

COPY (30 29 16 12 33  5) observedData
COPY (25.2 37.2 12.6 16.8 24.8 8.4) expectedValues
CHISQUARE observedData expectedValues chiSquare
PRINT chiSquare

The above program produces the following output:

chiSquare: 9.098182283666157
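As a cross-check of the arithmetic, the same statistic can be computed term by term in Python. This sketch simply applies the definition above to the two vectors from the program; it is not part of Resampling Stats.

```python
observed = [30, 29, 16, 12, 33, 5]
expected = [25.2, 37.2, 12.6, 16.8, 24.8, 8.4]

# One term per element: (observed - expected)^2 / expected
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
for o, e, t in zip(observed, expected, terms):
    print(f"({o} - {e})^2 / {e} = {t:.6f}")

chi_square = sum(terms)
print(f"chiSquare: {chi_square:.6f}")  # chiSquare: 9.098182
```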

The next program is a realistic example. It computes the expected values with a subroutine, generates the appropriate chi-square distribution by simulation, and then decides whether the null hypothesis is accepted or rejected.

'From CliffsQuickReview Statistics, Page 110:
'Suppose 125 children are shown three TV commercials
'for breakfast cereal, A, B, and C, and are asked to
'pick the one they liked best. The results are:
'         A        B       C     Totals
'Boys    30       29       16      75
'Girls   12       33        5      50
'Totals  42       62       21     125
'Is the choice of favorite commercial related to
'whether the child is a boy or a girl?
'Null hypothesis: the commercial choice is not
'related to the sex of the child. The question can
'be restated as: how often (that is, with what
'probability) would the contents of the six inner
'cells be as far as or farther from their expected
'values than they currently are, if the null
'hypothesis were true?
'Compare results for alpha = 0.05 vs. alpha = 0.01.
COPY (30 29 16) boysData
COPY (12 33  5) girlsData
COPY boysData girlsData observedData

'Subroutine to compute the expected values given
'the rows of a two-row data table.
NEWCMD EXPECTED_VALUES row1 row2 expectedVals
   SUM row1 row1Sum
   SUM row2 row2Sum
   ADD row1 row2 columnSums
   SUM columnSums grandTotal
   MULTIPLY row1Sum columnSums row1Products
   MULTIPLY row2Sum columnSums row2Products
   DIVIDE row1Products grandTotal row1ExpectedVals
   DIVIDE row2Products grandTotal row2ExpectedVals
   COPY row1ExpectedVals row2ExpectedVals expectedVals
END

EXPECTED_VALUES boysData girlsData expectedValues
PRINT expectedValues

'Now, compute and print the Chi-square value
'for the given table data.
CHISQUARE observedData expectedValues chiSquare
PRINT chiSquare

'Construct a "universe" of ads in the same proportions
'as in the original data table:
NAME (1 2 3) adA adB adC
COPY 42#adA 62#adB 21#adC ads

'Compute and record (SCORE) chi-square values for
'many simulated table cell entries derived from 
'the ad universe.
COPY 5000 rptcount
REPEAT rptcount
   SHUFFLE ads ads
   TAKE ads  1,75  boys
   TAKE ads 76,125 girls
   COUNT boys = adA boysAdA
   COUNT boys = adB boysAdB
   COUNT boys = adC boysAdC
   COUNT girls = adA girlsAdA
   COUNT girls = adB girlsAdB
   COUNT girls = adC girlsAdC
   'Rebuild a new table with the simulated data
   COPY boysAdA boysAdB boysAdC girlsAdA girlsAdB girlsAdC observedData$
   CHISQUARE observedData$ expectedValues chiSquare$
   SCORE chiSquare$ chiSquareScores
END

'For information only, print the chi-square
'distribution we have generated:
HISTOGRAM chiSquareScores

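'significanceLevel is the fraction of simulated
'chi-square values at least as large as the
'observed one, i.e. the estimated p-value.
COUNT chiSquareScores >= chiSquare chiCount
DIVIDE chiCount rptcount significanceLevel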
PRINT significanceLevel

OUTPUT "Conclusions:\n"
OUTPUT "The null hypothesis is "
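IF significanceLevel >= 0.05
   OUTPUT "accepted at the 0.05 significance level.\n"
ELSE
   OUTPUT "rejected at the 0.05 significance level.\n"
END

OUTPUT "The null hypothesis is "
IF significanceLevel >= 0.01
   OUTPUT "accepted at the 0.01 significance level.\n"
ELSE
   OUTPUT "rejected at the 0.01 significance level.\n"
END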
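For readers more familiar with a general-purpose language, here is a Python sketch of the same procedure: expected values from row and column totals, a shuffled "universe" of ad choices split into 75 boys and 50 girls, and the fraction of simulated statistics at least as large as the observed one. It mirrors the program above but is not the Resampling Stats implementation; `expected_values`, `chisquare`, and `simulate_p_value` are hypothetical names, and the sketch assumes every column total is nonzero.

```python
import random


def expected_values(row1, row2):
    """Expected counts for a two-row table: row total * column total / grand total."""
    column_sums = [a + b for a, b in zip(row1, row2)]
    grand_total = sum(column_sums)
    r1, r2 = sum(row1), sum(row2)
    return ([r1 * c / grand_total for c in column_sums] +
            [r2 * c / grand_total for c in column_sums])


def chisquare(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_p_value(row1, row2, trials=5000, seed=None):
    """Estimate how often a shuffled table is at least as far from expectation."""
    rng = random.Random(seed)
    expected = expected_values(row1, row2)
    observed_stat = chisquare(list(row1) + list(row2), expected)
    column_sums = [a + b for a, b in zip(row1, row2)]
    # "Universe" of choices: ad k appears column_sums[k] times.
    universe = [k for k, count in enumerate(column_sums) for _ in range(count)]
    n1 = sum(row1)
    hits = 0
    for _ in range(trials):
        rng.shuffle(universe)
        first, second = universe[:n1], universe[n1:]
        simulated = ([first.count(k) for k in range(len(column_sums))] +
                     [second.count(k) for k in range(len(column_sums))])
        if chisquare(simulated, expected) >= observed_stat:
            hits += 1
    return observed_stat, hits / trials


boys, girls = [30, 29, 16], [12, 33, 5]
stat, p = simulate_p_value(boys, girls, trials=5000, seed=1)
print(f"chi-square = {stat:.4f}, estimated p-value = {p:.4f}")
for alpha in (0.05, 0.01):
    verdict = "rejected" if p < alpha else "not rejected"
    print(f"alpha = {alpha}: null hypothesis {verdict}")
```

Because the p-value is estimated by simulation, results near a cutoff such as 0.01 can change from run to run; increasing `trials` narrows that uncertainty.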