These examples are taken from the book CliffsNotes Statistics Quick Review (Cliffsquickreview).
That book is an overview of a standard introductory
Statistics class with typical examples. The book solves all its
examples with standard statistical formulas and tables. I have taken
each of the book's worked-out examples and shown here how to solve them
using Resampling Stats without any formulas. Most of the problems end
up having quite simple Resampling Stats solutions.
You can copy any example, paste it into the Statistics101 editor window, and run it.
'From CliffsQuickReview Statistics, p. 38, example 1
'What is the probability of simultaneously
'flipping 3 coins and having them all land heads?
COPY (0 1) coin
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 3 coin flip
COUNT flip =1 heads
SCORE heads result
END
COUNT result =3 successes
DIVIDE successes rptCount probability
PRINT probability
'From CliffsQuickReview Statistics, p. 39, example 2
'What is the probability of randomly drawing an ace
'from a deck of cards (without replacement) and then
'drawing an ace again from the same deck on the next
'draw? Calculated answer = (4/52)*(3/51) = 0.00452.
COPY 1,13 1,13 1,13 1,13 deck
COPY 100000 rptCount
REPEAT rptCount
SHUFFLE deck deck
TAKE deck 1 card1
IF card1 =1
TAKE deck 2 card2
IF card2 =1
SCORE 1 successes
END
END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
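As a cross-check on the simulation above, here is a minimal Python sketch of the exact calculation. It is not part of the original example or of Statistics101, and the function name is only illustrative. It assumes, as the program does, that any ace followed by any other ace counts as a success.

```python
from fractions import Fraction


def prob_two_aces():
    """Exact probability that the first two cards drawn without replacement are aces."""
    first_ace = Fraction(4, 52)   # 4 aces among 52 cards
    second_ace = Fraction(3, 51)  # 3 aces left among the remaining 51 cards
    return first_ace * second_ace


p = prob_two_aces()
print(p, float(p))
```

With 100000 trials the simulated probability should land close to this exact value, although it will vary from run to run.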
'From CliffsQuickReview Statistics, p. 41, example 3
'What is the probability of at least one spade or one
'club being randomly chosen in one draw from a deck
'of cards? Calculated result: 13/52 + 13/52 = 0.5.
COPY 13#1 13#2 13#3 13#4 deck
'(1=spade, 2=club, 3=heart, 4=diamond)
COPY 1000 rptCount
REPEAT rptCount
SAMPLE 1 deck suit
IF suit =1
SCORE 1 successes
END
IF suit =2
SCORE 1 successes
END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
'From CliffsQuickReview Statistics, p. 42, example 4
'What is the probability of at least one head in
'two coin flips? Calculated result: 0.75
COPY 0,1 coin
COPY 1000 rptCount
REPEAT rptCount
SAMPLE 2 coin flips
SUM flips heads
IF heads >=1
SCORE 1 successes
END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
5. Drawing either a spade or an ace from a deck of cards
'From CliffsQuickReview Statistics, p. 43, example 5
'What is the probability of drawing either a spade
'or an ace from a deck of cards?
'Calculated result: 16/52 = 0.308
'Simple method:
'4 aces + 13 spades - 1 ace of spades = 16
COPY 16#1 36#2 deck
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 1 deck card
SCORE card result
END
COUNT result =1 successes
DIVIDE successes rptCount probability
PRINT probability
' More general alternative way
COPY 1,13 value
COPY 1,4 suit
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 1 value cardValue
IF cardValue = 1
SCORE 1 successes
END
IF cardValue <> 1
SAMPLE 1 suit cardSuit
IF cardSuit = 1
SCORE 1 successes
END
END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
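The count of 16 favorable cards can also be confirmed by brute-force enumeration. The following Python sketch is an illustration only (the names and the coding 1 = ace, 1 = spade are assumptions chosen to mirror the program above).

```python
from fractions import Fraction
from itertools import product

RANKS = range(1, 14)  # 1 = ace, as in the program above
SUITS = range(1, 5)   # 1 = spade, as in the program above


def prob_spade_or_ace():
    """Enumerate all 52 cards and count those that are a spade or an ace."""
    deck = list(product(RANKS, SUITS))
    hits = sum(1 for rank, suit in deck if rank == 1 or suit == 1)
    return Fraction(hits, len(deck))


p = prob_spade_or_ace()
print(p, round(float(p), 3))
```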
'From CliffsQuickReview Statistics, p. 47, example 6
'If you flip a coin 10 times what is the
'probability of getting exactly 5 heads?
'Calculated result using binomial formula: 0.246
COPY 0,1 coin 'heads = 1
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 10 coin flips
SUM flips heads 'count heads
IF heads = 5
SCORE 1 result
END
END
COUNT result =1 successes
DIVIDE successes rptCount probability
PRINT probability
7. Mean and standard deviation for 10 flips of a fair coin
'From CliffsQuickReview Statistics, p. 47, example 7
'What is the mean and standard deviation for a
'binomial probability distribution for 10 flips
'of a fair coin?
'Calculated result using binomial formula:
'mean = 5, standard deviation = 1.58
COPY 0,1 coin 'heads = 1
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 10 coin flips
SUM flips heads 'count heads
SCORE heads result 'save the count in a list
END
MEAN result mean
STDEV result stdDev
PRINT mean stdDev
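For comparison with the simulated results of the two coin-flip examples above, the binomial values can be computed directly. This is a hedged sketch in Python rather than Statistics101, and the function names are illustrative only.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_sd(n, p):
    """Mean and standard deviation computed from the full distribution."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    variance = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, sqrt(variance)


print(binomial_pmf(5, 10, 0.5))   # exactly 5 heads in 10 flips
print(binomial_mean_sd(10, 0.5))  # mean and standard deviation of the head count
```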
'from CliffsQuickReview Statistics p. 54 Example 1
'If the population mean of number of fish caught
'per trip to a particular fishing hole is 3.2
'and the population standard deviation is 1.8,
'what are the population mean and the standard
'error of the mean of 40 trips?
'NOTE: you can plug in different numbers for
'popStdDev, popMean, and sampleSize to compute
'any standard error of the mean.
COPY 1.8 popStdDev
COPY 3.2 popMean
COPY 40 sampleSize
REPEAT 1000
NORMAL sampleSize popMean popStdDev sample
MEAN sample sampleMean
SCORE sampleMean means
END
MEAN means meanOfMeans 'estimate of the population mean
STDEV means stdError
PRINT meanOfMeans stdError
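The simulated standard error can be compared with the textbook formula, population standard deviation divided by the square root of the sample size. A minimal Python sketch, with an illustrative function name:

```python
from math import sqrt


def standard_error(pop_std_dev, sample_size):
    """Standard error of the mean for a known population standard deviation."""
    return pop_std_dev / sqrt(sample_size)


print(standard_error(1.8, 40))
```

The value printed by the Statistics101 program should be near this figure, with some run-to-run variation because only 1000 samples are drawn.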
' From CliffsQuickReview Statistics, p. 56, example 2
' A normal distribution of retail store purchases has
' a mean of $14.31 and a standard deviation of 6.40.
' What percentage of purchases were under $10?
COPY 100000 size
NORMAL size 14.31 6.4 population
COUNT population <=10.0 purchasesBelowTen
DIVIDE purchasesBelowTen size percentageBelowTen
PRINT percentageBelowTen
'From CliffsQuickReview Statistics, p. 58, example 3
'A normal distribution of retail store purchases
'has a mean of $14.31 and a standard deviation of
'6.40. What purchase amount marks the lower 10%
'of the distribution?
COPY 100000 size
NORMAL size 14.31 6.4 population
PERCENTILE population (10) pcval
PRINT pcval
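Both retail-purchase examples above can be checked against the exact normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist` (available from Python 3.8); it is a cross-check only, not part of the original programs, and the variable names are assumptions.

```python
from statistics import NormalDist

# Distribution of purchase amounts as stated in the examples above.
purchases = NormalDist(mu=14.31, sigma=6.40)

fraction_below_ten = purchases.cdf(10.0)   # share of purchases under $10
tenth_percentile = purchases.inv_cdf(0.10) # amount marking the lower 10%

print(fraction_below_ten, tenth_percentile)
```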
'From CliffsQuickReview Statistics, p. 60, example 4
'Assuming an equal chance of a new baby being a
'boy or a girl (that is pi=0.5), what is the
'likelihood that 60 or more out of the next 100
'births at a local hospital will be boys?
'The answer computed from the cumulative
'binomial distribution is 0.02844. The book's answer,
'0.0228, is based on the normal approximation to the
'binomial, and is therefore somewhat in error.
COPY (0 1) birth '0 = girl, 1 = boy
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 100 birth births
COUNT births =1 boys
SCORE boys results
END
COUNT results >=60 successes
DIVIDE successes rptCount probability
PRINT probability
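The cumulative binomial figure quoted in the comments above can be reproduced with exact integer arithmetic. This Python sketch assumes p = 0.5, as the example does; the function name is illustrative.

```python
from math import comb


def prob_at_least(k_min, n):
    """P(X >= k_min) for X ~ Binomial(n, 0.5), using exact integer counts."""
    favorable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favorable / 2**n


print(prob_at_least(60, 100))
```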
'From Cliffs QuickReview: Statistics pg 71
'avg wt of 10 player sample is 198 lbs
'population std dev is 11.5 lbs.
'What is the 90% confidence interval for the
'population weight if you assume the players'
'weights are normally distributed?
REPEAT 1000
NORMAL 10 198 11.5 weights
MEAN weights avg
SCORE avg averages
END
PRINT averages
'histogram averages
PERCENTILE averages (5 95) confidenceInterval
PRINT confidenceInterval
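For reference, the formula-based interval for a known population standard deviation is mean ± z·σ/√n. The following Python sketch computes it; the function name and signature are illustrative assumptions, not anything from the book or from Statistics101.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, pop_std_dev, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    # z cuts off (1 - confidence) / 2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * pop_std_dev / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = z_interval(198, 11.5, 10, 0.90)
print(low, high)
```

The same function can be pointed at the other known-sigma interval examples on this page by changing the four arguments.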
'From Cliffs QuickReview: Statistics pg 75
'avg age of 50 viewer sample is 19 yrs
'population std dev is 1.7 yrs.
'What is the 90% confidence interval for the
'viewer age if you assume the viewers' ages
'are normally distributed?
REPEAT 1000
NORMAL 50 19 1.7 ages
MEAN ages avg
SCORE avg averages
END
PRINT averages
histogram averages
PERCENTILE averages (5 95) confidenceInterval
PRINT confidenceInterval
'From: CliffsQuickReview Statistics, p 77, Example 1.
'A herd of 1500 steers was fed a special high-protein
'diet for a month. A random sample of 29 were
'weighed and had gained an average of 6.7 pounds.
'If the standard deviation of weight gain for the
'entire herd is 7.1, what is the likelihood that the
'average weight gain per steer for the
'month was at least 5 pounds?
'Null hypothesis: avg gain was < 5.
'Reject null hypothesis if probability < 0.05.
COPY 10000 numTrials
REPEAT numTrials
NORMAL 29 6.7 7.1 sample
MEAN sample avgGain
IF avgGain < 5
SCORE 1 successes 'score gains < 5 for null hypothesis
END
END
COUNT successes = 1 successCount
DIVIDE successCount numTrials probability
PRINT probability
IF probability < 0.05
OUTPUT "Null hypothesis is rejected.\n"
END
IF probability >= 0.05
OUTPUT "Null hypothesis is NOT rejected.\n"
END
'From: CliffsQuickReview Statistics, p 77, Example 2.
'In national use, a vocabulary test is known to
'have a mean score of 68 and a standard deviation
'of 13. A class of 19 students takes the test and
'has a mean score of 65. Is the class typical of
'others who have taken the test?
'Assume a significance level of p<0.05.
'Null hypothesis: the class mean equals the national mean (68).
'Reject null hypothesis if 65 falls outside the central 95% of the simulated means.
REPEAT 1000
NORMAL 19 68 13 sample
MEAN sample sampleMean
SCORE sampleMean means
END
'This is a two tail problem, so divide the 0.05 in half
'to set the lower and upper limits.
PERCENTILE means (2.5 97.5) limits 'Confidence interval
PRINT limits
TAKE limits 1 lowLimit
TAKE limits 2 highLimit
'Output the conclusion:
OUTPUT "Null hypothesis is "
IF 65 between lowLimit highLimit
OUTPUT "NOT "
END
OUTPUT "rejected.\n"
'From: CliffsQuickReview Statistics, p 78.
'A sample of 12 machine pins has a mean diameter
'of 1.15 inches, and the population standard
'deviation is known to be 0.04. What is a 99
'percent confidence interval of diameter width
'for the population?
'Note that the 99 percent interval is from 0.5% to 99.5%.
COPY 1000 numTrials
REPEAT numTrials
NORMAL 12 1.15 0.04 sample
MEAN sample mean
SCORE mean means
END
PERCENTILE means (0.5 99.5) confidenceInterval
PRINT confidenceInterval
17. Hypothesis test (SD unknown. t distribution one tail)
'From CliffsQuickReview Statistics p. 80, example 5
'A professor wants to know if her introductory
'statistics class has a good grasp of basic math.
'Six students are chosen at random from the class
'and given a math proficiency test. The professor
'wants the class to be able to score at least 70
'on the test. The six students get scores of
'62 92 75 68 83 95. Can the professor be at least
'90 percent certain that the mean score for the class
'on the test would be at least 70?
'Null hypothesis: mean score < 70.
COPY (62 92 75 68 83 95) scores
MEAN scores actualScoresMean 'Computed for reference only
STDEV scores actualScoresStdDev 'Computed for ref. only
COPY 1000 numTrials
REPEAT numTrials
SAMPLE 6 scores sample
MEAN sample sampleMean
IF sampleMean < 70
SCORE 1 successes
END
END
COUNT successes = 1 result
DIVIDE result numTrials probability
PRINT actualScoresMean actualScoresStdDev probability
18. Hypothesis test (SD unknown. t distribution two tail)
'From CliffsQuickReview Statistics, Example 6, Page 81:
'A Little League baseball coach wants to know if
'his team is representative of other teams in scoring
'runs. Nationally, the average number of runs scored
'by a Little League team in a game is 5.7. He
'chooses five games at random in which his team
'scored 5, 9, 4, 11, and 8 runs. Is it likely that
'his team's scores could have come from the
'national distribution?
'Assume an alpha level of 0.05.
'Null hypothesis: Team's mean equals the national
'mean (5.7).
COPY (5 9 4 11 8) gameScores
MEAN gameScores mean
PRINT mean
COPY 1000 numTrials
REPEAT numTrials
SAMPLE 5 gameScores newSample
MEAN newSample newSampleMean
SCORE newSampleMean means
END
'This is a two-tail problem, so the 0.05, or
'5 percent should be split between the high
'and low end of the range.
PERCENTILE means (2.5 97.5) meansRange
PRINT meansRange
'Print conclusion:
TAKE meansRange 1 lowLim
TAKE meansRange 2 highLim
OUTPUT "Null hypothesis is "
IF 5.7 between lowLim highLim
OUTPUT "NOT "
END
OUTPUT "rejected.\n"
19. Confidence interval for population mean using t
'From CliffsQuickReview Statistics, Example 7, Page 82:
'Using the Little League baseball data from the previous
'example, what is a 95 percent confidence interval for
'runs scored per team per game?
'Repeating the previous example's info: Nationally,
'the average number of runs scored by a Little League team in a
'game is 5.7. He chooses five games at random in which his team
'scored 5, 9, 4, 11, and 8 runs.
'ANS: In resampling terms, this is really the same problem as
'the previous one. The only difference is that here we're not
'deciding whether to reject a Null hypothesis.
COPY (5 9 4 11 8) gameScores
MEAN gameScores mean
PRINT mean
COPY 1000 numTrials
REPEAT numTrials
SAMPLE 5 gameScores newSample
MEAN newSample newSampleMean
SCORE newSampleMean means
END
PERCENTILE means (2.5 97.5) confidenceInterval
PRINT confidenceInterval
20. Two-sample z-test for comparing two means
'From CliffsQuickReview Statistics, Example 8, Page 83:
'The amount of a certain trace element in blood is
'known to vary with a standard deviation of 14.1ppm
'(parts per million) for male blood donors and 9.5 ppm
'for female donors. Random samples of 75 male and 50
'female donors yield concentration means of 28 and
'33 ppm, respectively. What is the likelihood that the
'population means of concentrations of the element are
'the same for men and women?
'Null hypothesis: the means are the same (their difference
'is zero).
'Alternate hypothesis: the means are different.
'SOLUTION: create male and female samples with the given
'sample sizes and standard deviations, but with the same
'means. For the common mean you can use either the male
'mean (28), the female mean (33), or the mean of those
'two means (30.5). In this program the significanceLevel,
'the commonMean, and the rptCount have been made variables
'at the top of the program so you can easily change them.
COPY 0.05 significanceLevel
COPY 28 commonMean 'assume same mean for both
COPY 1000 rptCount
REPEAT rptCount
NORMAL 75 commonMean 14.1 maleSample
NORMAL 50 commonMean 9.5 femaleSample 'assume same mean for both
MEAN maleSample maleSampleMean
MEAN femaleSample femaleSampleMean
SUBTRACT maleSampleMean femaleSampleMean difference
SCORE difference differences
END
ABS differences differences 'make differences positive
PRINT differences
COUNT differences >5 outliers
DIVIDE outliers rptCount probability
PRINT probability
'Print out the conclusion:
IF probability < significanceLevel
OUTPUT "Null hypothesis is rejected at a significance level of %10.4F.\n" significanceLevel
END
IF probability >= significanceLevel
OUTPUT "Null hypothesis is NOT rejected at a significance level of %10.4F.\n" significanceLevel
END
21. Two-sample t-test for comparing two means (hypothesis test)
'From CliffsQuickReview Statistics, Example 9, Page 84:
'An experiment is conducted to determine whether
'intensive tutoring is more effective than paced tutoring.
'Two randomly chosen groups are tutored separately and
'then administered proficiency tests. Use a significance
'level of alpha < 0.05.
'DATA:
'Group  Method     n   sampleMean  sampleStdDev
'  1    intensive  12  46.31       6.44
'  2    paced      10  42.79       7.52
'
'Null hypothesis: mean of intensive tutoring is <= that
'of paced tutoring.
'
'SOLUTION: In the problem, the authors give the summary
'statistics. These statistics came from sampled data.
'It would be better, using the Resampling method, to work
'with the actual data rather than the summary statistics.
'But since that data is unavailable, we'll use the summary
'statistics to generate our own samples.
COPY 1000 rptCount
REPEAT rptCount
NORMAL 12 46.31 6.44 intensiveSample
NORMAL 10 42.79 7.52 pacedSample
MEAN intensiveSample intensiveMean
MEAN pacedSample pacedMean
IF intensiveMean <= pacedMean
SCORE 1 successes
END
END
COUNT successes = 1 successCount
DIVIDE successCount rptCount probability
PRINT probability
IF probability >= 0.05
OUTPUT "Null hypothesis is NOT rejected.\n"
END
IF probability < 0.05
OUTPUT "Null hypothesis is rejected.\n"
END
22. Two-sample t-test for comparing two means (confidence interval)
'From CliffsQuickReview Statistics, Example 10, Page 85:
'Estimate a 90 percent confidence interval for the difference
'between the number of raisins per box in two brands of
'breakfast cereal.
'
'DATA:
'Brand  sampleSize  sampleMean  sampleStdDev
'  A    6           102.1       12.3
'  B    9           93.6        7.52
'
'SOLUTION: In the problem, the authors give the summary
'statistics. These statistics came from sampled data.
'It would be better, using the Resampling method, to work
'with the actual data rather than the summary statistics.
'But since that data is unavailable, we'll use the summary
'statistics to generate our own samples.
COPY 1000 rptCount
REPEAT rptCount
NORMAL 6 102.1 12.3 brandASample
NORMAL 9 93.6 7.52 brandBSample
MEAN brandASample brandAMean
MEAN brandBSample brandBMean
SUBTRACT brandAMean brandBMean diff
SCORE diff differences
END
PERCENTILE differences (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Example 11, Page 87:
'Does right- or left-handedness affect how fast people type?
'Random samples of students from a typing class are given
'a typing speed test (words per minute) and the results
'are compared. Significance level for the test: 0.10.
'Because you are looking for a difference between the
'groups in either direction, this is a two-tailed test.
'Null hypothesis: Means are equal.
'
'DATA:
'Group  sampleSize  sampleMean  sampleStdDev
'right  16          55.8        5.7
'left   9           59.3        4.3
'
'SOLUTION: In the problem, the authors give the summary
'statistics. These statistics came from sampled data.
'It would be simpler, using the Resampling method, to work
'with the actual data rather than the summary statistics.
'But since that data is unavailable, we'll use the summary
'statistics to generate our own samples.
'Since the authors are using "variance pooling",
'which assumes that the (unknown) standard deviations are equal,
'we will simulate that process to choose a pooled standard
'deviation and while we are at it, a pooled mean.
'Compute the "pooled" statistics:
REPEAT 1000
NORMAL 16 55.8 5.7 rightSample
NORMAL 9 59.3 4.3 leftSample
COPY rightSample leftSample pooledSample
STDEV pooledSample pooledSampleStdDev
MEAN pooledSample pooledSampleMean
SCORE pooledSampleStdDev stdDevs
SCORE pooledSampleMean means
END
MEAN stdDevs pooledStdDev
MEAN means pooledMean
PRINT pooledMean pooledStdDev
COPY 1000 rptCount
REPEAT rptCount
NORMAL 16 pooledMean pooledStdDev rightSample
NORMAL 9 pooledMean pooledStdDev leftSample
MEAN rightSample rightMean
MEAN leftSample leftMean
SUBTRACT rightMean leftMean diff
SCORE diff differences
END
PERCENTILE differences (5 95) acceptanceRegion
PRINT acceptanceRegion
TAKE acceptanceRegion 1 lowLimit
TAKE acceptanceRegion 2 highLimit
OUTPUT "Conclusion: The Null hypothesis is "
'(3.5 is the size of the difference between the original sample means;
'the acceptance region is roughly symmetric about zero, so the sign does not matter)
IF 3.5 between lowLimit highLimit
OUTPUT "NOT "
END
OUTPUT "rejected.\n"
'From CliffsQuickReview Statistics, Example 12, Page 88:
'A farmer decides to try out a new fertilizer on a test plot
'containing 10 stalks of corn. Before applying the fertilizer,
'he measures the height of each stalk. Two weeks later, he
'measures the stalks again, being careful to match each
'stalk's new height to its previous one. The stalks would
'have grown an average of six inches during that time even
'without the fertilizer. Did the fertilizer help? Use a
'significance level of 0.05.
'Null hypothesis: Fertilizer had no effect, i.e., height
'change <= 6.
COPY 0.05 significanceLevel
COPY 1000 rptCount
COPY (35.5 31.7 31.2 36.3 22.8 28.0 24.6 26.1 34.5 27.7) beforeHeights
COPY (45.3 36.0 38.6 44.7 31.4 33.5 28.8 35.8 42.9 35.0) afterHeights
SUBTRACT afterHeights beforeHeights changes
REPEAT rptCount
SAMPLE 10 changes bootstrapSample
MEAN bootstrapSample sampleMean
SCORE sampleMean means
END
COUNT means <=6 successes
DIVIDE successes rptCount probability
PRINT probability
OUTPUT "Conclusion: null hypothesis is "
IF probability <= significanceLevel
OUTPUT "NOT "
END
OUTPUT "accepted at the %10.4F significance level\n" significanceLevel
25. Test for a single population proportion (hypothesis test)
'From CliffsQuickReview Statistics, Example 13, Page 89:
'The sponsors of a city marathon have been trying to encourage
'more women to participate in the event. A sample of 70 runners
'is taken, of which 32 are women. The sponsors would like to
'be 90 percent certain that at least 40 percent of the participants
'are women. Were their recruitment efforts successful?
'Null hypothesis: population proportion < 0.4
'Alternate hypothesis: population proportion >= 0.4
COPY 1000 rptCount
COPY 0.1 significanceLevel ' 100% - 90% as a decimal fraction
COPY 38#0 32#1 runners '0=men, 1=women
REPEAT rptCount
SAMPLE 70 runners newSample
COUNT newSample =1 women
DIVIDE women 70.0 proportion
SCORE proportion results
END
COUNT results < 0.4 successes
DIVIDE successes rptCount probability
PRINT probability
OUTPUT "Conclusion: null hypothesis is "
IF probability < significanceLevel
OUTPUT "NOT "
END
OUTPUT "accepted at the %10.4F significance level.\n" significanceLevel
26. Test for a single population proportion (confidence interval)
'From CliffsQuickReview Statistics, Example 14, Page 90:
'A sample of 100 voters selected at random in a congressional district
'prefer Candidate Smith to Candidate Jones by a ratio of 3 to 2.
'What is a 95 percent confidence interval of the percentage of
'voters in the district who prefer Smith?
COPY 1000 rptCount
COPY 100 sampleSize
COPY 3#1 2#2 voters '1=Smith 2=Jones
REPEAT rptCount
SAMPLE sampleSize voters sample
COUNT sample =1 smithVoters
SCORE smithVoters results
END
DIVIDE results sampleSize results
PERCENTILE results (2.5 97.5) confidenceInterval
PRINT confidenceInterval
27. Choosing a sample size for a given confidence interval
'From CliffsQuickReview Statistics, Example 15, Page 91:
'How large a sample is needed to estimate the preference of
'voters for Candidate Smith with a margin of error of
'+ or - 4 percent at a 95 percent confidence level?
'To be conservative, assume voters are split 50/50.
'This one requires a little trial and error on your part.
'You choose a sample size, run the program and see if you
'get a confidence interval of around (0.46 0.54). If not,
'choose another sample size and try again. After a few
'tries you'll settle on 600 as the right choice.
COPY 600 sampleSize
COPY 1000 rptCount
COPY (1 2) voters '1=Smith 2=Jones. Assume voters 50% split
REPEAT rptCount
SAMPLE sampleSize voters sample
COUNT sample =1 smithVoters
SCORE smithVoters results
END
DIVIDE results sampleSize results
PERCENTILE results (2.5 97.5) confidenceInterval
PRINT confidenceInterval
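The trial-and-error result of about 600 agrees with the usual formula for the sample size of a proportion, n = z²·p(1−p)/E². A minimal Python sketch follows, assuming the conservative p = 0.5 used above; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence, p=0.5):
    """Sample size giving the stated margin of error for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z**2 * p * (1 - p) / margin**2


n = required_sample_size(0.04, 0.95)
print(n, ceil(n))
```

Rounding up gives a sample of roughly 600, which is why the simulated interval settles near (0.46 0.54) at that size.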
28. Comparing two proportions (hypothesis test)
'From CliffsQuickReview Statistics, Example 16, Page 92:
'A swimming school wants to determine whether a recently
'hired instructor is working out. Sixteen out of 25 of
'Instructor A's students passed the lifeguard certification
'test on the first try. In comparison, 57 out of 72 of more
'experienced Instructor B's students passed the test on the
'first try. Is Instructor A's success rate worse than
'Instructor B's? Use alpha = 0.10.
'Null hypothesis: A's rate is >= B's rate
'Alternate hypothesis: A's rate is < B's rate
'This is a one-tailed test.
COPY 1000 rptCount
COPY 0.10 significanceLevel
COPY 16#1 9#0 studentsOfA '1=passed, 0=failed
COPY 57#1 15#0 studentsOfB
REPEAT rptCount
SAMPLE 25 studentsOfA sampleA
SAMPLE 72 studentsOfB sampleB
COUNT sampleA =1 passedA
COUNT sampleB =1 passedB
DIVIDE passedA 25 passedARate
DIVIDE passedB 72 passedBRate
IF passedARate >= passedBRate
SCORE 1 successes
END
END
COUNT successes =1 successesA
DIVIDE successesA rptCount probability
PRINT probability
OUTPUT "Conclusion: null hypothesis is "
IF probability < significanceLevel
OUTPUT "NOT "
END
OUTPUT "accepted at a %10.4F significance level.\n" significanceLevel
29. Comparing two proportions (confidence interval)
'From CliffsQuickReview Statistics, Example 17, Page 93:
'A public health researcher wants to know how two high
'schools, one in the inner city and one in the suburbs,
'differ in the percentage of students who smoke. A
'random survey of students gives the following results:
'
'Population sampleSize Smokers
'inner-city 125 47
'suburban 153 52
'
'What is a 90 percent confidence interval for the
'difference between the two schools?
COPY 1000 rptCount
COPY 47#1 78#0 innerCity
COPY 52#1 101#0 suburban
REPEAT rptCount
SAMPLE 125 innerCity innerCitySample
SAMPLE 153 suburban suburbanSample
COUNT innerCitySample =1 innerCitySmokers
COUNT suburbanSample =1 suburbanSmokers
DIVIDE innerCitySmokers 125 innerCityPercentage
DIVIDE suburbanSmokers 153 suburbanPercentage
SUBTRACT innerCityPercentage suburbanPercentage difference
SCORE difference differences
END
PERCENTILE differences (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Example 1, Page 99:
'Compute the correlation coefficient for the relationship
'between months of exercise-machine ownership and hours
'of exercise per week. (The data is given in the program
'below.)
'
'NOTE: this is not a resampling or Monte Carlo simulation.
'It is simply a use of the Statistics101 built-in CORR
'command, which computes the Pearson's product moment
'correlation coefficient.
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
CORR monthsOwned hoursExercised correlationCoefficient
PRINT correlationCoefficient
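To show what a Pearson correlation coefficient computes, here is a short Python sketch that works it out from sums of squares for the same data. It is an illustration under the usual textbook definition and makes no claim about how Statistics101 implements CORR internally.

```python
from math import sqrt

months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]


def pearson_r(xs, ys):
    """Pearson product-moment correlation from centered sums of products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(pearson_r(months_owned, hours_exercised))
```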
31. Finding significance of the Correlation Coefficient
'From CliffsQuickReview Statistics, Example 1 partB, Page 100:
'Compute the significance level for the correlation
'coefficient for the relationship between months of exercise-machine
'ownership and hours of exercise per week. (The data is given in
'the program below.)
'
'The null hypothesis is that the data are not correlated, i.e.,
'that the population correlation coefficient = 0.
'Therefore, we can bootstrap the two data items separately.
'That means we choose pairs of elements independently.
'Then we see how often the original sample's correlation
'coefficient, r (independent of its sign),
'shows up based on the assumption that they are uncorrelated.
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
CORR monthsOwned hoursExercised r
PRINT "Sample correlation coefficient: " r
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 10 monthsOwned monthsOwnedBootstrap
SAMPLE 10 hoursExercised hoursExercisedBootstrap
CORR monthsOwnedBootstrap hoursExercisedBootstrap bootstrapCorrelationCoefficient
SCORE bootstrapCorrelationCoefficient correlationCoefficients
END
HISTOGRAM percent binsize 0.1 correlationCoefficients
'Compute 2-sided probability:
rPlus = ABS(r)
rMinus = -ABS(r)
COUNT correlationCoefficients <= rMinus coeffCountMinus
COUNT correlationCoefficients >= rPlus coeffCountPlus
significanceLevel = (coeffCountMinus + coeffCountPlus) / rptCount
PRINT significanceLevel
32. Confidence interval for the Correlation Coefficient
'This problem is not in the CliffsQuickReview book. I've just
'added it to demonstrate the technique.
'
'Compute the 95 percent confidence interval for the correlation
'coefficient for the relationship between months of exercise-machine
'ownership and hours of exercise per week. (The data is given in
'the program below.)
'
'Since the data pairs are correlated, we must sample them
'in pairs, always taking for any random position in one,
'the corresponding element in the other. We do that using
'a "chooser" variable and the TAKE command as you see below.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 1000 rptCount
REPEAT rptCount
SAMPLE 10 1,10 chooser
TAKE monthsOwned chooser monthsOwnedBootstrap
TAKE hoursExercised chooser hoursExercisedBootstrap
CORR monthsOwnedBootstrap hoursExercisedBootstrap correlationCoefficient
SCORE correlationCoefficient coefficients
END
PERCENTILE coefficients (2.5 97.5) confidenceInterval
PRINT confidenceInterval
'Here is a solution to the problem using the "jackknife" method
'instead of the "bootstrap" used above. Thanks to Gaj Vidmar for
'this solution.
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 1,10 is
FOREACH i is
WEED is =i j
TAKE monthsOwned j monthsOwnedJackknife
TAKE hoursExercised j hoursExercisedJackknife
CORR monthsOwnedJackknife hoursExercisedJackknife correlationCoefficientJackknife
SCORE correlationCoefficientJackknife coefficientsJackknife
END
PERCENTILE coefficientsJackknife (2.5 97.5) confidenceIntervalJackknife
PRINT confidenceIntervalJackknife
'From CliffsQuickReview Statistics, Page 102:
'Compute the linear regression coefficients for
'the relationship between months of exercise-machine
'ownership and hours of exercise per week. (The data
'is given in the program below.)
'
'NOTE: this is not a resampling or Monte Carlo simulation.
'It is simply a use of the Statistics101 built-in REGRESS
'command, which computes the linear regression coefficients.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
REGRESS hoursExercised monthsOwned coefficients
PRINT coefficients
TAKE coefficients 1 slope
TAKE coefficients 2 yIntercept
PRINT slope yIntercept
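The least-squares slope and intercept can be checked with the standard formulas, slope = Sxy/Sxx and intercept = mean(y) − slope·mean(x). The Python sketch below uses the same data; the function name is illustrative and this is not a description of REGRESS internals.

```python
months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(months_owned, hours_exercised)
print(slope, intercept)
```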
34. Confidence interval for the linear regression slope
'From CliffsQuickReview Statistics, Example 2 Page 105:
'Compute the 95% confidence interval for the slope of the
'regression line for the relationship between months of
'exercise-machine ownership and hours of exercise per week.
'(The data is given in the program below.)
'
'Since the data pairs are correlated, we must sample them
'in pairs, always taking for any random position in one,
'the corresponding element in the other. We do that using
'a "chooser" variable and the TAKE command as you see below.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 10000 rptCount
REPEAT rptCount
SAMPLE 10 1,10 chooser
TAKE monthsOwned chooser monthsOwnedBootstrap
TAKE hoursExercised chooser hoursExercisedBootstrap
REGRESS hoursExercisedBootstrap monthsOwnedBootstrap linearCoefficients
TAKE linearCoefficients 1 slope
SCORE slope slopes
END
PERCENTILE slopes (2.5 97.5) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Example 3 Page 107:
'What is a 90% confidence interval for the number of
'hours spent exercising per week if the exercise machine
'is owned 11 months?
'(The data is given in the program below.)
'
'Since the data pairs are correlated, we must sample them
'in pairs, always taking for any random position in one,
'the corresponding element in the other. We do that using
'a "chooser" variable and the TAKE command as you see below.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 100000 rptCount
REPEAT rptCount
SAMPLE 10 1,10 chooser
TAKE monthsOwned chooser monthsOwnedBootstrap
TAKE hoursExercised chooser hoursExercisedBootstrap
REGRESS hoursExercisedBootstrap monthsOwnedBootstrap linearCoefficients
TAKE linearCoefficients 1 slope
TAKE linearCoefficients 2 yIntercept
'compute y value for x = 11 months:
MULTIPLY 11 slope term1
ADD term1 yIntercept yValue
SCORE yValue yValues
END
PERCENTILE yValues (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Page 110:
'Suppose 125 children are shown three TV commercials
'A, B, and C, for breakfast cereal and are asked to
'pick which they liked best. The results are:
'
'         A    B    C   Totals
'Boys    30   29   16     75
'Girls   12   33    5     50
'Totals  42   62   21    125
'
'Is the choice of favorite commercial related to
'whether the child is a boy or a girl?
'Null hypothesis: the commercial choice is not
'related to the sex of the child. This can be
'restated as: How often (or what is the
'probability that) the contents of the six inner
'cells would be as far or farther than they
'currently are from their expected values?
'Compare results for alpha = 0.05 vs. alpha =0.01.
'
'Setup vectors to hold the expected values and
'the observed values of the table.
COPY (25.2 37.2 12.6 16.8 24.8 8.4) expectedValues
COPY (30 29 16 12 33 5) observedData
CHISQUARE observedData expectedValues chiSquare
PRINT chiSquare
'Compute and record (SCORE) chi-square values for
'many simulated table cell entries.
COPY 42#1 62#2 21#3 ads
COPY 5000 rptCount
REPEAT rptCount
SHUFFLE ads ads
TAKE ads 1,75 boys
TAKE ads 76,125 girls
COUNT boys =1 boysAdA
COUNT boys =2 boysAdB
COUNT boys =3 boysAdC
COUNT girls =1 girlsAdA
COUNT girls =2 girlsAdB
COUNT girls =3 girlsAdC
'Rebuild a new table with the simulated data
COPY boysAdA boysAdB boysAdC girlsAdA girlsAdB girlsAdC observedData$
CHISQUARE observedData$ expectedValues chiSquare$
SCORE chiSquare$ chiSquareScores
END
COUNT chiSquareScores >= chiSquare chiCount
DIVIDE chiCount rptCount significanceLevel
PRINT significanceLevel
OUTPUT "Conclusions:\n"
OUTPUT "The null hypothesis is "
IF significanceLevel >= 0.05
OUTPUT "NOT "
END
OUTPUT "rejected at the 0.05 significance level.\n"
OUTPUT "The null hypothesis is "
IF significanceLevel >= 0.01
OUTPUT "NOT "
END
OUTPUT "rejected at the 0.01 significance level.\n"
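The chi-square statistic and its formula-based p-value can be worked out directly as a check on the shuffled-table simulation. This Python sketch derives the expected counts from the row and column totals; for a 2 x 3 table there are 2 degrees of freedom, and in that special case the upper-tail probability is exactly exp(−x/2). The names are illustrative only.

```python
from math import exp

observed = [[30, 29, 16],   # boys:  ads A, B, C
            [12, 33, 5]]    # girls: ads A, B, C


def chi_square_statistic(table):
    """Pearson chi-square for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = chi_square_statistic(observed)
p_value = exp(-chi2 / 2)  # valid only because this table has 2 degrees of freedom
print(chi2, p_value)
```

The p-value falls between 0.01 and 0.05, which matches the two conclusions the program is expected to print most of the time.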