Community Server


resampling problem

Last post 07-14-2009, 1:48 PM by carbonmetrics. 2 replies.
  •  07-14-2009, 5:24 AM 81

    resampling problem

    I have a widget factory that changed something in its process. The question is whether the improvement is significant or not. The difference between the sample means is 8597; for sample a, n = 5, and for sample b, n = 2.

    I have tried to tackle the problem using the Resampling Stats examples ["battery life"].
    Using traditional statistics I get p = 0.32 with a two-tailed t test [assuming equal variances], using PAST and Gerry's stat tools [Excel add-in].

    Applying Statistics101 I get a much lower p = 0.11.
    Is this correct?

    See code below!

    henk



    COPY(74734 70396 77854 55009 69342) a
    COPY(69213 52527 . . .) b
    CONCAT a b c
    REPEAT 15000
       SAMPLE 5 c d
       SAMPLE 5 c e
       MEAN d dd
       MEAN e ee
       SUBTRACT dd ee f
       SCORE f scrboard
    END
    COUNT scrboard >= 8597 k
    DIVIDE k 15000 prob
    PRINT prob
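
For readers who want to check the figures quoted in this post, here is a minimal Python sketch (Python is not used anywhere in this thread, so treat it as an outside cross-check). It recomputes the observed difference in means and the pooled-variance t statistic behind the traditional test, using only the five values of sample a and the two numeric values of sample b. Turning the t statistic into a p-value needs a t-distribution table or a statistics package, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

# Data copied from the post; the "." placeholders in sample b are left out.
a = [74734, 70396, 77854, 55009, 69342]
b = [69213, 52527]

# Observed difference between the sample means.
diff = mean(a) - mean(b)

# Pooled (equal-variance) two-sample t statistic.
df = len(a) + len(b) - 2
pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
std_err = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
t_stat = diff / std_err

print("difference in means:", diff)
print("degrees of freedom:", df)
print("t statistic:", round(t_stat, 2))
```

The difference comes out as 8597, matching the post, and the t statistic is about 1.09 on 5 degrees of freedom.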

  •  07-14-2009, 9:14 AM 82 in reply to 81

    Re: resampling problem

    There are two issues I see in your program:

    1. When you do the subtraction, sometimes (about 1/2 the time) the difference will be negative. Thus your test for >= 8597 rejects those negative numbers, giving you a one-tailed test. To get a two-tailed test, you must take the absolute value of the differences before comparing. When you do, the result doubles to about p = 0.23, as you can see with this modified program:

    COPY(74734 70396 77854 55009 69342) a
    COPY(69213 52527 . . .) b
    CONCAT a b c
    REPEAT 15000
       SAMPLE 5 c d
       SAMPLE 5 c e
       MEAN d dd
       MEAN e ee
       SUBTRACT dd ee f
       SCORE f scrboard
    END
    ABS scrboard scrboardAbs     '<=== Compute Absolute values
    COUNT scrboardAbs >= 8597 k
    DIVIDE k 15000 prob
    PRINT prob


    2. The second issue is to decide why you have three NaNs in the second sample. There is no need for the two sample sizes to be equal unless your real sample sizes were equal. If you are using NaNs just to make the sample sizes equal because they were equal in the "battery life" example, then that would be incorrect. From your description of the problem, I would guess that perhaps sample "b" should exclude the NaNs and be of size two. Then "e" should be of size 2 also. That would lead to a program like this one, which raises the p to about 0.25:

    COPY(74734 70396 77854 55009 69342) a
    COPY(69213 52527) b    '<== No NaNs, sample size = 2
    CONCAT a b c
    REPEAT 15000
       SAMPLE 5 c d
       SAMPLE 2 c e        '<== Sample size = 2
       MEAN d dd
       MEAN e ee
       SUBTRACT dd ee f
       SCORE f scrboard
    END
    ABS scrboard scrboardAbs     '<=== Compute Absolute values
    COUNT scrboardAbs >= 8597 k
    DIVIDE k 15000 prob
    PRINT prob
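
As a cross-check outside Statistics101, the last program can be sketched in Python. This is only an illustration under stated assumptions: it assumes SAMPLE draws with replacement from the combined vector, the function name `resample_p_values` and the fixed seed are made up for this example, and the exact proportions will vary from run to run with a different seed. It reports both the one-tailed and the two-tailed proportion so the effect of taking absolute values is visible.

```python
import random

# Data from the post, with sample b reduced to its two real values.
a = [74734, 70396, 77854, 55009, 69342]
b = [69213, 52527]
observed = sum(a) / len(a) - sum(b) / len(b)  # observed difference in means

def resample_p_values(a, b, observed, trials=15000, seed=0):
    """Hypothetical helper mirroring the REPEAT loop in the program above."""
    rng = random.Random(seed)          # fixed seed is an assumption, for repeatability
    pooled = a + b                     # CONCAT a b c
    one_tailed = 0
    two_tailed = 0
    for _ in range(trials):
        d = rng.choices(pooled, k=len(a))   # SAMPLE 5 c d (with replacement)
        e = rng.choices(pooled, k=len(b))   # SAMPLE 2 c e (with replacement)
        f = sum(d) / len(d) - sum(e) / len(e)
        if f >= observed:              # original test: counts one tail only
            one_tailed += 1
        if abs(f) >= observed:         # ABS then COUNT: counts both tails
            two_tailed += 1
    return one_tailed / trials, two_tailed / trials

p_one, p_two = resample_p_values(a, b, observed)
print("one-tailed:", p_one)
print("two-tailed:", p_two)
```

The two-tailed proportion can never be smaller than the one-tailed one, because every difference that is at least 8597 is also at least 8597 in absolute value.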


  •  07-14-2009, 1:48 PM 83 in reply to 82

    Re: resampling problem

    Great! Thanks, it looks easy now, yet I could not find the problems myself. Problem solved. Interesting to see the difference with traditional statistics. The data are from a chemical factory in China that is trying to reduce its greenhouse gas emissions.

    Thanks again!
    Henk