Correlating Multiple Sets of Vectors

Last post 12-19-2008, 9:32 AM by Random Walker. 9 replies.
  •  12-17-2008, 11:58 AM 62

    Correlating Multiple Sets of Vectors

    Here is a general programming question that has me perplexed. I work with a lot of naturally gathered data, which means that I often have missing data points. The "CORR" command that is built in to Statistics101 does not like this very much as it returns NaN if I correlate two vectors that have any missing data points at all (even if they are of equal length).

    Now I would like it to do pairwise deletion. I came up with a fix for this problem using the following subroutine:

    NEWCMD CORREL var1 var2 r
       ADD var1 var2 C
       SUBTRACT C var2 A2
       SUBTRACT C var1 B2
       CLEAN A2
       CLEAN B2
       CORR A2 B2 r
    END

    This subroutine takes the two vectors of interest, adds them together to form a third vector. In doing so, any pair of elements which have at least one missing data point (NaN) will now have a missing data point. All pairs without missing data points are stored in this third vector (C) as Var1 + Var2. To get them back to original form, I simply subtract Var2 from C to get Var1 (with all pairwise missing now marked as NaN). And to get Var2, I subtract Var1 from C. I now have two vectors (A2 & B2) with all pairs that have a missing data point marked as NaN on both elements of the pair. Using the clean command I can reduce these two vectors to only the elements that have no missingness at the pair level. Then I use the CORR command to get a correlation (r).

    So this is great for when I have two variables with missing data that I want to know the correlation for. I double-checked this subroutine with standard statistical packages (e.g. SAS) and confirmed the results.
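
For readers who don't use Statistics101, the add-then-subtract trick can be sketched in Python. This is only an illustration of the idea, not the Statistics101 implementation; the function name and the sample values are made up for the example.

```python
import math

NAN = float("nan")

def pairwise_delete_by_sum(a, b):
    """Mimic ADD / SUBTRACT / CLEAN: a NaN in either input poisons the sum."""
    total = [x + y for x, y in zip(a, b)]        # NaN wherever either side is NaN
    a2 = [t - y for t, y in zip(total, b)]       # recovers a, NaN on incomplete pairs
    b2 = [t - x for t, x in zip(total, a)]       # recovers b, NaN on incomplete pairs
    a2 = [v for v in a2 if not math.isnan(v)]    # CLEAN A2
    b2 = [v for v in b2 if not math.isnan(v)]    # CLEAN B2
    return a2, b2

# Hypothetical sample data with one missing value in each vector.
var1 = [7.0, 8.0, NAN, 16.0, 3.0]
var2 = [9.0, 4.0, 4.0, 12.0, NAN]
print(pairwise_delete_by_sum(var1, var2))
```

Because both recovered vectors carry a NaN at exactly the same positions, cleaning them separately still leaves them aligned and of equal length.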

    However, this is where I am stuck. Much of my research involves correlating one variable with a large set of other variables. Or, even more complex, one set of variables with another set of variables.

    For example, say I have 3 personality scales and I want to correlate them with 3 IQ test scores. This yields 9 total correlations I am interested in. As of right now, I would have to write the CORREL command for each of these. This isn't so bad when you have 9 total. But currently I am working with a data set that includes 100 personality scales and 81 other measures. I'd like to compute all 8100 correlations (using the CORREL command I made above) and store them in one vector.

    I've already created a roundabout way of doing this, but I wonder if there is an easier way.

    I'd like a command that does this:
    Count the number of variables in "set #1". Count the number of variables in "set #2". Compute all possible correlations between the variables in set #1 and set #2 (handling missing data pairwise) and store all the output in one vector.

    Any ideas how I can write a subroutine that does this?

    Thanks in advance if you made it this far.

    Sherman

  •  12-18-2008, 8:48 AM 64 in reply to 62

    Re: Correlating Multiple Sets of Vectors

    Sherman,

    The CLEAN command, when given multiple arguments, treats them all element by element like you want. In other words, if there is a NaN at element three in arg1, then element three of all the arguments will be removed. If you don't need to save the original contents of the vectors, you can just do this:

    CLEAN var1 var2
    CORR var1 var2 r


    It works for any number of arguments. If you need to save the original contents of the inputs, you can reduce your subroutine to this:

    NEWCMD CORREL var1 var2 r
       COPY var1 var1Copy
       COPY var2 var2Copy
       CLEAN var1Copy var2Copy
       CORR var1Copy var2Copy r
    END


    Re your second question, if you can get all your data into two vectors, one containing the data for all the independent variables and one for all the dependent variables, then you can use a nested loop to traverse it. Say that each data set (e.g., the data for one personality test and/or one IQ test) has the same number of data points, S. In other words, the data for personality test one, P1 has S data points, personality test 2, P2 has S data points, etc. Also, the data for IQ test one, Q1 has S data points, for IQ test two, Q2 has S data points, etc.  If they have different numbers of data points, it will be more complicated.

    And, assume that you have the data in vectors P and Q, such that P is a concatenation of P1, P2, P3,..., Pm, while Q is a concatenation of Q1, Q2, ...Qn

    Then you can use something like this to go through it:


    'Computes correlation constants for all combinations of the datasets in vec1
    'with the datasets in vec2. Removes missing data (NaN) in a coordinated way
    'from the datasets being compared.
    'Inputs:
    '  vec1: vector containing some number of data sets all of equal size, dataSetSize.
    '  vec2: vector containing some number of data sets all of equal size, dataSetSize.
    '  dataSetSize: the number of elements in each data set.
    'Outputs:
    '  r     correlation constants for all combinations.
    NEWCMD CROSSCORR vec1 vec2 dataSetSize r @statistics \
      ?"Computes correlation constants for all combinations of the datasets in vec1 with the datasets in vec2."
       CLEAR r
       SIZE vec1 vec1Size
       SIZE vec2 vec2Size
       LET numberDataSets1 = vec1Size / dataSetSize
       LET numberDataSets2 = vec2Size / dataSetSize
       FOREACH dataSet1 1,numberDataSets1
          LET startIndex1 = (dataSet1 - 1) * dataSetSize + 1
          LET endIndex1 = startIndex1 + dataSetSize - 1
          TAKE vec1 startIndex1,endIndex1 dataSetVec1
          FOREACH dataSet2 1,numberDataSets2
             COPY dataSetVec1 dataSetVec1Copy   'Need copy so CLEAN won't remove NaNs from dataSetVec1
             LET startIndex2 = (dataSet2 - 1) * dataSetSize + 1
             LET endIndex2 = startIndex2 + dataSetSize - 1
             TAKE vec2 startIndex2,endIndex2 dataSetVec2
             CLEAN dataSetVec1Copy dataSetVec2
             CORR dataSetVec1Copy dataSetVec2 rTrial
             SCORE rTrial r
          END
       END
    END


    'EXAMPLE PROGRAM:
    '
    'Some simple test data
    COPY 1,10 11,20  P            'two data sets
    COPY 10,1 31,40 30,21 Q       'three data sets
    COPY 10 dataSize              'size of one data set

    CROSSCORR P Q dataSize r
    ROUND 3 r rRounded
    PRINT rRounded


    Here's the output:
    rRounded: (-1.0 1.0 -1.0 -1.0 1.0 -1.0)
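
As a cross-check of the logic, here is a rough Python sketch of the same nested loop. It is not Statistics101 code; the helper names (`pearson_r`, `clean_pair`, `cross_corr`) are invented for this illustration, and it assumes, as above, that every data set has the same length.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def clean_pair(x, y):
    """Drop every position where either value is NaN (pairwise deletion)."""
    kept = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in kept], [b for _, b in kept]

def cross_corr(vec1, vec2, size):
    """Correlate every size-long block of vec1 with every block of vec2."""
    blocks1 = [vec1[i:i + size] for i in range(0, len(vec1), size)]
    blocks2 = [vec2[i:i + size] for i in range(0, len(vec2), size)]
    out = []
    for b1 in blocks1:            # outer loop: data sets in vec1
        for b2 in blocks2:        # inner loop: data sets in vec2
            x, y = clean_pair(b1, b2)
            out.append(pearson_r(x, y))
    return out

# Same test data as the example program: two data sets in P, three in Q.
p = list(range(1, 11)) + list(range(11, 21))
q = list(range(10, 0, -1)) + list(range(31, 41)) + list(range(30, 20, -1))
print([round(r, 3) for r in cross_corr(p, q, 10)])
```

The results come out in the same order as the Statistics101 version: all of Q against the first data set of P, then all of Q against the second.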

    Hope that helps,

    John
  •  12-18-2008, 10:24 AM 65 in reply to 64

    Re: Correlating Multiple Sets of Vectors

    Hi John,

    Thanks for the reply. RE: the CORREL command, my issue isn't that I need to keep a copy of the original data, it is that I have data that looks like this:

    Var1  Var2
    7     9
    8     4
    NaN   4
    16    12
    3     NaN

    Each of the scores in this data set is paired, so if I just run the CLEAN command I will pair up scores that don't belong together. Additionally, if one variable has more missing data points than the other, a simple clean won't leave them with equal lengths.

    My little subroutine does this:

    Var1  Var2  Var1+Var2
    7     9     16
    8     4     12
    NaN   4     NaN
    16    12    28
    3     NaN   NaN

    Var1  Var2  Var1+Var2  SubtractVar2  SubtractVar1
    7     9     16         7             9
    8     4     12         8             4
    NaN   4     NaN        NaN           NaN
    16    12    28         16            12
    3     NaN   NaN        NaN           NaN

    Now clean to get:

    SubVar2  SubVar1
    7        9
    8        4
    16       12

    Now correlate these two vectors.


    Anyhow, huge thanks for the idea of putting all the data into one vector. I will have to think about how I implement this to do what I want.

    For what it's worth, here is my "roundabout" way of getting what I want...maybe it will give you more insight as to what I am trying to do and maybe you can think of a simpler way for me to write it.

    I am trying to compute the 6700 correlations between the RBQ items (rbq1--rbq67) and the CAQ items (caq1--caq100).

    READ file "C:\\test2.txt"  SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100
    PRINT SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100
    COPY 1000 numtrials 'Set the number of simulations you would like to run
    NEWCMD CORREL A B r
       ADD A B C
       SUBTRACT C B A2
       SUBTRACT C A B2
       CLEAN A2
       CLEAN B2
       CORR A2 B2 r
    END
    GLOBAL caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100
    NEWCMD MultR caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100 rbqvar rvec1
       CORREL caq1 rbqvar r1
       CORREL caq2 rbqvar r2
       CORREL caq3 rbqvar r3
       CORREL caq4 rbqvar r4
       CORREL caq5 rbqvar r5
       CORREL caq6 rbqvar r6
       CORREL caq7 rbqvar r7
       CORREL caq8 rbqvar r8
       CORREL caq9 rbqvar r9
       CORREL caq10 rbqvar r10
       CORREL caq11 rbqvar r11
       CORREL caq12 rbqvar r12
       CORREL caq13 rbqvar r13
       CORREL caq14 rbqvar r14
       CORREL caq15 rbqvar r15
       CORREL caq16 rbqvar r16
       CORREL caq17 rbqvar r17
       CORREL caq18 rbqvar r18
       CORREL caq19 rbqvar r19
       CORREL caq20 rbqvar r20
       CORREL caq21 rbqvar r21
       CORREL caq22 rbqvar r22
       CORREL caq23 rbqvar r23
       CORREL caq24 rbqvar r24
       CORREL caq25 rbqvar r25
       CORREL caq26 rbqvar r26
       CORREL caq27 rbqvar r27
       CORREL caq28 rbqvar r28
       CORREL caq29 rbqvar r29
       CORREL caq30 rbqvar r30
       CORREL caq31 rbqvar r31
       CORREL caq32 rbqvar r32
       CORREL caq33 rbqvar r33
       CORREL caq34 rbqvar r34
       CORREL caq35 rbqvar r35
       CORREL caq36 rbqvar r36
       CORREL caq37 rbqvar r37
       CORREL caq38 rbqvar r38
       CORREL caq39 rbqvar r39
       CORREL caq40 rbqvar r40
       CORREL caq41 rbqvar r41
       CORREL caq42 rbqvar r42
       CORREL caq43 rbqvar r43
       CORREL caq44 rbqvar r44
       CORREL caq45 rbqvar r45
       CORREL caq46 rbqvar r46
       CORREL caq47 rbqvar r47
       CORREL caq48 rbqvar r48
       CORREL caq49 rbqvar r49
       CORREL caq50 rbqvar r50
       CORREL caq51 rbqvar r51
       CORREL caq52 rbqvar r52
       CORREL caq53 rbqvar r53
       CORREL caq54 rbqvar r54
       CORREL caq55 rbqvar r55
       CORREL caq56 rbqvar r56
       CORREL caq57 rbqvar r57
       CORREL caq58 rbqvar r58
       CORREL caq59 rbqvar r59
       CORREL caq60 rbqvar r60
       CORREL caq61 rbqvar r61
       CORREL caq62 rbqvar r62
       CORREL caq63 rbqvar r63
       CORREL caq64 rbqvar r64
       CORREL caq65 rbqvar r65
       CORREL caq66 rbqvar r66
       CORREL caq67 rbqvar r67
       CORREL caq68 rbqvar r68
       CORREL caq69 rbqvar r69
       CORREL caq70 rbqvar r70
       CORREL caq71 rbqvar r71
       CORREL caq72 rbqvar r72
       CORREL caq73 rbqvar r73
       CORREL caq74 rbqvar r74
       CORREL caq75 rbqvar r75
       CORREL caq76 rbqvar r76
       CORREL caq77 rbqvar r77
       CORREL caq78 rbqvar r78
       CORREL caq79 rbqvar r79
       CORREL caq80 rbqvar r80
       CORREL caq81 rbqvar r81
       CORREL caq82 rbqvar r82
       CORREL caq83 rbqvar r83
       CORREL caq84 rbqvar r84
       CORREL caq85 rbqvar r85
       CORREL caq86 rbqvar r86
       CORREL caq87 rbqvar r87
       CORREL caq88 rbqvar r88
       CORREL caq89 rbqvar r89
       CORREL caq90 rbqvar r90
       CORREL caq91 rbqvar r91
       CORREL caq92 rbqvar r92
       CORREL caq93 rbqvar r93
       CORREL caq94 rbqvar r94
       CORREL caq95 rbqvar r95
       CORREL caq96 rbqvar r96
       CORREL caq97 rbqvar r97
       CORREL caq98 rbqvar r98
       CORREL caq99 rbqvar r99
       CORREL caq100 rbqvar r100
       CONCAT r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 r32 r33 r34 r35 r36 r37 r38 r39 r40 r41 r42 r43 r44 r45 r46 r47 r48 r49 r50 r51 r52 r53 r54 r55 r56 r57 r58 r59 r60 r61 r62 r63 r64 r65 r66 r67 r68 r69 r70 r71 r72 r73 r74 r75 r76 r77 r78 r79 r80 r81 r82 r83 r84 r85 r86 r87 r88 r89 r90 r91 r92 r93 r94 r95 r96 r97 r98 r99 r100 rvec1
    END
    'Now I run MultR for each of the 67 RBQ items to give me 67 vectors of 100 correlations each
    'Here is an example of just one; at the end, I concatenate all of the 67 vectors to make one vector of
    '6700 correlations
    MultR caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100 rbq1 rvec1

    Thanks for your help...you have given me something to think about.

    Best,

    Sherman
  •  12-18-2008, 12:46 PM 66 in reply to 65

    Re: Correlating Multiple Sets of Vectors

    Sherman,

    Try it. It works. Here's proof:

    DATA (7 8 nan 16 3) var1
    DATA (9 4 4 12 nan) var2
    CLEAN var1 var2
    PRINT var1 var2


    Result:

    var1: (7.0 8.0 16.0)
    var2: (9.0 4.0 12.0)
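
The difference between cleaning each vector on its own and cleaning them together can be sketched in Python as follows. This only illustrates the behavior shown above; `clean_independent` and `clean_coordinated` are hypothetical names, not Statistics101 commands.

```python
import math

NAN = float("nan")

def clean_independent(v):
    """Remove NaNs from one vector, ignoring any partner vector."""
    return [x for x in v if not math.isnan(x)]

def clean_coordinated(*vectors):
    """Remove position i from every vector if any vector has NaN at i."""
    keep = [i for i in range(len(vectors[0]))
            if not any(math.isnan(v[i]) for v in vectors)]
    return [[v[i] for i in keep] for v in vectors]

var1 = [7.0, 8.0, NAN, 16.0, 3.0]
var2 = [9.0, 4.0, 4.0, 12.0, NAN]

# Coordinated: pairs stay aligned.
print(clean_coordinated(var1, var2))   # [[7.0, 8.0, 16.0], [9.0, 4.0, 12.0]]

# Independent: same lengths here by coincidence, but 16 is now paired with 4.
print(clean_independent(var1), clean_independent(var2))
```

The coordinated version matches the result above, while the independent version silently shifts the pairing, which is the problem Sherman was worried about.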


    I'll let you know if I can think of a way to help simplify your program.

    Regards,

    John
  •  12-18-2008, 2:02 PM 67 in reply to 66

    Re: Correlating Multiple Sets of Vectors

    Wow. This does work. Thanks John!

    Sherman
  •  12-18-2008, 4:38 PM 68 in reply to 67

    Re: Correlating Multiple Sets of Vectors

    Sherman,

    Here's a program that I think will do everything that the one you posted will do, using my CROSSCORR subroutine (repeated here). I didn't have data to test it, but let me know how it goes for you.

    John


    'Computes correlation constants for all combinations of the datasets in vec1
    'with the datasets in vec2. Removes missing data (NaN) in a coordinated way
    'from the datasets being compared.
    'Inputs:
    '  vec1: vector containing some number of data sets all of equal size, dataSetSize.
    '  vec2: vector containing some number of data sets all of equal size, dataSetSize.
    '  dataSetSize: the number of elements in each data set.
    'Outputs:
    '  r     correlation constants for all combinations.
    NEWCMD CROSSCORR vec1 vec2 dataSetSize r @statistics \
      ?"Computes correlation constants for all combinations of the datasets in vec1 with the datasets in vec2."
       CLEAR r
       SIZE vec1 vec1Size
       SIZE vec2 vec2Size
       LET numberDataSets1 = vec1Size / dataSetSize
       LET numberDataSets2 = vec2Size / dataSetSize
       FOREACH dataSet1 1,numberDataSets1
          LET startIndex1 = (dataSet1 - 1) * dataSetSize + 1
          LET endIndex1 = startIndex1 + dataSetSize - 1
          TAKE vec1 startIndex1,endIndex1 dataSetVec1
          FOREACH dataSet2 1,numberDataSets2
             COPY dataSetVec1 dataSetVec1Copy   'Need copy so CLEAN won't remove NaNs from dataSetVec1
             LET startIndex2 = (dataSet2 - 1) * dataSetSize + 1
             LET endIndex2 = startIndex2 + dataSetSize - 1
             TAKE vec2 startIndex2,endIndex2 dataSetVec2
             CLEAN dataSetVec1Copy dataSetVec2
             CORR dataSetVec1Copy dataSetVec2 rTrial
             SCORE rTrial r
          END
       END
    END

    READ file "C:\\test2.txt"  SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100

    PRINT SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100

    'Concatenate all the caqs into a single caqVector:
    COPY caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100 caqVector

    'Concatenate all the rbqs into a single rbqVector:
    COPY rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 rbqVector

    'Now, get the answer:
    SIZE rbq1 dataSetSize   'Assumes all are same size
    CROSSCORR caqVector rbqVector dataSetSize rVec
    PRINT rVec


  •  12-18-2008, 7:47 PM 69 in reply to 68

    Re: Correlating Multiple Sets of Vectors

    Hi John,

    I tested your program and it seems to be perfect...it matches the output of my (lengthier) program exactly (except for some reason at the 17th (last) decimal place...I'm not sure why that would be but if that is the only difference it really isn't a difference at all).
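
One possible source of a last-digit difference (a guess; nothing in this thread confirms it) is that adding two numbers and then subtracting one of them back does not always return the original value in floating-point arithmetic. A small Python illustration, with a made-up helper name:

```python
def roundtrip(a, b):
    """Return (a + b) - b, which equals a only in exact arithmetic."""
    return (a + b) - b

print(roundtrip(7.0, 9.0) == 7.0)   # True: small whole numbers survive exactly
print(roundtrip(0.1, 0.2) == 0.1)   # False: the sum was rounded before subtracting
print(roundtrip(0.1, 0.2))
```

If the data are all whole numbers, the round trip is exact and any difference would have to come from somewhere else, such as the order in which sums are accumulated.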

    But now that you have proven so helpful (and no good deed goes unpunished ;) ) I have another question.

    The program you built does EXACTLY what I want the 1st half of my program to do, which is to calculate the number of observed correlations between two sets of variables above a certain criterion (say > |.15| ) and/or compute the average absolute value of that 100 x 67 variable vector.
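
The two summary statistics described here can be sketched in Python like this. It is an illustration only; the function name is invented, and it uses `>=` for the criterion to match the COUNT line in the code posted further down.

```python
def summarize(rs, criterion=0.15):
    """Return (count of |r| >= criterion, mean of |r|) for a list of correlations."""
    abs_rs = [abs(r) for r in rs]
    n_above = sum(1 for r in abs_rs if r >= criterion)
    mean_abs = sum(abs_rs) / len(abs_rs)
    return n_above, mean_abs

# Hypothetical correlations, not values from the real data set.
print(summarize([0.5, -0.25, 0.0, 0.25]))   # (3, 0.25)
```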

    In the second half of my program I am trying to figure out the probability of observing the average absolute value if my data were just random.

    That is, if the relationship between data set #1 and data set #2 was just random, what is the probability of getting the average absolute value that I got from the 6700 correlations?

    In my old roundabout program, I used a coordinated shuffle and then ran the series of MULTR like I had originally done. I need to use the coordinated shuffle rather than the ordinary shuffle because RBQ1 must stay with its associated RBQx's but I'd like to randomly pair an observed set of RBQ vectors with an observed CAQ vector. I repeated this 1000 times and computed the average absolute r for each of those trials to form a distribution of average absolute r's, to which I compare my observed and get the resulting probability.
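
The idea of a coordinated shuffle, as opposed to shuffling each vector separately, can be sketched in Python. This is an assumption-laden illustration and not the SHUFFLECOORD source; the function name and sample values are hypothetical.

```python
import random

def shuffle_coordinated(vectors, rng):
    """Apply one random permutation of positions to every vector.

    Values that shared a position (one subject's scores) stay together;
    only the order of subjects changes. The inputs are not modified.
    """
    order = list(range(len(vectors[0])))
    rng.shuffle(order)                         # one permutation for all vectors
    return [[v[i] for i in order] for v in vectors]

# Hypothetical scores for five subjects on two RBQ items.
rbq_a = [1, 2, 3, 4, 5]
rbq_b = [10, 20, 30, 40, 50]
shuffled_a, shuffled_b = shuffle_coordinated([rbq_a, rbq_b], random.Random(0))
print(list(zip(shuffled_a, shuffled_b)))       # each pair is still (x, 10 * x)
```

Shuffling the RBQ vectors this way breaks their link to the unshuffled CAQ vectors while preserving the relationships among the RBQ items themselves.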

    Is there some way to implement this fantastically new procedure that you built to build this distribution?

    Here is the code I have been using:

    NEWCMD SHUFFLECOORD variable #"variable {variable}" @"coordinated operations" ?"Coordinated shuffle, in place, of two or more vectors"
       ARGCOUNT numberOfArgs
       IF numberOfArgs > 1
          SIZE variable vecSize
          COPY 1,vecSize positions
          SHUFFLE positions positions
          TAKE variable positions variable
          FOREACH argNum 1,numberOfArgs
             GETARG argNum arg
             TAKE arg positions arg
          END
       ELSE
          PRINT numberOfArgs
          DEBUG "ERROR: Incorrect number of arguments in SHUFFLECOORD."
       END
    END
    REPEAT numtrials 'This part begins the actual simulation
       SHUFFLECOORD rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64
       MultR caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100 rbq1 rvec1

    'This MultR repeats for each RBQ item (I actually have 64, not 67, but it doesn't matter).

       CONCAT rvec1 rvec2 rvec3 rvec4 rvec5 rvec6 rvec7 rvec8 rvec9 rvec10 rvec11 rvec12 rvec13 rvec14 rvec15 rvec16 rvec17 rvec18 rvec19 rvec20 rvec21 rvec22 rvec23 rvec24 rvec25 rvec26 rvec27 rvec28 rvec29 rvec30 rvec31 rvec32 rvec33 rvec34 rvec35 rvec36 rvec37 rvec38 rvec39 rvec40 rvec41 rvec42 rvec43 rvec44 rvec45 rvec46 rvec47 rvec48 rvec49 rvec50 rvec51 rvec52 rvec53 rvec54 rvec55 rvec56 rvec57 rvec58 rvec59 rvec60 rvec61 rvec62 rvec63 rvec64 rsimvecALL 'This command puts all values of r into one vector called rsimvecALL
       ABS rsimvecALL rsimvec_abs 'Turns the values of rs into absolute values
       COUNT rsimvec_abs >= rtest Sim_Sig
       MEAN rsimvec_abs meanabsr_sim
       SCORE meanabsr_sim meanabsr_sims
       SCORE Sim_Sig Sim_Sigs
    END
    MEAN meanabsr_sims meansimmeans
    STDEV meanabsr_sims SDsimmeans
    MEAN Sim_Sigs Sim_SigsMean
    STDEV Sim_Sigs Sim_SigsSD
    COUNT Sim_Sigs >= Obs_Sig SigChance
    DIVIDE SigChance numtrials SigChanceProb
    COUNT meanabsr_sims >= meanabsr_obs test
    DIVIDE test numtrials prob
    PERCENTILE meanabsr_sims (95) percentile95
    PRINT meansimmeans SDsimmeans prob SigChanceProb Sim_SigsMean Sim_SigsSD percentile95
    HISTOGRAM meanabsr_sims
    HISTOGRAM Sim_Sigs


    Thanks again. You are amazing.

    Sherman

  •  12-18-2008, 9:06 PM 70 in reply to 69

    Re: Correlating Multiple Sets of Vectors

    Sherman,

    I think the following will do what you want.

    John
    -------------------------------------

    '... Here would go the earlier program that calculated the first vector of correlation constants.

    'Then, here's the "rest of the story":

    COPY 1000 numtrials 'Set the number of simulations you would like to run

    REPEAT numtrials 'This part begins the actual simulation
       SHUFFLECOORD rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64
       CONCAT rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbqVectorShuffled
       CROSSCORR caqVector rbqVectorShuffled dataSetSize rsimvecALL
       ABS rsimvecALL rsimvec_abs 'Turns the values of rs into absolute values
       COUNT rsimvec_abs >= rtest Sim_Sig
       MEAN rsimvec_abs meanabsr_sim
       SCORE meanabsr_sim meanabsr_sims
       SCORE Sim_Sig Sim_Sigs
       CLEAR rsimvecALL  'Clear the r vector in prep for next go-round
    END
    MEAN meanabsr_sims meansimmeans
    STDEV meanabsr_sims SDsimmeans
    MEAN Sim_Sigs Sim_SigsMean
    STDEV Sim_Sigs Sim_SigsSD
    COUNT Sim_Sigs >= Obs_Sig SigChance
    DIVIDE SigChance numtrials SigChanceProb
    COUNT meanabsr_sims >= meanabsr_obs test
    DIVIDE test numtrials prob
    PERCENTILE meanabsr_sims (95) percentile95
    PRINT meansimmeans SDsimmeans prob SigChanceProb Sim_SigsMean Sim_SigsSD percentile95
    HISTOGRAM meanabsr_sims
    HISTOGRAM Sim_Sigs
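
The final COUNT and DIVIDE steps amount to an empirical probability: the fraction of shuffled trials whose statistic is at least as large as the observed one. A minimal Python sketch of that calculation, with an invented function name and made-up numbers:

```python
def empirical_p(simulated, observed):
    """Fraction of simulated statistics that are >= the observed statistic."""
    hits = sum(1 for s in simulated if s >= observed)
    return hits / len(simulated)

# Hypothetical mean absolute r values from four shuffled trials.
sim_means = [0.1, 0.2, 0.3, 0.4]
print(empirical_p(sim_means, 0.3))   # 0.5
```

With 1000 trials, the smallest nonzero probability this can report is 0.001, so an observed value larger than every simulated one shows up as 0 rather than as a precise small number.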


  •  12-18-2008, 10:34 PM 71 in reply to 70

    Re: Correlating Multiple Sets of Vectors

    John,

    Thanks again. This is fantastic. Saves me about 7 minutes of run time (from 20 to 13) with about 160 data points.

    Do you have any pointers on where I can learn to write these subroutines better? I've tried to figure out what each command is doing in some subroutines you have written (e.g. coordshuf) but I get lost every time.

    Thanks a ton.

    Sherman
  •  12-19-2008, 9:32 AM 72 in reply to 71

    Re: Correlating Multiple Sets of Vectors

    Sherman,

    Glad I was able to help.

    Here are some hints that might help you understand the existing subroutines:
    • Make sure you know what each individual command is supposed to do. The help docs describe each command in detail, with examples of its use.
    • The "Special Techniques" section at the end of the tutorial text discusses some of the concepts used in the subroutines, including the SHUFFLECOORD subroutine.
    • To understand how a specific subroutine works, write a short program that uses it (or copy an example from the docs), then use the Statistics101 built-in debugger to step through each command in the subroutine and watch how the variables change.

    You can always contact me by email, or post here if you have specific questions.

    Regards,

    John