Correlating Multiple Sets of Vectors
12-17-2008, 11:58 AM | Sherman | Joined on 10-23-2008 | Posts 12
Here is a general programming question that has me perplexed. I work with a lot of naturally gathered data, which means that I often have missing data points. The CORR command built into Statistics101 does not handle this well: it returns NaN if I correlate two vectors that have any missing data points at all (even if they are of equal length).
Instead, I would like it to do pairwise deletion. I came up with a fix for this problem using the following subroutine:
NEWCMD CORREL var1 var2 r
ADD var1 var2 C
SUBTRACT C var2 A2
SUBTRACT C var1 B2
CLEAN A2
CLEAN B2
CORR A2 B2 r
END
This subroutine takes the two vectors of interest and adds them together to form a third vector, C. Because NaN plus any number is NaN, every pair with at least one missing data point becomes NaN in C, while every complete pair is stored in C as var1 + var2. To get back to the original values, I subtract var2 from C to get var1 (with every incomplete pair now marked as NaN), and I subtract var1 from C to get var2. I now have two vectors (A2 and B2) in which every pair with a missing data point is marked as NaN on both elements. Using the CLEAN command, I reduce these two vectors to only the elements that have no missingness at the pair level. Then I use the CORR command to get the correlation (r).
So this is great for when I have two variables with missing data that I want to know the correlation for. I double-checked this subroutine with standard statistical packages (e.g. SAS) and confirmed the results.
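A minimal Python sketch of the NaN-propagation trick above, assuming plain lists with `float("nan")` marking missing values. The helper names `correl_via_sum` and `drop_incomplete_pairs` are hypothetical (they are not Statistics101 commands); the sketch stops at the cleaned vectors and checks them against simply dropping every position where either value is missing. One caveat that is a property of floating point, not of Statistics101: for non-integer data, `(a + b) - b` is not guaranteed to be bit-identical to `a`, so the trick can change the last digit.

```python
import math


def correl_via_sum(var1, var2):
    """Pairwise deletion via NaN propagation (hypothetical helper).

    NaN + anything is NaN, so the sum marks every pair with at least one
    missing value; subtracting one vector back out recovers the other.
    """
    c = [a + b for a, b in zip(var1, var2)]    # ADD var1 var2 C
    a2 = [ci - b for ci, b in zip(c, var2)]    # SUBTRACT C var2 A2
    b2 = [ci - a for ci, a in zip(c, var1)]    # SUBTRACT C var1 B2
    # CLEAN each vector separately; the NaNs sit at the same positions,
    # so the surviving elements stay paired.
    a2 = [v for v in a2 if not math.isnan(v)]
    b2 = [v for v in b2 if not math.isnan(v)]
    return a2, b2


def drop_incomplete_pairs(var1, var2):
    """Direct pairwise deletion: keep index i only if both values exist."""
    kept = [(a, b) for a, b in zip(var1, var2)
            if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in kept], [b for _, b in kept]


nan = float("nan")
var1 = [7.0, 8.0, nan, 16.0, 3.0]
var2 = [9.0, 4.0, 4.0, 12.0, nan]
print(correl_via_sum(var1, var2))  # ([7.0, 8.0, 16.0], [9.0, 4.0, 12.0])
print((0.1 + 0.2) - 0.2 == 0.1)    # False: the round trip is not exact
```

Passing the two cleaned lists to any Pearson correlation routine then gives the pairwise-deleted r.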
However, this is where I am stuck. Much of my research involves correlating one variable with a large set of other variables or, even more complex, one set of variables with another set of variables.
For example, say I have 3 personality scales and I want to correlate them with 3 IQ test scores. This yields 9 total correlations I am interested in. As of right now, I would have to write the CORREL command for each of these. This isn't so bad when you have 9 total. But currently I am working with a data set that includes 100 personality scales and 81 other measures. I'd like to compute all 8100 correlations (using the CORREL command I made above) and store them in one vector.
I've already created a roundabout way of doing this, but I wonder if there is an easier way.
I'd like a command that does this: count the number of variables in set #1, count the number of variables in set #2, compute all possible correlations between the variables in set #1 and set #2 (handling missing data pairwise), and store all the output in one vector.
Any ideas how I can write a subroutine that does this?
Thanks in advance if you made it this far.
Sherman

12-18-2008, 8:48 AM | Random Walker | Joined on 05-15-2006 | Posts 45
Sherman,
The CLEAN command, when given multiple arguments, treats them all element by element like you want. In other words, if there is a NaN at element three in arg1, then element three of all the arguments will be removed. If you don't need to save the original contents of the vectors, you can just do this:
CLEAN var1 var2
CORR var1 var2 r
It works for any number of arguments. If you need to save the original contents of the inputs, you can reduce your subroutine to this:
NEWCMD CORREL var1 var2 r
  COPY var1 var1Copy
  COPY var2 var2Copy
  CLEAN var1Copy var2Copy
  CORR var1Copy var2Copy r
END
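To make the behavior of multi-argument CLEAN concrete, here is a small Python sketch of the same idea for any number of vectors: index i is dropped from every vector if any of them has NaN there. The name `clean_coordinated` is hypothetical; unlike CLEAN, which works in place, it returns new lists, which corresponds to the copy-then-CLEAN version of CORREL.

```python
import math


def clean_coordinated(*vectors):
    """Drop index i from every vector if ANY vector has NaN at i.

    Hypothetical stand-in for multi-argument CLEAN; returns new lists
    and leaves the inputs unchanged.
    """
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all vectors must have the same length")
    keep = [i for i in range(len(vectors[0]))
            if not any(math.isnan(v[i]) for v in vectors)]
    return tuple([v[i] for i in keep] for v in vectors)


nan = float("nan")
a, b, c = clean_coordinated([7.0, 8.0, nan, 16.0, 3.0],
                            [9.0, 4.0, 4.0, 12.0, nan],
                            [1.0, nan, 3.0, 4.0, 5.0])
print(a, b, c)  # [7.0, 16.0] [9.0, 12.0] [1.0, 4.0]
```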
Re your second question: if you can get all your data into two vectors, one containing the data for all the independent variables and one containing the data for all the dependent variables, then you can use a nested loop to traverse them. Assume that each data set (e.g., the data for one personality test or one IQ test) has the same number of data points, S. In other words, the data for personality test 1, P1, has S data points; personality test 2, P2, has S data points; and so on. Likewise, IQ test 1, Q1, has S data points; IQ test 2, Q2, has S data points; and so on. If the data sets have different numbers of data points, it will be more complicated.
And assume that you have the data in vectors P and Q, where P is the concatenation of P1, P2, P3, ..., Pm, and Q is the concatenation of Q1, Q2, ..., Qn.
Then you can use something like this to go through it:
'Computes correlation constants for all combinations of the datasets in vec1
'with the datasets in vec2. Removes missing data (NaN) in a coordinated way
'from the datasets being compared.
'Inputs:
'  vec1: vector containing some number of data sets all of equal size, dataSetSize.
'  vec2: vector containing some number of data sets all of equal size, dataSetSize.
'  dataSetSize: the number of elements in each data set.
'Outputs:
'  r: correlation constants for all combinations.
NEWCMD CROSSCORR vec1 vec2 dataSetSize r @statistics \
?"Computes correlation constants for all combinations of the datasets in vec1 with the datasets in vec2."
  CLEAR r
  SIZE vec1 vec1Size
  SIZE vec2 vec2Size
  LET numberDataSets1 = vec1Size / dataSetSize
  LET numberDataSets2 = vec2Size / dataSetSize
  FOREACH dataSet1 1,numberDataSets1
    LET startIndex1 = (dataSet1 - 1) * dataSetSize + 1
    LET endIndex1 = startIndex1 + dataSetSize - 1
    TAKE vec1 startIndex1,endIndex1 dataSetVec1
    FOREACH dataSet2 1,numberDataSets2
      COPY dataSetVec1 dataSetVec1Copy 'Need copy so CLEAN won't remove NaNs from dataSetVec1
      LET startIndex2 = (dataSet2 - 1) * dataSetSize + 1
      LET endIndex2 = startIndex2 + dataSetSize - 1
      TAKE vec2 startIndex2,endIndex2 dataSetVec2
      CLEAN dataSetVec1Copy dataSetVec2
      CORR dataSetVec1Copy dataSetVec2 rTrial
      SCORE rTrial r
    END
  END
END
'EXAMPLE PROGRAM:
'
'Some simple test data
COPY 1,10 11,20 P 'two data sets
COPY 10,1 31,40 30,21 Q 'three data sets
COPY 10 dataSize 'size of one data set
CROSSCORR P Q dataSize r
ROUND 3 r rRounded
PRINT rRounded
Here's the output:
rRounded: (-1.0 1.0 -1.0 -1.0 1.0 -1.0)
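For readers who want to check the index arithmetic, here is a Python sketch of the same nested loop, assuming the concatenated layout described above (each data set occupies a contiguous block of `data_set_size` elements). The names `pairwise_corr` and `crosscorr` are hypothetical; Pearson's r is computed by hand so the block has no dependencies, and Python's 0-based slices replace the 1-based `startIndex`/`endIndex` arithmetic. With the example's test data it produces the same six values.

```python
import math


def pairwise_corr(x, y):
    """Pearson r over the positions where neither x nor y is NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # r is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def crosscorr(vec1, vec2, data_set_size):
    """Correlate every data set in vec1 with every data set in vec2.

    Same ordering as CROSSCORR: outer loop over vec1, inner loop over vec2.
    """
    if len(vec1) % data_set_size or len(vec2) % data_set_size:
        raise ValueError("vector lengths must be multiples of data_set_size")
    r = []
    for start1 in range(0, len(vec1), data_set_size):
        block1 = vec1[start1:start1 + data_set_size]
        for start2 in range(0, len(vec2), data_set_size):
            block2 = vec2[start2:start2 + data_set_size]
            r.append(pairwise_corr(block1, block2))
    return r


# Same test data as the example program
P = list(range(1, 11)) + list(range(11, 21))
Q = list(range(10, 0, -1)) + list(range(31, 41)) + list(range(30, 20, -1))
print([round(v, 3) for v in crosscorr(P, Q, 10)])
# [-1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
```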
Hope that helps,
John

12-18-2008, 10:24 AM | Sherman
Hi John,
Thanks for the reply. RE: the CORREL command, my issue isn't that I
need to keep a copy of the original data; it is that I have data that
looks like this:
Var1  Var2
7     9
8     4
NaN   4
16    12
3     NaN
Each of the scores in this data set is paired, so if I just perform
the CLEAN command I will pair up scores that don't belong together.
Additionally, if one variable has more missing data points than the other
variable, a simple CLEAN won't leave them with equal lengths.
My little subroutine does this:
Var1  Var2  Var1+Var2
7     9     16
8     4     12
NaN   4     NaN
16    12    28
3     NaN   NaN
Var1  Var2  Var1+Var2  SubtractVar2  SubtractVar1
7     9     16         7             9
8     4     12         8             4
NaN   4     NaN        NaN           NaN
16    12    28         16            12
3     NaN   NaN        NaN           NaN
Now clean to get:
SubVar2  SubVar1
7        9
8        4
16       12
Now correlate these two vectors.
Anyhow, huge thanks for the idea of putting all the data into one
vector. I will have to think about how I implement this to do what I
want.
For what it's worth, here is my "roundabout" way of getting what I
want...maybe it will give you more insight as to what I am trying to do
and maybe you can think of a simpler way for me to write it.
I am trying to compute the 6700 correlations between the RBQ items (rbq1--rbq67) and the CAQ items (caq1--caq100).
READ file "C:\\test2.txt" SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8
rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20
rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32
rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44
rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56
rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1
caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14
caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26
caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38
caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50
caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62
caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74
caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86
caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98
caq99 caq100
PRINT SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11
rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23
rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35
rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47
rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59
rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4
caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16
caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28
caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40
caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52
caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64
caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76
caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88
caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100
COPY 1000 numtrials 'Set the number of simulations you would like to run
NEWCMD CORREL A B r
ADD A B C
SUBTRACT C B A2
SUBTRACT C A B2
CLEAN A2
CLEAN B2
CORR A2 B2 r
END
GLOBAL caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12
caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24
caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36
caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48
caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60
caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72
caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84
caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96
caq97 caq98 caq99 caq100
NEWCMD MultR caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11
caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23
caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35
caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47
caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59
caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71
caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83
caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95
caq96 caq97 caq98 caq99 caq100 rbqvar rvec1
CORREL caq1 rbqvar r1
CORREL caq2 rbqvar r2
CORREL caq3 rbqvar r3
CORREL caq4 rbqvar r4
CORREL caq5 rbqvar r5
CORREL caq6 rbqvar r6
CORREL caq7 rbqvar r7
CORREL caq8 rbqvar r8
CORREL caq9 rbqvar r9
CORREL caq10 rbqvar r10
CORREL caq11 rbqvar r11
CORREL caq12 rbqvar r12
CORREL caq13 rbqvar r13
CORREL caq14 rbqvar r14
CORREL caq15 rbqvar r15
CORREL caq16 rbqvar r16
CORREL caq17 rbqvar r17
CORREL caq18 rbqvar r18
CORREL caq19 rbqvar r19
CORREL caq20 rbqvar r20
CORREL caq21 rbqvar r21
CORREL caq22 rbqvar r22
CORREL caq23 rbqvar r23
CORREL caq24 rbqvar r24
CORREL caq25 rbqvar r25
CORREL caq26 rbqvar r26
CORREL caq27 rbqvar r27
CORREL caq28 rbqvar r28
CORREL caq29 rbqvar r29
CORREL caq30 rbqvar r30
CORREL caq31 rbqvar r31
CORREL caq32 rbqvar r32
CORREL caq33 rbqvar r33
CORREL caq34 rbqvar r34
CORREL caq35 rbqvar r35
CORREL caq36 rbqvar r36
CORREL caq37 rbqvar r37
CORREL caq38 rbqvar r38
CORREL caq39 rbqvar r39
CORREL caq40 rbqvar r40
CORREL caq41 rbqvar r41
CORREL caq42 rbqvar r42
CORREL caq43 rbqvar r43
CORREL caq44 rbqvar r44
CORREL caq45 rbqvar r45
CORREL caq46 rbqvar r46
CORREL caq47 rbqvar r47
CORREL caq48 rbqvar r48
CORREL caq49 rbqvar r49
CORREL caq50 rbqvar r50
CORREL caq51 rbqvar r51
CORREL caq52 rbqvar r52
CORREL caq53 rbqvar r53
CORREL caq54 rbqvar r54
CORREL caq55 rbqvar r55
CORREL caq56 rbqvar r56
CORREL caq57 rbqvar r57
CORREL caq58 rbqvar r58
CORREL caq59 rbqvar r59
CORREL caq60 rbqvar r60
CORREL caq61 rbqvar r61
CORREL caq62 rbqvar r62
CORREL caq63 rbqvar r63
CORREL caq64 rbqvar r64
CORREL caq65 rbqvar r65
CORREL caq66 rbqvar r66
CORREL caq67 rbqvar r67
CORREL caq68 rbqvar r68
CORREL caq69 rbqvar r69
CORREL caq70 rbqvar r70
CORREL caq71 rbqvar r71
CORREL caq72 rbqvar r72
CORREL caq73 rbqvar r73
CORREL caq74 rbqvar r74
CORREL caq75 rbqvar r75
CORREL caq76 rbqvar r76
CORREL caq77 rbqvar r77
CORREL caq78 rbqvar r78
CORREL caq79 rbqvar r79
CORREL caq80 rbqvar r80
CORREL caq81 rbqvar r81
CORREL caq82 rbqvar r82
CORREL caq83 rbqvar r83
CORREL caq84 rbqvar r84
CORREL caq85 rbqvar r85
CORREL caq86 rbqvar r86
CORREL caq87 rbqvar r87
CORREL caq88 rbqvar r88
CORREL caq89 rbqvar r89
CORREL caq90 rbqvar r90
CORREL caq91 rbqvar r91
CORREL caq92 rbqvar r92
CORREL caq93 rbqvar r93
CORREL caq94 rbqvar r94
CORREL caq95 rbqvar r95
CORREL caq96 rbqvar r96
CORREL caq97 rbqvar r97
CORREL caq98 rbqvar r98
CORREL caq99 rbqvar r99
CORREL caq100 rbqvar r100
CONCAT r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17
r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 r32 r33 r34 r35
r36 r37 r38 r39 r40 r41 r42 r43 r44 r45 r46 r47 r48 r49 r50 r51 r52 r53
r54 r55 r56 r57 r58 r59 r60 r61 r62 r63 r64 r65 r66 r67 r68 r69 r70 r71
r72 r73 r74 r75 r76 r77 r78 r79 r80 r81 r82 r83 r84 r85 r86 r87 r88 r89
r90 r91 r92 r93 r94 r95 r96 r97 r98 r99 r100 rvec1
END
'Now I run MultR for each of the 67 RBQ items to give me 67 vectors of 100 correlations each
'Here is an example of just one; at the end, I concatenate all of the 67 vectors to make one vector of
'6700 correlations
MultR caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12
caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24
caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36
caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48
caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60
caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72
caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84
caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96
caq97 caq98 caq99 caq100 rbq1 rvec1
Thanks for your help...you have given me something to think about.
Best,
Sherman

12-18-2008, 12:46 PM | Random Walker
Sherman,
Try it. It works. Here's proof:
DATA (7 8 nan 16 3) var1
DATA (9 4 4 12 nan) var2
CLEAN var1 var2
PRINT var1 var2
Result:
var1: (7.0 8.0 16.0)
var2: (9.0 4.0 12.0)
I'll let you know if I can think of a way to help simplify your program.
Regards,
John

12-18-2008, 2:02 PM | Sherman
Wow. This does work. Thanks John!
Sherman

12-18-2008, 4:38 PM | Random Walker
Sherman,
Here's a program that I think will do everything that the one you posted will do. It uses the CROSSCORR subroutine from my earlier post, so include that NEWCMD definition in your program. I didn't have data to test it, but let me know how it goes for you.
John
READ file "C:\\test2.txt" SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100
PRINT SID rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100
'Concatenate all the caqs into a single caqVector:
COPY caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100 caqVector
'Concatenate all the rbqs into a single rbqVector:
COPY rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbq65 rbq66 rbq67 rbqVector
'Now, get the answer:
SIZE rbq1 dataSetSize 'Assumes all are same size
CROSSCORR caqVector rbqVector dataSetSize rVec
PRINT rVec

12-18-2008, 7:47 PM | Sherman
Hi John,
I tested your program and it seems to be perfect. It matches the output of my (lengthier) program exactly, except, for some reason, at the 17th (last) decimal place. I'm not sure why that would be, but if that is the only difference it really isn't a difference at all.
But now that you have proven so helpful (and no good deed goes unpunished ;) ) I have another question.
The program you built does EXACTLY what I want the first half of my program to do: calculate the number of observed correlations between the two sets of variables that exceed a certain criterion (say |r| > .15) and/or compute the average absolute value of that 100 x 67 vector of correlations.
In the second half of my program I am trying to figure out the probability of observing that average absolute value if my data were just random.
That is, if the relationship between data set #1 and data set #2 were just random, what is the probability of getting an average absolute r across the 6700 correlations at least as large as the one I got?
In my old roundabout program, I used a coordinated shuffle and then ran the series of MultR commands as I had originally done. I need the coordinated shuffle rather than an ordinary shuffle because each subject's RBQ scores must stay together as a set (RBQ1 must stay with its associated RBQ items), but I'd like to randomly pair an observed set of RBQ scores with an observed set of CAQ scores. I repeated this 1000 times and computed the average absolute r for each trial to form a distribution of average absolute r's, to which I compare my observed value to get the resulting probability.
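The coordinated shuffle described above can be sketched in a few lines of Python. The function name `shuffle_coordinated` is hypothetical and this is not the Statistics101 SHUFFLECOORD implementation; it only shows the idea: draw one permutation of subject positions and apply it to every RBQ vector, so each subject's RBQ answers move together while their pairing with the unshuffled CAQ vectors is randomized. Shuffling each RBQ vector independently would also destroy the correlations among the RBQ items themselves.

```python
import random


def shuffle_coordinated(vectors, rng=random):
    """Shuffle several equal-length vectors with ONE shared permutation.

    vectors: e.g. [rbq1, rbq2, ...], one entry per subject in each list.
    Returns new lists; the inputs (and any CAQ vectors) are left untouched.
    """
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same length")
    positions = list(range(n))
    rng.shuffle(positions)  # a single permutation, reused for every vector
    return [[v[p] for p in positions] for v in vectors]


rbq1 = [1, 2, 3, 4, 5]
rbq2 = [10, 20, 30, 40, 50]
s1, s2 = shuffle_coordinated([rbq1, rbq2], random.Random(0))
# Each subject's pair of answers moves together: s2[i] == 10 * s1[i]
print(all(b == 10 * a for a, b in zip(s1, s2)))  # True
```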
Is there some way to implement this fantastically new procedure that you built to build this distribution?
Here is the code I have been using:
NEWCMD SHUFFLECOORD variable #"variable {variable}" @"coordinated operations" ?"Coordinated shuffle, in place, of two or more vectors"
  ARGCOUNT numberOfArgs
  IF numberOfArgs > 1
    SIZE variable vecSize
    COPY 1,vecSize positions
    SHUFFLE positions positions
    TAKE variable positions variable
    FOREACH argNum 1,numberOfArgs
      GETARG argNum arg
      TAKE arg positions arg
    END
  ELSE
    PRINT numberOfArgs
    DEBUG "ERROR: Incorrect number of arguments in SHUFFLECOORD."
  END
END
REPEAT numtrials 'This part begins the actual simulation
  SHUFFLECOORD rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64
  MultR caq1 caq2 caq3 caq4 caq5 caq6 caq7 caq8 caq9 caq10 caq11 caq12 caq13 caq14 caq15 caq16 caq17 caq18 caq19 caq20 caq21 caq22 caq23 caq24 caq25 caq26 caq27 caq28 caq29 caq30 caq31 caq32 caq33 caq34 caq35 caq36 caq37 caq38 caq39 caq40 caq41 caq42 caq43 caq44 caq45 caq46 caq47 caq48 caq49 caq50 caq51 caq52 caq53 caq54 caq55 caq56 caq57 caq58 caq59 caq60 caq61 caq62 caq63 caq64 caq65 caq66 caq67 caq68 caq69 caq70 caq71 caq72 caq73 caq74 caq75 caq76 caq77 caq78 caq79 caq80 caq81 caq82 caq83 caq84 caq85 caq86 caq87 caq88 caq89 caq90 caq91 caq92 caq93 caq94 caq95 caq96 caq97 caq98 caq99 caq100 rbq1 rvec1
  'This MultR line repeats for each RBQ item (I actually have 64, not 67, but it doesn't matter).
  CONCAT rvec1 rvec2 rvec3 rvec4 rvec5 rvec6 rvec7 rvec8 rvec9 rvec10 rvec11 rvec12 rvec13 rvec14 rvec15 rvec16 rvec17 rvec18 rvec19 rvec20 rvec21 rvec22 rvec23 rvec24 rvec25 rvec26 rvec27 rvec28 rvec29 rvec30 rvec31 rvec32 rvec33 rvec34 rvec35 rvec36 rvec37 rvec38 rvec39 rvec40 rvec41 rvec42 rvec43 rvec44 rvec45 rvec46 rvec47 rvec48 rvec49 rvec50 rvec51 rvec52 rvec53 rvec54 rvec55 rvec56 rvec57 rvec58 rvec59 rvec60 rvec61 rvec62 rvec63 rvec64 rsimvecALL 'This command puts all values of r into one vector called rsimvecALL
  ABS rsimvecALL rsimvec_abs 'Turns the values of rs into absolute values
  COUNT rsimvec_abs >= rtest Sim_Sig
  MEAN rsimvec_abs meanabsr_sim
  SCORE meanabsr_sim meanabsr_sims
  SCORE Sim_Sig Sim_Sigs
END
MEAN meanabsr_sims meansimmeans
STDEV meanabsr_sims SDsimmeans
MEAN Sim_Sigs Sim_SigsMean
STDEV Sim_Sigs Sim_SigsSD
COUNT Sim_Sigs >= Obs_Sig SigChance
DIVIDE SigChance numtrials SigChanceProb
COUNT meanabsr_sims >= meanabsr_obs test
DIVIDE test numtrials prob
PERCENTILE meanabsr_sims (95) percentile95
PRINT meansimmeans SDsimmeans prob SigChanceProb Sim_SigsMean Sim_SigsSD percentile95
HISTOGRAM meanabsr_sims
HISTOGRAM Sim_Sigs
Thanks again. You are amazing.
Sherman

12-18-2008, 9:06 PM | Random Walker
Sherman,
I think the following will do what you want.
John
-------------------------------------
'... Here would go the earlier program that calculated the first vector of correlation constants.
'Then, here's the "rest of the story":
COPY 1000 numtrials 'Set the number of simulations you would like to run
REPEAT numtrials 'This part begins the actual simulation
  SHUFFLECOORD rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64
  CONCAT rbq1 rbq2 rbq3 rbq4 rbq5 rbq6 rbq7 rbq8 rbq9 rbq10 rbq11 rbq12 rbq13 rbq14 rbq15 rbq16 rbq17 rbq18 rbq19 rbq20 rbq21 rbq22 rbq23 rbq24 rbq25 rbq26 rbq27 rbq28 rbq29 rbq30 rbq31 rbq32 rbq33 rbq34 rbq35 rbq36 rbq37 rbq38 rbq39 rbq40 rbq41 rbq42 rbq43 rbq44 rbq45 rbq46 rbq47 rbq48 rbq49 rbq50 rbq51 rbq52 rbq53 rbq54 rbq55 rbq56 rbq57 rbq58 rbq59 rbq60 rbq61 rbq62 rbq63 rbq64 rbqVectorShuffled
  CROSSCORR caqVector rbqVectorShuffled dataSetSize rsimvecALL
  ABS rsimvecALL rsimvec_abs 'Turns the values of rs into absolute values
  COUNT rsimvec_abs >= rtest Sim_Sig
  MEAN rsimvec_abs meanabsr_sim
  SCORE meanabsr_sim meanabsr_sims
  SCORE Sim_Sig Sim_Sigs
  CLEAR rsimvecALL 'Clear the r vector in prep for next go-round
END
MEAN meanabsr_sims meansimmeans
STDEV meanabsr_sims SDsimmeans
MEAN Sim_Sigs Sim_SigsMean
STDEV Sim_Sigs Sim_SigsSD
COUNT Sim_Sigs >= Obs_Sig SigChance
DIVIDE SigChance numtrials SigChanceProb
COUNT meanabsr_sims >= meanabsr_obs test
DIVIDE test numtrials prob
PERCENTILE meanabsr_sims (95) percentile95
PRINT meansimmeans SDsimmeans prob SigChanceProb Sim_SigsMean Sim_SigsSD percentile95
HISTOGRAM meanabsr_sims
HISTOGRAM Sim_Sigs
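The overall logic of this simulation (shuffle the rows of one variable set jointly, recompute the mean absolute r, and report the share of trials at least as large as the observed value, as in COUNT meanabsr_sims >= meanabsr_obs followed by DIVIDE by numtrials) can be sketched in Python as follows. This is a hypothetical, self-contained illustration: `mean_abs_r` and `permutation_test` are made-up names, the data are passed as lists of variables (one list per item), and correlations that come out NaN are skipped. The same loop could also record the count of |r| values above a criterion. In pure Python this is slow at the scale of 6700 correlations per trial; it is meant to show the logic, not to replace the Statistics101 program.

```python
import math
import random


def pairwise_corr(x, y):
    """Pearson r over the positions where neither x nor y is NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mean_abs_r(set1, set2):
    """Mean |r| over all set1 x set2 variable pairs (NaN results skipped)."""
    rs = [abs(pairwise_corr(x, y)) for x in set1 for y in set2]
    rs = [r for r in rs if not math.isnan(r)]
    return sum(rs) / len(rs) if rs else float("nan")


def permutation_test(set1, set2, num_trials=1000, seed=None):
    """Shuffle the rows (subjects) of set2 jointly in each trial.

    Returns (observed mean |r|, share of trials with mean |r| >= observed).
    """
    rng = random.Random(seed)
    observed = mean_abs_r(set1, set2)
    n = len(set2[0])
    hits = 0
    for _ in range(num_trials):
        positions = list(range(n))
        rng.shuffle(positions)  # one permutation for every set2 variable
        shuffled = [[v[p] for p in positions] for v in set2]
        if mean_abs_r(set1, shuffled) >= observed:
            hits += 1
    return observed, hits / num_trials


x = [float(i) for i in range(1, 21)]
observed, p = permutation_test([x], [[2 * v + 1 for v in x]],
                               num_trials=200, seed=1)
print(round(observed, 6), p)
```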

12-18-2008, 10:34 PM | Sherman
John,
Thanks again. This is fantastic. It saves me about 7 minutes of run time (from 20 minutes to 13) with about 160 data points.
Do you have any pointers on where I can learn to write these subroutines better? I've tried to figure out what each command is doing in some subroutines you have written (e.g. coordshuf), but I get lost every time.
Thanks a ton.
Sherman

12-19-2008, 9:32 AM | Random Walker
Sherman,
Glad I was able to help. Here are some hints that might help you understand the existing subroutines:
- Make sure you know what each individual command is supposed to do. The help docs describe each command in detail, with examples of their use.
- The "Special Techniques" section at the end of the tutorial text discusses some of the concepts used in the subroutines, including the SHUFFLECOORD subroutine.
- To understand how a specific subroutine works, you can write a short program that uses it (or copy an example from the docs) and then use the Statistics101 built-in debugger to step through each command in the subroutine and watch how the variables change.
You can always contact me by email, or post here if you have specific questions.
Regards,
John