Sorting a correlation matrix with R-statistics
Guest | Posted: Sat Feb 03, 2007 8:56 am
I am looking for a way to sort my matrix and find an easy way to
locate the highest 25 correlations in a matrix of 1000 by 1000
variables. I want to produce a list of the correlations, from the
strongest correlation to the weakest. E.g. if the variables are called
x1, x2, x3, .., xn, then there might be a list:
var1,var2,corr
x23,x748,0.972
x171,x21,-0.962
x555,x34,0.961
.....
I use R statistics. Can someone please help me with this? Thank you
very very much!
The link to the program: http://www.r-project.org/
Marc Schwartz (Guest) | Posted: Sat Feb 03, 2007 4:44 pm
Erkki.Komulainen@Helsinki.Fi.INVALID wrote:
Quote: maureeze@gmail.com wrote:
:I am looking for a way to sort my matrix and find an easy way to
:locate the highest 25 correlations in a matrix of 1000 by 1000
:variables. I want to produce a list of the correlations, from the
:strongest correlation to the weakest. E.g. if the variables are called
:x1, x2, x3, .., xn, then there might be a list:
I would save the correlation matrix as a file. Its columns would then be
stacked into three columns: col 1 = the code of x-variable, col 2 = the
code of y-variable and col 3 = the correlation. I would read this
file into a suitable programme and do the sorting. This can be done with
SPSS and I think that should be possible with similar products.
HTH
Erkki
First, to the OP: please post R-specific queries to the r-help e-mail
list. Information on that is on the web page you cited in your post.
Second, to the respondent, are you suggesting that R is not an
appropriate application for this simple task? Please.
To provide an example, using the 'swiss' dataset, which is available in R:
> cor(swiss)
                  Fertility Agriculture Examination   Education
Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886
Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252
Examination      -0.6458827 -0.68654221   1.0000000  0.69841530
Education        -0.6637889 -0.63952252   0.6984153  1.00000000
Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892
Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185
                   Catholic Infant.Mortality
Fertility         0.4636847       0.41655603
Agriculture       0.4010951      -0.06085861
Examination      -0.5727418      -0.11402160
Education        -0.1538589      -0.09932185
Catholic          1.0000000       0.17549591
Infant.Mortality  0.1754959       1.00000000
# Create a 3 column data frame from the results
> DF <- as.data.frame.table(cor(swiss))
> DF
Var1 Var2 Freq
1 Fertility Fertility 1.00000000
2 Agriculture Fertility 0.35307918
3 Examination Fertility -0.64588271
4 Education Fertility -0.66378886
5 Catholic Fertility 0.46368470
6 Infant.Mortality Fertility 0.41655603
7 Fertility Agriculture 0.35307918
8 Agriculture Agriculture 1.00000000
9 Examination Agriculture -0.68654221
10 Education Agriculture -0.63952252
11 Catholic Agriculture 0.40109505
12 Infant.Mortality Agriculture -0.06085861
13 Fertility Examination -0.64588271
14 Agriculture Examination -0.68654221
15 Examination Examination 1.00000000
16 Education Examination 0.69841530
17 Catholic Examination -0.57274181
18 Infant.Mortality Examination -0.11402160
19 Fertility Education -0.66378886
20 Agriculture Education -0.63952252
21 Examination Education 0.69841530
22 Education Education 1.00000000
23 Catholic Education -0.15385892
24 Infant.Mortality Education -0.09932185
25 Fertility Catholic 0.46368470
26 Agriculture Catholic 0.40109505
27 Examination Catholic -0.57274181
28 Education Catholic -0.15385892
29 Catholic Catholic 1.00000000
30 Infant.Mortality Catholic 0.17549591
31 Fertility Infant.Mortality 0.41655603
32 Agriculture Infant.Mortality -0.06085861
33 Examination Infant.Mortality -0.11402160
34 Education Infant.Mortality -0.09932185
35 Catholic Infant.Mortality 0.17549591
36 Infant.Mortality Infant.Mortality 1.00000000
# Now sort the above in decreasing order
# of the correlation coefficient
> with(DF, DF[order(Freq, decreasing = TRUE), ])
Var1 Var2 Freq
1 Fertility Fertility 1.00000000
8 Agriculture Agriculture 1.00000000
15 Examination Examination 1.00000000
22 Education Education 1.00000000
29 Catholic Catholic 1.00000000
36 Infant.Mortality Infant.Mortality 1.00000000
16 Education Examination 0.69841530
21 Examination Education 0.69841530
5 Catholic Fertility 0.46368470
25 Fertility Catholic 0.46368470
6 Infant.Mortality Fertility 0.41655603
31 Fertility Infant.Mortality 0.41655603
11 Catholic Agriculture 0.40109505
26 Agriculture Catholic 0.40109505
2 Agriculture Fertility 0.35307918
7 Fertility Agriculture 0.35307918
30 Infant.Mortality Catholic 0.17549591
35 Catholic Infant.Mortality 0.17549591
12 Infant.Mortality Agriculture -0.06085861
32 Agriculture Infant.Mortality -0.06085861
24 Infant.Mortality Education -0.09932185
34 Education Infant.Mortality -0.09932185
18 Infant.Mortality Examination -0.11402160
33 Examination Infant.Mortality -0.11402160
23 Catholic Education -0.15385892
28 Education Catholic -0.15385892
17 Catholic Examination -0.57274181
27 Examination Catholic -0.57274181
10 Education Agriculture -0.63952252
20 Agriculture Education -0.63952252
3 Examination Fertility -0.64588271
13 Fertility Examination -0.64588271
4 Education Fertility -0.66378886
19 Fertility Education -0.66378886
9 Examination Agriculture -0.68654221
14 Agriculture Examination -0.68654221
# Now, just take the first 25 rows
> with(DF, DF[order(Freq, decreasing = TRUE)[1:25], ])
Var1 Var2 Freq
1 Fertility Fertility 1.00000000
8 Agriculture Agriculture 1.00000000
15 Examination Examination 1.00000000
22 Education Education 1.00000000
29 Catholic Catholic 1.00000000
36 Infant.Mortality Infant.Mortality 1.00000000
16 Education Examination 0.69841530
21 Examination Education 0.69841530
5 Catholic Fertility 0.46368470
25 Fertility Catholic 0.46368470
6 Infant.Mortality Fertility 0.41655603
31 Fertility Infant.Mortality 0.41655603
11 Catholic Agriculture 0.40109505
26 Agriculture Catholic 0.40109505
2 Agriculture Fertility 0.35307918
7 Fertility Agriculture 0.35307918
30 Infant.Mortality Catholic 0.17549591
35 Catholic Infant.Mortality 0.17549591
12 Infant.Mortality Agriculture -0.06085861
32 Agriculture Infant.Mortality -0.06085861
24 Infant.Mortality Education -0.09932185
34 Education Infant.Mortality -0.09932185
18 Infant.Mortality Examination -0.11402160
33 Examination Infant.Mortality -0.11402160
23 Catholic Education -0.15385892
See ?as.data.frame.table, ?cor, ?order and ?with
Spend some time reading "An Introduction to R", which is available with
your installation or from the main R web site.
HTH,
Marc Schwartz
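A note on the listing above: because a correlation matrix is symmetric, every pair shows up twice (for example rows 16 and 21), and the six self-correlations of 1.0 take the top places. One possible way around that, sketched here as an assumption rather than as part of the original answer, is to blank out the diagonal and the lower triangle with upper.tri() before stacking the matrix:

```r
# Sketch: keep each variable pair once and drop the self-correlations.
cm <- cor(swiss)                      # 'swiss' ships with base R

# Blank out the diagonal and the lower triangle so every pair appears once.
cm[!upper.tri(cm)] <- NA

# Stack into three columns, as in the post above, then drop the blanked cells.
DF <- as.data.frame.table(cm, responseName = "Corr")
DF <- DF[!is.na(DF$Corr), ]

# 6 variables give choose(6, 2) = 15 distinct pairs.
nrow(DF)

# Sort from the largest to the smallest coefficient.
DF[order(DF$Corr, decreasing = TRUE), ]
```

With the duplicates gone, taking the first 25 rows of a large matrix returns 25 different pairs instead of roughly a dozen pairs listed twice.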
Marc Schwartz (Guest) | Posted: Sat Feb 03, 2007 5:50 pm
Just a quick follow up to my own post. It looks like during the copy and
paste of code and output, some errors occurred. So here is just the
correct code and the final result. Also, for the sake of details, the
third column in the initial data transformation by as.data.frame.table()
is called 'Freq' by default. In the code below, I will rename it to
'Corr' for contextual consistency.
DF <- as.data.frame.table(cor(swiss), responseName = "Corr")
Top25 <- with(DF, DF[order(Corr, decreasing = TRUE)[1:25], ])
> Top25
Var1 Var2 Corr
1 Fertility Fertility 1.00000000
8 Agriculture Agriculture 1.00000000
15 Examination Examination 1.00000000
22 Education Education 1.00000000
29 Catholic Catholic 1.00000000
36 Infant.Mortality Infant.Mortality 1.00000000
16 Education Examination 0.69841530
21 Examination Education 0.69841530
5 Catholic Fertility 0.46368470
25 Fertility Catholic 0.46368470
6 Infant.Mortality Fertility 0.41655603
31 Fertility Infant.Mortality 0.41655603
11 Catholic Agriculture 0.40109505
26 Agriculture Catholic 0.40109505
2 Agriculture Fertility 0.35307918
7 Fertility Agriculture 0.35307918
30 Infant.Mortality Catholic 0.17549591
35 Catholic Infant.Mortality 0.17549591
12 Infant.Mortality Agriculture -0.06085861
32 Agriculture Infant.Mortality -0.06085861
24 Infant.Mortality Education -0.09932185
34 Education Infant.Mortality -0.09932185
18 Infant.Mortality Examination -0.11402160
33 Examination Infant.Mortality -0.11402160
23 Catholic Education -0.15385892
Apologies for the error.
Marc Schwartz
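The example list in the original question puts -0.962 between 0.972 and 0.961, which suggests that "strongest" was meant in terms of magnitude. The following is a rough sketch under that assumption; the function name top_correlations and its arguments are made up for illustration and do not come from the thread. It ranks by abs(), lists each pair once, and only stacks the distinct pairs, which for 1000 variables is 499,500 rows rather than 1,000,000.

```r
# Sketch only: 'top_correlations' and its arguments are illustrative names.
top_correlations <- function(cm, n = 25) {
  # Positions of the distinct pairs (strict upper triangle, no diagonal).
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    corr = cm[idx],
    stringsAsFactors = FALSE
  )
  # Strongest first, judged by magnitude so large negative values rank high.
  out <- out[order(abs(out$corr), decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, n)
}

top5 <- top_correlations(cor(swiss), n = 5)
top5

# Comma-separated, in the var1,var2,corr layout asked for in the question.
write.csv(top5, row.names = FALSE, quote = FALSE)
```

For the real data, something like top_correlations(cor(mydata)) should give the 25 strongest pairs, where mydata stands for the poster's own data frame of 1000 variables.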
All times are GMT - 5 Hours