Science Forum Index » Statistics - Math Forum » Spearman Rank Correlation Discrepancy...

Graham Ashe...
Posted: Fri Oct 23, 2009 11:29 pm
Guest
I found a discrepancy between my own calculations and the following two sites when calculating the Spearman rank correlation coefficient for certain sample pairs.
http://www.wessa.net/rankcorr.wasp
http://faculty.vassar.edu/lowry/corr_rank.html
I read how to calculate the coefficient on those sites and also on Wikipedia here:
http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
The explanations were consistent and I do get the same rho as those sites for small sample pairs (like the n=10 example featured on the Wikipedia page). However, for larger sample pairs (like the n=80 one below) - when using either the formula assuming tied ranks or even the formula not assuming tied ranks - I get something a bit different.
For the sample pair below, I get a rho of 0.82529 in my calculations whereas on those two other sites, the result is approximately 0.779.
My main question, therefore, is why the results are different. Are those sites doing something a bit different in practice than they say in theory? Are the formulas they explain only applicable to smaller sample sizes? I'm asking because I want to make sure the results I'm using for my research are consistent, correct and as precise as can be. Thanks.
0.148 4.9
0.291 4.3
0.362 5.3
0.737 5.7
1.247 7.6
0.5 5.2
1 5.4
0.148 4.2
0.219 5.3
0.148 5
0.5 4.9
0.958 6.2
0.148 4
0.786 5.8
0.719 5.5
0.308 5.1
0.505 5.4
0.148 4.2
0.5 5.7
0.496 5.8
0.286 4.9
0.571 5.7
0.987 6.4
0.148 4.8
0.484 5.4
0.714 5.9
0.286 6
0.505 6.1
0.348 4.9
0.286 4.8
0.286 4.8
0.571 6.5
0.571 6
2 8.1
0.665 6.8
1.143 7.1
1.514 6.4
1.501 6.8
1.604 7.7
2.529 7.2
0.308 5.4
1 5.8
0.286 4.5
0.286 2.8
0.5 6
0.5 5.7
0.5 6.2
0.148 4.4
0.148 5.7
0.357 6.7
0.148 6.2
0.308 4.9
0.148 1.9
0.308 5.2
1 6.1
0.148 3.5
1 6.4
0.308 6.4
0.148 2.9
0.308 4.5
0.148 3.9
0.308 5.2
0.148 4.2
0.148 5.3
0.148 3.1
0.148 5.5
0.434 5.8
0.148 5.6
0.148 4.5
0.148 4.7
1.143 6.8
0.148 3.5
1.143 7.1
0.219 5.3
0.148 3.2
1 5.5
0.148 3.8
0.286 4
0.286 6.1
1.143 6.2
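
For readers following along, the two formulas the post refers to are usually written as below. These are the standard textbook forms, not quoted from the linked sites, and the symbols R(x_i), d_i and \bar{R} are notation assumed here for illustration.

```latex
% No ties: shortcut based on the rank differences d_i = R(x_i) - R(y_i)
\[
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
\]

% Ties present: Pearson correlation of the average ranks
\[
\rho = \frac{\sum_{i=1}^{n} \bigl(R(x_i) - \bar{R}\bigr)\bigl(R(y_i) - \bar{R}\bigr)}
            {\sqrt{\sum_{i=1}^{n} \bigl(R(x_i) - \bar{R}\bigr)^2 \,
                   \sum_{i=1}^{n} \bigl(R(y_i) - \bar{R}\bigr)^2}},
\qquad \bar{R} = \frac{n + 1}{2}
\]
```

The two expressions agree exactly only when neither variable contains tied values. With ties, each group of equal values is given the mean of the ranks it spans, and the second form is the one to use.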

Graham Ashe...
Posted: Sat Oct 24, 2009 6:32 am
Guest
[quote]Hand calculation is a great learning tool. Hand formulae are not necessarily the best in numerical analysis terms. However, with the goals you mention above, you should consider using a stat package.
[/quote]
I considered that, but I need to incorporate the Spearman correlation into a program of my own for my research purposes. I can't "link" to a stat package for that. This is why I need to know exactly how the Spearman correlation calculations work, so I can write a function for it in my program. The one I've already written (based on all the available literature I've surveyed) gives me something that - for bigger samples, at least - is different from what the packages give. Perhaps I'm the first person to have done so and the first to have discovered this discrepancy. Who does this stuff "manually" these days anyway?
[quote]For your data SPSS gives .779. I suggest you look at the algorithms that come with it. IIRC they are online, if you don't have access to them send me an email and I'll send them.
[/quote]
Okay, but I doubt it'll be any different from what all the other sites I've been to have consistently explained about how to calculate the Spearman correlation.
[quote]I suggest that you look at the boxplots, histograms, and scatterplots for your data.
[/quote]
Why?
[quote]Are you using Spearman because your first variable is skewed and the second is normalish?
[/quote]
No, I'm using it because the type of data I'm trying to correlate is ordinal.

Art Kendall...
Posted: Sat Oct 24, 2009 8:11 am
Guest
[quote]I'm asking because I want to make sure the results I'm using for my research are consistent, correct and as precise as can be
[/quote]
Hand calculation is a great learning tool. Hand formulae are not necessarily the best in numerical analysis terms. However, with the goals you mention above, you should consider using a stat package.
For your data SPSS gives .779. I suggest you look at the algorithms that come with it. IIRC they are online, if you don't have access to them send me an email and I'll send them.
I suggest that you look at the boxplots, histograms, and scatterplots for your data.
Are you using Spearman because your first variable is skewed and the second is normalish?
Art Kendall
Social Research Consultants

Ray Koopman...
Posted: Sat Oct 24, 2009 10:10 am
Guest
OK, let's try some diagnostics.
Are these the values you're using for ranks?
x rank(x) y rank(y)
0.148 12 4.9 26
0.291 34 4.3 15
0.362 44 5.3 35.5
0.737 62 5.7 48
1.247 75 7.6 78
0.5 50.5 5.2 32
1 68 5.4 39.5
0.148 12 4.2 13
0.219 24.5 5.3 35.5
0.148 12 5 29
0.5 50.5 4.9 26
0.958 64 6.2 63.5
0.148 12 4 10.5
0.786 63 5.8 52.5
0.719 61 5.5 43
0.308 38 5.1 30
0.505 54.5 5.4 39.5
0.148 12 4.2 13
0.5 50.5 5.7 48
0.496 47 5.8 52.5
0.286 29.5 4.9 26
0.571 57 5.7 48
0.987 65 6.4 67.5
0.148 12 4.8 22
0.484 46 5.4 39.5
0.714 60 5.9 55
0.286 29.5 6 57
0.505 54.5 6.1 60
0.348 42 4.9 26
0.286 29.5 4.8 22
0.286 29.5 4.8 22
0.571 57 6.5 70
0.571 57 6 57
2 79 8.1 80
0.665 59 6.8 73
1.143 72.5 7.1 75.5
1.514 77 6.4 67.5
1.501 76 6.8 73
1.604 78 7.7 79
2.529 80 7.2 77
0.308 38 5.4 39.5
1 68 5.8 52.5
0.286 29.5 4.5 18
0.286 29.5 2.8 2
0.5 50.5 6 57
0.5 50.5 5.7 48
0.5 50.5 6.2 63.5
0.148 12 4.4 16
0.148 12 5.7 48
0.357 43 6.7 71
0.148 12 6.2 63.5
0.308 38 4.9 26
0.148 12 1.9 1
0.308 38 5.2 32
1 68 6.1 60
0.148 12 3.5 6.5
1 68 6.4 67.5
0.308 38 6.4 67.5
0.148 12 2.9 3
0.308 38 4.5 18
0.148 12 3.9 9
0.308 38 5.2 32
0.148 12 4.2 13
0.148 12 5.3 35.5
0.148 12 3.1 4
0.148 12 5.5 43
0.434 45 5.8 52.5
0.148 12 5.6 45
0.148 12 4.5 18
0.148 12 4.7 20
1.143 72.5 6.8 73
0.148 12 3.5 6.5
1.143 72.5 7.1 75.5
0.219 24.5 5.3 35.5
0.148 12 3.2 5
1 68 5.5 43
0.148 12 3.8 8
0.286 29.5 4 10.5
0.286 29.5 6.1 60
1.143 72.5 6.2 63.5
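
The ranks in the table above are average ranks: tied values share the mean of the positions they occupy. A minimal Python sketch of that ranking step follows, for anyone wanting to check their own ranks against the table. The function name `average_ranks` is made up for illustration and is not taken from any package mentioned in this thread.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the positions they span."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, so add 1 for 1-based ranks.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# First five x values from the thread plus one more 0.148, ranked on their own
# (a toy subset, not the full n=80 sample).
x = [0.148, 0.291, 0.362, 0.737, 1.247, 0.148]
print(average_ranks(x))  # [1.5, 3.0, 4.0, 5.0, 6.0, 1.5]
```

In the full sample the value 0.148 occurs 23 times, so it occupies positions 1 to 23 and every copy gets rank 12, which is what the table shows.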

Art Kendall...
Posted: Sat Oct 24, 2009 11:10 am
Guest
[quote]No, I'm using it because the type of data I'm trying to correlate is ordinal.
[/quote]
Oh, I was looking at the data in your example.
If you are writing your own code, then you really should look at the algorithms used in major packages. SPSS makes all of their algorithms available. You might check what other packages do. The big ones all use many kinds of checks and methods to be sure procedures are stable.
Art Kendall

Graham Ashe...
Posted: Sat Oct 24, 2009 10:14 pm
Guest
[quote]OK, let's try some diagnostics.
Are these the values you're using for ranks?
[/quote]
It turns out I misunderstood some of the instructions. I get the exact values now. One should also use the formula that assumes tied ranks.
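
The thread does not say what the misunderstanding was, so the following is only a sketch of why the choice of formula matters once ranks are tied. It takes ranks that have already been computed and compares the rank-difference shortcut with the Pearson correlation of the ranks. The function names and the toy ranks are hypothetical, chosen for illustration.

```python
import math


def rho_shortcut(rx, ry):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when neither variable has ties."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rho_pearson_on_ranks(rx, ry):
    """Pearson correlation of the ranks; valid with or without ties."""
    n = len(rx)
    mx = sum(rx) / n
    my = sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Toy ranks (assumed for illustration): the first two x values are tied,
# so they share the average rank 1.5.
rx = [1.5, 1.5, 3.0, 4.0]
ry = [1.0, 2.0, 3.0, 4.0]

print(round(rho_shortcut(rx, ry), 4))          # about 0.95
print(round(rho_pearson_on_ranks(rx, ry), 4))  # about 0.9487
```

With a single tied pair the two results differ only in the third decimal place. With many ties the gap can be larger, and the Pearson-on-ranks form is the one that stays correct.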

All times are GMT - 5 Hours