
Is there a minimum sample size for normality tests?...

Peter Frank...
Posted: Thu Oct 01, 2009 10:18 am
Guest
Hello,

I know there are several statistical tests for testing normality, i.e.
whether data are normally distributed so that parametric tests can be
used.

However, I wonder whether there is a minimum sample size (n number)
for these tests. I can hardly imagine that any test can judge
normality from just 3, 4 or 5 samples.
I couldn't find any information on the Web or in my statistics books
about a minimum sample size (i.e. a statement like n should be greater
than some number in order to be able to reliably use this test).

Can you tell me whether it makes sense to use normality tests with
very small sample sizes? If not, what would you consider a minimum
sample size (just roughly)?

Regards,
Peter
 
Bruce Weaver...
Posted: Tue Dec 01, 2009 9:30 am
Guest
On Dec 1, 1:22 pm, Peter Frank <peter_fran... at (no spam) yahoo.de> wrote:
[quote]Bruce Weaver <bwea... at (no spam) lakeheadu.ca> wrote:
Using a test of normality to decide if you can proceed with a
parametric test is a horrible idea.  Why?  Because...

1. The assumption that one has sampled from normally distributed
populations is most important when sample sizes are small.  As sample
sizes get larger, the sampling distribution of the mean (or of the
difference between two means, etc) converges on the normal, so the
parametric test becomes robust to non-normality of the populations.
(Look up "central limit theorem".)

2. Tests of normality, like other statistical tests, have low power
when sample sizes are small, and high power when sample sizes are
large.

Putting 1 and 2 together, your test of normality has little power to
detect important departures from normality when sample sizes are
small.  And it has far too much power when sample sizes are larger--
i.e., it throws up the red flag warning you of non-normality for
departures from normality that don't matter, given the sample sizes.
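
The small-sample half of this argument can be checked roughly by
simulation. The sketch below is not from the thread. It uses a
skewness-based check with a simulated cutoff as a simple stand-in for a
formal normality test (an assumption made to keep the example short; it
is not Shapiro-Wilk or Kolmogorov-Smirnov), and estimates how often
clearly skewed (exponential) data get flagged at several sample sizes.
The function names, the sample sizes and the exponential population are
all illustrative choices.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment over variance**1.5."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def critical_value(n, rng, reps=1000, alpha=0.05):
    """Simulated cutoff for |skewness| under normality at sample size n."""
    null = sorted(
        abs(skewness([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(reps)
    )
    return null[int((1 - alpha) * reps)]


def rejection_rate(n, draw, rng, reps=1000):
    """Fraction of samples of size n from `draw` flagged as non-normal."""
    cut = critical_value(n, rng)
    hits = sum(
        abs(skewness([draw(rng) for _ in range(n)])) > cut
        for _ in range(reps)
    )
    return hits / reps


def exponential(rng):
    """One draw from a strongly skewed population (illustrative choice)."""
    return rng.expovariate(1.0)


rng = random.Random(0)
power = {n: rejection_rate(n, exponential, rng) for n in (5, 15, 50, 200)}
for n, p in power.items():
    print(f"n={n:3d}  estimated power={p:.2f}")
```

With settings like these, the rejection rate should start low at n=5
and approach 1 by n=200; the exact figures depend on the seed and on
the stand-in test. The sketch covers only the low-power side; showing
that trivial departures get flagged at large n would need a mildly
non-normal population and much larger samples.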

I think a much better "test" of whether you can use a parametric test
is to simply ask yourself if it is fair and honest (i.e., not
misleading) to use means and standard deviations descriptively.  (I
think it was Rich Ulrich, a regular in these groups, who made this
suggestion, once upon a time.)

Does this apply to the sample data themselves or to the population
this sample comes from? If by looking at the sample data alone, I
would for example come to the conclusion that means and standard
deviations are not good measurements to describe the data, should I
then use a non-parametric test?

What if the sample data (with a small N per group, e.g. N=15) are
clearly not normally distributed, but I know that the parameter
measured is normally distributed in larger populations, should I then
use a parametric or a non-parametric test?

Peter

P.S.: Sorry for replying to such an old thread but I thought it fits
in here.
[/quote]
Bland & Altman say something about this in their recent note on
"analysis of continuous data from small samples". Here's the link:

http://www.bmj.com/cgi/content/full/338/apr06_1/a3166

Notice the final paragraph, which says:

"The aversion to parametric methods for small samples may arise from
the inability to assess the distribution shape when there are so few
observations. How can we tell whether data follow a normal
distribution if we have only a few observations? The answer is that we
have not only the data to be analysed, but usually also experience of
other sets of measurements of the same thing. In addition, general
experience tells us that body size measurements are usually
approximately normal, as are the logarithms of many blood
concentrations and the square roots of counts."
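
As a small illustration of the transformation point in that paragraph,
the sketch below (not from Bland & Altman) simulates hypothetical
"concentration" values that are lognormal by construction and compares
the skewness before and after taking logarithms. The data, the
parameters and the variable names are assumptions for illustration only.
It shows what the log transform does to such data; it is not evidence
that any real measurement behaves this way.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment over variance**1.5."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


rng = random.Random(1)

# Hypothetical concentrations: simulated, lognormal by construction.
conc = [rng.lognormvariate(0.0, 0.8) for _ in range(2000)]
log_conc = [math.log(c) for c in conc]

skew_raw = skewness(conc)
skew_log = skewness(log_conc)
print(f"skewness of raw values: {skew_raw:.2f}")
print(f"skewness of log values: {skew_log:.2f}")
```

The raw values should come out strongly right-skewed and the logged
values close to symmetric, which is the situation in which means and
standard deviations on the log scale describe the data fairly.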

--
Bruce Weaver
bweaver at (no spam) lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."
 
Peter Frank...
Posted: Tue Dec 01, 2009 1:22 pm
Guest
Does this apply to the sample data themselves or to the population
this sample comes from? If by looking at the sample data alone, I
would for example come to the conclusion that means and standard
deviations are not good measurements to describe the data, should I
then use a non-parametric test?

What if the sample data (with a small N per group, e.g. N=15) are
clearly not normally distributed, but I know that the parameter
measured is normally distributed in larger populations, should I then
use a parametric or a non-parametric test?

Peter

P.S.: Sorry for replying to such an old thread but I thought it fits
in here.
 
Peter Frank...
Posted: Tue Dec 01, 2009 5:44 pm
Guest
Thanks a lot for this link. I am actually asking from a biomedical
perspective, so this information definitely helps. I just wasn't sure
whether it was legitimate to base the selection of a statistical test
on scientific experience or knowledge gained from similar measurements
rather than mere statistical criteria.

It seems like small sample sizes are actually quite a problem, as they
do occur in biological and medical sciences.

I later found another reference on the web (written by Harvey
Motulsky) that also addresses this problem:

"CHOOSING BETWEEN PARAMETRIC AND NONPARAMETRIC TESTS: THE HARD CASES

It is not always easy to decide whether a sample comes from a Gaussian
population. Consider these points:

* If you collect many data points (over a hundred or so), you
can look at the distribution of data and it will be fairly obvious
whether the distribution is approximately bell shaped. A formal
statistical test (Kolmogorov-Smirnoff test, not explained in this
book) can be used to test whether the distribution of the data differs
significantly from a Gaussian distribution. With few data points, it
is difficult to tell whether the data are Gaussian by inspection, and
the formal test has little power to discriminate between Gaussian and
non-Gaussian distributions.
* You should look at previous data as well. Remember, what
matters is the distribution of the overall population, not the
distribution of your sample. In deciding whether a population is
Gaussian, look at all available data, not just data in the current
experiment.
* Consider the source of scatter. When the scatter comes from
the sum of numerous sources (with no one source contributing most of
the scatter), you expect to find a roughly Gaussian distribution.
* When in doubt, some people choose a parametric test (because
they aren't sure the Gaussian assumption is violated), and others
choose a nonparametric test (because they aren't sure the Gaussian
assumption is met)."

The second point has pretty much the same statement as the final
paragraph by Bland & Altman.
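
The point about the source of scatter can also be tried out in a quick
simulation. The sketch below is my own illustration, not from Motulsky's
text: each simulated measurement is a weighted sum of independent,
skewed (exponential) noise sources, and the skewness of the result is
compared for 1, 10 and 100 equal sources and for a case where one source
dominates. The weights, the number of sources and the function names are
assumptions chosen for illustration.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment over variance**1.5."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def simulate(weights, rng, reps=5000):
    """Each value is a weighted sum of independent exponential sources."""
    return [
        sum(w * rng.expovariate(1.0) for w in weights)
        for _ in range(reps)
    ]


rng = random.Random(2)

# k equally weighted sources: skewness should shrink as k grows.
skew_by_k = {k: skewness(simulate([1.0] * k, rng)) for k in (1, 10, 100)}

# 100 sources, but one contributes most of the scatter.
skew_dominated = skewness(simulate([20.0] + [1.0] * 99, rng))

for k, s in skew_by_k.items():
    print(f"{k:3d} equal sources: skewness {s:.2f}")
print(f"100 sources, one dominant: skewness {skew_dominated:.2f}")
```

With many comparable sources the sum should look close to symmetric,
while the dominated case should stay clearly skewed even with 100
sources, which matches the caveat "with no one source contributing most
of the scatter".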

Regards,
Peter
 
 
All times are GMT - 5 Hours