Newbie: Telling if two distributions have the same mean
Hugh
Posted: Tue Feb 05, 2008 3:03 am
Guest
Hi,

I'm a bit new to statistics, so please forgive me if this is a
very basic question or if I am posting to the wrong place. (Also, if
I use the wrong terminology, please, forgive that, too)

I have two normal distributions with the same mean and standard
deviation. If I do a T-Test in R I can't tell if the distributions
have the same mean or not. If I run the same test on different
samples from the same normal distribution a thousand times, then the
T-Test p-value is roughly uniform.

I think this means that while the T-Test is very good at telling
if two samples have a different mean it can't help you decide if they
have the same mean.

This is unfortunate for me because I need to know if the means are
the same. Can any one tell me how I might do that? Also, I really
need to check if n means are all the same - I've tried the anova in R
and it has the same problem.

Thank you very much for your time,

Hugh.
Bruce Weaver
Posted: Tue Feb 05, 2008 4:33 am
Guest
On Feb 5, 8:03 am, Hugh <hughl...@gmail.com> wrote:
Quote:
...
This is unfortunate for me because I need to know if the means are
the same.
...

Do a search on "equivalence testing" or "bioequivalence".
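
For readers who have not met the term, here is a minimal sketch of one common form of equivalence testing, the "two one-sided tests" (TOST) procedure. It is written in Python rather than R, and it makes assumptions that are not from this thread: the function name `tost_pvalue`, the tolerance `margin`, and a normal approximation in place of the t distribution (reasonable only for fairly large samples). A small p-value here supports "the means differ by less than margin", which is the reverse of what an ordinary t-test reports.

```python
from statistics import NormalDist, fmean, variance


def tost_pvalue(xs, ys, margin):
    """TOST p-value for H0: |mean(xs) - mean(ys)| >= margin.

    Assumption: large samples, so a normal approximation stands in
    for the t distribution. `margin` is chosen by the analyst.
    """
    diff = fmean(xs) - fmean(ys)
    se = (variance(xs) / len(xs) + variance(ys) / len(ys)) ** 0.5
    z = NormalDist()
    # One-sided test 1: is the difference above -margin?
    p_lower = 1 - z.cdf((diff + margin) / se)
    # One-sided test 2: is the difference below +margin?
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is claimed only if both one-sided tests reject.
    return max(p_lower, p_upper)


if __name__ == "__main__":
    xs = [i % 5 for i in range(500)]        # illustrative data, mean 2
    ys = [(i + 2) % 5 for i in range(500)]  # same values, reordered
    print(tost_pvalue(xs, ys, margin=0.5))
```

The result depends entirely on the margin: with a margin close to zero, no realistic sample size will let the test declare equivalence.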

--
Bruce Weaver
bweaver@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
Allen McIntosh
Posted: Tue Feb 05, 2008 10:40 am
Guest
Hugh wrote:

Quote:
If
I use the wrong terminology, please, forgive that, too
Your terminology isn't perfect, but the sense of your question is still
clear.
Quote:
...
I think this means that while the T-Test is very good at telling
if two samples have a different mean it can't help you decide if they
have the same mean.
This is unfortunate for me because I need to know if the means are
the same. Can any one tell me how I might do that?

You can never decide for certain that two populations have the same
mean. Consider the following:

Population A: N(0,1)
Population B: N(10^-10,1)

How big a sample do you think you would need to detect this difference?
(In Real Life, considerations of non-normality are going to become
important long before the sample size becomes large enough.)
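
To put a rough number on that question, here is a small sketch in Python using the standard normal-approximation sample-size formula for comparing two means. The function name `n_per_group` and the defaults (two-sided 5% level, 80% power) are assumptions chosen for illustration, not values from this thread.

```python
from statistics import NormalDist


def n_per_group(delta, sigma=1.0, alpha=0.05, power=0.8):
    """Approximate sample size per group needed to detect a mean
    difference `delta` with a two-sided, two-sample z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = z.inv_cdf(power)           # quantile for the target power
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2


if __name__ == "__main__":
    print(n_per_group(1.0))    # about 16 per group
    print(n_per_group(1e-10))  # on the order of 10**21 per group
```

The required sample size grows with 1/delta squared, so halving the difference you want to detect quadruples the number of values needed.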

What the t-test does give you is a probabilistic guarantee that is a
function of the size of the difference. Look up the discussion of
"power" in a good stats textbook.
Hugh
Posted: Tue Feb 05, 2008 11:07 am
Guest
Hi,

Thank you very much for the quick replies, I'm very impressed.
I've been bashing my head against this for a while, so it's fantastic
to find people responding so readily. Thank you :-)

Maybe if I explain a little more about what I'm doing that might
help. Some of your answers above were a bit over my head :-)

I have N distributions (not sure if that's the right word). I can
take values from each to build up samples, but generating values is
extremely expensive. I need to decide which distribution has the
lowest mean with the fewest possible samples.

So, essentially I am racing the distributions, adding values to
the samples from each. Whenever a distribution looks (by a t-test) to
be worse than one of the others, it gets knocked out of the race
(I'm using a Bonferroni adjustment in some experiments). The idea is
to keep adding values to the samples from each distribution still in
the race until only one is left.

But, since there are likely to be some distributions that are very
close to each other (or maybe identical), I need to 'drop out' early
when I detect that the samples from the set of distributions still in
the race are nearly equal (to within some parameter); otherwise I'd end
up growing the samples forever. So, my algorithm can return some number
of the distributions as being 'best or good enough'.

I was initially hoping that the ANOVA would tell me if the
remaining candidates were roughly equal, but as you point out, it
doesn't.

The distributions are roughly normal - after application of an
alpha trim and a Box-Cox transform.
My probability theory isn't very good - probably :-)

Does any of that help to see what I'm fumbling with?

Cheers,

Hugh.
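
The racing procedure Hugh describes could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not his actual code: the names `race`, `samplers`, `margin`, `batch` and `max_rounds` are hypothetical, a normal approximation replaces the t-test, and the "close enough" check is a one-sided equivalence test against the current leader. Note also that testing repeatedly as data accumulates inflates the error rate beyond the nominal alpha, which this sketch does not correct for.

```python
import random
from statistics import NormalDist, fmean, variance


def race(samplers, margin, alpha=0.05, batch=10, max_rounds=200):
    """Return indices of candidates judged 'best or good enough'.

    samplers: zero-argument callables, each returning one (expensive) value.
    margin:   mean differences smaller than this are treated as unimportant.
    Assumes batch >= 2 so that sample variances are defined.
    """
    z = NormalDist()
    alive = list(range(len(samplers)))
    if len(alive) <= 1:
        return alive
    data = [[] for _ in samplers]
    for _ in range(max_rounds):
        for i in alive:
            data[i].extend(samplers[i]() for _ in range(batch))
        means = {i: fmean(data[i]) for i in alive}
        best = min(alive, key=means.get)  # current leader: lowest mean
        # Bonferroni: split alpha across this round's comparisons.
        a = alpha / (len(alive) - 1)
        keep, all_close = [best], True
        for i in alive:
            if i == best:
                continue
            diff = means[i] - means[best]
            se = (variance(data[i]) / len(data[i])
                  + variance(data[best]) / len(data[best])) ** 0.5
            se = max(se, 1e-12)  # guard against constant samples
            # Knock out: candidate's mean is significantly above the leader's.
            if 1 - z.cdf(diff / se) < a:
                continue
            keep.append(i)
            # Close enough: the gap is significantly smaller than margin.
            if z.cdf((diff - margin) / se) >= a:
                all_close = False
        alive = keep
        if len(alive) == 1 or all_close:
            break
    return sorted(alive)


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [lambda: rng.gauss(0, 1), lambda: rng.gauss(3, 1)]
    print(race(candidates, margin=0.2))
```

The `max_rounds` cap is what keeps the loop from running forever when two candidates differ by slightly more than the margin can resolve; whatever is still in the race at that point is returned.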
Herman Rubin
Posted: Tue Feb 05, 2008 2:26 pm
Guest
In article <6dd70f4a-4a08-4cc4-8319-81841aba8764@k2g2000hse.googlegroups.com>,
Hugh <hughleat@gmail.com> wrote:
Quote:
Hi,

I'm a bit new to statistics, so please forgive me if this is a
very basic question or if I am posting to the wrong place. (Also, if
I use the wrong terminology, please, forgive that, too)

I have two normal distributions with the same mean and standard
deviation. If I do a T-Test in R I can't tell if the distributions
have the same mean or not. If I run the same test on different
samples from the same normal distribution a thousand times, then the
T-Test p-value is roughly uniform.

If the null hypothesis is exactly true, the p-value
should be exactly uniform. There is nothing wrong
with this.
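
A quick simulation makes this concrete. The sketch below is in Python rather than R and, to stay within the standard library, uses a two-sample z-test with a known standard deviation instead of the t-test; that substitution and all the names (`two_sample_z_pvalue`, `simulate_null_pvalues`, `decile_counts`) are assumptions for illustration. Both samples are drawn from the same N(0,1), so the null hypothesis is exactly true.

```python
import random
from statistics import NormalDist, fmean


def two_sample_z_pvalue(xs, ys, sigma=1.0):
    """Two-sided p-value for H0: equal means, with known common sigma."""
    se = sigma * (1 / len(xs) + 1 / len(ys)) ** 0.5
    z = (fmean(xs) - fmean(ys)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_null_pvalues(trials=1000, n=30, seed=0):
    """Repeat the test on fresh pairs of samples from the same N(0,1)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        pvals.append(two_sample_z_pvalue(xs, ys))
    return pvals


def decile_counts(pvals):
    """Count p-values falling in [0,0.1), [0.1,0.2), ..., [0.9,1]."""
    counts = [0] * 10
    for p in pvals:
        counts[min(int(p * 10), 9)] += 1
    return counts


if __name__ == "__main__":
    # Each of the ten bins should hold roughly a tenth of the trials.
    print(decile_counts(simulate_null_pvalues()))
```

About 5% of the p-values fall below 0.05 even though the means are identical, which is exactly the false-positive rate the test is designed to have.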

Quote:
I think this means that while the T-Test is very good at telling
if two samples have a different mean it can't help you decide if they
have the same mean.

This is unfortunate for me because I need to know if the means are
the same. Can any one tell me how I might do that? Also, I really
need to check if n means are all the same - I've tried the anova in R
and it has the same problem.

There is no way that this can be decided without a Bayesian
approach. If this is your real problem, you need a VERY
strong reason to believe the means are exactly the same.
In this case, a Bayes approach will have one accept the
means being equal if the sample statistic is O(sqrt(log(n))),
the precise value depending on the form of the loss function
for improper acceptance. If you mean "sufficiently close"
instead of "equal", the problem is more complicated, unless
"sufficiently close" is very close.

Unlike what another poster said, normality is not an overly
important criterion.

How well do you understand probability? This is more important
than a background in statistical methods to understanding
statistics. The essence of statistical decision making is:

It is necessary to consider all consequences of
the proposed action in all states of nature.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Phil Holman
Posted: Sun Feb 10, 2008 11:50 am
Guest
"Hugh" <hughleat@gmail.com> wrote in message
news:698f8c44-684e-46e0-9d93-9c77477f83d3@n20g2000hsh.googlegroups.com...
Quote:
...
Does any of that help to see what I'm fumbling with?

The closer the means, the more values you have to add to get a
statistically significant difference. You could use a less stringent
significance level, but this increases the likelihood of error.

The consequences of an error and the expense of running the test should
be weighed in deciding the amount of resources you want to expend.

Is there something wrong with just selecting the lowest mean?

Phil H
 
All times are GMT - 5 Hours