Multiple tests correction and combining p-values
bub
Posted: Tue Jan 02, 2007 9:53 pm
Guest
As a non-statistician, I have recently used correlation tests, and I am a
bit confused about whether I need a multiple-testing correction to
control the increase in type I errors, and/or a method for combining
p-values. The following question probably shows how weak I am in stats,
but I have not yet found clear literature (clear for me, at least!) on
when to use these tools.

I have recently studied the correlation between the gene content of each
gene in a gene family "GF" (7 genes) and a biological trait, say trait
A. The 7 genes are expected to show a high level of dependency: in the
absence of measurement error or saturation effects, the background
theory tells us that all the genes should show similar patterns of
evolution, so if we observe a correlation between gene content in
gene 1 and trait A, for example, we strongly expect to see the same
correlation in the 6 remaining genes.

I used a non-parametric correlation test (Spearman) to explore the
"hypothesis" (a correlation between gene content and trait A). I am not
really interested in whether a relationship shows up in one particular
gene, but in what the 7 genes together can tell me. In fact, this remark
highlights the concept that I really misunderstand: the "hypothesis".
Indeed, I wonder whether I am testing:
1/ seven different (but dependent) hypotheses:
"Null hypothesis 1: no correlation between gene content in gene 1 and
trait A"
"Null hypothesis 2: no correlation between gene content in gene 2 and
trait A"
....
"Null hypothesis 7: no correlation between gene content in gene 7 and
trait A"

2/ OR only one global hypothesis:
"Null hypothesis: no correlation between gene content in the gene
family "GF" and trait A"
in which case the seven genes correspond to 7 sub-samples on which the
relationship is tested.

First, I don't really know which case I am in: am I testing several
hypotheses or only one? If you could help me understand this basic
point, I think it would be really helpful!
If I am in the first case, I guess I have to use a multiple-testing
correction to control the increase in type I error. But could I also
try to combine the p-values, or are combining methods meant only for
combining the outputs of multiple tests of the same hypothesis?
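
To make the combining option concrete: a combining method such as Fisher's tests one global null hypothesis, namely that all of the individual nulls hold at once (here, no correlation with trait A in any of the 7 genes). Below is a minimal Python sketch of Fisher's method. The function name is illustrative, the p-values in the usage lines are made up, and the calculation assumes independent p-values, which is exactly the assumption in doubt for these genes. Under strong positive dependence the combined p-value tends to come out too small.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: combine k p-values into one test of the global null.

    Assumes the p-values are independent and strictly positive.
    Returns (statistic, combined_p).
    """
    k = len(pvalues)
    # Under the global null, -2 * sum(log p_i) follows a chi-square
    # distribution with 2k degrees of freedom.
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # For an even number of degrees of freedom (2k), the chi-square upper
    # tail has a closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-gene p-values, for illustration only.
example_pvalues = [0.04, 0.07, 0.03, 0.20, 0.06, 0.05, 0.11]
statistic, combined_p = fisher_combined_pvalue(example_pvalues)
print(f"Fisher statistic = {statistic:.3f}, combined p = {combined_p:.4f}")
```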

Second, if I am in case 1/ (p-value adjustment for several hypotheses
tested), I know that some methods deal with dependency (Holm,
Benjamini-Hochberg ...), but are these methods able to deal with a
very high level of dependency, as in my case? Is this high level of
dependency an argument against using multiple-testing corrections, or
for interpreting the results very carefully?
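
For reference, here is a minimal Python sketch of the two adjustments named above. The function names are illustrative and the p-values in the usage lines are made up. Holm's procedure controls the family-wise error rate under any dependence between tests, though it can be conservative when the tests are strongly positively correlated. The Benjamini-Hochberg procedure controls the false discovery rate under independence and under certain forms of positive dependence.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (family-wise error rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        candidate = pvalues[idx] * m / (rank + 1)
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene Spearman p-values, for illustration only.
example_pvalues = [0.04, 0.07, 0.03, 0.20, 0.06, 0.05, 0.11]
print("Holm:", [round(p, 3) for p in holm_adjust(example_pvalues)])
print("BH:  ", [round(p, 3) for p in bh_adjust(example_pvalues)])
```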

Third, if I am in case 2/ (combining probabilities from "dependent"
tests of the same hypothesis): I know that classical methods (such as
Fisher's) don't deal with dependency, but I also know that some
methods try to. For those, I guess I have to know the structure of the
correlation between the tests, or estimate it from the data or by
resampling. In my case, although I know that the dependency between
genes in this family is very strong, I have no accurate idea of the
correlation structure between the tests. Despite the small number of
tests in my study (7), is it possible to study this correlation
structure directly from my data (independently of the background theory
about these genes) or by resampling?
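
One resampling approach that avoids estimating the correlation structure explicitly is a permutation test of the global null. The sketch below (Python) shuffles the trait values across samples while leaving each sample's 7 gene measurements together, so the dependence between genes is preserved in every permuted data set. It is only a sketch under stated assumptions: the samples are taken to be exchangeable under the null, the combined statistic (the sum of squared Spearman correlations) is one choice among several, no variable is constant, and all names are illustrative.

```python
import math
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def global_permutation_test(gene_columns, trait, n_perm=2000, seed=0):
    """Permutation p-value for 'no gene is correlated with the trait'.

    gene_columns: one list of per-sample values for each gene.
    trait: per-sample trait values, in the same sample order.
    """
    rng = random.Random(seed)
    gene_ranks = [ranks(col) for col in gene_columns]
    trait_ranks = ranks(trait)

    def statistic(t_ranks):
        return sum(pearson(g, t_ranks) ** 2 for g in gene_ranks)

    observed = statistic(trait_ranks)
    shuffled = list(trait_ranks)
    hits = 0
    for _ in range(n_perm):
        # Shuffling only the trait keeps the gene-gene dependence intact.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The same loop could calibrate Fisher's statistic instead, by recomputing the per-gene p-values on each permuted data set.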

Thanks for the help!
Anon.
Posted: Wed Jan 03, 2007 5:08 am
Guest
bub wrote:
Quote:
As a non-statistician, I have recently used correlation tests, and I am a
bit confused about whether I need a multiple-testing correction to
control the increase in type I errors, and/or a method for combining
p-values. The following question probably shows how weak I am in stats,
but I have not yet found clear literature (clear for me, at least!) on
when to use these tools.

I have recently studied the correlation between the gene content of each
gene in a gene family "GF" (7 genes) and a biological trait, say trait
A. The 7 genes are expected to show a high level of dependency: in the
absence of measurement error or saturation effects, the background
theory tells us that all the genes should show similar patterns of
evolution, so if we observe a correlation between gene content in
gene 1 and trait A, for example, we strongly expect to see the same
correlation in the 6 remaining genes.

I'm not sure what you mean by gene content. Are you just saying that
different genes lead to different phenotypes? Or are you talking about
an effect of copy number?

Quote:
I used a non-parametric correlation test (Spearman) to explore the
"hypothesis" (a correlation between gene content and trait A). I am not
really interested in whether a relationship shows up in one particular
gene, but in what the 7 genes together can tell me. In fact, this remark
highlights the concept that I really misunderstand: the "hypothesis".
Indeed, I wonder whether I am testing:

I'm not sure I would test anything. If the background theory says that
all of the genes have an effect, then it might not make any sense to
test whether there is an effect: you could just be testing whether you
have enough data to detect the effect!

I assume you have a decent amount of data, so you could just do an
ANOVA, and report the effect sizes: the larger the effect size, the more
important the allele is. This gets round the whole multiple testing
problem, and also gives you information that should be more interesting:
whether most of the variation is caused by one or two genes, for
example, or if they are all having similar effects.
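
As a rough illustration of reporting an effect size from a one-way ANOVA, here is a minimal Python sketch of eta squared, the share of the total variation in the trait explained by group membership. The grouping of trait values by allele, the function name, and the numbers are all hypothetical, since the thread does not describe the data layout.

```python
def eta_squared(groups):
    """One-way ANOVA effect size: SS_between / SS_total.

    groups: one list of trait values per group (e.g. per allele).
    Assumes the trait values are not all identical.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical trait values for two alleles of one gene.
print(round(eta_squared([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]), 3))
```

Computing this once per gene would give seven numbers that can be compared directly, which is the kind of comparison described above.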

HTH

Bob

--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
bub
Posted: Wed Jan 03, 2007 8:26 pm
Guest
Quote:
I'm not sure what you mean by gene content. Are you just saying that
different genes lead to different phenotypes? Or are you talking about
an effect of copy number?

Sorry for the misuse of "gene content"! I was speaking about nucleotide
content (within each gene), which could be correlated with a phenotype.

Quote:
I'm not sure I would test anything. If the background theory says that
all of the genes have an effect, then it might not make any sense to
test whether there is an effect: you could just be testing whether you
have enough data to detect the effect!

In fact, the theory doesn't predict an effect; it predicts that if an
effect is observed in one gene, an effect is very likely to be observed
in the other genes as well (assuming no measurement/sequencing error
and no saturation ... but unfortunately, the saturation effect is
probably quite high). Indeed, the specific evolution of these genes is
expected to have shaped their nucleotide content in the same way (which
seems to be true in my data). So I guess this is a form of dependence
between the outputs of the tests done on the 7 genes.

By the way, the same correlation computed between the concatenation
of the 7 genes and the phenotype is significant at the 5% level ... so
the need for individual tests (one correlation test per gene) is not
obvious here. But when we designed the study, we expected that the
high level of saturation in at least two genes could add too much noise
to the correlation computed on the concatenation ... so we ran
individual tests (one per gene) in parallel. In fact, I am not sure
these individual tests are really informative given that the
correlation with the concatenation is significant, but I presented
this example because I am more interested in the statistical issue
raised here (see my initial message, in particular: am I testing the
same hypothesis or several hypotheses?).
 
All times are GMT - 5 Hours