Priors for branch lengths in Bayesian phylogenetics
Graham Jones
Posted: Thu Feb 15, 2007 11:11 am
I am not a biologist - my main expertise is in pattern recognition. I have
recently become interested in bioinformatics. I have read Joe Felsenstein's
`Inferring Phylogenies', and some online articles on Bayesian MCMC
approaches, in particular `Branch-Length Prior Influences Bayesian Posterior
Probability of Phylogeny' by Yang and Rannala, Syst. Biol. 54(3):455–470,
2005.

I am trying to figure out what biologists are thinking (if anything!) when
they choose a prior for branch lengths in Bayesian phylogenetics.

Quoting from Yang and Rannala:

"The uniform prior with a large upper bound such as 10 or 100 is often
advocated as a `non-informative' or `diffuse' prior for branch lengths.
However, such a prior causes inflated clade probabilities and is one of the
worst in this regard. Exponential priors with small means appear
preferable."

Well, never mind about inflated clade probabilities for the moment. Suppose
you are about to do a phylogenetic analysis, and you've chosen
your species and your (aligned) sites. To keep things simple, suppose that
for these sites and these species, you believe the same rate of mutation is
occurring in each site. Assuming a uniform prior with an upper bound of 10
or more means (among other things) that, a priori, 4 or more mutations per site,
per branch are more likely than fewer than 4. That doesn't make any sense at
all to me. If you thought that much mutation was going on, you'd choose
different sites or different species, wouldn't you?
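
To make that arithmetic explicit, here is a minimal sketch. It assumes the
branch length b (expected mutations per site) is uniform on [0, upper]; the
function name prob_at_least is my own, not taken from any phylogenetics
package.

```python
def prob_at_least(threshold, upper):
    """P(b >= threshold) when b ~ Uniform(0, upper) (hypothetical helper)."""
    if upper <= 0:
        raise ValueError("upper bound must be positive")
    # The uniform density is 1/upper, so the tail mass is the tail length / upper.
    return max(0.0, upper - threshold) / upper


if __name__ == "__main__":
    # The two upper bounds mentioned in the Yang and Rannala quote.
    for upper in (10.0, 100.0):
        p = prob_at_least(4.0, upper)
        print(f"upper bound {upper:g}: P(b >= 4) = {p:.2f}")
```

With an upper bound of 10 the prior already puts more than half its mass on
branches with 4 or more mutations per site, and raising the bound to 100
makes it worse.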

But the `exponential priors with small means' which Yang and Rannala favour
don't seem any better. It seems from Figure 6(b) in their paper that
in order to get `sensible' posterior probabilities for clades (i.e. not very
close to 0 or 1), they have to choose the mean of the exponential prior to
be around 0.0001. That seems way too small to me. It means you need about
10,000 sites before you expect to see any mutation at all in a branch.
Again, you wouldn't use those sites and species if you thought that was the
case, would you?
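
The 10,000 figure is just the reciprocal of the prior mean. A minimal sketch,
assuming the branch length is taken at its prior mean and that the expected
number of mutations on a branch is simply sites times branch length (the
function names are mine, for illustration only):

```python
def expected_mutations(n_sites, branch_length):
    """Expected mutations on one branch across n_sites sites (hypothetical helper)."""
    return n_sites * branch_length


def sites_for_one_mutation(branch_length):
    """Number of sites at which the expected mutation count on a branch reaches 1."""
    if branch_length <= 0:
        raise ValueError("branch length must be positive")
    return 1.0 / branch_length


if __name__ == "__main__":
    prior_mean = 0.0001  # exponential prior mean discussed in the post
    print(f"sites needed for one expected mutation: {sites_for_one_mutation(prior_mean):.0f}")
    # A 1,000-site alignment is an arbitrary example size, not taken from the paper.
    print(f"expected mutations in 1,000 sites: {expected_mutations(1000, prior_mean):.1f}")
```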

I have yet to see a serious attempt to elicit priors from biologists based
on what they actually know about branch lengths. Can someone point me to
such a thing?


--
Graham Jones
 