Science Forum Index » Bio Evolution Forum » Question about the Shannon "entropy" of genomes...
Doug Wedel...
Posted: Sun Jul 13, 2008 8:05 pm
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text from statistical analysis of
token use alone, since each language has a characteristic "signature" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genera) may also have characteristic redundancy
levels in their genomes, and I was wondering if anyone knows of statistical
studies of this kind.
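
For anyone who wants to try the idea on sequence data, here is a minimal sketch of the calculation being described: Shannon entropy H of single-symbol frequencies, and redundancy taken as 1 - H / H_max with H_max = log2(alphabet size). The function names and the two toy sequences are made up for illustration, and it only looks at single-letter frequencies, not longer-range structure.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of the symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(symbols, alphabet_size):
    """Redundancy as 1 - H / H_max, where H_max = log2(alphabet_size)."""
    return 1.0 - shannon_entropy(symbols) / math.log2(alphabet_size)

# Toy sequences invented for illustration (not real genomic data).
uniform = "ACGT" * 25                                # all four bases equally common
skewed = "A" * 70 + "C" * 10 + "G" * 10 + "T" * 10   # strongly A-rich

print("uniform:", redundancy(uniform, 4))
print("skewed: ", redundancy(skewed, 4))
```

The equal-frequency sequence comes out with no redundancy at this level, while the A-rich one comes out noticeably redundant. The same functions work on a list of codons or k-mers if the alphabet size is changed to match.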

Steven Sullivan...
Posted: Mon Jul 14, 2008 7:16 pm
Look up 'codon bias' for one level of redundancy.
Also look up 'sequence logos', primarily Tom Schneider's work, which have been used for years
to represent DNA and protein sequences in terms of Shannon entropy.
http://www-lmmb.ncifcrf.gov/~toms/
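
As a rough illustration of what a sequence logo measures, the sketch below computes the information content of each column in a set of aligned DNA sites as log2(4) minus the column's Shannon entropy. It assumes equal-length, gap-free sites and leaves out the small-sample correction used in the published method; the function name and the four example sites are hypothetical.

```python
import math
from collections import Counter

def column_information(aligned_sites, alphabet_size=4):
    """Information (bits) per alignment column: log2(alphabet_size) - H(column).

    Assumes every site has the same length; no small-sample correction.
    """
    h_max = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        total = len(column)
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(h_max - h)
    return info

# Hypothetical aligned binding sites, invented for illustration.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]

for position, bits in enumerate(column_information(sites), start=1):
    print(position, round(bits, 3))
```

Fully conserved columns reach the 2-bit maximum for DNA, and columns with any variation score lower, which is what the letter-stack heights in a logo show.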
--
-S
A wise man, therefore, proportions his belief to the evidence. -- David Hume, "On Miracles" (1748)

Graham Jones...
Posted: Tue Jul 15, 2008 10:03 am
Three search terms you may find useful:
Codon usage bias
GC-content
puffer-fish junk-dna
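
For the first two terms, a minimal sketch of the underlying statistics: GC-content is the fraction of bases that are G or C, and codon usage starts from counting in-frame triplets of a coding sequence. The function names and the short example sequence are hypothetical, and real analyses would compare synonymous codons across many genes.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_counts(coding_seq):
    """Count in-frame codons, reading from position 0 and ignoring a trailing partial codon."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

# Short made-up coding fragment: ATG followed by three alanine codons.
fragment = "ATGGCTGCTGCC"

print("GC-content:", gc_content(fragment))
print("codons:", codon_counts(fragment))
```

In the fragment, GCT and GCC both encode alanine, so an unequal split between them across a whole genome is the kind of signal that codon usage bias refers to.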
Graham
All times are GMT - 5 Hours