Question about the Shannon "entropy" of genomes...
Doug Wedel...
Posted: Sun Jul 13, 2008 8:05 pm
Guest
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text from a statistical
analysis of token use alone, since each language has its own "signature" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genera) may also have characteristic redundancy
levels in their genomes, and I was wondering if anyone knows of statistical
studies of this kind.
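
For concreteness, here is a minimal sketch of the kind of measurement being asked about. It assumes the simplest model, single-symbol frequencies over a four-letter DNA alphabet, and the function names are made up for illustration. Redundancy is taken as R = 1 - H / H_max, where H is the Shannon entropy in bits per symbol and H_max = log2(alphabet size).

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Entropy in bits per symbol, estimated from single-symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def redundancy(seq, alphabet_size=4):
    """Redundancy R = 1 - H / H_max, with H_max = log2(alphabet_size)."""
    return 1.0 - shannon_entropy(seq) / math.log2(alphabet_size)

# A sequence using all four bases equally often has no redundancy at this
# level; a skewed sequence has some.
print(redundancy("ACGTACGT"))
print(redundancy("AACCAACC"))
```

Single-symbol counts are only the first-order estimate. Language identification usually also relies on pairs and longer runs of symbols, and the same function can be applied to overlapping k-mers instead of single bases.
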
Steven Sullivan...
Posted: Mon Jul 14, 2008 7:16 pm
Guest
Doug Wedel <dougwedel at (no spam) earthlink.net> wrote:
Quote:
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text from a statistical
analysis of token use alone, since each language has its own "signature" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genera) may also have characteristic redundancy
levels in their genomes, and I was wondering if anyone knows of statistical
studies of this kind.

Look up 'codon bias' for one level of redundancy.

Also look up 'sequence logos', primarily Tom Schneider's work, which have been used for years
to represent DNA/protein sequences in terms of Shannon entropy.

http://www-lmmb.ncifcrf.gov/~toms/
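
As a rough illustration of the quantity a sequence logo plots, the sketch below computes the information content of each column of an alignment as log2(alphabet size) minus the column's Shannon entropy. This is a simplification under stated assumptions: it takes equal-length, gap-free DNA sequences, and it leaves out the small-sample correction that the published method applies. The function name is hypothetical.

```python
import math
from collections import Counter

def column_information(aligned_seqs, alphabet_size=4):
    """Information content in bits for each alignment column: H_max - H."""
    h_max = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned_seqs):  # iterate over alignment columns
        n = len(column)
        h = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        info.append(h_max - h)
    return info

# First column is fully conserved, second column is uniform over A, C, G, T.
print(column_information(["AC", "AG", "AT", "AA"]))
```

A fully conserved DNA position carries the maximum of 2 bits, and a position where all four bases are equally frequent carries none.
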




--
-S
A wise man, therefore, proportions his belief to the evidence. -- David Hume, "On Miracles" (1748)
Graham Jones...
Posted: Tue Jul 15, 2008 10:03 am
Guest
"Doug Wedel" <dougwedel at (no spam) earthlink.net> wrote in message
news:g5eqau$1oak$1 at (no spam) darwin.ediacara.org...
Quote:
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text from a statistical
analysis of token use alone, since each language has its own "signature" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genera) may also have characteristic redundancy
levels in their genomes, and I was wondering if anyone knows of statistical
studies of this kind.


Three search terms you may find useful:

Codon usage bias
GC-content
puffer-fish junk-dna
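
The first two terms correspond to statistics that are easy to compute from raw sequence. A minimal sketch, assuming a non-empty uppercase or lowercase DNA string and, for the codon counts, a coding sequence already in reading frame (both function names are illustrative, not from any library):

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases that are G or C (seq must be non-empty)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_counts(coding_seq):
    """Count in-frame codons, ignoring any trailing partial codon."""
    coding_seq = coding_seq.upper()
    usable = len(coding_seq) - len(coding_seq) % 3
    return Counter(coding_seq[i:i + 3] for i in range(0, usable, 3))

print(gc_content("ATGGCTGCTTAA"))
print(codon_counts("ATGGCTGCTTAA").most_common())
```

Codon usage bias then shows up as unequal counts among codons that encode the same amino acid, which requires a codon table on top of these raw counts.
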


Graham
 
All times are GMT - 5 Hours