 |
|
| Science Forum Index » Compression Forum » Entropy... |
|
Page 4 of 6 Goto page Previous 1, 2, 3, 4, 5, 6 Next |
|
| Author |
Message |
| Mok-Kong Shen... |
Posted: Fri Sep 11, 2009 12:52 pm |
|
|
|
Guest
|
Thomas Richter wrote:
[quote:b68a1c9355]You are still on the wrong track. This is *not* the entropy we talk
about. For the entropy we talk about, we *do* know the alphabet the
message is made of, and we do know the probabilities and their
dependencies. It is *not* about interpreting the message, only getting
the symbols it is made of. The symbols could be Chinese, and all you get
is a table of all possible symbols in chinese, and their (conditioned,
unconditioned...) probabilities, i.e. a description of the source (a
chinese monkey, if you like).
[/quote:b68a1c9355]
I have a layman's question here: In the character strings of
natural languages there are more or less strong correlations,
as the statistical data on digrams and n-grams show. If I
don't err, such correlations are not considered in Shannon's
formula. If this is ture, wouldn't the validity of the result
be a problem?
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Fri Sep 11, 2009 12:56 pm |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:f66ce754af]Thomas Richter wrote:
You can define the Kolmogorov complexity on finite sequences *given*
that you have a specific Turing machine. It is the size of the
shortest program *on this machine* that regenerates the message.
The problem is that this definition is always relative to the machine,
*unlike* - and this is the important part - you define it on infinite
strings. *Then* the machine doesn't matter anymore.
The problem with the finite-sized input - and why it makes little
sense there - is simply that for every finite sized message you can
define a Turing machine which generates this message simply by a
one-step program: Put the message into the state-apparatus of the
Turing machine.
(Aka: The "Lena compress" version of Kolmogorov complexity. True,
working, but useless).
Independence from the machine on infinite messages:
This is simply because any Turing machine can emulate any other Turing
machine by a finite-sized program, and the contribution of a
finite-sized program to an infinite string doesn't matter.
As layman I rather doubt that letting any kind of machine (real or
theoretical) to work on an infinite string could have a sense.
I mean by definition of "infinite", the work could never get to an
end (no matter what the processing speed is) and thus no result
would ever be available.
[/quote:f66ce754af]
You can still get the next output symbol, in the same sense that you can
get the next symbol from the digits of Pi. Pi can be represented as an
infinite decimal fraction, yet it makes sense to speak about Pi in
total. That is, it is at least "potentially infinite" in the sense that
you can always get the next symbol, and it never terminates, and the
length is not finite ad-hoc.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Mok-Kong Shen... |
Posted: Fri Sep 11, 2009 1:07 pm |
|
|
|
Guest
|
Thomas Richter wrote:
[quote:2bcd8275cb]Mok-Kong Shen wrote:
Thomas Richter wrote:
You can define the Kolmogorov complexity on finite sequences *given*
that you have a specific Turing machine. It is the size of the
shortest program *on this machine* that regenerates the message.
The problem is that this definition is always relative to the
machine, *unlike* - and this is the important part - you define it on
infinite strings. *Then* the machine doesn't matter anymore.
The problem with the finite-sized input - and why it makes little
sense there - is simply that for every finite sized message you can
define a Turing machine which generates this message simply by a
one-step program: Put the message into the state-apparatus of the
Turing machine.
(Aka: The "Lena compress" version of Kolmogorov complexity. True,
working, but useless).
Independence from the machine on infinite messages:
This is simply because any Turing machine can emulate any other
Turing machine by a finite-sized program, and the contribution of a
finite-sized program to an infinite string doesn't matter.
As layman I rather doubt that letting any kind of machine (real or
theoretical) to work on an infinite string could have a sense.
I mean by definition of "infinite", the work could never get to an
end (no matter what the processing speed is) and thus no result
would ever be available.
You can still get the next output symbol, in the same sense that you can
get the next symbol from the digits of Pi. Pi can be represented as an
infinite decimal fraction, yet it makes sense to speak about Pi in
total. That is, it is at least "potentially infinite" in the sense that
you can always get the next symbol, and it never terminates, and the
length is not finite ad-hoc.
[/quote:2bcd8275cb]
OK, but when do you take the result from the machine? At the
moment you take the result, only a finite number of symbols
have been processed. So you "always" get results from a finite
string, never from an infinite string, although you may choose
to have that finite string to be as long as you like, if you
spend correspondingly long time.
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Fri Sep 11, 2009 2:22 pm |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:7e8fca0532]Thomas Richter wrote:
You are still on the wrong track. This is *not* the entropy we talk
about. For the entropy we talk about, we *do* know the alphabet the
message is made of, and we do know the probabilities and their
dependencies. It is *not* about interpreting the message, only getting
the symbols it is made of. The symbols could be Chinese, and all you
get is a table of all possible symbols in chinese, and their
(conditioned, unconditioned...) probabilities, i.e. a description of
the source (a chinese monkey, if you like).
I have a layman's question here: In the character strings of
natural languages there are more or less strong correlations,
as the statistical data on digrams and n-grams show. If I
don't err, such correlations are not considered in Shannon's
formula.
[/quote:7e8fca0532]
There is no "the formula". There is an n-order entropy definition, and
that definition would of course capture the correlation. Zero-order
entropy, of course, does not. Or, to put it differently, if you have a
source with correlation, increasing the order of the entropy measurement
will decrease the entropy. Or rather, if you have an nth-order model,
you should measure entropy as n-th order entropy, obviously.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Fri Sep 11, 2009 2:26 pm |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:74013c55c6]You can still get the next output symbol, in the same sense that you
can get the next symbol from the digits of Pi. Pi can be represented
as an infinite decimal fraction, yet it makes sense to speak about Pi
in total. That is, it is at least "potentially infinite" in the sense
that you can always get the next symbol, and it never terminates, and
the length is not finite ad-hoc.
OK, but when do you take the result from the machine?
[/quote:74013c55c6]
Whenever you have enough digits you're satisfied with. Same as for Pi.
At some point, it is precise enough.
[quote:74013c55c6]At the
moment you take the result, only a finite number of symbols
have been processed. So you "always" get results from a finite
string, never from an infinite string, although you may choose
to have that finite string to be as long as you like, if you
spend correspondingly long time.
[/quote:74013c55c6]
The difference is the "potentially infinite", or, very formally, the
relative position of the "quantors" in the statement.
Infinite string means: There is a Turing machine T such that for any
message of size N...
and the point is, N is not known and not specified in advance. You can
choose, without modifying the machine. You never get a final result, and
you never get a final result for Pi either. It just goes on.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Mok-Kong Shen... |
Posted: Fri Sep 11, 2009 3:23 pm |
|
|
|
Guest
|
Thomas Richter wrote:
[quote:b929848949]Mok-Kong Shen wrote:
I have a layman's question here: In the character strings of
natural languages there are more or less strong correlations,
as the statistical data on digrams and n-grams show. If I
don't err, such correlations are not considered in Shannon's
formula.
There is no "the formula". There is an n-order entropy definition, and
that definition would of course capture the correlation. Zero-order
entropy, of course, does not. Or, to put it differently, if you have a
source with correlation, increasing the order of the entropy measurement
will decrease the entropy. Or rather, if you have an nth-order model,
you should measure entropy as n-th order entropy, obviously.
[/quote:b929848949]
Thank you for pointing out that to me. In the books I read only
the O-th order formula is given, together with estimate of entropy
for English, although English, like all natural languages,
certainly has rather strong correlations. Maybe the estimate was
actually computed using a higher order formula, I don't know.
But anyway, that's bad of the authors in my humble view.
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Mark Adler... |
Posted: Sat Sep 12, 2009 1:52 pm |
|
|
|
Guest
|
On 2009-09-11 00:23:32 -0700, Thomas Richter <thor at (no spam) math.tu-berlin.de> said:
[quote:b9a26f8fee]Well, we now enter the realm of "existence or not". Basically,
Kolmogorov complexity exists in the same sense dragons exist. As an
abstract concept. You cannot potentially measure it, or design an
algorithm for it, you can only use it as a mathematical abstractum to
prove theorems about it.
[/quote:b9a26f8fee]
You can compute the Kolmogorov complexity for many short strings and a
given machine description. The theorem just says that you can't
compute it for *all* strings. On the other hand, I have never seen
*any* dragons.
Since you refer to it not being "useful", your definition of "abstract"
might be "not useable in a commercial application", in which case I
agree that I can't think of how you would use Kolmogorov complexity to
make money beyond a grant for mathematical research. However it is not
abstract in my sense which is "it never actually exists", since it does
exist all the time and is computable sometimes.
Mark |
|
|
| Back to top |
|
|
|
| Niels Fröhling... |
Posted: Sun Sep 13, 2009 7:23 am |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:6151d86f03]Thank you for your post. As you said, knowledge is essential in
order to make sense out of a message. Since the knowledge of a
person reading a message can be widely different from person to
person, perhaps also dependent on the situation a person is
currently in, an issue that interests me is whether it could
be feasible to somehow say something about the amount of
knowledge that a person has and that is relevant/helpful for
understanding a specific message.
[/quote:6151d86f03]
Practically it's not helpful, because you can't determine which knowledge (of
all the knowledges) [or which rules of all rules you can construct] applies to
understand a message without knowing the blackbox's knowledge [rule-set] first.
In some esoteric way you could say that if you perfectly know the sender, you
could complete his messages before he does; which does not occur. At least not
in big scale.
You may asymtotically gain understanding [reverse-engineering the blackbox's
function by the means of analysing the signal] if the other one would be a
constant, which they are not ... insofar others will always not be understood
very well, just sufficient enough to not make life wordless.
There are people with brain damage that loose the common learned associativity
of words and apply other associations, you most often can't understand and talk
to those people anymore based on prior knowledge. You have to begin and
understand them again, if you're patient enough.
[quote:6151d86f03]I mean that, if one could
succeed to do such evaluations even very roughly (say, on an
ordinal scale), that data might be interesting for diverse uses.
[/quote:6151d86f03]
I don't really understand which evaluation you really want to conduct. You
want to determine if you can "understand" every other? Or you want to determine
if every other is "understandable" by one other? Based on knowledge?
"Data" itself is meaningless, all "data" is identical in it's complexity and
structure. We as human just ought to give identifiers or markers, tags to
specific pieces of data because it's a convenient and successful way to exchange
information. We use more identifiers than strictly necessary to protect the
messages from loss, and to compensate that we are not incredibly complex and
just can be that much complicated. The convention of meaning is arbitrary, not
absolute; and it's learned and changed as well over generations. Once lost or
not completely understood or extended, new identifiers will be assigned,
arbitrary as well.
So basically the response to your idea (evaluation of knowledge) I would give
is that you just possibly can evaluate how much someone follows or recognizes a
actual common convention (or codification), but that's all.
Ciao
Niels |
|
|
| Back to top |
|
|
|
| Mok-Kong Shen... |
Posted: Sun Sep 13, 2009 8:00 am |
|
|
|
Guest
|
Niels Fröhling wrote:
[quote:b80b2a424f]Mok-Kong Shen wrote:
[snip][/quote:b80b2a424f]
[quote:b80b2a424f]I don't really understand which evaluation you really want to conduct.
You want to determine if you can "understand" every other? Or you want
to determine if every other is "understandable" by one other? Based on
knowledge?
[/quote:b80b2a424f]
If I write a certain sentence to someone, I like to estimate whether
the sense/meaning I have in mind will be fully or only partially
captured by him. For I may under circumstances need to re-formulate
that sentence in order to get my information well through. For example,
a sentence that I write to a professional colleague may have to be
rewritten in a quite different way when sent to a friend who knows
little of my field of profession. That means I have to estimate the
knowledge of my partner. As I argued, his resources may also play
a role.
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Mok-Kong Shen... |
Posted: Sun Sep 13, 2009 9:25 am |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:c91aa85cd0]Thomas Richter wrote:
You can define the Kolmogorov complexity on finite sequences *given*
that you have a specific Turing machine. It is the size of the
shortest program *on this machine* that regenerates the message.
The problem is that this definition is always relative to the machine,
*unlike* - and this is the important part - you define it on infinite
strings. *Then* the machine doesn't matter anymore.
The problem with the finite-sized input - and why it makes little
sense there - is simply that for every finite sized message you can
define a Turing machine which generates this message simply by a
one-step program: Put the message into the state-apparatus of the
Turing machine.
(Aka: The "Lena compress" version of Kolmogorov complexity. True,
working, but useless).
Independence from the machine on infinite messages:
This is simply because any Turing machine can emulate any other Turing
machine by a finite-sized program, and the contribution of a
finite-sized program to an infinite string doesn't matter.
I am ignorant of the deep theory of Kolmogorov. Looking superficially
however at the definition given in a book I just got from a library:
T denotes a given Turing machine that generates a binary
sequence x_n of length n. Let P(T) be the program with
which the Universal Turing machine simulates T. Then we
define the Kolmogorov complexity of a sequence x_n by
k(x_n) = min { |P(T)| | T generates x_n }
I have a layman's question: Wouldn't in the "Lena compress" case,
with the message being put into the state-apparatus of T, |P(T)|
be of the order of the length of the message?
[/quote:c91aa85cd0]
From another source, I find the definition:
k(x_n) = min { | P | | U(P) = x_n }
But here U is a universal Turing machine, not the special
Turing machine T with its state purposedly set to suit
"Lena compress". Either way, it seems therefore highly
likely that you erred with your claim that finite-sized
input makes little sense for Kolmogorov complexity.
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Tue Sep 15, 2009 2:35 am |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:ae08ccc8d7]
From another source, I find the definition:
k(x_n) = min { | P | | U(P) = x_n }
But here U is a universal Turing machine, not the special
Turing machine T with its state purposedly set to suit
"Lena compress".
[/quote:ae08ccc8d7]
You seem confused. There is no "the universal turing machine", there are
many. In fact, the machine you're typing in is such a machine. A machine
that has a special instruction to generate "lena" could also very well
be a universal turing machine. Thus, the above definition is of course
*still* machine dependent, and not "less" dependent.
[quote:ae08ccc8d7]Either way, it seems therefore highly
likely that you erred with your claim that finite-sized
input makes little sense for Kolmogorov complexity.
[/quote:ae08ccc8d7]
Nope. The above still makes no sense. There is still for any finite
input sequence A a universal turing machine that can generate A by a
one-size instruction tape.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Mok-Kong Shen... |
Posted: Tue Sep 15, 2009 8:59 am |
|
|
|
Guest
|
Thomas Richter wrote:
[quote:7dd67950d5]Mok-Kong Shen wrote:
From another source, I find the definition:
k(x_n) = min { | P | | U(P) = x_n }
But here U is a universal Turing machine, not the special
Turing machine T with its state purposedly set to suit
"Lena compress".
You seem confused. There is no "the universal turing machine", there are
many. In fact, the machine you're typing in is such a machine. A machine
that has a special instruction to generate "lena" could also very well
be a universal turing machine. Thus, the above definition is of course
*still* machine dependent, and not "less" dependent.
[/quote:7dd67950d5]
I don't yet understand. Please kindly explain how a machine with
its state specially set to suit "Lena Compress" could nonetheless
be "universal".
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Tue Sep 15, 2009 11:01 am |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:b567db953f]Thomas Richter wrote:
Mok-Kong Shen wrote:
From another source, I find the definition:
k(x_n) = min { | P | | U(P) = x_n }
But here U is a universal Turing machine, not the special
Turing machine T with its state purposedly set to suit
"Lena compress".
You seem confused. There is no "the universal turing machine", there are
many. In fact, the machine you're typing in is such a machine. A machine
that has a special instruction to generate "lena" could also very well
be a universal turing machine. Thus, the above definition is of course
*still* machine dependent, and not "less" dependent.
I don't yet understand. Please kindly explain how a machine with
its state specially set to suit "Lena Compress" could nonetheless
be "universal".
[/quote:b567db953f]
"Universal" means nothing more than "being able to emulate any other
Turing machine" by a suitable program. Here is a receipt how to build a
universal "lena" Turing machine U2 from a given "universal" Turing
machine U1:
Consider that U1 has the command set c1...cn. Then, define the
instruction set of U2 d1...d(n+1):
The instruction dk has the encoding 00 ck for k<=n. The instruction
d(n+1) has the encoding 01.
The Turing machine U2 then has an additional internal state that
"decodes the pre-word".
The new instruction 01 then writes Lena to the tape.
U2 is clearly universal as it can do everything U1 can, just with longer
"instructions".
Greetings,
Thomas |
|
|
| Back to top |
|
|
|
| Mok-Kong Shen... |
Posted: Tue Sep 15, 2009 11:12 am |
|
|
|
Guest
|
Thomas Richter wrote:
[quote:389d0684cf]Mok-Kong Shen wrote:
Thomas Richter wrote:
Mok-Kong Shen wrote:
From another source, I find the definition:
k(x_n) = min { | P | | U(P) = x_n }
But here U is a universal Turing machine, not the special
Turing machine T with its state purposedly set to suit
"Lena compress".
You seem confused. There is no "the universal turing machine", there are
many. In fact, the machine you're typing in is such a machine. A machine
that has a special instruction to generate "lena" could also very well
be a universal turing machine. Thus, the above definition is of course
*still* machine dependent, and not "less" dependent.
I don't yet understand. Please kindly explain how a machine with
its state specially set to suit "Lena Compress" could nonetheless
be "universal".
"Universal" means nothing more than "being able to emulate any other
Turing machine" by a suitable program. Here is a receipt how to build a
universal "lena" Turing machine U2 from a given "universal" Turing
machine U1:
Consider that U1 has the command set c1...cn. Then, define the
instruction set of U2 d1...d(n+1):
The instruction dk has the encoding 00 ck for k<=n. The instruction
d(n+1) has the encoding 01.
The Turing machine U2 then has an additional internal state that
"decodes the pre-word".
The new instruction 01 then writes Lena to the tape.
U2 is clearly universal as it can do everything U1 can, just with longer
"instructions".
[/quote:389d0684cf]
OK. Let me aks your favour to do one thing: In all the books
I have looked at, Komolgorov complexity is defined for finite
strings and no where is any inappropriateness of finite
input mentioned. So your claim that for finite input Komolgorov
complexity makes little sense must be something new in the
field. Could you please give a reference to a book or a journal
that supports your claim, or is it a yet unpublished scientific
result of your own?
Thanks,
M. K. Shen |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Tue Sep 15, 2009 2:21 pm |
|
|
|
Guest
|
Mok-Kong Shen wrote:
[quote:c339236e0b]OK. Let me aks your favour to do one thing: In all the books
I have looked at, Komolgorov complexity is defined for finite
strings and no where is any inappropriateness of finite
input mentioned.
[/quote:c339236e0b]
None of the above stuff is new, it's just conclusions from the obvious.
While I generally don't appreciate Wikipedia for things like this, just
go there. You'll find there exactly the same elementary result I stated
above, namely that the complexity is language dependent, and that for
any two languages the complexity of a given string differs by a constant
(depending on the languages). The specific "lena" example is this
invariance theorem, just turned around. By making the machine pretty
smart, one can make the constant c as large as the language, and then
the complexity of *a specific* string very short.
For any *finite* string the complexity isn't a function of the input
string itself. However, if you approach the limit of longer and longer
strings, this constant becomes irrelevant, but only then.
There is no big science behind that, really.
[quote:c339236e0b]So your claim that for finite input Komolgorov
complexity makes little sense must be something new in the
field. Could you please give a reference to a book or a journal
that supports your claim, or is it a yet unpublished scientific
result of your own?
[/quote:c339236e0b]
Definitely not.
For example, go into Ray Solomonoff's work "A Preliminary Report on a
General Theory of Inductive Inference," Report V-131, Zator Co.,
Cambridge, Ma. Feb 4, 1960." cited there. Go to section 4, and read it.
It states exactly what I say - the constant in the above equation
becomes irrelevant for long strings, and only then - and *then* the
definition is machine-independent. Otherwise, it depends on the machine,
clearly. That I can set the complexity to one for a *specific* message
is a trivial corollary from the invariance theorem. Specific means here:
For any message A there exists a Turing machine U such that C_U(A) = 1.
quite contrary to:
There is a Turing machine U such that for any message A C_U(A) = 1.
which is a different statement, and clearly wrong. As I might have said
already, it depends on the position of the quantors. As the description
of a Turing machine is by definition finite, the second statement is
clearly wrong since I can always set the message just considerably
longer than the description of the machine to be on the safe side. This
does not hold for the first statement.
Really, it's just about reading the statement carefully.
So long,
Thomas |
|
|
| Back to top |
|
|
|
|
|
All times are GMT - 5 Hours
The time now is Sun Nov 29, 2009 9:41 am
|
|