 |
|
| Science Forum Index » Compression Forum » Huffman & Entropy Question... |
|
Page 2 of 3 Goto page Previous 1, 2, 3 Next |
|
| Author |
Message |
| glen herrmannsfeldt... |
Posted: Fri Oct 16, 2009 10:18 am |
|
|
|
Guest
|
Thomas Richter <thor at (no spam) math.tu-berlin.de> wrote:
[quote:f85b00fea6]biject schrieb:
If you have the whole file of concern there are no
"guesses" you can calculate the probabilities of each
symbol exactly.
NO! Damnit, no! You also confuse probabilities with relative frequencies.
You can make up a model that fits to those frequencies, but these *aren't*
probabilities.
[/quote:f85b00fea6]
That is true if you require causality (cause comes before effect).
Without causality, how do you tell the difference between frequency
and probability? Assume that you have all the possible samples
for the given system.
In the signal processing world, one can design causal filters,
where the output only depends on the current and previous values
for a signal, or acausal ones where future values are also available.
In the case of a signal on disk, (such as a CD) one can easily
generate non-causal filters.
-- glen |
|
|
| Back to top |
|
|
|
| biject... |
Posted: Fri Oct 16, 2009 10:41 am |
|
|
|
Guest
|
On Oct 16, 10:15 am, Thomas Richter <t... at (no spam) math.tu-berlin.de> wrote:
[quote:47e82cbf55]biject schrieb:
If you have the whole file of concern there are no
"guesses" you can calculate the probabilities of each
symbol exactly.
NO! Damnit, no! You also confuse probabilities with relative frequencies.
You can make up a model that fits to those frequencies, but these *aren't*
probabilities.
So long,
Thomas
[/quote:47e82cbf55]
If you bothered actually reading what I wrote. You
would have noticed I mentioned the set of files that
are all possible permutations of a given file. So for
files of that set it is like knowing the individual
probabilities.
Since any permutation is as likely as any other.
The counts would give you the probabilities for the
best arithmetic compressor for the set.
Again that doesn't mean some of the files can't be
made smaller using another method. But what ever
method you choose for the set as a whole any other
method will increase the lengths of other files.
Since a bijective arithmetic will be the best.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link" |
|
|
| Back to top |
|
|
|
| Niels Fröhling... |
Posted: Fri Oct 16, 2009 9:49 pm |
|
|
|
Guest
|
Thomas Richter wrote:
[quote:f711f73cd0]Oh, I don't allow to count artificial constructed sources according
an entropy. My emphasis is more about an equivalent to "know
real-Pi^tm", which was observed, then approximated, then modelled,
then absolutely determined, and now the verify a circle in reality
though the model Pi, instead of verifying Pi as before. Taking the
number "5" and saying "this constant is '5', exactly and always", is
not the sort of answer I thought of. :)
Sure, I know what you mean. Probably not what you said to begin with,
but clearly, I understand perfectly. And you have my answer. No, I
don't. But that is what a model is: It isn't proven. It is made.
[/quote:f711f73cd0]
Hehe.
I thought it's of utter importance that you say what you've said. So nobody
get's the impression entropy is some god-defined measurable constant. It's a
quantifier of (one of) our specific assumptions in a specific probabilistic context.
[quote:f711f73cd0]The fun part is: It nevertheless works, because my (or rather, those of
other smart people designing compressors) guesses are often "good enough
to make it work".
[/quote:f711f73cd0]
Yes, of course. As with everything: limit awareness is essencial to true
understanding and that includes the limits of "understandable" itself too,
ignorance leads to dogmatism.
[quote:f711f73cd0]So long,
Thomas
PS: Yes, it seems strange indeed how mathematicians think. But again,
step back for a while - abstract models like probability are so in our
minds we no longer see that they are abstract. Here's an even simpler
example:
Ever seen a two?
I mean the number, not the digit.
[/quote:f711f73cd0]
Sure, we had a nice coffee yesterday ... but then, I get also visits from
Infinity. :D
Ciao
Niels |
|
|
| Back to top |
|
|
|
| Willem... |
Posted: Sun Oct 18, 2009 6:07 am |
|
|
|
Guest
|
Thomas Richter wrote:
) Again, sorry to object, but no.
)
) Here's a more concrete example to show you what I mean.
)
) Consider a fair coin, and we can assume(!) that p(A) = p(B) = 0.5.
) Of course, not exactly, but close to that. Now, I threw that coin four times, and I got:
)
) B B A B
)
) We now know everything. This is the file: B B A B.
) This is what I want to compress. Does this mean that p(B) = 3/4 and p(A) = 1/4 ??
No, but it *does* mean that p'(A) = 1/4 and p'(B) = 3/4
p'(A) = The probability that a symbol in that specific file is A.
(Same for p'(B) )
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Sun Oct 18, 2009 9:38 am |
|
|
|
Guest
|
biject schrieb:
[quote:05ee9e0206]On Oct 16, 10:15 am, Thomas Richter <t... at (no spam) math.tu-berlin.de> wrote:
biject schrieb:
If you have the whole file of concern there are no
"guesses" you can calculate the probabilities of each
symbol exactly.
NO! Damnit, no! You also confuse probabilities with relative frequencies.
You can make up a model that fits to those frequencies, but these *aren't*
probabilities.
So long,
Thomas
If you bothered actually reading what I wrote. You
would have noticed I mentioned the set of files that
are all possible permutations of a given file. So for
files of that set it is like knowing the individual
probabilities.
[/quote:05ee9e0206]
Again, sorry to object, but no.
Here's a more concrete example to show you what I mean.
Consider a fair coin, and we can assume(!) that p(A) = p(B) = 0.5.
Of course, not exactly, but close to that. Now, I threw that coin four times, and I got:
B B A B
We now know everything. This is the file: B B A B.
This is what I want to compress. Does this mean that p(B) = 3/4 and p(A) = 1/4 ??
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Mon Oct 19, 2009 12:44 am |
|
|
|
Guest
|
Willem schrieb:
[quote]Thomas Richter wrote:
) Again, sorry to object, but no.
)
) Here's a more concrete example to show you what I mean.
)
) Consider a fair coin, and we can assume(!) that p(A) = p(B) = 0.5.
) Of course, not exactly, but close to that. Now, I threw that coin four times, and I got:
)
) B B A B
)
) We now know everything. This is the file: B B A B.
) This is what I want to compress. Does this mean that p(B) = 3/4 and p(A) = 1/4 ??
No, but it *does* mean that p'(A) = 1/4 and p'(B) = 3/4
p'(A) = The probability that a symbol in that specific file is A.
(Same for p'(B) )
[/quote]
No, that kind of thing is a relative frequency, not a probability. You
would have a probability if you ask for the probability of the event
getting an A or a B from the file when picking a symbol from it "at
random", i.e. you throw a dice, then pick the item at the indicated
position. However, at this point, you have again(!) assumed (and not
measured) probabilities, namely that your dice is a fair dice, i.e. the
probability of picking the symbol at position i does not depend on i and
is 1/4.
Thus, again, probabilities aren't measured, but assumed. You have a
model from the process how you pick a symbol from the file.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Sun Nov 08, 2009 6:02 am |
|
|
|
Guest
|
On Oct 14, 7:50 am, Thomas Richter <t... at (no spam) math.tu-berlin.de> wrote:
[quote]sebastian schrieb:
Clearly, if you have a proper model, typical strings are the most likely strings generated by the source, and thus you have *usually* the compression ratio Huffman grants you, but in the unlikely event of an atypical string you might have no compression at all, and rather an expansion.
Of course, expansion only occurs from to the overhead of transmitting
to model. In compressing the symbols alone, Huffman encoding
guarantees an output less than or equal to the input size.
That wasn't my point, though. My point was rather: "NO, IT DOESN'T".
Example: use the following Huffman code:
A -> 1
B -> 01
C -> 000
D -> 001
Does this Huffman code ensure "compression"?
Not at all. If I feed it by *any* message I like to, *most* of the
messages will be expanded, so it clearly doesn't ensure it.
The reason is that this works only well if we really have a random
source with p(A) = 1/2, p(B) = 1/4, p(C)=p(D)=1/8, at least
approximately. If the source derives from that, compression will be
lower, or even an expansion.
However, *even for such a source*, messages like
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
are *not* impossible. Just unlikely. So you do get, at best,
"compression on average" over a huge number of input messages. You do
not get a compression "for a message generated by the random source".
You only get that for "typical messages". The above message would be
rather a-typical, but not impossible. And hence, even for a random
source that *fits* the Huffman code, Huffman doesn't *ensure*
compression. It just makes *compression likely to happen*.
That was the point I was trying to make.
People here often mix the concepts of probabilities, relative
frequencies, models, random sources, specific sequences, random
sequences, etc. and hence create an immense noise level.
So long,
Thomas
[/quote]
I apologize for reviving an old discussion, but I just wanted to set
the record straight. The Huffman algorithm is, in fact, guaranteed to
produce an output LESS THAN or EQUAL TO the original size. As an
example (sorry, I'm not very good at formulating mathematical proofs),
if you generate a 256 byte file with every possible permutation just
once, every symbol gets assigned an 8 bit code. Here's the output of a
program given that particular input set:
Huffman Test
File: '256.txt'
....reading file...
Uncompressed size: 256 bytes
....compressing...
*** Begin Debugging Log ***
0xd7 : 10100000
0xd8 : 10100001
'b' : 10100010
'Z' : 10100011
'(' : 10100100
0xf3 : 10100101
0xa0 : 10100110
0xf5 : 10100111
',' : 10101000
0x12 : 10101001
0xc1 : 10101010
0x83 : 10101011
0x1e : 10101100
0xaf : 10101101
0x15 : 10101110
0xb7 : 10101111
'j' : 10110000
'R' : 10110001
'4' : 10110010
0xf0 : 10110011
'c' : 10110100
'Y' : 10110101
0xe0 : 10110110
0xe1 : 10110111
0xc2 : 10111000
0x87 : 10111001
0 : 10111010
0xd9 : 10111011
'!' : 10111100
0x9e : 10111101
'z' : 10111110
'B' : 10111111
0xad : 10000000
0x8 : 10000001
0x4 : 10000010
0x1 : 10000011
'v' : 10000100
'F' : 10000101
'i' : 10000110
'S' : 10000111
0xb8 : 10001000
0xd : 10001001
'8' : 10001010
0xac : 10001011
0x99 : 10001100
0xf7 : 10001101
'u' : 10001110
'G' : 10001111
'=' : 10010000
0xbe : 10010001
0x8d : 10010010
0xfa : 10010011
'p' : 10010100
'L' : 10010101
'x' : 10010110
'D' : 10010111
'2' : 10011000
0x11 : 10011001
0xda : 10011010
0xdb : 10011011
'q' : 10011100
'K' : 10011101
0x91 : 10011110
0xf9 : 10011111
0x9c : 11100000
0xdf : 11100001
0xdd : 11100010
0xde : 11100011
0x3 : 11100100
0xd0 : 11100101
'y' : 11100110
'C' : 11100111
0x10 : 11101000
0xb2 : 11101001
'a' : 11101010
'[' : 11101011
0xe5 : 11101100
0xe2 : 11101101
0x94 : 11101110
0x98 : 11101111
0xbc : 11110000
0x19 : 11110001
0x82 : 11110010
'~' : 11110011
0x7f : 11110100
0x80 : 11110101
't' : 11110110
'H' : 11110111
'#' : 11111000
0xb6 : 11111001
'_' : 11111010
'^' : 11111011
0xe6 : 11111100
0xe7 : 11111101
0xc : 11111110
0xee : 11111111
'9' : 11000000
0xa9 : 11000001
0x5 : 11000010
0x6 : 11000011
0x81 : 11000100
0xfd : 11000101
'w' : 11000110
'E' : 11000111
'?' : 11001000
0xdc : 11001001
0x84 : 11001010
0x88 : 11001011
0xcd : 11001100
0xa6 : 11001101
0xc3 : 11001110
0x8b : 11001111
'}' : 11010000
'7' : 11010001
0xab : 11010010
0xd6 : 11010011
0xb9 : 11010100
0x16 : 11010101
'e' : 11010110
'W' : 11010111
0x92 : 11011000
0x8e : 11011001
0xc8 : 11011010
0x9f : 11011011
0xa1 : 11011100
''' : 11011101
's' : 11011110
'I' : 11011111
'1' : 00100000
'-' : 00100001
0xc5 : 00100010
0x93 : 00100011
0x95 : 00100100
0xf8 : 00100101
'{' : 00100110
'A' : 00100111
0x8c : 00101000
0x90 : 00101001
'h' : 00101010
'T' : 00101011
0xe : 00101100
0xb0 : 00101101
0xd4 : 00101110
0xd5 : 00101111
0xeb : 00110000
0xe8 : 00110001
'k' : 00110010
'Q' : 00110011
'5' : 00110100
0xb3 : 00110101
0xe3 : 00110110
0xe4 : 00110111
'r' : 00111000
'J' : 00111001
'm' : 00111010
'O' : 00111011
0x20 : 00111100
0x14 : 00111101
0xcc : 00111110
'.' : 00111111
0xa8 : 00000000
0xef : 00000001
'/' : 00000010
0xb4 : 00000011
0x9 : 00000100
0xae : 00000101
'>' : 00000110
';' : 00000111
0xc0 : 00001000
0xbf : 00001001
0xca : 00001010
0xa2 : 00001011
0x1f : 00001100
']' : 00001101
'g' : 00001110
'U' : 00001111
0x17 : 00010000
0xbb : 00010001
0xff : 00010010
0xfe : 00010011
'&' : 00010100
0x13 : 00010101
0x89 : 00010110
0xfb : 00010111
0x9a : 00011000
0x96 : 00011001
'o' : 00011010
'M' : 00011011
0xb : 00011100
0x1c : 00011101
'<' : 00011110
0x1a : 00011111
0xec : 01100000
0xed : 01100001
0x2 : 01100010
0xd3 : 01100011
0xa4 : 01100100
0xf2 : 01100101
0xcf : 01100110
':' : 01100111
0xcb : 01101000
'*' : 01101001
'n' : 01101010
'N' : 01101011
'|' : 01101100
' at (no spam) ' : 01101101
')' : 01101110
0xb5 : 01101111
'+' : 01110000
0xa5 : 01110001
0xf : 01110010
0xb1 : 01110011
'3' : 01110100
0xa7 : 01110101
0x85 : 01110110
0xfc : 01110111
0x8a : 01111000
0x86 : 01111001
0xa3 : 01111010
'%' : 01111011
0xa : 01111100
0x7 : 01111101
0xce : 01111110
'6' : 01111111
0xc4 : 01000000
0x8f : 01000001
'd' : 01000010
'X' : 01000011
0xd1 : 01000100
0xd2 : 01000101
'$' : 01000110
0xf4 : 01000111
0xe9 : 01001000
0xea : 01001001
0x18 : 01001010
0xba : 01001011
0xc6 : 01001100
0x97 : 01001101
'`' : 01001110
'\' : 01001111
'f' : 01010000
'V' : 01010001
0x1b : 01010010
0xbd : 01010011
'0' : 01010100
0xf1 : 01010101
0x9d : 01010110
0xf6 : 01010111
0xc7 : 01011000
0x9b : 01011001
0xaa : 01011010
0x1d : 01011011
0xc9 : 01011100
'"' : 01011101
'l' : 01011110
'P' : 01011111
Model: 63.875 bytes (511 bits)
Symbols: 256 bytes
Accumulator: 4 bytes
Total overhead: 323.875 bytes
*** End of Debugging Log ***
256 bytes processed in 0.36 seconds
Compressed size: 580 bytes (2.3e+02%)
....decompressing...
580 bytes processed in 0 seconds
....verifying...
Result: pass
In fact, the algorithm automatically adjusts itself in such a way that
no matter what the frequency of any given symbol is, a minimal code is
chosen such that the size of the input is never exceeded. It's an
interesting property that sets it apart from all others (that I know
of). If you still have any doubt, just imagine a 'two-bit' machine and
visualize every possible arrangement of the Huffman tree for a given
input, which really isn't too difficult.
And just to touch on the subject of entropy, I'm going to reiterate
what Thomas has said (although somewhat differently): Entropy has no
real bearing on the 'reality' of the state of a system. It is simply a
measure derived from whatever analysis is being applied to it, eg: the
'relative' order of the system as viewed by the observer. In other
words: Entropy is not 'absolute'. |
|
|
| Back to top |
|
|
|
| biject... |
Posted: Sun Nov 08, 2009 10:28 am |
|
|
|
Guest
|
On Nov 8, 9:02 am, sebastian <sebastianga... at (no spam) gmail.com> wrote:
[quote]I apologize for reviving an old discussion, but I just wanted to set
the record straight. The Huffman algorithm is, in fact, guaranteed to
produce an output LESS THAN or EQUAL TO the original size.
[/quote]
That only true if you don't include the information in
the table that tells which tree your using.
In fact if you don't include the stats simple arithmetic
coding will always match or bet huffman.
The main reason for this is that several files will
compress to the same exact file. So you need to know
which tree or stats to use so the file will uncompress
to the file you want.
Its when you include this data which is needed for
a general decompressor that you assumtions fall apart.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link" |
|
|
| Back to top |
|
|
|
| glen herrmannsfeldt... |
Posted: Sun Nov 08, 2009 12:27 pm |
|
|
|
Guest
|
sebastian <sebastiangarth at (no spam) gmail.com> wrote:
(snip)
[quote]I apologize for reviving an old discussion, but I just wanted to set
the record straight. The Huffman algorithm is, in fact, guaranteed to
produce an output LESS THAN or EQUAL TO the original size. As an
example (sorry, I'm not very good at formulating mathematical proofs),
if you generate a 256 byte file with every possible permutation just
once, every symbol gets assigned an 8 bit code.
[/quote]
Yes, but as was previously noted if you add the table to the
compressed file then it can get larger. That is more significant
for smaller source files.
I still like the compression system used by Ultrium tape drives.
There is one bit per block, probably in the block header, indicating
that the block is compressed. If, during writing, the drive finds
that the compressed data is larger it writes the uncompressed data
instead and indicates that using just one bit. So, compression will
never increase the size, but might decrease it.
-- glen |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Mon Nov 09, 2009 5:49 am |
|
|
|
Guest
|
I was just elaborating on my earlier statement:
"Of course, expansion only occurs from to the overhead of transmitting
to model. In compressing the symbols alone, Huffman encoding
guarantees an output less than or equal to the input size."
But yes, for smaller files, even an extra 300 or so bytes for the
model itself may be too expensive. Arithemetic is probably one of the
most ideal encoding methods, but unfortunately the overhead of the
statistics can be much larger still (an adaptive approach could omit
that, but at the cost of overall compression, though I'm not sure what
this works out to in practice). Algorithms such as LZW don't even
require transmission of the model, being adaptive in nature, but in
any case neither LZW nor arithmetic guarantee the level of non-
expansion of input that Huffman does (making it ideal for limited
resource environments). Probably the best kind of compressor then
would be a combination of special-purpose encoders (in conjunction
with the Ultrium marking technique, say), to achieve the best overall
performance. |
|
|
| Back to top |
|
|
|
| biject... |
Posted: Mon Nov 09, 2009 10:00 am |
|
|
|
Guest
|
On Nov 9, 8:49 am, sebastian <sebastianga... at (no spam) gmail.com> wrote:
....
Not sure where to start. But the nice thing
about adaptive arithmetic is that can always
compress to the same size as a static arithmetic
if coded correctly. So there is seldom a need for
a static arithmetic unless the table is known
separate from the file.
Again this precious so called no expansion of
huffman with no table. Is not as good
as the arithmetic with no table. Since one
should view huffman as a subset of arithmetic
compression and not really something as different.
Yes there are cases but mostly likely fewer
than you could think of where a static huffman
with a table beats the arithmetic with a table
but it a rare thing since even if the table data
for an arithmetic is slightly longer the better
compression often more than makes up for it.
Especially if one realizes that it easier
to make a bijective adaptive arithmetic. Than
a bijective static huffman with a table, though
it can be done its not pretty code.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link" |
|
|
| Back to top |
|
|
|
| nightlight... |
Posted: Mon Nov 09, 2009 10:13 am |
|
|
|
Guest
|
On Nov 8, 5:27 pm, glen herrmannsfeldt <g... at (no spam) ugcs.caltech.edu> wrote:
[quote]
I still like the compression system used by Ultrium tape drives.
There is one bit per block, probably in the block header, indicating
that the block is compressed. If, during writing, the drive finds
that the compressed data is larger it writes the uncompressed data
instead and indicates that using just one bit. So, compression will
never increase the size, but might decrease it.
[/quote]
There is a missing qualifier in that conclusion "compression will
never increase the size" which is: "... beyond the 1 bit increase paid
on every block every time." |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Tue Nov 10, 2009 6:31 am |
|
|
|
Guest
|
[quote]Not sure where to start. But the nice thing about adaptive arithmetic is that can always
compress to the same size as a static arithmetic if coded correctly.[/quote]
So there is seldom a need for
a static arithmetic unless the table is known separate from the file.
Interesting. I had always thought there would be some sort of
performance hit involved in that case. Can you elaborate on how this a
bit more?
[quote]Again this precious so called no expansion of huffman with no table. Is not as good
as the arithmetic with no table. Since one should view huffman as a[/quote]
subset of arithmetic
compression and not really something as different.
I see, so the difference isn't much. But still, in an embedded
environment, say, where you have a fixed size block to work with,
Huffman is nice since it carries a guarantee in output size, not to
mention it being a very fast algorithm. But for the general case, yes,
arithmetic is definitely superior.
[quote]Especially if one realizes that it easier to make a bijective adaptive arithmetic. Than
a bijective static huffman with a table, though it can be done its not[/quote]
pretty code.
I'm not sure what exactly a "bijective" Huffman encoder would be, but
as far as I know, the original algorithm can't be improved upon, that
is, Huffman's original design is essentially "perfect" for the type of
compression that it performs (which was confirmed by Shannon, I
believe). At any rate, it would be interesting to hear what the
difference between standard and bijective Huffman would be.
Arithmetic coding is pretty cool, anyway, and it's definitely next on
my list of compression algorithms to implement (though I'm going to
use the "range coding" approach to avoid any patent infringement). Any
words of advice for someone about to embark on such a task? |
|
|
| Back to top |
|
|
|
| biject... |
Posted: Tue Nov 10, 2009 7:54 am |
|
|
|
Guest
|
On Nov 10, 9:31 am, sebastian <sebastianga... at (no spam) gmail.com> wrote:
[quote]Not sure where to start. But the nice thing about adaptive arithmetic is that can always
compress to the same size as a static arithmetic if coded correctly.
So there is seldom a need for
a static arithmetic unless the table is known separate from the file.
Interesting. I had always thought there would be some sort of
performance hit involved in that case. Can you elaborate on how this a
bit more?
[/quote]
I would suggest something like
ftp://ftp.cs.brown.edu/pub/techreports/91/cs91-03.pdf
for a start read this document one of the best
on arithmetic on net
[quote]Again this precious so called no expansion of huffman with no table. Is not as good
as the arithmetic with no table. Since one should view huffman as a
subset of arithmetic
compression and not really something as different.
I see, so the difference isn't much. But still, in an embedded
environment, say, where you have a fixed size block to work with,
Huffman is nice since it carries a guarantee in output size, not to
mention it being a very fast algorithm. But for the general case, yes,
arithmetic is definitely superior.
Especially if one realizes that it easier to make a bijective adaptive arithmetic. Than
a bijective static huffman with a table, though it can be done its not
pretty code.
I'm not sure what exactly a "bijective" Huffman encoder would be, but
as far as I know, the original algorithm can't be improved upon, that
is, Huffman's original design is essentially "perfect" for the type of
compression that it performs (which was confirmed by Shannon, I
believe). At any rate, it would be interesting to hear what the
difference between standard and bijective Huffman would be.
[/quote]
Bijective huffman is in the case of file compression
just a way of encoding it so that every file would be
a valid file that could be uncompressed and when you
huffman compress it goes back. Its a more perfect than
normal huffman because it handles how you end the file.
and makes full use of the file space. Example in normal
huffman you may end not on a byte boudary so you need to
waste space to communicate that. In bijective huffman it
does not have these kinds of problems.
[quote]Arithmetic coding is pretty cool, anyway, and it's definitely next on
my list of compression algorithms to implement (though I'm going to
use the "range coding" approach to avoid any patent infringement). Any
words of advice for someone about to embark on such a task?
[/quote]
To me range coding is not that difference you just update by waiting
till more than one bit is ready.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link" |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Tue Nov 10, 2009 8:51 am |
|
|
|
Guest
|
On Nov 10, 9:54 am, biject <biject.b... at (no spam) gmail.com> wrote:
[quote]On Nov 10, 9:31 am, sebastian <sebastianga... at (no spam) gmail.com> wrote:
Not sure where to start. But the nice thing about adaptive arithmetic is that can always
compress to the same size as a static arithmetic if coded correctly.
So there is seldom a need for
a static arithmetic unless the table is known separate from the file.
Interesting. I had always thought there would be some sort of
performance hit involved in that case. Can you elaborate on how this a
bit more?
I would suggest something like
ftp://ftp.cs.brown.edu/pub/techreports/91/cs91-03.pdf
for a start read this document one of the best
on arithmetic on net
Again this precious so called no expansion of huffman with no table. Is not as good
as the arithmetic with no table. Since one should view huffman as a
subset of arithmetic
compression and not really something as different.
I see, so the difference isn't much. But still, in an embedded
environment, say, where you have a fixed size block to work with,
Huffman is nice since it carries a guarantee in output size, not to
mention it being a very fast algorithm. But for the general case, yes,
arithmetic is definitely superior.
Especially if one realizes that it easier to make a bijective adaptive arithmetic. Than
a bijective static huffman with a table, though it can be done its not
pretty code.
I'm not sure what exactly a "bijective" Huffman encoder would be, but
as far as I know, the original algorithm can't be improved upon, that
is, Huffman's original design is essentially "perfect" for the type of
compression that it performs (which was confirmed by Shannon, I
believe). At any rate, it would be interesting to hear what the
difference between standard and bijective Huffman would be.
Bijective huffman is in the case of file compression
just a way of encoding it so that every file would be
a valid file that could be uncompressed and when you
huffman compress it goes back. Its a more perfect than
normal huffman because it handles how you end the file.
and makes full use of the file space. Example in normal
huffman you may end not on a byte boudary so you need to
waste space to communicate that. In bijective huffman it
does not have these kinds of problems.
Arithmetic coding is pretty cool, anyway, and it's definitely next on
my list of compression algorithms to implement (though I'm going to
use the "range coding" approach to avoid any patent infringement). Any
words of advice for someone about to embark on such a task?
To me range coding is not that difference you just update by waiting
till more than one bit is ready.
David A. Scott
--
My Crypto codehttp://bijective.dogma.net/crypto/scott19u.ziphttp://www..jim.com/jamesd/Kong/scott19u.zipold version
My Compression codehttp://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
[/quote]
Okay, well thanks so much for the input, David. I'll be sure to look
into that PDF before I go much farther, as well. Cheers. |
|
|
| Back to top |
|
|
|
|
|
All times are GMT - 5 Hours
The time now is Wed Dec 09, 2009 6:06 pm
|
|