 |
|
| Science Forum Index » Compression Forum » A Revolutionary Breakthrough in Data Compression!... |
|
Page 2 of 3 Goto page Previous 1, 2, 3 Next |
|
| Author |
Message |
| Willem... |
Posted: Sat Sep 26, 2009 9:05 pm |
|
|
|
Guest
|
sebastian wrote:
)>> Get this through your head:
)
) Honestly, just because you are communicating with complete strangers
) doesn't mean you should abandon common courtesy, Willem. It's one
) thing to disagree with someone - and maybe even think they are
) completely bonkers - but that sort of tone is just really uncalled
) for.
I disagree. On several occasions, you were presented with solid
argumentation that you were wrong. You dismiss all of it with 'if only I
had some proof', meanwhile holding on to your own, unfounded, assertions.
Also, your posts contain several statements that had been previously
shown to be wrong, which clearly shows that you only read half of what
was written in this discussion, if even that.
To me, this shows that you lack the decency to listen to others.
If someone acts like that, then I drop a bit of common courtesy.
Here is, for your convenience, the closing argument once more:
Canonical encoding represents less information(*) than tree encoding.
Therefore canonical encoding takes less space, while remaining optimal.
*) One canonical encoding corresponds to multiple trees. Each of those
trees leads to exactly the same compressed size. The extra information
is equivalent to the exact choice of tree.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT |
|
|
| Back to top |
|
|
|
| nightlight... |
Posted: Sat Sep 26, 2009 9:45 pm |
|
|
|
Guest
|
On Sep 25, 10:29 pm, sebastian <sebastianga... at (no spam) gmail.com> wrote:
[quote:2f9c09ae03]What I have got is a minor improvement of the classic Huffman
algorithm...
EMIT_TREE(NODE)
IF NODE IS LEAF
EMIT_BIT(LEAF TAG)
EMIT_BYTE(NODE DATA)
ELSE
EMIT_BIT(NONLEAF TAG)
EMIT_TREE(LEFT CHILD OF NODE)
EMIT_TREE(RIGHT CHILD OF NODE)
Reconstruction from the stream is simple and unambiguous. And since a
Huffman tree is guaranteed to contain 511 or less nodes, the upper
bound for the total size of the compression model is 511 + ( 256 * 8 )
== 2559 bits == 319.875 bytes.
[/quote:2f9c09ae03]
Note first that you don't need 256*8 bits to encode permutation of 256
symbols since there are only 256! possible permutations, hence you
need only 256*(8-log(e)) ~ 256*6.56 ~ 1679 bits instead of 2048,
saving of ~369 bits. You can use Huffman code itself to encode the
permutation (e.g. use Huffman codes to encode the factorial radix
digits for the permutation), or to obtain the optimum code length of
1679 bits, use the Quantized Indexing permutation coder (you can get
the complete permutation encoding & decoding functions from the QI
source code here: http://www.1stworks.com/ref/qi.htm ; these will code
it at the same speed as Huffman, but much tighter).
Second, you don't need the 511 bits for the binary tree itself, since
like permutation, that is also a constrained sequence having ~2^511/
(511^3/2) possible patterns rather than 2^511 general binary patterns,
reducing thus the 511 bit length by the additional 1.5*log(511) ~ 13.5
bits. Huffman or arithmetic codes won't help you here, but enumerative
coder (such as the QI enumerative coder for the binary alphabet given
in the source above) will code it to the minimum length.
Third, explicitly encoding the tree as suggested, still sends more
info than needed for the decoding due to the granularity constraints
on the codes themselves for finite lenght sequences (the cannonical
method is better in this regard since it does recognize and take
advantage of this granularity constraint).
Fourth, regarding your "on average" argument, "on average" the
shortest encoding of a sequence of bytes is the plain 8 bit per symbol
binary code. |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Sun Sep 27, 2009 6:17 am |
|
|
|
Guest
|
[quote:2e28b30d34]I disagree. On several occasions, you were presented with solid argumentation that you were wrong. You dismiss all of it with 'if only I had some proof', meanwhile holding on to your own, unfounded, assertions.
...
To me, this shows that you lack the decency to listen to others. If someone acts like that, then I drop a bit of common courtesy.
[/quote:2e28b30d34]
Sorry, but failing to understand ones argument is hardly justification
for rudeness.
[quote:2e28b30d34]THe canonical method is also always optimal, unless somebody chooses to limit the maximal symbol length. This limit is by no means central to the method.
[/quote:2e28b30d34]
I've never heard that before from any descriptions of the canonical
method that I've read. If that is the case, though, then the two
methods are equally optimal, obviously.
[quote:2e28b30d34]Canonical encoding represents less information(*) than tree encoding. Therefore canonical encoding takes less space, while remaining optimal.
[/quote:2e28b30d34]
As far as the compression model goes, the canonical method has an
upper bound of 256 bytes using the standard approach, but have heard
that there are more efficient approaches that reduce this further,
though I have yet to see actual numbers stating the upper and lower
bounds for these methods. My current implementation requires 2.375 -
319.875 bytes of overhead, but from what I gather, these ranges could
also be reduced using advanced techniques (as detailed in Nighlight's
post). So I don't think the matter has been settled just yet.
What would be *really* useful here would be a canonical implementation
that could be used as a comparison.
at (no spam) nightlight: Many thanks, I'll look into those techniques. |
|
|
| Back to top |
|
|
|
| Willem... |
Posted: Sun Sep 27, 2009 10:41 am |
|
|
|
Guest
|
sebastian wrote:
)>> The canonical method is also always optimal, unless somebody chooses to limit the maximal symbol length. This limit is by no means central to the method.
)
) I've never heard that before from any descriptions of the canonical
) method that I've read. If that is the case, though, then the two
) methods are equally optimal, obviously.
Do you actually *understand* the canonical method ?
You seem to be arguing from "what you've read". That's hardly a basis for
argument. The facts I am presenting to you are based on a simple
understanding of the basic insight behind canonical encoding.
)>> Canonical encoding represents less information(*) than tree encoding. Therefore canonical encoding takes less space, while remaining optimal.
)
) As far as the compression model goes, the canonical method has an
) upper bound of 256 bytes using the standard approach,
My argument has *nothing to do* with any practical or standard approach.
It is a purely mathematical observation.
If you want to refute the argument, either address the argument or explain
what you do not understand about it. Don't start waving about some random
numbers you've read about somewhere.
) What would be *really* useful here would be a canonical implementation
) that could be used as a comparison.
No, what would be really useful is if you would first try to gain some
insight in the theoretical background behind huffman encoding and canonical
huffman.
If you don't understand it, just ask, we will be happy to explain the
basics and details of the theory. But please *do* try to understand
the theory before you start to make wild claims based on some numbers
that you've read somewhere.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Sun Sep 27, 2009 2:55 pm |
|
|
|
Guest
|
[quote:4d6763c97d]Do you actually *understand* the canonical method ?
[/quote:4d6763c97d]
Roughly: You reorder the codes first by length then lexigraphically,
start at zero, increment for each new code, left shifting if the
length increases. Do I understand the underlying mathematical
significance of the method? Not really. I can see that it makes
transmitting the model very efficient, but that's basically it. Now if
I were interested in actually implementing it, I would probably delve
into it a lot deeper. But I'm not, which is why I'm just "waving
around numbers" and resorting to "from what I've read" kinds of
statements. I know that may be hard for you dyed in the wool
mathematical geniuses to understand, but the fact is a programming
mortal such as myself has more pressing things to do than become an
expert at a method that may or *may not* be as compact as the one I am
currently using. Especially since this is a matter of, oh, say 100
bytes or so of difference!
Anyway, instead of arguing about it, I'll just concede that the
canonical method *may* yield a more compact model. If you or anyone
else can provide a concrete mathematical description (or even better,
a bit of pseudocode), great. If not, I'm not going to lose too much
sleep over it. |
|
|
| Back to top |
|
|
|
| biject... |
Posted: Mon Sep 28, 2009 3:44 am |
|
|
|
Guest
|
On Sep 28, 6:56 am, "Harold Aptroot" <harold.aptr... at (no spam) gmail.com> wrote:
[quote:f426a9f493]"sebastian" <sebastianga... at (no spam) gmail.com> wrote in message
news:0512965d-1cfd-453f-b8d8-ee313a626148 at (no spam) p36g2000vbn.googlegroups.com...
snip
Anyway, instead of arguing about it, I'll just concede that the
canonical method *may* yield a more compact model. If you or anyone
else can provide a concrete mathematical description (or even better,
a bit of pseudocode), great. If not, I'm not going to lose too much
sleep over it.
See RFC 1951, page 7http://tools.ietf.org/html/rfc1951#page-8
[/quote:f426a9f493]
I like the example at your reference.
It shows why the canonical form as
usually done is suboptimal. For the
example given in the link where you have
ABCD they show the link vector needed to
be attached as 1,2,3,3 when in reality if
doing huffman you only need 1,2 the others
the 3,3 does not need to be sent since the
3,3 is only valid huffman lengths possible
so its a waste of space to specify them.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link" |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Mon Sep 28, 2009 4:59 am |
|
|
|
Guest
|
[quote:43b4058f43]See RFC 1951, page 7
[/quote:43b4058f43]
Right, so what I got from that was that the code lengths are a minumum
of 5 bits each, since the code size is limited to 32 bits, and that
the lengths for all 256 characters are transmitted. Not to mention a
number of extra required and optional bits. Correct?
So again, my original calculation was, in fact, correct. A perfectly
optimal canonical algorithm would require a full 8 bits per symbol to
represent the code-lengths (as the maximum length would be 255 bits),
and so the total overhead would be 8 (bits for length) * 256 (total
symbols) == 256 bytes, obviously. Naturally, if you are willing to
sacrifice a bit of optimality for practical purposes (which Willem and
others have proven to be a minimal cost, for the general case), you
could use as little as 5 bits per symbol, or 160 bytes. Fine.
Needless to say, there may indeed be opportunities to compress the
compression model itself. Of course, this applies to the model I am
currently using, as well. So basically it all boils down to this: The
canonical model is more or less constant in size, and clearly
economical. The 'embedded tree' model is variant in size, but also
certainly economical (by some measure, more or less than the former).
Point is, the two are quite comparable, and thus equally viable
alternatives. I realize that I may have initially come across as
claiming that my method was somehow better than the canonical
approach, but I really didn't mean it that way. Just because you find
a clove of garlic for 3 cents cheaper across town from where you
normally shop doesn't mean it's necessarily worth the drive
(especially if the new competitors prices are subject to fluctuate
wildly!). Rather, I was merely stating that the method was 1) easier
to implement, and 2) *possibly* more compact (sometimes).
Okay, so what about the matter of decoding speed? As I said earlier,
the decoder I am using traverses the tree to output each symbol
(AFAIK, the only way possible with classical Huffman methods), and
this amounts to nearly three times slower than the encoder, which is a
very fast lookup-table based system. I honestly don't know how much
faster it would be to decode canonical codes, as I have no reference
implementation to speak of. A wild guess would be something like twice
as fast, although I'm not exactly clear what specific methods are used
to speed things up. Perhaps someone here would be more familiar with
the kind of performance to expect. In any case, if the processing rate
really is much better, I probably will switch to canonical encoding.
But as I said, without a reference, and given that the current method
I am using is reasonably fast, the idea of coding one up "just to see
for myself" simply hasn't presented itself as an urgent matter. |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Mon Sep 28, 2009 6:04 am |
|
|
|
Guest
|
[quote:9d0765b38d]If you build a huffman tree for a set of symbols, then you'll notice that at a lot of points along the way, you have a choice to make. A choice that does not actually make any difference at all on the resulting size. To be more specific: whenever you join two symbol sets, you have to assign a 0-bit to one set and a 1-bit to the other.
[/quote:9d0765b38d]
Okay, but that just sounds like you are describing specifically what
codes are chosen for each symbol, and in any case, both canonical and
classical produce 100% optimal codes (otherwise, Huffman's paper would
need to be retitled "A Method for the Construction of Minimum(ish)-
Redundancy Codes", wouldn't it?).
[quote:9d0765b38d]That's where canonical encoding comes in. As it turns out, if you only record the path length of each node, then you can reconstruct a tree, which happens to be one of the thousands of possible trees that would come out of the original method.
[/quote:9d0765b38d]
Right, and to transmit said lengths would require 8 bits for optimal,
perhaps 5 otherwise, *per symbol*, so say 256 to 160 bytes, depending
on which was chosen (for the moment setting aside possible compression
of the model itself), correct?
[quote:9d0765b38d]So, canonical encoding records less information than classical. It's quite simple to see, then, that canonical encoding takes less space.
[/quote:9d0765b38d]
Okay, but what is the *precise* upper-bound for this model (again,
neglecting post-compression for the moment)? |
|
|
| Back to top |
|
|
|
| Harold Aptroot... |
Posted: Mon Sep 28, 2009 6:56 am |
|
|
|
Guest
|
"sebastian" <sebastiangarth at (no spam) gmail.com> wrote in message
news:0512965d-1cfd-453f-b8d8-ee313a626148 at (no spam) p36g2000vbn.googlegroups.com...
<snip>
[quote:a0304607c7]Anyway, instead of arguing about it, I'll just concede that the
canonical method *may* yield a more compact model. If you or anyone
else can provide a concrete mathematical description (or even better,
a bit of pseudocode), great. If not, I'm not going to lose too much
sleep over it.
[/quote:a0304607c7]
See RFC 1951, page 7
http://tools.ietf.org/html/rfc1951#page-8 |
|
|
| Back to top |
|
|
|
| Willem... |
Posted: Mon Sep 28, 2009 7:14 am |
|
|
|
Guest
|
sebastian wrote:
)>> If you build a huffman tree for a set of symbols, then you'll notice that at a lot of points along the way, you have a choice to make. A choice that does not actually make any difference at all on the resulting size. To be more specific: whenever you join two symbol sets, you have to assign a 0-bit to one set and a 1-bit to the other.
)
) Okay, but that just sounds like you are describing specifically what
) codes are chosen for each symbol, and in any case, both canonical and
) classical produce 100% optimal codes (otherwise, Huffman's paper would
) need to be retitled "A Method for the Construction of Minimum(ish)-
) Redundancy Codes", wouldn't it?).
The *point* is that you do *not need* to describe specifically what codes
are chosen and *still* have a 100% optimal code.
)>> That's where canonical encoding comes in. As it turns out, if you only record the path length of each node, then you can reconstruct a tree, which happens to be one of the thousands of possible trees that would come out of the original method.
)
) Right, and to transmit said lengths would require 8 bits for optimal,
) perhaps 5 otherwise, *per symbol*, so say 256 to 160 bytes, depending
) on which was chosen (for the moment setting aside possible compression
) of the model itself), correct?
Only if you transmit it in a completely naive way, say if you didn't care
about a few hundred bytes of overhead. See below.
)>> So, canonical encoding records less information than classical. It's quite simple to see, then, that canonical encoding takes less space.
)
) Okay, but what is the *precise* upper-bound for this model (again,
) neglecting post-compression for the moment)?
To be honest, you kind of sound like a manager when you say that.
The precise answer is: it depends. All we can say for sure is that it
is lower than the precies upper-bound for the classical model.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Mon Sep 28, 2009 8:34 am |
|
|
|
Guest
|
[quote:59d68c8895]The *point* is that you do *not need* to describe specifically what codes are chosen and *still* have a 100% optimal code.
[/quote:59d68c8895]
Sure, that's a certainly a useful property, but then so is being able
to transmit just the symbols used and their respective positions in
the tree. For data containing N/2 unique symbols the latter, in fact,
produces *as compact* of a representation of the compression model (if
not more), than the former. That's the entire ASCII set.
[quote:59d68c8895]Only if you transmit it in a completely naive way, say if you didn't care about a few hundred bytes of overhead. See below.
[/quote:59d68c8895]
Yes, but the technique I am using is also naive (eg: no compression
applied to the model itself) and thus could probably be reduced as
well. I'm not saying that I'm certain that it can be, as I haven't yet
really looked into it, but my rationale is simply that if can be done
with canonical, then why not with this?
[quote:59d68c8895]To be honest, you kind of sound like a manager when you say that.
[/quote:59d68c8895]
Honestly, I just wanted to pin down a number because I was beginning
to doubt myself!
[quote:59d68c8895]The precise answer is: it depends. All we can say for sure is that it is lower than the precies upper-bound for the classical model.
[/quote:59d68c8895]
Which is exactly what I have been saying from the very beginning. I
don't have any doubt that the canonical method is an effective and
efficient approach to Huffman encoding. But neither do I see it as the
only reasonable way to do things. I think my method is a nice solution
in it's own right. Why don't you put it to the test yourself and see
just how well/badly it performs compared with a canonical algorithm? |
|
|
| Back to top |
|
|
|
| Willem... |
Posted: Mon Sep 28, 2009 9:41 am |
|
|
|
Guest
|
sebastian wrote:
)>> The *point* is that you do *not need* to describe specifically what codes are chosen and *still* have a 100% optimal code.
)
) Sure, that's a certainly a useful property, but then so is being able
) to transmit just the symbols used and their respective positions in
) the tree. For data containing N/2 unique symbols the latter, in fact,
) produces *as compact* of a representation of the compression model (if
) not more), than the former. That's the entire ASCII set.
In most compression schemes, there are more steps involved, and the
other symbols do get used. So just because text is ASCII doesn't mean
the huffman encoder only sees ASCII data.
That's probably the reason why they never cared to add any provisions
for sparse symbol sets to the canonical encoders.
)>> Only if you transmit it in a completely naive way, say if you didn't care about a few hundred bytes of overhead. See below.
)
) Yes, but the technique I am using is also naive (eg: no compression
) applied to the model itself) and thus could probably be reduced as
) well. I'm not saying that I'm certain that it can be, as I haven't yet
) really looked into it, but my rationale is simply that if can be done
) with canonical, then why not with this?
Simply because canonical contains *less information* and is therefore
inherently more compact. Is that really so hard to understand ?
)>> To be honest, you kind of sound like a manager when you say that.
)
) Honestly, I just wanted to pin down a number because I was beginning
) to doubt myself!
You don't have any numbers other than the worst and best case for classical
encoding, and you already have both those numbers for canonical as well.
(For canonical, both are smaller).
)>> The precise answer is: it depends. All we can say for sure is that it is lower than the precies upper-bound for the classical model.
)
) Which is exactly what I have been saying from the very beginning. I
) don't have any doubt that the canonical method is an effective and
) efficient approach to Huffman encoding. But neither do I see it as the
) only reasonable way to do things. I think my method is a nice solution
) in it's own right.
I still wonder what's new about your method ? Don't you think that
transmitting the tree was done even before canonical was invented ?
My very first huffman implementation (around 1990) transmitted the tree as
well. There is a Dr. Dobbs journal article about Huffman that details how
to transmit the tree, from that time as well (I used it as a reference).
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Mon Sep 28, 2009 11:59 am |
|
|
|
Guest
|
[quote:707cf329ee]In most compression schemes, there are more steps involved, and the other symbols do get used. So just because text is ASCII doesn't mean the huffman encoder only sees ASCII data.
[/quote:707cf329ee]
Well that's too bad, because there sure is a lot to be gained from an
undisturbed distribution.
[quote:707cf329ee]Simply because canonical contains *less information* and is therefore inherently more compact. Is that really so hard to understand ? You don't have any numbers other than the worst and best case for classical encoding, and you already have both those numbers for canonical as well. (For canonical, both are smaller).
[/quote:707cf329ee]
The real question is how you reached the conclusion that 160 is less
than 3? Look, I've offered the code up for review. I've provided solid
numbers that have proven to be correct. I've made an earnest effort to
make an honest assessment of the situation. If you want to continue
believing in fairytales, go right ahead. But I think that the most
pertinent facts have been sufficiently fleshed out.
[quote:707cf329ee]I still wonder what's new about your method ? Don't you think that transmitting the tree was done even before canonical was invented ? My very first huffman implementation (around 1990) transmitted the tree as well. There is a Dr. Dobbs journal article about Huffman that details how to transmit the tree, from that time as well (I used it as a reference).
[/quote:707cf329ee]
I've never heard as much, up to this point, no. And if so, why didn't
you mention it before? Saving it up for your coup de grāce, I suppose?
Even so, why should it even matter? Can we not first consider an
algorithm based strictly on it's merits, or shall we immediately deem
it worthless since, after all, "it's already been done before"?
Laplace discovered the black hole long before Hubble was even born,
but you don't hear anyone shouting "Been there! Done that!", either.
At any rate, a search of both Dr. Dobbs and Google turned up nothing,
so unless you can provide a more concrete reference (eg: issue#,
month, etc), I'm just going to have to assume that you are mistaken.
Respectfully,
- Sebastian Garth |
|
|
| Back to top |
|
|
|
| sebastian... |
Posted: Mon Sep 28, 2009 6:51 pm |
|
|
|
Guest
|
[quote:5a2ac83b32]Well, no. The code lengths are limited to 15. Also the lengths of all
286 literal/length symbols are sent, which includes the 256 literal
characters, the end-of-block code, and the 29 length prefixes. That is
followed by the description of another code with the code lengths for
30 distance prefix symbols. All of those code lengths are themselves
compressed using run-length compression (there are often runs of zeros
for symbols that don't appear) and Huffman coding.
Each length and distance prefix symbol implies a base value and number
of extra bits that follow the code to add to the base. The shorter
lengths and closer distances are more common, and so have no extra
bits, with more and more extra bits for the longer lengths and farther
distances.
Anyway, you are referring to the description of fixed blocks in section
3.2.6, where the Huffman code is pre-defined. The topic of this thread
is (I think) how to describe a Huffman code in the compressed stream.
That is what's done in dynamic blocks, which is described in section
3.2.7.
[/quote:5a2ac83b32]
Okay, I see (well, vaguely, at least). It's a bit of a mind-numbing
spec, anyway. It does seems odd, though, that it would go to all that
trouble and yet produce such a large compression model. But yes,
thanks for clarifying that.
[quote:5a2ac83b32]Upper bound of what?
[/quote:5a2ac83b32]
Of the transmitted compression model (in bits, say).
Anyway, this may not be the best example - I was really looking for a
reasonably tight implementation to compare with. I have looked into a
few of them, including David's bijective coders, and while they did
seem to generate the 'right' size for the compressed data, for some
reason, the generated decompressed stream was a bit scrambled. In any
event, the theoretical upper bound for the best case seems to be
around 160 bytes or so (with a 32-bit code limit). From the ones I
have tested, though (assuming the compressed output was correct), none
were more than 20 bytes smaller than my implementation (all being
small files, as well), and in most cases mine was several hundred
bytes smaller, for larger files. Which really just reinforces my
belief that the embedded tree method is a perfectly acceptable
approach. So basically, if anyone's to prove to me that canonical is
really so much better, at this point, it's probably (though not
necessarily) going to require a working program to convince me of the
fact. Either way, I think both are equally suitible for the job at
hand, so why split hairs over it? |
|
|
| Back to top |
|
|
|
| Willem... |
Posted: Tue Sep 29, 2009 10:43 am |
|
|
|
Guest
|
sebastian wrote:
)>> Simply because canonical contains *less information* and is therefore inherently more compact. Is that really so hard to understand ? You don't have any numbers other than the worst and best case for classical encoding, and you already have both those numbers for canonical as well. (For canonical, both are smaller).
)
) The real question is how you reached the conclusion that 160 is less
) than 3?
160 is for *all cases*
3 is the *best possible case*
You're comparing apples and oranges, like I've been saying all along.
What I'm saying is that the theoretical lower bound on canonical encoding
is strictly lower than the theoretical lower bound on classical encoding.
I'm not only saying that, I proved it several times. Rigorously.
The real question is why are you ignoring mathematical proofs ?
) I've never heard as much, up to this point, no. And if so, why didn't
) you mention it before? Saving it up for your coup de grāce, I suppose?
It was and is not pertinent to the discussion. As you yourself state
below.
) Even so, why should it even matter? Can we not first consider an
) algorithm based strictly on it's merits, or shall we immediately deem
) it worthless since, after all, "it's already been done before"?
Which is exactly why I did not mention it before. Sheesh.
) Laplace discovered the black hole long before Hubble was even born,
) but you don't hear anyone shouting "Been there! Done that!", either.
) At any rate, a search of both Dr. Dobbs and Google turned up nothing,
) so unless you can provide a more concrete reference (eg: issue#,
) month, etc), I'm just going to have to assume that you are mistaken.
Februari 1991, as per the very first hit on google.
Anyway, if you want actual numbers, I'll oblige.
Here's a canonical encoding scheme that I just came up with on the fly.
I'm pretty sure it is always shorter than your tree encoding:
- Sort all symbols on length, shortest first.
- Set current length to 0
Loop until all symbols have been processed (*):
While the length of the next symbol is larger than the current length:
Increase the current length by and and output a '1' bit.
Output a '0' bit.
Output the symbol.
*) You can tell when the last symbol was processed by checking what the
bit-encoding of the current symbol is. If you reached all-ones, you're
done.
For the case of 2 symbols, 'L' and 'R' this would transmit:
1 0 <bitcode for 'L'> <bitcode for 'R'>
For a total of 18 bits. That's one bit less than the tree encoding.
For the case of 256 equiprobable symbols, this would transmit:
1 1 1 1 1 1 1 1 and then 256 times (0 <a symbol>)
for a total of 289 bytes. The tree took over 300, didn't it ?
I'm sure you can work out some other cases.
(Note that this is not by any means the most optimal way of transmitting
the canonical model.)
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT |
|
|
| Back to top |
|
|
|
|
|
All times are GMT - 5 Hours
The time now is Tue Dec 08, 2009 8:00 am
|
|