 |
|
| Computers Forum Index » Computer Compression » Paying for assistance... |
|
Page 1 of 1 |
|
| Author |
Message |
| Einstein... |
Posted: Thu Aug 06, 2009 3:00 pm |
|
|
|
Guest
|
I am humbled that entropy does work. However I express I have
communication issues with 'normal society'. I am still having issues
understanding the scope of the thing I have invented.
Recently I hired a programmer to make this algorithm into software. I
have made it not include an actual compression utility, but instead
just break down things into a series of files which then we can
compress with the utility of our choice. I hope to have something
inside the week on this software to present some statistics, benchmark
comparisons, and the likes.
However I am lacking a coherent message for others to understand
exactly what I have made, as well as a model to show a curve on a
graph sheet of probabilities, and an understanding of the exact nature
that entropy takes on the system I have developed.
So, in this one forum most suited for understanding what I have
created, I am posting a fiscally rewarding challenge.
Please identify in industry and laymens terms what I have developed,
how it works, and some table, formula, or something strong
mathematically on how it works.
Please note this entire work is protected by Copyright 2008 and
applicable whitepaper type laws. I am the creator, cease reading if
you wish to disagree, I do not wish to debate this. All rights
reserved except as noted.
I will break down the system in full now
You take a given file, and convert to binary.
You identify the outcomes of any given 8 bit sequences (or any size
really, just 8 for the example) such as 01011100 = 1014 outcomes.
You create a dictionary of the outcomes based on a hard coded
replacement schedule, such as "most common outcome is replaced by
00000000, list this outcome first, then list next outcome"
Example on 2 bits
Hardcoded desired:
00
01
10
11
Example's statistics
00 = 1045
01 = 1135
10 = 1097
11 = 1104
Therefore the dictionary will list 01111000 as the string standing
that 01 becomes 00 in the next file, 11 becomes 01, 10 becomes 10, and
00 becomes 11. This dictionary is future appended to a command file
noting switches we do.
Next we start pulling bits out of the source file, modifying them per
the dictionary, and placing them in a new file. Repeat until ended.
The next step involves creating 3 result files and a command file. The
dictionary is permanently part of the command file, indeed the
dictionary's file can be the factual command files location.
Now we examine 2 bits in the new source file. If the outcome is a 00,
we place a 0 inside each of the three result files. If the outcome is
a 10 we place a 0 in the first two temp files, and a 1 in the last
temp file. If the outcome is a 01 we place a 0 in the first file, and
a 1 in the second file. Nothing goes in the third file in this case.
If we have a 11 result we place a 1 result in the first file, and
nothing in the 2nd or 3rd result files.
Now the part I have a hard time explaining to others.
We act as if each result file is a source file, and run the system
again. This creates 9 total result files, which for organizational
purposes I have created a 1, 2,3... 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1,
3-2, 3-3 naming system.
This process is then run one last time, resulting in the original
command file, 3 more, and then 9 more command files, which should be
counted in length of bits per file, and appended to each other to a
master command file. It also has resulted in 27 total 'result' files.
1-1-1 to 3-3-3.
Now it seems apparent to me that statistically (I may be mistaken) the
3, 3-3, and 3-3-3 file (and so forth if we were to continue this
process) will look very similar in numbers of 1 and 0 outcomes as the
very original source file. However at this stage the 1-1-1 file will
have a dramatic number of 0's versus 1 outcomes.
Yes I acknowledge also a significant size increase will occur. This of
course will vary based upon the source file's ratio balances... It
seems to me that 1-1-1 with a even balance of 0's versus 1's will end
up with a ratio of 0.996093750 to 0.003906250 in 0's versus 1's. In a
text example with an 80% to 20% ratio this seems to get more severe...
0.999997440 to 0.000002560. Of course there is incidental rounding in
there, excel will append 0's in place at the end... so this is far
from statistically 'perfect' but should suffice as an example. It
seems to me that if we had enough source material this could result in
2,560 total 1's, versus an astonishing 996,093,750 total 0's. It seems
difficult for me to expect this could not be extremely well compressed
with such a ratio.
The total size, minus the master command file, prior to compression
(of all the result files) in this latter example seems near 250% of
original size, yet with most of the results seeing a dramatic
modification of the ratio of 0's to 1's it seems that this may
actually be more compressible than the original files. Even the former
example, with a 50%/50% ratio with a net size increase to 170.2%
(rounded up) seems more compressible with the best ratio now being
99.6% to 0.4%, the next few all being better than 94% to 6%, and many
more still holding dramatic levels of ratio imbalances. This is of
course math based upon a simplistic Excel Table I have created, and
maybe you will find many errors with these conclusions, which would be
important for me to know!
The idea being is now we run an efficient compression system on the
resulting files, and those with bad compression outcomes can be left
alone (or if a whole tree, say 3-3-1, 3-3-2, 3-3-3 it could be undone
to merely 3-3), and those that do compress can be rated versus
compressing the branch above them as well (so 1-1-1, 1-1-2 and 1-1-3
can be size rated versus 1-1 to see which has the smaller file size)
with of course as required notes in the master command file.
The master command file is ultimately required, but I yawn at trying
to really score it's size in. It can be nearly universally fixed in
size, and therefore is only an issue with smaller files. Yes there are
ways to reduce it's total overhead rather than fixing it, but just
lets assume this is non-essential to my request and it is just there
so you understand this system is lossless and can be undone equaling
entirely the original source file.
The mission therefore is to make a technical explanation, a laymans
explanation, and a mathematical formula/graph/something which conveys
the exact capabilities of this system.
I would further request that some compression utilities be listed
which would be able to 'attack' this output the most efficiently,
resulting in the highest possible compression yields.
I am not to worried overall, using a simplistic huffman I typically
get decent results (not highest, but results which show compression
overall), and expect that a pretty decent system might be challenged
with a version which runs this first.
It is my extreme hope this will fit entropy like a glove, following
the curve exactly (after using the most optimum compression algorithm
following whatever it is I have created, what would this be called?)
The compensation: $200 for the best result, or $50 a category if split
up amongst different winners for 'technical description', 'lay mans
description', 'math based formula/graph/example of some sort', and
'most effective compression utility'.
I may also add cash for runner up if I think a really good effort was
tried, or if I am extremely impressed I may give an additional $200
for something that makes my jaw drop on top of giving the
compensation.
I am not rich, this represents nearly 2 weeks of starvation diet for
me, as well as a lucky pair of weeks driving truck during this damned
recession. I initially had thought I could not cover this for a few
weeks, so I am surprised that I am able to do this so quickly, prior
to software being completed.
Please understand I am tired as I type this, I am in my math mind,
which is not a clear mind for others to understand my text always, and
I am in my nerd mode, which reduces my effectiveness across the board
to the nth power. So I apologize for issues.
Please post all answers inside here, though links to outside pages are
fine, I will not open any exe's or the likes (most likely). The
challenge ends in 14 days from this posting, at which time a winner
would be announced. If no entries are suitable, or work, there will be
no prize issued, and I will post such. I understand that this
community would be a great judge if I cheated someone, and could
provide expert testimony to such an effect if someone did answer
properly and effectively and if I cheated the said person. I have a
corrections officer degree as well and this further increases my
knowledge of my full liabilities.
The Winner(s) will be contacted via email to provide how they wish the
payment, via paypal or via cashiers check.
Michael Harrington, aka Einstein |
|
|
| Back to top |
|
|
|
| MichaelHH... |
Posted: Fri Aug 07, 2009 12:32 am |
|
|
|
Guest
|
My apologies, the dictionary sort is per each subset of files from a
source file, if I make 1-1 into 1-1-1, 1-1-2, 1-1-3 then a new
dictionary is made. I had thought that this was clear.
As for the 'wasting time' it is because this has eaten my life, and I
need answers. |
|
|
| Back to top |
|
|
|
| MichaelHH... |
Posted: Fri Aug 07, 2009 1:04 am |
|
|
|
Guest
|
On Aug 6, 6:32 pm, MichaelHH <michae... at (no spam) gmail.com> wrote:
Quote: My apologies, the dictionary sort is per each subset of files from a
source file, if I make 1-1 into 1-1-1, 1-1-2, 1-1-3 then a new
dictionary is made. I had thought that this was clear.
As for the 'wasting time' it is because this has eaten my life, and I
need answers.
Another issue others may take against me without really looking at
this.
The resulting files may not be balanced in bit representation. They
may be odd numbered (or it may be necessary to increase them to
roundable to a byte) and therefore in the command file there is a
section that would identify the need and thus choose fill or no fill,
and identify the length of the filler. Since we are talking so few
added bits, it is a non-issue for size in the end run, and not germane
to the request at hand except in so far as to make sure no 'holes' are
left.
My life has been wasted by this, and I will get answers. |
|
|
| Back to top |
|
|
|
| glen herrmannsfeldt... |
Posted: Sat Aug 08, 2009 8:33 am |
|
|
|
Guest
|
Thomas Richter <thor at (no spam) math.tu-berlin.de> wrote:
(big snip)
Quote: You are looking for an answer that motivates these a posteriori,
however, you should have motivated these rules a priori, *before* you
designed these rules. For example, there are *no* such "ad-hoc"
decisions in the Huffman coding, the BWT, or the arithmetic coding.
All "replacement rules" are only-data driven, and no "magic
constants" like applying a rule a given number of times, or a
rule that is data-independent.
I might not say that for arithmetic coding, but the result is
close enough not to worry about it. Sometimes speed is more
important than getting every last bit out of a compression
algorithm. Ad hoc rules sometimes help. Otherwise, I
agree for Huffman and BWT.
-- glen |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Sat Aug 08, 2009 11:15 am |
|
|
|
Guest
|
MichaelHH wrote:
Quote: Another issue others may take against me without really looking at
this.
The resulting files may not be balanced in bit representation. They
may be odd numbered (or it may be necessary to increase them to
roundable to a byte) and therefore in the command file there is a
section that would identify the need and thus choose fill or no fill,
and identify the length of the filler. Since we are talking so few
added bits, it is a non-issue for size in the end run, and not germane
to the request at hand except in so far as to make sure no 'holes' are
left.
My life has been wasted by this, and I will get answers.
You got an answer. "Don't let your life be wasted by this." Wasn't this
explicit enough?
Look, your "method" is completely ad-hoc, at best an unproven heuristic,
at worst a complete waste of time.
Try to think about the following - are there any deeper reasons why:
- you have exactly three output files (what is so special about this
number)?
- you have exactly the given replacement rules (what is so special about
them except that you create more zeros - though the rules do not even
depend on the input file)?
- you apply the method exactly thrice. Again, what is so special about
the number three?
You are looking for an answer that motivates these a posteriori,
however, you should have motivated these rules a priori, *before* you
designed these rules. For example, there are *no* such "ad-hoc"
decisions in the Huffman coding, the BWT, or the arithmetic coding. All
"replacement rules" are only-data driven, and no "magic constants" like
applying a rule a given number of times, or a rule that is data-independent.
This is why you fail. You start from the wrong end. Compression
algorithms are designed by some *insight* into the nature of the source,
not by pulling arbitrary rules out of your nose.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| MichaelHH... |
Posted: Sat Aug 08, 2009 8:42 pm |
|
|
|
Guest
|
On Aug 8, 1:15 am, Thomas Richter <t... at (no spam) math.tu-berlin.de> wrote:
Quote: MichaelHH wrote:
Another issue others may take against me without really looking at
this.
The resulting files may not be balanced in bit representation. They
may be odd numbered (or it may be necessary to increase them to
roundable to a byte) and therefore in the command file there is a
section that would identify the need and thus choose fill or no fill,
and identify the length of the filler. Since we are talking so few
added bits, it is a non-issue for size in the end run, and not germane
to the request at hand except in so far as to make sure no 'holes' are
left.
My life has been wasted by this, and I will get answers.
You got an answer. "Don't let your life be wasted by this." Wasn't this
explicit enough?
Look, your "method" is completely ad-hoc, at best an unproven heuristic,
at worst a complete waste of time.
Try to think about the following - are there any deeper reasons why:
- you have exactly three output files (what is so special about this
number)?
- you have exactly the given replacement rules (what is so special about
them except that you create more zeros - though the rules do not even
depend on the input file)?
- you apply the method exactly thrice. Again, what is so special about
the number three?
You are looking for an answer that motivates these a posteriori,
however, you should have motivated these rules a priori, *before* you
designed these rules. For example, there are *no* such "ad-hoc"
decisions in the Huffman coding, the BWT, or the arithmetic coding. All
"replacement rules" are only-data driven, and no "magic constants" like
applying a rule a given number of times, or a rule that is data-independent.
This is why you fail. You start from the wrong end. Compression
algorithms are designed by some *insight* into the nature of the source,
not by pulling arbitrary rules out of your nose.
So long,
Thomas
The efforts involving 3 bit sequences creating up to 8 files was a
horrific fail when I was hobbying this many years ago. The 2 bit
version naturally makes 3 temp files.
The 'arbitrary' run it 3 times is more a misnomer.
You can, as I said, run it any number of times on any of the resulting
files, resulting in whatever formation you desire, but ultimately to
keep it simple I chose 3 times on all.
An example I guess
00 appears 2290 times
10 appears 2351 times
01 appears 3321 times
11 appears 1510 times
----------- Sorted by frequency:
11 appears 1510 times
00 appears 2290 times
10 appears 2351 times
01 appears 3321 times
This on a small txt file of some interesting information I was reading
and wanted to save.
So text commonly has an imbalance existing. Enough of that elementary
for you of a lesson however.
In this example roughly 60% of the code is 0's and 40% is 1's, if you
did my switch.
Now it gets more impressive at 4 bits.
----------- Sorted by frequency:
1011 appears 8 times
1010 appears 15 times
1101 appears 62 times
1000 appears 86 times
1100 appears 102 times
1111 appears 134 times
1110 appears 160 times
0001 appears 162 times
1001 appears 179 times
0011 appears 231 times
0100 appears 291 times
0101 appears 316 times
0000 appears 423 times
0010 appears 572 times
0111 appears 679 times
0110 appears 1316 times
Using the replacement at the start we can get a ratio of 5131 total
1's to 13813 total 0's
Again for you this is elementary work. This is a 73% to 27% ratio
difference, and is a very optimum level for compression.
Lower is less 'nice' higher is more 'nice'. All elementary.
After my method however, file 1.1.1 will have (if I am correct) a
probability ratio of a single 1, to about 25,000 0's
File 1.1.2 = 132 to 24,867
File 1.1.3 = 132 to 24,735
File 1.2.1 = 115 to about 24,700
File 1.2.2 = 1575 to about 23,000
File 1.2.3 = 1575 to 21,603
File 1.3.1 = 123 to about 23,000
1.3.2 = 1566 to +/- 21,400
1.3.3 = 1566 to 19,921
2.1.1 = 47 to 23,138
skipping down
2.3.3 = 11,315 to 3,055
3.1.1 = 97 to 18,153
And then file 3.3.3 will have 2,626 to 7,100
The interesting aspects here are that so many files are so greatly
altered. I admit yes there is an over-all size increase, but I can
after lots of years of tinkering KNOW that if the ratio is strongly
skewed enough, it is compressible, and sometimes even wildly
compressible. I listed 13 out of 27 files... yes... But the size of
files drops from a high in the range of 25,000 to a low in the range
of 8071.
The first file alone is a possibility where just counting where the 1
is can occur even! 16 bits to stand for it's 25000 bits is rather
dramatic. Then I need to account for about 106548 bits to achieve a
show of compression, out of a growth from 100,000 bits to 231,547
bits. But lets not stop there, we KNOW there are compression utilities
that love the high skews I show. If there is 115 bits inside 24000
that are 1's, we can still count them as well. 16 bits per at worst,
and of course we know that could be diminished. Now the need to reduce
is reduced even more, to a mere 7500. So Far this covers a grand total
of near on 138,000 bits removed in exchange for about 11,000 returned.
If that.
I feel like there is an error somewhere, but I cannot find it..
Swap any values, and the count on them comes in place...
So they get swapped back.
Change the ratio of 11 to 10 and 01 outcomes, and it reduces the total
of the file 1 result to a heavier favoring of 0's and make it an even
33.3% on the top 3 results and we still have a go back and do not do
this step on this file option, thus no size increase, which could be
of a lower down tree'd of a file. We can stop when using my system is
not wise, and we can continue when it is wise. Net result is that
either the file remains the same, or it compresses, and it seems my
system can set up a tighter curve on compression... to me at least.
Now there are claims that Shannons lower bound means there is still
plenty of room to complete compression. I am hoping I have done so.
Now to whit, what I have done, in my knowledge, is provide a system
which inflates a file, but significantly alters the entire ratio not
in the whole of the data, but by isolating the data to different
streams, or files, and makes some so intristically altered as to make
them 'bait' for a compression algorithm. The size increase would be
negated by the sudden huge compressions available on a multitude of
the files.
Now as I made a challenge, the challenge is offering money, it's
because I ran out of answers. I will always have a hard time answering
you to your college/university/God level education, and if I must, I
apologize. I am asking you to help for a fiscal recompensation and if
this is not enough I will advertise elsewhere for a while, while I try
to earn money, hoping that 90% of my spare cash is enough. Yes I will
eat peanut butter sandwhiches for a month to make this payment, and to
pay my programmer. If I had a million dollars I would gladly give 250
grand to the best programmer, and 250 grand to the best explainer to
get it over with, even though probably 99.9% of the people visiting
this group would laugh at me for paying so an extreme amount of money.
How much do you value a life at though?
So please, do not beat me up for not knowing the answers. I have
already admitted this at the start, and offered to pay for the
answers. |
|
|
| Back to top |
|
|
|
| MichaelHH... |
Posted: Sun Aug 09, 2009 9:09 pm |
|
|
|
Guest
|
So Thomas,
You are essentially saying I go down to entropy with this, and there
is no chance of going worse, and you have said it is unique, but seem
to imply it is not better than other methods already existing? Or am I
misunderstanding you sir?
BTW the only thing I will need to brush up on is integers again, been
a while. I use log a lot in determining the bits required to count
something with A=log(B)/log(2). This seems pretty straight forward,
though I am sleepy so I wont be making a table asap. But I will review
this soon as I can.
Could you provide some online sources for the low bounds of entropy
also please? Perhaps also those compressors which reach this limit? |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Sun Aug 09, 2009 9:38 pm |
|
|
|
Guest
|
MichaelHH wrote:
Quote: On Aug 8, 1:15 am, Thomas Richter <t... at (no spam) math.tu-berlin.de> wrote:
MichaelHH wrote:
The efforts involving 3 bit sequences creating up to 8 files was a
horrific fail when I was hobbying this many years ago. The 2 bit
version naturally makes 3 temp files.
The 'arbitrary' run it 3 times is more a misnomer.
You can, as I said, run it any number of times on any of the resulting
files, resulting in whatever formation you desire, but ultimately to
keep it simple I chose 3 times on all.
Oh well, let's waste my time, I'm in a good mood.
First, forget about the input being 8 bit sequences. This is just
uninteresting, and forget about the dictionary, this is only a
technicality and we can assume without loss of generality that the input
is sorted by probability (you will see that this doesn't matter, in
fact). For simplicity, let's assume that your input consists of two-bit
symbols (that is, the input alphabet is {00,01,10,11}). Then you apply
the substitution rules as you describe:
00 -> 000
01 -> 001
10 -> 01
11 -> 1
where you read the output files "column-wise", i.e. the first column
belongs to the first file, and so on.
Now define
p1 := probability of finding 00
p2 := probability of finding 01
p3 := probability of finding 10
p4 := probability of finding 11
Let's further assume that the number of bits in the input is N, i.e. you
have N/2 symbols in the input.
Then, we compute the expected size of the (uncompressed) output files:
For file 1, we clearly have:
l1 = N/2 (one bit per input symbol)
For file 2:
l2 = (p1+p2+p3) * N/2 (only output for input symbols 1,2,3 = {00},{01},{10})
For file 3:
l3 = (p1+p2) * N/2 (reasoning as above).
Now, we compute the entropy of each file (zero-order). This is as much
as you can get by "compressing" them with a standard huffman or
arithmetic coder. For that, we need the probablities of finding a 0 or a
1 in file n. Let's call them p10,p11 for file 1, p20,p21 for file 2,
p30, p31 for file 3. We get:
p10 = p1+p2+p3
p11 = p4
p20 = (p1+p2)/(p1+p2+p3)
p21 = (p3) /(p1+p2+p3) (conditioned probabilities if you like)
p30 = (p1) / (p1+p2)
p31 = (p2) / (p1+p2)
From that, one can compute the entropies, h1,h2,h3. The expected
lengths of files 1,2,3 are then given by multiplying the expected
(input) bit count by the entropy of the given file, i.e.
e1 = l1 * h1 = l1 * (p10 * log(p10) + p11 * log(p11))
and ditto for files 2 and 3.
Adding e1,e2,e3 up gives the total expected size of the overall output.
Here one gets (after putting all things together):
e = N/2 * (ln(p1 + p2 + p3) p1 + ln(p1 + p2 + p3) p2 + ln(p1 + p2 + p3)
p3 + p4 ln(p4) + ln((p1+p2)/(p1+p2+p3))*p1 + ln((p1+p2)/(p1+p2+p3))*p2 +
ln((p3)/(p1+p2+p3))*p3 + ln((p1) / (p1+p2))*p1 + ln((p2)/ (p1+p2)) * p2)
This still looks fairly complicated, but it is not. Use ln(a/b) = ln(a)
- ln(b), and insert this in the above formula, and cancel terms. You get
(after some back and forth, but nothing complicated):
e = N/2 * (ln(p1)*p1 + ln(p2) * p2 + ln(p3) * p3 + ln(p4) * p4)
Looks familiar? Of course, it is the entropy of the source. IOW, had you
used the ideal huffman or arithmetic coder on the source alphabet
{00,01,10,11}, you would have gotten the same length.
So, IOW, the first step of your method is enough, and it is nothing more
than a complicated method to compress down to entropy - so at least, it
doesn't get it any worse, which is neat. You don't need any further step
right here with the given source (zero order, four symbol alphabet).
By using the binary representation of *any other* source, you'll get it
worse since your method cannot distinguish from *where* in the source
coding the bit-pairs come from. Naturally, the first two bits (MSBs) are
much more weighted than the LSBs. Thus, instead of using two bits, you
could use an eight-bit representation, and the above math would go
through as above, and in *this* ideal case, you would again compress
down to entropy.
Applying the method a second time (on such ideal sources) doesn't give
you anything because it is already at entropy level.
Quote: I feel like there is an error somewhere, but I cannot find it..
No error, but no gain either. Just too complicated.
Quote: Now there are claims that Shannons lower bound means there is still
plenty of room to complete compression. I am hoping I have done so.
Claims are wrong. In fact, huffman wastes *at most* one bit per symbol
for the number of symbols going to infinity, arithmetic is ideal in the
limiting case. So where's the "room" you talk about?
The room is in finding the proper data models for sources, i.e.
"guessing" what the symbols and their probabilities will be.
Quote: Now to whit, what I have done, in my knowledge, is provide a system
which inflates a file, but significantly alters the entire ratio not
in the whole of the data, but by isolating the data to different
streams, or files, and makes some so intristically altered as to make
them 'bait' for a compression algorithm. The size increase would be
negated by the sudden huge compressions available on a multitude of
the files.
It does, but it doesn't go below entropy. The outputs are "ideally"
compressible to a degree that would reach exactly the entropy of the
source. That is, the bias is "only good enough" for a post-compressor
(of said files) to hit the lower bound. Which is neat, but "no
improvement". Of course not. Shannon is a theorem.
Quote: Now as I made a challenge, the challenge is offering money, it's
because I ran out of answers. I will always have a hard time answering
you to your college/university/God level education, and if I must, I
apologize. I am asking you to help for a fiscal recompensation and if
this is not enough I will advertise elsewhere for a while, while I try
to earn money, hoping that 90% of my spare cash is enough. Yes I will
eat peanut butter sandwhiches for a month to make this payment, and to
pay my programmer. If I had a million dollars I would gladly give 250
grand to the best programmer, and 250 grand to the best explainer to
get it over with, even though probably 99.9% of the people visiting
this group would laugh at me for paying so an extreme amount of money.
How much do you value a life at though?
Nothing at university level here. This is quite elementary mathematics -
take a pen, fill in the gaps I left (elementary algebra), you have an
answer in 15 minutes.
Quote: So please, do not beat me up for not knowing the answers. I have
already admitted this at the start, and offered to pay for the
answers.
Spare the money. In fact, I would suggest taking the $200 and invest it
in a good book, and take some time off in the evening to get the math
right. Actually, thanks for that - it looks like a good exercise for a
compression class.
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Mon Aug 10, 2009 12:20 pm |
|
|
|
Guest
|
MichaelHH wrote:
Quote: So Thomas,
You are essentially saying I go down to entropy with this, and there
is no chance of going worse, and you have said it is unique, but seem
to imply it is not better than other methods already existing? Or am I
misunderstanding you sir?
You understand correctly. However, this goes only if your input alphabet
is four symbols large, not 256 as you use it, in which case it will not
compress to entropy because the method misses that the distribution of
the LSBs depends on the MSBs. The result furthermore depends on
"idealized" compressors for the files 1-3, i.e. those that compress
(already) down to entropy. For any real compressor there, you will have
some loss.
Quote: Could you provide some online sources for the low bounds of entropy
also please? Perhaps also those compressors which reach this limit?
Sorry, I don't understand?
So long,
Thomas |
|
|
| Back to top |
|
|
|
| Thomas Richter... |
Posted: Mon Aug 10, 2009 3:56 pm |
|
|
|
Guest
|
Thomas Richter wrote:
Quote: Could you provide some online sources for the low bounds of entropy
also please? Perhaps also those compressors which reach this limit?
Sorry, I don't understand?
Ah, I think you want to look at the Kraft inequality. Go to the
wikipedia for it - it defines a lower bound for a code to be uniquely
decodeable.
So long,
Thomas |
|
|
| Back to top |
|
|
|
|
|
All times are GMT
The time now is Sun Mar 21, 2010 9:51 pm
|
|