| |
 |
|
|
Science Forum Index » Compression Forum » compression challenge
Page 1 of 1
|
| Author |
Message |
| mriedijk |
Posted: Tue Feb 26, 2008 11:05 am |
|
|
|
Guest
|
Hi,
I have a compression challenge:
We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.
We need a lossless compression algorithm that can compress that to 4
bytes.
When decompressing, we need to split the numbers and keep the order.
Does anyone has a suggestion in what direction to look or a solution?
Regards,
Michiel |
|
|
| Back to top |
|
| Mark Adler |
Posted: Tue Feb 26, 2008 11:38 am |
|
|
|
Guest
|
On Feb 26, 1:05 pm, mriedijk <michiel.ried...@gmail.com> wrote:
Quote: We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.
The only mileage you can get is out of "different", which I took to
mean that no byte value is repeated in the list of 16 numbers. That
would allow you to compress the 16 bytes to 15.9137 bytes. I took
"fixed order" to mean that the order of the numbers matters. If the
order doesn't matter, then you can get it down to 10.3824 bytes.
There would be need to be one or more very significant additional
constraints in order to get the number of possible sequences down to
2^32 (four bytes).
Mark |
|
|
| Back to top |
|
| Einstein |
Posted: Tue Feb 26, 2008 12:21 pm |
|
|
|
Guest
|
On Feb 26, 1:38 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
Quote: On Feb 26, 1:05 pm, mriedijk <michiel.ried...@gmail.com> wrote:
We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.
The only mileage you can get is out of "different", which I took to
mean that no byte value is repeated in the list of 16 numbers. That
would allow you to compress the 16 bytes to 15.9137 bytes. I took
"fixed order" to mean that the order of the numbers matters. If the
order doesn't matter, then you can get it down to 10.3824 bytes.
There would be need to be one or more very significant additional
constraints in order to get the number of possible sequences down to
2^32 (four bytes).
Mark
Initially I thought:
With a maximum of 256 outcomes, you should be able to easily manage
this. Represent first value. If remaining values are less than a given
binary representation, that representation is used next. Continue.
Worst case, you get 1, 2, 3, 4, 5, 6, 7, 8... this would mean 16
bytes. So therefore we need a command section to say less/more or we
can use this alternative where we deduce if most values are under
represented.
So 1 bit to say before each if the following number is within
[remaining bit length available / 2] then to run it. However worst
case this still creates issues with some outcomes, but statistically a
very low number of them, where you have 4 bits + 1 bit for every one
of the 16 bits. This could however be resolved with a single bit more
at the front if "Most are inside the first 1/2" and then an additional
indicator if most are inside the first "1/4". This would then mean 2+1
bits per on whatever we define most as, and worst case of 4+1 on the
remaining, plus 2 for a command section. If they landed in the 2nd
quarter we would spend 3+1 with a command of 2 bits.
We also could use a 'closer to, farther from' representation, and
normally compress, where 1 bit prior to each stored area would
indicate if the amount is up to 8 bits from the current location, or
if it is less than 5 bits from the current location. This would cover
32 outcomes close to the origin, but would still have issues overall
due to the addition of a bit to each individual.
There are other methods... I had made a count to system, where you had
4 bits that provided a count, then a bit to say "continue or end?"
where continue would add 4 more bits to the whole and then it would be
representing your number. This would handle only numbers within 16
points of the last number however, and would generally add a bit per
to the whole, not resulting in measurable compression... but the funny
thing about that is the counting can be compressed if commonly it
defaults on way or the other :P
I could manage it if I knew more info, but not the programming side.
Statistically this is somewhat a challenge, but not to much of a
challenge. |
|
|
| Back to top |
|
| mriedijk |
Posted: Sat Mar 01, 2008 2:05 am |
|
|
|
Guest
|
On Feb 26, 11:21 pm, Einstein <michae...@gmail.com> wrote:
Quote: On Feb 26, 1:38 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
On Feb 26, 1:05 pm, mriedijk <michiel.ried...@gmail.com> wrote:
We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.
The only mileage you can get is out of "different", which I took to
mean that no byte value is repeated in the list of 16 numbers. That
would allow you to compress the 16 bytes to 15.9137 bytes. I took
"fixed order" to mean that the order of the numbers matters. If the
order doesn't matter, then you can get it down to 10.3824 bytes.
There would be need to be one or more very significant additional
constraints in order to get the number of possible sequences down to
2^32 (four bytes).
Mark
Initially I thought:
With a maximum of 256 outcomes, you should be able to easily manage
this. Represent first value. If remaining values are less than a given
binary representation, that representation is used next. Continue.
Worst case, you get 1, 2, 3, 4, 5, 6, 7, 8... this would mean 16
bytes. So therefore we need a command section to say less/more or we
can use this alternative where we deduce if most values are under
represented.
So 1 bit to say before each if the following number is within
[remaining bit length available / 2] then to run it. However worst
case this still creates issues with some outcomes, but statistically a
very low number of them, where you have 4 bits + 1 bit for every one
of the 16 bits. This could however be resolved with a single bit more
at the front if "Most are inside the first 1/2" and then an additional
indicator if most are inside the first "1/4". This would then mean 2+1
bits per on whatever we define most as, and worst case of 4+1 on the
remaining, plus 2 for a command section. If they landed in the 2nd
quarter we would spend 3+1 with a command of 2 bits.
We also could use a 'closer to, farther from' representation, and
normally compress, where 1 bit prior to each stored area would
indicate if the amount is up to 8 bits from the current location, or
if it is less than 5 bits from the current location. This would cover
32 outcomes close to the origin, but would still have issues overall
due to the addition of a bit to each individual.
There are other methods... I had made a count to system, where you had
4 bits that provided a count, then a bit to say "continue or end?"
where continue would add 4 more bits to the whole and then it would be
representing your number. This would handle only numbers within 16
points of the last number however, and would generally add a bit per
to the whole, not resulting in measurable compression... but the funny
thing about that is the counting can be compressed if commonly it
defaults on way or the other :P
I could manage it if I knew more info, but not the programming side.
Statistically this is somewhat a challenge, but not to much of a
challenge.
Thanks, this gives us a direction to go after. |
|
|
| Back to top |
|
| Einstein |
Posted: Tue Mar 04, 2008 9:25 pm |
|
|
|
Guest
|
Quote: A good place for him to start would be to get the distribution of the 256
values for each byte.
No examples, just a statement
Plus the other attacks upon demonstrated examples.
At least I showed a willingness to try to help.
You sir did nothing.
Ergo I call thee troll.
Prove me wrong, one up me with solid proof, not just some quick
statement. Make it in the terms I tried to make. While I agree your
method has merit, it is only for those who understand it. Instead try
to show why you think so, how it works, and educate, dont show off.
If you do not in 48 hours I will killfile you as well as educate this
man on your method, how it would work, the problems with it, and such. |
|
|
| Back to top |
|
| MisterE |
Posted: Wed Mar 05, 2008 2:39 am |
|
|
|
Guest
|
Quote: Initially I thought:
With a maximum of 256 outcomes, you should be able to easily manage
this. Represent first value. If remaining values are less than a given
binary representation, that representation is used next. Continue.
Worst case, you get 1, 2, 3, 4, 5, 6, 7, 8... this would mean 16
bytes. So therefore we need a command section to say less/more or we
can use this alternative where we deduce if most values are under
represented.
why the fuck would you do that
Quote: So 1 bit to say before each if the following number is within
[remaining bit length available / 2] then to run it. However worst
case this still creates issues with some outcomes, but statistically a
very low number of them, where you have 4 bits + 1 bit for every one
of the 16 bits. This could however be resolved with a single bit more
at the front if "Most are inside the first 1/2" and then an additional
indicator if most are inside the first "1/4". This would then mean 2+1
bits per on whatever we define most as, and worst case of 4+1 on the
remaining, plus 2 for a command section. If they landed in the 2nd
quarter we would spend 3+1 with a command of 2 bits.
stupid
Quote: We also could use a 'closer to, farther from' representation, and
normally compress, where 1 bit prior to each stored area would
indicate if the amount is up to 8 bits from the current location, or
if it is less than 5 bits from the current location. This would cover
32 outcomes close to the origin, but would still have issues overall
due to the addition of a bit to each individual.
stupid
Quote: There are other methods... I had made a count to system, where you had
4 bits that provided a count, then a bit to say "continue or end?"
where continue would add 4 more bits to the whole and then it would be
representing your number. This would handle only numbers within 16
points of the last number however, and would generally add a bit per
to the whole, not resulting in measurable compression... but the funny
thing about that is the counting can be compressed if commonly it
defaults on way or the other
stupid
Quote: Statistically this is somewhat a challenge, but not to much of a
challenge.
You haven't done anything statistically.
You need to dervice representations from the statistical distribution of the
data, not the other way around.
A good place for him to start would be to get the distribution of the 256
values for each byte. |
|
|
| Back to top |
|
| |
|
Page 1 of 1
All times are GMT - 5 Hours
The time now is Sat Sep 06, 2008 1:01 am
|
|