Main Page | Report this Page
 
   
Science Forum Index  »  Compression Forum  »  compression challenge
Page 1 of 1    
Author Message
mriedijk
Posted: Tue Feb 26, 2008 11:05 am
Guest
Hi,
I have a compression challenge:
We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.
We need a lossless compression algorithm that can compress that to 4
bytes.
When decompressing, we need to split the numbers and keep the order.
Does anyone has a suggestion in what direction to look or a solution?
Regards,
Michiel
Mark Adler
Posted: Tue Feb 26, 2008 11:38 am
Guest
On Feb 26, 1:05 pm, mriedijk <michiel.ried...@gmail.com> wrote:
Quote:
We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.

The only mileage you can get is out of "different", which I took to
mean that no byte value is repeated in the list of 16 numbers. That
would allow you to compress the 16 bytes to 15.9137 bytes. I took
"fixed order" to mean that the order of the numbers matters. If the
order doesn't matter, then you can get it down to 10.3824 bytes.

There would be need to be one or more very significant additional
constraints in order to get the number of possible sequences down to
2^32 (four bytes).

Mark
Einstein
Posted: Tue Feb 26, 2008 12:21 pm
Guest
On Feb 26, 1:38 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
Quote:
On Feb 26, 1:05 pm, mriedijk <michiel.ried...@gmail.com> wrote:

We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.

The only mileage you can get is out of "different", which I took to
mean that no byte value is repeated in the list of 16 numbers. That
would allow you to compress the 16 bytes to 15.9137 bytes. I took
"fixed order" to mean that the order of the numbers matters. If the
order doesn't matter, then you can get it down to 10.3824 bytes.

There would be need to be one or more very significant additional
constraints in order to get the number of possible sequences down to
2^32 (four bytes).

Mark

Initially I thought:
With a maximum of 256 outcomes, you should be able to easily manage
this. Represent first value. If remaining values are less than a given
binary representation, that representation is used next. Continue.

Worst case, you get 1, 2, 3, 4, 5, 6, 7, 8... this would mean 16
bytes. So therefore we need a command section to say less/more or we
can use this alternative where we deduce if most values are under
represented.

So 1 bit to say before each if the following number is within
[remaining bit length available / 2] then to run it. However worst
case this still creates issues with some outcomes, but statistically a
very low number of them, where you have 4 bits + 1 bit for every one
of the 16 bits. This could however be resolved with a single bit more
at the front if "Most are inside the first 1/2" and then an additional
indicator if most are inside the first "1/4". This would then mean 2+1
bits per on whatever we define most as, and worst case of 4+1 on the
remaining, plus 2 for a command section. If they landed in the 2nd
quarter we would spend 3+1 with a command of 2 bits.

We also could use a 'closer to, farther from' representation, and
normally compress, where 1 bit prior to each stored area would
indicate if the amount is up to 8 bits from the current location, or
if it is less than 5 bits from the current location. This would cover
32 outcomes close to the origin, but would still have issues overall
due to the addition of a bit to each individual.

There are other methods... I had made a count to system, where you had
4 bits that provided a count, then a bit to say "continue or end?"
where continue would add 4 more bits to the whole and then it would be
representing your number. This would handle only numbers within 16
points of the last number however, and would generally add a bit per
to the whole, not resulting in measurable compression... but the funny
thing about that is the counting can be compressed if commonly it
defaults on way or the other :P


I could manage it if I knew more info, but not the programming side.


Statistically this is somewhat a challenge, but not to much of a
challenge.
mriedijk
Posted: Sat Mar 01, 2008 2:05 am
Guest
On Feb 26, 11:21 pm, Einstein <michae...@gmail.com> wrote:
Quote:
On Feb 26, 1:38 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:



On Feb 26, 1:05 pm, mriedijk <michiel.ried...@gmail.com> wrote:

We have a sequence of 16 different numbers in a fixed order, each
number between 0-255.

The only mileage you can get is out of "different", which I took to
mean that no byte value is repeated in the list of 16 numbers.  That
would allow you to compress the 16 bytes to 15.9137 bytes.  I took
"fixed order" to mean that the order of the numbers matters.  If the
order doesn't matter, then you can get it down to 10.3824 bytes.

There would be need to be one or more very significant additional
constraints in order to get the number of possible sequences down to
2^32 (four bytes).

Mark

Initially I thought:
With a maximum of 256 outcomes, you should be able to easily manage
this. Represent first value. If remaining values are less than a given
binary representation, that representation is used next. Continue.

Worst case, you get 1, 2, 3, 4, 5, 6, 7, 8... this would mean 16
bytes. So therefore we need a command section to say less/more or we
can use this alternative where we deduce if most values are under
represented.

So 1 bit to say before each if the following number is within
[remaining bit length available / 2] then to run it. However worst
case this still creates issues with some outcomes, but statistically a
very low number of them, where you have 4 bits + 1 bit for every one
of the 16 bits. This could however be resolved with a single bit more
at the front if "Most are inside the first 1/2" and then an additional
indicator if most are inside the first "1/4". This would then mean 2+1
bits per on whatever we define most as, and worst case of 4+1 on the
remaining, plus 2 for a command section. If they landed in the 2nd
quarter we would spend 3+1 with a command of 2 bits.

We also could use a 'closer to, farther from' representation, and
normally compress, where 1 bit prior to each stored area would
indicate if the amount is up to 8 bits from the current location, or
if it is less than 5 bits from the current location. This would cover
32 outcomes close to the origin, but would still have issues overall
due to the addition of a bit to each individual.

There are other methods... I had made a count to system, where you had
4 bits that provided a count, then a bit to say "continue or end?"
where continue would add 4 more bits to the whole and then it would be
representing your number. This would handle only numbers within 16
points of the last number however, and would generally add a bit per
to the whole, not resulting in measurable compression... but the funny
thing about that is the counting can be compressed if commonly it
defaults on way or the other :P

I could manage it if I knew more info, but not the programming side.

Statistically this is somewhat a challenge, but not to much of a
challenge.

Thanks, this gives us a direction to go after.
Einstein
Posted: Tue Mar 04, 2008 9:25 pm
Guest
Quote:
A good place for him to start would be to get the distribution of the 256
values for each byte.

No examples, just a statement

Plus the other attacks upon demonstrated examples.

At least I showed a willingness to try to help.

You sir did nothing.

Ergo I call thee troll.


Prove me wrong, one up me with solid proof, not just some quick
statement. Make it in the terms I tried to make. While I agree your
method has merit, it is only for those who understand it. Instead try
to show why you think so, how it works, and educate, dont show off.

If you do not in 48 hours I will killfile you as well as educate this
man on your method, how it would work, the problems with it, and such.
MisterE
Posted: Wed Mar 05, 2008 2:39 am
Guest
Quote:
Initially I thought:
With a maximum of 256 outcomes, you should be able to easily manage
this. Represent first value. If remaining values are less than a given
binary representation, that representation is used next. Continue.

Worst case, you get 1, 2, 3, 4, 5, 6, 7, 8... this would mean 16
bytes. So therefore we need a command section to say less/more or we
can use this alternative where we deduce if most values are under
represented.

why the fuck would you do that

Quote:
So 1 bit to say before each if the following number is within
[remaining bit length available / 2] then to run it. However worst
case this still creates issues with some outcomes, but statistically a
very low number of them, where you have 4 bits + 1 bit for every one
of the 16 bits. This could however be resolved with a single bit more
at the front if "Most are inside the first 1/2" and then an additional
indicator if most are inside the first "1/4". This would then mean 2+1
bits per on whatever we define most as, and worst case of 4+1 on the
remaining, plus 2 for a command section. If they landed in the 2nd
quarter we would spend 3+1 with a command of 2 bits.

stupid

Quote:
We also could use a 'closer to, farther from' representation, and
normally compress, where 1 bit prior to each stored area would
indicate if the amount is up to 8 bits from the current location, or
if it is less than 5 bits from the current location. This would cover
32 outcomes close to the origin, but would still have issues overall
due to the addition of a bit to each individual.

stupid

Quote:
There are other methods... I had made a count to system, where you had
4 bits that provided a count, then a bit to say "continue or end?"
where continue would add 4 more bits to the whole and then it would be
representing your number. This would handle only numbers within 16
points of the last number however, and would generally add a bit per
to the whole, not resulting in measurable compression... but the funny
thing about that is the counting can be compressed if commonly it
defaults on way or the other Razz

stupid

Quote:
Statistically this is somewhat a challenge, but not to much of a
challenge.

You haven't done anything statistically.
You need to dervice representations from the statistical distribution of the
data, not the other way around.

A good place for him to start would be to get the distribution of the 256
values for each byte.
 
Page 1 of 1       All times are GMT - 5 Hours
The time now is Fri Sep 05, 2008 7:10 pm