Main Page | Report this Page
 
   
Science Forum Index  »  Compression Forum  »  Compression Paradox
Page 1 of 1    
Author Message
Guest
Posted: Sun Mar 30, 2008 12:28 pm
Hy all,
I want to open a discussion about this strange aspect of compression.
We know in a binary string of length N we have 2^N different possible
configurations.
Mathematically it is highly improbable to compress a string of N bits
in a string (for example) of N/2 bits becouse we can make a
correspondence only for 2^(N/2) strings of a 2^N strings of length N.
So the compression "difficulty" increase exponentially with the
increasing of the compression ratio and with the increasing of the
string length .
This mathematical law is absolutely in contraddiction with everyday
experiences.
When I try to compress a file I am about sure to be able to compress
it with a ratio about 50% and greater is the file more I believe I am
able to obtain a great compression ratio !
We obtain this incredible result using so simple compression algorithm
( Is possible to write down a program of 10 lines and obtain gigabytes
of incompressible data for these compressor ) also
If we construct the most powerfull compression ( uncomputable ) this
result is mathematically impossible !

How is this possible?

How can we explain this paradox?

I suggest 2 different possible answer .

-1) The today compressor are not very powerfull as compressor but they
are very specialized for our data.

-2) For a string of N bits there are not 2^N possible strings
available .


In the first point I say that the compressor are able to make a
correspondence from short string to long string becouse they are able
to indentify a subset into the 2^N possible strings of used string .
This mean that we are not able to use a lot of string , we are not
able to use , to have , to see 2^N strings.
In the second point I say that for a system receiving informations,
sending informations there is a limit and this limit is not
approximable using exponential function .
I don't know if these explanation are correct ( but I think the
working compressor can be a good proof ) but in both case we can place
a limit on the available strings .
For example I place a limit of M on the size of the available
strings .
For a string of size N the distribution of available strings become
for example not 2^N but
Min(2^N,M/N)
Watching the behaviour of this formula and what happen in the real
there are something similar.
The compressor are not able to compress "small" string and in the
small string we have an exponential behaviour so in this side we have
a real mathematical behaviour so here the compressor are not able to
compress becouse it is about impossible to compress bit strings.
After a limit we have always a more different behaviour , increasing
the string size the available strings decrease so the compressor can
compress that string better and better and this is what happen when I
have a big file I am about sure to compress it.
When we have a string of length N we have again 2^N different possible
configurations becouse we don't know the bits string configuration but
when we get a string S we know this is one string on Min(2^N,M/N)
available and not 2^N available.
Roughly speaking thinking that all 2^N strings are available we are
making the hypothesis we are working with very very incredible
powerfull systems, I don't think so.
There are also other interesting consequence of this point of view but
I am out of topic ...

Denis.
Apo
Posted: Mon Mar 31, 2008 2:30 am
Guest
On Mar 31, 1:28 am, a.den...@yahoo.com wrote:
Quote:

I suggest 2 different possible answer .

-1) The today compressor are not very powerfull as compressor but they
are very specialized for our data.

... powerfull or not, for me it's hard to say, but with some

specialization for
some data types it's more effectivive - I agree with that.
I'm sure there are still more things to do in data compression field
and
I hope somebody will find more effective way to compress data.

Quote:
-2) For a string of N bits there are not 2^N possible strings
available .

It's a basic of all binary data - definately there is 2^N
possibilities
for string of length N.


Regards

Apo
giorgio.tani
Posted: Mon Mar 31, 2008 5:44 am
Guest
Quote:
-1) The today compressor are not very powerfull as compressor but they
are very specialized for our data.
As you say, the trich is that they are specialized on compressing

widely used data structures (text, waveforms, image frames etc).
Generally, they are more powerful as they are more specialized for one
(or more) data structures, or as they are more smart, and this cost in
terms of memory and computing power.
Some compressors approaches a compression ratio quite close to the
expected (uncompressible) enthropy of the file, so are definitely
powerful.

Quote:
-2) For a string of N bits there are not 2^N possible strings
available .
There are 2^N possible strings by definition, and powerful or not

powerful compressor will perform poorly on most of them since no
definite structure can be found, but perform well on non random ones,
that are the ones meaningful for us.
And the more the file grows big, the more the compression algorithm
can learn about it and the more it can refine its strategy on the
specifical data structure, so in practice you will tend to see better
results as the file grows bigger... if the file has a sort of
structure the compression algorithm can understand, otherwise the
result will be equally poor.
Stefano Brocchi
Posted: Mon Mar 31, 2008 8:53 am
Guest
Hello,

Quote:
-2) For a string of N bits there are not 2^N possible strings
available .

Nope, there surely are 2^N possible combinations

Quote:
-1) The today compressor are not very powerfull as compressor but they
are very specialized for our data.

Yes, this is the point ! The definition of 'powerful' for a compressor
is quite strange, but you got the reason :)

The fact is this: when we define a compressor, we are also defining
what kind of input data we are willing to compress. When in zip-based
algorithms we encode an already-seen strings with a pointer, we are
stating: we are assuming our input to have some strings that repeat
themselves more likely than others, and we will reduce the size of
them by removing this redundancy. On the other side, we are also
stating 'if our data isn't made this way, our algorithm won't
compress'. But we probably don't care: in every message or data that
is usually meaningful to us, there is some kind of redundancy like
this that can be exploited. An exception could be of course already-
compressed files, but this is because the redundancies have been
already removed.

At this point, compression works because we are assuming that our
feasible inputs are only a small subset of the 2^N possible inputs,
and these can be mapped to files much shorter than N bits long. A
compressor will be implemented to 'chose' the inputs to be compressed
that most likely reflect the data of our interest. Another example:
image compression often uses assumptions of smoothness, as images that
represent something to us usually contain some areas where the
intensity of color (or of brightness) doesn't change suddenly. Since
we use this assumptions, we are saying 'we will be able to compress
images that have smooth areas; in other cases we will fail'.
Alternatively, we could also design a compressor that compresses (of a
few bits) images where close pixels are completely different one from
another, but I thing this could hardly be useful to anybody.

Hope I've been clear :)

So long,
Stefano
Einstein
Posted: Mon Mar 31, 2008 3:40 pm
Guest
It is not 2^N

You can exceed 2^N via this simple proceedure, albiet writ much larger
than my method :P

Take 3 bits for example, normally 2^3 = 8

But you can also substitute in 2^2 and 2^1 to have values as well,
thus getting (2^4)-2


The practical problem is in tracking it. Writ large.... it's not so
'difficult' to do, but admittedly the odds of 'getting it to work'
increase dramatically against you the more you wish to compress in
this direction Razz
Guest
Posted: Tue Apr 01, 2008 12:38 pm
Quote:

There are 2^N possible strings by definition, and powerful or not
powerful compressor will perform poorly on most of them since no
definite structure can be found, but perform well on non random ones,
that are the ones meaningful for us.
And the more the file grows big, the more the compression algorithm
can learn about it and the more it can refine its strategy on the
specifical data structure, so in practice you will tend to see better
results as the file grows bigger... if the file has a sort of
structure the compression algorithm can understand, otherwise the
result will be equally poor.

Yes, I agree with you but what I want to point out is that this is
mathematically wrong ... or better what this mean is that the bit
strings have not an arbitrary bit arrangement becouse in this case it
is about impossible to compress them.
So why not conclude saying the number of N bit strings length are M/N
and not 2^N .
It seem to me we make a bigger approximation to use 2^N as a number of
available string instead of M/N or other functions but not
exponential .

Denis.
Guest
Posted: Tue Apr 01, 2008 1:23 pm
On 1 Apr, 03:40, Einstein <michae...@gmail.com> wrote:
Quote:
It is not 2^N

You can exceed 2^N via this simple proceedure, albiet writ much larger
than my method :P

Take 3 bits for example, normally 2^3 = 8

But you can also substitute in 2^2 and 2^1 to have values as well,
thus getting (2^4)-2

The practical problem is in tracking it. Writ large.... it's not so
'difficult' to do, but admittedly the odds of 'getting it to work'
increase dramatically against you the more you wish to compress in
this direction Razz

?
In a string of N bits you can have 2^N possible configurations
 
Page 1 of 1       All times are GMT - 5 Hours
The time now is Tue Oct 07, 2008 12:31 am