Science Forum Index » Compression Forum » Random Compression, i have an idea
yatno345 (Guest)
Posted: Tue Apr 29, 2008 5:44 pm
Dear all,
(Sorry for my English, I'm from Indonesia. Here is my idea; don't
hesitate to correct me if I'm wrong. Thank you.)
First, current compression methods usually exploit frequent data,
redundant data, or periodic data (like Huffman coding, LPC, etc.).
But random data doesn't have much redundancy, so it's hard to
compress with current methods.
I usually use kurtosis to extract properties of the data. Random data
has a kurtosis of 3 or below. Highly compressible data is certain to
have a large kurtosis value.
So the idea is: if we can process data that has a kurtosis below 3 so
that it gets a large kurtosis value, we can compress random data.
For example:
if we have data with values $00 - $FF and
kurtosis <= 3, it means the data is distributed all over the value
range.
If we can process the data so that it has range $00 - $7F (or
$00 - $3F, even better), it becomes highly compressible.
My idea is this.
Suppose we have random data, for example 8 bytes of data:
11001011
00101101
11001011
01110111
01110011
11010101
01101011
01110100
or, if we concatenate them, it becomes:
1100101100101101110010110111011101110011110101010110101101110100
Suppose we can find the positions of 8 '0' bits in the data with some
dynamic formula,
for example: f(x) = 3 4 6 9 10 12 15 19 (the numbers are the positions
of zeros in the concatenated data, counting from 1). Now we can swap
the bits so that the result is:
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
Now the data has range $00 - $7F.
If we apply Huffman coding, it will be more compressible.
If we can find 16 '0' bits:
00xxxxxx
00xxxxxx
00xxxxxx
00xxxxxx
00xxxxxx
00xxxxxx
00xxxxxx
00xxxxxx
then the range becomes $00 - $3F,
which is even more compressible.
I have experimented with this and the results are good: I can compress
zipped data (but the header is long. The header stores the pattern of
zero positions. The zipped data has kurtosis below 3, usually 1.8).
THE PROBLEM IS,
if we have, for example, 100000 bytes of data,
then we must find 100000 positions of 'zeros' (>= 100000 is even
better).
My question is: can anyone help me design a formula that can find the
zero positions in the data?
Thomas Richter (Guest)
Posted: Wed Apr 30, 2008 6:14 am
yatno345 wrote:
Quote: Dear all,
(Sorry for my English, I'm from Indonesia. Here is my idea; don't
hesitate to correct me if I'm wrong. Thank you.)
First, current compression methods usually exploit frequent data,
redundant data, or periodic data (like Huffman coding, LPC, etc.).
But random data doesn't have much redundancy, so it's hard to
compress with current methods.
I usually use kurtosis to extract properties of the data. Random data
has a kurtosis of 3 or below. Highly compressible data is certain to
have a large kurtosis value.
So the idea is: if we can process data that has a kurtosis below 3 so
that it gets a large kurtosis value, we can compress random data.
This is probably a misunderstanding, and your problem might be that you
confuse "random" with "iid with a uniform distribution". Of course I can
compress "random iid data"; how well, however, does not depend on the
kurtosis but on the entropy of the random distribution defining this
data. For a certain class of random distributions, your assertion might
be right, though (GGD, for example? I haven't tried to compute this, but
it sounds about right).
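To make the entropy-versus-kurtosis point concrete, here is a minimal Python sketch. The helper names `entropy_bits` and `kurtosis` and the two example distributions are assumptions chosen for illustration; they do not come from the thread. It compares uniformly distributed bytes with a source that only ever emits $00 or $FF.

```python
import math

def entropy_bits(pmf):
    """Shannon entropy in bits per symbol of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def kurtosis(pmf):
    """Non-excess kurtosis (fourth central moment / variance squared);
    the symbol value is taken to be the index into pmf."""
    mean = sum(v * p for v, p in enumerate(pmf))
    var = sum((v - mean) ** 2 * p for v, p in enumerate(pmf))
    m4 = sum((v - mean) ** 4 * p for v, p in enumerate(pmf))
    return m4 / var ** 2

# Assumed example 1: every byte value $00..$FF is equally likely.
uniform = [1 / 256] * 256

# Assumed example 2: only $00 and $FF occur, each with probability 1/2.
two_point = [0.0] * 256
two_point[0x00] = 0.5
two_point[0xFF] = 0.5

for name, pmf in (("uniform", uniform), ("two-point", two_point)):
    print(name, "entropy:", entropy_bits(pmf), "kurtosis:", kurtosis(pmf))
```

Under these assumptions the uniform source needs 8 bits per byte and has a kurtosis of about 1.8, while the two-point source needs only 1 bit per byte although its kurtosis is 1, which is even lower. A small kurtosis therefore does not by itself indicate incompressible data.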
Quote:
For example:
if we have data with values $00 - $FF and
kurtosis <= 3, it means the data is distributed all over the value
range.
If we can process the data so that it has range $00 - $7F (or
$00 - $3F, even better), it becomes highly compressible.
Even if we have data in the full interval, it might be compressible. If
its values are only in a sub-interval, you can exploit this and
represent the data with less than eight bits. But this is only a
sufficient condition, not a necessary one.
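As a rough illustration of exploiting a sub-interval, the following Python sketch packs values known to lie in $00 - $7F into 7 bits each. The function names `pack_7bit` and `unpack_7bit` and the sample values are hypothetical, and the decoder is assumed to know the number of values in advance.

```python
def pack_7bit(values):
    """Pack integers in the range 0..127 into bytes, 7 bits per value."""
    acc = 0
    for v in values:
        if not 0 <= v <= 0x7F:
            raise ValueError("value outside $00 - $7F")
        acc = (acc << 7) | v
    nbits = 7 * len(values)
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits  # pad the last byte with zero bits
    return acc.to_bytes(nbytes, "big")

def unpack_7bit(packed, count):
    """Inverse of pack_7bit; count is the number of packed values."""
    acc = int.from_bytes(packed, "big")
    acc >>= len(packed) * 8 - 7 * count  # drop the padding bits
    return [(acc >> (7 * (count - 1 - i))) & 0x7F for i in range(count)]

# Hypothetical sample: eight values that all fit in 7 bits.
values = [0x4B, 0x2D, 0x4B, 0x77, 0x73, 0x55, 0x6B, 0x74]
packed = pack_7bit(values)
print(len(values), "values ->", len(packed), "bytes")
print(unpack_7bit(packed, len(values)) == values)
```

Eight such values fit in seven bytes, with no statistical modelling at all. This only covers the "sufficient" direction: data spread over the full interval can still be compressible if its entropy is low.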
Quote: My idea is this.
Suppose we have random data, for example 8 bytes of data:
11001011
00101101
11001011
01110111
01110011
11010101
01101011
01110100
or, if we concatenate them, it becomes:
1100101100101101110010110111011101110011110101010110101101110100
Suppose we can find the positions of 8 '0' bits in the data with some
dynamic formula,
for example: f(x) = 3 4 6 9 10 12 15 19 (the numbers are the positions
of zeros in the concatenated data, counting from 1). Now we can swap
the bits so that the result is:
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
0xxxxxxx
Now the data has range $00 - $7F.
If we apply Huffman coding, it will be more compressible.
The question is not whether you can find such an f (a simple search over
the binary symbols will do, so f is rather trivial to construct), the
question is whether this representation is useful for your random
source. So why do you believe this is a good description for the
byte sequence you want to compress?
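One way to probe whether the representation is useful is to count what it costs to write down the zero positions. The sketch below is an assumption-laden estimate, not something stated in the thread: it supposes the 8 positions may be any 8 of the 64 bit positions, so the header must be able to distinguish every such choice. The function name `bits_to_store_positions` is hypothetical.

```python
import math

def bits_to_store_positions(total_bits, num_positions):
    """Smallest whole number of bits that can label every possible choice
    of num_positions distinct positions out of total_bits."""
    choices = math.comb(total_bits, num_positions)
    # (choices - 1).bit_length() equals ceil(log2(choices)) for choices >= 1
    return (choices - 1).bit_length()

header_bits = bits_to_store_positions(64, 8)  # positions of the 8 zeros
saved_bits = 8                                # one leading zero bit per byte
print("header:", header_bits, "bits, saved:", saved_bits, "bits")
```

Under that assumption the header needs 33 bits to describe the positions while the swap frees only 8 bits, so the description costs more than it saves unless the positions follow a pattern that is itself cheap to describe.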
Quote: I have experimented with this and the results are good: I can compress
zipped data (but the header is long. The header stores the pattern of
zero positions. The zipped data has kurtosis below 3, usually 1.8).
THE PROBLEM IS, can you expand your data again? (-:
Quote:
THE PROBLEM IS,
if we have, for example, 100000 bytes of data,
then we must find 100000 positions of 'zeros' (>= 100000 is even
better).
My question is: can anyone help me design a formula that can find the
zero positions in the data?
Where's the problem writing a simple for-loop over all bits in the
sequence and finding zeros?
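Such a loop could look like the following Python sketch, run on the eight example bytes from the first post. The function name `zero_positions` is an assumption, and positions are counted from 1 to match the numbers given in the post.

```python
def zero_positions(bits, count):
    """Return the 1-based positions of the first `count` '0' characters."""
    positions = []
    for index, bit in enumerate(bits, start=1):
        if bit == "0":
            positions.append(index)
            if len(positions) == count:
                break
    return positions

# The eight example bytes from the first post, concatenated.
rows = ["11001011", "00101101", "11001011", "01110111",
        "01110011", "11010101", "01101011", "01110100"]
data = "".join(rows)

print(zero_positions(data, 8))  # [3, 4, 6, 9, 10, 12, 15, 19]
```

The result matches the f(x) listed in the first post, so locating the zeros is not the hard part; describing their positions compactly is.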
However, I believe you have a conceptual error here, namely I do not
see why this is a *good* way to describe the data, i.e. why this is a
good model.
So long,
Thomas
yatno345 (Guest)
Posted: Thu May 01, 2008 4:42 pm
On Apr 30, 6:14 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
Quote: This is probably a misunderstanding, and your problem might be that you
confuse "random" with "iid with a uniform distribution". Of course I can
compress "random iid data"; how well, however, does not depend on the
kurtosis but on the entropy of the random distribution defining this
data. For a certain class of random distributions, your assertion might
be right, though (GGD, for example? I haven't tried to compute this, but
it sounds about right).
I'm sorry, I still have a lot to learn.
Is it enough to define random data as just:
* distributed over the whole range (kurtosis below 3)
* not periodic
Quote:
For example:
if we have data with values $00 - $FF and
kurtosis <= 3, it means the data is distributed all over the value
range.
If we can process the data so that it has range $00 - $7F (or
$00 - $3F, even better), it becomes highly compressible.
Even if we have data in the full interval, it might be compressible. If
its values are only in a sub-interval, you can exploit this and
represent the data with less than eight bits. But this is only a
sufficient condition, not a necessary one.
In the full interval it might be compressible, but I think it will have
a low compression ratio.
My idea here is not to compress the data but to process the data so
that it becomes more compressible.
If we can process the 8-bit values to have a lower range (but still
8 bits), then the same values will occur more often, and if we then
apply Huffman coding (or similar) it will be more compressible.
Quote: The question is not whether you can find such an f (a simple search over
the binary symbols will do, so f is rather trivial to construct), the
question is whether this representation is useful for your random
source. So why do you believe this is a good description for the
byte sequence you want to compress?
f is not a loop search.
Yes, we can find the zero positions with a loop search,
but how can we design f to represent the sequence of zero
positions?
Example one: zero positions 1 3 5 7 9 11 13 15 ...
f(n) = y(n) = y(n-1) + 2;
Example two: if we have zero positions 3 4 6 9 10 12 15 19 ...
how can we design f?
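A hedged way to look at the two examples is through the gaps between consecutive zero positions, which is what the recurrence y(n) = y(n-1) + 2 describes for example one. The Python sketch below uses a hypothetical helper named `gaps`; this is plain gap (delta) coding, swapped in for illustration, not a formula proposed in the thread.

```python
def gaps(positions):
    """Differences between consecutive positions."""
    return [b - a for a, b in zip(positions, positions[1:])]

example_one = [1, 3, 5, 7, 9, 11, 13, 15]
example_two = [3, 4, 6, 9, 10, 12, 15, 19]

print(gaps(example_one))  # [2, 2, 2, 2, 2, 2, 2]
print(gaps(example_two))  # [1, 2, 3, 1, 2, 3, 4]
```

In example one every gap is 2, so a single start value and a single step reproduce the whole sequence. In example two the gaps change from one step to the next, so any f that reproduces them has to carry that information in some form.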
Quote: THE PROBLEM IS, can you expand your data again? (-:
If we have an f that represents the zero positions, we can restore the
data to the original, because I only SWAP the binary data (refer to my
first explanation).
Quote:
THE PROBLEM IS,
if we have, for example, 100000 bytes of data,
then we must find 100000 positions of 'zeros' (>= 100000 is even
better).
My question is: can anyone help me design a formula that can find the
zero positions in the data?
Where's the problem writing a simple for-loop over all bits in the
sequence and finding zeros?
f is not a loop search;
refer to my explanation above.
Quote: However, I believe you have a conceptual error here, namely I do not
see why this is a *good* way to describe the data, i.e. why this is a
good model.
Maybe it's a misunderstanding.
I'm sorry if my explanation wasn't clear.
Quote: So long,
Thomas
Regards,
Supriyatno
All times are GMT - 5 Hours