decompress corrupted deflate data

Qubeley
Posted: Mon Mar 24, 2008 9:56 pm
Suppose a large chunk of valid deflate data has some of its bits
flipped, say while it is transmitted over a network. That invalidates
the data as a whole: if it is fed to a decompressor such as zlib's
inflate, the decompressor reports an error.
My problem is how to recover as much data as possible. In the extreme
case where every deflate block is valid except the last one, we can
recover all blocks except the last. But what about bits messed up in
the middle? A deflate block might reference strings that appear in
previous blocks, so this kind of corrupted data may not be recoverable.
I am curious whether there is existing code that at least tries to
recover from corrupted deflate data. Or could I use zlib's inflate to
"try inflate" and simply discard the bad, non-recoverable data? I
haven't looked at zlib's inflate in detail, so I am here looking for
suggestions on this problem :-}

Mark Adler
Posted: Tue Mar 25, 2008 3:29 am
On Mar 25, 12:56 am, Qubeley <liangha...@gmail.com> wrote:
Quote: But what about bits messed up in the middle? A deflate block
might reference strings that appear in previous blocks, so this kind
of corrupted data may not be recoverable.
That is correct. The data after the corruption is not recoverable.
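
The output produced before the error is detected can still be kept. A minimal Python sketch, assuming a zlib-wrapped stream and Python's standard zlib module; the helper name inflate_prefix and the 64-byte chunk size are illustrative choices, not something from this thread. It feeds the stream in small pieces and returns whatever inflate emitted before it raised an error. Inflate may only notice a problem some distance after the flipped bit, so the tail of the salvaged output can itself be wrong and should be treated with suspicion.

```python
import zlib

def inflate_prefix(data, chunk_size=64):
    """Return (output produced before any error, True if the stream ended cleanly)."""
    d = zlib.decompressobj()          # expects a zlib header and Adler-32 trailer
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        try:
            out += d.decompress(data[i:i + chunk_size])
        except zlib.error:
            return bytes(out), False  # keep what was emitted before the error
    return bytes(out), d.eof          # eof stays False if the stream was truncated
```

A smaller chunk size loses less output when the error is raised, at the cost of more calls.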
If you have some way of bounding the location and number of bits
corrupted, then you could through brute force try all combinations of
possible bit flips until one inflates correctly. However that gets
combinatorially out of hand very quickly.
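
To make the brute-force idea concrete, here is a hedged Python sketch for the simplest case: exactly one flipped bit, known to lie within a given byte range of a zlib-wrapped stream. The function name and the byte-range parameters are assumptions for illustration. zlib.decompress verifies the Adler-32 trailer, so a candidate that inflates without error is probably the right one, though more than one candidate may pass. The cost grows as the binomial coefficient C(n, k) for k flips among n suspect bits: two flips somewhere in 1 KiB (8192 bits) already means about 33.5 million trial inflations.

```python
import zlib

def single_bit_repairs(data, start, stop):
    """Yield (byte_index, bit, output) for each one-bit flip in data[start:stop]
    that makes the zlib stream inflate without error."""
    buf = bytearray(data)
    for i in range(start, min(stop, len(buf))):
        for bit in range(8):
            buf[i] ^= 1 << bit            # flip one candidate bit
            try:
                out = zlib.decompress(bytes(buf))
            except zlib.error:
                out = None                # this candidate does not inflate
            buf[i] ^= 1 << bit            # restore before trying the next one
            if out is not None:
                yield i, bit, out
```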
If you have an error-prone channel, deflate streams can be generated
with periodic full synchronization points in order to mitigate the
damage, by allowing you to restart inflate at some point downstream
from the detected error. A better solution would be to wrap the
deflate stream with Reed-Solomon encoding which is good at correcting
bursts of errors.
The best solution is to fix your error-prone channel.
Mark

Qubeley
Posted: Tue Mar 25, 2008 4:27 am
Thanks Mark.
The "periodic full synchronization points" you mentioned means using
zlib's "Z_FULL_FLUSH" mode, right?
Another problem: if I wrap the deflate stream with an ECC such as
Reed-Solomon, standard decompressors will not be able to inflate it,
because the wrapped stream is no longer a deflate stream. It is much
like the zlib format header, which decompressors like WinZip are
unable to interpret. I have to put my deflate stream into a standard
archive format (most likely ZIP), so I think this would break
compatibility. The ZIP file format, apart from the CRC in each local
file header, has no other built-in checksum or ECC that would let me
recover data, right?

Mark Adler
Posted: Tue Mar 25, 2008 7:36 am
On Mar 25, 7:27 am, Qubeley <liangha...@gmail.com> wrote:
Quote: The "periodic full synchronization points" you mentioned means
using zlib's "Z_FULL_FLUSH" mode, right?
Right.
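
A rough Python sketch of how such synchronization points might be produced and used, assuming a raw deflate stream (wbits = -15) and Python's standard zlib module. The function names are illustrative. A full flush ends with an empty stored block, which appears in the stream as the byte-aligned sequence 00 00 FF FF, and resets the compressor's history, so a fresh inflate can start right after it. zlib's C API offers inflateSync() for this search; Python's zlib module does not appear to expose it, so the sketch searches by hand. The same four bytes can occur by chance inside compressed data, and raw deflate carries no checksum, so a real tool would need to validate what it recovers.

```python
import zlib

SYNC = b"\x00\x00\xff\xff"   # empty stored block left behind by a flush

def deflate_with_sync_points(chunks):
    """Compress chunks as one raw deflate stream, full-flushing after each chunk."""
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    out = bytearray()
    for chunk in chunks:
        out += c.compress(chunk)
        out += c.flush(zlib.Z_FULL_FLUSH)   # byte-align and reset the history
    out += c.flush()                        # Z_FINISH: emit the final block
    return bytes(out)

def inflate_from_next_sync(stream, start):
    """Restart inflate at the first usable flush marker at or after start.

    Returns (offset just past the marker, recovered bytes), or None.
    """
    pos = stream.find(SYNC, start)
    while pos != -1:
        resume = pos + len(SYNC)
        try:
            return resume, zlib.decompressobj(-15).decompress(stream[resume:])
        except zlib.error:
            pos = stream.find(SYNC, pos + 1)  # false or damaged marker, keep looking
    return None
```

Everything between the detected error and the next marker is lost, so the flush interval trades compression ratio against how much data one error can destroy.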
Quote: The ZIP
file format, apart from the CRC in each local file header, has no
other built-in checksum or ECC that would let me recover data, right?
Correct. However, the zip format is extensible through the use of the
"extra" field in the local and/or central headers. You could define
your own extra field whose contents hold parity bytes for R-S
coding, allowing you to correct errors. Normal zip software would
safely ignore the mysterious extra field content. You could have
special software to perform the error correction when the need arises.
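
As an illustration of the extra-field mechanism only, here is a Python sketch using the standard zipfile module. Each extra-field record is a 2-byte header ID, a 2-byte data size (both little-endian), then the data. The header ID 0x5253 is a hypothetical, arbitrarily chosen value that would need checking against the IDs already registered in the ZIP specification, and the parity argument is an opaque placeholder: computing real Reed-Solomon parity needs an R-S implementation that is not shown here. Because the extra field length is stored in 16 bits, all records for one entry must fit in 65535 bytes, which limits how much parity can be carried this way.

```python
import io
import struct
import zipfile

RS_EXTRA_ID = 0x5253   # hypothetical header ID, chosen only for this example

def pack_extra(header_id, payload):
    """Build one extra-field record: ID, size, then the payload."""
    if len(payload) > 0xFFFF - 4:
        raise ValueError("payload too large for a zip extra field")
    return struct.pack("<HH", header_id, len(payload)) + payload

def find_extra(extra, header_id):
    """Return the payload of the first record with header_id, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        hid, size = struct.unpack_from("<HH", extra, pos)
        if hid == header_id:
            return extra[pos + 4:pos + 4 + size]
        pos += 4 + size
    return None

def write_member_with_parity(zf, name, data, parity):
    """Store data deflated, with placeholder parity bytes in the extra field."""
    info = zipfile.ZipInfo(name)
    info.compress_type = zipfile.ZIP_DEFLATED
    info.extra = pack_extra(RS_EXTRA_ID, parity)
    zf.writestr(info, data)
```

A reader that knows the ID could call find_extra(zf.getinfo(name).extra, RS_EXTRA_ID) to fetch the parity bytes, while ordinary zip tools extract the member as usual.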
Mark

Qubeley
Posted: Tue Mar 25, 2008 6:48 pm
Great! The zip extra field sounds like a promising solution. Thanks :-}