| |
 |
|
|
Science Forum Index » Compression Forum » Better compression performance across time?...
Page 1 of 1
|
| Author |
Message |
| Jim Leonard... |
Posted: Sun May 11, 2008 10:30 am |
|
|
|
Guest
|
I was benchmarking some DOS-era backup programs today (research for an
essay) and noticed something that I'm having a hard time explaining
and was wondering if anyone had any ideas.
In most of the backup programs I was testing, there were generally
three compression options: None, "save time", and "save disks".
Clearly "save disks" was the highest possible compression, but "save
time" on all three of the programs I tested would achieve better
compression performance the more *time* they were given.
An example of this in practice is when each program was told to force
formatting of target disks or not. When set to "save time" and no
formatting, disk writes were very quick and compression was minimal.
But when formatting was forced on, increasing the time of disk writes,
the same "save time" setting produced significant compression ratios,
seemingly "taking advantage" of the extra time it had.
My question: How was this implemented? These are 1988-era programs,
so the only thing I can think of is that they used started compressing
using something appropriate for the time, like LZ77 or LZ78, and then
when a particular time threshold was exceeded, stopped compression,
stored the compressed portion, and then stored whatever was left over
as an uncompressed portion.
Does that sound right? If so, what algorithm family would best lend
itself to this? I'm thinking LZ77, because your dictionary is already
present (in the source buffer); you could scan it building match/
offset codes, then when the time threshold was exceeded, just flush
the rest of it with a "this is uncompressed" header or something. |
|
|
| Back to top |
|
| John Reiser... |
Posted: Sun May 11, 2008 4:39 pm |
|
|
|
Guest
|
Quote: [These are 1988-era archivers.] When set to "save time" and no
formatting, disk writes were very quick and compression was minimal.
But when formatting was forced on, increasing the time of disk writes,
the same "save time" setting produced significant compression ratios,
seemingly "taking advantage" of the extra time it had.
How was this implemented?
Floppy disks are formatted one track at a time, and the CPU
can do something else while each format command finishes.
Consult the data sheet for 82077A.
-- |
|
|
| Back to top |
|
| Jim Leonard... |
Posted: Mon May 12, 2008 4:12 am |
|
|
|
Guest
|
On May 11, 4:39 pm, John Reiser <jrei... at (no spam) BitWagon.com> wrote:
Quote: [These are 1988-era archivers.] When set to "save time" and no
formatting, disk writes were very quick and compression was minimal.
But when formatting was forced on, increasing the time of disk writes,
the same "save time" setting produced significant compression ratios,
seemingly "taking advantage" of the extra time it had.
How was this implemented?
Floppy disks are formatted one track at a time, and the CPU
can do something else while each format command finishes.
Consult the data sheet for 82077A.
That's not what I was asking. I was asking "what compression
algorithm would allow for better compression performance the more time
it was given to execute?" |
|
|
| Back to top |
|
| Jim Leonard... |
Posted: Mon May 12, 2008 6:51 am |
|
|
|
Guest
|
On May 12, 10:50 am, Phil Carmody <thefatphil_demun... at (no spam) yahoo.co.uk>
wrote:
Quote: if formatting tracks
format track
compress data
write compressed data to disk
else
write raw data to disk
endif
I see that the implementation of the algorithm (formatting and writing
floppy disks) is getting confused with what I'm actually asking.
Let's *completely forget* about writing to floppy disks and let me
instead pose my question thusly:
I have a system where there is an opportunity to compress a stream of
data before it is sent to its destination. The time I have to
compress the data is variable; that is, I do not know how much time I
have to compress before the data *must* be sent, and that time could
arrive while in the middle of the compression process. The output
stream can be broken up into variably-sized "compressed" and "non-
compressed" frames if necessary for transmission.
Given those constraints, is there an algorithm, or a particular way of
using an algorithm, that would satisfy this requirement? For example,
is there a family of algorithms where it is a design requirement to
interrupt the compression process, output what has compressed so far,
then output the remainder?
I can already think of ways to do this (see my OP) but was wondering
if there was an algorithm *specifically tuned* for this behavior. |
|
|
| Back to top |
|
| John Reiser... |
Posted: Mon May 12, 2008 10:06 am |
|
|
|
Guest
|
Jim Leonard wrote:
Quote: I was asking "what compression
algorithm would allow for better compression performance the more time
it was given to execute?"
Several compression packages have tunable parameters. The command line
form typically is '1' [or '0'] for low compression (fast), to '9' for
highest compression (slow). For instance: gzip, bzip2, upx, etc.
Changing the block size, look-back window size, or dictionary size
is one typical implementation. Changing search strategy is another.
Often for deflate-based algorithms there is more than one local choice
that works. How much effort should be devoted to finding alternatives
and choosing among them?
-- |
|
|
| Back to top |
|
| Phil Carmody... |
Posted: Mon May 12, 2008 10:50 am |
|
|
|
Guest
|
Jim Leonard <MobyGamer at (no spam) gmail.com> writes:
Quote: On May 11, 4:39 pm, John Reiser <jrei... at (no spam) BitWagon.com> wrote:
[These are 1988-era archivers.] When set to "save time" and no
formatting, disk writes were very quick and compression was minimal.
But when formatting was forced on, increasing the time of disk writes,
the same "save time" setting produced significant compression ratios,
seemingly "taking advantage" of the extra time it had.
How was this implemented?
Floppy disks are formatted one track at a time, and the CPU
can do something else while each format command finishes.
Consult the data sheet for 82077A.
That's not what I was asking. I was asking "what compression
algorithm would allow for better compression performance the more time
it was given to execute?"
The algorithm described by the following pseudo-code
if formatting tracks
format track
compress data
write compressed data to disk
else
write raw data to disk
endif
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration |
|
|
| Back to top |
|
| John Reiser... |
Posted: Mon May 12, 2008 1:19 pm |
|
|
|
Guest
|
Jim Leonard wrote:
Quote: I have a system where there is an opportunity to compress a stream of
data before it is sent to its destination. The time I have to
compress the data is variable; that is, I do not know how much time I
have to compress before the data *must* be sent, and that time could
arrive while in the middle of the compression process. The output
stream can be broken up into variably-sized "compressed" and "non-
compressed" frames if necessary for transmission.
Many algorithms based on deflation (linear scan of the input;
"this substring also appears earlier; emit a shorter code
instead of the substring") can be modifed easily to do that.
When the allowed time is exhausted, then send the un-scanned
bytes as a non-compressed literal.
-- |
|
|
| Back to top |
|
| James Dow Allen... |
Posted: Tue May 13, 2008 12:03 am |
|
|
|
Guest
|
On May 12, 9:12 pm, Jim Leonard <MobyGa... at (no spam) gmail.com> wrote:
Quote: "what compression
algorithm would allow for better compression performance the more time
it was given to execute?"
Some image compression methods have this property, for example
IFS ("fractal") compression, at least if its modelling is complex
enough.
A quick search suggests the last post on "fractal" image
compression to this ng was in 2002! The method is very interesting
to know about, *even if* it, as many claim, is ineffective.
(I've heard Micorsoft's Encarta used this method: is that
correct?)
Very very briefly, IFS image compression is a codebook (VQ)
method in which the codebook *is* the image itself!!!
(or rather the image expanded for stability)
(If this seems impossible, think convergence through iteration).
James Dow Allen |
|
|
| Back to top |
|
| |
|
Page 1 of 1
All times are GMT - 5 Hours
The time now is Sat Oct 11, 2008 6:12 pm
|
|