Science Forum Index » Compression Forum » compressing similar files
| Jim Leonard |
Posted: Mon Jan 14, 2008 6:18 am |
On Jan 12, 9:21 am, Industrial One <industrial_...@hotmail.com> wrote:
Quote: Oh my god... NOBODY on a fucking COMPRESSION NEWSGROUP (supposedly
frequented by Ph.Ds) gave the right advice for a simple and OBVIOUS
task, except Jim Leonard. WOW. Next person who calls me a troll/newb
will be force-fed his own testicles.
At the risk of tasting my own testicles, I could mention that
comp.compression was created primarily as a discussion for new
compression methods and techniques, as illustrated by its charter:
http://www.faqs.org/ftp/usenet/news.announce.newgroups/comp/comp.compression
So it's not entirely unreasonable that people here would rather attack
the problem from a theoretical or analytical standpoint. 15 years
later, it is still primarily a group for discussion of methods and not
necessarily a utility support group. |
|
| Industrial One |
Posted: Mon Jan 14, 2008 7:20 am |
On Jan 14, 9:18 am, Jim Leonard <MobyGa...@gmail.com> wrote:
Quote: On Jan 12, 9:21 am, Industrial One <industrial_...@hotmail.com> wrote:
At the risk of tasting my own testicles, I could mention that
comp.compression was created primarily as a discussion for new
compression methods and techniques, as illustrated by its charter: http://www.faqs.org/ftp/usenet/news.announce.newgroups/comp/comp.comp...
So it's not entirely unreasonable that people here would rather attack
the problem from a theoretical or analytical standpoint. 15 years
later, it is still primarily a group for discussion of methods and not
necessarily a utility support group.
The problem was not that people provided theoretical solutions... I
hardly give a shit how theoretical anything gets on a goddamn
programming forum. The problem was that nobody gave the OP the correct
advice, which is to use solid compression.
Quote: the problem from a theoretical or analytical standpoint. 15 years
later, it is still primarily a group for discussion of methods and not
necessarily a utility support group.
Man... it's so hard to believe this board has actually existed for that
long. I'm browsing the 1991 threads right now... and I'd laugh so hard if I
found an academic discussion about optimal MP3 bitrates on the day I
so desperately wanted to learn about all that cool shit (Sept 2000). |
|
| Industrial One |
Posted: Mon Jan 14, 2008 2:49 pm |
On Jan 14, 5:09 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
Quote: Industrial One wrote:
The problem was that nobody gave the OP the correct
advice, which is to use solid compression.
Because it isn't [the] correct advice. Using one dictionary for one file of
10 MB IMO will result in poor compression. When the dictionary is
Yes, and clearly, your opinion means dick.
Quote: discarded and rebuilt during compression of one such file, it will be
rebuilt for every following file as well. The windows in other
The thing is... it doesn't. Why the hell would it? Solid compression
is basically combining all files into one and compressing. If I
manually joined ten 10 MB files into one and compressed normally --
what do you think would happen? Would the dictionary keep re-adapting
and discarding itself as you claim? Whenever I think of
"comp.compression" I imagine retrocomputing, using old GZIP and shit,
and it seems accurate when I see people unable to grasp basic new
compression concepts.
Quote: compression methods also are much smaller than one such file. Please
name a compression method, that will analyze and compress chunks of 100 MB.
DoDi
LZ77. As I said, I compressed fifty 32 MB files (1.5 GB total) to 56
MB -- which without solid archiving would only compress to about 750
MB. I can send you the fuckin screenshot if you don't believe me.
Also, if I left out the one file whose content deviates heavily from the
others (about half of its data is different), I bet the archive would
compress down to 33-35 MB.
10 MB is nothing. As Jim said, 32-bit pointers, 2^32 = 4,294,967,296
bytes. |
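The claim in this post, that solid compression amounts to joining the files and compressing the result as one stream, is easy to check on synthetic data. What follows is a minimal Python sketch, not a reproduction of the poster's 7-Zip test: `make_similar_files` is a hypothetical helper that builds random files sharing most of their content, and the standard-library `lzma` module stands in for 7-Zip's LZMA.

```python
import lzma
import random

def make_similar_files(count=10, size=200_000, seed=0):
    """Hypothetical helper: build `count` byte strings sharing most content."""
    rng = random.Random(seed)
    base = rng.randbytes(size)  # random data, incompressible on its own (Python 3.9+)
    files = []
    for _file in range(count):
        variant = bytearray(base)
        # Overwrite a few small regions so the files are similar, not identical.
        for _patch in range(20):
            pos = rng.randrange(size - 100)
            variant[pos:pos + 100] = rng.randbytes(100)
        files.append(bytes(variant))
    return files

files = make_similar_files()

# Non-solid: every file is compressed on its own, so nothing is shared.
separate = sum(len(lzma.compress(f)) for f in files)

# Solid: one stream over the concatenation, so later files can reference
# earlier ones as long as they fall inside the dictionary.
solid = len(lzma.compress(b"".join(files)))

print(f"separate: {separate} bytes, solid: {solid} bytes")
```

On data like this the joined stream should come out far smaller than the sum of the per-file results, because every file after the first can be encoded mostly as references to earlier data. The exact figures depend on the Python and liblzma versions.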
|
| biject |
Posted: Mon Jan 14, 2008 3:32 pm |
On Jan 14, 5:09 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
Quote: Industrial One wrote:
The problem was that nobody gave the OP the correct
advice, which is to use solid compression.
Because it isn't correct advice. Using one dictionary for one file of
10 MB will, IMO, result in poor compression. When the dictionary is
discarded and rebuilt during compression of one such file, it will be
rebuilt for every following file as well. The windows in other
compression methods are also much smaller than one such file. Please
name a compression method that will analyze and compress chunks of 100 MB.
DoDi
bbb will do just that:
http://cs.fit.edu/~mmahoney/compression/#bbb
It would use a single BWT for the whole mess; 100 MB is no problem.
You may not be aware of it if you don't keep up with what's going on
in the compression field.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link" |
|
| Hans-Peter Diettrich |
Posted: Mon Jan 14, 2008 8:09 pm |
Industrial One wrote:
Quote: The problem was that nobody gave the OP the correct
advice, which is to use solid compression.
Because it isn't correct advice. Using one dictionary for one file of
10 MB will, IMO, result in poor compression. When the dictionary is
discarded and rebuilt during compression of one such file, it will be
rebuilt for every following file as well. The windows in other
compression methods are also much smaller than one such file. Please
name a compression method that will analyze and compress chunks of 100 MB.
DoDi |
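Whether joining files helps depends on how far back the compressor can look, which is the point under dispute here. The following is a minimal sketch, assuming Python's standard `zlib` and `lzma` modules as stand-ins for the compressors discussed in the thread. The 1 MB block size and the two dictionary sizes are arbitrary choices for illustration.

```python
import lzma
import random
import zlib

rng = random.Random(1)
block = rng.randbytes(1_000_000)  # 1 MB of incompressible data (Python 3.9+)
data = block + block              # the second half repeats the first exactly

# DEFLATE (zlib/gzip) can only reference the previous 32 KiB, so it cannot
# see that the second megabyte repeats the first.
deflate_size = len(zlib.compress(data, 9))

# LZMA2 with a dictionary smaller than the repeat distance has the same problem.
small_dict = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 64 * 1024}]
lzma_small = len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=small_dict))

# With a dictionary larger than the repeat distance, the repeat can be found.
large_dict = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 4 * 1024 * 1024}]
lzma_large = len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=large_dict))

print(f"deflate: {deflate_size}")
print(f"lzma, 64 KiB dictionary: {lzma_small}")
print(f"lzma, 4 MiB dictionary: {lzma_large}")
```

The first two results should stay close to the 2 MB input, while the third should be close to half of it. The same reasoning applies to a solid archive: redundancy between files is only exploited when the earlier file is still inside the window.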
|
| Hans-Peter Diettrich |
Posted: Mon Jan 14, 2008 8:13 pm |
Jim Leonard wrote:
Quote: 7-Zip uses 32-bit pointers, IIRC, so that would make the window 4Gig.
How that?
DoDi |
|
| Jim Leonard |
Posted: Tue Jan 15, 2008 5:45 am |
On Jan 14, 6:13 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
Quote: Jim Leonard wrote:
7-Zip uses 32-bit pointers, IIRC, so that would make the window 4Gig.
How that?
I wrote that because I had read it somewhere initially; ironically,
though, nobody has clearly defined LZMA beyond a few key
details. Checking it now, it appears that 32-bit offsets are not
necessarily the norm; however, the dictionary can be as large as 4G...
It would be nice if Igor or someone could write a formal definition
someday... |
|
| Industrial One |
Posted: Tue Jan 15, 2008 5:59 am |
On Jan 15, 7:45 am, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
Quote: Thanks, I'll have a look at that.
http://i2.tinypic.com/82jpxfs.png
http://i6.tinypic.com/6u4bn09.png
Quote: 10 MB is nothing. As Jim said, 32-bit pointers, 2^32 = 4,294,967,296
bytes.
This doesn't say anything about the usage of such pointers. You also
ignore the fact, that the *total* address space is less than 4GB.
DoDi
All I know is that it works. |
|
| Hans-Peter Diettrich |
Posted: Tue Jan 15, 2008 10:45 am |
Industrial One wrote:
Quote: discarded and rebuilt during compression of one such file, it will be
rebuilt for every following file as well. The windows in other
The thing is... it doesn't. Why the hell would it? Solid compression
is basically combining all files into one and compressing. If I
manually joined ten 10 MB files into one and compressed normally --
what do you think would happen? Would the dictionary keep re-adapting
and discarding itself as you claim?
That depends on the compressor used. The behaviour I described is what
I found in the implementations I studied. If you can name a compression
method that behaves differently, please do so.
Quote: Whenever I think of
"comp.compression" I imagine retrocomputing, using old GZIP and shit,
and it seems accurate when I see people unable to grasp basic new
compression concepts.
Solid compression is nothing new, only a successor of TGZ.
Quote: compression methods also are much smaller than one such file. Please
name a compression method, that will analyze and compress chunks of 100 MB.
DoDi
LZ77. As I said, I compressed fifty 32 MB files (1.5 GB total) to 56
MB
Thanks, I'll have a look at that.
Quote: 10 MB is nothing. As Jim said, 32-bit pointers, 2^32 = 4,294,967,296
bytes.
This doesn't say anything about the usage of such pointers. You also
ignore the fact, that the *total* address space is less than 4GB.
DoDi |
|
| David Portabella |
Posted: Thu Jan 24, 2008 5:20 am |
Hello,
Thanks for all the feedback.
Here are the results of my tests:
Source:
82 files, between 5 and 11 MB each, highly similar to one another.
In total, 571 MB.
Using zip: 275 MB
Using tar | bzip2: 260 MB
Using 7-Zip (32 MB dictionary, solid block size: 4 GB): 28 MB !!!
Using p7zip (the Unix 7-Zip equivalent): 211 MB :(
I did not see how to turn on the "solid" option in p7zip.
Maybe it is not implemented for compressing?
But at least it successfully extracted the 28 MB archive created with
7-Zip.
BBB did not work for me: compressing and then decompressing produced
a file different from the original one. I guess I did something wrong.
Thanks again for the feedback,
DAvid |
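To put the figures in this post on a common scale, the sizes can be expressed as a percentage of the 571 MB input. This is a small Python sketch that uses only the numbers reported above; the helper name `percent_of_original` is made up for illustration.

```python
# Sizes in MB, copied from the test results reported in the post.
original_mb = 571
results_mb = {
    "zip": 275,
    "tar | bzip2": 260,
    "7-Zip (32 MB dictionary, solid)": 28,
    "p7zip (as first run)": 211,
}

def percent_of_original(compressed_mb, total_mb=original_mb):
    """Compressed size as a percentage of the uncompressed total."""
    return 100.0 * compressed_mb / total_mb

for name, size in results_mb.items():
    print(f"{name:33s} {size:4d} MB  {percent_of_original(size):5.1f}% of original")
```

The solid 7-Zip archive comes out at roughly 5% of the input, against roughly 45-48% for zip and tar | bzip2.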
|
| lockecold |
Posted: Thu Jan 24, 2008 3:28 pm |
David Portabella wrote:
Quote:
I did not see how to turn on the "solid" option in p7zip.
Maybe it is not implemented for compressing?
But at least it successfully extracted the 28 MB archive created with
7-Zip.
Try "man 7za":
<quote>
EXAMPLE 1
7za a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z dir1
adds all files from directory "dir1" to archive archive.7z using "ultra settings"
-t7z      7z archive
-m0=lzma  lzma method
-mx=9     level of compression = 9 (Ultra)
-mfb=64   number of fast bytes for LZMA = 64
-md=32m   dictionary size = 32 megabytes
-ms=on    solid archive = on
</quote> |
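For anyone scripting this, the man-page example can be wrapped in a few lines of Python. This is only a sketch: `build_7za_command` and `create_solid_archive` are hypothetical helper names, the switches are exactly the ones quoted above, and it assumes a `7za` binary on the PATH.

```python
import shutil
import subprocess

def build_7za_command(archive, paths, dict_size="32m", solid="on"):
    """Hypothetical helper: argument list for the man-page 'ultra' example."""
    return [
        "7za", "a",
        "-t7z",              # 7z archive
        "-m0=lzma",          # lzma method
        "-mx=9",             # level of compression = 9 (Ultra)
        "-mfb=64",           # number of fast bytes for LZMA
        f"-md={dict_size}",  # dictionary size
        f"-ms={solid}",      # solid archive on/off
        archive,
        *paths,
    ]

def create_solid_archive(archive, paths):
    """Run 7za with the settings above; raises if 7za is not installed."""
    if shutil.which("7za") is None:
        raise RuntimeError("7za not found on PATH")
    subprocess.run(build_7za_command(archive, paths), check=True)

print(" ".join(build_7za_command("archive.7z", ["dir1"])))
```

Keeping the command construction separate from the call makes it possible to inspect or log the exact switches before anything is run.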
|
| Guest |
Posted: Fri Mar 07, 2008 1:57 pm |
Hey, (sorry in advance for bumping this topic),
On Jan 24, 9:20 am, David Portabella <david.portabe...@gmail.com>
wrote:
Quote:
Using 7-Zip (32 MB dictionary, solid block size: 4 GB): 28 MB !!!
Using p7zip (the Unix 7-Zip equivalent): 211 MB :(
I did not see how to turn on the "solid" option in p7zip.
Maybe it is not implemented for compressing?
But at least it successfully extracted the 28 MB archive created with
7-Zip.
By default, I think, p7zip uses "-mx5 -ms -t7z", so typing "p7zip a
blah mydir" will compress mydir/* into blah.7z. Try reading the .CHM
in the Win32 .ZIP download (or equivalent); it gives lots of good info
(e.g. "-ms=512k" for *semi-solid* archives a la RAR, so you don't lose the
whole thing if it is corrupted, or have to unpack the whole thing if only a
small part is needed). |
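The semi-solid trade-off mentioned here can be illustrated with a toy packer. This is a sketch of the idea only, not the 7z container format: `pack_semi_solid` and `extract` are hypothetical helpers, and the standard-library `lzma` module is assumed as the compressor.

```python
import lzma

def pack_semi_solid(files, block_limit):
    """Compress (name, data) pairs in independent blocks of bounded size.

    Returns a list of (index, payload) pairs, where index maps each file
    name to its (offset, length) inside the decompressed block.
    """
    blocks = []
    index, chunk = {}, bytearray()
    for name, data in files:
        # Start a new block when adding this file would exceed the limit.
        if index and len(chunk) + len(data) > block_limit:
            blocks.append((index, lzma.compress(bytes(chunk))))
            index, chunk = {}, bytearray()
        index[name] = (len(chunk), len(data))
        chunk += data
    if index:
        blocks.append((index, lzma.compress(bytes(chunk))))
    return blocks

def extract(blocks, wanted):
    """Decompress only the block that holds `wanted`."""
    for index, payload in blocks:
        if wanted in index:
            offset, length = index[wanted]
            return lzma.decompress(payload)[offset:offset + length]
    raise KeyError(wanted)

demo = [("a", b"x" * 300), ("b", b"y" * 300), ("c", b"z" * 300)]
demo_blocks = pack_semi_solid(demo, block_limit=600)
print(len(demo_blocks), "blocks")
```

A smaller block limit means less data to decompress for a single file and less lost to a damaged block, at the cost of redundancy between blocks that the compressor can no longer exploit.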
All times are GMT - 5 Hours