Short String Compression
moogie
Posted: Tue Feb 01, 2005 6:18 pm
Guest
Both are good suggestions.

I am trying to keep the filenames as Unicode (2 bytes per character) so
that it can work with non-English character systems. This makes the
substrings more difficult and expensive to find and store... however,
it is a good idea. Maybe I will add the most common substrings (most
likely file extensions) to the Huffman encoder.
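
A minimal sketch of what adding common substrings to a Huffman encoder could look like. The COMMON list, the sample names, and all function names are assumptions made for illustration, not the encoder discussed in this thread. Python strings are sequences of Unicode code points, so non-English names need no special handling here.

```python
import heapq
from collections import Counter

# Hypothetical list of frequent substrings promoted to single symbols.
COMMON = [".jpeg", ".html", ".txt", ".jpg", ".mp3"]

def tokenize(name, common=COMMON):
    """Split a name into symbols: a common substring where one starts, else one character."""
    tokens, i = [], 0
    while i < len(name):
        for sub in common:
            if name.startswith(sub, i):
                tokens.append(sub)
                i += len(sub)
                break
        else:
            tokens.append(name[i])
            i += 1
    return tokens

def build_codes(freq):
    """Build a Huffman code {symbol: bit string} from a {symbol: count} table."""
    # The integer in each tuple breaks ties so the dicts are never compared.
    heap = [(w, n, {sym: ""}) for n, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(name, codes):
    """Concatenate the code of every symbol in the name."""
    return "".join(codes[t] for t in tokenize(name))

def decode(bits, codes):
    """Walk the bit string, emitting a symbol whenever a full code is seen."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

# Sample names (made up) used only to build the frequency table.
names = ["holiday.jpg", "notes.txt", "\u5199\u771f.jpg", "index.html", "todo.txt"]
freq = Counter(t for n in names for t in tokenize(n))
codes = build_codes(freq)
bits = encode("notes.txt", codes)
print(len(bits), "bits instead of", 16 * len("notes.txt"))
```

The extension costs one code word instead of four or five, which is where the gain comes from. The frequency table has to be stored or agreed on in advance, since names not covered by it cannot be encoded with this sketch.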

The search performs partial matches on the file names, so searching
on the compressed filenames will not be possible :(
Phil Frisbie, Jr.
Posted: Tue Feb 01, 2005 6:43 pm
Guest
moogie wrote:
Quote:
Both are good suggestions.

I am trying to keep the filenames as Unicode (2 bytes per character) so
that it can work with non-English character systems. This makes the
substrings more difficult and expensive to find and store... however,
it is a good idea. Maybe I will add the most common substrings (most
likely file extensions) to the Huffman encoder.

You can use UTF-8 and still support all Unicode characters. That also has the
advantage that you can use the normal string functions to manipulate them.
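
A small sketch of the size difference and of byte-level substring search on UTF-8. The sample names and the helper names (sizes, contains) are made up for illustration. Python's bytes operations stand in here for the "normal string functions"; the same reasoning applies to byte-oriented functions in other languages.

```python
def sizes(name):
    """Return (UTF-8 byte count, UTF-16 byte count without BOM) for a name."""
    return len(name.encode("utf-8")), len(name.encode("utf-16-le"))

def contains(name, fragment):
    """Partial match done purely on UTF-8 bytes, as a byte-oriented search would do."""
    return fragment.encode("utf-8") in name.encode("utf-8")

# Made-up sample names: plain ASCII, accented Latin, and CJK.
for name in ["report.txt", "r\u00e9sum\u00e9.doc", "\u5199\u771f.jpg"]:
    print(name, sizes(name))

print(contains("\u5199\u771f.jpg", "\u771f.j"))
```

ASCII-heavy names take half the space in UTF-8, while names made mostly of CJK characters can come out larger than in UTF-16 (3 bytes per character instead of 2). Because a UTF-8 lead byte never looks like a continuation byte, a byte-level search for a valid UTF-8 fragment cannot match in the middle of a character.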

--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com
rep_movsd
Posted: Tue Feb 08, 2005 3:28 pm
Guest
budgetanime@mystarship.com (moogie) wrote in message news:<e353ade.0502011518.3690011f@posting.google.com>...
Quote:
Both are good suggestions.

I am trying to keep the filenames as Unicode (2 bytes per character) so
that it can work with non-English character systems. This makes the
substrings more difficult and expensive to find and store... however,
it is a good idea. Maybe I will add the most common substrings (most
likely file extensions) to the Huffman encoder.

The search performs partial matches on the file names, so searching
on the compressed filenames will not be possible :(

My 2 bits:
Why not just keep the database on a compressed filesystem?
The Windows platform supports it out of the box, and I'm sure Linux has
bazillions of compressed filesystem implementations. In fact, it's
possible to use the /dev/cloop device to allow any filter program to
sit between the storage and the filesystem, allowing you to use any
compressor as a base for your filesystem.
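
A rough sketch of why compressing at the storage level can work for short strings, assuming the filesystem compresses data in blocks rather than one name at a time. It uses zlib as a stand-in compressor, with made-up file names and a hypothetical helper called compare; it does not model any particular filesystem.

```python
import zlib

def compare(names):
    """Return (raw bytes, bytes when each name is compressed alone, bytes when compressed as one block)."""
    encoded = [n.encode("utf-8") for n in names]
    raw = sum(len(e) for e in encoded)
    one_by_one = sum(len(zlib.compress(e)) for e in encoded)
    as_block = len(zlib.compress(b"\n".join(encoded)))
    return raw, one_by_one, as_block

# Made-up names that share a lot of structure, as names in one directory often do.
names = [f"photos/2005/holiday_{i:04d}.jpg" for i in range(200)]
raw, one_by_one, as_block = compare(names)
print(raw, one_by_one, as_block)
```

A general-purpose compressor has little to work with inside a single short name and adds a fixed overhead per stream, while a whole block of similar names gives it plenty of repetition. The names stay uncompressed from the application's point of view, so partial-match searching keeps working.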

I also remember reading on http://compresion.ru about a compressed
filesystem implementation that was close to LZO in performance.
 
All times are GMT - 5 Hours