Both are good suggestions.
I am trying to keep the filenames as Unicode (2 bytes per character) so
that it can work with non-English character sets. This makes the
substrings more difficult and expensive to find and store... however,
it is a good idea. Maybe I will add the most common substrings (most
likely file extensions) to the Huffman encoder.
The search performs partial matches on the file names, so searching
directly on the compressed filenames will not be possible.
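
A rough sketch of the idea, in Python for brevity. Everything here is an
assumption for illustration, not the real implementation: the names
COMMON_EXTENSIONS, tokenize, build_codes, encode, decode and search are
hypothetical, and the extension list is made up. It treats a known
extension as a single Huffman symbol next to the ordinary characters,
and does the partial match on the decoded name rather than on the bits.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical list of extensions that get their own Huffman symbol.
COMMON_EXTENSIONS = (".jpeg", ".html", ".txt", ".jpg", ".mp3")

def tokenize(name):
    """Split a name into characters, keeping a known extension as one token."""
    for ext in COMMON_EXTENSIONS:
        if name.endswith(ext) and len(name) > len(ext):
            return list(name[:-len(ext)]) + [ext]
    return list(name)

def build_codes(token_counts):
    """Build a Huffman code and return {token: bit string}."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {tok: ""}) for tok, n in token_counts.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {tok: "0" for tok in heap[0][2]}
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def encode(name, codes):
    """Concatenate the code of every token in the name."""
    return "".join(codes[t] for t in tokenize(name))

def decode(bits, codes):
    """Walk the bit string; a prefix code lets us emit on the first match."""
    reverse = {c: t for t, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

def search(query, encoded_names, codes):
    """Partial match: decode each name first, then do a substring test."""
    decoded = (decode(bits, codes) for bits in encoded_names)
    return [name for name in decoded if query in name]

names = ["report.txt", "holiday.jpg", "notes.txt", "файл.txt"]
codes = build_codes(Counter(t for n in names for t in tokenize(n)))
encoded = [encode(n, codes) for n in names]
matches = search("ote", encoded, codes)
```

In this sample every "." is swallowed by an extension token, so a query
such as ".tx" has no encoding of its own at all, which is one reason a
partial match cannot simply be run against the compressed bits. Python
strings are used here only for readability; the bit strings would be
packed and the names stored as 2-byte code units in practice.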