Main Page | Report this Page
 
   
Science Forum Index  »  Cryptography Forum  »  In need of a Crypto hash algorithm design...
Page 1 of 1    
Author Message
Albert Yang...
Posted: Sat Jun 28, 2008 1:16 am
Guest
I need a hash algorithm that produces "filename friendly" hashes; i.e.
takes any file and produces a 8 byte char of [A-Z][0-9] subset only.
While collision prevention is important 0~Z yields 35 characters and 8
bytes resulting in a 35^8 permutations.

I am working on a project, mp3 player for 3rd world countries. One of
the things I'd like to do is to have each recording generate a unique
name so when they copy from device to device, nothing unique is overridden.

While I know something like MD5 or SHA1 can do just fine, I'd prefer if
it was 8 bytes, upper case only. This will yield the best compatibility.

I am also hoping that because the device has a CODEC, that the algorithm
will be simple to program and perhaps be able to take advantage of DSP's
to generate the hash.

The last request is that this being crypto, that whatever is designed be
public domain.

While a modification on a current algorithm like SHA1 (mapping) is
probably quick and easy, I'd prefer one that was designed from the
ground up with these items in mind.

Love to hear some feedback.
Thanks.

Albert
David Eather...
Posted: Thu Jul 03, 2008 4:49 pm
Guest
Albert Yang wrote:
Quote:
I need a hash algorithm that produces "filename friendly" hashes; i.e.
takes any file and produces a 8 byte char of [A-Z][0-9] subset only.
While collision prevention is important 0~Z yields 35 characters
36 characters


and 8
Quote:
bytes resulting in a 35^8 permutations.
36^8 permutations

I am working on a project, mp3 player for 3rd world countries. One of
the things I'd like to do is to have each recording generate a unique
name so when they copy from device to device, nothing unique is overridden.

While I know something like MD5 or SHA1 can do just fine, I'd prefer if
it was 8 bytes, upper case only. This will yield the best compatibility.

You may find it more efficient for the hardware if you limit the file
names to A-Z, 0-5 (32 char or 5 bits 1099511627776 permutations). The
user will have about a million songs before he gets a collision. You
can use MD-5 or MD-4 for this just take off only the number of bits you
need. You might also be able to use a simple CRC instead.

Quote:

I am also hoping that because the device has a CODEC, that the algorithm
will be simple to program and perhaps be able to take advantage of DSP's
to generate the hash.

The last request is that this being crypto, that whatever is designed be
public domain.

While a modification on a current algorithm like SHA1 (mapping) is
probably quick and easy, I'd prefer one that was designed from the
ground up with these items in mind.

Not really needed and you might find little enthusiasm to do so.
Quote:

Love to hear some feedback.
Thanks.

Albert
Peter Pearson...
Posted: Fri Jul 04, 2008 2:53 pm
Guest
On Sat, 28 Jun 2008 14:16:56 +0800, Albert Yang <albert at (no spam) achtung.com> wrote:
Quote:
I need a hash algorithm that produces "filename friendly" hashes; i.e.
takes any file and produces a 8 byte char of [A-Z][0-9] subset only.
[snip]

While a modification on a current algorithm like SHA1 (mapping) is
probably quick and easy, I'd prefer one that was designed from the
ground up with these items in mind.

Perhaps what you really want is an off-the-shelf
impelementation of SHA or MD5, plus a very simple custom
program that maps hash values onto the desired filename
space.

--
To email me, substitute nowhere->spamcop, invalid->net.
...
Posted: Sun Jul 13, 2008 5:52 am
Guest
On Jun 28, 2:16 am, Albert Yang <alb... at (no spam) achtung.com> wrote:
Quote:
I need a hash algorithm that produces "filename friendly" hashes; i.e.
takes any file and produces a 8 byte char of [A-Z][0-9] subset only.
While collision prevention is important 0~Z yields 35 characters and 8
bytes resulting in a 35^8 permutations.

I am working on a project, mp3 player for 3rd world countries.  One of
the things I'd like to do is to have each recording generate a unique
name so when they copy from device to device, nothing unique is overridden.

The one time I had to do something like that, I simply outputted the
SHA-1 result in hexadecimal. This only produced char within [0-9] and
[a-f], so it's not exactly what you had in mind, but this method had
the benefit of being extremely easy to implement (no need for ugly
lookup table - the resulting filename will be exactly twice as long as
the binary hash), and it is "filename friendly".
Albert Yang...
Posted: Wed Jul 23, 2008 3:35 am
Guest
Quote:
You may find it more efficient for the hardware if you limit the file
names to A-Z, 0-5 (32 char or 5 bits 1099511627776 permutations). The
user will have about a million songs before he gets a collision. You
can use MD-5 or MD-4 for this just take off only the number of bits
you >need. You might also be able to use a simple CRC instead.


That's a great idea!

8char X 5 bits = 40 bits needed.

00-FF = 8 bits X 5 sets = 40 bits.

So I can map that and be done with it. Great suggestion.

Question:

If I take an MD5, would there be more entropy on the right most
characters or left most? I know the "theory" is that they should be
evenly distributed, but I know they aren't in reality. Any tests for
this?? If I take the left 5 numbers vs the right 5 numbers, which gives
more collisions?

5ff20a4d0ed4a85b61147c76a8f54730

Will the "5ff20a4d0e" be more random, or the "76a8f54730"?

Thanks!
Albert
Albert Yang...
Posted: Wed Jul 23, 2008 5:39 am
Guest
Thanks for the great suggestion, I wrote a quick and dirty converter in
python, attached below. This seems a simple solution to what I thought
was a complex problem.

Albert.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### hexto5bit.py by Albert Yang copyright 2008 ###
### Takes any command line arguement that's in HEX, and converts it to 5
bit mapping ###
### This will yield a 8 bytes of [A-Z,0-5] (5bit) output with
1099511627776 (2^40) permutations ###
### This should be enough to prevent collisions in name space while
still DOS compliant ###
### This allows for 8bytename and a 3 byte extension with no
extemperaneous characters. ###
### example: hexto5bit.py adfe1101ef ###
### yields: VX5BCAPP ###

import string
import sys

### 8 bit hex map ###
hexmap = {
'0' : '0000',
'1' : '0001',
'2' : '0010',
'3' : '0011',
'4' : '0100',
'5' : '0101',
'6' : '0110',
'7' : '0111',
'8' : '1000',
'9' : '1001',
'A' : '1010',
'B' : '1011',
'C' : '1100',
'D' : '1101',
'E' : '1110',
'F' : '1111'}

dosmap = {
'00000':'A',
'00001':'B',
'00010':'C',
'00011':'D',
'00100':'E',
'00101':'F',
'00110':'G',
'00111':'H',
'01000':'I',
'01001':'J',
'01010':'K',
'01011':'L',
'01100':'M',
'01101':'N',
'01110':'O',
'01111':'P',
'10000':'Q',
'10001':'R',
'10010':'S',
'10011':'T',
'10100':'U',
'10101':'V',
'10110':'W',
'10111':'X',
'11000':'Y',
'11001':'Z',
'11010':'0',
'11011':'1',
'11100':'2',
'11101':'3',
'11110':'4',
'11111':'5'}

step = 5
payload = sys.argv[1]
binstring =''



### Convert hex to binary string first ###
for i in range(0,len(payload)):
binstring += hexmap[payload[i].upper()]

### Convert binary to 5 bit mapped output ###
for j in range (0,len(binstring),5):
sys.stdout.write(str(dosmap[binstring[j:j+step]]))
 
Page 1 of 1       All times are GMT - 5 Hours
The time now is Mon Oct 13, 2008 2:29 am