| |
 |
|
|
Science Forum Index » Cryptography Forum » Does this scheme look sound, questionable, or downright wort
Page 1 of 1
|
| Author |
Message |
| Jeff |
Posted: Sat Jan 03, 2004 6:07 pm |
|
|
|
Guest
|
Suppose a scheme is devised to munge a poorly-distributed sequence of
integers (representing web application user IDs) beginning with 1 and
increasing to some value much, much less than 2^32.
Each time a user is created, the next available, unused integer is
used as that user's user ID. HOWEVER, the application is designed to
NEVER expose the raw user ID to users. Instead, when each user is
created, the md5 hash of the integer's string representation (with
some secret, but unchanging string of indeterminate length
concatenated to it to make determining the original value through
brute force more difficult). Furthermore, THAT raw md5 value is never
exposed directly to users, either (the main purpose of md5 at THIS
stage is to generate a nicely-distributed collection of 128-bit
numeric values to use instead of the tiny sequence of small integers.)
Instead, each time a user session begins, a random 128 bit integer is
created for that user and XOR'ed against the 128-bit md5 value prior
to making it visible to the user. Likewise, values submitted by the
user as input parameters representing the hashed id of some other user
are desalinated against the same 128 bit salt via XOR as well.
So far, so good... I think.
Now for the vulnerabilities:
Suppose a malicious user knows EXACTLY how the scheme works. He
doesn't know the raw integer representing his user ID, nor does he
know the raw user ID of any other user. Furthermore, he doesn't know
what value was used to salt the raw user IDs to generate the md5 hash
stored alongside the raw id in the database.
Based upon the knowledge he DOES have, how hard would it be for him to
determine his raw user ID, or the raw user ID of another user, by
repeatedly logging on (to generate a new session salt) and comparing
the observed values of his session-salted user ID and those of some
finite number of other specific users?
OK, getting a little trickier. Let's assume he knows that the raw
underlying user IDs increase by 1 each time a new user is created.
Suppose further that he were to create two or more new users late at
night when he knew he had a good chance of creating users with
sequential user IDs. Assuming his assumption were right, and he knew
for a fact that the salted hash associated with user "B" has a raw
value exactly 1 greater than the raw value associated with user "A",
how hard would it be for him to discover the raw id of his user, or
some other user? In other words, now he doesn't just know how the
scheme works... he also knows that two hashed values, though unknown,
represent integers differing by exactly 1, and he also knows which one
is the larger of the two. He logs on as user "A" and discerns the
session-salted value associated with "B" and his own user, then logs
on as user "B" to discern the session-salted value associated with
both himself and "A". If necessary, he creates THREE, FOUR, or some
other small, finite number of users believed to have sequential
underlying user IDs and pulls the same stunt... logging in as each one
or more times to determine the session-salted hashed values for the
other users.
Ultimately, does this look like a decent scheme for keeping raw
underlying user IDs secret, and making HASHED user IDs hard to find
for ANY user, much less a specific one (due to there being only a few
thousand valid ones among a haystack with 2^128 values), without
introducing TOO much overhead into transactions that are going to
happen a LOT (so overhead *is* a big deal, but tolerable as long as
the only computation that needs to be repeated with every single
request is a Java BigInteger.xor()?
The biggest vulnerability I can see is that if someone manages to
discern his raw md5 hash, he'll be able to easily discover the current
session's salt. From that point, he could easily determine the
nonvarying raw md5 hash of any other user. Armed with that knowledge,
he could try to discover the unvarying value used to salt the raw
integer ids prior to taking their md5 hash through brute force by
systematically guessing at plausible salts and generating the md5
hashes of integers from 0 to, say, 10000 (knowing there's a good
chance the underlying values ARE less than 1000) salted with that
guessed salt value and looking for three sequential hash values that
match the three known raw hashes. From that point, it would just be a
matter of generating the md5 hashes of all the integers from 0 to some
reasonable maximum salted with that now-known salt value to generate
an index against which any user's real id could be determined.
One possible improvement to the scheme might be to randomly generate a
new salt for EACH new user to use when salting the new user ID to
obtain the unchanging md5 hash associated with each user. Obviously
I'd have to carefully check for collisions (since two different
numbers salted with two different random strings of random length
COULD produce identical md5 hashes). Varying the salt would probably
screw up the distribution of md5 hash values, too... but then again,
treating the salt like a one time pad would effectively remove
discovery of that salt as a possible exploit since it would differ for
every single id.
Another possible improvement might be to do away with sequential raw
user IDs altogether and just generate md5 hashes from random strings
of random length from the very beginning. I'd have to make sure I
didn't accidentally re-use an existing hash value, but that would
effectively eliminate sequential users as a potential exploit.
Ultimately the randomness would be no better than whatever Sun
happened to design into Java, but it would probably be way better than
sequential underlying integers in any case.
Hmmm. If I did both improvements, are there any obvious
encryption-specific exploits left that I've overlooked? I suspect if I
did everything I mentioned above, a potential attacker would just
forget about trying to crack the user ids and work on hacking his way
into the database server or web server instead as an easier
alternative... Comments? |
|
|
| Back to top |
|
| Paul Rubin |
Posted: Sat Jan 03, 2004 6:27 pm |
|
|
|
Guest
|
jskubick@hotmail.com (Jeff) writes:
Quote: Each time a user is created, the next available, unused integer is
used as that user's user ID. HOWEVER, the application is designed to
NEVER expose the raw user ID to users. Instead, when each user is
created, the md5 hash of the integer's string representation (with
some secret, but unchanging string of indeterminate length
I didn't have the patience to slog through your two page post (maybe
you could recap it in 10 or so lines) but as I remember, you
previously wanted to be able to be able to invert the function,
i.e. to get the UID back from the cookie. That means you want
encryption and not hashing.
If you're trying to use md5 to un-invertibly change a small UID into a
bigger and unguessable code, just use the HMAC construction (RFC 2104)
instead of cooking up your own scheme.
Your best bet is to read a book like Applied Cryptography by Bruce
Schneier and get some understanding of the principles you need, rather
than splashing around on the newsgroup. |
|
|
| Back to top |
|
| Paul Rubin |
Posted: Sat Jan 03, 2004 6:27 pm |
|
|
|
Guest
|
jskubick@hotmail.com (Jeff) writes:
Quote: Each time a user is created, the next available, unused integer is
used as that user's user ID. HOWEVER, the application is designed to
NEVER expose the raw user ID to users. Instead, when each user is
created, the md5 hash of the integer's string representation (with
some secret, but unchanging string of indeterminate length
I didn't have the patience to slog through your two page post (maybe
you could recap it in 10 or so lines) but as I remember, you
previously wanted to be able to be able to invert the function,
i.e. to get the UID back from the cookie. That means you want
encryption and not hashing.
If you're trying to use md5 to un-invertibly change a small UID into a
bigger and unguessable code, just use the HMAC construction (RFC 2104)
instead of cooking up your own scheme.
Your best bet is to read a book like Applied Cryptography by Bruce
Schneier and get some understanding of the principles you need, rather
than splashing around on the newsgroup. |
|
|
| Back to top |
|
| lyal |
Posted: Sun Jan 04, 2004 12:36 am |
|
|
|
Guest
|
"Jeff" <jskubick@hotmail.com> wrote in message
news:1504e6b9.0401031507.29cb7de6@posting.google.com...
Quote: Suppose a scheme is devised to munge a poorly-distributed sequence of
integers (representing web application user IDs) beginning with 1 and
increasing to some value much, much less than 2^32.
Each time a user is created, the next available, unused integer is
used as that user's user ID. HOWEVER, the application is designed to
NEVER expose the raw user ID to users. Instead, when each user is
created, the md5 hash of the integer's string representation (with
some secret, but unchanging string of indeterminate length
concatenated to it to make determining the original value through
brute force more difficult). Furthermore, THAT raw md5 value is never
exposed directly to users, either (the main purpose of md5 at THIS
stage is to generate a nicely-distributed collection of 128-bit
numeric values to use instead of the tiny sequence of small integers.)
Instead, each time a user session begins, a random 128 bit integer is
created for that user and XOR'ed against the 128-bit md5 value prior
to making it visible to the user. Likewise, values submitted by the
user as input parameters representing the hashed id of some other user
are desalinated against the same 128 bit salt via XOR as well.
So far, so good... I think.
[snip]
I'm not so sure - but I hope to learn from feedback on this.
After a series of the outputs from the above process is collected, the
results will all have the form of
Rn XOR C (a constant for the user - the Hashed/Salted UserID) to give Zn
XORing pairs of these together will, I think, tend to show
R1 XOR C XOR R2 XOR C = R1 XOR R2
Collect 100 Z's (or any largish number).
XOR 99 of the Z values together to make Y (i.e C XOR R1- to R99)
The residue/result of XORing Y with Z100 (the next output) is statistically
more probably to be due to C than any R value.
Continue for further pairs of Zn to make Yn+100 - those bits in the residue
that don't change are more probably due to C than the random R.
Continue till yourt level of confidence is hish enought to know, or reduce
the deirvation of C values to say a 40-bit issue, rather than a problem of
~2^128 magnitude.
Once you are confident that you have C (or some specific bits in C), re/XOR
against all Zn's, and determine the R values as a cross check.
Once you have C, then a dictionary attack against the ID integers may be
feasible if the ID space is 2^32.
Lyal |
|
|
| Back to top |
|
| lyal |
Posted: Sun Jan 04, 2004 12:36 am |
|
|
|
Guest
|
"Jeff" <jskubick@hotmail.com> wrote in message
news:1504e6b9.0401031507.29cb7de6@posting.google.com...
Quote: Suppose a scheme is devised to munge a poorly-distributed sequence of
integers (representing web application user IDs) beginning with 1 and
increasing to some value much, much less than 2^32.
Each time a user is created, the next available, unused integer is
used as that user's user ID. HOWEVER, the application is designed to
NEVER expose the raw user ID to users. Instead, when each user is
created, the md5 hash of the integer's string representation (with
some secret, but unchanging string of indeterminate length
concatenated to it to make determining the original value through
brute force more difficult). Furthermore, THAT raw md5 value is never
exposed directly to users, either (the main purpose of md5 at THIS
stage is to generate a nicely-distributed collection of 128-bit
numeric values to use instead of the tiny sequence of small integers.)
Instead, each time a user session begins, a random 128 bit integer is
created for that user and XOR'ed against the 128-bit md5 value prior
to making it visible to the user. Likewise, values submitted by the
user as input parameters representing the hashed id of some other user
are desalinated against the same 128 bit salt via XOR as well.
So far, so good... I think.
[snip]
I'm not so sure - but I hope to learn from feedback on this.
After a series of the outputs from the above process is collected, the
results will all have the form of
Rn XOR C (a constant for the user - the Hashed/Salted UserID) to give Zn
XORing pairs of these together will, I think, tend to show
R1 XOR C XOR R2 XOR C = R1 XOR R2
Collect 100 Z's (or any largish number).
XOR 99 of the Z values together to make Y (i.e C XOR R1- to R99)
The residue/result of XORing Y with Z100 (the next output) is statistically
more probably to be due to C than any R value.
Continue for further pairs of Zn to make Yn+100 - those bits in the residue
that don't change are more probably due to C than the random R.
Continue till yourt level of confidence is hish enought to know, or reduce
the deirvation of C values to say a 40-bit issue, rather than a problem of
~2^128 magnitude.
Once you are confident that you have C (or some specific bits in C), re/XOR
against all Zn's, and determine the R values as a cross check.
Once you have C, then a dictionary attack against the ID integers may be
feasible if the ID space is 2^32.
Lyal |
|
|
| Back to top |
|
| Hans Fleischmann |
Posted: Sun Jan 04, 2004 5:03 am |
|
|
|
Guest
|
Jeff wrote:
Quote: Suppose a scheme is devised to munge a poorly-distributed sequence of
integers (representing web application user IDs) beginning with 1 and
increasing to some value much, much less than 2^32.
[...] Comments?
Yeah, don't take serial ID's, take them random. As soon as someone get's
access to your database he'll know the sequence of the table-entries and
that's your numbering. And don't record a "date-registered" too.
Hans |
|
|
| Back to top |
|
| John E. Hadstate |
Posted: Sun Jan 04, 2004 10:47 am |
|
|
|
Guest
|
"Jeff" <jskubick@hotmail.com> wrote in message
news:1504e6b9.0401031507.29cb7de6@posting.google.com...
Quote: Suppose a scheme is devised to munge a poorly-distributed sequence of
integers (representing web application user IDs) beginning with 1 and
increasing to some value much, much less than 2^32.
Each time a user is created, the next available, unused integer is
used as that user's user ID. HOWEVER, the application is designed to
NEVER expose the raw user ID to users. Instead, when each user is
created, the md5 hash of the integer's string representation (with
some secret, but unchanging string of indeterminate length
concatenated to it to make determining the original value through
brute force more difficult).
<note>
I believe that a strict interpretation of the rules would hold that a
"secret, but unchanging string" is not secret, cryptographically speaking.
In other words, it's part of the algorithm, not the parameterization, and
therefore should be considered open to public scrutiny.
</note>
Quote: Furthermore, THAT raw md5 value is never
exposed directly to users, either (the main purpose of md5 at THIS
stage is to generate a nicely-distributed collection of 128-bit
numeric values to use instead of the tiny sequence of small integers.)
Instead, each time a user session begins, a random 128 bit integer is
created for that user and XOR'ed against the 128-bit md5 value prior
to making it visible to the user. Likewise, values submitted by the
user as input parameters representing the hashed id of some other user
are desalinated against the same 128 bit salt via XOR as well.
This "random 128 bit integer" has the effect of encrypting the userid hash,
so the user sees the ciphertext of his own hashed userid. Why does the user
need to see this? Is this "random 128 bit integer" a constant, does it
change from session to session according to some formula, or is it truly
random? Is it reproducible later?
Quote:
So far, so good... I think.
Now for the vulnerabilities:
Suppose a malicious user knows EXACTLY how the scheme works. He
doesn't know the raw integer representing his user ID, nor does he
know the raw user ID of any other user. Furthermore, he doesn't know
what value was used to salt the raw user IDs to generate the md5 hash
stored alongside the raw id in the database.
<exception>
See note above
</exception>
Quote:
Based upon the knowledge he DOES have, how hard would it be for him to
determine his raw user ID, or the raw user ID of another user, by
repeatedly logging on (to generate a new session salt) and comparing
the observed values of his session-salted user ID and those of some
finite number of other specific users?
Can't answer these questions without knowing a lot more details about the
processing described above. |
|
|
| Back to top |
|
| Jeff |
Posted: Mon Jan 05, 2004 4:57 pm |
|
|
|
Guest
|
Refinements to original scheme:
Based on the comments of everyone, I eliminated the underlying 32-bit
integers and their salted (with an unchanging salt) md5 hashes
altogether in favor of just generating an outright random 128 or 64
bit unsigned integer when the account is created (after verifying that
it's not already in use by another user, of course) and using it
directly as the user's primary key in the database.
To keep it safe, a new random 128 or 64 bit unsigned integer salt will
be generated each time a user session begins (presumably by his
logging in) and XOR'ed against any user id value exposed to a user
(including the user's own id).
XOR'ing is computationally cheap, so applying it to user ID values on
their way in and out won't add much in the way of overhead. By
changing the salt each time a user session begins, any bookmarked or
recorded user id values from the previous (and all earlier) sessions
will become meaningless. I believe that the number of meaningful
values (corresponding to actual user IDs) will be small enough
relative to the pool of potential values to make even the accidental
discovery of a random valid ID through brute force difficult, let
alone an id code that, when salted with the user's current session
salt, will correspond to the user id of a SPECIFIC user.
Why 64 bit values instead of 128 bit? Mainly, performance and
convenience. MySQL can happily handle a 64-bit BigInt as a native
primitive datatype, but would have to handle 128-bit values by their
string representations (39 characters if decimal, 32 characters if
hex). As isolated lookups the difference in speed isn't huge, but
amidst a torrent of requests, it quickly becomes a Big Deal. [Sigh.
Maybe someday we'll get to have a nice, fast, native SuperInt
datatype...]
I know it would make the scheme a lot easier to compromise, but I
suspect that even 64 bits would keep an attacker busy for quite a
while. In a worst-case (but non-catastrophic) scenario, I could extend
the scheme to generate rolling session salt values lasting only a few
hours, but I suspect that forcibly ending user sessions after 18-30
hours (ensuring that the session salt becomes invalid) will be more
than enough to frustrate attackers when one considers that the rate at
which Tomcat itself can/will respond to requests limits the number of
values a malicious user could try out of brute force. Chances are,
anyone trying to generate http requests fast enough to meaningfully
compromise even 64 bits would probably bring the server to its knees
and draw attention long before they tried enough values to seriously
compromise anything. |
|
|
| Back to top |
|
| |
|
Page 1 of 1
All times are GMT - 5 Hours
The time now is Mon Sep 08, 2008 7:14 am
|
|