Main Page | Report this Page
 
   
Science Forum Index  »  Compression Forum  »  misc: how random is 'random'?...
Page 1 of 1    
Author Message
cr88192
Posted: Mon Apr 28, 2008 9:05 pm
Guest
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.


the particular approach I am using looks about like this:
I have a 1024 bit state consisting of 32 32-bit numbers.

this state is seeded, initially, using rand, which is seeded using the rdtsc
instruction (x86 and friends, tells how many clock cycles the CPU has
currently gone through).

on each iteration:
the current rdtsc value is read, and added to the low-end of the array.
a linear multiply is pulled off between the array and the mersenne prime
near 2^31 (this multiply "wraps around", feeding the high end of the array
back into the bottom);
a hash is computed (multiplying each element by this prime and accumulating)
of which the upper half is used as the result.

likewise, a thread is also spawned that every 10ms goes through the above
process, and every second stores the current state to a file (this is used
as the input state the next time the thing is initialized). it is expected
that system events will cause the exact timing to vary...


theoretically, entropy should 'accumulate' in this array, so that over time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.

any thoughts?...
Gene
Posted: Tue Apr 29, 2008 6:26 am
Guest
On Apr 28, 10:05 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
Quote:
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.

the particular approach I am using looks about like this:
I have a 1024 bit state consisting of 32 32-bit numbers.

this state is seeded, initially, using rand, which is seeded using the rdtsc
instruction (x86 and friends, tells how many clock cycles the CPU has
currently gone through).

on each iteration:
the current rdtsc value is read, and added to the low-end of the array.
a linear multiply is pulled off between the array and the mersenne prime
near 2^31 (this multiply "wraps around", feeding the high end of the array
back into the bottom);
a hash is computed (multiplying each element by this prime and accumulating)
of which the upper half is used as the result.

likewise, a thread is also spawned that every 10ms goes through the above
process, and every second stores the current state to a file (this is used
as the input state the next time the thing is initialized). it is expected
that system events will cause the exact timing to vary...

theoretically, entropy should 'accumulate' in this array, so that over time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.

any thoughts?...

This is a bit like asking how high is up. It depends what you are
trying to do.
cr88192
Posted: Tue Apr 29, 2008 3:53 pm
Guest
"Gene" <gene.ressler@gmail.com> wrote in message
news:0fd5f7ee-6b13-4841-8838-0db03b0ac669@i76g2000hsf.googlegroups.com...
Quote:
On Apr 28, 10:05 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:

theoretically, entropy should 'accumulate' in this array, so that over
time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.

any thoughts?...

This is a bit like asking how high is up. It depends what you are
trying to do.


well, people can measure up, but it is not clear how to measure accumulated
randomness...


the idea is to generate random numbers with a low probability of clash.

examples would be GUIDs/UUIDs, PUIDs (a term I use: Probably Unique IDs,
....).

in this particular case, the intended task is to generate 48 bit PUIDs with
the intention of being used as "segment" addresses in a 128 bit address
space (the practicality of which is debated, basically the idea is to have a
128 bit space which can represent both stores and shared-memory segments on
a potentially arbitrary, but likely small, number of nodes...). randomness
is desired as an attempt to minimize any two nodes accidentally picking the
same address...

theoretically, the chances are 1/(2^4Cool if I have a decent RNG, but are much
lower if the RNG lacks sufficient "unique" entropy.


one crude way I guess would be to do statistical analysis of my current main
entropy source, namely rdtsc, and try to make an estimate as to the amount
of entropy I am getting from it...

or such...
John Reiser
Posted: Tue Apr 29, 2008 9:39 pm
Guest
Quote:
This is a bit like asking how high is up. It depends what you are
trying to do.

well, people can measure up, but it is not clear how to measure accumulated
randomness...

There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer Programming.


Quote:
the idea is to generate random numbers with a low probability of clash.

If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any textbook
on "linear feedback shift register".

--
Neil
Posted: Tue Apr 29, 2008 11:14 pm
Guest
cr88192 wrote:
Quote:
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.


the particular approach I am using looks about like this:
I have a 1024 bit state consisting of 32 32-bit numbers.

this state is seeded, initially, using rand, which is seeded using the rdtsc
instruction (x86 and friends, tells how many clock cycles the CPU has
currently gone through).

on each iteration:
the current rdtsc value is read, and added to the low-end of the array.
a linear multiply is pulled off between the array and the mersenne prime
near 2^31 (this multiply "wraps around", feeding the high end of the array
back into the bottom);
a hash is computed (multiplying each element by this prime and accumulating)
of which the upper half is used as the result.

likewise, a thread is also spawned that every 10ms goes through the above
process, and every second stores the current state to a file (this is used
as the input state the next time the thing is initialized). it is expected
that system events will cause the exact timing to vary...


theoretically, entropy should 'accumulate' in this array, so that over time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.

any thoughts?...



Yes, but you are not going to find a short answer. There are measures

for distribution and how long before it repeats.
cr88192
Posted: Wed Apr 30, 2008 1:35 am
Guest
"John Reiser" <jreiser@BitWagon.com> wrote in message
news:fv8m3m02v1g@enews2.newsguy.com...
Quote:
This is a bit like asking how high is up. It depends what you are
trying to do.

well, people can measure up, but it is not clear how to measure
accumulated
randomness...

There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer
Programming.


I don't have, nor can I likely get ahold of, this book (unless it is online
somewhere as a free download).


Quote:

the idea is to generate random numbers with a low probability of clash.

If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any textbook
on "linear feedback shift register".


this only works on a single machine...

if the algo is being used independently on many independent machines (for
example, in the case of a bottom-up/self-organizing system), then it is not
possible to avoid clashes this way...


as noted, I may resort to using statistical analysis of the input (rdtsc
deltas, ...) in order to estimate the approximate entropy, where from this I
can estimate how much and how quickly entropy is being accumulated in the
RNG state.

potentially, I could dump some of this out in an auditory form as well,
since the ear may be able to hear patterns that the mind may miss (a hiss
would be a good sign, buzzes or tones being not so good, with silence being
the worst though...).

but, I will mostly just have to do said analysis...


> --
cr88192
Posted: Wed Apr 30, 2008 1:43 am
Guest
"Neil" <NeilKurzm@worldnet.att.net> wrote in message
news:4817f20b$0$11599$607ed4bc@cv.net...
Quote:
cr88192 wrote:
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate
the amount of "randomness" in an RNG's state.


<snip>

Quote:

theoretically, entropy should 'accumulate' in this array, so that over
time it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.

any thoughts?...



Yes, but you are not going to find a short answer. There are measures for
distribution and how long before it repeats.

distribution, yes...
howeve, should this thing ever repeat, that shows that this whole effort has
failed.
the point is that I am attempting to make a "true" RNG, not a PRNG, which is
where rdtsc and the monitor thread come in: they are an attempt to mine
entropy from the system itself, expected in the form of non-deterministic
noise resulting from the interactions of the hardware, the OS, and the
processor...

it is a similar approach to my former "clock() masturbation" approaches,
except that rdtsc returns elapsed processor cycles, which give much larger
and much more likely choatic numbers (especially considering that I
endlessly call sleep in order to maximize scheduler effects).

I will see though, I will have to analyze the signal.
worst case it is "silent", which would mean that there is almost no choas in
these events...


or such...
cr88192
Posted: Wed Apr 30, 2008 5:54 am
Guest
"cr88192" <cr88192@NOSPAM.hotmail.com> wrote in message
news:af21d$48167ec4$ca83b482$19141@saipan.com...
Quote:
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate
the amount of "randomness" in an RNG's state.


<snip>

Quote:

theoretically, entropy should 'accumulate' in this array, so that over
time it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.

any thoughts?...


ok, so here is an update:
did a whole lot of fiddling and looking at the values (rdtsc deltas).

please excuse my horridly crude treatment of statistics (and me ignoring the
risk of there being some pattern in the values returned, which at present
would require LPC filtering and similar to eliminate).

(there is enough noise to make me suspect that at least a good part of this
is actual entropy, a recording sounding about like a broken radio...).


here is a ranking in terms of value probability distribution (the actual
values in question are not displayed, rather what is visible is the counts
from the top ranking members in a histogram).

rank: count lower/higher=low/high,high/low

0: 931011 931011/3263293=0.2853,3.5051
1: 506751 1437762/2756542=0.5216,1.9172
2: 380757 1818519/2375785=0.7654,1.3064
3: 217757 2036276/2158028=0.9436,1.0598
4: 170588 2206864/1987440=1.1104,0.9006
5: 91644 2298508/1895796=1.2124,0.8248
6: 87101 2385609/1808695=1.3190,0.7582
7: 76866 2462475/1731829=1.4219,0.7033
8: 75809 2538284/1656020=1.5328,0.6524
9: 69182 2607466/1586838=1.6432,0.6086
10: 67574 2675040/1519264=1.7607,0.5679
11: 67506 2742546/1451758=1.8891,0.5293
12: 62471 2805017/1389287=2.0190,0.4953
13: 62367 2867384/1326920=2.1609,0.4628
14: 61064 2928448/1265856=2.3134,0.4323
15: 60482 2988930/1205374=2.4797,0.4033

as can be noted, the ratio is closest to 1 at about index 3.

as a crude estimate, I will assert that I am getting at least about
log2(3)=1.584962 bits of entropy per step.

so, if I call rdtsc() 64 times, I get around 101 bits of entropy, and 646
calls is sufficient to completely seed a 1024 bit RNG state (1024 steps
would play it safe).


this should be more than sufficient for generating probably-unique random
numbers...
rossum
Posted: Wed Apr 30, 2008 5:57 am
Guest
On Wed, 30 Apr 2008 16:35:28 +1000, "cr88192"
<cr88192@NOSPAM.hotmail.com> wrote:

Quote:

"John Reiser" <jreiser@BitWagon.com> wrote in message
news:fv8m3m02v1g@enews2.newsguy.com...
This is a bit like asking how high is up. It depends what you are
trying to do.

well, people can measure up, but it is not clear how to measure
accumulated
randomness...

There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer
Programming.


I don't have, nor can I likely get ahold of, this book (unless it is online
somewhere as a free download).
Does your town have a library? Alternatively google for the Diehard

tests.

Quote:


the idea is to generate random numbers with a low probability of clash.

If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any textbook
on "linear feedback shift register".


this only works on a single machine...
Either use a central server to allocate numbers serially to all other

machines ...


Quote:

if the algo is being used independently on many independent machines (for
example, in the case of a bottom-up/self-organizing system), then it is not
possible to avoid clashes this way...
.... or assign part of your 48 bits as a fixed unique number for each

machine and use the rest of the 48 bits for the incrementing counter.
The resulting 48 bits will be unique across all machines. For example
with sixteen machines you could use four bits to uniquely number each
machine and 44 bits for the incrementing counter.

rossum

Quote:


as noted, I may resort to using statistical analysis of the input (rdtsc
deltas, ...) in order to estimate the approximate entropy, where from this I
can estimate how much and how quickly entropy is being accumulated in the
RNG state.

potentially, I could dump some of this out in an auditory form as well,
since the ear may be able to hear patterns that the mind may miss (a hiss
would be a good sign, buzzes or tones being not so good, with silence being
the worst though...).

but, I will mostly just have to do said analysis...


--
Thomas Richter
Posted: Wed Apr 30, 2008 6:06 am
Guest
cr88192 wrote:
Quote:
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.

There is a full book chapter (several hundred pages) about this question
in Knuth's seminal work "The Art of Programming". You want to look into
volume 2, "Numerical and Seminumerical Algorithms"; the entire first
half of the book is devoted to the question of Pseudo-RG and measuring
their performance.

Despite some ad-hoc methods, I recommend highly to look into the
spectral test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does the
rng approach the statistics of a "truely random" signal.

I cannot repeat Knuth here, there is simply too much material in this
book. I can only recommend going into a library and check it, or
actually buying the series (if you're serious in computer science,
there's probably no way around that step in first place).

So long,
Thomas
cr88192
Posted: Wed Apr 30, 2008 2:12 pm
Guest
"rossum" <rossum48@coldmail.com> wrote in message
news:bqjg14l6881q2t0obq5b3eqbkcen4qjuov@4ax.com...
Quote:
On Wed, 30 Apr 2008 16:35:28 +1000, "cr88192"
cr88192@NOSPAM.hotmail.com> wrote:


"John Reiser" <jreiser@BitWagon.com> wrote in message
news:fv8m3m02v1g@enews2.newsguy.com...
This is a bit like asking how high is up. It depends what you are
trying to do.

well, people can measure up, but it is not clear how to measure
accumulated
randomness...

There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer
Programming.


I don't have, nor can I likely get ahold of, this book (unless it is
online
somewhere as a free download).
Does your town have a library? Alternatively google for the Diehard
tests.


libraries: not one that would have this kind of book...

diehard tests:
I know of these, but these are good for testing the statistical randomness
of a PRNG, but have will not say much of anything WRT the "true" randomness
of a TRNG, which is what is in question here...


Quote:


the idea is to generate random numbers with a low probability of clash.

If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive
integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any
textbook
on "linear feedback shift register".


this only works on a single machine...
Either use a central server to allocate numbers serially to all other
machines ...



if the algo is being used independently on many independent machines (for
example, in the case of a bottom-up/self-organizing system), then it is
not
possible to avoid clashes this way...
... or assign part of your 48 bits as a fixed unique number for each
machine and use the rest of the 48 bits for the incrementing counter.
The resulting 48 bits will be unique across all machines. For example
with sixteen machines you could use four bits to uniquely number each
machine and 44 bits for the incrementing counter.


of course, there is a big problem here:
you need a unique number for each machine (for example, the MAC address),
however, what if such a number is not conviniently available?...

this is where a TRNG works well.
if the output is truely random, there is a much higher chance of having a
unique value...


the big problem of a TRNG however, is the reliability of the entropy source.

I am not certain how random rdtsc deltas are (for example, what if the
processor, OS, and devices, are not actually generating truely random
variability, but are instead showing a psuedo-random result of the complex
interactions of the components?...).

even MS apparently messed up here, having a TRNG that could be exploited and
predicted...
cr88192
Posted: Wed Apr 30, 2008 2:34 pm
Guest
"Thomas Richter" <thor@math.tu-berlin.de> wrote in message
news:fv9jof$tdq$1@infosun2.rus.uni-stuttgart.de...
Quote:
cr88192 wrote:
well, this is once again in response to one of my RNG/PRNG issues...

right now, I am wondering if there is any "good" way to measure/estimate
the amount of "randomness" in an RNG's state.

There is a full book chapter (several hundred pages) about this question
in Knuth's seminal work "The Art of Programming". You want to look into
volume 2, "Numerical and Seminumerical Algorithms"; the entire first half
of the book is devoted to the question of Pseudo-RG and measuring their
performance.


I should not have written PRNG there, or made it clearer:
TRNG+PRNG

TRNG is used to drive a PRNG, but it is a joint RNG.
I have a 1024 bit state being regularly fed by theoretically random entropy
sources...

the TRNG aspect tries to keep the current state in a unique and continually
varying stste, wheras the PRNG aspect tries to generate good statistically
random numbers (the actual sources not generating very statistically-good
results).


Quote:
Despite some ad-hoc methods, I recommend highly to look into the spectral
test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does the
rng approach the statistics of a "truely random" signal.


and, that is the misconception here:
I already know how to test for statistical randomness, I wanted a measure of
the "true" randomness, however, a clear answer remains elusive...


Quote:
I cannot repeat Knuth here, there is simply too much material in this
book. I can only recommend going into a library and check it, or actually
buying the series (if you're serious in computer science, there's probably
no way around that step in first place).


sadly, I have no real libraries of this sort available, nor do I currently
have the means of buying any books...


Quote:
So long,
Thomas
 
Page 1 of 1       All times are GMT - 5 Hours
The time now is Thu Jul 24, 2008 10:26 pm