| |
 |
|
|
Science Forum Index » Compression Forum » misc: how random is 'random'?...
Page 1 of 2 Goto page 1, 2 Next
|
| Author |
Message |
| cr88192 |
Posted: Mon Apr 28, 2008 9:05 pm |
|
|
|
Guest
|
well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.
the particular approach I am using looks about like this:
I have a 1024 bit state consisting of 32 32-bit numbers.
this state is seeded, initially, using rand, which is seeded using the rdtsc
instruction (x86 and friends, tells how many clock cycles the CPU has
currently gone through).
on each iteration:
the current rdtsc value is read, and added to the low-end of the array.
a linear multiply is pulled off between the array and the mersenne prime
near 2^31 (this multiply "wraps around", feeding the high end of the array
back into the bottom);
a hash is computed (multiplying each element by this prime and accumulating)
of which the upper half is used as the result.
likewise, a thread is also spawned that every 10ms goes through the above
process, and every second stores the current state to a file (this is used
as the input state the next time the thing is initialized). it is expected
that system events will cause the exact timing to vary...
theoretically, entropy should 'accumulate' in this array, so that over time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.
any thoughts?... |
|
|
| Back to top |
|
| Gene |
Posted: Tue Apr 29, 2008 6:26 am |
|
|
|
Guest
|
On Apr 28, 10:05 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
Quote: well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.
the particular approach I am using looks about like this:
I have a 1024 bit state consisting of 32 32-bit numbers.
this state is seeded, initially, using rand, which is seeded using the rdtsc
instruction (x86 and friends, tells how many clock cycles the CPU has
currently gone through).
on each iteration:
the current rdtsc value is read, and added to the low-end of the array.
a linear multiply is pulled off between the array and the mersenne prime
near 2^31 (this multiply "wraps around", feeding the high end of the array
back into the bottom);
a hash is computed (multiplying each element by this prime and accumulating)
of which the upper half is used as the result.
likewise, a thread is also spawned that every 10ms goes through the above
process, and every second stores the current state to a file (this is used
as the input state the next time the thing is initialized). it is expected
that system events will cause the exact timing to vary...
theoretically, entropy should 'accumulate' in this array, so that over time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.
any thoughts?...
This is a bit like asking how high is up. It depends what you are
trying to do. |
|
|
| Back to top |
|
| cr88192 |
Posted: Tue Apr 29, 2008 3:53 pm |
|
|
|
Guest
|
"Gene" <gene.ressler@gmail.com> wrote in message
news:0fd5f7ee-6b13-4841-8838-0db03b0ac669@i76g2000hsf.googlegroups.com...
Quote: On Apr 28, 10:05 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
theoretically, entropy should 'accumulate' in this array, so that over
time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.
any thoughts?...
This is a bit like asking how high is up. It depends what you are
trying to do.
well, people can measure up, but it is not clear how to measure accumulated
randomness...
the idea is to generate random numbers with a low probability of clash.
examples would be GUIDs/UUIDs, PUIDs (a term I use: Probably Unique IDs,
....).
in this particular case, the intended task is to generate 48 bit PUIDs with
the intention of being used as "segment" addresses in a 128 bit address
space (the practicality of which is debated, basically the idea is to have a
128 bit space which can represent both stores and shared-memory segments on
a potentially arbitrary, but likely small, number of nodes...). randomness
is desired as an attempt to minimize any two nodes accidentally picking the
same address...
theoretically, the chances are 1/(2^4 if I have a decent RNG, but are much
lower if the RNG lacks sufficient "unique" entropy.
one crude way I guess would be to do statistical analysis of my current main
entropy source, namely rdtsc, and try to make an estimate as to the amount
of entropy I am getting from it...
or such... |
|
|
| Back to top |
|
| John Reiser |
Posted: Tue Apr 29, 2008 9:39 pm |
|
|
|
Guest
|
Quote: This is a bit like asking how high is up. It depends what you are
trying to do.
well, people can measure up, but it is not clear how to measure accumulated
randomness...
There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer Programming.
Quote: the idea is to generate random numbers with a low probability of clash.
If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any textbook
on "linear feedback shift register".
-- |
|
|
| Back to top |
|
| Neil |
Posted: Tue Apr 29, 2008 11:14 pm |
|
|
|
Guest
|
cr88192 wrote:
Quote: well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.
the particular approach I am using looks about like this:
I have a 1024 bit state consisting of 32 32-bit numbers.
this state is seeded, initially, using rand, which is seeded using the rdtsc
instruction (x86 and friends, tells how many clock cycles the CPU has
currently gone through).
on each iteration:
the current rdtsc value is read, and added to the low-end of the array.
a linear multiply is pulled off between the array and the mersenne prime
near 2^31 (this multiply "wraps around", feeding the high end of the array
back into the bottom);
a hash is computed (multiplying each element by this prime and accumulating)
of which the upper half is used as the result.
likewise, a thread is also spawned that every 10ms goes through the above
process, and every second stores the current state to a file (this is used
as the input state the next time the thing is initialized). it is expected
that system events will cause the exact timing to vary...
theoretically, entropy should 'accumulate' in this array, so that over time
it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.
any thoughts?...
Yes, but you are not going to find a short answer. There are measures
for distribution and how long before it repeats. |
|
|
| Back to top |
|
| cr88192 |
Posted: Wed Apr 30, 2008 1:35 am |
|
|
|
Guest
|
"John Reiser" <jreiser@BitWagon.com> wrote in message
news:fv8m3m02v1g@enews2.newsguy.com...
Quote: This is a bit like asking how high is up. It depends what you are
trying to do.
well, people can measure up, but it is not clear how to measure
accumulated
randomness...
There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer
Programming.
I don't have, nor can I likely get ahold of, this book (unless it is online
somewhere as a free download).
Quote:
the idea is to generate random numbers with a low probability of clash.
If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any textbook
on "linear feedback shift register".
this only works on a single machine...
if the algo is being used independently on many independent machines (for
example, in the case of a bottom-up/self-organizing system), then it is not
possible to avoid clashes this way...
as noted, I may resort to using statistical analysis of the input (rdtsc
deltas, ...) in order to estimate the approximate entropy, where from this I
can estimate how much and how quickly entropy is being accumulated in the
RNG state.
potentially, I could dump some of this out in an auditory form as well,
since the ear may be able to hear patterns that the mind may miss (a hiss
would be a good sign, buzzes or tones being not so good, with silence being
the worst though...).
but, I will mostly just have to do said analysis...
> -- |
|
|
| Back to top |
|
| cr88192 |
Posted: Wed Apr 30, 2008 1:43 am |
|
|
|
Guest
|
"Neil" <NeilKurzm@worldnet.att.net> wrote in message
news:4817f20b$0$11599$607ed4bc@cv.net...
Quote: cr88192 wrote:
well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate
the amount of "randomness" in an RNG's state.
<snip>
Quote:
theoretically, entropy should 'accumulate' in this array, so that over
time it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.
any thoughts?...
Yes, but you are not going to find a short answer. There are measures for
distribution and how long before it repeats.
distribution, yes...
howeve, should this thing ever repeat, that shows that this whole effort has
failed.
the point is that I am attempting to make a "true" RNG, not a PRNG, which is
where rdtsc and the monitor thread come in: they are an attempt to mine
entropy from the system itself, expected in the form of non-deterministic
noise resulting from the interactions of the hardware, the OS, and the
processor...
it is a similar approach to my former "clock() masturbation" approaches,
except that rdtsc returns elapsed processor cycles, which give much larger
and much more likely choatic numbers (especially considering that I
endlessly call sleep in order to maximize scheduler effects).
I will see though, I will have to analyze the signal.
worst case it is "silent", which would mean that there is almost no choas in
these events...
or such... |
|
|
| Back to top |
|
| cr88192 |
Posted: Wed Apr 30, 2008 5:54 am |
|
|
|
Guest
|
"cr88192" <cr88192@NOSPAM.hotmail.com> wrote in message
news:af21d$48167ec4$ca83b482$19141@saipan.com...
Quote: well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate
the amount of "randomness" in an RNG's state.
<snip>
Quote:
theoretically, entropy should 'accumulate' in this array, so that over
time it becomes ever more random.
the problem is, I am not sure how quickly or reliably this will occur.
any thoughts?...
ok, so here is an update:
did a whole lot of fiddling and looking at the values (rdtsc deltas).
please excuse my horridly crude treatment of statistics (and me ignoring the
risk of there being some pattern in the values returned, which at present
would require LPC filtering and similar to eliminate).
(there is enough noise to make me suspect that at least a good part of this
is actual entropy, a recording sounding about like a broken radio...).
here is a ranking in terms of value probability distribution (the actual
values in question are not displayed, rather what is visible is the counts
from the top ranking members in a histogram).
rank: count lower/higher=low/high,high/low
0: 931011 931011/3263293=0.2853,3.5051
1: 506751 1437762/2756542=0.5216,1.9172
2: 380757 1818519/2375785=0.7654,1.3064
3: 217757 2036276/2158028=0.9436,1.0598
4: 170588 2206864/1987440=1.1104,0.9006
5: 91644 2298508/1895796=1.2124,0.8248
6: 87101 2385609/1808695=1.3190,0.7582
7: 76866 2462475/1731829=1.4219,0.7033
8: 75809 2538284/1656020=1.5328,0.6524
9: 69182 2607466/1586838=1.6432,0.6086
10: 67574 2675040/1519264=1.7607,0.5679
11: 67506 2742546/1451758=1.8891,0.5293
12: 62471 2805017/1389287=2.0190,0.4953
13: 62367 2867384/1326920=2.1609,0.4628
14: 61064 2928448/1265856=2.3134,0.4323
15: 60482 2988930/1205374=2.4797,0.4033
as can be noted, the ratio is closest to 1 at about index 3.
as a crude estimate, I will assert that I am getting at least about
log2(3)=1.584962 bits of entropy per step.
so, if I call rdtsc() 64 times, I get around 101 bits of entropy, and 646
calls is sufficient to completely seed a 1024 bit RNG state (1024 steps
would play it safe).
this should be more than sufficient for generating probably-unique random
numbers... |
|
|
| Back to top |
|
| rossum |
Posted: Wed Apr 30, 2008 5:57 am |
|
|
|
Guest
|
On Wed, 30 Apr 2008 16:35:28 +1000, "cr88192"
<cr88192@NOSPAM.hotmail.com> wrote:
Quote:
"John Reiser" <jreiser@BitWagon.com> wrote in message
news:fv8m3m02v1g@enews2.newsguy.com...
This is a bit like asking how high is up. It depends what you are
trying to do.
well, people can measure up, but it is not clear how to measure
accumulated
randomness...
There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer
Programming.
I don't have, nor can I likely get ahold of, this book (unless it is online
somewhere as a free download).
Does your town have a library? Alternatively google for the Diehard
tests.
Quote:
the idea is to generate random numbers with a low probability of clash.
If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any textbook
on "linear feedback shift register".
this only works on a single machine...
Either use a central server to allocate numbers serially to all other
machines ...
Quote:
if the algo is being used independently on many independent machines (for
example, in the case of a bottom-up/self-organizing system), then it is not
possible to avoid clashes this way...
.... or assign part of your 48 bits as a fixed unique number for each
machine and use the rest of the 48 bits for the incrementing counter.
The resulting 48 bits will be unique across all machines. For example
with sixteen machines you could use four bits to uniquely number each
machine and 44 bits for the incrementing counter.
rossum
Quote:
as noted, I may resort to using statistical analysis of the input (rdtsc
deltas, ...) in order to estimate the approximate entropy, where from this I
can estimate how much and how quickly entropy is being accumulated in the
RNG state.
potentially, I could dump some of this out in an auditory form as well,
since the ear may be able to hear patterns that the mind may miss (a hiss
would be a good sign, buzzes or tones being not so good, with silence being
the worst though...).
but, I will mostly just have to do said analysis...
--
|
|
|
| Back to top |
|
| Thomas Richter |
Posted: Wed Apr 30, 2008 6:06 am |
|
|
|
Guest
|
cr88192 wrote:
Quote: well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate the
amount of "randomness" in an RNG's state.
There is a full book chapter (several hundred pages) about this question
in Knuth's seminal work "The Art of Programming". You want to look into
volume 2, "Numerical and Seminumerical Algorithms"; the entire first
half of the book is devoted to the question of Pseudo-RG and measuring
their performance.
Despite some ad-hoc methods, I recommend highly to look into the
spectral test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does the
rng approach the statistics of a "truely random" signal.
I cannot repeat Knuth here, there is simply too much material in this
book. I can only recommend going into a library and check it, or
actually buying the series (if you're serious in computer science,
there's probably no way around that step in first place).
So long,
Thomas |
|
|
| Back to top |
|
| cr88192 |
Posted: Wed Apr 30, 2008 2:12 pm |
|
|
|
Guest
|
"rossum" <rossum48@coldmail.com> wrote in message
news:bqjg14l6881q2t0obq5b3eqbkcen4qjuov@4ax.com...
Quote: On Wed, 30 Apr 2008 16:35:28 +1000, "cr88192"
cr88192@NOSPAM.hotmail.com> wrote:
"John Reiser" <jreiser@BitWagon.com> wrote in message
news:fv8m3m02v1g@enews2.newsguy.com...
This is a bit like asking how high is up. It depends what you are
trying to do.
well, people can measure up, but it is not clear how to measure
accumulated
randomness...
There are several exceedingly good measures of randomness.
Consult: Knuth, Seminumerical Algorithms, vol.2 of The Art of Computer
Programming.
I don't have, nor can I likely get ahold of, this book (unless it is
online
somewhere as a free download).
Does your town have a library? Alternatively google for the Diehard
tests.
libraries: not one that would have this kind of book...
diehard tests:
I know of these, but these are good for testing the statistical randomness
of a PRNG, but have will not say much of anything WRT the "true" randomness
of a TRNG, which is what is in question here...
Quote:
the idea is to generate random numbers with a low probability of clash.
If "avoid generating the same number twice" is the only measure, then
the guaranteed *BEST* method is to generate ascending consecutive
integers
1, 2, 3, ... modulo some wide but efficient word size for your machine.
If you don't like the very high correlation between consecutive terms,
then simulate a ring counter having period 2**n -1. Consult any
textbook
on "linear feedback shift register".
this only works on a single machine...
Either use a central server to allocate numbers serially to all other
machines ...
if the algo is being used independently on many independent machines (for
example, in the case of a bottom-up/self-organizing system), then it is
not
possible to avoid clashes this way...
... or assign part of your 48 bits as a fixed unique number for each
machine and use the rest of the 48 bits for the incrementing counter.
The resulting 48 bits will be unique across all machines. For example
with sixteen machines you could use four bits to uniquely number each
machine and 44 bits for the incrementing counter.
of course, there is a big problem here:
you need a unique number for each machine (for example, the MAC address),
however, what if such a number is not conviniently available?...
this is where a TRNG works well.
if the output is truely random, there is a much higher chance of having a
unique value...
the big problem of a TRNG however, is the reliability of the entropy source.
I am not certain how random rdtsc deltas are (for example, what if the
processor, OS, and devices, are not actually generating truely random
variability, but are instead showing a psuedo-random result of the complex
interactions of the components?...).
even MS apparently messed up here, having a TRNG that could be exploited and
predicted... |
|
|
| Back to top |
|
| cr88192 |
Posted: Wed Apr 30, 2008 2:34 pm |
|
|
|
Guest
|
"Thomas Richter" <thor@math.tu-berlin.de> wrote in message
news:fv9jof$tdq$1@infosun2.rus.uni-stuttgart.de...
Quote: cr88192 wrote:
well, this is once again in response to one of my RNG/PRNG issues...
right now, I am wondering if there is any "good" way to measure/estimate
the amount of "randomness" in an RNG's state.
There is a full book chapter (several hundred pages) about this question
in Knuth's seminal work "The Art of Programming". You want to look into
volume 2, "Numerical and Seminumerical Algorithms"; the entire first half
of the book is devoted to the question of Pseudo-RG and measuring their
performance.
I should not have written PRNG there, or made it clearer:
TRNG+PRNG
TRNG is used to drive a PRNG, but it is a joint RNG.
I have a 1024 bit state being regularly fed by theoretically random entropy
sources...
the TRNG aspect tries to keep the current state in a unique and continually
varying stste, wheras the PRNG aspect tries to generate good statistically
random numbers (the actual sources not generating very statistically-good
results).
Quote: Despite some ad-hoc methods, I recommend highly to look into the spectral
test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does the
rng approach the statistics of a "truely random" signal.
and, that is the misconception here:
I already know how to test for statistical randomness, I wanted a measure of
the "true" randomness, however, a clear answer remains elusive...
Quote: I cannot repeat Knuth here, there is simply too much material in this
book. I can only recommend going into a library and check it, or actually
buying the series (if you're serious in computer science, there's probably
no way around that step in first place).
sadly, I have no real libraries of this sort available, nor do I currently
have the means of buying any books...
|
|
|
| Back to top |
|
| Ed Prochak |
Posted: Fri May 02, 2008 3:10 am |
|
|
|
Guest
|
On Apr 30, 3:34 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
Quote: "Thomas Richter" <t...@math.tu-berlin.de> wrote in message
news:fv9jof$tdq$1@infosun2.rus.uni-stuttgart.de...
[]
Despite some ad-hoc methods, I recommend highly to look into the spectral
test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does the
rng approach the statistics of a "truely random" signal.
and, that is the misconception here:
I already know how to test for statistical randomness, I wanted a measure of
the "true" randomness, however, a clear answer remains elusive...
Statistical tests of randomness are tests of "true" randomness.
Something random merely means the outcome of the next trial is
independent of any previous trial. In your opinion, how random is
Brownian motion? (Is it random when, in classical physics, it is
theoretically possible to calculate the motion, given, of course,
information on ALL the interacting particles.)
If you want to be extreme about randomness, you would use some quantum
mechanical input. You need to decide, since you apparently do not
agree with the mathematical definition.
have a nice day,
Ed |
|
|
| Back to top |
|
| rossum |
Posted: Fri May 02, 2008 5:47 am |
|
|
|
Guest
|
On Thu, 1 May 2008 05:12:45 +1000, "cr88192"
<cr88192@NOSPAM.hotmail.com> wrote:
Quote: libraries: not one that would have this kind of book...
Inter library loan? Knuth really is worth reading.
Quote:
diehard tests:
I know of these, but these are good for testing the statistical randomness
of a PRNG, but have will not say much of anything WRT the "true" randomness
of a TRNG, which is what is in question here...
Diehard tests a file of bytes, it does not test how those bytes were
generated.
rossum |
|
|
| Back to top |
|
| jacko |
Posted: Fri May 02, 2008 6:07 am |
|
|
|
Guest
|
hi I thought the disk seek driver introduced entropy in the linux rng.
still wonder why a windowed ladder for disk algorithm is not common.
maybe it's the task stalling effect. better throughput though when
thrashing onsets.
cheers jacko |
|
|
| Back to top |
|
| |
Page 1 of 2 Goto page 1, 2 Next
All times are GMT - 5 Hours
The time now is Fri Sep 05, 2008 5:11 am
|
|