| |
 |
|
|
Science Forum Index » Compression Forum » misc: how random is 'random'?...
Page 2 of 2 Goto page Previous 1, 2
|
| Author |
Message |
| cr88192 |
Posted: Fri May 02, 2008 7:56 pm |
|
|
|
Guest
|
"rossum" <rossum48@coldmail.com> wrote in message
news:s6sl14hkv1jiirco8d44u0f9ileuf9mnj2@4ax.com...
Quote: On Thu, 1 May 2008 05:12:45 +1000, "cr88192"
cr88192@NOSPAM.hotmail.com> wrote:
libraries: not one that would have this kind of book...
Inter library loan? Knuth really is worth reading.
diehard tests:
I know of these, but these are good for testing the statistical randomness
of a PRNG, but have will not say much of anything WRT the "true"
randomness
of a TRNG, which is what is in question here...
Diehard tests a file of bytes, it does not test how those bytes were
generated.
yes, but it also only tells how statistically random they are, not how
truely random they are, and this difference is fairly important.
actually, I already ran the diehard tests, and I think the results are
acceptable.
ended up changing to primes other than the mersenne primes though, having
noticed what their hex values were, and then running a few example tests,
make me think they are not so good for RNG purposes (actually, exponents of
smaller primes seem more effective, for example, 251^12, rather than, for
example, 2^107-1).
for example, a deterministic permutation with a large period, could still
score very well in terms of statistical randomness, and thus pass the tests,
despite being very much deterministic.
meanwhile, a noise pattern with a strong bias towards 0, can infact be
truely far more random, but statistically it is not so random seeming (we
need PRNG-like "amplifier" logic in the mix to make it seem a lot more
random, which is what I had done).
|
|
|
| Back to top |
|
| cr88192 |
Posted: Fri May 02, 2008 8:00 pm |
|
|
|
Guest
|
"jacko" <jackokring@gmail.com> wrote in message
news:f03ccc8b-f5ef-4639-ad21-aee1aaf35001@c65g2000hsa.googlegroups.com...
Quote: hi I thought the disk seek driver introduced entropy in the linux rng.
still wonder why a windowed ladder for disk algorithm is not common.
maybe it's the task stalling effect. better throughput though when
thrashing onsets.
interesting, however, this will only really work as well within the kernel,
since in userspace most often files are being accesses from the
buffer-cache...
my current approach attempts to mine entropy from the scheduler
(theoretically, any chaotic system events, such as the minor delays
introduced by disk activity, ... should show up as slight interferences in
the timing).
> cheers jacko |
|
|
| Back to top |
|
| cr88192 |
Posted: Fri May 02, 2008 8:47 pm |
|
|
|
Guest
|
"Ed Prochak" <edprochak@gmail.com> wrote in message
news:e0af602e-e095-4bda-ad1c-e38d6d39806f@k13g2000hse.googlegroups.com...
Quote: On Apr 30, 3:34 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
"Thomas Richter" <t...@math.tu-berlin.de> wrote in message
news:fv9jof$tdq$1@infosun2.rus.uni-stuttgart.de...
[]
Despite some ad-hoc methods, I recommend highly to look into the
spectral
test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does
the
rng approach the statistics of a "truely random" signal.
and, that is the misconception here:
I already know how to test for statistical randomness, I wanted a measure
of
the "true" randomness, however, a clear answer remains elusive...
Statistical tests of randomness are tests of "true" randomness.
Something random merely means the outcome of the next trial is
independent of any previous trial. In your opinion, how random is
Brownian motion? (Is it random when, in classical physics, it is
theoretically possible to calculate the motion, given, of course,
information on ALL the interacting particles.)
from the POV of this system: no.
for my uses: yes.
the reason: even if the universe itself were/is deterministic, the amount of
internal entropy is sufficient to allow us to assume that it is random (go
find one reasonably-sized chunk of matter, such as a small rock, and then go
and look for another that is identical, how long until one is found?... very
possibly the internal state of this rock is far greater than the number of
rocks on earth...).
however, the mathematical definition is not useful for my purposes:
in my case, randomness means uniqueness and a non-deterministic outcome,
rather than an ideally-random statistical distribution.
two PRNGs with the same algo, and the same seed, will produce the same
output.
for my uses, this is not useful.
the reason is my usage:
that the numbers themselves are maximally unique, and may be generated from
any number of possible systems, each needing to generate numbers with a very
low probability of clash with any produced by any of the other systems.
so, the algo and initial state needs to be allowed to be the same, but the
output needs to be different.
what I want is not possible with a PRNG, since a PRNG produces a sequence no
more unique than the uniqueness of its input seed.
so, a non-deterministic entropy source is needed.
usually this would be a hardware device, but currently the only real device
I have (directly) available is the processor (OS-specific sources also
existing).
however, the processor contains its own unintentional entropy source: the
rdtsc instruction.
the reason this is an entropy source, is that nearly any system event,
generates an interrupt, which minorly disrupts the processor's instruction
count, and at unpredictable times, thus causing this value to be chaotic
(this choas being measurable and accumulatable...).
on linux, '/dev/random' also exists, as does CAPI on windows.
I can then siphon off these values, and feed them into the current seed
value (something resembling a PRNG is used, since otherwise the output of
rdtsc is not very exciting...), thus making it work like a kind of
statistical randomness-amplifier (me getting a non-deterministic stream of
also statistically-random values).
Quote: If you want to be extreme about randomness, you would use some quantum
mechanical input. You need to decide, since you apparently do not
agree with the mathematical definition.
my point is being missed...
I don't demand a higher statistical randomness than is allowed by the
mathematical definition, but rather, I am demanding a somewhat different
property...
of course, QM would be a good source of the kind of entropy I am looking
for, but I don't have this, thus I have to get by with what theoretically
non-deteministic sources I have available, and sadly, the inability to
determine about how much of this is non-deterministic, and how much is
deterministic...
however, I will just have to go and assume that it is "sufficient".
maybe I will at least gain as much as a few-hundred-bits worth, which is
enough...
Quote: have a nice day,
Ed
|
|
|
| Back to top |
|
| Greg Herlihy |
Posted: Sat May 03, 2008 10:37 am |
|
|
|
Guest
|
On May 2, 6:47 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
Quote: "Ed Prochak" <edproc...@gmail.com> wrote in message
news:e0af602e-e095-4bda-ad1c-e38d6d39806f@k13g2000hse.googlegroups.com...
On Apr 30, 3:34 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
"Thomas Richter" <t...@math.tu-berlin.de> wrote in message
news:fv9jof$tdq$1@infosun2.rus.uni-stuttgart.de...
Despite some ad-hoc methods, I recommend highly to look into the
spectral
test, i.e. how well are the outputs of the rng are spread in a
multi-dimensional space, and the chi-square test, i.e. how well does
the
rng approach the statistics of a "truely random" signal.
and, that is the misconception here:
I already know how to test for statistical randomness, I wanted a measure
of
the "true" randomness, however, a clear answer remains elusive...
Statistical tests of randomness are tests of "true" randomness.
Something random merely means the outcome of the next trial is
independent of any previous trial. In your opinion, how random is
Brownian motion? (Is it random when, in classical physics, it is
theoretically possible to calculate the motion, given, of course,
information on ALL the interacting particles.)
from the POV of this system: no.
for my uses: yes.
the reason: even if the universe itself were/is deterministic, the amount of
internal entropy is sufficient to allow us to assume that it is random (go
find one reasonably-sized chunk of matter, such as a small rock, and then go
and look for another that is identical, how long until one is found?... very
possibly the internal state of this rock is far greater than the number of
rocks on earth...).
however, the mathematical definition is not useful for my purposes:
in my case, randomness means uniqueness and a non-deterministic outcome,
rather than an ideally-random statistical distribution.
But randomness does not imply uniqueness - on the contrary. A set of
random numbers has a surprisingly high chance of containing duplicates
(the well-known "birthday paradox"). So reducing the likelihood of
duplicates in a set of generated numbers - necessarily requires
reducing the randomness of the method that generated the numbers in
the first place. So, extended to its logical conclusion, any
distributed system for generating unique values ultimately must rely
on a deterministic method of some sort to generate unique values.
Otherwise, relying on random number generation all by itself - could
not guarantee the set of generated numbers contains no duplicates.
In fact, the time and the location that each number is generated
happens to provide a nearly unique identification for that number
(across all time and locations). So it would make sense to incorporate
time and place information into any kind of GUID-generating scheme.
And in fact, the proposed standard for generating universally unique
identifiers relies does exactly that:
http://www.ietf.org/rfc/rfc4122.txt?number=4122
Greg |
|
|
| Back to top |
|
| cr88192 |
Posted: Sat May 03, 2008 8:31 pm |
|
|
|
Guest
|
"Greg Herlihy" <greghe@mac.com> wrote in message
news:da0ea59f-9b30-4693-85ba-3655e22cb0d3@r9g2000prd.googlegroups.com...
On May 2, 6:47 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
Quote: "Ed Prochak" <edproc...@gmail.com> wrote in message
<
But randomness does not imply uniqueness - on the contrary. A set of
random numbers has a surprisingly high chance of containing duplicates
(the well-known "birthday paradox"). So reducing the likelihood of
duplicates in a set of generated numbers - necessarily requires
reducing the randomness of the method that generated the numbers in
the first place. So, extended to its logical conclusion, any
distributed system for generating unique values ultimately must rely
on a deterministic method of some sort to generate unique values.
Otherwise, relying on random number generation all by itself - could
not guarantee the set of generated numbers contains no duplicates.
In fact, the time and the location that each number is generated
happens to provide a nearly unique identification for that number
(across all time and locations). So it would make sense to incorporate
time and place information into any kind of GUID-generating scheme.
And in fact, the proposed standard for generating universally unique
identifiers relies does exactly that:
http://www.ietf.org/rfc/rfc4122.txt?number=4122
Greg
yes, this is technically true, but the answer is this:
more bits.
while for, say, 1 billion unique items, we can get by with 32 bits and a
deterministic method, we can also use 64 bits and a good RNG, and have a
collision probability of about 1/16000000000.
in my case, this is good enough for me to regard the numbers as unique, even
if not provable as such (they will just work out this way in practice...). |
|
|
| Back to top |
|
| rossum... |
Posted: Sun May 04, 2008 4:11 am |
|
|
|
Guest
|
On Sun, 4 May 2008 11:31:27 +1000, "cr88192"
<cr88192 at (no spam) NOSPAM.hotmail.com> wrote:
Quote: yes, this is technically true, but the answer is this:
more bits.
while for, say, 1 billion unique items, we can get by with 32 bits and a
deterministic method, we can also use 64 bits and a good RNG, and have a
collision probability of about 1/16000000000.
in my case, this is good enough for me to regard the numbers as unique, even
if not provable as such (they will just work out this way in practice...).
A different answer might be to generate a provably unique number that
is obviously not random and then encrypt it. The resulting byte
string is both provably unique (because it can be decrypted) and
apparently random.
rossum |
|
|
| Back to top |
|
| cr88192... |
Posted: Sun May 04, 2008 3:56 pm |
|
|
|
Guest
|
"rossum" <rossum48 at (no spam) coldmail.com> wrote in message
news:95vq14pt79mucocnt573ro3m6oo8qducuf at (no spam) 4ax.com...
Quote: On Sun, 4 May 2008 11:31:27 +1000, "cr88192"
cr88192 at (no spam) NOSPAM.hotmail.com> wrote:
yes, this is technically true, but the answer is this:
more bits.
while for, say, 1 billion unique items, we can get by with 32 bits and a
deterministic method, we can also use 64 bits and a good RNG, and have a
collision probability of about 1/16000000000.
in my case, this is good enough for me to regard the numbers as unique,
even
if not provable as such (they will just work out this way in practice...).
A different answer might be to generate a provably unique number that
is obviously not random and then encrypt it. The resulting byte
string is both provably unique (because it can be decrypted) and
apparently random.
yes, however, we would need a provably unique number, and to extend it in a
way that is also provable not to damage its uniqueness.
for example, if we have a unique 32 bit number, we can tack on a 16 bit
number (say, a sequence number), and have a unique 48 bit number.
likelywise, we can tack 16 bits onto a MAC address, giving a unique 64 bit
number.
however, what if the spot we have is only 48 bits?
does every system have some necessarily unique 32 bit number?...
we have MAC addresses, but this would leave no space for adding a sequence
number.
all of these are problems, and using an RNG is much more practical (note, a
detail: I had been following the MAC-address formatting rules, each number
is encoded as a "locally-administered" address, thus a valid MAC address
would provably not clash with a randomly generated number).
|
|
|
| Back to top |
|
| Thomas Richter... |
Posted: Mon May 05, 2008 12:55 pm |
|
|
|
Guest
|
cr88192 wrote:
Quote: "rossum" <rossum48 at (no spam) coldmail.com> wrote in message
news:s6sl14hkv1jiirco8d44u0f9ileuf9mnj2 at (no spam) 4ax.com...
On Thu, 1 May 2008 05:12:45 +1000, "cr88192"
cr88192 at (no spam) NOSPAM.hotmail.com> wrote:
libraries: not one that would have this kind of book...
Inter library loan? Knuth really is worth reading.
I second this.
Quote: Diehard tests a file of bytes, it does not test how those bytes were
generated.
yes, but it also only tells how statistically random they are, not how
truely random they are, and this difference is fairly important.
In that case, forget it. A) because the term "truely random" is
ill-defined, and B) in case you mean "Kolmogorov complexity", it can be
shown that you cannot determine this algorithmically, which defeats your
purpose. All you can do *is* statistical testing.
Quote: for example, a deterministic permutation with a large period, could still
score very well in terms of statistical randomness, and thus pass the tests,
despite being very much deterministic.
Each Pseudo-RNG is deterministic by its very construction. There is
nothing against this. So, the question you should *really* ask is: For
which purpose do you need the numbers, and: When is random random enough?
So long,
Thomas |
|
|
| Back to top |
|
| cr88192... |
Posted: Mon May 05, 2008 5:52 pm |
|
|
|
Guest
|
"Thomas Richter" <thor at (no spam) math.tu-berlin.de> wrote in message
news:fvnhhr$e8p$1 at (no spam) infosun2.rus.uni-stuttgart.de...
Quote: cr88192 wrote:
"rossum" <rossum48 at (no spam) coldmail.com> wrote in message
news:s6sl14hkv1jiirco8d44u0f9ileuf9mnj2 at (no spam) 4ax.com...
On Thu, 1 May 2008 05:12:45 +1000, "cr88192"
cr88192 at (no spam) NOSPAM.hotmail.com> wrote:
libraries: not one that would have this kind of book...
Inter library loan? Knuth really is worth reading.
I second this.
Diehard tests a file of bytes, it does not test how those bytes were
generated.
yes, but it also only tells how statistically random they are, not how
truely random they are, and this difference is fairly important.
In that case, forget it. A) because the term "truely random" is
ill-defined, and B) in case you mean "Kolmogorov complexity", it can be
shown that you cannot determine this algorithmically, which defeats your
purpose. All you can do *is* statistical testing.
yeah.
Quote: for example, a deterministic permutation with a large period, could still
score very well in terms of statistical randomness, and thus pass the
tests, despite being very much deterministic.
Each Pseudo-RNG is deterministic by its very construction. There is
nothing against this. So, the question you should *really* ask is: For
which purpose do you need the numbers, and: When is random random enough?
the point is to generate presumably non-clashing numbers.
in this case, they would serve as segment numbers in some large address
space.
in my case, the numbers are 48 bit and random, however, they are encoded as
locally-administered MAC addresses.
the potential risk of clashes, is that two nodes using shared memory within
a given cluster, could conflict in terms of used segment numbers. in this
context however, the risk could be reduced mostly by having each node, when
creating a segment, checking with a kind of lookup server to verify that the
segment has not been used (if by some chance a positive hit comes back, a
new number is generated).
|
|
|
| Back to top |
|
| Thomas Richter... |
Posted: Tue May 06, 2008 5:29 am |
|
|
|
Guest
|
cr88192 wrote:
Quote: the point is to generate presumably non-clashing numbers.
in this case, they would serve as segment numbers in some large address
space.
In that case, you don't want "random" numbers, but a random
*permutation* of the 48bit space, which is something different. I forgot
the algorithm to do that, but I remember, it's in Knuth's book.
So long,
Thomas |
|
|
| Back to top |
|
| rossum... |
Posted: Tue May 06, 2008 11:47 am |
|
|
|
Guest
|
On Tue, 06 May 2008 12:29:37 +0200, Thomas Richter
<thor at (no spam) math.tu-berlin.de> wrote:
Quote: cr88192 wrote:
the point is to generate presumably non-clashing numbers.
in this case, they would serve as segment numbers in some large address
space.
In that case, you don't want "random" numbers, but a random
*permutation* of the 48bit space, which is something different. I forgot
the algorithm to do that, but I remember, it's in Knuth's book.
Knuth, Vol 2, Algorithm 3.4.2 P
An alternative shuffling algorithm would be an encryption as I
suggested elsethread.
rossum
|
|
|
| Back to top |
|
| cr88192... |
Posted: Wed May 07, 2008 12:57 am |
|
|
|
Guest
|
"Thomas Richter" <thor at (no spam) math.tu-berlin.de> wrote in message
news:fvpbq8$5dc$1 at (no spam) infosun2.rus.uni-stuttgart.de...
Quote: cr88192 wrote:
the point is to generate presumably non-clashing numbers.
in this case, they would serve as segment numbers in some large address
space.
In that case, you don't want "random" numbers, but a random *permutation*
of the 48bit space, which is something different. I forgot the algorithm
to do that, but I remember, it's in Knuth's book.
no, for the reason that a random permutation requires some amount of
centralization (for example, the segment addresses being generated from a
centralized source, wheras in this case, this is not possible).
|
|
|
| Back to top |
|
| |
Page 2 of 2 Goto page Previous 1, 2
All times are GMT - 5 Hours
The time now is Mon Jul 07, 2008 10:33 am
|
|