| |
 |
|
|
Science Forum Index » Math - Numerical Analysis Forum » parallel random numbers generator
Page 1 of 1
|
| Author |
Message |
| Crni Gorac |
Posted: Fri May 02, 2008 10:13 am |
|
|
|
Guest
|
Need above for massively parallel Monte Carlo simulation. Read loads
of papers and other materials discussing it, and still am not able to
decide for the best option. So - any suggestion?
Thanks. |
|
|
| Back to top |
|
| Gordon Sande |
Posted: Fri May 02, 2008 3:21 pm |
|
|
|
Guest
|
On 2008-05-02 17:13:57 -0300, Crni Gorac <cgorac@gmail.com> said:
Quote: Need above for massively parallel Monte Carlo simulation. Read loads
of papers and other materials discussing it, and still am not able to
decide for the best option. So - any suggestion?
Thanks.
Do you need to be able to rerun exactly the same RNs?
How much computation will you do on each RN? Is the speed of
the PRNG important in the overall computation? Much of the
discussion of RN speed seems to assume that the other computation
is free! It matters to folks doing RN testing but one wonders
why it matters for others.
You might think of just having a RN server that does not have
to be parallel itself. |
|
|
| Back to top |
|
| Thiophene |
Posted: Sat May 03, 2008 12:36 am |
|
|
|
Guest
|
Nvidia have source code for the MT on their CUDA web-page. I guess
that might be fast enough for you.
The CUDA enabled graphics cards allow you to do super-computing in
your own home, you can use the GPU for general purpose computing.
Off-topic a bit, but if you need to generate Gaussian deviates really,
really fast download the assembly language code for the O'Connor
transform here:
http://code.google.com/p/lemontree/downloads/list
There are some demos there and timing code to show exactly how fast it
is.
Sean O'Connor |
|
|
| Back to top |
|
| Thiophene |
Posted: Sat May 03, 2008 12:39 am |
|
|
|
Guest
|
Nvidia have source code for the MT on their CUDA web-page. I guess
that might be fast enough for you.
The CUDA enabled graphics cards allow you to do super-computing in
your own home, you can use the GPU for general purpose computing.
Off-topic a bit, but if you need to generate Gaussian deviates really,
really fast download the assembly language code for the O'Connor
transform here:
http://code.google.com/p/lemontree/downloads/list
There are some demos there and timing code to show exactly how fast it
is.
Sean O'Connor |
|
|
| Back to top |
|
| Crni Gorac |
Posted: Sat May 03, 2008 5:53 am |
|
|
|
Guest
|
On May 2, 10:21 pm, Gordon Sande <g.sa...@worldnet.att.net> wrote:
Quote: On 2008-05-02 17:13:57 -0300, Crni Gorac <cgo...@gmail.com> said:
Need above for massively parallel Monte Carlo simulation. Read loads
of papers and other materials discussing it, and still am not able to
decide for the best option. So - any suggestion?
Thanks.
Do you need to be able to rerun exactly the same RNs?
Yes, it would be good to be able to do so.
Quote: How much computation will you do on each RN? Is the speed of
the PRNG important in the overall computation? Much of the
discussion of RN speed seems to assume that the other computation
is free! It matters to folks doing RN testing but one wonders
why it matters for others.
Yes, the speed of RNG is rather important here - RNG calculations may
be significant part of the overall calculation.
Quote: You might think of just having a RN server that does not have
to be parallel itself.
Thought about that, but seems that every reference is dismissing that
approach because usually (and in this particular case too) large
number of random numbers need to be generated, and the server may
become bottleneck. |
|
|
| Back to top |
|
| Crni Gorac |
Posted: Sat May 03, 2008 5:54 am |
|
|
|
Guest
|
On May 3, 12:36 pm, Thiophene <bitterlemo...@yahoo.ie> wrote:
Quote: Nvidia have source code for the MT on their CUDA web-page. I guess
that might be fast enough for you.
The CUDA enabled graphics cards allow you to do super-computing in
your own home, you can use the GPU for general purpose computing.
It is CUDA calculation in question, and MT is not satisfactory here -
too much of the state is needed, that is limiting with regard to the
CUDA optimizations... |
|
|
| Back to top |
|
| Thiophene... |
Posted: Sun May 04, 2008 4:21 am |
|
|
|
Guest
|
On May 3, 11:54 pm, Crni Gorac <cgo... at (no spam) gmail.com> wrote:
Quote: On May 3, 12:36 pm, Thiophene <bitterlemo... at (no spam) yahoo.ie> wrote:
Nvidia have source code for the MT on their CUDA web-page. I guess
that might be fast enough for you.
The CUDA enabled graphics cards allow you to do super-computing in
your own home, you can use the GPU for general purpose computing.
It is CUDA calculation in question, and MT is not satisfactory here -
too much of the state is needed, that is limiting with regard to the
CUDA optimizations...
I am not a salesperson for that graphics card company but you could
buy another, though I guess memory bandwidth...
I have tried many different RNG's for doing random permutations, and
you know what they are all really bad, even the so called best.
Better to call them Somewhat Disordered Number Generators. You know
about linear congruent generators, you know about Fibonacci
generators. Perhaps a 3-term Fibonacci generator would suffice. I can
understand your quandary, there are no good answers.
If you can quantify exactly how disordered the numbers need to be for
your application, maybe you will find that an LC might actually do.
That is what I do in relation to random permutations. I ask the
question how many different permutation, and how different are they
from each other. |
|
|
| Back to top |
|
| Crni Gorac... |
Posted: Sun May 04, 2008 12:19 pm |
|
|
|
Guest
|
OK, let me try to be somewhat more specific: Particular needs for this
simulation are alike, I guess, to needs of most of MC based
simulations to be implemented on CUDA, which means the demands from
RNG are: to generate many large sequences of random numbers in
parallel, to have these sequences manifesting low correlation between
each other and being generally "good" in other statistical regards,
then to have small state (Mersenne Twister need 624 numbers of state,
this is way too big) in order to be able to keep it in CUDA shared
memory (otherwise the performance will suffer heavily). Seems like
good choice may be to use some of RNGs with small state for each
thread of execution, but initialized with different seeds (generated
by another RNG). But again - there are way too much options available
for both "main" and "seeding" RNG, so the question is: which ones to
choose, and how to measure the quality of the results? |
|
|
| Back to top |
|
| Gordon Sande... |
Posted: Sun May 04, 2008 1:20 pm |
|
|
|
Guest
|
On 2008-05-04 11:21:33 -0300, Thiophene <bitterlemon40 at (no spam) yahoo.ie> said:
Quote: On May 3, 11:54 pm, Crni Gorac <cgo... at (no spam) gmail.com> wrote:
On May 3, 12:36 pm, Thiophene <bitterlemo... at (no spam) yahoo.ie> wrote:
Nvidia have source code for the MT on their CUDA web-page. I guess
that might be fast enough for you.
The CUDA enabled graphics cards allow you to do super-computing in
your own home, you can use the GPU for general purpose computing.
It is CUDA calculation in question, and MT is not satisfactory here -
too much of the state is needed, that is limiting with regard to the
CUDA optimizations...
I am not a salesperson for that graphics card company but you could
buy another, though I guess memory bandwidth...
I have tried many different RNG's for doing random permutations, and
you know what they are all really bad, even the so called best.
Random permutations of how many objects? N! (N factorial) gets very big
very fast and then you need a much greater cycle length to have a hope
of getting a representative permutation. Many do not realize just how
demanding random permutations are of a PRNG. But an elementary check
of the effective order of the requirement shows that many popular
PRNG with cycle length of (2^30)^k for small k are totally hopeless.
You suggest a k of 2 or 4 so it is not surprising that you have found
things not very good. The mathematics is the Geometry of Numbers from
the late 1800s as found in the Lattice or Spectral Tests of PRNGs.
If you tried something like Mersenne Twister and kept N small (or
moderate if you are lucky) then things might be better.
Quote: Better to call them Somewhat Disordered Number Generators. You know
about linear congruent generators, you know about Fibonacci
generators. Perhaps a 3-term Fibonacci generator would suffice. I can
understand your quandary, there are no good answers.
If you can quantify exactly how disordered the numbers need to be for
your application, maybe you will find that an LC might actually do.
That is what I do in relation to random permutations. I ask the
question how many different permutation, and how different are they
from each other. |
|
|
| Back to top |
|
| Gordon Sande... |
Posted: Sun May 04, 2008 5:35 pm |
|
|
|
Guest
|
On 2008-05-04 19:19:47 -0300, Crni Gorac <cgorac at (no spam) gmail.com> said:
Quote: OK, let me try to be somewhat more specific: Particular needs for this
simulation are alike, I guess, to needs of most of MC based
simulations to be implemented on CUDA, which means the demands from
RNG are: to generate many large sequences of random numbers in
parallel, to have these sequences manifesting low correlation between
each other and being generally "good" in other statistical regards,
then to have small state (Mersenne Twister need 624 numbers of state,
this is way too big) in order to be able to keep it in CUDA shared
memory (otherwise the performance will suffer heavily). Seems like
good choice may be to use some of RNGs with small state for each
thread of execution, but initialized with different seeds (generated
by another RNG). But again - there are way too much options available
for both "main" and "seeding" RNG, so the question is: which ones to
choose, and how to measure the quality of the results?
If it has a small state then it will have a short cycle. If you want
large sequences then you seem to want a long cycle. Having several
points active on a short cycle is not compatible with independence
of long sequences. You seem to keep asking for self contradictory or
mutually incompatible properties. |
|
|
| Back to top |
|
| Crni Gorac... |
Posted: Mon May 05, 2008 10:27 am |
|
|
|
Guest
|
On May 5, 12:35 am, Gordon Sande <g.sa... at (no spam) worldnet.att.net> wrote:
Quote:
If it has a small state then it will have a short cycle. If you want
large sequences then you seem to want a long cycle. Having several
points active on a short cycle is not compatible with independence
of long sequences. You seem to keep asking for self contradictory or
mutually incompatible properties.
Yep, I can see that... Ok, how about: large number (hundreds of
thousands or even millions) of moderate size (several hundreds)
sequences? |
|
|
| Back to top |
|
| Herman Rubin... |
Posted: Tue May 06, 2008 2:37 pm |
|
|
|
Guest
|
In article <784a6e8d-5124-4206-8040-ac42b5f35092 at (no spam) 34g2000hsh.googlegroups.com>,
Crni Gorac <cgorac at (no spam) gmail.com> wrote:
Quote: OK, let me try to be somewhat more specific: Particular needs for this
simulation are alike, I guess, to needs of most of MC based
simulations to be implemented on CUDA, which means the demands from
RNG are: to generate many large sequences of random numbers in
parallel, to have these sequences manifesting low correlation between
each other and being generally "good" in other statistical regards,
then to have small state (Mersenne Twister need 624 numbers of state,
this is way too big) in order to be able to keep it in CUDA shared
memory (otherwise the performance will suffer heavily). Seems like
good choice may be to use some of RNGs with small state for each
thread of execution, but initialized with different seeds (generated
by another RNG). But again - there are way too much options available
for both "main" and "seeding" RNG, so the question is: which ones to
choose, and how to measure the quality of the results?
If you are doing MC calculations with large numbers of
random numbers per calculation, or are doing some sort
of MCMC, independence is as important as anything else.
You need a long period, and you should XOR or binary add
with some physical random number generator, which should
get rid of the dependence, which can be very subtle. Do
not worry if the physical generator is the greatest, but
if you have to reuse it, it would help if its length is
reasonably incommensurate with that of the PRNG. Reusing
a file of physical random numbers allows repetition with
the same random numbers.
One problem with using random numbers of any type is that
even if the same random numbers are used, a slight change
in the parameters may cause exponential or normal or
even uniform random variables used in the program to come
from different terms in the sequence. This is very
difficult to overcome, and if it is necessary to compensate
for it, the problems are considerable.
Also, acceptance-rejection and related methods, the fastest
in practice, have problems on SIMD machines. As I have
no idea what CUDA means or allows, it is hard to give advice
on that level. Private email is acceptable.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin at (no spam) stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558 |
|
|
| Back to top |
|
| |
|
Page 1 of 1
All times are GMT - 5 Hours
The time now is Mon Oct 13, 2008 2:05 pm
|
|