Are we reaching the maximum of CPU performance?

Nicolaas Vroom...
Posted: Sun Nov 07, 2010 7:18 am
 
I could also have raised the following question:
Which is currently the best performing CPU?
This question is important for all people who try to simulate
physical systems.
In my case this is the movement of 7 planets around the Sun
using Newton's law.
I have tested 4 different types of CPUs.
In order to compare I use the same VB program.
To measure performance, the program
calculates the number of simulated years completed in 1 minute.
a) On an Intel Pentium II processor I get 0.825 years.
b) On a GenuineIntel x86 Family 6 Model 8 I get 1.542 years.
c) On an Intel Pentium 4 CPU at 2.8 GHz I get 6 years.
This one is roughly 8 years old. Load: 100%.
d) On an Intel quad-core i5 M460 I get 3.9 years. Load: 27%.

In the last case I can also load the program 4 times.
On average I get 2.6 years per copy. Load: 100%.

Does this mean that we have reached the top of CPU performance?
Of course I can rewrite my program in a different language.
Maybe then my program runs faster, but that does not
solve the issue.
I did not test a dual core; maybe those are faster.
Of my Pentium 4 CPU there apparently exists
a 3.3 GHz version. Is that the solution?

Any suggestions on what to do?
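
For reference, the kind of benchmark loop described above might look
like this minimal Python sketch (not the actual VB program; the
integrator, units, and step size here are illustrative):

    # Count simulated years completed in one wall-clock minute.
    # Toy setup: one planet on a circular orbit around a fixed Sun,
    # in units of AU, years, and solar masses, so G = 4*pi^2.
    import math, time

    G = 4.0 * math.pi ** 2        # AU^3 / (Msun * yr^2)
    dt = 1.0e-4                   # timestep in years
    x, y = 1.0, 0.0               # position (AU)
    vx, vy = 0.0, 2.0 * math.pi   # circular-orbit speed at 1 AU (AU/yr)

    def accel(x, y):
        r2 = x * x + y * y
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        return -G * x * inv_r3, -G * y * inv_r3

    t_sim = 0.0
    deadline = time.time() + 60.0        # one wall-clock minute
    ax, ay = accel(x, y)
    while time.time() < deadline:
        # leapfrog (kick-drift-kick) step
        vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
        x += dt * vx; y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
        t_sim += dt

    print("simulated years in one minute:", t_sim)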

Nicolaas Vroom
http://users.telenet.be/nicvroom/galaxy-mercury.htm
 
Dirk Bruere at NeoPax...
Posted: Sun Nov 07, 2010 5:09 pm
 
On 07/11/2010 17:18, Nicolaas Vroom wrote:
[quote]I could also have raised the following question:
Which is currently the best performing CPU?
...
Does this mean that we have reached the top of CPU performance?
...
Any suggestions on what to do?
[/quote]
Use a top-end graphics card and something like CUDA from Nvidia.
That will give you up to a 100x increase in computing power, to around
1 TFLOPS.

http://en.wikipedia.org/wiki/CUDA

Apart from that, processing power is still increasing by around 2 orders
of magnitude per decade. If that's not enough, the DARPA exascale
computer is due around 2018 (10^18 FLOPS).

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.blogtalkradio.com/onetribe - Occult Talk Show


[[Mod. note -- Sequential (single-thread) CPU speed has indeed roughly
plateaued since 2005 or so, at roughly 3 GHz and 4 instructions-per-cycle
out-of-order. More recent progress has been
* bigger and bigger caches [transparent to the programmer]
* some small increases in memory bandwidth [transparent to the programmer]
* lots of parallelism [NOT transparent to the programmer]

Parallelism today comes in many flavors [almost all of which must be
explicitly managed by the programmer]:
* lots of CPU chips are "multicore" and/or "multithreaded"
* lots of computer systems incorporate multiple CPUs
[this includes the graphics-card processors mentioned in this posting]

*If* your application is such that it can be (re)programmed to use
many individual processors working in parallel, then parallelism can
be very useful. Alas, small-N N-body simulations of the type
discussed by the original poster are notoriously hard to parallelize. :(

I'd also like to point out the existence of the newsgroup comp.arch
(unmoderated), devoted to discussions of computer architecture.
-- jt]]
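
To see concretely why small-N runs resist parallelization, consider
this hypothetical Python sketch (toy positions and masses, not a real
ephemeris): only the force loop inside one timestep can be farmed out,
and for N = 8 the cost of shipping tasks to worker processes swamps
the arithmetic, while successive timesteps cannot overlap at all.

    import time
    from concurrent.futures import ProcessPoolExecutor

    N = 8  # Sun + 7 planets: very little work per timestep

    def accel_one(args):
        # acceleration on body i from all other bodies (toy force law)
        i, pos, mass = args
        xi, yi = pos[i]
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(pos):
            if j == i:
                continue
            dx, dy = xj - xi, yj - yi
            r3 = (dx * dx + dy * dy) ** 1.5
            ax += mass[j] * dx / r3
            ay += mass[j] * dy / r3
        return ax, ay

    if __name__ == "__main__":
        pos = [(float(i + 1), 0.0) for i in range(N)]  # distinct points
        mass = [1.0] * N
        args = [(i, pos, mass) for i in range(N)]

        t0 = time.perf_counter()
        serial = [accel_one(a) for a in args]
        t1 = time.perf_counter()

        with ProcessPoolExecutor() as pool:
            list(pool.map(accel_one, args))   # warm the pool up once
            t2 = time.perf_counter()
            parallel = list(pool.map(accel_one, args))
            t3 = time.perf_counter()

        print("one step, serial:   %.6f s" % (t1 - t0))
        print("one step, parallel: %.6f s" % (t3 - t2))
        # On a typical machine the serial loop wins by a wide margin.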
 
Giorgio Pastore...
Posted: Sun Nov 07, 2010 5:17 pm
 
On 11/7/10 6:18 PM, Nicolaas Vroom wrote:
....
[quote]In my case this is the movement of 7 planets around the Sun
using Newton's law.
....
In order to compare I use the same VB program.
....
Any suggestions on what to do?
[/quote]

You said nothing about the algorithm you implemented in VB. If you chose
the wrong algorithm you may waste a lot of CPU time.

Giorgio

[[Mod. note -- If your goal is actually to study physics via N-body
simulations of this sort, there's a considerable literature on clever
numerical methods/codes which are very accurate and efficient. A good
starting point to learn more might be
http://www.amara.com/papers/nbody.html
which has lots of references. It discusses both the small-N and
large-N cases (which require vastly different sorts of numerical
methods, and have very different accuracy/cost/fidelity tradeoffs).

A recent paper of interest (describing a set of simulations of Sun +
8 planets + Pluto + time-averaged Moon + approximate general relativistic
effects) is
J. Laskar & M. Gastineau
"Esistence of collisional trajectories of Mercury, Mars, and Venus
with the Earth"
Nature volume 459, 11 June 2009, pages 817--819
http://dx.doi.org/10.1038/nature08096
-- jt]]
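
A quick illustration of Giorgio's point (a hedged Python sketch; the
orbit, step size, and step count are toy values): for the same number
of steps, forward Euler lets the orbital energy drift while leapfrog
keeps it bounded, so a poor integrator forces far smaller steps to
reach the same accuracy and wastes CPU time.

    import math

    GM = 4.0 * math.pi ** 2     # AU, yr, solar-mass units
    dt, steps = 1.0e-3, 10000   # ten simulated years of a 1 AU orbit

    def energy(x, y, vx, vy):
        return 0.5 * (vx * vx + vy * vy) - GM / math.hypot(x, y)

    def euler(x, y, vx, vy):
        for _ in range(steps):
            r3 = math.hypot(x, y) ** 3
            ax, ay = -GM * x / r3, -GM * y / r3
            x, y = x + dt * vx, y + dt * vy
            vx, vy = vx + dt * ax, vy + dt * ay
        return energy(x, y, vx, vy)

    def leapfrog(x, y, vx, vy):
        for _ in range(steps):
            r3 = math.hypot(x, y) ** 3
            vx += 0.5 * dt * (-GM * x / r3)
            vy += 0.5 * dt * (-GM * y / r3)
            x += dt * vx; y += dt * vy
            r3 = math.hypot(x, y) ** 3
            vx += 0.5 * dt * (-GM * x / r3)
            vy += 0.5 * dt * (-GM * y / r3)
        return energy(x, y, vx, vy)

    start = (1.0, 0.0, 0.0, 2.0 * math.pi)   # circular orbit at 1 AU
    e0 = energy(*start)
    print("relative energy error, Euler:   ", abs((euler(*start) - e0) / e0))
    print("relative energy error, leapfrog:", abs((leapfrog(*start) - e0) / e0))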
 
Lester Welch...
Posted: Mon Nov 08, 2010 12:52 am
 
On Nov 7, 10:17 pm, Giorgio Pastore <past... at (no spam) units.it> wrote:
[quote]You said nothing about the algorithm you implemented in VB. If you chose
the wrong algorithm you may waste a lot of CPU time.
[/quote]
Isn't the OP comparing CPUs, not algorithms? An inefficient
algorithm run on a multitude of CPUs is a fair test of the CPUs,
is it not?
 
Nicolaas Vroom...
Posted: Mon Nov 08, 2010 11:01 am
 
"Dirk Bruere at NeoPax" <dirk.bruere at (no spam) gmail.com> schreef in bericht
news:8jo9kuFc80U1 at (no spam) mid.individual.net...
[quote]On 07/11/2010 17:18, Nicolaas Vroom wrote:
...
Use a top-end graphics card and something like CUDA from Nvidia.
That will give you up to a 100x increase in computing power, to around
1 TFLOPS.
[/quote]
That is the same suggestion my CPU vendor gave me.
But this solution requires reprogramming, and that is not what I want.
I also do not want to reprogram in C++.

[quote]http://en.wikipedia.org/wiki/CUDA

--
Dirk

[[Mod. note -- Sequential (single-thread) CPU speed has indeed roughly
plateaued since 2005 or so, at roughly 3 GHz and 4 instructions-per-cycle
out-of-order. More recent progress has been
[/quote]
I did not know this. I was truly amazed by the results of my tests.
You buy something new and expensive, and you get only about 60% of the best result.

[quote]* bigger and bigger caches [transparent to the programmer]
* some small increases in memory bandwidth [transparent to the
programmer]
* lots of parallelism [NOT transparent to the programmer]

Parallelism today comes in many flavors [almost all of which must be
explicitly managed by the programmer]:
* lots of CPU chips are "multicore" and/or "multithreaded"
* lots of computer systems incorporate multiple CPUs
[this includes the graphics-card processors mentioned in this posting]

*If* your application is such that it can be (re)programmed to use
many individual processors working in parallel, then parallelism can
be very useful.
[/quote]
I agree.
This is true, for example, for a game like chess.
SETI also falls into this category.

[quote]Alas, small-N N-body simulations of the type
discussed by the original poster are notoriously hard to parallelize. :(
[/quote]
That is correct.
But there is an impact on almost all simulations of physical systems
where dependency is an issue.

[quote]I'd also like to point out the existence of the newsgroup comp.arch
(unmoderated), devoted to discussions of computer architecture.
-- jt]]
[/quote]
Thanks for the comments.

Nicolaas Vroom
 
Dirk Bruere at NeoPax...
Posted: Mon Nov 08, 2010 11:28 am
 
On 08/11/2010 10:52, Lester Welch wrote:

[quote]Isn't the OP comparing CPUs, not algorithms? An inefficient
algorithm run on a multitude of CPUs is a fair test of the CPUs,
is it not?
[/quote]
Not necessarily, if the algorithm is playing to a common weakness rather
than the increasing strengths of modern CPUs. For example, if the
algorithm requires significant HDD access, or overflows the on-chip
cache, or simply requires more memory than is installed on the motherboard
and has to use a pagefile.

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.blogtalkradio.com/onetribe - Occult Talk Show
 
Arnold Neumaier...
Posted: Mon Nov 08, 2010 11:28 am
 
Nicolaas Vroom wrote:
[quote]
Does this mean that we have reached the top of CPU performance?
[/quote]
We have reached the top of CPU performance per core.

Performance is still increasing, but by using slower
multiple-core architectures and balancing the load.
This is the current trend, and is likely to remain so in the future.


[quote]Of course I can rewrite my program in a different language.
Maybe then my program runs faster, but that does not
solve the issue.
I did not test a dual core; maybe those are faster.
Of my Pentium 4 CPU there apparently exists
a 3.3 GHz version. Is that the solution?

Any suggestions on what to do?
[/quote]
You need to use multiple cores.
If your code is purely sequential, you are unlucky.
Computers will become _slower_ for such programs.
 
Nicolaas Vroom...
Posted: Mon Nov 08, 2010 5:05 pm
 
"Giorgio Pastore" <pastgio at (no spam) units.it> schreef in bericht
news:4cd6fcf1$0$40011$4fafbaef at (no spam) reader3.news.tin.it...
[quote]On 11/7/10 6:18 PM, Nicolaas Vroom wrote:
...
In my case this is the movement of 7 planets around the Sun
using Newton's law.
...
In order to compare I use the same VB program.
...
Any suggestions on what to do?


You said nothing about the algorithm you implemented in VB. If you chose
the wrong algorithm you may waste a lot of CPU time.

Giorgio
[/quote]
My goal is not to find the cleverest algorithm.
My goal is to find the best CPU using the same program.
What my tests show is that single cores are the fastest.
However, there is a small chance that there are dual cores which are
faster, but I do not know if that is actually true.

Nicolaas Vroom

 
Hans Aberg...
Posted: Mon Nov 08, 2010 5:13 pm
 
On 2010/11/08 22:28, Arnold Neumaier wrote:
[quote]Nicolaas Vroom wrote:

Does this mean that we have reached the top of CPU performance?

We have reached the top of CPU performance per core.

Performance is still increasing, but by using slower
multiple-core architectures and balancing the load.
This is the current trend, and is likely to remain so in the future.
[/quote]
There is no technological limitation on going to higher frequencies, but
energy consumption grows faster than linearly. So using parallelism at lower
frequencies requires less energy, and is easier to cool.


[[Mod. note -- Actually there are very difficult technological obstacles
to increasing CPU clock rates. Historically, surmounting these obstacles
has required lots of very clever electrical engineering and applied
physics (and huge amounts of money).

The result of this effort is that historically each successive generation
of semiconductor-fabrication process has been very roughly ~1/3 faster
than the previous generation (i.e., all other things being equal, each
successive generation allows roughly a 50% increase in clock frequency),
and uses roughly 1/2 to 2/3 the power at a given clock frequency. Alas,
power generally increases at least quadratically with clock frequency.
-- jt]]
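
The scaling in that note can be made explicit with the standard CMOS
dynamic-power relation (assuming, as is usual, that the supply voltage
must scale roughly with the clock frequency); in LaTeX notation:

    P_{\mathrm{dyn}} \approx \alpha C V^{2} f,
    \qquad V \propto f \;\Longrightarrow\; P_{\mathrm{dyn}} \propto f^{3}

So, for a perfectly parallel workload, two cores at half the frequency
deliver roughly the same aggregate throughput for about a quarter of the
dynamic power, which is the engineering case for parallelism at lower
clock rates.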
 
Juan R. González-Álvarez...
Posted: Tue Nov 09, 2010 6:02 pm
 
Hans Aberg wrote on Mon, 08 Nov 2010 22:13:14 -0500:

[quote]On 2010/11/08 22:28, Arnold Neumaier wrote:
Nicolaas Vroom wrote:

Does this mean that we have reached the top of CPU performance?

We have reached the top of CPU performance per core.

Performance is still increasing, but by using slower multiple-core
architectures and balancing the load. This is the current trend, and
is likely to remain so in the future.

There is no technological limitation on going to higher frequencies,
but energy consumption grows faster than linearly. So using parallelism at
lower frequencies requires less energy, and is easier to cool.
[/quote]
Currently there are strong technological limitations (reflected in the
current frequency limits for the CPUs that you can buy). And we are
close to the physical limits for the current technology (that is why
optical or quantum computers are an active research topic).

You are right that energy consumption is not linear in frequency,
but neither is *algorithmic* parallelism. Doubling the number of cores
roughly doubles energy consumption, but not the algorithmic power.
There are many variables to consider here, but you would get about a
10-30% gain over a single Intel core.

To obtain double the algorithmic power you would maybe need 8x the cores,
making the real power consumption non-linear again. And I doubt that the
VB software of the original poster can use all the cores in any decent
way.
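
Amdahl's law makes this quantitative (a hedged sketch; the parallel
fractions p below are illustrative, not measurements of any particular
program). An "8x cores for double the speed" estimate corresponds to a
code that is only about 57% parallelizable:

    # Amdahl's law: p = parallelizable fraction, n = number of cores.
    def speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.50, 0.57, 0.90):
        row = ", ".join("%d cores -> %.2fx" % (n, speedup(p, n))
                        for n in (2, 4, 8))
        print("p = %.2f: %s" % (p, row))
    # p = 0.57 gives just about 2.0x on 8 cores.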



--
http://www.canonicalscience.org/

BLOG:
http://www.canonicalscience.org/publications/canonicalsciencetoday/canonicalsciencetoday.html
 
Hans Aberg...
Posted: Wed Nov 10, 2010 7:45 am
 
On 2010/11/10 05:02, Juan R. González-Álvarez wrote:
[quote]Does this mean that we have reached the top of CPU performance?

We have reached the top of CPU performance per core.

Performance is still increasing, but by using slower multiple-core
architectures and balancing the load. This is the current trend, and
is likely to remain so in the future.

There is no technological limitation on going to higher frequencies,
but energy consumption grows faster than linearly. So using parallelism at
lower frequencies requires less energy, and is easier to cool.

Currently there are strong technological limitations (reflected in the
current frequency limits for the CPUs that you can buy).
[/quote]
The fastest you can buy is over 5 GHz, and it sits in a mainframe (see the
Wikipedia "Clock rate" article). So if energy consumption and heat dissipation
weren't a problem, you could have it in your laptop.

[quote]And we are
close to the physical limits for the current technology (that is why
optical or quantum computers are an active research topic).
[/quote]
Don't hold your breath for the latter.

[quote]You are right that energy consumption is not linear in frequency,
but neither is *algorithmic* parallelism. Doubling the number of cores
roughly doubles energy consumption, but not the algorithmic power.
There are many variables to consider here, but you would get about a
10-30% gain over a single Intel core.
[/quote]
That's why much parallel capacity currently moves into the GPU, where it
is needed and where algorithms are fairly easy to parallelize.

[quote]To obtain double the algorithmic power you would maybe need 8x the cores,
making the real power consumption non-linear again. And I doubt that the
VB software of the original poster can use all the cores in any decent
way.
[/quote]
Classical computer languages are designed for single threading, so
full parallelization isn't easy.
 
Hans Aberg...
Posted: Wed Nov 10, 2010 10:11 pm
 
On 2010/11/07 18:18, Nicolaas Vroom wrote:
[quote]d) On an Intel quad-core i5 M460 I get 3.9 years. Load: 27%.

In the last case I can also load the program 4 times.
On average I get 2.6 years per copy. Load: 100%.
[/quote]
If you only get about a quarter of full load on a quad core, perhaps you
haven't threaded it. The Wikipedia OpenCL article includes an FFT example
which, on a rather old GPU, reaches 144 Gflops (see its reference 27):
http://en.wikipedia.org/wiki/OpenCL
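
At the coarsest level, "threading it" can mean nothing more than running
independent jobs on separate cores, which is effectively what the original
poster did by loading the program four times. A hypothetical Python sketch
(the workload is a toy loop, not the VB integrator; a single coupled
7-planet run does not split up this way):

    import time
    from multiprocessing import Pool

    STEPS = 2000000

    def integrate(job):
        # toy CPU-bound loop standing in for one independent simulation
        x, v, dt = 1.0, 0.0, 1.0e-6
        for _ in range(STEPS):
            v += -x * dt          # unit harmonic oscillator
            x += v * dt
        return x

    if __name__ == "__main__":
        t0 = time.perf_counter()
        [integrate(i) for i in range(4)]          # 4 runs, one core
        t1 = time.perf_counter()
        with Pool(4) as pool:
            pool.map(integrate, range(4))         # 4 runs, four cores
        t2 = time.perf_counter()
        print("serial:   %.2f s" % (t1 - t0))
        print("parallel: %.2f s (close to 4x on four free cores)" % (t2 - t1))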
 
Hans Aberg...
Posted: Wed Nov 10, 2010 10:11 pm
 
On 2010/11/08 22:01, Nicolaas Vroom wrote:
[quote]Use a top-end graphics card and something like CUDA from Nvidia.
That will give you up to a 100x increase in computing power, to around
1 TFLOPS.

That is the same suggestion my CPU vendor gave me.
But this solution requires reprogramming, and that is not what I want.
I also do not want to reprogram in C++.
[/quote]
I pointed out OpenCL in a post that has not yet appeared. See the
Wikipedia article, which has an example.
 
Arnold Neumaier...
Posted: Thu Nov 11, 2010 6:13 am
 
Juan R. González-Álvarez wrote:
[quote]Hans Aberg wrote on Mon, 08 Nov 2010 22:13:14 -0500:
...
To obtain double the algorithmic power you would maybe need 8x the cores,
[/quote]
This depends very much on the algorithm.

Some algorithms (e.g., Monte Carlo simulation)
can be trivially parallelized, so the factor is only 2x.
For many other cases, n times the algorithmic speed needs a number
of cores that grows slowly with n. And there are algorithms
that one can hardly speed up no matter how many cores one uses.
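
Monte Carlo is the textbook embarrassingly parallel case (a minimal
Python sketch; the sample counts are arbitrary): each worker draws
independent samples and only the final tallies are combined, so the
speedup is essentially the number of cores.

    import random
    from multiprocessing import Pool

    def count_hits(args):
        # hits inside the unit quarter-circle; independent RNG per worker
        seed, n = args
        rng = random.Random(seed)
        return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                   for _ in range(n))

    if __name__ == "__main__":
        workers, n_per = 4, 1000000
        with Pool(workers) as pool:
            hits = sum(pool.map(count_hits, [(s, n_per) for s in range(workers)]))
        print("pi is approximately", 4.0 * hits / (workers * n_per))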
 
 