
  Computers Forum Index » Computer Architecture » mips per watt - ARM vs. X86...

Author Message
Stephen Fuld...
Posted: Tue Sep 28, 2010 8:48 pm
 
It seems to be the "conventional wisdom" that ARM has higher mips per
watt than X86. This leads to several questions.

1. Is it true?


2. If it is true, what is it about ARM that provides this? Mitch Alsup
has estimated that the more complex instruction decoding for X86 costs
perhaps 5%, but even if true, that can't be the whole difference. Is it
perhaps that neither Intel nor AMD has really focused on mips per watt
and could do a lot better if they wanted to? It would seem that at
least Intel would benefit from its better fabs, but that doesn't seem
to compensate. Is it the case that mips per watt is a function of mips?
That is, it costs more watts per mip to provide higher mips, and the
difference in mips per watt is just an artifact of the different target
market segments?

3. If it isn't true, what is the reason for the conventional wisdom?



--
- Stephen Fuld
(e-mail address disguised to prevent spam)
 
Michael S...
Posted: Tue Sep 28, 2010 9:02 pm
 
On Sep 28, 6:48 pm, Stephen Fuld <SF... at (no spam) alumni.cmu.edu.invalid> wrote:
Quote:
It seems to be the "conventional wisdom" that ARM has higher mips per
watt than X86.  This leads to several questions.

1.      Is it true?


Very likely. The ARM Cortex-M3 datasheet promises 0.19 mW/MIPS at 180nm.
No actual M3 chip (at least from the big vendors) comes close to that
figure. However, there are several M3 chips that deliver ~1 mW/MIPS,
which is much (at least 10x) better than the best 180nm x86.
Now, the comparison is not exactly fair, since in absolute performance
180nm x86 was ~10-20 times faster than the fastest Cortex-M3.
ARM Cortex-A9 vs Intel Atom is a fairer comparison, since top absolute
performance is pretty close. It is hard to get precise numbers,
since A9 SoCs tend to pack more non-CPU functionality than their Atom
counterparts, but it seems that at equal performance the A9 consumes at
most half the power under full load. Until recently it also consumed
enormously (more than 100x) less power when idle, but Intel mostly
fixed that with their latest Moorestown (Z6xx) SoC.
One thing we should take into account when comparing Atom vs Cortex-A9
is that, due to OoO, the A9 is more "robust". In other words, it sucks
less when running ARM code not specifically optimized for it. On the
other hand, Atom sucks rather badly when running code optimized for the
"major" x86 cores from Intel and AMD: less so for integer, more so for
FP/SIMD code. So comparisons based on SPEC CPU or other benchmarks
compiled specifically for the target microarchitecture show Atom in a
better light than the typical experience of the majority of users; much
less so for the Cortex-A9.

I think that 2.5-3 years down the road we will be in a better position
to compare, since then we will have AMD Bobcat vs ARM Cortex-A15. Both
target more or less the same performance point, and both feature
similarly capable OoO.
 
Robert Myers...
Posted: Wed Sep 29, 2010 12:15 am
 
On Sep 28, 5:02 pm, Michael S <already5cho... at (no spam) yahoo.com> wrote:
Quote:
On Sep 28, 6:48 pm, Stephen Fuld <SF... at (no spam) alumni.cmu.edu.invalid> wrote:

It seems to be the "conventional wisdom" that ARM has higher mips per
watt than X86.  This leads to several questions.

1.      Is it true?

Very likely. The ARM Cortex-M3 datasheet promises 0.19 mW/MIPS at 180nm.
No actual M3 chip (at least from the big vendors) comes close to that
figure. However, there are several M3 chips that deliver ~1 mW/MIPS,
which is much (at least 10x) better than the best 180nm x86.
Now, the comparison is not exactly fair, since in absolute performance
180nm x86 was ~10-20 times faster than the fastest Cortex-M3.
ARM Cortex-A9 vs Intel Atom is a fairer comparison, since top absolute
performance is pretty close. It is hard to get precise numbers,
since A9 SoCs tend to pack more non-CPU functionality than their Atom
counterparts, but it seems that at equal performance the A9 consumes at
most half the power under full load. Until recently it also consumed
enormously (more than 100x) less power when idle, but Intel mostly
fixed that with their latest Moorestown (Z6xx) SoC.
One thing we should take into account when comparing Atom vs Cortex-A9
is that, due to OoO, the A9 is more "robust". In other words, it sucks
less when running ARM code not specifically optimized for it. On the
other hand, Atom sucks rather badly when running code optimized for the
"major" x86 cores from Intel and AMD: less so for integer, more so for
FP/SIMD code. So comparisons based on SPEC CPU or other benchmarks
compiled specifically for the target microarchitecture show Atom in a
better light than the typical experience of the majority of users; much
less so for the Cortex-A9.

I think that 2.5-3 years down the road we will be in a better position
to compare, since then we will have AMD Bobcat vs ARM Cortex-A15. Both
target more or less the same performance point, and both feature
similarly capable OoO.

For anyone interested in trying to guess the future direction of HPC,
how far Atom can be pushed in terms of performance/watt relative to
ARM is an important question.

At the risk of starting an AMD fanboy flame war, I wonder about the
relevance/usefulness/motivation of the "information" you have
provided.

What possible use is information from a 180 nm generation of Intel
chips unless, as I suspect, you are looking to support an Intel-will-
never-get-there bias (immovable object?) on your part? That suspicion
is further deepened for me by your urging us to wait to look at Bobcat
for a real answer. The suspicion is deepened still further by your
"Atom sucks at code not specifically compiled for it" potshot. For
whatever it's worth, Fedora distributions are being compiled with Atom
as a target.

In the end, I'm as clueless about the prospects for Atom relative to
ARM as I was before you started your post, but it's clear you don't
like Intel.

Robert.
 
Michael S...
Posted: Wed Sep 29, 2010 1:59 am
 
On Sep 29, 2:15 am, Robert Myers <rbmyers... at (no spam) gmail.com> wrote:
Quote:
On Sep 28, 5:02 pm, Michael S <already5cho... at (no spam) yahoo.com> wrote:

For anyone interested in trying to guess the future direction of HPC,
how far Atom can be pushed in terms of performance/watt relative to
ARM is an important question.


Methinks, both irrelevant.

Quote:
At the risk of starting an AMD fanboy flame war, I wonder about the
relevance/usefulness/motivation of the "information" you have
provided.

What possible use is information from a 180 nm generation of Intel
chips unless, as I suspect, you are looking to support an Intel-will-
never-get-there bias (immovable object?) on your part?

What can I do with the fact that no serious chip maker makes Cortex-M3
at geometries finer than 180nm?
I would like to compare Cortex-M3 against Intel Banias, but Banias is
130nm.

Quote:
 That suspicion
is further deepened for me by your urging us to wait to look at Bobcat
for a real answer.  The suspicion is deepened still further by your
"Atom sucks at code not specifically compiled for it" potshot.  For
whatever it's worth, Fedora distributions are being compiled with Atom
as a target.


I think that it is wrong to compare an in-order chip against an OoO chip
on metrics that do not emphasize one of the more significant advantages
of OoO in the real world: its ability to achieve decent results on
legacy binaries.

Quote:
In the end, I'm as clueless as I was about the prospects for Atom
relative to Arm as I was before you started your post, but it's clear
you don't like Intel.

Robert.

You are wrong. I like Intel. I like the Dothan, Yonah, Merom and Penryn
cores. The Nehalem core is not bad either, although I'd prefer the
Penryn core/L2 coupled with Nehalem's awesome uncore stuff. I am a bit
skeptical about Sandy Bridge, but I would be glad if the real-life
numbers disprove my skepticism.
But I don't like Atom. I see nothing to like about it.
 
Robert Myers...
Posted: Wed Sep 29, 2010 2:44 am
 
On Sep 28, 9:59 pm, Michael S <already5cho... at (no spam) yahoo.com> wrote:
Quote:
On Sep 29, 2:15 am, Robert Myers <rbmyers... at (no spam) gmail.com> wrote:

On Sep 28, 5:02 pm, Michael S <already5cho... at (no spam) yahoo.com> wrote:

For anyone interested in trying to guess the future direction of HPC,
how far Atom can be pushed in terms of performance/watt relative to
ARM is an important question.

Methinks, both irrelevant.

That's a prediction in itself. At the moment, if you are not a bomb
lab and/or have no commitment to IBM as a matter of industrial policy,
you are buying Intel. I think that includes the sainted Hawking and
his acolytes at Cambridge.

Quote:
At the risk of starting an AMD fanboy flame war, I wonder about the
relevance/usefulness/motivation of the "information" you have
provided.

What possible use is information from a 180 nm generation of Intel
chips unless, as I suspect, you are looking to support an Intel-will-
never-get-there bias (immovable object?) on your part?

What can I do with the fact that no serious chip maker makes Cortex-M3
at geometries finer than 180nm?
I would like to compare Cortex-M3 against Intel Banias, but Banias is
130nm.

You could have done lots better, or your posts aren't worth the
bother. Intel had no reason to care about power until 90nm. What's
next, you'll hold up the 90nm P4 as evidence that ARM will win?

Quote:
 That suspicion
is further deepened for me by your urging us to wait to look at Bobcat
for a real answer.  The suspicion is deepened still further by your
"Atom sucks at code not specifically compiled for it" potshot.  For
whatever it's worth, Fedora distributions are being compiled with Atom
as a target.

I think that it is wrong to compare an in-order chip against an OoO chip
on metrics that do not emphasize one of the more significant advantages
of OoO in the real world: its ability to achieve decent results on
legacy binaries.

Oh, dear God.


Legacy binaries.

Has Wall Street bought, not just politics, but computer architecture?

Quote:
In the end, I'm as clueless as I was about the prospects for Atom
relative to Arm as I was before you started your post, but it's clear
you don't like Intel.

You are wrong. I like Intel. I like the Dothan, Yonah, Merom and Penryn
cores. The Nehalem core is not bad either, although I'd prefer the
Penryn core/L2 coupled with Nehalem's awesome uncore stuff. I am a bit
skeptical about Sandy Bridge, but I would be glad if the real-life
numbers disprove my skepticism.
But I don't like Atom. I see nothing to like about it.

I'm pretty clueless about Atom so far. Lots of heat. Very little
light. Don't see much light from your contribution, but I'll accept
your representation that you are not an AMD fanboy. You only look
like one. Lots of Opterons still running at the home office? Did it
affect your bonus?

Robert.
 
Brett Davis...
Posted: Wed Sep 29, 2010 5:15 am
 
In article
<91dd9482-5f49-42c0-a75b-36a472c8e3b1 at (no spam) m15g2000yqm.googlegroups.com>,
Robert Myers <rbmyersusa at (no spam) gmail.com> wrote:
Quote:
On Sep 28, 5:02 pm, Michael S <already5cho... at (no spam) yahoo.com> wrote:
On Sep 28, 6:48 pm, Stephen Fuld <SF... at (no spam) alumni.cmu.edu.invalid> wrote:

It seems to be the "conventional wisdom" that ARM has higher mips per
watt than X86.  This leads to several questions.

1.      Is it true?

Very likely. The ARM Cortex-M3 datasheet promises 0.19 mW/MIPS at 180nm.
No actual M3 chip (at least from the big vendors) comes close to that
figure. However, there are several M3 chips that deliver ~1 mW/MIPS,
which is much (at least 10x) better than the best 180nm x86.

I think that 2.5-3 years down the road we will be in a better position
to compare, since then we will have AMD Bobcat vs ARM Cortex-A15. Both
target more or less the same performance point, and both feature
similarly capable OoO.

For anyone interested in trying to guess the future direction of HPC,
how far Atom can be pushed in terms of performance/watt relative to
ARM is an important question.

At the risk of starting an AMD fanboy flame war, I wonder about the
relevance/usefulness/motivation of the "information" you have
provided.

The ARM chip has no MMU; that alone likely accounts for ~a third of
the difference. The huge x86 decoder costs you another ~third.
Intel/AMD indifference to the microcontroller market accounts for the
last ~third. (No designs for super low power.)

No conspiracy theories needed.

Bobcat will crush ARM's netbook ambitions like a zit.

All the PC folk will jump on Bobcat for their "iPad killers".
The phone hardware guys will just laugh at the PC folk for this.

Apple will NOT use Bobcat in any iPad ever.

The ARM instruction set was decidedly NOT designed for low power;
that just happened to be the only place they could make sales.
(Opcodes with three sources are not a good way to reduce power. ;)

ARM has no magic pixie dust, just a handful of engineers.

Brett
 
Paul Gotch...
Posted: Wed Sep 29, 2010 12:03 pm
 
Robert Myers <rbmyersusa at (no spam) gmail.com> wrote:
Quote:
you are buying Intel. I think that includes the sainted Hawking and
his acolytes at Cambridge.

The current HPCS (which replaced the HPCF) at Cambridge is a large MPI
machine made out of Dell x86 boxen. However there are lots of people
using CUDA on Nvidia cards for more specialised compute requirements.

There is also CamGrid which is a Condor based distributed computing
service based on flocking lab machines and people's desktops into
clusters overnight.

Quote:
Oh, dear God.

Legacy binaries.

Has Wall Street bought, not just politics, but computer architecture?

I'm afraid being able to execute existing code (and code optimised for
previous microarchitectures) fast is extremely important. There are
various special cases, such as HPC, where people are willing and able
to recompile binaries to extract maximum performance but this is very
much the exception rather than the rule.

Intel has a problem with Atom here, having lots of existing binaries
which execute perfectly well on an aggressive OoO implementation but are
not performant on the in-order Atom without being recompiled. ARM, on
the other hand, is going from binaries compiled and scheduled for
existing in-order implementations to running them on new out-of-order
implementations.

-p
--
Paul Gotch
--------------------------------------------------------------------
 
nedbrek...
Posted: Wed Sep 29, 2010 3:26 pm
 
Hello all,

"Stephen Fuld" <SFuld at (no spam) alumni.cmu.edu.invalid> wrote in message
news:i7t69e$ap$1 at (no spam) news.eternal-september.org...
Quote:
Is it the case that mips per watt is a function of mips? That is, it costs
more watts per mip to provide higher mips and the difference in mips per
watt is just an artifact of the different target market segments?

Yes, you can think of this in terms of energy/delay tradeoffs. Energy
is a function of work and time: if you decrease the time taken to do
the same amount of work, it takes more energy, because voltage must
rise with frequency. Overhead like speculation, schedulers, buffers,
pipeline latches, etc. adds even more on top of this.

That is why mips/watt is a poor metric. It optimizes for minimum power
(no incentive to produce performance). You need to weight performance
by 2 or 3 (mips^2/W or mips^3/W).
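As a toy illustration of how the weighting changes which chip wins (the two chips and all the numbers here are hypothetical, purely to show the metric's behavior):

```python
# Two hypothetical design points: a frugal embedded core and a fast desktop core.
chips = {
    "frugal": {"mips": 200.0, "watts": 0.2},    # 1000 mips/W, low absolute speed
    "fast":   {"mips": 2000.0, "watts": 10.0},  # 200 mips/W, 10x the performance
}

def score(mips, watts, weight):
    """weight=1 -> mips/W, weight=2 -> mips^2/W, weight=3 -> mips^3/W."""
    return mips ** weight / watts

for weight in (1, 2, 3):
    winner = max(chips, key=lambda name: score(**chips[name], weight=weight))
    print(f"mips^{weight}/W favors the {winner} chip")
```

Plain mips/W always rewards the slowest adequate design; squaring or cubing the performance term flips the ranking toward the chip that finishes the work faster, which is the point of the weighting.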

Ned
 
Paul A. Clayton...
Posted: Wed Sep 29, 2010 4:03 pm
 
On Sep 29, 5:22 am, Scorpiion <robert.... at (no spam) gmail.com> wrote:
[snip]
Quote:
It would be interesting if you Brett, or someone else could explain a
little bit more about the statement above;

"Opcodes with three sources"

What does that mean? (I know what Opcodes are, I've done a very simple
CPU in VHDL, but it's the three sources part I don't get).

I believe he is referring to the fact that classic ARM
instructions have three input operands: the condition code
register and two general purpose register inputs. (Of course,
Brett probably also dislikes requiring three encoded operands
as in MIPS--one destination register name and two source/input
register names--since code density can also improve power
efficiency and often the destination register name can be the
same as one of the source register names.)



Paul A. Clayton
just a technophile
 
jacko...
Posted: Wed Sep 29, 2010 5:47 pm
 
Quote:
Intel has a problem with Atom here, having lots of existing binaries
which execute perfectly well on an aggressive OoO implementation but are
not performant on the in-order Atom without being recompiled. ARM, on
the other hand, is going from binaries compiled and scheduled for
existing in-order implementations to running them on new out-of-order
implementations.

What's so special about machine code? Just take the RTL of the code
(generator) and make a new compile of the code. This should be a
reasonably easy software product. Adding a checksum feature to the
binary loader would be good too, to prevent infection via viruses.
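A minimal sketch of the checksum idea, using Python's hashlib and a hypothetical loader flow (a real loader would want a cryptographically signed digest, since a plain checksum stored next to the binary can be forged along with it):

```python
import hashlib

def verify_binary(image: bytes, expected_sha256: str) -> bool:
    """Recompute the image's digest and compare it to the known-good one."""
    return hashlib.sha256(image).hexdigest() == expected_sha256

# Hypothetical flow: record the digest at install time, check before loading.
image = b"\x7fELF" + b"...executable bytes..."
known_good = hashlib.sha256(image).hexdigest()

print(verify_binary(image, known_good))               # unmodified image: True
print(verify_binary(image + b"payload", known_good))  # tampered image: False
```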
 
Scorpiion...
Posted: Wed Sep 29, 2010 6:18 pm
 
On Sep 29, 6:03 pm, "Paul A. Clayton" <paaronclay... at (no spam) gmail.com> wrote:
Quote:
On Sep 29, 5:22 am, Scorpiion <robert.... at (no spam) gmail.com> wrote:
[snip]

It would be interesting if you Brett, or someone else could explain a
little bit more about the statement above;

"Opcodes with three sources"

What does that mean? (I know what Opcodes are, I've done a very simple
CPU in VHDL, but it's the three sources part I don't get).

I believe he is referring to the fact that classic ARM
instructions have three input operands: the condition code
register and two general purpose register inputs.  (Of course,
Brett probably also dislikes requiring three encoded operands
as in MIPS--one destination register name and two source/input
register names--since code density can also improve power
efficiency and often the destination register name can be the
same as one of the source register names.)

Paul A. Clayton
just a technophile

Okey, I see, thanks.
 
Robert Myers...
Posted: Wed Sep 29, 2010 7:09 pm
 
On Sep 29, 8:03 am, Paul Gotch <pa... at (no spam) at-cantab-dot.net> wrote:
Quote:
Robert Myers <rbmyers... at (no spam) gmail.com> wrote:
you are buying Intel.  I think that includes the sainted Hawking and
his acolytes at Cambridge.

The current HPCS (which replaced the HPCF) at Cambridge is a large MPI
machine made out of Dell x86 boxen. However there are lots of people
using CUDA on Nvidia cards for more specialised compute requirements.

There is also CamGrid which is a Condor based distributed computing
service based on flocking lab machines and people's desktops into
clusters overnight.

Emphasis in my post on the word *buying*

http://www.sgi.com/company_info/newsroom/press_releases/2010/august/cosmos.html

<quote>

FREMONT, Calif., and Reading, England — August 11, 2010 — SGI (NASDAQ:
SGI), a global leader in HPC and data center solutions, today
announced that the UK Computational Cosmology Consortium (COSMOS),
based at the University of Cambridge, has selected SGI® Altix® UV 1000
to support its research. Altix UV will help cosmologists answer
questions at the foundation of our understanding of how the universe
came to be, of what it is made, how it has evolved and what the future
holds.

</quote>

http://www.sgi.com/products/servers/altix/uv/


Quote:
Oh, dear God.
Legacy binaries.
Has Wall Street bought, not just politics, but computer architecture?

I'm afraid being able to execute existing code (and code optimised for
previous microarchitectures) fast is extremely important. There are
various special cases, such as HPC, where people are willing and able
to recompile binaries to extract maximum performance but this is very
much the exception rather than the rule.

Intel has a problem with Atom here, having lots of existing binaries
which execute perfectly well on an aggressive OoO implementation but are
not performant on the in-order Atom without being recompiled. ARM, on
the other hand, is going from binaries compiled and scheduled for
existing in-order implementations to running them on new out-of-order
implementations.


So.

If you need the latest in cutting-edge performance (including
performance/watt), you bite the bullet and recompile.

If you want to run ancient binaries, you buy expensive, ancient
hardware, or very expensive hardware designed to be binary-compatible
with expensive, ancient hardware. Selling such hardware is IBM's
schtick, not Intel's.

My strong suspicion with Intel and Atom is that there is no margin
there to be had. Once again, they are reaching for high margin
business (as they tried to with Itanium) and the market is pushing the
other way. Intel has the luxury of being some unknown mixture of
cagey and stubborn.

Given what came out of Haifa, the idea that Intel couldn't do whatever
is technologically possible with Atom if it wanted to defies belief.

Robert.
 
MitchAlsup...
Posted: Wed Sep 29, 2010 8:18 pm
 
On Sep 29, 12:24 pm, George Neuner <gneun... at (no spam) comcast.net> wrote:

Quote:
There is a lot of research being done on this now.   I'm not aware of
any real-world system that tries to minimize CPU watts through
scheduling other than to slow down/power down if all code is pending
on interrupts.  

There are cell phones out there that attempt to run at the lowest
possible battery voltage and operating frequency until some real-time
schedule is not met, and then raise the voltage and frequency only to
the point that the real-time schedule is met.
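The stepping policy described above can be sketched as a simple governor loop (the operating points, deadlines, and slack threshold here are hypothetical, purely to show the control flow):

```python
# Hypothetical operating points: (MHz, millivolts), lowest-power first.
OPERATING_POINTS = [(100, 900), (200, 950), (400, 1050), (800, 1200)]

class Governor:
    """Run at the lowest point; step up only when a real-time deadline slips."""

    def __init__(self):
        self.level = 0  # index into OPERATING_POINTS

    def on_task_done(self, runtime_ms: float, deadline_ms: float) -> tuple:
        if runtime_ms > deadline_ms and self.level < len(OPERATING_POINTS) - 1:
            self.level += 1   # deadline missed: raise frequency/voltage one step
        elif runtime_ms < 0.5 * deadline_ms and self.level > 0:
            self.level -= 1   # lots of slack: drop back down to save power
        return OPERATING_POINTS[self.level]

gov = Governor()
print(gov.on_task_done(runtime_ms=12.0, deadline_ms=10.0))  # miss -> step up
print(gov.on_task_done(runtime_ms=9.0, deadline_ms=10.0))   # met  -> hold
```

The asymmetry (step up on any miss, step down only with generous slack) is what keeps the voltage and frequency hovering just above the point where the real-time schedule is met.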

Mitch
 
MitchAlsup...
Posted: Wed Sep 29, 2010 8:21 pm
 
On Sep 29, 12:47 pm, jacko <jackokr... at (no spam) gmail.com> wrote:

Quote:
What's so special about machine code. Just take the RTL of the code
(generator) and make a new compile of the code. This should be a
reasonably easy software product. Adding a checksum feature to the
binary loader, would be good too, to prevent infection via virus.

Given a symbol table, what you say is easily possible.
Without one it is fraught with peril and disaster looming at every
turn.
Now consider how much software is shipped WITH a symbol table.

Mitch
 
Morten Reistad...
Posted: Wed Sep 29, 2010 8:51 pm
 
In article <i7v7q9$sh8$1 at (no spam) news.eternal-september.org>,
nedbrek <nedbrek at (no spam) yahoo.com> wrote:
Quote:
Hello all,

"Stephen Fuld" <SFuld at (no spam) alumni.cmu.edu.invalid> wrote in message
news:i7t69e$ap$1 at (no spam) news.eternal-september.org...
Is it the case that mips per watt is a function of mips? That is, it costs
more watts per mip to provide higher mips and the difference in mips per
watt is just an artifact of the different target market segments?

Yes, you can think of this in terms of energy/delay tradeoffs. Energy
is a function of work and time: if you decrease the time taken to do
the same amount of work, it takes more energy, because voltage must
rise with frequency. Overhead like speculation, schedulers, buffers,
pipeline latches, etc. adds even more on top of this.

That is why mips/watt is a poor metric. It optimizes for minimum power
(no incentive to produce performance). You need to weight performance
by 2 or 3 (mips^2/W or mips^3/W).

For some special applications, like large Internet systems, the figure
of mips per watt is very interesting. There is a lower threshold, due
to fragmentation of the work, but the right weighting is more like
(log(mips)*mips) / watt.

This assumes the basic unit of work already fits on the CPU. Most
internet server apps are designed to run quite nicely on a 50-mips
CPU for one unit of work.

Electric power is the single most limiting factor when building
large scale internet systems today.

-- mrr
 
 