Page 10 of 10    Goto page Previous  1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Is it time to stop research in Computer Architecture ?...

Gavin Scott...
Posted: Mon Oct 26, 2009 10:08 pm
Guest
nmm1 at (no spam) cam.ac.uk wrote:
Quote:
The other problem with even native IA64 code was reliability. The
skill needed to track down problems that might be code generation
bugs or obscure race conditions was MUCH greater than that needed
for other systems. So a lot of code was very unreliable.

At least under HP-UX this was not my experience. I've seen no more
compiler related issues under Itanium than I did under PA-RISC, though
we didn't really start doing significant work with it until customers
actually started having the machines in production so there were
probably more problems early on.

These days Itanium under HP-UX seems to be as stable and reliable a
development platform as any other.

Quote:
I should be interested if anyone used desktop GUIs and applications
compiled natively, to know what they thought. The HPC people were
distinctly unhappy - SGI got the Altix going, but several owners
of 'ordinary' Linux Itanium boxes turned them off as unsupportable
for an amount of effort that made them cost-effective.

The Itanium Linux project was reasonably well supported by HP, but
it didn't have anything like the mindshare it needed to get done well
and quickly, so IIRC it was mostly just a few guys in HP having to
figure out how to re-invent everything without very much help from
the Linux community at large.

G.
 
Tom...
Posted: Mon Oct 26, 2009 10:25 pm
Guest
"Andy \"Krazy\" Glew" <ag-news at (no spam) patten-glew.net> wrote in
news:4AE3A728.3040001 at (no spam) patten-glew.net:

Quote:
nmm1 at (no spam) cam.ac.uk wrote:
In article <4AE12FA9.1000706 at (no spam) patten-glew.net>,
Andy \"Krazy\" Glew <ag-news at (no spam) patten-glew.net> wrote:
Robert Myers wrote:

I am not aware of an Itanium shipped or proposed that had an "x86 core
on the side".

I am. I can't say how far the proposal got - it may never have got
beyond the back of the envelope stage, and then got flattened as a
project by management.

I am reasonably certain that you are misremembering or misunderstanding
a presentation that may have oversimplified things.

I worked in HP Labs (but nowhere near the Itanium) when
such an announcement was made. My reaction was "What?
WHAT? Bloody hell!"

It was about that time that I really became disenchanted
with the Itanium and made the decision that I would
consciously ignore it.
 
Gavin Scott...
Posted: Mon Oct 26, 2009 10:33 pm
Guest
"Andy \"Krazy\" Glew" <ag-news at (no spam) patten-glew.net> wrote:
Quote:
Perhaps the problem is terminology: to me, and to people at Intel, "x86
on the side" means that you have an x86 core right next to the Itanium
core, connected at the bus, or perhaps the L1 or L2 cache.

Yes, I think we're all talking about the same thing, but being outside
the implementation world the difference between the "x86 on the side"
and the "x86 front end" is easy to miss/ignore.

G.
 
Gavin Scott...
Posted: Mon Oct 26, 2009 10:49 pm
Guest
Stephen Sprunk <stephen at (no spam) sprunk.org> wrote:
Quote:
IIRC, HP's Dynamo, a run-time PA-RISC to PA-RISC binary translator, was
able to achieve a significant performance gain over just running the
binary natively. They also ported it to x86 under the name DynamoRIO,
though I can't recall the performance results.

DEC's binary translator (FX!32) made the Alpha, for a few months at
least, the fastest "x86" machine in existence.

MS's .NET stuff seems to run pretty fast, especially when you consider
all the safety checks that it always does which are usually left out of
native C/C++ binaries, garbage collection, etc. Some tests show that
it's actually _faster_ than native code, once you subtract the start-up
time hit.

Apple seems to be committed to their LLVM stuff, which is already
reaping huge graphics performance gains. Rosetta was pretty darn fast,
enough to be useful and transparent, though not quite as fast as native
code.

Run-time translation seems to have a lot of success stories; the
miserable failures of Transmeta and Intel may just be well-publicized
anomalies.

Just to add to the list, when HP migrated their "Classic" stack-based
architecture that ran their HP-3000 servers of the day onto PA-RISC,
they provided very good binary compatibility via both static translation
and run-time emulation (neither really counts as dynamic translation
though).

I've told this story before, but the run-time emulator for the 16-bit
classic architecture consisted of hand-coded PA-RISC that managed to
do the emulation in something like an average of 7 PA-RISC instructions
per classic instruction. One of the engineers suggested to me that
they suspected there were cases where the emulated 16-bit code might
run faster than native PA-RISC due to the fact that the smaller 16-
bit code plus the emulator could entirely fit into the cache where
the 32-bit PA-RISC code might not.
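To make the "instructions per classic instruction" figure concrete, here is a toy interpreter loop. It is a minimal sketch under assumed, hypothetical opcodes (LOAD, PUSHI, ADD, SUB, STOR, BRZ); it is not the HP 3000 instruction set and not HP's emulator. Every guest instruction pays for a fetch, a decode/dispatch step and the operation itself, and that fixed per-instruction overhead is what a hand-tuned emulator squeezes down to a handful of host instructions.

```python
# Toy 16-bit stack-machine interpreter (hypothetical opcodes, NOT the HP 3000 ISA).
# Each loop iteration = fetch + decode/dispatch + execute for one guest instruction.

MASK16 = 0xFFFF  # guest arithmetic wraps at 16 bits


def run(program, memory):
    """Interpret `program`, a list of (opcode, operand) tuples.

    `memory` maps addresses to 16-bit values and is updated in place.
    Returns the evaluation stack left when the program falls off the end.
    """
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]  # fetch
        pc += 1
        if op == "LOAD":  # push memory[arg]
            stack.append(memory.get(arg, 0) & MASK16)
        elif op == "PUSHI":  # push an immediate
            stack.append(arg & MASK16)
        elif op == "ADD":  # pop two, push their 16-bit sum
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK16)
        elif op == "SUB":  # pop two, push (second - top) mod 2**16
            b, a = stack.pop(), stack.pop()
            stack.append((a - b) & MASK16)
        elif op == "STOR":  # pop top of stack into memory[arg]
            memory[arg] = stack.pop()
        elif op == "BRZ":  # pop; jump to instruction index arg if zero
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# Usage: memory[2] = memory[0] + memory[1]
mem = {0: 5, 1: 7}
run([("LOAD", 0), ("LOAD", 1), ("ADD", None), ("STOR", 2)], mem)
print(mem[2])  # 12
```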

The static translation version eliminated the instruction decode
step and could omit things like condition code generation when it
was not needed. The translated code was much larger than the original
though and got tacked onto the end of the original binary file.
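Omitting unneeded condition-code generation is essentially a liveness analysis. A minimal sketch, assuming a toy instruction record with hypothetical `sets_cc`/`reads_cc` flags (not the data structures HP's translator actually used): scan each straight-line block backwards and only materialize a condition code when some later instruction reads it before it is overwritten.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    name: str
    sets_cc: bool = False   # instruction writes the condition code
    reads_cc: bool = False  # instruction consumes the condition code


def cc_writes_needed(block, cc_live_on_exit=True):
    """For each instruction in a straight-line block, report whether the
    condition code it produces may be read later (so must be generated).

    Standard backward liveness: live_in = reads or (live_out and not writes).
    `cc_live_on_exit` defaults to True as a conservative assumption, since a
    successor block might test the condition code.
    """
    needed = [False] * len(block)
    cc_live = cc_live_on_exit
    for i in range(len(block) - 1, -1, -1):
        insn = block[i]
        if insn.sets_cc:
            needed[i] = cc_live
            cc_live = False  # this write hides any earlier condition code
        if insn.reads_cc:
            cc_live = True  # an earlier condition code is consumed here
    return needed


# Usage: only the compare feeding the branch needs real CC generation.
block = [
    Insn("ADD", sets_cc=True),
    Insn("SUB", sets_cc=True),
    Insn("CMP", sets_cc=True),
    Insn("BCC", reads_cc=True),
]
print(cc_writes_needed(block))  # [False, False, True, False]
```

A real static translator would propagate this across blocks rather than assuming the condition code is live at every block exit.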

In any case it was very impressive and I can't help but think that
part of the problem with PA-RISC -> Itanium translation was that
they just didn't manage to get a small enough team of clever enough
people assigned to it.

G.
 
Gavin Scott...
Posted: Tue Oct 27, 2009 12:27 am
Guest
Terje Mathisen <terje.wiig.mathisen at (no spam) gmail.com> wrote:
Quote:
IA64 seemed to have a close to complete superset of all PA-RISC
features/instructions, including some very funky address shift/
combination operations specifically claimed to be there to support
PA-RISC features.

Yes, and the virtual addressing scheme is practically identical to
PA-RISC 2.0.

There are some differences, like PA-RISC offering both pre- and post-
increment on memory ops, but it's the sort of thing translation should
not have had a big problem with.
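As an illustration of why this is cheap to translate: IA-64 loads support post-increment addressing but not pre-increment, so a pre-modify load becomes an add, a stop, and a plain load. The sketch below is hypothetical (a textual mapping for illustration, not any real translator's code); it maps register numbers one-to-one and omits the displacement range checks a real translator would need.

```python
def translate_load(kind, rt, rb, disp):
    """Expand a hypothetical PA-RISC-style 'load word with base modification'
    into IA-64-style assembly text.

    Register numbers are mapped one-to-one purely for illustration, and the
    displacement range checks a real translator would need are omitted.
    """
    if kind == "post":  # rt = mem[rb]; rb += disp -> IA-64 has this form natively
        return [f"ld4 r{rt}=[r{rb}],{disp}"]
    if kind == "pre":  # rb += disp; rt = mem[rb] -> needs a separate add first
        # ';;' ends the instruction group so the load sees the updated base
        return [f"adds r{rb}={disp},r{rb} ;;", f"ld4 r{rt}=[r{rb}]"]
    raise ValueError(f"unknown addressing kind {kind!r}")


print(translate_load("pre", 4, 5, 8))
# ['adds r5=8,r5 ;;', 'ld4 r4=[r5]']
```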

Someone in HP did once say to me that they had changed the PA-RISC
compilers several years earlier to (allegedly) start scheduling for
optimal execution on Itanium (without telling anyone) long before
the Itanium systems became available, which is a clever trick.

Quote:
The register set was so much larger that it could be mapped
statically.

The low 32 non-rotating registers map pretty well to the PA-RISC
register set, but again some oddities like gr0 being read-only on
IPF (IIRC anyway).

I suspect that dealing with corner cases caused them more grief
than they expected, and the cost of switching between translated
code and handling exceptions that required falling back to
emulation of some sort might have been an issue, but that's just
speculation and I've never really tried to go look at what it's
doing.

G.
 
Andy \"Krazy\" Glew...
Posted: Tue Oct 27, 2009 5:15 am
Guest
Bernd Paysan wrote:

Quote:
Sidenote about reversible computing: Often, people use a "history" to
make storing values in memory reversible. However, creating information
is just as costly as destroying it, so creating a history has no energy
benefit over replacing old information in a memory cell. The only
energy efficient reversible memory operation is to swap memory and
register values.
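The information-theoretic half of that claim can be checked by brute force. A small sketch over toy 2-bit values (the energy argument itself is Bernd's; the code only shows which operations lose information): a swap maps distinct states to distinct states and undoes itself, whereas a store merges states and so cannot be inverted.

```python
from itertools import product


def swap(mem, reg):
    """Exchange a memory cell and a register. This is a bijection on the
    (mem, reg) state, and it is its own inverse."""
    return reg, mem


def store(mem, reg):
    """Ordinary store: copy the register over the memory cell. The old
    memory value is discarded, so distinct inputs can collide."""
    return reg, reg


states = list(product(range(4), repeat=2))  # all (mem, reg) pairs of 2-bit values
print(len({swap(*s) for s in states}))   # 16: no information lost
print(len({store(*s) for s in states}))  # 4: two bits (the old memory value) erased
```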

This has always been where I get annoyed by theorizing about reversible
computation.

Seems to me that you cannot do reversible computation for workloads
where you are streaming inputs; where the inputs exceed any reasonable
memory size. Since that is one of the most interesting classes of
computation, reversible computation is ipso facto not that interesting.

The counter argument is usually that the irreversible aspects of
computation are related to communication: e.g. on a space probe where you
are beaming something back to earth.

Nevertheless, I don't see why we have to go for full reversibility, if
by taking advantage of such techniques we can achieve a good fraction of
the benefit, albeit still much worse than theoretically possible.
 
Robert Myers...
Posted: Tue Oct 27, 2009 9:20 am
Guest
On Oct 25, 5:26 pm, Bernd Paysan <bernd.pay... at (no spam) gmx.de> wrote:

Quote:
Sidenote about reversible computing: Often, people use a "history" to
make storing values in memory reversible. However, creating information
is just as costly as destroying it, so creating a history has no energy
benefit over replacing old information in a memory cell. The only
energy efficient reversible memory operation is to swap memory and
register values.

This problem does seem fundamental to me at my current level of
understanding, and the history tape argument seems fundamentally
flawed. Sooner or later, you must overwrite something and throw it
away. When you do that, the process is no longer reversible and you
incur an irreversible energy cost. If at no other time, you will
incur that cost when you write your backup tape of the history to the
incoherent, "classical" world.
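The irreversible cost described here has a standard lower bound, Landauer's principle: erasing one bit at temperature T dissipates at least k_B T ln 2 of heat. The thread doesn't name it; this sketch just evaluates the bound, with the function name chosen for illustration.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact by definition in the 2019 SI


def landauer_limit_joules(temperature_k, bits=1):
    """Lower bound on heat dissipated when erasing `bits` bits at a
    temperature of `temperature_k` kelvin: bits * k_B * T * ln 2."""
    return bits * BOLTZMANN_J_PER_K * temperature_k * math.log(2)


print(f"{landauer_limit_joules(300):.3e} J per bit at 300 K")  # 2.871e-21 J per bit at 300 K
```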

I'm sure that there are others who have thought about this much more
carefully than I have, but, if there is a way around it, I'm not
seeing it at the moment.

Robert.
 
Torbjorn Lindgren...
Posted: Tue Oct 27, 2009 11:53 am
Guest
Terje Mathisen <terje.wiig.mathisen at (no spam) gmail.com> wrote:
Quote:
On Oct 23, 1:45 pm, Mayan Moudgill <ma... at (no spam) bestweb.net> wrote:
Andy "Krazy" Glew wrote:
Problem is with the standard. H.264 specifies that the frame is CABAC
encoded.

Not quite:

H.264 defines two alternate encoding schemes, of which CABAC gets the
better compression, but it is fully compliant to use the other (I
don't remember the name of it) if the encoder wants to.

CAVLC. Supposedly CABAC uses about 15% fewer bits than CAVLC, which in
turn is more efficient than what's used in regular MPEG4.


Quote:
However, since a decoder has to be able to handle CABAC as well, that
limits the maximum bitrate that you can support in sw.

CABAC decode support is required in most profiles, but not in the
Constrained Baseline (CBP), Baseline (BP) and Extended (XP) profiles.

Most stuff is Main or High profile, but there are devices which
don't support MP or HP (or CABAC).
 
Torbjorn Lindgren...
Posted: Tue Oct 27, 2009 12:35 pm
Guest
Terje Mathisen <Terje.Mathisen at (no spam) tmsw.no> wrote:
Quote:
Andy "Krazy" Glew wrote:
For example: divide the image up into subblocks, and run CABAC on each
subblock in parallel. To obtain similar compression ratios you would

This is the only silver lining: Possibly due to the fact that they were
working on PS3 at the time, Sony specified that Bluray frames are all
split into 4 independent quadrants, which means that they could
trivially split the job across four of the 7 or 8 Cell cores.

I'm reliably informed that it's a bit more complicated than that.
Also, I'm not sure whether the slices affect the (CABAC) bit stream; my
impression was that they do not, but I'm no expert.

Apparently, for h.264, Blu-ray supports High Profile Level 4.1, EXCEPT
that it requires at least 4 slices (special restrictions) and a lower
maximum bitrate (which then gives smaller VBV buffer space and VBV max
instantaneous speed) than real Level 4.1. There are also very strict
limitations on the distance between keyframes.

Or High Profile Level 4.0. This does have a significantly lower max
bitrate, but x264 can cram a LOT of information into that with good
settings (apparently much more than very expensive h.264 encoders) and
the difference is smaller than it would normally be (because Blu-ray
doesn't allow the full 4.1 bitrate anyway). It has the same restrictions
on keyframes (they're relaxed slightly for Level 3.1 or lower).

I think recent versions of x264 actually support slices now and that
the compression hit was very small, but on the other hand some of the
devs thought you were usually better off encoding in L4.0
instead for Blu-ray on x264.

But it's pretty common to see hardware decoders rated for Level 4.0,
but 4.1 for Blu-ray. This usually means they can handle the Blu-ray
subset but not the full bitrate; note that they usually don't care
whether the video is sliced or not.


Quote:
This also reduced the size of each subframe, in 1080i, to 256 K pixels. :-)

I'm told that they stopped decoding slices separately on the PS/3
before launch; apparently it was more efficient even on that hardware
to work on multiple frames in parallel instead, especially since they
needed to handle non-sliced L4.0 anyway, though at a lower bitrate.

I don't have a PS/3 but several people say it shows high-speed L4.1
non-sliced (but otherwise Blu-ray spec'd) h.264 video without problem,
but it's always possible that it works on most but not all video.
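As a sanity check on the quoted "256 K pixels" figure, a short calculation, assuming a 1920x1080 picture split into 4 equal parts and that the figure refers to one field of 1080i: a quarter of a 1920x540 field is 259,200 pixels (about 253 K in binary units), while a quarter of a full progressive frame would be 518,400. It ignores the padding to whole 16x16 macroblocks that an encoder actually uses.

```python
def pixels_per_slice(width, height, slices, interlaced=False):
    """Pixels in one of `slices` equal parts of a picture. For interlaced
    video each field holds half the lines, so count per field."""
    lines = height // 2 if interlaced else height
    return width * lines // slices


field_part = pixels_per_slice(1920, 1080, 4, interlaced=True)
print(field_part, field_part / 1024)  # 259200 253.125
print(pixels_per_slice(1920, 1080, 4))  # 518400 for a progressive frame
```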
 
Niels Jørgen Kruse...
Posted: Thu Oct 29, 2009 5:59 pm
Guest
Andy "Krazy" Glew <ag-news at (no spam) patten-glew.net> wrote:

Quote:
Heck: Willamette / Pentium 4 was brought to you by people who thought
OOO was a bad idea. The original concept was anti-OOO. They were
forced to implement OOO, badly, because the anti-OOO approach did not fly.

News to me. I suppose NDAs have expired.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
 
Peter Grandi...
Posted: Mon Nov 09, 2009 3:37 am
Guest
[ ... ]

Quote:
If I make the following assumptions:
* Transistors are free
* But power is the most important thing
* and you have a lot of parallelism
as is true of some supercomputer workloads (and even virtual
reality graphics for the home)

Then I think that you can reasonably extrapolate that the best
computer architecture is MIMD, with the simplest possible,
non-pipelined, blocking on a cache miss, processor cores.

That's the idea behind Hewitt's (and lately Agha's) "actor
model" and "garbage collection of processes" approach...
 
 
All times are GMT