Computers Forum Index » Computer Architecture » Cray SV-2...

Robert Myers
Posted: Sat Nov 07, 2009 4:05 am
The Cray machine whose processor I remember IBM manufacturing is the
Cray SV-2. The chip architecture is definitely Cray, but the chips are
arranged in a four-processor MCM (multi-chip module) that looks very IBM.
Who, exactly, did what is not clear to me.
I'm also not clear on what happened to the SV-2, which is very much
what I would design if GPGPUs weren't available. Phase-change
Fluorinert spray cooling. I have a feeling that gigaflops/watt killed
that whole approach.
Robert.

Robert Myers
Posted: Sat Nov 07, 2009 6:11 am
On Nov 6, 11:05 pm, Robert Myers <rbmyers... at (no spam) gmail.com> wrote:
..
> I'm also not clear on what happened to the SV-2, which is very much
> what I would design if GPGPUs weren't available. Phase-change
> Fluorinert spray cooling. I have a feeling that gigaflops/watt killed
> that whole approach.

One rationale offered for not supporting the SV-2 was that it was
"too high risk":
https://128.55.6.34/nusers/NUG/meeting_info/Feb01/Simon.NUG2.01.ppt
Horst D. Simon, NERSC Division Director,
"Future Technology Assessment," February 23, 2001
The same presentation declares that
"By 2003-4 a shared memory vector supercomputer will no longer be a
capability platform."
The same presentation contains the astonishing statement:
A supercomputer is a "stretched" high end server.
This bureaucratic usurpation of language could be fixed by moving the
quotes:
A "super"-computer is a stretched high end server [or, at least,
that's what we (the DoE) have been claiming for years].
The "risk" here refers to the possibility that the DoE will not stay at
the top of the Top500 list, and "capability" refers to Linpack flops,
although the same presentation earlier dismisses Blue Gene as "not
general purpose."
Robert.

Robert Myers
Posted: Sat Nov 07, 2009 11:56 pm
On Nov 7, 9:03 am, Thomas Womack wrote:
> Heavily-banked RAMBUS memory and that aggressive cooling, but it came
> out at what I think was the absolute high point of d(log x86
> performance)/dt; I have the strong impression that very fast
> computers, extracting full performance from which is a full-time job,
> fit badly into current management structures where people are expected
> to work on projects rather than on inner loops.

Thanks for that perspective.
If the X-1 (the product the SV-2 became) was perceived as hard to
program, that would have killed it. The Cray programmers of my
generation complained that the parallelism of massive clusters was hard
to exploit, so I suppose turnabout was fair play. The X-1 arrived not
only just as x86 was really starting to show its stuff, but also when
Cray programmers were well outnumbered by MPI programmers.
The seeds of disaster could have been seen in the SV-2 presentation:
codes had been rewritten to be cache-blocking friendly, which meant
lots of rather small two-dimensional arrays, and short loops over small
arrays would have made it very hard to keep the massive vector pipes of
the SV-2 busy.
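To make the short-vector problem concrete, here is a minimal sketch under a simple startup-plus-streaming cost model: each vector instruction pays a fixed startup cost and then streams its elements through some number of parallel pipes. VECTOR_LENGTH, STARTUP, the lane counts, the 1000-element row and the 32-wide tile are all illustrative assumptions, not SV-2/X1 specifications, and the model ignores chaining and other overlap that real Cray hardware used to hide latency.

```python
import math

# Illustrative parameters only; none of these are documented SV-2/X1 figures.
VECTOR_LENGTH = 64   # elements per vector register (the classic Cray value)
STARTUP = 20         # hypothetical fixed cost, in cycles, of issuing one vector instruction


def loop_cycles(trip_count: int, lanes: int) -> float:
    """Modelled cycles for one vectorized inner loop of `trip_count` iterations.

    The loop is strip-mined into ceil(trip_count / VECTOR_LENGTH) vector
    instructions. Each pays STARTUP cycles, then streams its elements
    through `lanes` parallel pipes.
    """
    strips = math.ceil(trip_count / VECTOR_LENGTH)
    return strips * STARTUP + trip_count / lanes


def efficiency(trip_counts: list[int], lanes: int) -> float:
    """Fraction of peak: pure streaming time divided by modelled time."""
    ideal = sum(trip_counts) / lanes
    actual = sum(loop_cycles(n, lanes) for n in trip_counts)
    return ideal / actual


def tile_lengths(n: int, block: int) -> list[int]:
    """Inner-loop lengths when an n-element row is split into cache-sized tiles."""
    return [min(block, n - start) for start in range(0, n, block)]


if __name__ == "__main__":
    row = 1000
    for lanes in (1, 8):
        long_loop = efficiency([row], lanes)
        blocked = efficiency(tile_lengths(row, 32), lanes)
        print(f"lanes={lanes}: one {row}-long loop {long_loop:.2f}, "
              f"32-wide tiles {blocked:.2f}")
```

With these made-up parameters the script reports about 0.76 versus 0.61 of peak with one pipe, and 0.28 versus 0.16 with eight. Cutting a long row into cache-sized tiles multiplies the number of vector startups, and the wider the pipes, the more those startups dominate.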
Cray's solution was to let the vector pipes be arbitrarily reassigned
so that they could work on many relatively small tasks or one or a few
big tasks. I'm having a hard time keeping up with Andy's acronyms,
but I'm sure one of them would have fit what they were proposing. As
I understand it, the idea would have been not to optimize inner loops
(which Cray programmers already knew how to do in their sleep), but to
take one long outer loop with many inner loops and do the inner loops
in parallel. In a slightly more complicated trick, you could combine inner
loops from other outer loops, so long as data dependencies allowed
it. To me, it looks like instruction-window parallelism operating in
an unusually cavalier way on collections of inner loops, which are now
the "instructions."
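One way to picture combining inner loops from different outer-loop iterations, subject to data dependencies, is ordinary dependency-driven scheduling with whole inner loops playing the role of instructions. The sketch below is a generic illustration, not a description of Cray's compiler or hardware: InnerLoop, schedule_waves, assign_to_streams and the array names are hypothetical, and each loop is summarized only by the whole arrays it reads and writes (a real compiler would analyze index ranges).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerLoop:
    """Hypothetical summary of one inner loop: the arrays it reads and writes."""
    name: str  # assumed unique
    reads: frozenset[str]
    writes: frozenset[str]


def conflicts(earlier: InnerLoop, later: InnerLoop) -> bool:
    """True if `later` must wait for `earlier` (RAW, WAR or WAW hazard)."""
    return bool(
        earlier.writes & later.reads      # read-after-write
        or earlier.reads & later.writes   # write-after-read
        or earlier.writes & later.writes  # write-after-write
    )


def schedule_waves(loops: list[InnerLoop]) -> list[list[str]]:
    """Group loops (given in program order) into waves of mutually independent loops.

    Each loop goes one wave after the latest earlier loop it conflicts with,
    much as an instruction window issues an instruction once its inputs are ready.
    """
    wave_of: dict[str, int] = {}
    waves: list[list[str]] = []
    for j, loop in enumerate(loops):
        wave = 0
        for earlier in loops[:j]:
            if conflicts(earlier, loop):
                wave = max(wave, wave_of[earlier.name] + 1)
        wave_of[loop.name] = wave
        if wave == len(waves):  # can exceed existing waves by at most one
            waves.append([])
        waves[wave].append(loop.name)
    return waves


def assign_to_streams(wave: list[str], n_streams: int) -> list[list[str]]:
    """Round-robin the loops of one wave onto `n_streams` hypothetical vector streams."""
    streams: list[list[str]] = [[] for _ in range(n_streams)]
    for i, name in enumerate(wave):
        streams[i % n_streams].append(name)
    return streams


def example_loops() -> list[InnerLoop]:
    """Two outer-loop iterations (0 and 1), each with two inner loops, then a reduction."""
    def loop(name: str, reads: set[str], writes: set[str]) -> InnerLoop:
        return InnerLoop(name, frozenset(reads), frozenset(writes))
    return [
        loop("a0", {"x0"}, {"y0"}), loop("b0", {"y0"}, {"z0"}),  # outer iteration 0
        loop("a1", {"x1"}, {"y1"}), loop("b1", {"y1"}, {"z1"}),  # outer iteration 1
        loop("c", {"z0", "z1"}, {"s"}),                          # needs both iterations
    ]


if __name__ == "__main__":
    for k, wave in enumerate(schedule_waves(example_loops())):
        print(f"wave {k}: {assign_to_streams(wave, n_streams=4)}")
```

In the example, the first inner loops of two different outer-loop iterations (a0, a1) land in the same wave, as do their dependent successors (b0, b1), so each wave could be spread across several vector streams; the reduction c has to wait for both.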
If you have ready-made infrastructure for that sort of trickery (as
GPGPUs increasingly have, in both hardware and software), it's one
thing. To expect a new generation of programmers used to something
else entirely to get it right away so you can sell your product is
quite another.
Robert.

Robert Myers
Posted: Sun Nov 08, 2009 4:39 am
On Nov 7, 10:46 pm, Andy "Krazy" Glew <ag-n... at (no spam) patten-glew.net>
wrote:
> Robert Myers wrote:
>> Cray's solution was to let the vector pipes be arbitrarily reassigned
>> so that they could work on many relatively small tasks or one or a few
>> big tasks.
>> ..
>
> I wasn't aware that Cray was doing much beyond "standard" vectors.
> References? User manuals?

http://www.ukhec.ac.uk/events/annual2002/carruthers.pdf, pp. 8-9.
I remember something even more aggressive from the video, but maybe my
memory is playing tricks.
Robert.

Andy "Krazy" Glew
Posted: Sun Nov 08, 2009 6:15 am
Robert Myers wrote:
> Cray's solution was to let the vector pipes be arbitrarily reassigned
> so that they could work on many relatively small tasks or one or a few
> big tasks.
> ..

I wasn't aware that Cray was doing much beyond "standard" vectors.
References? User manuals?
By the way, this is another way of looking at the GPGPU architectures.
They look very familiar to an old vector programmer (like me), but they
have this twist that makes them different, and IMHO easier to exploit.
I can see how you can apply some of the SIMT/CoherentVectorLaneThreading
techniques as a compiler transform on a vector machine. But I am not aware
of anyone who did this. I am, interestingly, aware of people who applied
exactly this technique to do stuff like running 32 simultaneous RTL simulations
on a 32-bit machine, each running in a different bit position of a 32-bit word.
RTL simulation is a special case that is especially well suited to SIMD:
classic single-assignment RTL has no divergent control flow.
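The bit-position trick can be sketched in a few lines: every signal is a 32-bit word, bit k of each word belongs to simulation k, and each gate becomes one bitwise operation applied to all 32 simulations at once. The circuit (a 4-bit ripple-carry adder), the input values and the helper names are illustrative assumptions, not taken from the post. Because the gate sequence never depends on the data, no lane ever needs a different path, which is the "no divergent control flow" property.

```python
MASK = 0xFFFF_FFFF  # 32 lanes: bit k of every word belongs to simulation k
LANES = 32


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One gate-level full adder, evaluated for all 32 simulations with bitwise ops."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s & MASK, cout & MASK


def ripple_add(a_planes: list[int], b_planes: list[int]) -> tuple[list[int], int]:
    """N-bit ripple-carry adder; a_planes[i] holds bit i of every lane's operand.

    The same gates run in the same order for every lane, so there is no
    data-dependent branching anywhere.
    """
    carry = 0
    sums = []
    for a, b in zip(a_planes, b_planes):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def pack(values: list[int], width: int) -> list[int]:
    """Bit-slice 32 per-lane integers into `width` words (one word per bit position)."""
    if len(values) != LANES:
        raise ValueError("need exactly one value per lane")
    return [sum(((v >> i) & 1) << lane for lane, v in enumerate(values))
            for i in range(width)]


def unpack(planes: list[int]) -> list[int]:
    """Inverse of pack: rebuild each lane's integer from the bit-plane words."""
    return [sum(((p >> lane) & 1) << i for i, p in enumerate(planes))
            for lane in range(LANES)]


if __name__ == "__main__":
    a_vals = [lane % 16 for lane in range(LANES)]        # 32 independent 4-bit inputs
    b_vals = [(3 * lane) % 16 for lane in range(LANES)]
    sums, carry = ripple_add(pack(a_vals, 4), pack(b_vals, 4))
    results = unpack(sums + [carry])                      # 5-bit result per lane
    assert results == [x + y for x, y in zip(a_vals, b_vals)]
    print(results[:8])
```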
Being aware of this was one of the things that primed me to see
what was new about SIMT: just a bit of hardware support to make it
run more efficiently.
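For comparison, the standard way to run divergent per-thread code on a vector machine without SIMT hardware is if-conversion: each lane evaluates both sides of a branch and a mask keeps the side that lane would have taken, while a loop keeps running under an active-lane mask until the slowest lane finishes. This is a textbook sketch of that technique with plain Python lists standing in for vector registers; it is not presented as the specific compiler transform the post has in mind, and the Collatz-style branch is just a convenient hypothetical workload.

```python
Vector = list[int]
Mask = list[bool]


def vselect(mask: Mask, if_true: Vector, if_false: Vector) -> Vector:
    """Per-lane merge under a mask, standing in for a vector merge instruction."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def scalar_thread(x: int) -> int:
    """What one SIMT 'thread' would run: a data-dependent branch."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1


def if_converted(v: Vector) -> Vector:
    """The same branch after if-conversion: all lanes compute both sides,
    then the mask keeps the side each lane's own branch would have taken."""
    mask = [x % 2 == 0 for x in v]        # vector compare
    then_side = [x // 2 for x in v]       # "then" path, computed on every lane
    else_side = [3 * x + 1 for x in v]    # "else" path, computed on every lane
    return vselect(mask, then_side, else_side)


def steps_to_one(v: Vector) -> Vector:
    """A per-lane loop with divergent trip counts, run in lockstep.

    Lanes that reach 1 are masked off, but the vector loop keeps going until
    the slowest lane finishes; that idle time is the cost of divergence.
    Inputs must be positive integers.
    """
    x = list(v)
    steps = [0] * len(v)
    active = [xi != 1 for xi in x]
    while any(active):
        x = vselect(active, if_converted(x), x)                  # finished lanes keep their value
        steps = vselect(active, [s + 1 for s in steps], steps)   # count only active lanes
        active = [xi != 1 for xi in x]
    return steps


if __name__ == "__main__":
    lanes = [1, 2, 3, 6, 7]
    print(if_converted(lanes))
    print(steps_to_one(lanes))
```

Every lane pays for both sides of the branch and for the longest-running lane; the masking and reconvergence bookkeeping is the part that SIMT-style hardware support would make cheaper and automatic.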

All times are GMT
The time now is Wed Dec 02, 2009 4:37 am