#31
Another AMD supercomputer, 13,000 quad-core
On Mon, 13 Nov 2006 21:45:26 -0600, "Del Cecchi" wrote:

>"George Macdonald" wrote in message news:
>>On Mon, 13 Nov 2006 08:23:31 -0600, Del Cecchi wrote:
>>>George Macdonald wrote:
>>>>Did you not notice "high capability"? "Pick a processor" is not going
>>>>to get you that. I haven't seen Del's announcement since I don't take
>>>>comp.arch.
>>>You could check for "IBM System Cluster 1350" on IBM's web site
>>>http://www-03.ibm.com/systems/cluste...ware/1350.html if you are
>>>interested. I guess I don't understand what you mean by "high
>>>capability".
>>Not sure where I got the "high" from:-) but "capability" and "capacity"
>>seem to be used to contrast the two (extreme point) types of
>>supercomputers in articles here: http://www.hpcuserforum.com/events/
>I didn't see those terms in a quick scan, but presumably capability
>refers to "big uniprocessors" like Cray vector machines (I know they
>aren't really uniprocessors these days). I think this niche has largely
>been filled by machines like Blue Gene or other clusters. Capacity
>machines are just things like what Google or Yahoo have--a warehouse
>full of servers. So in fact the Cluster 1350 is a capability machine.
>SETI@home is a capacity machine.

From the last useful Meeting Bulletin, admittedly a while back in April
2001, the sense I get is that anything built out of COTS, tightly coupled
or not, is/was considered "capacity" when compared with Crays and others
with specialized processors. Interestingly, the guy from Ford was the one
pushing the need for "capability".

--
Rgds, George Macdonald
#32
Another AMD supercomputer, 13,000 quad-core
George Macdonald wrote:

[snip earlier capability/capacity exchange]

>From the last useful Meeting Bulletin, admittedly a while back in April
>2001, the sense I get is that anything built out of COTS, tightly coupled
>or not, is/was considered "capacity" when compared with Crays and others
>with specialized processors. Interestingly, the guy from Ford was the one
>pushing the need for "capability".

Cool. Are there any "capability" machines left in the Top500?

--
Del Cecchi
"This post is my own and doesn't necessarily represent IBM's positions,
strategies or opinions."
#33
Another AMD supercomputer, 13,000 quad-core
On Wed, 15 Nov 2006 10:01:39 -0600, Del Cecchi wrote:

[snip earlier capability/capacity exchange]

>Cool. Are there any "capability" machines left in the Top500?

Is that like the Billboard "Hot 100"... but for computers?:-) Yeah, it's
true that much progress has been made in COTS since 2001, so maybe that is
the future?

--
Rgds, George Macdonald
#34
Another AMD supercomputer, 13,000 quad-core
"George Macdonald" wrote in message ... On Wed, 15 Nov 2006 10:01:39 -0600, Del Cecchi wrote: George Macdonald wrote: On Mon, 13 Nov 2006 21:45:26 -0600, "Del Cecchi" wrote: "George Macdonald" wrote in message news On Mon, 13 Nov 2006 08:23:31 -0600, Del Cecchi wrote: George Macdonald wrote: Did you not notice "high capability"? "Pick a processor" is not going to get you that. I haven't seen Del's announcement since I don't take comp.arch. You could check for "IBM System Cluster 1350" on IBM's web site http://www-03.ibm.com/systems/cluste...ware/1350.html if you are interested. I guess I don't understand what you mean by "high capability". Not sure where I got the "high" from:-) but "capability" and "capacity" seem to be used to contrast the two (extreme point) types of supercomputers in articles he http://www.hpcuserforum.com/events/. -- Rgds, George Macdonald I didn't see those terms in a quick scan but presumably capability refers to "big uniprocessors" like cray vector machines (I know they aren't really uniprocessors these days). I think this niche has largely been filled by machines like Blue Gene or other clusters. Capacity machines are just things like what google or yahoo have--a warehouse full of servers. So infact the Cluster 1350 is a Capability machine. SETI at home is a capacity machine. From the last useful Meeting Bulletin, admittedly a while back in April 2001, the sense I get is that anything built out of COTS, tightly coupled or not, is/was considered "capacity" when compared with Crays and others with specialized processors. Interestingly, the guy from Ford was one pushing the need for "capability". Cool. Are there any "capability" machines left in the top500? Is that like the Billboard "Hot 100"... but for computers?:-) Yeah it's true that much progress has been made in COTS since 2001 so maybe that is the future? -- Rgds, George Macdonald Blue gene is a network of processors, but not exactly COTS. 240 Teraflops. Number one. That a capacity machine? |
#35
Another AMD supercomputer, 13,000 quad-core
Del Cecchi wrote:

>Blue Gene is a network of processors, but not exactly COTS. 240
>teraflops. Number one. Is that a capacity machine?

Linpack flops isn't the only measure of performance that matters. It's not
sensitive to bisection bandwidth, and low bisection bandwidth forces a
particular approach to numerical analysis.

What the whiz kids at LLNL don't seem to get is that localized
approximations will _always_ get the problem wrong for strongly nonlinear
problems, because localized differencing invariably introduces an
artificial renormalization: very good for getting nice-looking but
incorrect answers. I've discussed this extensively with the one poster to
comp.arch who seems to understand strongly nonlinear systems, and he knows
exactly what I'm saying. He won't go public because the IBM/National Labs
juggernaut represents a fair slice of the non-academic jobs that might be
open to him.

The limitations of localized differencing may not be an issue for the
class of problem that LLNL needs to do, but ultimately, you can't fool
Mother Nature. The bisection bandwidth problem shows up in the poor
performance of Blue Gene on FFTs.

My fear about Blue Gene is that it will perpetuate a kind of analysis that
works well for (say) routine structural analysis, but very poorly for the
grand problems of physics (for example, turbulence and strongly
interacting systems).

As I'm sure you will say, if you've got enough bucks, you can buy all the
bisection bandwidth you need. As it is, though, all the money right now is
going into Linpack-capable machines that will never make progress on the
interesting problems of physics. It's a grand exercise in self-deception.

Robert.
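Why FFTs stress bisection bandwidth: a distributed multidimensional FFT
needs a global transpose between the per-dimension 1-D transforms, and
that transpose moves roughly half the dataset across the machine's
bisection. A minimal back-of-the-envelope sketch; the grid size and the
1 TB/s bisection figure below are assumed placeholders, not measured
numbers for any particular machine:

# Rough lower bound on one global transpose in a distributed 3-D FFT.
# All machine numbers are illustrative assumptions, not published figures.

def transpose_time(n, bytes_per_point, bisection_bw_bytes_per_s):
    """Time to re-partition an n^3 grid across the bisection.

    A slab- or pencil-decomposed 3-D FFT repartitions the grid between
    the 1-D transforms; roughly half the data must cross the bisection.
    """
    total_bytes = n ** 3 * bytes_per_point            # complex double = 16 bytes
    return (total_bytes / 2) / bisection_bw_bytes_per_s

# Assumed example: 2048^3 grid of complex doubles, hypothetical 1 TB/s bisection.
t = transpose_time(2048, 16, 1e12)
print(f"one transpose >= {t:.2f} s")                  # ~0.07 s, and a 3-D FFT needs several

The point of the sketch is only that the transpose time is set entirely by
bisection bandwidth, not by per-node flops, so a machine can look fast on
Linpack and still crawl on spectral methods.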
#36
Another AMD supercomputer, 13,000 quad-core
"Robert Myers" wrote in message ps.com... Del Cecchi wrote: Blue gene is a network of processors, but not exactly COTS. 240 Teraflops. Number one. That a capacity machine? Linpack flops isn't the only measure of performance that matters. It's not sensitive to bisection bandwidth, and low bisection bandwidth forces a particular approach to numerical analysis. What the whiz kids at LLNL don't seem to get is that localized approximations will _always_ get the problem wrong for strongly nonlinear problems, because localized differencing invariably introduces an artificial renormalization: very good for getting nice-looking but incorrect answers. I've discussed this extensively with the one poster to comp.arch who seems to understand strongly nonlinear systems and he knows exactly what I'm saying. He won't go public because the IBM/National Labs juggernaut represents a fair slice of the non-academic jobs that might be open to him. The limitations of localized differencing may not be an issue for the class of problem that LLNL needs to do, but ultimately, you can't fool mother nature. The bisection bandwidth problem shows up in the poor performance of Blue Gene on FFT's. My fear about Blue Gene is that it will perpetuate a kind of analysis that works well for (say) routine structural analysis, but very poorly for the grand problems of physics (for example, turbulence and and strongly-interacting systems). As I'm sure you will say, if you've got enough bucks, you can buy all the bisection bandwidth you need. As it is, though, all the money right now is going into linpack-capable machines that will never make progress on the interesting problems of physics. It's a grand exercise in self-deception. Robert. Well the Cluster 1350 has a pretty good network available, if the Blue Gene one isn't good enough. And Blue Gene was really designed for a few particular problems, not just Linpack. But the range of problems it is applicable to seems to be reasonably wide. And are the "interesting problems in Physics" something that folks are willing to spend reasonable amounts of money on, like the money spent on accelerators and nutrino detectors etc? And do they agree as to the kind of computer needed? Do you like the new Opteron/Cell Hybrid better? Throwing rocks is easy. How about specific suggestions? del |
#37
Another AMD supercomputer, 13,000 quad-core
Del Cecchi wrote:

>Do you like the new Opteron/Cell hybrid better? Throwing rocks is easy.
>How about specific suggestions?

I had really hoped to get out of the rock-throwing business. My criticism
really isn't of IBM, which is apparently only giving the most important
customer what it wants. The most important customer lost interest in
science a long time ago, so maybe it doesn't matter that the machines it
buys aren't good science machines.

I'm sure that a good science machine can be built within the parameters of
the Cluster 1350, and asking how you might go about that would be an
interesting exercise. Sure, the Opteron/coprocessor hybrid sounds good.
All that's left to engineer is the network. Were it up to me, I'd optimize
it to do FFT and matrix transpose. If you can do those two operations
efficiently, you can do an awful lot of very interesting physics.

The money just isn't there for basic science right now. It isn't IBM's job
to underwrite science or to try to get the government to buy machines that
it apparently doesn't want.

The bisection bandwidth of Blue Gene is millibytes per flop. That's
apparently not a problem for some customers, but there is a big slice of
important physics that you can't do correctly or efficiently with a
machine like that.

Robert.
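To see how a "millibytes per flop" figure can arise, here is a hedged
sketch for a 3-D torus. Every number in it (node counts, per-link
bandwidth, per-node flop rate) is an assumption chosen only to illustrate
the order of magnitude, not an official Blue Gene specification:

# Bisection bytes-per-flop for an X x Y x Z torus, with assumed parameters.

def bisection_bytes_per_flop(x, y, z, link_bw, node_flops):
    """Cut the torus across its largest dimension.

    Cutting perpendicular to the X axis severs 2 * y * z links (the
    factor 2 counts the wrap-around links that also cross the cut).
    """
    links_cut = 2 * y * z
    bisection_bw = links_cut * link_bw        # bytes/s crossing the cut
    total_flops = x * y * z * node_flops      # flop/s of the whole machine
    return bisection_bw / total_flops

# Assumed example: 64 x 32 x 32 nodes, 150 MB/s per link, 5.6 Gflop/s per node.
ratio = bisection_bytes_per_flop(64, 32, 32, 150e6, 5.6e9)
print(f"{ratio * 1000:.2f} millibytes per flop")   # ~1 millibyte/flop with these numbers

With parameters anywhere in that neighborhood, the ratio lands around a
millibyte per flop, which is the scale Robert is pointing at.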
#38
Another AMD supercomputer, 13,000 quad-core
Robert Myers wrote:

>The bisection bandwidth of Blue Gene is millibytes per flop. That's
>apparently not a problem for some customers, but there is a big slice of
>important physics that you can't do correctly or efficiently with a
>machine like that.

Is bisection bandwidth really a valid metric for very large clusters? It
seems to me that it can be made arbitrarily small by configuring a large
enough group of processors, since each processor has a finite number of
links. For example, a 2D mesh with nearest-neighbor connectivity has a
bisection bandwidth that grows as the square root of the number of
processors, but the flops grow as the number of processors, so the
bandwidth per flop decreases with the square root of the number of
processors. I can't think of why this wouldn't apply in general, but I
don't claim that it is true. It just seems so to me (although the rate of
decrease wouldn't necessarily be square root).

Apparently no one with money is interested in solving these special
problems for which clusters are not good enough. See SSI and Steve Chen,
history of.

--
Del Cecchi
"This post is my own and doesn't necessarily represent IBM's positions,
strategies or opinions."
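Del's scaling argument is easy to check numerically. A minimal sketch of
the trend for a square nearest-neighbor mesh; the link bandwidth and
per-node flop rate are arbitrary placeholders, since only the ratio's
dependence on processor count matters:

# Bisection bandwidth per flop for a nearest-neighbor 2-D mesh.
# Absolute numbers are placeholders; the point is the 1/sqrt(P) trend.

import math

LINK_BW = 1.0      # bandwidth of one mesh link (arbitrary units)
NODE_FLOPS = 1.0   # flop rate of one node (arbitrary units)

def mesh_2d_bw_per_flop(p):
    """Square p-node mesh: cutting it in half severs sqrt(p) links."""
    side = int(math.isqrt(p))
    bisection_bw = side * LINK_BW
    total_flops = p * NODE_FLOPS
    return bisection_bw / total_flops

for p in (64, 1024, 16384, 262144):
    print(f"P = {p:7d}  bandwidth/flop ~ {mesh_2d_bw_per_flop(p):.5f}")

# Each 16x increase in P cuts bandwidth-per-flop by 4x, i.e. it falls as
# 1/sqrt(P) -- exactly the diminishing return Del describes.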
#39
Another AMD supercomputer, 13,000 quad-core
Del Cecchi wrote:

>Is bisection bandwidth really a valid metric for very large clusters?

Yes, if you want to do FFTs, or, indeed, any kind of non-local
differencing.

>It seems to me that it can be made arbitrarily small by configuring a
>large enough group of processors, since each processor has a finite
>number of links. For example, a 2D mesh with nearest-neighbor
>connectivity has a bisection bandwidth that grows as the square root of
>the number of processors, but the flops grow as the number of processors,
>so the bandwidth per flop decreases with the square root of the number of
>processors.

That's the problem with the architecture and why I howled so loudly when
it came out. Naturally, I was ridiculed by people whose entire knowledge
of computer architecture is nearest-neighbor clusters. Someone in New
Mexico (LANL or Sandia, I don't want to dredge up the presentation again)
understands the numbers as well as I do.

The bisection bandwidth is a problem for a place like NCAR, which uses
pseudospectral techniques, as do most global atmospheric simulations. The
projected efficiency of Red Storm for FFTs was 25%. The efficiency of
Japan's Earth Simulator is at least several times that for FFTs. No big
deal. It was designed for geophysical simulations; Blue Gene at Livermore
was bought to produce the plots the Lab needed to justify its own
existence (and not to do science).

As you have correctly inferred, the more processors you hang off the
nearest-neighbor network, the worse the situation becomes.

>I can't think of why this wouldn't apply in general, but I don't claim
>that it is true. It just seems so to me (although the rate of decrease
>wouldn't necessarily be square root).

Unless you increase the aggregate bandwidth, you reach a point of
diminishing returns. The special nature of Linpack has allowed
unimaginative bureaucrats to make a career out of buying and touting very
limited machines that are the very opposite of being scalable.
"Scalability" does not mean more processors or real estate. It means the
ability to use the millionth processor as effectively as you use the 65th.
Genuine scalability is hard, which is why no one is really bothering with
it.

>Apparently no one with money is interested in solving these special
>problems for which clusters are not good enough. See SSI and Steve Chen,
>history of.

The problems aren't as special as you think. In fact, the glaring problem
that I've pointed out with machines that rely on local differencing isn't
agenda- or marketing-driven, it's an unavoidable mathematical fact. As
things stand now, we will have ever more transistors chuffing away on
generating ever-less reliable results.

The problem is this: if you use a sufficiently low-order differencing
scheme, you can do most of the problems of mathematical physics on a box
like Blue Gene. Low-order schemes are easy to code, undemanding with
regard to non-local bandwidth, and usually much more stable than very
high-order schemes. If you want to figure out how to place an
air-conditioner, they're just fine. If you're trying to do physics, the
plots you produce will be plausible and beautiful, but very often wrong.

There is an out that, in fairness, I should mention. If you have
processors to burn, you can always overresolve the problem to the point
where the renormalization problem I've mentioned, while still there,
becomes unimportant. Early results by the biggest ego in the field at the
time suggested that it takes about ten times the resolution to do fluid
mechanics with local differencing as accurately as you can do it with a
pseudospectral scheme. In 3-D, that's a thousand times more processors.
For a fair comparison, the number of processors in the Livermore box would
be divided by 1000 to get equivalent performance to a box that could do a
decent FFT.

Should be posting to comp.arch so people there can switch from being
experts on computer architecture to being experts on numerical analysis
and mathematical physics.

Robert.
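The arithmetic behind the "thousand times more processors" figure is worth
spelling out: if a local-differencing scheme needs about k times the
resolution per spatial dimension to match a pseudospectral result (Robert
cites k of roughly 10), the grid-point count, and hence the node count at
fixed points per node, grows as k to the power of the dimensionality. A
minimal sketch of that scaling; the time-step factor is my own hedged
addition, assuming an explicit scheme whose step size shrinks with the
grid spacing:

# Cost of matching pseudospectral accuracy with local differencing,
# assuming k times the resolution is needed per spatial dimension and
# node count scales with the number of grid points.

def extra_node_factor(k, dims=3, refine_time_step=True):
    """Return (node factor, total work factor).

    The node factor is k**dims. If an explicit scheme's time step must
    shrink with the grid spacing (a common stability constraint), total
    work grows by roughly one more factor of k.
    """
    nodes = k ** dims
    work = nodes * (k if refine_time_step else 1)
    return nodes, work

nodes, work = extra_node_factor(10, dims=3)
print(nodes)   # 1000 -- the "thousand times more processors"
print(work)    # ~10000x the work if the time step must shrink too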
#40
Another AMD supercomputer, 13,000 quad-core
"Robert Myers" wrote in message oups.com... Del Cecchi wrote: Is BiSection bandwidth really a valid metric for very large clusters? Yes, if you want to do FFT's, or, indeed, any kind of non-local differencing. It seems to me that it can be made arbitrarily small by configuring a large enough group of processors, since each processor has a finite number of links. For example a 2D mesh with nearest neighbor connectivity has a bisection bandwidth that grows as the square root of the number of processors. But the flops grow as the number of processors. So the bandwidth per flop decreases with the square root of the number of processors. That's the problem with the architecture and why I howled so loudly when it came out. Naturally, I was ridiculed by people whose entire knowledge of computer architecture is nearest neighbor clusters. Someone in New Mexico (LANL or Sandia, I don't want to dredge up the presentation again) understands the numbers as well as I do. The bisection bandwidth is a problem for a place like NCAR, which uses pseudospectral techniques, as do most global atmospheric simulations. The projected efficiency of Red Storm for FFT's was 25%. The efficiency of Japan's Earth Simulator is at least several times that for FFT's. No big deal. It was designed for Geophysical simulations, Blue Gene at Livermore was bought to produce the plots the Lab needed to justify its own existence (and not to do science). As you have correctly inferred, the more processors you hang off the nearest-neighbor network, the worse the situation becomes. I can't think of why this wouldn't apply in general but don't claim that it is true. It just seems so to me (although the rate of decrease wouldn't necessarily be square root) Unless you increase the aggregate bandwidth, you reach a point of diminishing returns. The special nature of Linpack has allowed unimaginative bureacrats to make a career out of buying and touting very limited machines that are the very opposite of being scalable. "Scalability" does not mean more processors or real estate. It means the ability to use the millionth processor as effectively as you use the 65th. Genuine scalability is hard, which is why no one is really bothering with it. Apparently no one with money is interested in solving these special problems for which clusters are not good enough. See SSI and steve Chen, history of. The problems aren't as special as you think. In fact, the glaring problem that I've pointed out with machines that rely on local differencing isn't agenda or marketing driven, it's an unavoidable mathematical fact. As things stand now, we will have ever more transistors chuffing away on generating ever-less reliable results. The problem is this: if you use a sufficiently low-order differencing scheme, you can do most of the problems of mathematical physics on a box like Blue Gene. Low order schemes are easy to code, undemanding with regard to non-local bandwidth, and usually much more stable than very high-order schemes. If you want to figure out how to place an air-conditioner, they're just fine. If you're trying to do physics, the plots you produce will be plausible and beautiful, but very often wrong. There is an out that, in fairness, I should mention. If you have processors to burn, you can always overresolve the problem to the point where the renormalization problem I've mentioned, while still there, becomes unimportant. 
Early results by the biggest ego in the field at the time suggested that it takes about ten times the resolution to do fluid mechanics with local differencing as accurately as you can do it with a pseudospectral scheme. In 3-D, that's a thousand times more processors. For fair comparison, the number of processors in Livermore box would be divided by 1000 to get equivalent performance to a box that could do a decent FFT. Should be posting to comp.arch so people there can switch from being experts on computer architecture to being experts on numerical analysis and mathematical physics. Robert. If I recall red storm correctly, it was a hypercube so had same problem as blue gene. |