#1
Itanium Montecito stuff
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.

Yousuf Khan
http://www.theinquirer.net/?article=12686
#2
Yousuf Khan wrote:
> Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.
>
> Yousuf Khan
> http://www.theinquirer.net/?article=12686

24 megs of high-speed SRAM??? Think $$$!

--
- Peter Perlsø - web: http://u238.dk
"If you have been voting for politicians who promise to give you goodies at someone else's expense, then you have no right to complain when they take your money and give it to someone else, including themselves." -- Thomas Sowell (1992)
#3
"Peter Perlsø" wrote in message k...
> > Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.
>
> 24 megs of high-speed SRAM??? Think $$$!

Yeah, I'm not even sure why they're dicking around. Just get it over and done with: put 1GB of SRAM on it and get rid of that DRAM already. That would be a feature of the processor; it wouldn't need any external RAM. :-)

Yousuf Khan
#4
On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan" wrote:
> Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.
>
> Yousuf Khan
> http://www.theinquirer.net/?article=12686

SMT was always aimed at Itanium. You can achieve most of the benefits of out-of-order (OoO) execution without actually going OoO by using SMT helper threads. If you're supporting two cores with four threads each, the huge cache is inevitable.

RM
#5
"Robert Myers" wrote in message ...
> On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan" wrote:
> > Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.

I kind of doubt that. Those people are reportedly all working on Tanglewood, and any Itanic SMT effort aimed at shipping in 2005 would have had to start at least a bit before the first of them settled in at Intel. While they may have offered comments, I suspect that whatever SMT mechanism may be incorporated into Itanic (I'm still a bit skeptical of this report, but it does seem to be pretty widespread) differs sufficiently at a very basic level from what they were working on for EV8 that their experience may not have been directly transferable.

> > Yousuf Khan
> > http://www.theinquirer.net/?article=12686
>
> SMT was always aimed at Itanium.

Really? My impression is that the Itanic architecture was largely established somewhat before SMT appeared on the horizon, that most of the coordination by the University of Washington researchers was with DEC and Alpha, and that SMT is particularly amenable to leveraging existing mechanisms for out-of-order execution (e.g., in Alpha) that are conspicuously absent in Itanic. Intel may later have investigated ways to make use of SMT in Itanic, but I think it was definitely a retrofit.

> You can achieve most of the benefits of OoO execution without actually going OoO by using SMT helper threads.

Maybe. But without doubt one of the things that you sacrifice is power efficiency (not that Itanic appears to worry about this much), since without the OoO hardware facilities you don't have a clue whether the extra work you're doing will be useful (and even if it is useful in preloading the caches, when the *real* code path reaches that point the instructions still get executed a second time anyway).

Such helper threads are also a lot more expensive in use of execution units than OoO SMT mechanisms are (again, because of the redundant or useless execution activity noted above), so you need more EUs (and thus more core area, which starts to limit clock rates unless you go asynchronous) than you'd need in an OoO SMT implementation to perform as well.

> If you're supporting two cores with four threads each,

Do you have a source for the suggestion that each Montecito core supports 4 threads?

> the huge cache is inevitable.

Not if you're primarily using the SMT for helper threads (not that I'm suggesting that this is a great idea).

- bill
#6
On Sun, 16 Nov 2003 15:00:21 -0500, "Bill Todd" wrote:
> "Robert Myers" wrote in message .. .
> > On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan" wrote:

<snip>

> > SMT was always aimed at Itanium.
>
> Really? My impression is that the Itanic architecture was largely established somewhat before SMT appeared on the horizon, that most of the coordination by the University of Washington researchers was with DEC and Alpha, and that SMT is particularly amenable to leveraging existing mechanisms for out-of-order execution (e.g., in Alpha) that are conspicuously absent in Itanic.

Oh, there I go again. SMT at _Intel_ was always aimed at Itanium.

> Intel may later have investigated ways to make use of SMT in Itanic, but I think it was definitely a retrofit.

I don't think there's much doubt about that.

> > You can achieve most of the benefits of OoO execution without actually going OoO by using SMT helper threads.
>
> Maybe. But without doubt one of the things that you sacrifice is power efficiency (not that Itanic appears to worry about this much), since without the OoO hardware facilities you don't have a clue whether the extra work you're doing will be useful (and even if it is useful in preloading the caches, when the *real* code path reaches that point the instructions still get executed a second time anyway).

I expect helper threads to find a place even in OoO processors. The available work on prescheduled speculative slices looks very promising. A helper thread would also make things like DynamoRIO look more attractive.

> Such helper threads are also a lot more expensive in use of execution units than OoO SMT mechanisms are (again, because of the redundant or useless execution activity noted above), so you need more EUs (and thus more core area, which starts to limit clock rates unless you go asynchronous) than you'd need in an OoO SMT implementation to perform as well.

A paper at SC 2003 suggests that "arithmetic is free, bandwidth is expensive." If someone else doesn't get there first, I'll post a thread for discussion. It warrants a separate thread.

> > If you're supporting two cores with four threads each,
>
> Do you have a source for the suggestion that each Montecito core supports 4 threads?

The paper I cited previously in comp.arch

: http://www.cs.ucsd.edu/users/jbrown/papers/sp-cmp.pdf
:
: "Speculative Precomputation on Chip Multiprocessors"
:
: which I gather is from
:
: 6th Workshop on Multithreaded Execution, Architecture, and Compilation (MTEAC-6), Tuesday, November 19 (2002), Istanbul, Turkey.
:
: "Figure 2 indicates that across the board, SMT consistently provides the greatest speedup of the four configurations shown, even though it has the fewest overall execution resources and the least amount of aggregate cache capacity."
:
: with the four configurations being 4-way SMT, vs 2, 4, and 8 way CMP.

> > the huge cache is inevitable.
>
> Not if you're primarily using the SMT for helper threads (not that I'm suggesting that this as a great idea).

Scheduling helper threads without a roomy cache is tricky. The whole purpose is to pull stuff into cache ahead of time, and it would be annoying to have a helper thread bump something else out of cache that was needed sooner than what the helper thread just pulled in.

RM
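The worry about a helper thread bumping useful lines out of a small cache can be made concrete with a toy model. The sketch below is only an illustration, not anything from the thread or from Montecito: it assumes a single fully associative LRU cache, a main thread that alternates between a small reused "hot" set and a stream of new lines, and a helper that touches stream lines a fixed number of steps ahead. The names `LRUCache`, `main_thread_misses`, `hot_lines`, and `lookahead` are hypothetical, and prefetches are assumed to complete instantly.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache holding `capacity` lines, LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first

    def access(self, addr):
        """Touch `addr`. Return True on a hit, False on a miss (line is filled)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[addr] = None
        return False


def main_thread_misses(capacity, lookahead, hot_lines=8, stream_len=200):
    """Count main-thread misses; lookahead=0 means no helper thread at all."""
    cache = LRUCache(capacity)
    misses = 0
    for i in range(stream_len):
        # Helper thread: pull a future stream line into the cache (not counted).
        if lookahead > 0 and i + lookahead < stream_len:
            cache.access(("stream", i + lookahead))
        # Main thread: one reused hot line, then one brand-new stream line.
        for addr in (("hot", i % hot_lines), ("stream", i)):
            if not cache.access(addr):
                misses += 1
    return misses


if __name__ == "__main__":
    for capacity in (16, 64):
        for lookahead in (0, 8):
            print(capacity, lookahead, main_thread_misses(capacity, lookahead))
```

In this model the helper makes things worse in the 16-line cache, because its prefetches push out both the hot set and its own earlier prefetches before they are used, while in the 64-line cache it removes nearly all of the stream misses. How far that carries over to a real L1/L2/L3 hierarchy is a separate question.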
#7
"Robert Myers" wrote in message ...
> On Sun, 16 Nov 2003 15:00:21 -0500, "Bill Todd" wrote:
> > "Robert Myers" wrote in message .. .

....

> > > You can achieve most of the benefits of OoO execution without actually going OoO by using SMT helper threads.
> >
> > Maybe. But without doubt one of the things that you sacrifice is power efficiency (not that Itanic appears to worry about this much), since without the OoO hardware facilities you don't have a clue whether the extra work you're doing will be useful (and even if it is useful in preloading the caches, when the *real* code path reaches that point the instructions still get executed a second time anyway).
>
> I expect helper threads to find a place even in OoO processors.

Possibly, but I suspect only in situations where the workload has fewer threads than the SMT core supports: otherwise, the other core threads will likely be far more effective servicing real threads and leaving the individual thread IPC up to the OoO mechanisms. With Itanic, the trade-off may be less clear (since it has more to gain on an individual thread from SP than an OoO core does).

> The available work on prescheduled speculative slices looks very promising. A helper thread would also make things like DynamoRIO look more attractive.
>
> > Such helper threads are also a lot more expensive in use of execution units than OoO SMT mechanisms are (again, because of the redundant or useless execution activity noted above), so you need more EUs (and thus more core area, which starts to limit clock rates unless you go asynchronous) than you'd need in an OoO SMT implementation to perform as well.
>
> A paper at SC 2003 suggests that "arithmetic is free, bandwidth is expensive."

Free in what respect(s)? The specific context above is power and chip area (and, by extension of the latter, clock rate).

> If someone else doesn't get there first, I'll post a thread for discussion. It warrants a separate thread.
>
> > > If you're supporting two cores with four threads each,
> >
> > Do you have a source for the suggestion that each Montecito core supports 4 threads?
>
> The paper I cited previously in comp.arch
>
> : http://www.cs.ucsd.edu/users/jbrown/papers/sp-cmp.pdf
> :
> : "Speculative Precomputation on Chip Multiprocessors"
> :
> : which I gather is from
> :
> : 6th Workshop on Multithreaded Execution, Architecture, and Compilation (MTEAC-6), Tuesday, November 19 (2002), Istanbul, Turkey.
> :
> : "Figure 2 indicates that across the board, SMT consistently provides the greatest speedup of the four configurations shown, even though it has the fewest overall execution resources and the least amount of aggregate cache capacity."
> :
> : with the four configurations being 4-way SMT, vs 2, 4, and 8 way CMP.

That paper concentrates on SP in CMP-only environments, and uses the 4-thread SMT core only for comparison purposes. There's nothing in it to suggest that it refers in any way specifically to Montecito.

> > > the huge cache is inevitable.
> >
> > Not if you're primarily using the SMT for helper threads (not that I'm suggesting that this as a great idea).
>
> Scheduling helper threads without a roomy cache is tricky. The whole purpose is to pull stuff into cache ahead of time, and it would be annoying to have a helper thread bump something else out of cache that was needed sooner than what the helper thread just pulled in.

If that were a serious problem, it would be worst in the extremely small L1 cache and significant in the modest L2 cache. The size of the L3 cache should be completely insensitive to it by comparison, especially with the 24-way associativity that the current Itanic2 L3 cache has: whatever data is evicted from the L3 by the helper thread is unlikely to be very important, whereas the new data that the helper thread is bringing in will almost certainly be needed almost immediately.

- bill
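The associativity argument can be sanity-checked with a small experiment. What follows is a rough sketch under made-up assumptions, not a model of any real Itanium cache: a set-associative cache with LRU replacement, a uniformly random trace over 1000 line addresses, and two hypothetical geometries that differ only in the number of ways. The function `victim_ages` and its parameters are illustrative names. It records, for every eviction, how many accesses ago the evicted line was last touched.

```python
import random
from statistics import mean


def victim_ages(num_sets, ways, trace):
    """For each eviction, return how many accesses ago the victim was last used."""
    sets = [dict() for _ in range(num_sets)]  # per set: line -> time of last touch
    ages = []
    for now, line in enumerate(trace):
        lines = sets[line % num_sets]  # simple modulo set index (an assumption)
        if line not in lines and len(lines) >= ways:
            victim = min(lines, key=lines.get)  # least recently used line in the set
            ages.append(now - lines.pop(victim))
        lines[line] = now  # fill the line or refresh its last-touch time
    return ages


rng = random.Random(0)  # fixed seed so runs are repeatable
trace = [rng.randrange(1000) for _ in range(5000)]

low_assoc = victim_ages(num_sets=4, ways=2, trace=trace)
high_assoc = victim_ages(num_sets=4, ways=24, trace=trace)

print("2-way : mean victim age", mean(low_assoc))
print("24-way: mean victim age", mean(high_assoc))
```

Under LRU, a victim in a W-way set must have gone untouched while the other W-1 lines in that set were each used more recently, so evicted lines in the 24-way case are always at least 24 accesses stale in this model. That fits the claim that what a helper thread displaces from a large, highly associative L3 is less likely to matter than what it displaces from a small L1, though a random trace says nothing about real workloads.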
#8
Yousuf Khan wrote:
> "Peter Perlsø" wrote in message k...
> > > Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.
> >
> > 24 megs of high-speed SRAM??? Think $$$!
>
> Yeah, I'm not even sure why they're dicking around. Just get it over and done with, put 1GB of SRAM on it, and get rid of that DRAM already. That would be a feature of the processor, doesn't need any external RAM. :-)

Oddly enough, IBM were going on about that... and on a .045-micron process, they could probably get a gig of eDRAM in under 200mm^2 of die area, using the 36MB eDRAM dies they've got alongside the POWER5 as a guide.

-JB
#9
James Boswell wrote:
> Yousuf Khan wrote:
> > "Peter Perlsø" wrote in message .dk...
> > > > Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel recently bought out from HPaq.
> > >
> > > 24 megs of high-speed SRAM??? Think $$$!
> >
> > Yeah, I'm not even sure why they're dicking around. Just get it over and done with, put 1GB of SRAM on it, and get rid of that DRAM already. That would be a feature of the processor, doesn't need any external RAM. :-)
>
> Oddly enough, IBM were going on about that... and on a .045-micron process, they could probably get a gig of eDRAM in under 200mm^2 of die area, using the 36MB eDRAM dies they've got alongside the POWER5 as a guide.
>
> -JB

EDRAM: Enhanced Dynamic Random Access Memory (E-D-RAM). Another form of DRAM that includes an SRAM cache on the chip. This allows frequently accessed data to be obtained faster. (Also known as CDRAM.)

Just FYI.

--
- Peter Perlsø - web: http://u238.dk
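The "a gig in under 200mm^2" estimate quoted above is easy to turn into back-of-envelope arithmetic. The thread gives only the 36MB capacity, the .045-micron target, and the 200mm^2 budget; it does not give the area or the process node of the existing 36MB dies. The sketch below therefore treats the reference node as a labeled assumption (130 nm is a placeholder, not a figure from the thread), assumes ideal area scaling with the square of the feature size, takes 1 GB as 1024 MB, and asks how large the 36MB die could be for the claim to hold. The function names are hypothetical.

```python
def scaled_area_mm2(target_mb, ref_mb, ref_area_mm2, ref_node_nm, target_node_nm):
    """Area needed for target_mb, assuming area scales with feature size squared."""
    capacity_ratio = target_mb / ref_mb              # how many reference dies' worth of bits
    shrink = (target_node_nm / ref_node_nm) ** 2     # ideal linear shrink, squared for area
    return ref_area_mm2 * capacity_ratio * shrink


def max_ref_area_mm2(budget_mm2, target_mb, ref_mb, ref_node_nm, target_node_nm):
    """Largest reference die area for which target_mb still fits in budget_mm2."""
    area_per_ref_mm2 = scaled_area_mm2(target_mb, ref_mb, 1.0, ref_node_nm, target_node_nm)
    return budget_mm2 / area_per_ref_mm2


ASSUMED_REF_NODE_NM = 130  # ASSUMPTION: node of the 36MB dies, not stated in the thread

limit = max_ref_area_mm2(
    budget_mm2=200, target_mb=1024, ref_mb=36,
    ref_node_nm=ASSUMED_REF_NODE_NM, target_node_nm=45,
)
print(f"36MB die must be under {limit:.1f} mm^2 for 1 GB to fit in 200 mm^2")
```

With those assumptions the 36MB die would have to be under roughly 59 mm^2. Real shrinks rarely scale ideally, and array overhead, redundancy, and I/O are ignored, so this is a plausibility check on the estimate and nothing more.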