A computer components & hardware forum. HardwareBanter


Itanium Montecito stuff



 
 
  #1  
November 16th 03, 04:50 PM
Yousuf Khan

Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one
was designed with help from the Alpha team that Intel just bought out
recently from HPaq.

Yousuf Khan

http://www.theinquirer.net/?article=12686


  #2  
November 16th 03, 05:03 PM
Peter Perlsø

Yousuf Khan wrote:

Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one
was designed with help from the Alpha team that Intel just bought out
recently from HPaq.

Yousuf Khan

http://www.theinquirer.net/?article=12686



24 Megs of high-speed SRAM ???

Think $$$!

--



- Peter Perlsø - web: http://u238.dk

"If you have been voting for politicians who promise to give you goodies
at someone else's expense, then you have no right to complain when they
take your money and give it to someone else, including themselves."

-- Thomas Sowell (1992)

  #3  
November 16th 03, 05:12 PM
Yousuf Khan

"Peter Perlsø" wrote in message
k...
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one
was designed with help from the Alpha team that Intel just bought out
recently from HPaq.


24 Megs of high-speed SRAM ???

Think $$$!


Yeah, I'm not even sure why they're dicking around. Just get it over and
done with, put 1GB of SRAM
on it, and get rid of that DRAM already. That would be a feature of the
processor, doesn't need any external RAM. :-)

Yousuf Khan



  #4  
November 16th 03, 06:44 PM
Robert Myers

On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan"
wrote:

Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one
was designed with help from the Alpha team that Intel just bought out
recently from HPaq.

Yousuf Khan

http://www.theinquirer.net/?article=12686


SMT was always aimed at Itanium. You can achieve most of the benefits
of OoO execution without actually going OoO by using SMT helper
threads. If you're supporting two cores with four threads each, the
huge cache is inevitable.

RM
  #5  
November 16th 03, 08:00 PM
Bill Todd


"Robert Myers" wrote in message
...
On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan"
wrote:

Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one
was designed with help from the Alpha team that Intel just bought out
recently from HPaq.


I kind of doubt that: those people are reportedly all working on
Tanglewood, any Itanic SMT effort aimed at shipping in 2005 would have had
to have started at least a bit before the first of them settled in at Intel,
and while they may have offered comments I suspect that whatever SMT
mechanism may be incorporated into Itanic (I'm still a bit skeptical of this
report, but it does seem to be pretty widespread) differs sufficiently at a
very basic level from what they were working on for EV8 that their
experience may not have been directly transferable.


Yousuf Khan

http://www.theinquirer.net/?article=12686


SMT was always aimed at Itanium.


Really? My impression is that the Itanic architecture was largely
established somewhat before SMT appeared on the horizon, that most of the
coordination by the University of Washington researchers was with DEC and
Alpha, and that SMT is particularly amenable to leveraging existing
mechanisms for out-of-order execution (e.g., in Alpha) that are
conspicuously absent in Itanic.

Intel may later have investigated ways to make use of SMT in Itanic, but I
think it was definitely a retrofit.

You can achieve most of the benefits
of OoO execution without actually going OoO by using SMT helper
threads.


Maybe. But without doubt one of the things that you sacrifice is power
efficiency (not that Itanic appears to worry about this much), since without
the OoO hardware facilities you don't have a clue whether the extra work
you're doing will be useful (and even if it is useful in preloading the
caches, when the *real* code path reaches that point the instructions still
get executed a second time anyway).

Such helper threads are also a lot more expensive in use of execution units
than OoO SMT mechanisms are (again, because of the redundant or useless
execution activity noted above), so you need more EUs (and thus more core
area, which starts to limit clock rates unless you go asynchronous) than
you'd need in an OoO SMT implementation to perform as well.

If you're supporting two cores with four threads each,


Do you have a source for the suggestion that each Montecito core supports 4
threads?

the
huge cache is inevitable.


Not if you're primarily using the SMT for helper threads (not that I'm
suggesting that this is a great idea).

- bill
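
A rough way to put numbers on both sides of this exchange, offered as a sketch and not as anything stated in the thread. The toy model below assumes an in-order core that stalls once per loop iteration on a cache miss, and a helper thread that re-executes only the address-computation slice in otherwise idle issue slots. The function name, its parameters, and the example figures are all hypothetical.

```python
def helper_thread_tradeoff(iterations, body_instrs, slice_instrs,
                           miss_penalty, coverage):
    """Return (speedup, work_ratio) for an in-order core with one helper thread.

    coverage is the assumed fraction of main-thread misses that the helper
    turns into hits (0.0 to 1.0). All inputs are illustrative assumptions.
    """
    # In-order main thread: one cycle per instruction plus a stall per miss.
    work_per_iter = body_instrs + slice_instrs
    base_cycles = iterations * (work_per_iter + miss_penalty)
    helped_cycles = iterations * (work_per_iter
                                  + (1.0 - coverage) * miss_penalty)

    # The helper re-executes the address slice, so those instructions run twice.
    base_instrs = iterations * work_per_iter
    helped_instrs = base_instrs + iterations * slice_instrs

    return base_cycles / helped_cycles, helped_instrs / base_instrs


# Made-up figures: 20 body + 5 slice instructions per iteration,
# a 200-cycle miss, and a helper that covers 80% of the misses.
speedup, work_ratio = helper_thread_tradeoff(1000, 20, 5, 200, 0.8)
print(f"speedup {speedup:.2f}x, instructions executed {work_ratio:.2f}x")
```

Under these assumptions the speedup comes from hiding stalls, while the work ratio is the price in execution-unit activity (and so power), since the slice is executed twice. The model ignores any slowdown the helper causes by competing with the main thread for issue slots or cache.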



  #6  
November 16th 03, 09:02 PM
Robert Myers

On Sun, 16 Nov 2003 15:00:21 -0500, "Bill Todd"
wrote:


"Robert Myers" wrote in message
.. .
On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan"
wrote:

snip

SMT was always aimed at Itanium.


Really? My impression is that the Itanic architecture was largely
established somewhat before SMT appeared on the horizon, that most of the
coordination by the University of Washington researchers was with DEC and
Alpha, and that SMT is particularly amenable to leveraging existing
mechanisms for out-of-order execution (e.g., in Alpha) that are
conspicuously absent in Itanic.


Oh, there I go again.

SMT at _Intel_ was always aimed at Itanium.

Intel may later have investigated ways to make use of SMT in Itanic, but I
think it was definitely a retrofit.


I don't think there's much doubt about that.

You can achieve most of the benefits
of OoO execution without actually going OoO by using SMT helper
threads.


Maybe. But without doubt one of the things that you sacrifice is power
efficiency (not that Itanic appears to worry about this much), since without
the OoO hardware facilities you don't have a clue whether the extra work
you're doing will be useful (and even if it is useful in preloading the
caches, when the *real* code path reaches that point the instructions still
get executed a second time anyway).


I expect helper threads to find a place even in OoO processors. The
available work on prescheduled speculative slices looks very
promising. A helper thread would also make things like DynamoRIO look
more attractive.

Such helper threads are also a lot more expensive in use of execution units
than OoO SMT mechanisms are (again, because of the redundant or useless
execution activity noted above), so you need more EUs (and thus more core
area, which starts to limit clock rates unless you go asynchronous) than
you'd need in an OoO SMT implementation to perform as well.


A paper at SC 2003 suggests that "arithmetic is free, bandwidth is
expensive." If someone else doesn't get there first, I'll post a
thread for discussion. It warrants a separate thread.

If you're supporting two cores with four threads each,


Do you have a source for the suggestion that each Montecito core supports 4
threads?


The paper I cited previously in comp.arch
:
:http://www.cs.ucsd.edu/users/jbrown/papers/sp-cmp.pdf
:
:"Speculative Precomputation on Chip Multiprocessors"
:
:which I gather is from
:
:6th Workshop on Multithreaded Execution, Architecture, and Compilation
:(MTEAC-6) Tuesday, November 19 (2002) Istanbul, Turkey.
:
:"Figure 2 indicates that across the board, SMT consistently
:provides the greatest speedup of the four configurations
:shown, even though it has the fewest overall execution
:resources and the least amount of aggregate cache capacity."
:
:with the four configurations being 4-way SMT, vs 2, 4, and 8 way CMP.

the
huge cache is inevitable.


Not if you're primarily using the SMT for helper threads (not that I'm
suggesting that this is a great idea).


Scheduling helper threads without a roomy cache is tricky. The whole
purpose is to pull stuff into cache ahead of time, and it would be
annoying to have a helper thread bump something else out of cache that
was needed sooner than what the helper thread just pulled in.

RM
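
The eviction worry can be illustrated with a toy simulation. This is only a sketch under loud assumptions: a fully associative LRU cache, a main thread that alternates between a small hot working set and streaming data used once, and a helper thread that prefetches the streaming line a fixed number of iterations ahead. The class, the function, and every parameter value here are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache holding `capacity` lines, LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, line):
        """Reference `line`; return True on a hit, False on a miss (then fill)."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def main_thread_misses(capacity, iterations=100, hot_lines=4, lead=0):
    """Count main-thread misses; lead > 0 adds a helper prefetching that far ahead."""
    cache = LRUCache(capacity)
    misses = 0
    for k in range(iterations):
        if lead and k + lead < iterations:
            cache.access(("stream", k + lead))  # helper thread's prefetch
        # Main thread: one reused hot line, then one streaming line.
        for line in (("hot", k % hot_lines), ("stream", k)):
            if not cache.access(line):
                misses += 1
    return misses


for capacity in (8, 16):
    print(capacity,
          main_thread_misses(capacity),
          main_thread_misses(capacity, lead=4))
```

In this model the 8-line cache does worse with the helper than without it, because the prefetches push out hot lines and are themselves evicted before use, while the 16-line cache turns almost every access into a hit. Real caches are set associative and multi-level, so this only shows the direction of the effect.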
  #7  
November 17th 03, 12:15 AM
Bill Todd


"Robert Myers" wrote in message
...
On Sun, 16 Nov 2003 15:00:21 -0500, "Bill Todd"
wrote:


"Robert Myers" wrote in message
.. .


....

You can achieve most of the benefits
of OoO execution without actually going OoO by using SMT helper
threads.


Maybe. But without doubt one of the things that you sacrifice is power
efficiency (not that Itanic appears to worry about this much), since without
the OoO hardware facilities you don't have a clue whether the extra work
you're doing will be useful (and even if it is useful in preloading the
caches, when the *real* code path reaches that point the instructions still
get executed a second time anyway).


I expect helper threads to find a place even in OoO processors.


Possibly, but I suspect only in situations where the workload has fewer
threads than the SMT core supports: otherwise, the other core threads will
likely be far more effective servicing real threads and leaving the
individual thread IPC up to the OoO mechanisms. With Itanic, the trade-off
may be less clear (since it has more to gain on an individual thread from SP
than an OoO core does).

The
available work on prescheduled speculative slices looks very
promising. A helper thread would also make things like DynamoRIO look
more attractive.

Such helper threads are also a lot more expensive in use of execution units
than OoO SMT mechanisms are (again, because of the redundant or useless
execution activity noted above), so you need more EUs (and thus more core
area, which starts to limit clock rates unless you go asynchronous) than
you'd need in an OoO SMT implementation to perform as well.


A paper at SC 2003 suggests that "arithmetic is free, bandwidth is
expensive."


Free in what respect(s)? The specific context above is power and chip area
(and by extension of the latter clock rate).

If someone else doesn't get there first, I'll post a
thread for discussion. It warrants a separate thread.

If you're supporting two cores with four threads each,


Do you have a source for the suggestion that each Montecito core supports 4
threads?


The paper I cited previously in comp.arch
:
:http://www.cs.ucsd.edu/users/jbrown/papers/sp-cmp.pdf
:
:"Speculative Precomputation on Chip Multiprocessors"
:
:which I gather is from
:
:6th Workshop on Multithreaded Execution, Architecture, and Compilation
:(MTEAC-6) Tuesday, November 19 (2002) Istanbul, Turkey.
:
:"Figure 2 indicates that across the board, SMT consistently
:provides the greatest speedup of the four configurations
:shown, even though it has the fewest overall execution
:resources and the least amount of aggregate cache capacity."
:
:with the four configurations being 4-way SMT, vs 2, 4, and 8 way CMP.


That paper concentrates on SP in CMP-only environments, and uses the
4-thread SMT core only for comparison purposes. There's nothing in it to
suggest that it refers in any way specifically to Montecito.


the
huge cache is inevitable.


Not if you're primarily using the SMT for helper threads (not that I'm
suggesting that this is a great idea).


Scheduling helper threads without a roomy cache is tricky. The whole
purpose is to pull stuff into cache ahead of time, and it would be
annoying to have a helper thread bump something else out of cache that
was needed sooner than what the helper thread just pulled in.


If that were a serious problem, it would be worst in the extremely small L1
cache and significant in the modest L2 cache. The size of the L3 cache
should be completely insensitive to it by comparison, especially with the
24-way associativity that the current Itanic2 L3 cache has: whatever data
is evicted from the L3 by the helper thread is unlikely to be very
important, whereas the new data that the helper thread is bringing in will
almost certainly be needed almost immediately.

- bill



  #8  
November 28th 03, 11:37 AM
James Boswell

Yousuf Khan wrote:
"Peter Perlsø" wrote in message
k...
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like
this one was designed with help from the Alpha team that Intel just
bought out recently from HPaq.


24 Megs of high-speed SRAM ???

Think $$$!


Yeah, I'm not even sure why they're dicking around. Just get it over and
done with, put 1GB of SRAM
on it, and get rid of that DRAM already. That would be a feature of the
processor, doesn't need any external RAM. :-)


Oddly enough, IBM were going on about that...

And on a 0.045-micron (45nm) process, they could probably get a gig of edram in under
200mm^2 of die area, using the 36MB edram dies they've got alongside the
POWER5 as a guide.

-JB
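
The kind of estimate behind that guess can be sketched as follows, assuming the simplest possible scaling rule: area per bit shrinks with the square of the feature size. The thread gives neither the area nor the process node of the 36MB die, so the function name and the example inputs below are placeholders, not measurements.

```python
def scaled_area_mm2(ref_area_mm2, ref_capacity_mb, ref_node_nm,
                    target_capacity_mb, target_node_nm):
    """Estimate die area, assuming area per bit scales with feature size squared."""
    area_per_mb = ref_area_mm2 / ref_capacity_mb
    shrink = (target_node_nm / ref_node_nm) ** 2
    return area_per_mb * target_capacity_mb * shrink


# Placeholder inputs only: a 36MB reference die of 36 mm^2 on a 90nm process.
# Substitute real figures for the reference die before drawing conclusions.
estimate = scaled_area_mm2(ref_area_mm2=36.0, ref_capacity_mb=36,
                           ref_node_nm=90, target_capacity_mb=1024,
                           target_node_nm=45)
print(f"estimated area for 1GB at 45nm: {estimate:.0f} mm^2")
```

Ideal quadratic scaling is optimistic, since memory cells, sense amplifiers, and wiring rarely shrink that cleanly, so a real die would likely come out larger than this estimate.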


  #9  
November 28th 03, 04:20 PM
Peter Perlsø

James Boswell wrote:

Yousuf Khan wrote:

"Peter Perlsø" wrote in message
.dk...

Multicore, simultaneous multi-threading, and 24MB of cache. Looks like
this one was designed with help from the Alpha team that Intel just
bought out recently from HPaq.

24 Megs of high-speed SRAM ???

Think $$$!


Yeah, I'm not even sure why they're dicking around. Just get it over and
done with, put 1GB of SRAM
on it, and get rid of that DRAM already. That would be a feature of the
processor, doesn't need any external RAM. :-)



Oddly enough, IBM were going on about that...

And on a 0.045-micron (45nm) process, they could probably get a gig of edram in under
200mm^2 of die area, using the 36MB edram dies they've got alongside the
POWER5 as a guide.

-JB




EDRAM

Enhanced Dynamic Random Access Memory
(E-D-ram)

Another form of DRAM that includes an SRAM cache on the chip. This
allows frequently accessed data to be obtained faster. (Also known as
CDRAM.)


Just FYI.

--



- Peter Perlsø - web: http://u238.dk

"If you have been voting for politicians who promise to give you goodies
at someone else's expense, then you have no right to complain when they
take your money and give it to someone else, including themselves."

-- Thomas Sowell (1992)

 




Copyright ©2004-2024 HardwareBanter.
The comments are property of their posters.