A computer components & hardware forum. HardwareBanter



Smart memory hubs being proposed



 
 
  #1  
Old December 27th 03, 10:56 PM
Yousuf Khan
external usenet poster
 
Posts: n/a
Default Smart memory hubs being proposed

Both AMD and Intel are proposing a separate but similar new approach to
memory interconnection design for the future. They are dubbing it smart
memory hubs right now, but the details are a little sketchy. It involves
putting some sort of intelligence right into the memory modules.

http://www.eet.com/semi/news/OEG20030508S0023

The initial efforts are aimed at increasing memory density in servers. I'm
not sure how exactly these hubs are supposed to be "smart". I also fail to
see how adding another layer of circuitry in between the memory controller
and memory itself would speed up memory accesses, since it adds another hop
into the equation. However, perhaps these are the successors to the current
SPD ROM that is implanted on every DIMM to describe its architecture to the
memory controller on initialization? Perhaps these hubs send additional
information that SPDs can't send by themselves?

Yousuf Khan


  #2  
Old December 28th 03, 12:55 AM
daytripper
external usenet poster
 
Posts: n/a
Default

On Sat, 27 Dec 2003 21:56:56 GMT, "Yousuf Khan" wrote:

> Both AMD and Intel are proposing a separate but similar new approach to
> memory interconnection design for the future. They are dubbing it smart
> memory hubs right now, but the details are a little sketchy. It involves
> putting some sort of intelligence right into the memory modules.
>
> http://www.eet.com/semi/news/OEG20030508S0023
>
> The initial efforts are aimed at increasing memory density in servers. I'm
> not sure how exactly these hubs are supposed to be "smart". I also fail to
> see how adding another layer of circuitry in between the memory controller
> and memory itself would speed up memory accesses, since it adds another hop
> into the equation. However, perhaps these are the successors to the current
> SPD ROM that is implanted on every DIMM to describe its architecture to the
> memory controller on initialization? Perhaps these hubs send additional
> information that SPDs can't send by themselves?
>
> Yousuf Khan


FB-DIMMs... There might be a lot less there than meets the eye of the article.

FB-DIMMs translate a narrow but very fast memory interconnect into DDR2 SDRAM
transactions, with each FB-DIMM carrying an ASIC (the "hub") that does all of the
things discrete registers and PLLs used to do - PLUS the memory interconnect
actually passes through the hub on one DIMM to get to the next DIMM/hub,
through that one to the next, and so on. It's quite extensible, which
addresses the problem of hooking a bunch of DIMMs to *anything* these days
while maintaining interconnect speed.

Note, however, that memory latency is clearly not addressed in a positive
manner - sticking n pass-thru elements between the nth DIMM's DRAMs and the
host chipset rarely results in quicker memory response ;-)
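
To put numbers on that pass-through cost, here is a toy model of a daisy-chained channel. All figures (DRAM access time, per-hub delay) are made-up placeholders for illustration, not real FB-DIMM timings:

```python
# Toy model of read latency on a daisy-chained FB-DIMM-style channel.
# Each hub the request passes through adds a fixed delay, and the
# returning data passes through the same hubs again.
# All numbers are hypothetical, not from any spec.

DRAM_ACCESS_NS = 50.0   # core DRAM access time (assumed)
HUB_HOP_NS = 2.0        # per-hub pass-through delay, each direction (assumed)

def read_latency_ns(dimm_position):
    """Latency to the DIMM at 1-based position dimm_position on the chain.

    The request traverses dimm_position hubs outbound and the data
    traverses the same hubs on the way back, so hop cost is doubled.
    """
    hops = dimm_position
    return DRAM_ACCESS_NS + 2 * hops * HUB_HOP_NS

# The farther down the chain, the worse it gets:
for pos in (1, 4, 8):
    print(pos, read_latency_ns(pos))  # 54.0, 66.0, 82.0 ns
```

The point of the sketch: extensibility comes from the pass-through topology, but every added position on the chain is a strict latency penalty for the DIMMs behind it.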

One can surmise that the era of (up to) 6 MB on-chip caches is expected to reduce
typical miss ratios down to where the even-longer-than-before latency isn't a
significant hit to overall platform performance...

And in any case, some powerful marketing forces will be brought to bear to
discourage any thoughts of "This is another iRDRAM marketing disaster waiting
to happen"...

/daytripper (wait for it ;-)
  #3  
Old December 28th 03, 03:02 PM
Robert Myers
external usenet poster
 
Posts: n/a
Default

On Sat, 27 Dec 2003 23:55:30 GMT, daytripper wrote:

[snip]

> FB-DIMMs... Might be a lot less there than meets the eye of the article.
>
> FB-DIMMs translate a narrow but very fast memory interconnect into DDR2 SDRAM
> transactions, with each FB-DIMM having an ASIC (the "hub") doing all of the
> things discrete registers and PLLs used to do - PLUS the memory interconnect
> actually passes through the hub on one DIMM to get to the next DIMM/hub,
> through that one to the next, and so on. It's quite extensible, which
> addresses the problem of hooking a bunch of DIMMs to *anything* these days
> while maintaining interconnect speed.

Presumably solving the problems inherent in a multi-drop bus?

> Note, however, that memory latency is clearly not addressed in a positive
> manner - sticking n pass-thru elements between the nth DIMM's DRAMs and the
> host chipset rarely results in quicker memory response ;-)
>
> One can surmise the era of (up to) 6 MB on-chip caches is expected to reduce
> typical miss ratios down to where the even-longer-than-before latency isn't a
> significant hit to overall platform performance...

The 6 MB cache is an act of desperation on Intel's part. I don't
_think_ their strategy is to keep increasing cache size. It's a
losing strategy, anyway, unless you go to COMA. Itanium's in-order
architecture is just too inflexible, and the problem is still cache
misses.
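
The trade-off being argued here is the standard average-memory-access-time identity, AMAT = hit time + miss rate × miss penalty. A quick back-of-the-envelope (all figures hypothetical) shows why a bigger cache only wins if the miss-rate reduction outruns any added latency:

```python
# AMAT = hit_time + miss_rate * miss_penalty.
# Shows why a big cache only pays off if it cuts the miss rate
# enough to offset longer hit and miss paths. Numbers are
# illustrative only, not measurements of any real part.

def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in cycles."""
    return hit_cycles + miss_rate * miss_penalty_cycles

# Modest cache, short path to memory (hypothetical figures):
small = amat(hit_cycles=10, miss_rate=0.05, miss_penalty_cycles=200)
# Huge cache halves the miss rate, but hit time is longer and the
# memory path (say, through pass-through hubs) got 50% slower:
large = amat(hit_cycles=14, miss_rate=0.025, miss_penalty_cycles=300)

print(small, large)  # 20.0 vs 21.5 -- the bigger cache can still lose
```
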

Intel will, I gather, move the memory controller onto the die. Other
than that, the strategy of the day (and for the foreseeable future) is
to hide latency, not to address it directly.

RM

  #4  
Old December 28th 03, 04:32 PM
Robert Redelmeier
external usenet poster
 
Posts: n/a
Default

In comp.sys.ibm.pc.hardware.chips Robert Myers wrote:
> The 6mb cache is an act of desperation on Intel's part. I don't

Agreed. Yet ...

> _think_ their strategy is to keep increasing cache size. It's a
> losing strategy, anyway, unless you go to COMA. Itanium's in-order
> architecture is just too inflexible, and the problem is still cache
> misses.


Then how do you explain the _dismal_ performance of the
Celeron4 with only 128 KB of L2, and the poor showing of the first
P4 at 256 KB versus the current P4 at 512 KB? These are
all the same P7 core with the same small L1s.

I can't blame Intel for wanting to try more cache.
This is obviously a game of diminishing returns, and the
P4EE seems to be past that point. 512 KB seems optimal for current
datasets/problems/benchmarks. Cache MATTERS.

Notice also how the AMD K7 improved going from 256 KB to 512 KB.
The Duron, with its tiny 64 KB L2, performs amazingly well.
Decent L1s and the excellent organization of its L2 (16-way,
exclusive) save it from the Celeron4's fate.

-- Robert

  #5  
Old December 28th 03, 06:01 PM
Robert Myers
external usenet poster
 
Posts: n/a
Default

On Sun, 28 Dec 2003 15:32:00 GMT, Robert Redelmeier wrote:

> In comp.sys.ibm.pc.hardware.chips Robert Myers wrote:
>> The 6mb cache is an act of desperation on Intel's part. I don't
>
> Agreed. Yet ...
>
>> _think_ their strategy is to keep increasing cache size. It's a
>> losing strategy, anyway, unless you go to COMA. Itanium's in-order
>> architecture is just too inflexible, and the problem is still cache
>> misses.
>
> Then how do you explain the _dismal_ performance of the
> Celeron4 with only 128 KB L2 and poor showing of the first
> P4 with 256 versus the current P4 at 512 KB? These are
> all the same P7 core with the same small L1s.
>
> I can't blame Intel for wanting to try more cache.
> This is obviously a game of diminishing returns, and the
> P4EE seems to be past that point. 512 KB seems optimal for current
> datasets/problems/benchmarks. Cache MATTERS.
>
> Notice also how the AMD K7 improved from 256 to 512.
> The Duron, with the tiny 64 KB L2, performs amazingly well.
> Decent L1s and the excellent organization of L2 (16-way,
> exclusive) save it from the Celeron4's fate.


Well, of course cache matters, and if latency is fixed, cache size
has to grow superlinearly with the rate at which you retire
instructions (not clock speed), no matter how you get there. That is
to say, cache size will keep increasing, assuming that processors are
able to retire instructions at increasing speeds.

My only point was that latency still matters. Superficial examination
of early results from the HP Superdome showed that Itanium is
apparently not very tolerant of increased latency, and HP engineers
with whom I've corresponded have not disagreed; i.e., there is a
substantial payoff to be had from a better memory subsystem.

I don't think it needs to be explained to you, but I will make the
point anyway: increased cache does no good if you have no way of
triggering memory fetches far enough ahead of time to make use of the
cache. An OoO processor can just juggle more instructions, but
Itanium currently retires instructions in order. Sooner or later,
Intel has to do something for Itanium other than to increase the cache
size.

RM
  #6  
Old December 28th 03, 07:08 PM
Robert Redelmeier
external usenet poster
 
Posts: n/a
Default

In comp.sys.ibm.pc.hardware.chips Robert Myers wrote:
> My only point was that latency still matters. Superficial examination
> of early results from the HP Superdome showed that Itanium is
> apparently not very tolerant of increased latency, and HP engineers


Oh, fully agreed. For some apps, latency is _everything_
(linked lists, TP DB). If the app hopscotches randomly
thru RAM (SETI?) nothing else matters much.

Modern systems have done wonders to deliver bandwidth:
dual-channel DDR at high clocks. But has much been done
to improve latency from ~130 ns? (old number)

I thought the main idea behind on-CPU memory controllers
was to reduce this to ~70 ns through reduced buffering/queuing.

A smart hub might be able to detect patterns like 2-4-6-8,
4-8-16-20-24 or 5-4-3-2-1 but cannot possibly do anything
with data-driven pseudo-randoms except add latency.
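
A hub-side stride detector of the sort described above is easy to write; the hard part is exactly what's noted: pointer-chasing gives it nothing to latch onto. A minimal sketch (an illustration only, not any shipping prefetcher design):

```python
# Minimal stride detector: if the last few addresses differ by a
# constant stride, predict the next one; otherwise give up.
# This is an illustrative toy, not a real prefetcher.

def predict_next(addresses, window=3):
    """Return the predicted next address, or None if no constant stride."""
    if len(addresses) < window + 1:
        return None
    recent = addresses[-(window + 1):]
    strides = {b - a for a, b in zip(recent, recent[1:])}
    if len(strides) == 1:
        return addresses[-1] + strides.pop()
    return None

print(predict_next([2, 4, 6, 8]))        # 10: constant stride of 2
print(predict_next([5, 4, 3, 2, 1]))     # 0: negative strides work too
print(predict_next([7, 13, 2, 40]))      # None: data-driven pseudo-random
```

On the pseudo-random case the detector can only shrug, which is the point: for that traffic a "smart" hub is pure added latency.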

> Itanium currently retires instructions in order. Sooner or later,
> Intel has to do something for Itanium other than to increase the cache
> size.


Are you suggesting Out-of-Order retirement???
Intriguing possibility with a new arch.

Of course, SMT is just a different solution -- keep the CPU
busy with other work during the ~300 clock read stalls.
Good if there are parallel threads/tasks. Useless if not.
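
That SMT arithmetic can be made concrete with an idealized round-robin model (all numbers illustrative, not measurements): with ~300-clock stalls, one thread mostly waits, and it takes several runnable threads to fill the gap.

```python
# Toy model: a core alternates compute bursts with memory stalls.
# With T runnable threads, one thread's stall time can be filled by
# the others' compute. Idealized round-robin SMT; illustrative only.

def utilization(compute_cycles, stall_cycles, threads):
    """Fraction of cycles doing useful work, capped at 1.0."""
    busy = threads * compute_cycles
    period = compute_cycles + stall_cycles  # one thread's iteration
    return min(1.0, busy / period)

# 100 cycles of work per ~300-cycle read stall:
print(utilization(100, 300, 1))  # 0.25 -- a single thread mostly waits
print(utilization(100, 300, 4))  # 1.0  -- four threads fully hide the stall
```

With no extra threads the model degenerates to the single-threaded case, matching the "useless if not" caveat.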

-- Robert


  #7  
Old December 28th 03, 07:19 PM
Bill Todd
external usenet poster
 
Posts: n/a
Default


"Robert Redelmeier" wrote in message
m...
In comp.sys.ibm.pc.hardware.chips Robert Myers wrote:
The 6mb cache is an act of desperation on Intel's part. I don't


Agreed. yet ...

_think_ their strategy is to keep increasing cache size. It's a
losing strategy, anyway, unless you go to COMA. Itanium's in-order
architecture is just too inflexible, and the problem is still cache
misses.


Then how do you explain the _dismal_ performance of the
Celeron4 with only 128 KB L2


Market segmentation: Celeron isn't *meant* to perform at levels comparable
to Pentium - else why would people shell out more for the latter?

> and poor showing of the first
> P4 with 256 versus the current P4 at 512 KB?


Compilers have gotten a lot better at optimizing for P4 too over the past
couple of years - the difference from the early P4s is not *just* cache
size.

> These are
> all the same P7 core with the same small L1s.


The above doesn't necessarily mean that P4 may not be somewhat more
sensitive to cache size than its predecessor - but it clearly doesn't
require many MB of cache to perform well, unlike Itanic.

[...]

> Notice also how the AMD K7 improved from 256 to 512.


Doubling cache size usually helps. But doubling cache size from 256 KB to
512 KB is a hell of a lot less expensive (in terms of chip area) than
doubling cache size from 6 MB to 12 MB.

> The Duron, with the tiny 64 KB L2 performs amazingly well.
> Decent L1s and the excellent organization of L2 (16 way,
> exclusive) saves it from the Celeron4's fate.


Er, no: having 128 KB of L1 cache plus an exclusive L2 that makes the total
cache size effectively 192 KB (vs. the older Athlon's effective cache size
of 128 KB + 256 KB = 384 KB), plus significantly better IPC, is what saves
it from being a dud like Celeron.
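
That arithmetic checks out directly. An exclusive hierarchy caches distinct lines in L1 and L2, so the capacities add; an inclusive L2 duplicates L1's contents, so the effective footprint is bounded by L2 alone (the inclusive treatment of Celeron4 below is an assumption for illustration):

```python
# Effective cache footprint: exclusive hierarchies add L1 and L2
# capacities; inclusive ones are bounded by L2, since it duplicates
# every line held in L1.

def effective_kb(l1_kb, l2_kb, exclusive):
    return l1_kb + l2_kb if exclusive else max(l1_kb, l2_kb)

# Duron: 128 KB L1 + 64 KB exclusive L2
print(effective_kb(128, 64, exclusive=True))    # 192
# Older Athlon: 128 KB L1 + 256 KB exclusive L2
print(effective_kb(128, 256, exclusive=True))   # 384
# Celeron4-style part, assuming an inclusive 128 KB L2:
print(effective_kb(8, 128, exclusive=False))    # 128
```
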

- bill



  #8  
Old December 28th 03, 08:26 PM
Yousuf Khan
external usenet poster
 
Posts: n/a
Default

"Robert Myers" wrote in message
...
The 6mb cache is an act of desperation on Intel's part. I don't
_think_ their strategy is to keep increasing cache size. It's a
losing strategy, anyway, unless you go to COMA. Itanium's in-order
architecture is just too inflexible, and the problem is still cache
misses.


What's COMA?

> Intel will, I gather, move the memory controller onto the die. Other
> than that, the strategy of the day (and for the foreseeable future) is
> to hide latency, not to address it directly.


Yes, but AMD is also proposing something similar, and they've already moved
the memory controller onboard.

Yousuf Khan


  #9  
Old December 28th 03, 09:21 PM
CJT
external usenet poster
 
Posts: n/a
Default

Robert Redelmeier wrote:

> In comp.sys.ibm.pc.hardware.chips Robert Myers wrote:
>> The 6mb cache is an act of desperation on Intel's part. I don't
>
> Agreed. Yet ...
>
>> _think_ their strategy is to keep increasing cache size. It's a
>> losing strategy, anyway, unless you go to COMA. Itanium's in-order
>> architecture is just too inflexible, and the problem is still cache
>> misses.
>
> Then how do you explain the _dismal_ performance of the
> Celeron4 with only 128 KB L2 and poor showing of the first
> P4 with 256 versus the current P4 at 512 KB? These are
> all the same P7 core with the same small L1s.
>
> I can't blame Intel for wanting to try more cache.
> This is obviously a game of diminishing returns, and the
> P4EE seems to be past that point. 512 KB seems optimal for current
> datasets/problems/benchmarks. Cache MATTERS.


The non-Intel crowd has known that for years. But cache is
also expensive.


> Notice also how the AMD K7 improved from 256 to 512.
> The Duron, with the tiny 64 KB L2 performs amazingly well.
> Decent L1s and the excellent organization of L2 (16 way,
> exclusive) saves it from the Celeron4's fate.
>
> -- Robert



--
After being targeted with gigabytes of trash by the "SWEN" worm, I have
concluded we must conceal our e-mail address. Our true address is the
mirror image of what you see before the "@" symbol. It's a shame such
steps are necessary. ...Charlie
  #10  
Old December 28th 03, 10:00 PM
Robert Myers
external usenet poster
 
Posts: n/a
Default

On Sun, 28 Dec 2003 19:26:26 GMT, "Yousuf Khan" wrote:

> "Robert Myers" wrote in message ...
>> The 6mb cache is an act of desperation on Intel's part. I don't
>> _think_ their strategy is to keep increasing cache size. It's a
>> losing strategy, anyway, unless you go to COMA. Itanium's in-order
>> architecture is just too inflexible, and the problem is still cache
>> misses.
>
> What's COMA?

Cache-only memory architecture. The original Crays were effectively
COMA, because Seymour used for main memory what everybody else used for
cache. That's why some three-letter agencies with no use for vector
architectures bought the machines.

>> Intel will, I gather, move the memory controller onto the die. Other
>> than that, the strategy of the day (and for the foreseeable future) is
>> to hide latency, not to address it directly.
>
> Yes, but AMD is also proposing something similar, and they've already moved
> the memory controller onboard.

Geez, Yousuf, not _everything_ is Intel vs. AMD. ;-). Sometimes a
technical issue is just a technical issue.

I cannot for the life of me get inside the head of whoever makes the
technical calls at Intel, because Intel seems to want to do everything
the hard way. Why, I do not know.

As it happens, Intel's bone-headed approach to computer architecture
works well enough for the kinds of problems I am most interested in,
which involve doing the same thing over and over again in ways that
are stupefyingly predictable and you just want to find a way to do it
very fast. I've often wondered if the secret of the origins of the
Itanium architecture isn't that the engineers who designed it didn't
adequately take into account that most of the world isn't doing
technical computing. That, and the fact that nothing works really
well for the application that matters most, which is OLTP
(on-line transaction processing).

Itanium happens to interest me also as an intellectual sandbox in
which I can come to grips with things that may be completely obvious
to some people, but not to me. It does well enough for the problems
that interest me, and over the long haul, I expect Intel's bulldozer
approach to architecture and marketing to win. Those things together
are why you think I am an Itanium bigot.

RM
 






Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 HardwareBanter.
The comments are property of their posters.