#1
Memory Transactions per Second.
"Robert Redelmeier" wrote in message ...

In alt.lang.asm Skybuck Flying wrote in part:

" Why don't you write a program that simply reads a lot of memory locations (say 100 billion) and use the wall-clock timer to determine how much time it takes, and from there the real bandwidth. You just have to make the program access a similar data set (i.e. random locations) as the one you want to use in real life.

That's pretty hard to do when I don't have the system, don't you think? I want to know this information so I can decide which system to buy. And there are a lot of systems out there!

" Sure, but a friendly store would let you run test pgms. Some time ago, I wrote (apologies for the C):

LOL. Next time I will ask the webstore if they want to run a program like that. Of course there is a problem with your reasoning: if it's a new system, I might not yet know how to program it.

" /* lat10m.c - Measure latency of 10 million fresh memory reads

Only 10 million? I think my AMD X2 3800+ dual core can easily do 180 million per core per second. So your test is already flawed. Doesn't that scare you a bit? I'd rather have the manufacturer figure it out... if they can't figure it out, then who can?

Also, another potential problem is running out of memory. Suppose the processor is 1 terahertz and the memory is only 10 gigabytes; then one can't simply generate a "unique" table and pointer-chase-walk it. There is a chance that some of the memory addresses will be cached, and the more transactions happen, the more chance they get cached. However, as long as the theoretical maximum number of memory transactions is less than the amount of memory, this is for now not yet a problem, but it could become one later on. A small problem perhaps, but nonetheless.

A simple solution would be an option in the BIOS/processor to turn caching off, for proper memory benchmarking. (My Pentium III can actually do it; the AMD X2 3800+ cannot, as far as I know?!)

Perhaps I should run a memtest on my Pentium III one day with caching off... just to see what happens! LOL =D

Bye,
  Skybuck.
#2
In alt.lang.asm Skybuck Flying wrote in part:

" LOL. Next time I will ask the webstore if they want to run a program like that.

Why not? All they need to do under Linux is make the systems visible via ssh and give you a guest login.

" Of course there is a problem with your reasoning: if it's a new system, I might not yet know how to program it.

It'd better have a C compiler somewhere. Then you could run my code.

" /* lat10m.c - Measure latency of 10 million fresh memory reads

" Only 10 million? I think my AMD X2 3800+ dual core can easily do 180 million per core per second.

Maybe in-order. The quad Athlon/DDR3 I tested could only do 12 million pseudo-random. I could try it multicore.

" So your test is already flawed. Doesn't that scare you a bit?

Not in the least. I know it's flawed. Going to 100M iterations is an easy change.

" I'd rather have the manufacturer figure it out... if they can't figure it out, then who can?

No-one -- memory performance will depend on the DRAM used (timings), which the CPU mfr doesn't control.

" Also another potential problem is running out of memory. Suppose the processor is 1 terahertz and the memory is only 10 gigabytes; then one can't simply generate a "unique" table and pointer-chase-walk it. There is a chance that some of the memory addresses will be cached, and the more transactions happen, the more chance they get cached.

Not really -- if I have 10 GB RAM and 10 MB cache, the chance a random location will be in cache is 0.001.

-- Robert
#3
"Robert Redelmeier" wrote in message ...

LOL. Next time I will ask the webstore if they want to run a program like that.

" Why not? All they need to do under Linux is make the systems visible via ssh and give you a guest login.

Way too complex, way too dangerous to connect systems like that. And they don't trust it and don't have the time for it. And the systems I am interested in are probably in pieces.

Of course there is a problem with your reasoning: if it's a new system, I might not yet know how to program it.

" It'd better have a C compiler somewhere. Then you could run my code.

Unlikely, since I am interested in CUDA and other parallel systems, which require special programming and compilers.

" /* lat10m.c - Measure latency of 10 million fresh memory reads

Only 10 million? I think my AMD X2 3800+ dual core can easily do 180 million per core per second.

" Maybe in-order. The quad Athlon/DDR3 I tested could only do 12 million pseudo-random. I could try it multicore.

Another big flaw in your test program: your benchmark is probably limited by the random number generator. It makes sense... go view some benchmarks about integer performance.

So your test is already flawed. Doesn't that scare you a bit?

" Not in the least. I know it's flawed. Going to 100M iterations is an easy change.

Unlikely, since it's flaw after flaw... it needs a new approach. It's gonna take you a while to do it properly.

I'd rather have the manufacturer figure it out... if they can't figure it out, then who can?

" No-one -- memory performance will depend on the DRAM used (timings), which the CPU mfr doesn't control.

Timings are a start. My formula for now is:

MemoryTransactionPerSecond = MemoryClockInHertz / BytesPerTransaction

^ Seems pretty good to me, and pretty simple too!

Also another potential problem is running out of memory. Suppose the processor is 1 terahertz and the memory is only 10 gigabytes; then one can't simply generate a "unique" table and pointer-chase-walk it. There is a chance that some of the memory addresses will be cached, and the more transactions happen, the more chance they get cached.

" Not really -- if I have 10 GB RAM and 10 MB cache, the chance a random location will be in cache is 0.001.

Not quite. First of all, the table requires 64-bit memory addresses; using anything less wouldn't make much sense, since that would slow it down even further. Therefore 10 GB must be divided by 8 bytes, which gives 1.342.177.280 possible memory locations. The cache itself is probably even worse and stores 64 bytes per cache line or so, which means: 10 MB / 64 = 163.840 memory locations.

It's highly likely that all cache lines will be overwritten each time and that the chance of actually hitting anything is near zero. This is called cache thrashing. So in a way I dismissed my own worries about the cache; nonetheless it's an interesting calculation, so let's go on.

The real chance of hitting the cache is: 163.840 / 1.342.177.280 = 0,0001220703125

An order of magnitude worse than you thought; another big flaw in your reasoning! =D

One could wonder why caches are added at all if their chance of being hit is 0,0001220703125 (about 0,012%) in this scenario... which would make them absolutely worthless. At least the data cache would be worthless... the instruction cache would be nice... though perhaps the data cache is also still nice for static pointers to tables or so... pointers which never change... like base addresses.

However, the story does not end yet... because you have another flaw in your reasoning... another missing factor in the equation. The chance must be multiplied by the number of tries, which depends on the actual speed of the memory system. I mentioned a 1 terahertz processor, so let's keep it realistic and divide that by 100 for a realistic memory system.

That's 10 GHz for the memory system, which means 10.000.000.000 * 0,0001220703125 = 1.220.703 cache hits per second!

So it turns out the cache is not useless after all, and does play a big role... what a surprising turn of the story, isn't it! So again, another big flaw in your reasoning.

Let's calculate how many additional hertz that cache gives (at, say, 100 cycles saved per hit): 1.220.703 * 100 = 122.070.300

That's roughly an additional 122 MHz, just because of the cache. That's not so bad... it could of course be a lot better if the memory seeks were within the cache's range, but that's just the point: they are not.

Let's see what percentage that is of the entire memory system: (122 MHz / 10 GHz) * 100% = 1,22%

That's still a bit too small for my taste! =D

I hope you're not a chip designer or anything, cause your tests suck big time! =D

Bye,
  Skybuck =D