#1
Why does this memory intensive C++ program get poor memory access speed?
MemTest86 and it showed:
Intel Core-i5 750 2.67 GHz (quad core)
32K L1      88,893 MB/Sec
256K L2     37,560 MB/Sec
8 MB L3     26,145 MB/Sec
8.0 GB RAM  11,852 MB/Sec

The resulting memory access speed is substantially slower than the worst-case cache hit ratio should provide. For example, I am often getting 117 MB/Sec.

It was compiled with the optimize-for-speed flags under VS C++ 9.0.

#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <time.h>

typedef unsigned int uint32;

uint32 Max = 0x2fffffff;

uint32 GetRandom(uint32 size)
{
  return (rand() * (RAND_MAX + 1) + rand()) % size;
}

void Initialize(std::vector<uint32>& Data, uint32 size)
{
  for (uint32 N = 0; N < size; N++)
    Data[N] = GetRandom(size);
}

double Process(uint32 size, uint32 RandomSeed = 0)
{
  std::vector<uint32> Data;
  double MBperSec;
  double duration;
  clock_t finish;

  Data.resize(size);
  Initialize(Data, size);

  clock_t start = clock();
  uint32 num = 0;
  for (uint32 N = 0; N < Max; N++)
    num = Data[num];
  finish = clock();

  duration = (double)(finish - start) / CLOCKS_PER_SEC;
  MBperSec = (double)(Max * 4) / (duration * 1024 * 1024);
  printf("%4d MegaBytes %7.2f Seconds %7.2f Megabytes per Second\n", (size*4) / 1048576, duration, MBperSec);
  return MBperSec;
}

int main()
{
  uint32 Seed = (unsigned)time( NULL );
  Seed = 0x4bae27d4;
  srand(Seed);
  printf("Random Number Seed---%x\n", Seed);

  for (uint32 size = 13107200; size <= 268435456; size += 13107200)
  {
    double AverageMBperSec = 0;
    for (int N = 1; N <= 10; N++)
      AverageMBperSec += Process(size);
    printf("Average Megabytes per Second---%7.2f\n\n", AverageMBperSec / 10.0 );
  }
  return 0;
}
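The 117 MB/Sec figure may be easier to compare when read as a cost per access rather than as a bandwidth. Below is a minimal sketch of that arithmetic. It assumes each loop iteration is one dependent 4-byte read, and that 1 MB is 1024 * 1024 bytes, as in the program's own calculation. The helper names NanosecondsPerAccess and CyclesPerAccess are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: converts a measured throughput into time per access.
// Uses 1 MB = 1024 * 1024 bytes, matching the benchmark's own MBperSec formula.
double NanosecondsPerAccess(double mbPerSec, double bytesPerAccess)
{
    assert(mbPerSec > 0.0 && bytesPerAccess > 0.0);
    double accessesPerSec = mbPerSec * 1024.0 * 1024.0 / bytesPerAccess;
    return 1e9 / accessesPerSec;
}

// Hypothetical helper: expresses a time in nanoseconds as CPU clock cycles.
double CyclesPerAccess(double nanoseconds, double clockGHz)
{
    assert(clockGHz > 0.0);
    return nanoseconds * clockGHz;  // 1 GHz is one cycle per nanosecond
}
```

With the figures from the post, NanosecondsPerAccess(117.0, 4.0) comes to about 32.6 ns per read, or roughly 87 clock cycles at 2.67 GHz.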
#2
Why does this memory intensive C++ program get poor memory access speed?
On 03/28/10 07:09 AM, Peter Olcott wrote:
> MemTest86 and it showed:
>
> Intel Core-i5 750 2.67 GHz (quad core)
> 32K L1      88,893 MB/Sec
> 256K L2     37,560 MB/Sec
> 8 MB L3     26,145 MB/Sec
> 8.0 GB RAM  11,852 MB/Sec
>
> The resulting memory access speed is substantially slower than
> worst case cache hit ratio should provide.

From a C++ perspective, bad style? Seriously, this isn't a C++ question.

[snip]

> double Process(uint32 size, uint32 RandomSeed = 0)
> {
>   std::vector<uint32> Data;
>   double MBperSec;
>   double duration;
>   clock_t finish;
>
>   Data.resize(size);
>   Initialize(Data, size);
>
>   clock_t start = clock();
>   uint32 num = 0;
>   for (uint32 N = 0; N < Max; N++)
>     num = Data[num];

With one exception: I'd expect most optimisers to reduce this loop to a op-op.

--
Ian Collins
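The loop's result, num, is never used after the loop, which is what would allow an optimiser to drop it. One common way to guard against that is to make the result observable. The following is a minimal sketch, not the original poster's code. ChaseIndices is a hypothetical name, and the sketch assumes every element is a valid index into the vector, which the posted Initialize ensures through "% size".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: follows the chain 0 -> data[0] -> data[data[0]] -> ...
// for `steps` hops and returns the last index reached. Because the caller
// receives the value, the loop has a result that can be printed or checked.
std::uint32_t ChaseIndices(const std::vector<std::uint32_t>& data, std::uint32_t steps)
{
    assert(!data.empty());
    std::uint32_t num = 0;
    for (std::uint32_t n = 0; n < steps; ++n)
        num = data[num];  // each read depends on the previous one
    return num;
}
```

Printing the returned value after the clock has been stopped keeps the loop live without adding work inside the timed region.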
#3
Why does this memory intensive C++ program get poor memory access speed?
"Ian Collins" wrote...
> On 03/28/10 07:09 AM, Peter Olcott wrote:
>> MemTest86 and it showed:
>>
>> Intel Core-i5 750 2.67 GHz (quad core)
>> 32K L1      88,893 MB/Sec
>> 256K L2     37,560 MB/Sec
>> 8 MB L3     26,145 MB/Sec
>> 8.0 GB RAM  11,852 MB/Sec
>>
>> The resulting memory access speed is substantially slower than
>> worst case cache hit ratio should provide.

What makes you think that's a "worst case" number? I don't see any statement to that effect by MemTest86 and, absent such, I'd consider it more of a "best case", or maybe "common usage pattern" speed, which your test is neither.

> From a C++ perspective, bad style? Seriously, this isn't a C++ question.

Right, and sorry, can't help but keep it off-topic ;-) Just a few comments below, mostly specific to 32-bit Windows and VC++ v9.

>> double Process(uint32 size, uint32 RandomSeed = 0)
>> {
>>   std::vector<uint32> Data;
>>   double MBperSec;
>>   double duration;
>>   clock_t finish;
>>
>>   Data.resize(size);
>>   Initialize(Data, size);
>>
>>   clock_t start = clock();
>>   uint32 num = 0;
>>   for (uint32 N = 0; N < Max; N++)
>>     num = Data[num];
>
> With one exception: I'd expect most optimisers to reduce this loop to a op-op.

Reading that as a "no-op", and you are right, of course... Just a guess, however, since there was no command line or makefile given, but unless it had a "#define _SECURE_SCL 0" or equivalent somewhere else, the default compile would have used bounds checking for std::vector, which referenced 'num' and saved the loop from being optimized away. Also, the generated loop had an extra memory access since the compiler decided to save 'num' on the stack between iterations.

Anyway, running similar mockup code (with the array holding pointers, rather than offsets, and 'num = Data[num];' replaced by 'pdata = (uint32 *)*pdata;') the resulting assembler code had just the intended single read of memory in a tight loop. With random addresses, as posted, it registered around 150MB/sec on my test machine. With sequential reads, instead, (i.e. replacing Initialize with 'Data[N] = (uint32)&Data[(N + 1) % size];') it went up to 2.5GB/sec, more than 15 times faster.

My numbers above were on a 5+ year old machine with lesser specs than the OP's. For comparison, the PC Wizard memory benchmark listed the "memory bandwidth" at around 4.2GB/sec. Their page at http://www.cpuid.com/pcwizard.php explicitly says...

|| MEMORY and CACHE: These benchmarks measure
|| the maximum achiveable memory bandwidth.

...so I did not find the test numbers surprising. After all, hopping at random around memory could hardly ever be expected to achieve anything near the "maximum bandwidth". It's just another example of why "locality of reference" matters with real life caching schemes.

Liviu
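The pointer-chasing comparison described above can be sketched portably along these lines. This is not the mockup code from the post, which was not shown. It is a rough reconstruction under stated assumptions: the Node type and all function names are hypothetical, real pointers replace the 32-bit cast, and the random case uses a shuffled single ring (so every element is visited once per lap) instead of the posted GetRandom fill.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct Node { Node* next; };  // hypothetical element type: one pointer per slot

// Sequential ring: 0 -> 1 -> ... -> n-1 -> 0 (the "(N + 1) % size" case).
void LinkSequential(std::vector<Node>& nodes)
{
    const std::size_t n = nodes.size();
    for (std::size_t i = 0; i < n; ++i)
        nodes[i].next = &nodes[(i + 1) % n];
}

// Shuffled ring: the same single cycle, but visited in a random order.
void LinkShuffled(std::vector<Node>& nodes, unsigned seed)
{
    const std::size_t n = nodes.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t i = 0; i < n; ++i)
        nodes[order[i]].next = &nodes[order[(i + 1) % n]];
}

// The timed kernel: one dependent read per hop.
const Node* ChasePointers(const Node* p, std::size_t hops)
{
    for (std::size_t i = 0; i < hops; ++i)
        p = p->next;
    return p;
}

// Times `hops` reads and returns nanoseconds per hop. The final node is
// written to *last so the result of the loop is used by the caller.
double NanosecondsPerHop(const std::vector<Node>& nodes, std::size_t hops, const Node** last)
{
    assert(!nodes.empty() && hops > 0 && last != nullptr);
    auto start = std::chrono::steady_clock::now();
    const Node* end = ChasePointers(&nodes[0], hops);
    auto stop = std::chrono::steady_clock::now();
    *last = end;
    return std::chrono::duration<double, std::nano>(stop - start).count() / hops;
}
```

To compare the two access patterns, size the vector well beyond the last-level cache (8 MB L3 on the original poster's machine), link it once with LinkSequential and once with LinkShuffled, and call NanosecondsPerHop on each with the same hop count.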