#1
fastest floating point operation as possible
Hi,
I have a machine with a Pentium 4 2.52GHz processor and 1 GB of Rambus memory. I think the bus speed is 500MHz (or thereabouts). The machine is about 1.5 years old.

The question I have is this: most of my computer work involves simulations that bring the processor to its knees (doing floating point math). I am wondering whether going to the newest Intel processor (Pentium 4 Extreme with a 3.2GHz clock and 800MHz bus) will give me a significant increase in speed beyond the sheer clock speed increase. That is, will the speed improvement be only 3.2GHz/2.5GHz = 1.28 (a 28% speed increase), or will the architecture and bus speed give me much more performance than I currently have?

Thanks,
Paul
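For reference, the clock-only arithmetic in the question can be sketched in a few lines of Python. The function name is made up for illustration; the only inputs are the two clock speeds quoted in the post. With the stated 2.52GHz rather than the rounded 2.5GHz, the ratio comes out slightly lower.

```python
def clock_ratio(new_ghz, old_ghz):
    """Speedup you would see if performance tracked clock speed alone."""
    return new_ghz / old_ghz

# Clock speeds taken from the post; everything else about the machines is ignored.
for old_ghz in (2.5, 2.52):
    ratio = clock_ratio(3.2, old_ghz)
    print(f"3.2GHz / {old_ghz}GHz = {ratio:.2f} ({(ratio - 1) * 100:.0f}% from clock alone)")
```

This is only the naive baseline the question asks about; it says nothing about cache, bus, or instruction-set effects.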
#2
Paul Spitalny wrote:
> Hi, I have a machine with a Pentium 4 2.52GHz processor and 1 GB of Rambus memory. I think the bus speed is 500MHz (or thereabouts). The machine is about 1.5 years old. The question I have is this: most of my computer work involves simulations that bring the processor to its knees (doing floating point math). I am wondering whether going to the newest Intel processor (Pentium 4 Extreme with a 3.2GHz clock and 800MHz bus) will give me a significant increase in speed beyond the sheer clock speed increase. That is, will the speed improvement be only 3.2GHz/2.5GHz = 1.28 (a 28% speed increase), or will the architecture and bus speed give me much more performance than I currently have?

The specfp site seems to indicate that the fastest machine, in some cases, is an AMD Opteron on an ASUS SK8N motherboard. There's lots of info there.

http://www.specbench.org/cpu2000/results/res2003q4/

> Thanks, Paul

--
Al Dykes
#3
Paul Spitalny wrote:
> Hi, I have a machine with a Pentium 4 2.52GHz processor and 1 GB of Rambus memory. I think the bus speed is 500MHz (or thereabouts). The machine is about 1.5 years old. The question I have is this: most of my computer work involves simulations that bring the processor to its knees (doing floating point math). I am wondering whether going to the newest Intel processor (Pentium 4 Extreme with a 3.2GHz clock and 800MHz bus) will give me a significant increase in speed beyond the sheer clock speed increase. That is, will the speed improvement be only 3.2GHz/2.5GHz = 1.28 (a 28% speed increase), or will the architecture and bus speed give me much more performance than I currently have?
> Thanks, Paul

I seem to recall one of the AMD chips being especially good at floating point ops (from browsing the site recently).

Lurker
#4
On Thu, 29 Jan 2004 09:39:52 -0800, Paul Spitalny wrote:
> Hi, I have a machine with a Pentium 4 2.52GHz processor and 1 GB of Rambus memory. I think the bus speed is 500MHz (or thereabouts). The machine is about 1.5 years old. The question I have is this: most of my computer work involves simulations that bring the processor to its knees (doing floating point math). I am wondering whether going to the newest Intel processor (Pentium 4 Extreme with a 3.2GHz clock and 800MHz bus) will give me a significant increase in speed beyond the sheer clock speed increase. That is, will the speed improvement be only 3.2GHz/2.5GHz = 1.28 (a 28% speed increase), or will the architecture and bus speed give me much more performance than I currently have?

CPUs don't scale linearly with clock rate, so no: everything else being the same, you'll get less than a 28% increase. OTOH, the 800MHz FSB seems to cheer up the P4 quite a bit in general, so that could count in its favor. It depends on the code, though. Unfortunately, fp-heavy benchmarks like 3D rendering show zero improvement from the 800MHz FSB. Sorry.

But digging deeper into this fp math might be a good idea. What _kind_ of fp math is it? Is it compiled to old-fashioned '387 operations? Or is it autovectorized/optimized for SSE2? Is it double precision or single? Does it contain division, and how much? Are there a lot of conditional instructions and branches?

The P4 is pretty much a wimp at everything fp except vectorized, straightforward mul, add, sub using SSE2. A lot of time-consuming work like matrix/tensor multiplications, transformations etc. does fall into that category, though. So it might be a good idea to make sure the code is compiled with Intel's auto-vectorizing optimizing compiler. If everything is optimal, you can get 3-3.5 times the performance on single precision fp (this is the kind of performance you see in P4 video encoding). On the other hand, branches and division ruin it all.

Just reading some single benchmark is not going to be of any use to you. P4/Xeon benchmarks tend to be 100% SSE2, outrageously optimized and highly flattering for Intel. Real applications might be a different thing (scalar '387?). Unless you know what the code looks like and how it is compiled, you cannot be sure of getting the performance that common benchmarks imply. If you write the software yourself, Intel's compiler is a free download. Try it if you haven't already.

The other suggestion is to try AMD instead. All AMD families (AthlonXP, Athlon64, Opteron) are brutish on scalar '387 math. They also handle branches, division, and underflow/overflow better than the P4. Try borrowing an AthlonXP and see if the code suits it better.

Ancra
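To put rough numbers on the "3-3.5 times, unless branches and division ruin it" point, here is a small sketch using Amdahl's law. Amdahl's law is not mentioned in the post; it is brought in here as an assumption, and the vectorizable fractions below are made-up illustrative values, not measurements of any real program.

```python
def overall_speedup(vector_fraction, vector_speedup):
    """Amdahl's law: only `vector_fraction` of the run time gets `vector_speedup`."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)

# Hypothetical fractions of run time spent in SSE2-friendly mul/add/sub loops.
for fraction in (0.0, 0.25, 0.5, 0.9, 1.0):
    low = overall_speedup(fraction, 3.0)
    high = overall_speedup(fraction, 3.5)
    print(f"{fraction:4.0%} vectorizable: {low:.2f}x to {high:.2f}x overall")
```

The takeaway under these assumptions is that the headline factor only shows up when nearly all of the run time is in the vectorized part, which is why a single flattering benchmark says little about a branchy scalar code.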
#5
Paul Spitalny writes:
> Most of my computer work involves simulations that bring the processor to its knees (doing floating point math).

Depending on whether you have access to the simulation engine code and whether you want to put in the effort, floating point digital signal processing chips now routinely provide over 3 gigaflops, provided you can get your code to fit inside the constantly increasing on-chip memory of these parts.

Both Texas Instruments and Analog Devices produce such parts, boards and development tools, and there is an assortment of companies mounting these on boards and providing development tools. Some of these even provide multiple processors per board, if your job is suited to that and you decide you actually need to go fast.
#6
#7
Don Taylor wrote:
> Paul Spitalny writes:
>> Most of my computer work involves simulations that bring the processor to its knees (doing floating point math).
> Depending on whether you have access to the simulation engine code and whether you want to put in the effort, floating point digital signal processing chips now routinely provide over 3 gigaflops, provided you can get your code to fit inside the constantly increasing on-chip memory of these parts. Both Texas Instruments and Analog Devices produce such parts, boards and development tools, and there is an assortment of companies mounting these on boards and providing development tools. Some of these even provide multiple processors per board, if your job is suited to that and you decide you actually need to go fast.

Hi Don,

Unfortunately, I don't have access to the source code. But your idea is an interesting one. I am not sure I have the expertise to pull it off, though!

Thanks!
Paul
#8
Paul Spitalny writes:
> Don Taylor wrote:
>> Paul Spitalny writes:
>>> Most of my computer work involves simulations that bring the processor to its knees (doing floating point math).
>> Depending on whether you have access to the simulation engine code and whether you want to put in the effort, floating point digital signal processing chips now routinely provide over 3 gigaflops, provided you can get your code to fit inside the constantly increasing on-chip memory of these parts.
> Unfortunately, I don't have access to the source code. But your idea is an interesting one. I am not sure I have the expertise to pull it off, though!

Reading your other posts, I might suggest asking your Spice vendor how much improvement you are going to get if you switch to a different processor. They certainly should know the answer to this, even if it takes handing over your Spice model for them to run.

And if there is money in the budget, you might compare the speed of the Spice packages available from a few vendors, again perhaps needing to hand over a copy of your typical model.
#9
On Mon, 02 Feb 2004 12:13:51 -0800, Paul Spitalny wrote:
> Hi Ancra,
> Well, I asked the software vendor (the software that I run to do simulation work with) about the mathematics in their program, and this is what they said:
> Q: Does the code use mostly floating point math operations? If so, what _kind_ of floating point math is it? Is it compiled to old fashioned '387 operations?
> A: Yes. We compile a generic version that must be supported by as many existing x86 processors as possible. As a result, we don't optimize the code for any particular x86 instruction set extension.
> Q: Or is it autovectorized/optimized for SSE2?
> A: No.
> Q: Is it double precision or single?
> A: Double, as in the original Berkeley Spice 3.
> Q: Does it contain division, how much?
> A: It's hard to tell. It depends on what you want to simulate with SmartSpice.
> Q: Is there a lot of conditional instructions, branches?
> A: Sure.
> Q: Is the code (for Windows) compiled with Intel's auto vectorizing optimizing compiler?
> A: No.
> That being the case, I wonder how to proceed. I can't help but think that the newest "extreme" Pentium (now up to a 3.4GHz clock and 800MHz FSB) has got to be significantly faster than my older 2.5GHz Pentium 4 (with RAMBUS memory). The "extreme" processor has 1 MB of L2 cache and you would think that'd help too. Or do you feel that the AMD chips might be better, since they are known for better floating point performance? You see, the guys I get my software from, as they mention above, don't compile for specific processors or to optimize performance.
> By the way, thank you for your response to my posting!!

With those answers, it might be worthwhile to try AMD!

Is it just me, or wouldn't it be a simple matter for you to check? Just borrow an AthlonXP machine and try the software on it. You should get some definite indication. The P4 has weaknesses, and they are basically everything you listed. Whatever article you've read, I can almost guarantee you that every single synthetic benchmark is virtually 100% SSE2.

While I can't predict the outcome with any certainty, I think you should definitely see what an AMD CPU makes of it. So I wouldn't look to the P4, Extreme Edition or Prescott for a solution, because my perception is that the Intel architecture might not cut it. Sure, it would be an improvement, but as you might gather, I doubt it will be "fastest".

Also, if a P4EE is going to run continuously at 100% for hours, you need a good cooling solution, or it will just throttle. It's also going to cost an awful lot of money for very little more performance than a vanilla P4C. With that kind of money for a PC, I'd start to go crazy and drool over plans for a Prometheus case, DDR533 and a CPU frozen to -40°C and overclocked 30-40%.

If AMD checks out, a machine that would look attractive to me is the coming socket 939 Athlon64 3400+ or 3700+. Dual channel Athlon64s look like the perfect science/math PC workstation to me.

Spice sounds vaguely familiar. Isn't that analog electric circuitry?

ancra
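The "borrow a machine and try it" suggestion boils down to timing the same job on both boxes. A minimal sketch of such a harness follows, assuming the simulator can be launched from a command line; the command used here is only a stand-in (a trivial Python one-liner), since the real simulator invocation and input file are not given in the thread.

```python
import subprocess
import sys
import time

def time_command(argv, repeats=3):
    """Run a command several times and return the best wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True)  # raises if the command fails
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; replace with the real simulator command and a typical model.
stand_in = [sys.executable, "-c", "sum(i * 0.5 for i in range(200000))"]
print(f"best of 3: {time_command(stand_in):.3f} s")
```

Running the same script with the same input on each candidate machine and comparing the best times gives a direct answer for that particular workload, which is more telling than any published benchmark.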
#10
> Most of my computer work involves simulations that bring the processor to its knees (doing floating point math). I am wondering whether going to the newest Intel processor (Pentium 4 Extreme with a 3.2GHz clock and 800MHz bus) will give me a significant increase in speed beyond the sheer clock speed increase.

I've been watching the answers to this question, because I am in a somewhat similar situation. I have some VERY floating-point intensive analysis programs that typically run for several hours on an Athlon XP2100+. These programs operate on huge arrays of data, so I suspect that the choke point in my situation is memory bandwidth: I am using an old ABIT KT7A that only supports SDRAM at 133 MHz.

As for standard 387 vs. SSE vs. SSE2 optimizations, I wrote the programs myself, so I can compile them to use whatever features are available on the particular processor that I use (Visual Studio .NET Pro). Probably 85% of the fp operations are evenly split among mult, add/sub, and trig functions (sin, cos), while the other 15% are div or division-like (arctan, sqrt). About 10% of the instruction streams involve branches. Right now everything is double precision, but it might be possible to use single precision; I haven't tried it.

So my variation on the original poster's question is: what high-speed system would best solve my memory bandwidth problems, in addition to my processing power problems? How does a DDR 400 Athlon compare with an 800 MHz FSB P4? If I make the jump to an Athlon 64 or P4 Extreme, will their 64-bit data buses offer as much of an advantage as it would seem?

I'm fairly computer-savvy, but frankly I've lost track of how to compare memory speeds on the Athlon with those on the Pentium 4. If somebody could point me to a tutorial, it would be much appreciated.

Thanks,
GB
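As a rough way to compare the memory options named above, theoretical peak bandwidth is just transfers per second times bus width. The sketch below assumes a 64-bit (8-byte) data path for each bus, which is the usual width for PC133 SDRAM, a DDR400 channel, and the P4 front-side bus; the function name is made up for illustration, and real sustained bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# Effective transfer rates in MT/s; the 64-bit width is an assumption stated above.
buses = {
    "PC133 SDRAM (KT7A)": 133,
    "DDR400, one channel": 400,
    "P4 FSB, 800MHz effective": 800,
}
for name, rate in buses.items():
    print(f"{name:26s} {peak_bandwidth_gb_s(rate):.2f} GB/s peak")
```

By this measure both newer options are several times faster than the 133 MHz SDRAM setup, so a bandwidth-bound program could gain more from the memory system than from the clock speed alone. Note that "64-bit" in Athlon 64 refers to registers and addressing, not to a wider memory bus than the ones listed here.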