February 2nd 04, 09:13 PM
Paul Spitalny

in wrote:
On Thu, 29 Jan 2004 09:39:52 -0800, Paul Spitalny
wrote:


Hi,
I have a machine with a Pentium 4 2.52 GHz processor and 1 GB of Rambus
memory. I think the bus speed is 500 MHz (or thereabouts)? The machine is
about 1.5 years old.


The question I have is this:

Most of my computer work involves simulations that bring the processor
to its knees (doing floating-point math). I am wondering whether going to
the newest Intel chipset (Pentium 4 Extreme Edition with a 3.2 GHz clock and
800 MHz bus) will give me a significant increase in speed beyond the sheer
clock speed increase. That is, will the speed improvement only be
3.2 GHz / 2.5 GHz = 1.28 (a 28% speed increase), or are the architecture and
bus speed going to give me much more performance than I currently have?



CPU performance doesn't scale linearly with clock rate, so no: everything
else being the same, you'll get less than a 28% increase.

OTOH, the 800 MHz FSB generally seems to cheer up the P4 quite a bit, so
that could be in your favor. It depends on the code, though. Unfortunately,
FP-heavy benchmarks like 3D rendering show zero improvement from the
800 MHz FSB. Sorry.
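As a rough way to see why clock alone gives "less than 28%" while the faster bus might still help, here is a minimal two-part model. It is an assumption, not a measurement: it supposes that some share of the runtime scales with core clock, some share scales with FSB speed, and the rest does not speed up at all. The function name and the fractions are hypothetical; the real fractions would have to be measured on the actual simulation.

```python
def estimated_speedup(cpu_fraction, mem_fraction, clock_ratio, fsb_ratio):
    """Hypothetical two-part model of an upgrade (an assumption, not a measurement).

    cpu_fraction: share of runtime assumed to scale with core clock.
    mem_fraction: share of runtime assumed to scale with front-side-bus speed.
    Whatever remains is assumed not to speed up at all.
    """
    if cpu_fraction < 0 or mem_fraction < 0 or cpu_fraction + mem_fraction > 1.0 + 1e-12:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    # Clamp tiny negative values caused by floating-point rounding.
    unchanged = max(0.0, 1.0 - cpu_fraction - mem_fraction)
    # New runtime relative to an old runtime of 1.0.
    new_time = cpu_fraction / clock_ratio + mem_fraction / fsb_ratio + unchanged
    return 1.0 / new_time


clock_ratio = 3.2 / 2.5  # 1.28, the clock ratio from the question
fsb_ratio = 800 / 500    # 800 MHz FSB vs. the "500 MHz or thereabouts" estimate

# Purely clock-bound code: the idealized best case for clock alone.
print(round(estimated_speedup(1.0, 0.0, clock_ratio, fsb_ratio), 2))  # 1.28
# 80% clock-bound, 20% that does not speed up: below 1.28x.
print(round(estimated_speedup(0.8, 0.0, clock_ratio, fsb_ratio), 2))  # 1.21
# 80% clock-bound, 20% bus-bound: the faster FSB can push past 1.28x.
print(round(estimated_speedup(0.8, 0.2, clock_ratio, fsb_ratio), 2))  # 1.33
```

The point is qualitative: unless essentially all of the runtime scales with core clock, the clock bump alone gives less than 28%, and only a bus-bound share of the work benefits from the 800 MHz FSB. The model ignores cache size and architectural differences entirely.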

But digging deeper into this FP math might be a good idea.
What _kind_ of FP math is it?
Is it compiled to old-fashioned '387 operations?
Or is it autovectorized/optimized for SSE2?
Is it double precision or single?
Does it contain division, and how much?
Are there a lot of conditional instructions and branches?

The P4 is pretty much a wimp on everything FP except vectorized,
straightforward mul, add, and sub using SSE2.
A lot of time-consuming work like matrix/tensor multiplications,
transformations, etc. does fall into that category, though. So it might
be a good idea to make sure the code is compiled with Intel's
auto-vectorizing optimizing compiler.
If everything is optimal, you can get 3-3.5 times the performance on
single-precision FP (this is the kind of performance you see in P4
video encoding). On the other hand, branches and division ruin it all.
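The 3-3.5x single-precision figure is consistent with the register width: one 128-bit SSE/SSE2 register holds four single-precision or two double-precision values, so four (or two) is the ideal ceiling for packed mul/add/sub. The sketch below combines that ceiling with Amdahl's law to show how quickly code that cannot be vectorized (branches, division) eats into it. It is an idealization with hypothetical function names; it ignores memory bandwidth and the differences between x87 and scalar SSE2 code.

```python
XMM_BITS = 128  # width of one SSE/SSE2 register, in bits


def simd_lanes(element_bits):
    """How many values of the given width fit in one 128-bit SSE/SSE2 register."""
    return XMM_BITS // element_bits


def amdahl_speedup(vectorized_fraction, lanes):
    """Ideal speedup if `vectorized_fraction` of the runtime runs `lanes` times
    faster and the rest (branchy or division-heavy code) does not speed up at all."""
    if not 0.0 <= vectorized_fraction <= 1.0:
        raise ValueError("vectorized_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - vectorized_fraction) + vectorized_fraction / lanes)


single = simd_lanes(32)  # 4 single-precision floats per register
double = simd_lanes(64)  # 2 double-precision floats per register
print(single, double)                         # 4 2
print(amdahl_speedup(1.0, single))            # 4.0
print(round(amdahl_speedup(0.5, double), 2))  # 1.33
```

For double-precision code the ceiling is 2x even when everything vectorizes, and with only half the runtime vectorized it drops to about 1.33x.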

Just reading some single benchmark is not going to be of any use to
you. P4/Xeon benchmarks tend to be 100% SSE2, outrageously optimized,
and highly flattering for Intel. Real applications might be a
different thing (scalar '387?). Unless you know what the code looks
like and how it is compiled, you cannot be sure to get the
performance common benchmarks imply.
If you write the software yourself, Intel's compiler is a free
download. Try it if you haven't already.

The other suggestion is to try AMD instead. All AMD families
(Athlon XP, Athlon 64, Opteron) are brutish on scalar '387 math. They
also handle branches, division, and underflow/overflow better than the P4.
Try borrowing an Athlon XP and see if the code suits it better.

Ancra

Hi Ancra,
Well, I asked the software vendor (the company whose simulation software
I use) about the mathematics in their program, and this is
what they said:

Q: Does the code mostly use floating-point math operations? If so, then:
What _kind_ of floating point math is it?
Is it compiled to old fashioned '387 operations?

A: Yes. We compile a generic version that must run on as many existing
x86 processors as possible. As a result, we don't optimize the code for
any particular x86 instruction-set extension.

Q: Or is it autovectorized/optimized for SSE2?
A: No.

Q: Is it double precision or single?
A: Double, as in the original Berkeley Spice 3.

Q: Does it contain division, how much?
A: It's hard to tell. It depends on what you want to simulate with
SmartSpice.

Q: Is there a lot of conditional instructions, branches?
A: Sure.

Q: Is the code (for Windows) compiled with Intel's auto-vectorizing
optimizing compiler?
A: No.

That being the case, I wonder how to proceed. I can't help but think that
the newest "Extreme" Pentium (now up to a 3.4 GHz clock and 800 MHz FSB)
has got to be significantly faster than my older 2.5 GHz Pentium 4 (with
Rambus memory). The "Extreme" processor has 1 MB of L2 cache, and you
would think that'd help too.

Or do you feel the AMD chips might be better, since they are known
for better floating-point performance? You see, the guys I get my
software from, as they mention above, don't compile for specific
processors or optimize for performance.

By the way, thank you for your response to my posting!!

Paul