X2 vs X4


Dave[_33_]
September 18th 08, 07:48 PM
If an application does NOT support multiple CPU cores, will it run
slower on a Phenom 2.4 GHz CPU than it would on an X2 2.4 GHz CPU?

I currently have a 2.2 GHz X2 and I want to upgrade it. My motherboard
supports the Phenom X4, but from what I'm reading, software that doesn't
support multiple cores may run slower if I do.

Any advice?

Zootal
September 18th 08, 11:54 PM
Multithreading (SMT) CPUs can make some software slow down. Multi-core CPUs
will not, unless the speed of a core itself is slower. Then the slowdown is
caused by the slower core, not by the fact that it's a multi-core CPU.

OTOH....let's stop and think a bit. If I have a multi-core cpu with
non-shared caches, then I now have cache coherency issues to deal with if
the cpu scheduler for some reason moves my task to a different core. Any
cache lines I try to access that aren't in the current cache will have to be
copied from the cache it resides in, or from memory if it's no longer in any
cache. So maybe the answer to the question of performance is "it depends"?
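
One way to check whether core migration matters for a particular program is to pin it to a single core and compare timings against an unpinned run. What follows is only a minimal sketch, assuming Linux (os.sched_setaffinity is not available on every platform); the helper name pin_to_one_cpu is made up for illustration.

```python
import os

def pin_to_one_cpu():
    """Restrict the current process to a single CPU it is already allowed to use.

    Returns the new affinity set, or None where the platform lacks the call.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # assumption: no affinity API in the os module on this platform
    allowed = os.sched_getaffinity(0)   # 0 means "this process"
    target = min(allowed)               # pick the lowest-numbered allowed CPU
    os.sched_setaffinity(0, {target})   # the scheduler can no longer migrate us
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("now pinned to:", pin_to_one_cpu())
```

If a pinned run of a cache-heavy workload is measurably faster than an unpinned one, migration between cores with separate caches is a plausible cause; if the timings match, it probably is not a factor for that workload.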

Where did you read that software that doesn't support multiple cores may run
slower?


"Dave" > wrote in message
s.com...
> If an application does NOT support multiple CPU cores, will it run
> slower on a Phenom 2.4 GHz CPU than it would on an X2 2.4 GHz CPU?
>
> I currently have a 2.2 GHz X2 and I want to upgrade it. My motherboard
> supports the Phemon X4 but from what I'm reading, software that doesn't
> support multiple cores may run slower if I do.
>
> Any advice?

Ed Light
September 19th 08, 01:14 AM
One core of a Phenom will run a little faster than one core of an X2,
given the same clock speed.
--
Ed Light


Dave[_33_]
September 19th 08, 01:16 PM
In article >,
says...
> Where did you read that software that doesn't support multiple cores may run
> slower?

A thread on CraigsList a few days ago. Several people were discussing
performance issues and stated that software that does not support
multiple cores runs slower on a multi-core CPU than on a non-multi-core
CPU. Nobody disagreed with that statement in the thread.

Dave[_33_]
September 19th 08, 01:17 PM
In article >,
says...
> One core of a Phenom will run a little faster than one core of an X2,
> given the same clock speed.

Thank you SOO much. That is great to know.

--Dave

Dave Feustel
September 19th 08, 02:24 PM
The effective clock speed of a single core in a multiple core chip is
the chip clock speed divided by the number of cores, so an application
running on a single core in a multicore chip will run slower than the
same app running on a single core cpu. BUT in a multicore cpu the
app will experience fewer task switches for interrupts, etc., because
there are other cores to run the interrupts, etc., on. Since each
core has its own set of registers, less time is spent saving and
restoring register data, of which there is a lot on x64 cores.
So whether a single-threaded app runs faster or slower on a
multicore chip is a little hard to predict a priori.

Scott Lurndal
September 19th 08, 06:13 PM
Dave writes:
>In article >,
says...
>> Where did you read that software that doesn't support multiple cores may run
>> slower?
>
>A thread on CraigsList a few days ago. Several people were discussing
>performance issues and stated that software that does not support
>multiple cores runs slower on a multi-core CPU than on a non-multi-core
>CPU. Nobody disagreed with that statement in the thread.

I'd not consider craigslist to be a top technical forum.

Given identical clock speeds and voltages, a single-threaded application
will perform equally on a single core or a multi-core box. The multi-core
box will, of course, be able to run multiple copies of the single-threaded
application much faster than the single core box.

When one considers that a typical operating system often has dozens of
processes running other than the "foreground application", an application
on a multicore system _may_ perform better, because the operating system processes
can run on the other core freeing capacity for the application. Now, this
really only holds if the application is using 80% or more of the processor (e.g.
mp3 encoders, video transcoders, numerical analysis applications, etc). Most
graphical applications seldom use significant amounts of processing power.

scott

Scott Lurndal
September 19th 08, 06:16 PM
Dave Feustel writes:
>The effective clock speed of a single core in a multiple core chip is
>the chip clock speed divided by the number of cores,

This is incorrect.

All cores run at the same clock speed, which is the 'chip clock speed'. Of course
the power-management capabilities of the processor allow the operating system to
individually ramp down the voltages and frequencies of each core to allow them to
run slower (when idle), but the norm is for all cores to run at the same clock
speed, which is equal to (not a fraction of) the chip clock speed.

So-called SMT (aka Hyperthreading) is different, in that the secondary thread is
leveraging otherwise idle execution and load/store resources on a single core.

scott

Zootal
September 19th 08, 07:39 PM
> So called SMT (aka Hyperthreading) is different, in that the secondary
> thread is
> leveraging otherwise idle execution and load/store resources on a single
> core.
>
> scott

I don't get this - what can hyperthreading do that a good CPU scheduler
can't do? If I have two virtual cores, I have to have two schedulers running
(one for each virtual CPU), each with its own set of queues and each with
50% of the CPU time. Is that more efficient than one single scheduler that has
100% of the CPU time?

Scott Lurndal
September 19th 08, 11:41 PM
"Zootal" > writes:
>> So called SMT (aka Hyperthreading) is different, in that the secondary
>> thread is
>> leveraging otherwise idle execution and load/store resources on a single
>> core.
>>
>> scott
>
>I don't get this - what can hyperthreading do that a good cpu scheduler
>can't do?

Leverage otherwise idle resources in the core. A core typically has
two or more integer ALUs and one or more floating-point units (FPUs). These
allow superscalar behaviour, i.e. multiple instructions can be in flight
at the same time (multiple issue). However, for many instruction streams, not all
of the ALUs and FPUs are used, so a second 'logical' processor (the
hyperthread) can be made available to the operating system to take advantage
of those idle resources.

Note that even with HT/SMT, the operating system sees them as two
distinct cores, even though they aren't really stand-alone cores.

A four physical core processor with SMT will appear to the
operating system as 8 logical cores.
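
The logical-versus-physical distinction is easy to see from user space. A small sketch: os.cpu_count() reports logical CPUs, so SMT threads are counted separately. The helper names and the 4 x 2 layout are illustrative assumptions, and the standard library does not report physical cores directly.

```python
import os

def logical_cpu_count():
    """Number of logical CPUs the OS reports (SMT threads count separately)."""
    return os.cpu_count() or 1  # cpu_count() may return None if undetermined

def expected_logical(physical_cores, threads_per_core):
    """Logical CPUs the OS should see for a given physical layout."""
    return physical_cores * threads_per_core

if __name__ == "__main__":
    print("OS reports", logical_cpu_count(), "logical CPUs")
    print("4 cores x 2 SMT threads =", expected_logical(4, 2))
```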

>
> If I have two virtual cores, I have to have two schedulers running
>(one for each virtual cpu), each with their own set of queues and each with
>50% cpu time. Is that more efficient then one single scheduler that has 100%
>cpu time?

There is only one scheduler in a typical operating system. It schedules
across all logical cores and is typically NUMA- and SMT-aware in order to
make optimal scheduling decisions. NUMA awareness means scheduling
user threads/tasks on a CPU close to the memory they use. SMT-aware schedulers
understand that resources are shared and attempt to schedule related threads
(i.e. threads from the same process/job/task) on the secondary threads of the
same core.

scott

DevilsPGD[_2_]
September 20th 08, 12:25 AM
"Zootal" wrote:

>I don't get this - what can hyperthreading do that a good cpu scheduler
>can't do? If I have two virtual cores, I have to have two schedulers running
>(one for each virtual cpu), each with their own set of queues and each with
>50% cpu time. Is that more efficient then one single scheduler that has 100%
>cpu time?

The problem that Hyperthreading was designed to solve is that the P4
series has an extremely long pipeline.

In other words, it takes many cycles to get instructions to the CPU, and
for the CPU to send instructions to pull data to/from memory or other
hardware components.

Hyperthreading was designed to help/encourage existing OSes to schedule
multiple threads/workloads so that the CPU can run them, from the OS's
point of view, concurrently, rather than waiting for one workload to
finish before sending another.

Zootal
September 20th 08, 01:40 AM
> In other words, it takes many cycles to get instructions to the CPU, and
> for the CPU to send instructions to pull data to/from memory or other
> hardware components.

That isn't exactly correct - the long pipeline *is* the cpu, it just takes a
lot of cycles to make it through the pipeline. In order to get the
advertised clock speed, they had to make the pipeline longer. The P4
Prescott 3.8GHz has a 31 cycle pipeline.
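
To put the 31-cycle figure in perspective, here is a back-of-the-envelope calculation of how long one instruction takes to pass through the whole pipeline. It assumes one stage per clock cycle and no stalls, which is a simplification; the function name is made up for illustration.

```python
def pipeline_latency_ns(stages, clock_ghz):
    """Nanoseconds for one instruction to traverse the pipeline, one stage per cycle."""
    return stages / clock_ghz  # at 1 GHz one cycle lasts 1 ns

if __name__ == "__main__":
    print(f"{pipeline_latency_ns(31, 3.8):.2f} ns end to end")
```

That works out to roughly 8 ns. Once the pipeline is full this latency is hidden, because a new instruction can enter every cycle; it matters mainly when the pipeline has to be refilled, for example after a mispredicted branch.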

Dave Feustel[_2_]
September 21st 08, 02:21 PM
Bill wrote:
> In article >,
> says...
>> The effective clock speed of a single core in a multiple core chip is
>> the chip clock speed divided by the number of cores,
>
> Have you got a cite for that?
>
> <snip>
>
> Bill

The person who told me this is Miles R***, a person who sells computers
for a living. If the cores ran at the chip's nominal clock speed, a
four-core chip would perform 4 times faster than a single-core chip at
the same clock speed, which it doesn't. And the power consumption would
be much higher. So I think Miles is correct.

Miles Bader[_2_]
September 21st 08, 02:35 PM
Dave Feustel writes:
> The person who told me this is Miles R***, a person who sells computers
> for a living. If the cores ran at the chip's nominal clock speed, a
> four-core chip would perform 4 times faster than a single core chip at
> the same clock speed, which they don't. And the power consumption would
> be much higher. So I think Miles is correct.

No, this is not correct.

Either you misinterpreted "Miles R***", or he is quite ignorant about
his own product (or both).

-Miles

--
Genealogy, n. An account of one's descent from an ancestor who did not
particularly care to trace his own.

Rodney Pont[_3_]
September 21st 08, 03:13 PM
On Sun, 21 Sep 2008 08:21:19 -0500, Dave Feustel wrote:

>The person who told me this is Miles R***, a person who sells computers
>for a living. If the cores ran at the chip's nominal clock speed, a
>four-core chip would perform 4 times faster than a single core chip at
>the same clock speed, which they don't. And the power consumption would
>be much higher. So I think Miles is correct.

The four core chip can only run an application on all four cores if
it's threaded and at least 4 threads have work that can be run
simultaneously. Even in threaded applications this can't always happen
unless the threads are doing something that doesn't depend on others,
say converting a video file where each core can be given a section of
the file to convert.
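
The usual way to put numbers on this is Amdahl's law, which is not mentioned in the thread but describes the same effect: if only a fraction p of the work can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n). A minimal sketch, with a made-up function name and example fractions chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"parallel fraction {p:.1f}: "
              f"2 cores -> {amdahl_speedup(p, 2):.2f}x, "
              f"4 cores -> {amdahl_speedup(p, 4):.2f}x")
```

A purely single-threaded program (p = 0) gets a speedup of exactly 1 on any number of cores: no faster, but no slower either. Only a perfectly parallel job (p = 1) reaches the full 4x on four cores.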

I can see how he came to the conclusion though if he ran a single
threaded application and it ran four times slower than expected, since
it ran on only one core. Get him to run four of them at the same time
and they should complete in nearly the same time as one providing he
isn't running anything else at that time.

As for power consumption, my dual-core chip uses 45 watts and the quad-core
version uses 95 watts. Taking into account the extra circuitry for
the 4 cores, it's about right.

--
Regards - Rodney Pont
The from address exists but is mostly dumped,
please send any emails to the address below
e-mail ngpsm4 (at) infohitsystems (dot) ltd (dot) uk

Jim Beard[_2_]
September 21st 08, 05:45 PM
Rodney Pont wrote:
> On Sun, 21 Sep 2008 08:21:19 -0500, Dave Feustel wrote:
>
>> The person who told me this is Miles R***, a person who sells computers
>> for a living. If the cores ran at the chip's nominal clock speed, a
>> four-core chip would perform 4 times faster than a single core chip at
>> the same clock speed, which they don't. And the power consumption would
>> be much higher. So I think Miles is correct.
>
> The four core chip can only run an application on all four cores if
> it's threaded and at least 4 threads have work that can be run
> simultaneously. Even in threaded applications this can't always happen
> unless the threads are doing something that doesn't depend on others,
> say converting a video file where each core can be given a section of
> the file to convert.
>
> I can see how he came to the conclusion though if he ran a single
> threaded application and it ran four times slower than expected, since
> it ran on only one core. Get him to run four of them at the same time
> and they should complete in nearly the same time as one providing he
> isn't running anything else at that time.
>
> As for power consumption my dual core chip uses 45 watts and the quad
> core version uses 95 watts. Taking into account the extra circuitry for
> the 4 cores it's about right.

One must also bear in mind that a dual-core or quad-core CPU has to
devote some processing time to deciding what to run on which core,
and when. This more intricate scheduling task routinely results in a
single-core process running more slowly (all things included) on a
multi-core CPU than it would on a lightly loaded single-core CPU.

Whatever the CPU speed is in GHz or MHz, all cores will work at that
speed unless power management software readjusts the speed. That
does not mean that all that speed is usable, though. You still have
delays due to I/O requirements, scheduling delays, wait states, and a
host of other bottlenecks, real and potential. My home computer is
an AMD 64-bit 5000+ dual-core, and CPU usage typically is in the 1 to
3 percent range when I am not compiling or doing some other
CPU-intensive task. This does not mean that all tasks complete
instantaneously, nor that response time is zero (though it is very nice,
I will admit).

Specifically with respect to X2 vs X4, the kernel scheduler will do a
fairly good job of using two CPUs, but rarely does well with more
than two unless the applications are specifically tailored for
multi-CPU usage. Thus, the percentage gain in performance from
shifting from single to dual-core cpu is likely to be significantly
greater than the percentage gain from shifting from dual-core to
quad-core, unless you have software tailored for the additional cores.

The big question, of course, is, are your applications CPU-intensive
enough to make use of the available capacity, regardless of number of
cores? If the computer is not heavily loaded at least part of the
time, the answer is likely to be no.

Cheers!

jim b.




--
UNIX is not user unfriendly; it merely
expects users to be computer-friendly.

Scott Lurndal
September 21st 08, 08:56 PM
Jim Beard writes:

>
>Specifically with respect to X2 vs X4, the kernel scheduler will do a
>fairly good job of using two CPUs, but rarely does well with more
>than two unless the applications are specifically tailored for

Maybe with respect to Windows, but Linux schedulers are O(1) over
large numbers of cores.

Scheduler overhead is pretty much non-existent.

scott

Richard P
September 22nd 08, 12:28 AM
Bill wrote:
> In article >,
> says...
>> Bill > wrote:
>>> In article >,
>>> says...
>>>> The effective clock speed of a single core in a multiple core chip is
>>>> the chip clock speed divided by the number of cores,
>>> Have you got a cite for that?
>>>
>>> <snip>
>>>
>>> Bill
>> The person who told me this is Miles R***, a person who sells computers
>> for a living. If the cores ran at the chip's nominal clock speed, a
>> four-core chip would perform 4 times faster than a single core chip at
>> the same clock speed, which they don't. And the power consumption would
>> be much higher. So I think Miles is correct.
>>
>
> You're entitled to your opinion, but as far as "The effective clock
> speed of a single core in a multiple core chip is the chip clock
> speed divided by the number of cores" is concerned Miles R*** is full
> of *****, and you can tell him I said so.
>
> You need to get to Intel's or AMD's website and do some reading.
>
> Bill
I have an X4, and each core runs at 2.5 GHz by default.

DevilsPGD[_2_]
September 23rd 08, 07:14 AM
Dave Feustel wrote:

>The person who told me this is Miles R***, a person who sells computers
>for a living.

"Never trust someone trying to sell you something" comes to mind.

>If the cores ran at the chip's nominal clock speed, a
>four-core chip would perform 4 times faster than a single core chip at
>the same clock speed, which they don't.

Depending on your task, a four-core CPU can deliver reasonably close to
four times the throughput of a single-core CPU at the same clock speed.
Unfortunately, few tasks parallelize that well, and even less software
takes full advantage of modern CPUs.

That being said, aside from some shady marketing in the past advertising
dual-CPU systems as double the clock speed of one CPU rather than
advertising the actual configuration, each core runs at the full clock
speed advertised.

Dave Feustel[_2_]
September 23rd 08, 01:44 PM
DevilsPGD wrote:
> In message > Dave Feustel
> > wrote:
>
>>The person who told me this is Miles R***, a person who sells computers
>>for a living.
>
> "Never trust someone trying to sell you something" comes to mind.
>
>>If the cores ran at the chip's nominal clock speed, a
>>four-core chip would perform 4 times faster than a single core chip at
>>the same clock speed, which they don't.
>
> Depending on your task, a four-core CPU can perform reasonably close to
> four times the clock speed of a single core CPU. Unfortunately, few
> tasks parrallelize that well, and even less software takes full
> advantage of modern CPUs.
>
> That being said, aside from some shady marketing in the past advertising
> dual CPU systems as double the clock speed of one CPU rather then
> advertising the actual configuration, each core runs at the full clock
>> speed advertised.

So the 4 core chip cpu should run 4 independent identical tasks (compute
pi to 1 million digits) in essentially the same time that a single core
runs one instance of that task?

Richard P
September 23rd 08, 06:07 PM
Dave Feustel wrote:
> DevilsPGD > wrote:
>> In message > Dave Feustel
>> > wrote:
>>
>>> The person who told me this is Miles R***, a person who sells computers
>>> for a living.
>> "Never trust someone trying to sell you something" comes to mind.
>>
>>> If the cores ran at the chip's nominal clock speed, a
>>> four-core chip would perform 4 times faster than a single core chip at
>>> the same clock speed, which they don't.
>> Depending on your task, a four-core CPU can perform reasonably close to
>> four times the clock speed of a single core CPU. Unfortunately, few
>> tasks parrallelize that well, and even less software takes full
>> advantage of modern CPUs.
>>
>> That being said, aside from some shady marketing in the past advertising
>> dual CPU systems as double the clock speed of one CPU rather then
>> advertising the actual configuration, each core runs at the full clock
>>> speed advertised.
>
> So the 4 core chip cpu should run 4 independent identical tasks (compute
> pi to 1 million digits) in essentially the same time that a single core
> runs one instance of that task?

Yes

DevilsPGD[_2_]
September 23rd 08, 08:39 PM
Dave Feustel wrote:

>DevilsPGD > wrote:
>> Depending on your task, a four-core CPU can perform reasonably close to
>> four times the clock speed of a single core CPU. Unfortunately, few
>> tasks parrallelize that well, and even less software takes full
>> advantage of modern CPUs.
>>
>> That being said, aside from some shady marketing in the past advertising
>> dual CPU systems as double the clock speed of one CPU rather then
>> advertising the actual configuration, each core runs at the full clock
>>> speed advertised.
>
>So the 4 core chip cpu should run 4 independent identical tasks (compute
>pi to 1 million digits) in essentially the same time that a single core
>runs one instance of that task?

More or less, yes. However, in the real world, not all tasks will scale
quite this well as many tasks require not only CPU resources, but also
other resources which may become starved before you load all four cores.

For something that can be done entirely on-chip, you'll get four times
the performance using all four cores of a quad-core 2.4GHz CPU than you would
from a single-core version of the same 2.4GHz CPU.
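
The idealized case can be written down as a toy model. This is only a sketch under stated assumptions: every core runs at the full advertised clock, the jobs are independent and purely CPU-bound, and nothing else (memory bandwidth, I/O, other processes) competes for resources. The function name and the 10-second job size are invented for illustration.

```python
import math

def batch_time(tasks, cores, cycles_per_task, clock_hz):
    """Seconds to finish `tasks` independent, CPU-bound, single-threaded jobs.

    Idealized: each core runs at the full clock and takes one job per round.
    """
    rounds = math.ceil(tasks / cores)
    return rounds * cycles_per_task / clock_hz

if __name__ == "__main__":
    work = 2.4e9 * 10    # assumed: a job needing 10 s of one 2.4 GHz core
    for cores in (1, 2, 4):
        print(cores, "core(s):", batch_time(4, cores, work, 2.4e9), "s for 4 jobs")
```

In this model four jobs on four cores finish in the same time as one job on one core, while a single job takes the same time on one core or four. That is the opposite of what the "clock speed divided by number of cores" theory would predict.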

Zootal
September 24th 08, 10:06 PM
"Scott Lurndal" > wrote in message
...
> Jim Beard > writes:
>
>>
>>Specifically with respect to X2 vs X4, the kernel scheduler will do a
>>fairly good job of using two CPUs, but rarely does well with more
>>than two unless the applications are specifically tailored for
>
> maybe with respect to windows, but linux schedulers are O(1) over
> large numbers of cores.
>
> scheduler overhead is pretty much non-existent.
>
> scott

Are you sure about that? Each CPU has its own set of runqueues. If I have 4
CPUs, I have 4 sets of runqueues to manage, and 4 sets of runqueues to
search. The runqueue itself can be searched for the next entry in O(1)
time - this is where the O(1) comes from, because the amount of time it
takes to find the next task in the queue is constant and not dependent on
the number of tasks in the queue.

I would think that the default Linux scheduler is O(n) over a large
number of cores, where n = the number of cores.

Scott Lurndal
September 24th 08, 11:12 PM
"Zootal" > writes:
>
>"Scott Lurndal" > wrote in message
...
>> Jim Beard > writes:
>>
>>>
>>>Specifically with respect to X2 vs X4, the kernel scheduler will do a
>>>fairly good job of using two CPUs, but rarely does well with more
>>>than two unless the applications are specifically tailored for
>>
>> maybe with respect to windows, but linux schedulers are O(1) over
>> large numbers of cores.
>>
>> scheduler overhead is pretty much non-existent.
>>
>> scott
>
>Are you sure about that? Each cpu has its own set of runqueues. If I have 4
>cpus, I have 4 sets of runqueues to manage, and 4 sets of runqueues to
>search. The runqueue itself can be searched for the next entry in O(1)
>time - this is where the O(1) comes from, because the amount of time it
>takes to find the next task in the queue is constant and not dependant by
>the number of tasks in the queue.
>
>I would think that that the default linux scheduler is O(n) over large
>number of cores, where n = the number of cores.
>
>

If you have a runqueue per core, then you simply schedule the next
entry in the queue for each core. O(1). Remember that code is shared by all
processors, and scheduling happens in-context - there is not a
scheduler "thread" or "job" or "task" per se.
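
The constant-time pick being discussed can be illustrated with a toy per-core run queue: one FIFO per priority level plus a bitmap marking which levels are non-empty. This is a simplified sketch of the general idea, not the kernel's actual code; the class name, method names, and the choice of 140 levels are assumptions made for the example.

```python
from collections import deque

NUM_PRIORITIES = 140  # assumed level count for this sketch; 0 is most urgent

class RunQueue:
    """One per-core run queue: a FIFO per priority plus a bitmap of non-empty levels."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queues[p] holds at least one task

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        """Return the most urgent task, or None if nothing is runnable."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; its index is the most urgent non-empty level.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[priority].popleft()
        if not self.queues[priority]:
            self.bitmap &= ~(1 << priority)  # level drained, clear its bit
        return task

if __name__ == "__main__":
    per_core = [RunQueue() for _ in range(4)]   # one queue per core
    per_core[0].enqueue("encoder", 20)
    per_core[0].enqueue("irq-helper", 1)
    print(per_core[0].pick_next())              # core 0 consults only its own queue
```

The work done in pick_next depends on the fixed number of priority levels, not on how many tasks are waiting, and each core only looks at its own queue when it needs its next task. Balancing tasks between the queues is a separate job that this sketch leaves out.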

scott