#11
Can a x86/x64 cpu/memory system be changed into a barrel processor ?
You know, Skybuck, if you don't like the way the x86 behaves, these days it's entirely straightforward to sit down and whip up your own "dream CPU" in an FPGA of your choice. :-) OK, it's not going to run at 2.8GHz, but you can pretty readily get one to operate at 100MHz or so, which, in some specialized applications, can be faster than a much more highly-clocked general-purpose CPU. And you'd get to start cross-posting to comp.arch.fpga as well! Wouldn't that be fun? Time to start reading up on VHDL or Verilog, perhaps? :-)

> I scanned it a little bit; the code assumes in-sequence memory access... pretty lame, it has nothing to do with the R in RAM.

As has been mentioned, these days RAM is so much slower than the CPU that, for truly random access, you can sometimes kill a hundred CPU clock cycles waiting for each result. Painful indeed...

---Joel
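Joel's point about random access stalling on RAM is easy to see for yourself. Here's a rough, hypothetical sketch (not from the thread): walk the same array once in order and once in shuffled order. Both walks do identical work per element; on most machines the shuffled walk is noticeably slower, purely because of cache and DRAM behavior (Python's own overhead blunts the effect compared to C, but the trend usually shows):

```python
import random
import time

N = 1 << 20                      # ~1M entries; big enough to spill out of cache
data = list(range(N))

seq_order = list(range(N))       # 0, 1, 2, ... in order
rand_order = seq_order[:]
random.shuffle(rand_order)       # same indices, arbitrary order

def walk(order):
    # Touch every element exactly once, in the given order.
    total = 0
    for i in order:
        total += data[i]
    return total

t0 = time.perf_counter(); seq_sum = walk(seq_order);  t1 = time.perf_counter()
t2 = time.perf_counter(); rnd_sum = walk(rand_order); t3 = time.perf_counter()

# Both walks read the same elements, so the sums must match;
# only the access *pattern* differs.
assert seq_sum == rnd_sum
print(f"sequential: {t1 - t0:.3f}s  random: {t3 - t2:.3f}s")
```

The gap is much wider in a compiled language, where the loop itself costs almost nothing and the memory stalls dominate.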
#12
In article e.nl, Skybuck Flying wrote:

> Already tried prefetching for RAM, it's pretty useless... Especially for random access, especially for dependencies, especially when the software doesn't yet know what to ask next. However, the problem may be parallelized. However, the CPU still blocks. Therefore making it parallel doesn't help. Only threading helps, but with two or four cores that doesn't impress.

And, depending on the threading model of the processor, it may not help at all. In many CPUs, multiple threads running in the same core share the same local cache and memory bus - run two threads which "fight" over the bus doing the same sorts of inefficient accesses, and the throughput of each thread drops by half (roughly).

> I might read the article later on but I fear I will be wasting my time. I scanned it a little bit; the code assumes in-sequence memory access... pretty lame, it has nothing to do with the R in RAM.

"Random access" does *not* imply "equal, constant-time access". I don't think it has ever done so, at least not in the computing industry. Certain forms of sequential or nearly-sequential access have always been faster, on most "random access" devices. All that "random" means, in this context, is that you are *allowed* to access memory locations in an arbitrary sequence - you are not being *forced* into a purely sequential mode of access.

> Also my memory seeks are very short, 4 to 6 bytes, therefore fetching more is pretty useless.

You're facing a characteristic which is inherent in the way that DRAM works. Your "barrel processor" approach really won't help with this. The characteristic is this: DRAM is organized, internally, into blocks. It takes the DRAM chips a significant amount of time to prepare to transfer data in or out over the memory bus, and it takes a significant amount of time to transfer each byte (or word, or whatever) over the bus to/from the CPU. Every time you want to access a different area of the DRAM, you have to "pay the price" for the time needed to access that part of the chip and transfer the data.

This is, in a sense, no different than what happens when you access a hard drive (which is also "random access"). Time is required to move the head/arm and wait for the platter to rotate. In the case of DRAM, the "motion" is that of electrical charge rather than a physical arm... but it's motion nevertheless (it takes work and expends energy) and it takes time.

In *any* CPU architecture (single, multi-threaded, multi-core, barrel, etc.) that depends on DRAM, you'll run into memory-bus stalls if you try accessing memory in patterns or ways which exceed the capacity of the CPU's own local (static) registers and cache. Your barrel architecture, with a queue of requests submitted but not yet satisfied by the DRAM controller, will run into trouble in just the same way. Eventually your queue of requests will fill up (unless your CPU has an infinite amount of queue space) and you won't be able to queue up any more requests until DRAM gets around to delivering some of the data you asked for a while ago.

A big part of smart program design is figuring out when solving your problem in the "obvious" way (e.g. accessing memory at random) is going to be inherently inefficient, and then figuring out ways to "rewrite the problem" so that it's easier to solve more efficiently. A common approach (dating back many decades) is to figure out ways of sorting some of your inputs, so that you can process them in sorted order more efficiently.

--
Dave Platt AE6EO
Friends of Jade Warrior home page: http://www.radagast.org/jade-warrior
I do _not_ wish to receive unsolicited commercial email, and I will boycott any company which has the gall to send me such ads!
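Dave's "sort your inputs" advice can be sketched in a few lines. This is a hypothetical illustration (the names and data are mine, not from the thread): instead of gathering table entries in arrival order, sort the requested indices first so the table is swept mostly in order, which is far friendlier to caches and DRAM row buffers, then scatter the results back into arrival order:

```python
# The big in-memory table and a batch of lookups in arrival order.
table = [i * i for i in range(1000)]
requests = [523, 17, 998, 17, 400, 3]

# Naive: gather in arrival order (hops randomly around the table).
naive = [table[i] for i in requests]

# Locality-friendly: sort the request *positions* by the index they
# ask for, gather while sweeping the table in ascending order, then
# scatter each value back to the slot it was originally requested from.
order = sorted(range(len(requests)), key=lambda k: requests[k])
gathered = [table[requests[k]] for k in order]   # near-sequential sweep
result = [None] * len(requests)
for slot, value in zip(order, gathered):
    result[slot] = value

# Same answers, different access pattern.
assert result == naive
```

For six lookups this is pointless, of course; the payoff comes with millions of lookups, where the sort cost is repaid by turning a random walk over DRAM into a mostly sequential one.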
#13
What you wrote is old and foolish wisdom. I will give a simple example of how foolish it is: you can spend a great deal of time trying to come up with a better algorithm for the "travelling salesman" problem or whatever, but if you never take a look at the actual transportation device, and it turns out it was implemented with snails, it's useless nonetheless.

Only god knows how many programmers have wasted time after time after time trying to implement something - some algorithm, some program - and ultimately end up with useless slow crap that nobody in this world needs. If your software competitor does understand hardware better and does come up with an optimized design from the start, guess who is going to lose: you, you, you and you.

To be able to write good/fast software at all requires some understanding of how the hardware works, what its performance characteristics are, what the numbers are, etc. The deeper the understanding the better; however, with all this "magic" (crap?) going on in the background/CPU tricks, it's hard for programmers to understand what's going on. These tricks might also be counter-productive; some have already mentioned hyperthreading as counter-productive.

Compilers don't optimize algorithms; they don't determine your algorithm or data structure, or whether you should use blocking or non-blocking code. Compilers are usually about the little things - the instructions, some instruction optimizations here and there... these are usually little optimizations, perhaps up to 30% or so over human-written code, but that won't help if the program is 1000 to 10000% inefficient.

Not all programmers are equal; some are noobs and some are frustrated "experts" or "experienced" programmers seeking more performance from their hardware. Noobs are nice, but when it comes to writing high-performance programs it's pretty safe to dismiss them, since they are still struggling to learn how to write decent programs, and first need enough theory to understand.

For the experts there is also a danger: knowing too much about the hardware, or trying to find out too much about the hardware, might actually prevent them from writing anything at all, because they either can't make up their mind, or they know it's not going to give the desired performance, or they are always seeking more. For some it might be wise not to write anything and to wait it out until some good hardware comes along, so they can pour their energy into that.

Shall we forget about the noobs for a moment and move on towards experts - experts who have actually already written many programs, and are now looking for ways to make these programs run faster. These programs are trying to solve problems, and it takes a lot of time for the program to solve the problem. In other words, they want to solve the problem faster.

So far, perhaps multi-core makes it possible because every core has its own local data cache; this could be one reason why multi-core works. However, it could also be because of more memory accesses; I am not yet sure which of the reasons leads to the higher performance.

Is multi-core a "cache solution"? Or is it more like a "barrel processor" solution?

^ This is an important question and an important answer to find out. If it's the first case, then it's not the second case, and my assumption that the second case might lead to better performance might be wrong. However, not really, because a barrel processor could also "simply" divide its work onto multiple chips, which would also all be connected to their own processor.

^ Still a bit vague, but I am getting an idea which I shall sketch below:

Memory cells:

    0         1         2
    012345678901234567890123456789
    ##############################

Queues:

    Q         Q         Q

Processors:

    P P P P P P P P

Each queue takes responsibility for certain parts of the memory chips. Instead of the processors communicating directly with the entire memory chip, the processors start communicating with the queues and place their requests in the appropriate queue. This divides the work somewhat, especially for random access. The queues now communicate with the memory chips; the queues never overlap with each other's memory responsibility.

So Q1 takes 0 to 9.
So Q2 takes 10 to 19.
So Q3 takes 20 to 29.

This way multiple memory address requests can be fulfilled at the same time. The processors might also be able to go on and not worry about it too much; the queues take care of it. The question is whether the processors can queue it fast enough - probably so... Some queue locking might have to be done if multiple processors try to request from the same memory region... though smarter programmers/programs might not do that, and instead take responsibility for their own memory sections, use their own memory sections, and make sure they don't overlap.

Seems like a pretty good plan to me... I would be kinda surprised if processors/memories don't already do this?!

Bye,
Skybuck.
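Skybuck's partitioned-queue sketch above can be modeled in a few lines. This is a toy illustration with made-up names, not a hardware design: each queue owns one fixed 10-cell region, and a request is routed purely by which region its address falls in. (To answer his closing question: real memory controllers do something in this spirit already, spreading addresses across independent banks and channels via interleaving, though the details differ.)

```python
# Toy model of the three-queue sketch: Q1 owns cells 0-9,
# Q2 owns 10-19, Q3 owns 20-29 (stored here as queues[0..2]).
NUM_QUEUES = 3
REGION = 10                      # cells per queue

queues = [[] for _ in range(NUM_QUEUES)]

def submit(addr):
    """Route a memory request to the queue that owns addr."""
    q = addr // REGION           # region number = queue number
    queues[q].append(addr)

# Processors issue a scattered batch of requests...
for addr in [3, 25, 11, 7, 19, 28]:
    submit(addr)

# ...and each queue can now drain its own region independently,
# in parallel, with no overlap in responsibility.
assert queues[0] == [3, 7]
assert queues[1] == [11, 19]
assert queues[2] == [25, 28]
```

Note the routing never needs locking on the memory side, exactly because the regions are disjoint; contention only appears at the queue heads if several processors target the same region at once.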
#14
"Skybuck Flying" wrote in message b.home.nl...

> If your software competitor does understand hardware better and does come up with an optimized design from the start, guess who is going to lose: You, you, you and you.

Actually, "conventional wisdom" in business today is that "first to market" is often far more important than "bug-free and feature-laden." Sadly this is true in many cases, although there are plenty of counter-examples as well: Tablet PCs were largely ignored (even though they'd been around for a decade or so) until Apple introduced the iPad, and now they're the fastest-growing segment of PCs.

> To be able to write good/fast software at all requires some understanding of how the hardware works, what its performance characteristics are, what the numbers are, etc.

Again, it really depends on the application. If you're writing a web browser, of the dozen guys you might have on the team doing so, I doubt more than 1 or 2 really need to understand the underlying hardware all that well. Heck, a lot of people -- myself included -- use library files for cross-platform development specifically so that we don't *have* to understand the low-level architecture of every last OS and CPU we're targeting; many applications just don't need every last ounce of CPU power available.

> The deeper the understanding the better, however with all this "magic" (crap?) going on in the background/cpu tricks it's hard for programmers to understand what's going on.

That's very true. But look... I grew up with a Commodore 64. It was very cool, and I knew a large fraction of everything there was to know about it, both at the hardware and the software levels. But today's PCs are different -- there's *no one single person at Intel who thoroughly understands every last little technical detail of a modern Pentium CPU*, just as there's *no one single person at Microsoft who thoroughly understands every last little technical detail of Windows*. That's just how it is for desktop PCs -- they're so complex that very few people are going to code at, e.g., the raw assembly level for an entire application (a notable exception might be someone like Steve Gibson -- and even there, his assembly code ends up calling OS routines that were written in C...); most people find some comfortable balance between development time and performance.

(One can have that same sort of "Commodore 64" experience today with the myriad of microcontrollers available. Or heck, build your own system-on-chip in an FPGA... cool beans!)

> Compilers don't optimize algorithms, they don't determine your algorithm or data structure or if you should use blocking or non blocking code, compilers are usually about the little things [...] perhaps up to 30% or so from human written code, but that won't help if the program is 1000 to 10000% inefficient.

Agreed, although I think you underestimate just how good optimizing compilers are as well -- in many cases they're far better than the average programmer at rearranging code so as to optimize cache access and otherwise prevent pipeline stalls.

> However it could also be because of more memory accesses, I am not yet sure which of the reasons leads to the higher performance.

Join the crowd. As has been mentioned, Intel and AMD spend many millions of dollars every year simulating all sorts of different CPU architectures in their attempts to improve performance.

---Joel
#15
On Fri, 10 Jun 2011 18:25:41 -0700, "Joel Koltner" wrote:

> "Skybuck Flying" wrote in message . nb.home.nl...
>> If your software competitor does understand hardware better and does come up with an optimized design from the start, guess who is going to lose: You, you, you and you.
> Actually "conventional wisdom" in business today is that "first to market" is often far more important than "bug-free and feature-laden."

The real problem is "feature-laden" trumps "bug-free" every time.

> Sadly this is true in many cases, although there are plenty of counter-examples as well: Tablet PCs were largely ignored (even though they'd been around for a decade or so) until Apple introduced the iPad, and now they're the fastest growing segment of PCs.

Yup. Couldn't give 'em away until Jobs put the "cool" label on them. ...and there was still resistance. Anyone remember the iMaxi for the iPad?

>> To be able to write good/fast software at all requires some understanding of how the hardware works, what its performance characteristics are, what the numbers are, etc.
> Again, it really depends on the application. If you're writing a web browser, of the dozen guys you might have on the team doing so, I doubt more than 1 or 2 really need to understand the underlying hardware all that well. Heck, a lot of people -- myself included -- use library files for cross-platform development specifically so that we don't *have* to understand the low-level architecture of every last OS and CPU we're targeting; many applications just don't need every last ounce of CPU power available.

He did state "good/fast" as assumptions. ;-)

>> The deeper the understanding the better, however with all this "magic" (crap?) going on in the background/cpu tricks it's hard for programmers to understand what's going on.
> That's very true.

Yup. Having debugged the "magic", even with the insider scoop, I can agree that it's a bitch. ;-)

> But look... I grew up with a Commodore 64. It was very cool, and I knew a large fraction of everything there was to know about it, both at the hardware and the software levels. But today's PCs are different -- there's *no one single person at Intel who thoroughly understands every last little technical detail of a modern Pentium CPU*, just as there's *no one single person at Microsoft who thoroughly understands every last little technical detail of Windows*. That's just how it is for desktop PCs -- they're so complex that very few people are going to code at, e.g., the raw assembly level for an entire application (a notable exception might be someone like Steve Gibson -- and even there, his assembly code ends up calling OS routines that were written in C...); most people find some comfortable balance between development time and performance.

If you "ignore" things like the process, physics, and other gooey stuff, I bet you're wrong. I can well imagine that there are CPU architects at Intel who do know all the gory details of a particular CPU. They may not know the circuit-level functioning, but from a micro-architecture standpoint, I'm sure there are some who do.

> (One can have that same sort of "Commodore 64" experience today with the myriad of microcontrollers available. Or heck, build your own system-on-chip in an FPGA... cool beans!)

Too much like work. ;-)

>> Compilers don't optimize algorithms [...] these are usually little optimizations, perhaps up to 30% or so from human written code, but that won't help if the program is 1000 to 10000% inefficient.
> Agreed, although I think you underestimate just how good optimizing compilers are as well -- in many cases they're far better than the average programmer at rearranging code so as to optimize cache access and otherwise prevent pipeline stalls.

The compilers are smarter than the "average programmer"? That's supposed to be surprising?

>> However it could also be because of more memory accesses, I am not yet sure which of the reasons leads to the higher performance.
> Join the crowd. As has been mentioned, Intel and AMD spend many millions of dollars every year simulating all sorts of different CPU architectures in their attempts to improve performance.

...and millions more verifying that their CPUs actually do what they're supposed to.
#18
It's not only about "cpu" and "os" as you mention. It's general system architecture. Very important numbers like: ethernet speed, modem speed, harddisk speed, memory speed, pci-express speed, gpu speed, texture speed, triangle speed, pixel processing/shader speed, frames per second, compression speed, usb speed, flash speed, cd rom/dvd rom/blu-ray speed, floppy disk speed, mouse speed, keyboard speed, editor speed, gui speed, monitor refresh/gdi repainting speed, opengl speed, video codec speed, audio codec speed, random memory access speed... Speed, speed, speed, speed, speed, the list goes on and on and on and on.

Underestimate or overestimate any of these speeds which your program might rely on, given a certain user situation/scenario, and you are going to find yourself in a whole lot of ****, hoping/wishing that you had designed that software/algorithm a little bit better or differently... if only you knew the proper numbers, had considered those, and had done some basic calculations to see if it was possible or not within some desired time/user scenario. It takes only one little bottleneck somewhere to fok you up real good! =D

Bye,
Skybuck.

"Joel Koltner" wrote in message ...
[full quote of Joel's previous post snipped]
#19
On Fri, 10 Jun 2011 17:41:57 +0100, MitchAlsup wrote:

> One can EASILY detect blocking (or not) by comparing the wall clock time on multi-million memory access codes.

Oops. Yes. Silly me. Sorry. Too focussed on introspective instruction sequences when all along there was an independent observer next door.
#20
"Skybuck Flying" wrote in message b.home.nl...

> It's not only about "cpu" and "os" as you mention. It's general system architecture. Very important numbers like: [etc.]

Yeah, I take your point... but I'll also mention that I think the majority of the population these days is perfectly served by a sub-$500 desktop PC or a sub-$750 laptop, even though such machines rank relatively low compared to the "cutting edge" in most of the aspects you mention.

> Underestimate or overestimate any of these speeds which your program might rely on, given a certain user situation/scenario, and you are going to find yourself in a whole lot of **** [...] it takes only one little bottleneck somewhere to fok you up real good!

In many home-use scenarios the bottleneck is the speed at which someone can type or mouse, or the speed of the Internet. And even in FPS games, most people are far more limited by their own innate gaming skills than by whether their video card is cranking out 30FPS or 60FPS. :-)

---Joel