#341
"bill davidsen" wrote in message ...

| And how exactly do you plan to create an OS that is this "decent"? Every
| OS eventually comes across this problem. Look at the Unixes and their
| load average statistics. A load average of 1.00 or less on a
| single-processor system means that the system is keeping up with its
| processes, whereas a load average above 1.00 means that there are more
| requests for time slices than there are available time slices on that
| same system. You can often see some systems running at 2.00 or 5.00 or
| higher.

The load average is the average number of processes on the run queue. Depending on the UNIX version, that may include some processes which are waiting on a semaphore, on swap, etc. AIX is nice and responsive even at a high load average; I've been happily editing a text file with an editor and not noticed that the load average was 100+ until the alarm went off. Try that in a graphical OS, where echoing a keypress in the editor requires several processes to cooperate, and not all of them are interactive.

And, of course, this argument doesn't apply if you care about *server* performance. This isn't a "not enough CPU" problem; it's a "CPU gets stuck" problem. It may be due to OS problems, it may be due to hardware issues, but every OS on PC hardware suffers from it.

But systems which are usable for the desktop, such as Linux, may actually have a lot of processes and still be able to give the CPU to the one with the human attached.

Not when you're in X and there is no "one" with the human attached. And, of course, this doesn't help for the server case.

As noted elsewhere, a slow machine is nicer to use with Linux than with Windows; the memory use seems better.

Even under comparable usage, with a graphical environment? Linux is definitely nicer to use in text mode than Windows is on comparably slow hardware. However, you are definitely right that Windows memory management is pure crap.
That said, making best use of memory means that unused parts of processes do get swapped, and changing virtual desktops often takes 400-800 ms. Of course, Windows doesn't *have* virtual desktops in the same way, so there's no way to compare.

I think low memory is a different problem than CPU latency due to ambush. Trying to talk about both at the same time, just because you generally encounter both on low-end hardware, obscures things if you're talking about system sizing.

DS
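The run-queue arithmetic described above can be sketched in a few lines. This is an illustrative model, not any particular kernel's code: UNIX kernels derive the load average as an exponentially damped moving average of the run-queue length, sampled every few seconds. The sampling interval and window constants below are assumptions for the sketch.

```python
import math

def damped_load_avg(run_queue_samples, interval=5.0, window=60.0):
    """Exponentially damped moving average of the run-queue length,
    roughly how UNIX kernels derive the 1-minute load average.
    `interval` is the sampling period in seconds; both constants
    are illustrative, not any particular kernel's values."""
    decay = math.exp(-interval / window)
    load = 0.0
    for runnable in run_queue_samples:
        load = load * decay + runnable * (1.0 - decay)
    return load

# A single-CPU box with exactly one always-runnable process
# converges toward the break-even load average of 1.00.
print(round(damped_load_avg([1] * 100), 2))  # → 1.0
```

This also shows why a load average of 100+ is consistent with a responsive editor: the figure counts runnable (and, on some systems, swap- or semaphore-blocked) processes, and says nothing about how quickly the scheduler hands the CPU to the one interactive process.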
#342
On Mon, 29 Sep 2003 21:55:41 -0400, "Bill Todd"
wrote:

snip

So databases aren't candidates for being rewritten (according to your original suggestion) to leverage SMT's potential to achieve greater per-core throughput - because they're *already* multi-threaded for other existing (SMP and I/O) reasons.

Now I understand the disconnect. The post in which I stated (reiterated, actually) that I thought multi-threading was underutilized follows:

RM> Processors do the best job on single-threaded applications because
RM> that's what people know how to write without getting themselves into a
RM> big muddle. People program in a single-threaded style because there is
RM> no incentive for them to do otherwise. If they try, they risk getting
RM> themselves into a muddle with very little prospect of a payoff. No
RM> market, no software. No software, no market.
RM>
RM> Single-threaded software and processors geared to support
RM> single-threaded software are self-reinforcing habits. I know of very
RM> few problems that don't exhibit a significant degree of exploitable
RM> parallelism. The problem is finding the parallelism at the granularity
RM> that the processor supports efficiently and writing software to
RM> support it. That sounds too much like work and, except in the HPC
RM> world, the OS kernel world, the enterprise computing world, and
RM> increasingly the world of games, it doesn't get done.

You apparently interpret that (or something else I said, but I don't know what) as meaning that I think all software needs to be rewritten to exploit multi-threading. Most PC users only benefit from having more than one processor (real or virtual) available if they are trying to do more than one thing at a time, because most PC software isn't written to exploit multiple processors. That situation isn't likely to change, for the reasons mentioned in my post. HPC, OS kernels, and enterprise computing are another matter, and I mentioned such performance-critical applications as places where multi-threading is already used.
I didn't state, and I didn't intend to imply, that it is underutilized in those areas, and I would include OLTP workloads in what I meant by enterprise computing. In other words, I had no intention of implying that multi-threading is underutilized for OLTP.

The only question is whether it would make any sense to over-subscribe each SMT core with *more* threads than it can execute concurrently, to attempt to further leverage available memory bandwidth: my suspicion is that the answer is "No", because of the increased level of multi-programming and resulting inter-thread run-time contention that would occur, for what would likely be only a marginal throughput increase in the *absence* of such considerations; and I suggest that the dramatically sub-linear increase in throughput reported in the paper you cited tends to support that suspicion (though with only a single data point one can only suspect, rather than assume, that the improvement was rapidly approaching an asymptote).

Nor did I intend to propose aggressively oversubscribing processors. I only meant to reiterate a point that I thought had been discussed and agreed upon; viz., that SMT is one of the very few ways you can hide the effects of cache-miss stalls for OLTP workloads. As discussed elsewhere, SMT probably doesn't help the P4 much, because it doesn't have the resources to take advantage of it, but (and I really think we are just struggling to agree on something we already agreed upon) a processor with sufficient computational resources could benefit from SMT for OLTP workloads.

While you (and Jon Forrest and many others) seem to feel that PCs are plenty powerful enough, that isn't my experience of them. I'd love to have a multi-threaded grep and a multi-threaded gcc, but I don't expect them to appear any time soon.
Single-threaded programming is so deeply entrenched that I don't expect any significant change at any time in the foreseeable future, but other programming paradigms are possible and would be more useful than most people seem to think. That's all I was trying to say.

RM
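For what it's worth, the multi-threaded grep wished for above is easy to sketch; the hard part, as the post says, is that nobody bothers. A minimal sketch with one task per file (the pattern, file list, and worker count are hypothetical; note that in CPython the GIL limits the regex matching itself to one core, so these threads mostly overlap I/O):

```python
import re
import sys
from concurrent.futures import ThreadPoolExecutor

def grep_file(pattern, path):
    """Collect matching lines from one file (one task per file)."""
    regex = re.compile(pattern)
    hits = []
    try:
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if regex.search(line):
                    hits.append("%s:%d:%s" % (path, lineno, line.rstrip()))
    except OSError:
        pass  # skip unreadable files, as grep -s would
    return hits

def mtgrep(pattern, paths, workers=4):
    """Farm the files out to a thread pool and print merged results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(lambda p: grep_file(pattern, p), paths):
            for hit in hits:
                print(hit)

if __name__ == "__main__" and len(sys.argv) > 2:
    mtgrep(sys.argv[1], sys.argv[2:])
```

The design choice illustrates the granularity problem from the quoted post: per-file tasks parallelize trivially, but splitting a single large file across threads (what a real multi-threaded grep would need) means carving the file at line boundaries, which is exactly the kind of muddle people avoid.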
#343
On Tue, 30 Sep 2003 03:59:08 -0400, "Bill Todd"
wrote:

"Robert Myers" wrote in message ...

... I only meant to reiterate a point that I thought had been discussed and agreed upon; viz., that SMT was one of the very few ways you could hide the effects of cache-miss stalls for OLTP workloads.

Perhaps it's mostly just a difference in viewpoint, but I see nothing about SMT (or CMP) that hides the effects of cache-miss stalls: each individual thread still takes just as long to execute as ever. What SMT, CMP, and for that matter plain old SMP do is allow more parallel use of memory bandwidth by multiple threads (plus, in the case of SMT, somewhat more efficient/flexible utilization of fine-grained processor resources) *in cases where the workload otherwise lends itself to multiple concurrent threads of execution* (either within a single process or between multiple processes).

No matter how you say it, Itanium is a lot of watts, a lot of transistors, and a lot of real estate on a motherboard to leave sitting idle while waiting for a cache line to fill.

Back to SuperDome and its scaling problems (HP doesn't like it when I refer to "scaling problems", but I can't remember the alternative language they wanted me to use). One approach: turn up the heat on the engineers to design better/faster crossbar circuitry (a program probably already underway). Another approach (and probably the direction the industry is headed in general): stop trying to hook up so many separate chips, and get a single chip to process more threads one way or another. Haggling over the names and the details of what resources to share, and how, is left to other readers and posters.

snip

Single-threaded programming is so deeply entrenched that I don't expect any significant change at any time in the foreseeable future, but other programming paradigms are possible and would be more useful than most people seem to think.

That's where we largely part company, ...
herculean efforts to parallelize memory accesses (in situations where there are no *other* factors that would benefit from such parallelization) just likely aren't normally justifiable. As I said earlier, it may come to pass that *compilers* will start doing transparent tricks to speed up execution of individual threads by concurrent execute-ahead mechanisms in separate helper threads (though they'll need to be careful not to squander processing resources that could be used more effectively by other, independent threads), but the idea that any significant amount of software will be developed (or rewritten) simply to take advantage of some potential increase in CPU parallelism just doesn't seem realistic (because CPUs are *already* fast enough for the vast majority of the work that they do - some of your work may be an exception to that, but if so it's likely a *rare* exception).

As I've already said elsewhere, I expect CPUs to be spinning off threads without human intervention in the not-too-distant future (not a very bold prediction). A much bolder prediction: the search for executable threads will go higher than the low-hanging fruit already identified--fork on call, simple run-ahead, and helper threads--and the search will be successful.

...For that matter, this is something akin to a universal truth in software: it's seldom worth expending major effort on performance optimization outside of a few very carefully selected critical areas - otherwise, just let hardware advances solve any problem that may exist.

Unless you work on problems that simply cannot be done without massive parallelism, in which case you are constantly seeking new ways of looking at the same old problems.

RM
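The latency-hiding disagreement above can be illustrated with a toy model of a single SMT core: each thread alternates a burst of compute with a cache-miss stall, and the core can issue from another thread while one is stalled. Individual threads take just as long as ever (Todd's point), yet aggregate throughput rises with thread count until issue bandwidth saturates, after which extra contexts add nothing (the sub-linear returns from oversubscription). The 1:3 compute-to-stall ratio is an illustrative assumption, not a measured number for any real processor:

```python
def smt_throughput(threads, compute=1.0, stall=3.0):
    """Toy model of one SMT core.  Each hardware thread repeats a
    cycle of `compute` busy cycles followed by `stall` cycles spent
    waiting on a cache miss.  The core can issue from other threads
    while one is stalled, so utilization equals the total compute
    demand, capped at 100% of issue bandwidth.  The constants are
    illustrative assumptions only."""
    period = compute + stall
    demand = threads * compute / period  # fraction of cycles requested
    return min(demand, 1.0)              # a core cannot exceed full issue

for n in range(1, 7):
    print(n, smt_throughput(n))  # rises to 1.0 at 4 threads, flat after
```

With these toy numbers, throughput climbs linearly (0.25, 0.5, 0.75) until four contexts cover every stall; a fifth and sixth thread gain nothing while still paying the contention costs the posts discuss.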