#31
Bill Todd wrote:
> I'd say probably not before they get its vanilla-x86 emulation up to snuff - i.e., probably never.

Well, ia32el-4.4-1.2.ia64.rpm from SuSe seems to work quite well around these parts. But what do I know.

--
Alexis Cousein, Senior Systems Engineer, SGI/Silicon Graphics Brussels
opinions expressed here are my own, not those of my employer
If I have seen further, it is by standing on reference manuals.
#32
Stephen Sprunk wrote:
> That's also rumored to be how future Itanics are going to handle x86 emulation (i.e. FX!32); hopefully it'll be better than the direct hardware support of earlier models...

As I posted earlier, that's even how most current Itanium2 users would like to run their code (with an IA32EL layer that's recent enough to work). Some of the codes I've tried are 4-5 times faster using FX!32^WIA32EL than using the hardware engine (which you can still use by doing /etc/init.d/ia32el stop).

--
Alexis Cousein
#33
"Alexis Cousein" wrote in message ...
> Stephen Sprunk wrote:
>> That's also rumored to be how future Itanics are going to handle x86 emulation (i.e. FX!32); hopefully it'll be better than the direct hardware support of earlier models...
>
> As I posted earlier, that's even how most current Itanium2 users would like to run their code (with an IA32EL layer that's recent enough to work). Some of the codes I've tried are 4-5 times faster using FX!32^WIA32EL than using the hardware engine (which you can still use by doing /etc/init.d/ia32el stop).

Oops... Last I heard it was a future thing; I must have missed the announcement in January when it shipped for Win2k3.

Can you say if the hardware engine will be removed in future chips?

S

--
Stephen Sprunk        "Those people who think they know everything
CCIE #3723           are a great annoyance to those of us who do."
K5SSS                                            --Isaac Asimov
#34
"Alexis Cousein" wrote in message ...
> Bill Todd wrote:
>> I'd say probably not before they get its vanilla-x86 emulation up to snuff - i.e., probably never.
>
> Well, ia32el-4.4-1.2.ia64.rpm from SuSe seems to work quite well around these parts. But what do I know.

Hard to say: does the performance evoke anything but laughter when compared with current IA32 Intel and AMD competition, or is it still pretty much in the toilet?

Last I heard, the only thing that made the software emulation look particularly good was the fact that it was less utterly abysmal than the hardware kludge (i.e., might now be approaching 1.5 GHz P4/Xeon speeds - hardly inspiring, though probably adequate for a somewhat wider range of loads than the hardware IA32 Itanic box is).

- bill
#35
Bruce Hoult wrote in message ...
> In article , Terje Mathisen wrote:
>>> Couldn't you make the IA64 set reside in scratchpad RAM, and JIT towards a 32-reg arch that only kept the most often/lately used regs in actual registers and the rest in scratch? It would then be a pretty ordinary target with more or less a couple of extra quirks...
>>
>> Yes, you could, except that all the sw for which IA64 is currently fast, i.e. relatively regular fp codes, are fast specifically because they fit the rotating registers/sw pipelining model of IA64.
>>
>> This model will use all the regs, or at least all the regs that can be live at the same time: Since the L2 latency used to be 9 cycles, this means that you have to expect up to (at least?) N*9, with N = number of regs required by the base algorithm, to be active at any given time.
>>
>> I.e. 128 regs isn't just a hint, it's a requirement for a fast emulator, unless you want to completely unravel all the logic behind those predicated/pipelined/unrolled sw loops.
>
> But how are you going to efficiently emulate the register rotation itself, if the IA64 emulated registers are in 128 ordinary registers in a conventional CPU? Depending on the ratio of rotates to computation you could be much better off keeping them in an array and changing a base pointer (and doing a mod on each index into it).
>
> --
> Bruce

Do you mean something like a little wp workspace ptr:-)

regards
johnjakson_usa_com
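As a rough illustration of Bruce's array-plus-base-pointer suggestion, here is a minimal Python sketch. It is not taken from IA32EL or any real emulator; the class and method names are hypothetical, the region size of 96 is only an example, and it models nothing but the rotation of a single register region (no stacked registers, no predicate rotation, no loop branches triggering the rotation). Rotation only changes a base offset, and every register access pays a modulo.

```python
class RotatingRegs:
    """Hypothetical model of one rotating register region held in an array.

    Rotation never moves data; it only changes the rename base (rrb), and
    every access maps a logical register number to an array slot with a mod.
    """

    def __init__(self, size: int) -> None:
        self.size = size          # registers in the rotating region
        self.rrb = 0              # rename base, adjusted on each rotation
        self.regs = [0] * size    # backing store (plain memory in the emulator)

    def _phys(self, logical: int) -> int:
        # The "mod on each index" cost, paid on every register access.
        return (logical + self.rrb) % self.size

    def read(self, logical: int) -> int:
        return self.regs[self._phys(logical)]

    def write(self, logical: int, value: int) -> None:
        self.regs[self._phys(logical)] = value

    def rotate(self) -> None:
        # Decrementing the base makes the value seen as logical register n
        # appear as logical register n + 1 afterwards (the direction IA-64
        # rotates in); no register contents are copied.
        self.rrb = (self.rrb - 1) % self.size


regs = RotatingRegs(96)
regs.write(0, 42)
regs.rotate()
print(regs.read(1))  # the value written to register 0 is now seen as register 1
```

The trade-off Bruce describes shows up directly: `rotate()` is a single decrement, while every `read` and `write` pays for the `%`.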
#36
john jakson wrote:
> Bruce Hoult wrote in message ...
>> But how are you going to efficiently emulate the register rotation itself, if the IA64 emulated registers are in 128 ordinary registers in a conventional CPU? Depending on the ratio of rotates to computation you could be much better off keeping them in an array and changing a base pointer (and doing a mod on each index into it).

I had this particular problem a long time ago, when implementing the sliding window extension to my version of Kermit.

Today the fastest way is probably to use one (or even two) compares and then a conditional/predicated move/subtraction to adjust, right?

Since Kermit could settle on arbitrary window sizes, I had to find a fast way to determine not just the current packet, but also [curr-N]. The solution I settled on was to waste a little memory, and use a level of indirection that used a power-of-two-sized table which pointed into the real packet buffer array. :-)

> Do you mean something like a little wp workspace ptr:-)

What impresses me is that HP/Intel decided they could implement a mod-96 register indirect access without causing this to become a critical path. With OOE I would expect all this logic to be handled early in the (decoding?) pipeline, so that the actual execution logic wouldn't see it at all, right?

Terje

--
"almost all programming can be viewed as an exercise in caching"
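The two alternatives Terje mentions can be sketched the same way, again in Python and with hypothetical names; neither is taken from his actual Kermit code or from any Itanium implementation. `wrap_add` replaces the modulo with one compare and a conditional subtract, which is only valid when both inputs are already in range. `PacketWindow` is one reading of the Kermit trick: a power-of-two table, indexed by `seq & mask`, whose entries point into a packet buffer list of arbitrary length, so finding `[curr-N]` costs just a subtraction and a mask.

```python
def wrap_add(logical: int, base: int, size: int) -> int:
    """Return (logical + base) mod size without a divide.

    Assumes 0 <= logical < size and 0 <= base < size, so the sum is below
    2 * size and one compare plus a conditional subtract is enough. On a
    real CPU the `if` would ideally become a compare followed by a
    conditional/predicated move or subtract rather than a branch.
    """
    phys = logical + base
    if phys >= size:
        phys -= size
    return phys


class PacketWindow:
    """Hypothetical sliding-window index: a power-of-two table of slots,
    indexed by sequence number & mask, pointing into a packet buffer list
    whose length (the window size) need not be a power of two."""

    def __init__(self, table_bits: int, buffers: list) -> None:
        self.mask = (1 << table_bits) - 1
        self.slot = [None] * (1 << table_bits)  # seq & mask -> buffer index
        self.buffers = buffers

    def store(self, seq: int, buffer_index: int) -> None:
        self.slot[seq & self.mask] = buffer_index

    def lookup(self, seq: int):
        # Works for seq = curr - N even when that is negative: Python's &
        # on a negative int behaves like two's complement, so the mask wraps.
        return self.buffers[self.slot[seq & self.mask]]


# 64 slots (Kermit sequence numbers are modulo 64), only 3 real buffers.
window = PacketWindow(6, ["pkt-a", "pkt-b", "pkt-c"])
window.store(62, 0)
window.store(63, 1)
window.store(0, 2)
curr = 0
print(window.lookup(curr - 2))  # sequence 62, i.e. "pkt-a"
```

The table costs a little memory, as Terje says, but it turns "which buffer holds packet curr-N" into pure bit masking regardless of the negotiated window size.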
#37
Bill Todd wrote:
> Last I heard, the only thing that made the software emulation look particularly good was the fact that it was less utterly abysmal than the hardware kludge (i.e., might now be approaching 1.5 GHz P4/Xeon speeds - hardly inspiring, though probably adequate for a somewhat wider range of loads than the hardware IA32 Itanic box is).

It would be a correct assessment, IMO, if it weren't using such laden terms. Still, OpenOffice does run quite well on a 1.5GHz P4, and finding 128-CPU 1.5GHz P4 NUMA machines *is* rather hard.

You wouldn't want to run your performance-critical applications with it (surprise, surprise: you'd better have little-endian, 64-bit-clean source code, or an IA64 binary, for those) -- but at least your glue logic/GUI/toolsets etc. do work (even though it takes some Linux gymnastics with alternate glibc versions to make very *old* IA32 binaries work).

For other applications that I shan't detail, IA32EL is now fast enough to make other parts of the application the bottleneck -- which couldn't exactly be said of the hardware engine. Especially when they can be multi-threaded.

--
Alexis Cousein