HardwareBanter: a computer components & hardware forum



Transcendental floating point functions are now unfixably broken on Intel processors



 
 
  #1  
October 10th 14, 11:58 AM, posted to comp.sys.ibm.pc.hardware.chips, comp.sys.intel, alt.comp.hardware.pc-homebuilt
Yousuf Khan[_2_]

" This error has tragically become un-fixable because of the
compatibility requirements from one generation to the next. The fix for
this problem was figured out quite a long time ago. In the excellent
paper The K5 transcendental functions by T. Lynch, A. Ahmed, M. Schulte,
T. Callaway, and R. Tisdale a technique is described for doing argument
reduction as if you had an infinitely precise value for pi. As far as I
know, the K5 is the only x86 family CPU that did sin/cos accurately. AMD
went back to being bit-for-bit compatible with the old x87 behavior,
presumably because too many applications broke. Oddly enough, this is
fixed in Itanium.

What we do in the JVM on x86 is moderately obvious: we range check the
argument, and if it's outside the range [-pi/4, pi/4], we do the precise
range reduction by hand, and then call fsin.

So Java is accurate, but slower. I've never been a fan of "fast, but
wrong" when "wrong" is roughly random(). Benchmarks rarely test
accuracy. "double sin(double theta) { return 0; }" would be a great
benchmark-compatible implementation of sin(). For large values of theta,
0 would be arguably more accurate since the absolute error is never
greater than 1. fsin/fcos can have absolute errors as large as 2
(correct answer=1; returned result=-1). "

https://blogs.oracle.com/jag/entry/t...tal_meditation
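
To make the quoted strategy concrete, here is a minimal C sketch of a
range-checked sin(). It is not the JVM's actual code: the reduction
uses a two-constant Cody-Waite style split of pi/2 (constants in the
style of fdlibm), which is only adequate for moderately sized
arguments, whereas the JVM performs a fully precise reduction.

    /* Minimal sketch of range-checked sin(): take the fast path only when
       |x| <= pi/4, otherwise reduce the argument in software first.
       NOT the JVM's implementation; this two-constant Cody-Waite split
       loses accuracy once |x| gets large (roughly |x| > 2^20). */
    #include <math.h>

    static const double PIO2_1  = 1.57079632673412561417e+00; /* leading 33 bits of pi/2 */
    static const double PIO2_1T = 6.07710050650619224932e-11; /* pi/2 - PIO2_1 */

    double sin_checked(double x)
    {
        if (fabs(x) <= M_PI_4)
            return sin(x);        /* stand-in for the hardware fsin fast path */

        /* k = nearest integer to x/(pi/2); r = x - k*pi/2, subtracted in
           two pieces so the leading bits cancel exactly. */
        double k = nearbyint(x * M_2_PI);
        double r = (x - k * PIO2_1) - k * PIO2_1T;

        switch ((int)((long long)k & 3)) {  /* quadrant of the original angle */
        case 0:  return  sin(r);
        case 1:  return  cos(r);
        case 2:  return -sin(r);
        default: return -cos(r);
        }
    }

Compile with -lm; M_PI_4 and M_2_PI are the usual <math.h> names for
pi/4 and 2/pi.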
  #2  
October 10th 14, 02:52 PM, posted to comp.sys.ibm.pc.hardware.chips, comp.sys.intel, alt.comp.hardware.pc-homebuilt
Mark F[_2_]

On Fri, 10 Oct 2014 06:58:43 -0400, Yousuf Khan wrote:

" This error has tragically become un-fixable because of the
compatibility requirements from one generation to the next. [...]
fsin/fcos can have absolute errors as large as 2
(correct answer=1; returned result=-1). "


I wanted to see what the algorithm was, so I found the
paper:
Tom Lynch, Ashraf Ahmed, Mike Schulte, Tom Callaway, and Robert
Tisdale, "The K5 Transcendental Functions", ARITH '95: Proceedings of
the 12th Symposium on Computer Arithmetic, pages 163-167, 1995.
ISBN 0-8186-7089-4.

https://www.researchgate.net/publica...ntal_functions

The paper describes an elegant algorithm for argument reduction.

However, if I am reading things correctly, section
"2.1 Multiprecision Arithmetic" (page 164)
says the arguments have at most 88 bits of precision.

The range reduction, described in section "2 Algorithms" (page 164),
is from [-2^63, 2^63] to [-pi/4, pi/4].

Because of the allowable range of the pre-reduction arguments, only
about 25 bits (= 88 - 63) of precision remain.

At the extremes of the pre-reduction argument range, some function
values can have slightly more than 25 bits of precision, while other
functions have much less. In particular, tan(x) near pi/4 has fewer
than 25 bits of precision.
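
One way to see this numerically (my own illustration, not code from the
paper; it assumes the GNU MPFR library is installed) is to reduce the
same large argument twice, once with pi carried to the paper's 88 bits
and once with far more precision, and compare the results:

    /* Reduce x = 2^62 into [-pi/4, pi/4] with pi carried to 88 bits versus
       300 bits. The two reduced values agree only in roughly their leading
       25 bits, which is the precision loss described above.
       Build: cc demo.c -lmpfr -lgmp */
    #include <stdio.h>
    #include <mpfr.h>

    static void reduce(double x, mpfr_prec_t prec, mpfr_t r)
    {
        mpfr_t arg, half_pi;
        mpfr_init2(arg, prec);
        mpfr_init2(half_pi, prec);
        mpfr_set_d(arg, x, MPFR_RNDN);
        mpfr_const_pi(half_pi, MPFR_RNDN);          /* pi to 'prec' bits */
        mpfr_div_ui(half_pi, half_pi, 2, MPFR_RNDN);
        mpfr_remainder(r, arg, half_pi, MPFR_RNDN); /* result in [-pi/4, pi/4] */
        mpfr_clears(arg, half_pi, (mpfr_ptr) 0);
    }

    int main(void)
    {
        mpfr_t r88, r300;
        mpfr_init2(r88, 300);
        mpfr_init2(r300, 300);
        reduce(0x1p62, 88, r88);   /* 2^62: near the top of the K5's range */
        reduce(0x1p62, 300, r300);
        mpfr_printf(" 88-bit pi: %.30Rg\n", r88);
        mpfr_printf("300-bit pi: %.30Rg\n", r300);
        mpfr_clears(r88, r300, (mpfr_ptr) 0);
        return 0;
    }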

Another example of bad use of argument reduction was in the
VAX VMS math library circa 1978.

I couldn't find the earlier, pre-1980 paper that describes the
argument-reduction algorithm in detail, but I did find a reference in
the ACM Digital Library (dl.acm.org):

Mary H. Payne and Robert N. Hanek, "Radian Reduction for Trigonometric
Functions", ACM SIGNUM Newsletter, Volume 18, Issue 1, January 1983,
pages 19-24.

The math library described in the paper stores the constant pi/4 to
about 32768 bits, enough to do an elegant argument reduction for
arguments of more than 2^16000 radians.

Once again, the problem is that the arguments are not exact, but
truncated numbers. VAX-11 H floating-point numbers have about 113 bits
of precision, so almost any argument that is the result of a
computation carries only 113 significant bits. Any pre-reduction
argument much larger than about 2^113 could therefore represent
essentially any angle, and raising a loss-of-significance error would
have been the best action for the library to take.
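
The point about inexact arguments is easy to demonstrate in ordinary
double precision (my example, not the VMS library's): near 2^100,
adjacent representable numbers are 2^48 apart, which is about 4*10^13
full periods of sine, so a single rounding error in the argument makes
the "exactly" reduced angle meaningless:

    /* Adjacent doubles near 2^100 differ by 2^48 (about 4e13 full periods
       of sine), so their sines are unrelated: if the argument carries any
       rounding error at all, the reduced angle is arbitrary. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 0x1p100;
        double y = nextafter(x, INFINITY); /* the very next representable value */
        printf("sin(x)       = % .17f\n", sin(x));
        printf("sin(next(x)) = % .17f\n", sin(y));
        return 0;
    }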
  #3  
October 20th 14, 11:00 AM, posted to alt.comp.hardware.pc-homebuilt
[email protected]

On Friday, October 10, 2014 6:58:43 PM UTC+8, Yousuf Khan wrote:
" This error has tragically become un-fixable because of the
compatibility requirements from one generation to the next. [...] "


What about CUDA and other GPUs?
A company I used to work for tried them for a heavy-duty
number-crunching algorithm (one that took about 4 days on Opterons) and
found that they got different answers. That was a problem for
"time-lapse processing", where data recorded years apart is compared:
you would have to redo the old data on the new hardware.
  #4  
October 20th 14, 11:27 AM, posted to alt.comp.hardware.pc-homebuilt
Yousuf Khan[_2_]

On 20/10/2014 6:00 AM, [email protected] wrote:
What about CUDA and other GPUs? A company I used to work for tried
them for a heavy-duty number-crunching algorithm and found that they
got different answers. [...]


Well, I don't think that CUDA or other GPU-based APIs actually have
native transcendental instructions at double precision. In fact, as far
as I know, most floating-point hardware doesn't have these functions
built in; the x87's CISC instruction set is the only one that ever had
these higher-order functions. Other FPUs evaluate them in software,
using series-expansion methods, as in the sketch below.
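
As a rough illustration of what "emulate in software" means here (my
sketch, not any particular library's code): once the argument has been
reduced to [-pi/4, pi/4], a short polynomial suffices. Production
libraries use minimax coefficients rather than the raw Taylor series
shown below.

    /* Taylor-series sine for an argument already reduced to [-pi/4, pi/4]:
       r - r^3/3! + r^5/5! - ... through r^15/15!, evaluated by Horner's
       rule. On this interval the truncation error (the r^17/17! term) is
       below the double-precision rounding level. */
    double sin_poly(double r)
    {
        double r2 = r * r;
        return r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
                 + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800
                 + r2 * (1.0 / 6227020800.0 + r2 * (-1.0 / 1307674368000.0))))))));
    }

Combined with a quadrant-aware range reduction, this is essentially how
SSE2-era software sin() routines are structured.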

Even x86's newer floating-point instruction sets, such as SSE and AVX,
have no native support for these higher-order functions. So to use
them, either (1) you evaluate them in software, or (2) you do some of
your calculations with the new-generation SSE instructions and then
pass some of the work to the old x87 hardware. Interestingly, when AMD
created the 64-bit instruction set extensions to x86, it relegated x87
to legacy status: the 64-bit calling conventions pass floating-point
values in SSE2 registers, so new floating-point code is effectively all
done through SSE2 and its successors. Without convenient access to the
x87 hardware in 64-bit programs, you have little choice but to use
software-based techniques.

Yousuf Khan
 



