Computation slower with float than with double.



 
 
  #1   June 7th 05, 04:03 PM
Michele Guidolin

Hello to everybody.

I'm doing some benchmarking of a red-black Gauss-Seidel algorithm on
2-dimensional grids of different sizes and element types, and I get some
strange results when I change the computation from double to float.

Here are the test times for the different grid SIZE and element type:

SIZE     128     256     512

float   2.20s   2.76s   7.86s

double  2.30s   2.47s   2.59s

As you can see, when the grid has a size of 512 nodes the float version gets
drastically slower.
The number of iterations is scaled to the grid SIZE (ITERATIONS = 2^28 / SIZE^2),
so the total amount of work, and therefore the time, should be similar for the
different grid sizes.
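
For reference, that scaling gives ITERATIONS = 16384, 4096 and 1024 for the
three sizes, i.e. roughly 2^28 point updates per run. A tiny standalone check
of the formula, separate from the benchmark itself:

------------- CODE -------------

/* Standalone check of the iteration scaling used in the benchmark:
   ITERATIONS = 2^28 / SIZE^2, so ITERATIONS * SIZE^2 point updates
   stay constant across grid sizes. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int sizes[3] = { 128, 256, 512 };
    int k;

    for (k = 0; k < 3; k++) {
        int size = sizes[k];
        int iterations = (int)(pow(2.0, 28.0) / pow((double)size, 2.0));
        printf("SIZE %3d -> ITERATIONS %5d -> total updates %.0f\n",
               size, iterations, (double)iterations * size * size);
    }
    return 0;
}

---------------CODE--------------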

Shouldn't the float computation always be faster than double?
I would like to know if this is a gcc problem (I don't have another compiler),
and if it is not, what could the problem be?

Hope to receive an answer as soon as possible,
Thanks

Michele Guidolin.

P.S.
Here is some more information about the test:

The code that I'm testing is the following; it is the same for the double
version (except that the constants are 0.25 instead of 0.25f).

------------- CODE -------------

/* Headers needed by this fragment: printf, pow, gettimeofday. */
#include <stdio.h>
#include <math.h>
#include <sys/time.h>

#define SHIFT_S 9
#define SIZE (1 << SHIFT_S)
#define DUMP 0

/* Map the 2D index (i,j) onto the flat SIZE*SIZE array. */
#define MAT(i,j) (((i) << SHIFT_S) + (j))

/* init_boundaries(), timeval_diff() and the struct timeval globals
   submit_time / complete_time are defined elsewhere in the test program. */

inline void gs_relax(int i, int j, float *u, float *rhs)
{
    /* 5-point Gauss-Seidel relaxation of one grid point. */
    u[MAT(i,j)] = (float)( rhs[MAT(i,j)] +
                           0.0f  * u[MAT(i,j)]   +
                           0.25f * u[MAT(i+1,j)] +
                           0.25f * u[MAT(i-1,j)] +
                           0.25f * u[MAT(i,j+1)] +
                           0.25f * u[MAT(i,j-1)] );
}

void gs_step_fusion(float *u, float *rhs)
{
    int i, j;

    /* update the red points: */
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(1, j, u, rhs);
    }
    for (i = 2; i < SIZE-1; i++)
    {
        for (j = 1 + (i+1)%2; j < SIZE-1; j = j+2)
        {
            gs_relax(i,   j, u, rhs);
            gs_relax(i-1, j, u, rhs);
        }
    }
    for (j = 1; j < SIZE-1; j = j+2)
    {
        gs_relax(SIZE-2, j, u, rhs);
    }
}

int main(void)
{
    int iter;

    /* Scale the iteration count so that the total number of point updates
       (about 2^28) is the same for every grid SIZE. */
    int ITERATIONS = ((int)(pow(2.0,28.0)) / (pow((double)SIZE,2.0)));

    float u[SIZE*SIZE];
    float rhs[SIZE*SIZE];

    double time;

    printf("-----START SEQUENTIAL FUSION------------\n\n");
    printf("size: %d\n", SIZE);
    printf("loops: %d\n", ITERATIONS);
    init_boundaries(u, rhs);

    gettimeofday(&submit_time, 0);

    for (iter = 0; iter < ITERATIONS; iter++)
        gs_step_fusion(u, rhs);

    gettimeofday(&complete_time, 0);

    time = timeval_diff(&submit_time, &complete_time);
    printf("\ntime: %fs\n", time);

    printf("-----END SEQUENTIAL FUSION------------\n\n");

    return 0;
}
---------------CODE--------------
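
init_boundaries() and timeval_diff() are not shown above; a minimal
timeval_diff along the following lines (a sketch consistent with how it is
called in main(), not necessarily the implementation used here) returns the
elapsed time in seconds as a double:

------------- CODE -------------

/* Minimal sketch of an elapsed-time helper matching the usage above
   (not the original implementation): seconds between two gettimeofday()
   samples, returned as a double. */
#include <sys/time.h>

double timeval_diff(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec  - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1.0e6;
}

---------------CODE--------------

submit_time and complete_time would then be struct timeval variables visible
to main(), as the gettimeofday() calls suggest.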

I'm testing this code on this machine:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping : 1
cpu MHz : 3192.311
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
monitor ds_cpl cid
bogomips : 6324.22

with Hyper-Threading enabled, on GNU/Linux 2.6.8.

The compiler is gcc 3.4.4 and the flags are:
CFLAGS = -g -O2 -funroll-loops -msse2 -march=pentium4 -Wall

I also tried -ffast-math and -mfpmath=sse, but I get the same result.
  #2   June 7th 05, 08:41 PM
Mark Hittinger

Michele Guidolin "michele dot guidolin at ucd dot ie" writes:
Shouldn't the float computation always be faster than double?
I would like to know if this is a gcc problem (I don't have another compiler),
and if it is not, what could the problem be?


I wonder if the problem with float being slower might be an alignment issue.
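
One quick way to test the alignment idea is to print how the arrays are
aligned. A small sketch that mirrors the array declarations in the posted
benchmark (16-byte alignment is what SSE/SSE2 loads prefer):

------------- CODE -------------

/* Sketch: declare the arrays the same way the benchmark does and print
   their alignment modulo 16 (what SSE/SSE2 loads prefer). */
#include <stdio.h>

#define SHIFT_S 9
#define SIZE (1 << SHIFT_S)

int main(void)
{
    float u[SIZE*SIZE];
    float rhs[SIZE*SIZE];

    printf("u   %% 16 = %lu\n", (unsigned long)u   % 16);
    printf("rhs %% 16 = %lu\n", (unsigned long)rhs % 16);
    return 0;
}

---------------CODE--------------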

Later

Mark Hittinger

  #3   June 7th 05, 08:44 PM
Yousuf Khan

Seeing as you're using a P4 processor, you're presumably using SSE2. If so,
I've seen discussions in the past where it was shown that the P4's
single-precision float doesn't perform nearly as well as its double-precision
float. It might have something to do with how it packs the floating-point
operands together prior to performing the operations. Apparently, the AMD
implementation of SSE2 doesn't show any difference in performance whether
you're using single or double precision. It's just one of those weird
architectural issues in the P4.

Yousuf Khan

  #4   June 8th 05, 03:16 AM
Beemer Biker


"Michele Guidolin" "michele dot guidolin at ucd dot ie" wrote in message
...
Shouldn't the float computation always be faster than double?
I would like to know if this is a gcc problem (I don't have another compiler),
and if it is not, what could the problem be?

[snip]

inline void gs_relax(int i, int j, float *u, float *rhs)
{
    u[MAT(i,j)] = (float)( rhs[MAT(i,j)] +
                           0.0f  * u[MAT(i,j)]   +
                           0.25f * u[MAT(i+1,j)] +
                           0.25f * u[MAT(i-1,j)] +
                           0.25f * u[MAT(i,j+1)] +
                           0.25f * u[MAT(i,j-1)] );
}



Look at the assembly code and see if the compiler is converting float to
double in the above code. It could be that the doubles are being loaded
directly onto the floating-point stack while the singles are being converted
in a general-purpose register and then loaded onto the FP stack. Recompile
with double, then look at the difference in the assembly code.
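
For reference, the double variant described in the original post (same
stencil, double arrays, constants written as 0.25 instead of 0.25f) would
look roughly like this; it is consistent with the double-version disassembly
posted later in the thread:

------------- CODE -------------

/* Double-precision gs_relax as described in the original post:
   same stencil and MAT()/SHIFT_S macros, double instead of float. */
#define SHIFT_S 9
#define MAT(i,j) (((i) << SHIFT_S) + (j))

inline void gs_relax(int i, int j, double *u, double *rhs)
{
    u[MAT(i,j)] = ( rhs[MAT(i,j)] +
                    0.0  * u[MAT(i,j)]   +
                    0.25 * u[MAT(i+1,j)] +
                    0.25 * u[MAT(i-1,j)] +
                    0.25 * u[MAT(i,j+1)] +
                    0.25 * u[MAT(i,j-1)] );
}

---------------CODE--------------

Compiling each variant with gcc -S, or disassembling the object files as in
the listings that follow, makes the float/double comparison straightforward.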

  #5   June 8th 05, 04:47 AM
Yousuf Khan

Beemer Biker wrote:
Look at the assembly code and see if the compiler is converting float to
double in the above code. It could be that the doubles are being loaded
directly onto the floating-point stack while the singles are being converted
in a general-purpose register and then loaded onto the FP stack. Recompile
with double, then look at the difference in the assembly code.


He's using SSE2. Check out his compiler flags.

Yousuf Khan
  #6   June 8th 05, 10:16 AM
Michele Guidolin

Beemer Biker wrote:

Look at the assembly code and see if the compiler is converting float to
double in the above code. It could be that the doubles are being loaded
directly onto the floating-point stack while the singles are being converted
in a general-purpose register and then loaded onto the FP stack. Recompile
with double, then look at the difference in the assembly code.


Here is the assembler code of float version:

------------- ASM -----------------
inline void gs_relax(int i,int j,float *u, float *rhs)
{
fb: 55 push %ebp

u[MAT(i,j)] = (float)( rhs[MAT(i,j)] +
fc: d9 ee fldz
fe: d9 05 00 00 00 00 flds 0x0
104: d9 c9 fxch %st(1)
106: 89 e5 mov %esp,%ebp
108: 56 push %esi
109: 8b 45 08 mov 0x8(%ebp),%eax
10c: 8b 75 0c mov 0xc(%ebp),%esi
10f: 53 push %ebx
110: c1 e0 09 shl $0x9,%eax
113: 8b 4d 10 mov 0x10(%ebp),%ecx
116: 8b 55 14 mov 0x14(%ebp),%edx
119: 8d 1c 30 lea (%eax,%esi,1),%ebx
11c: c1 e0 00 shl $0x0,%eax
11f: d8 0c 99 fmuls (%ecx,%ebx,4)
122: d8 04 9a fadds (%edx,%ebx,4)
125: 8d 94 30 00 02 00 00 lea 0x200(%eax,%esi,1),%edx
12c: c1 e0 00 shl $0x0,%eax
12f: d9 04 91 flds (%ecx,%edx,4)
132: 8d 84 30 00 fe ff ff lea 0xfffffe00(%eax,%esi,1),%eax
139: d8 ca fmul %st(2),%st
13b: de c1 faddp %st,%st(1)
13d: d9 04 81 flds (%ecx,%eax,4)
140: d8 ca fmul %st(2),%st
142: de c1 faddp %st,%st(1)
144: d9 44 99 04 flds 0x4(%ecx,%ebx,4)
148: d8 ca fmul %st(2),%st
14a: d9 ca fxch %st(2)
14c: d8 4c 99 fc fmuls 0xfffffffc(%ecx,%ebx,4)
150: d9 c9 fxch %st(1)
152: de c2 faddp %st,%st(2)
154: de c1 faddp %st,%st(1)
156: d9 1c 99 fstps (%ecx,%ebx,4)
159: 5b pop %ebx
15a: 5e pop %esi
15b: 5d pop %ebp
15c: c3 ret
------------- ASM -----------------

and here is the assembler code of double version

------------- ASM -----------------
inline void gs_relax(int i,int j,double *u, double *rhs)
{
112: 55 push %ebp

u[MAT(i,j)] = ( rhs[MAT(i,j)] +
113: d9 ee fldz
115: d9 05 00 00 00 00 flds 0x0
11b: d9 c9 fxch %st(1)
11d: 89 e5 mov %esp,%ebp
11f: 56 push %esi
120: 8b 45 08 mov 0x8(%ebp),%eax
123: 8b 75 0c mov 0xc(%ebp),%esi
126: 53 push %ebx
127: c1 e0 09 shl $0x9,%eax
12a: 8b 4d 10 mov 0x10(%ebp),%ecx
12d: 8b 55 14 mov 0x14(%ebp),%edx
130: 8d 1c 30 lea (%eax,%esi,1),%ebx
133: c1 e0 00 shl $0x0,%eax
136: dc 0c d9 fmull (%ecx,%ebx,8)
139: dc 04 da faddl (%edx,%ebx,8)
13c: 8d 94 30 00 02 00 00 lea 0x200(%eax,%esi,1),%edx
143: c1 e0 00 shl $0x0,%eax
146: dd 04 d1 fldl (%ecx,%edx,8)
149: 8d 84 30 00 fe ff ff lea 0xfffffe00(%eax,%esi,1),%eax
150: d8 ca fmul %st(2),%st
152: de c1 faddp %st,%st(1)
154: dd 04 c1 fldl (%ecx,%eax,8)
157: d8 ca fmul %st(2),%st
159: de c1 faddp %st,%st(1)
15b: dd 44 d9 08 fldl 0x8(%ecx,%ebx,8)
15f: d8 ca fmul %st(2),%st
161: d9 ca fxch %st(2)
163: dc 4c d9 f8 fmull 0xfffffff8(%ecx,%ebx,8)
167: d9 c9 fxch %st(1)
169: de c2 faddp %st,%st(2)
16b: de c1 faddp %st,%st(1)
16d: dd 1c d9 fstpl (%ecx,%ebx,8)
170: 5b pop %ebx
171: 5e pop %esi
172: 5d pop %ebp
173: c3 ret
------------- ASM -----------------

It's been a long time since I last looked at assembler code, but to me it
looks like the float version is doing all the operations in float.
Maybe Yousuf is right and it's the P4 that handles this badly.

Can somebody who knows the asm better help me?
Thanks.

Michele.

 



