[LAPACK] DGEQRF performs worse than DGEQR2
Jeremy Tai
Posted: Mon Mar 26, 2007 10:39 pm
Guest
Greetings...

My impression is that DGEQRF should perform better than DGEQR2, as
much as a factor of 3 according to some literature. However, my use
of DGEQRF is almost always slightly slower than DGEQR2. I need help
to figure out why.

My guess is that I'm using a BLAS that's not optimized. I'm testing
my program with CLAPACK3 on a regular Linux PC (I've also tried an
old Sun workstation, with similar results). Due to the limitations of
my actual target platform, I cannot afford to "install" the whole
LAPACK, so I extracted only the LAPACK and BLAS routines that DGEQRF
needs and compiled them into my executable. That is, I use the
reference BLAS that comes with CLAPACK3 and did not go through the
optimization process.

Despite using the reference BLAS, I originally thought that DGEQRF
should still outperform DGEQR2 because of the underlying level 3 BLAS.
However, this was not the case. I've tried different block sizes for
DGEQRF instead of the suggested "optimal" block size of 32, but my
tests showed that, in general, smaller block sizes gave better
results, and the unblocked DGEQR2 had the best performance. This
contradicts what I expected.

Without the optimized BLAS, I don't expect DGEQRF to run 3 times
faster than DGEQR2, but I didn't expect DGEQRF using DGEMM to run
slower than DGEQR2 using DGEMV either. Could anyone please give me
some suggestions or thoughts on the problem?

Thanks a lot,
Jeremy


P.S. In my program the execution time is measured somewhat like this:

t1 = gethrtime();
dgeqr2_();
t2 = gethrtime();

t3 = gethrtime();
dgeqrf_();
t4 = gethrtime();

Where t2-t1 is the time spent in DGEQR2 and t4-t3 is the time of
DGEQRF.
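
One thing worth ruling out in a measurement like this: both routines overwrite A in place, so each timed call needs its own fresh copy of the input, and a single run can be noisy. Below is a minimal sketch of such a harness. All names (factor_fn, time_on_fresh_copy, demo_scale_inplace) are hypothetical, the real dgeqr2_/dgeqrf_ calls would go inside a small wrapper with the factor_fn signature, and clock_gettime is used as an assumed Linux stand-in for gethrtime.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature: a wrapper that factors the m x n matrix a in place. */
typedef void (*factor_fn)(double *a, int m, int n);

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Runs fn reps times, each time on a fresh copy of a0, and returns the
 * fastest run in seconds (or -1.0 if the work buffer cannot be allocated). */
double time_on_fresh_copy(factor_fn fn, const double *a0, int m, int n, int reps)
{
    assert(fn != NULL && a0 != NULL && m > 0 && n > 0 && reps > 0);
    size_t bytes = (size_t)m * (size_t)n * sizeof(double);
    double *work = malloc(bytes);
    double best = -1.0;
    if (work == NULL)
        return -1.0;
    for (int r = 0; r < reps; ++r) {
        memcpy(work, a0, bytes);          /* restore the unfactored input */
        double t0 = now_seconds();
        fn(work, m, n);
        double t1 = now_seconds();
        if (best < 0.0 || t1 - t0 < best)
            best = t1 - t0;
    }
    free(work);
    return best;
}

/* Stand-in kernel for demonstration only; a real test would wrap
 * dgeqr2_ or dgeqrf_ here, including their tau/work arguments. */
void demo_scale_inplace(double *a, int m, int n)
{
    for (size_t i = 0; i < (size_t)m * (size_t)n; ++i)
        a[i] *= 2.0;
}
```

Taking the minimum over a few repetitions is one common choice; it filters out runs disturbed by other processes, while the copy is kept outside the timed region.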
shadeofgray
Posted: Tue Mar 27, 2007 1:45 am
Guest
It depends on the size of your problem. For N = 10..100 the speed of DGEQRF is comparable with the speed of DGEQR2 (not faster, but comparable). Block-matrix algorithms are suited for large matrices, N = 500 and larger... Exact boundary depends on the cache size of your CPU and other system parameters.
Jeremy Tai
Posted: Tue Mar 27, 2007 2:46 pm
Guest
I've tried various matrix sizes from 100x100 to 96000x2000, i.e.,
both below and above the crossover point nx=128, as well as different
block sizes from 8, 10, 16, ... to 120, but DGEQRF is usually
slightly slower than (or comparable to, since they are pretty close)
DGEQR2. That makes me wonder if I did something wrong...

Thanks!
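
A possible thing to check, sketched from a reading of the reference DGEQRF source (treat the details as an assumption and compare them against your own copy of dgeqrf.c): the routine only takes the blocked path when the block size, the crossover point, and the workspace size all allow it. In particular, if LWORK is smaller than N*NB, the block size is cut down to LWORK/N, which can quietly fall back to the unblocked code. The function name and parameters below are hypothetical; nbmin_env and nx_env stand for the values ILAENV would return.

```c
#include <assert.h>

/* Returns 1 if a DGEQRF-style driver would use blocked code for an
 * m x n matrix, 0 if it would fall back to the unblocked routine.
 * *nb_used receives the block size after any workspace-driven reduction.
 * This mirrors the decision logic only; it performs no factorization. */
int dgeqrf_takes_blocked_path(int m, int n, int nb, int nbmin_env,
                              int nx_env, int lwork, int *nb_used)
{
    assert(m > 0 && n > 0 && nb_used != 0);
    int k = m < n ? m : n;      /* number of Householder reflectors */
    int nbmin = 2;
    int nx = 0;

    if (nb > 1 && nb < k) {
        nx = nx_env > 0 ? nx_env : 0;   /* crossover point */
        if (nx < k) {
            int ldwork = n;
            if (lwork < ldwork * nb) {
                /* Not enough workspace for the requested block size:
                 * shrink it to whatever fits. */
                nb = lwork / ldwork;
                nbmin = nbmin_env > 2 ? nbmin_env : 2;
            }
        }
    }
    *nb_used = nb;
    return nb >= nbmin && nb < k && nx < k;
}
```

For example, with a 2000x2000 matrix, nb=32 and nx=128, passing lwork = 2000*32 keeps the blocked path, while lwork = 2000 reduces the block size to 1 and selects the unblocked code. A 100x100 matrix with nx=128 never reaches the blocked path at all.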
Jeremy Tai
Posted: Wed Mar 28, 2007 5:56 pm
Guest
I guess the unoptimized BLAS is the reason. I happened to have a
slightly optimized TN-type DGEMM of my own, so I tried it. I noticed
the difference immediately, so I think I'm going to try ATLAS. But my
next problem might be that ATLAS is not suitable for my target
platform (there's no OS on it, so ATLAS could not be tuned easily), so
I need optimized source code that is easy to handle, preferably a
single file...

Thanks.
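
For reference, a "TN type" DGEMM that fits in a single file could look roughly like the sketch below. This is not the code from the post above and not a tuned kernel; it is a plain cache-blocked loop nest, with a hypothetical name and a hypothetical tile-size parameter bs, computing C += A^T * B for column-major matrices. The TN form is convenient because each entry of C is a dot product of one column of A with one column of B, and both are contiguous in memory.

```c
#include <assert.h>
#include <stddef.h>

/* C (m x n) += A^T * B, where A is k x m and B is k x n, all column-major
 * with leading dimensions lda, ldb, ldc. bs is the tile edge length. */
void dgemm_tn_blocked(int m, int n, int k,
                      const double *a, int lda,
                      const double *b, int ldb,
                      double *c, int ldc, int bs)
{
    assert(bs > 0 && lda >= k && ldb >= k && ldc >= m);
    for (int jj = 0; jj < n; jj += bs) {
        int jmax = jj + bs < n ? jj + bs : n;
        for (int ii = 0; ii < m; ii += bs) {
            int imax = ii + bs < m ? ii + bs : m;
            /* Within a tile, the same bs columns of A and B are reused,
             * so they have a chance of staying in cache. */
            for (int j = jj; j < jmax; ++j) {
                const double *bj = b + (size_t)j * ldb;
                for (int i = ii; i < imax; ++i) {
                    const double *ai = a + (size_t)i * lda;
                    double sum = 0.0;
                    for (int p = 0; p < k; ++p)
                        sum += ai[p] * bj[p];   /* contiguous dot product */
                    c[i + (size_t)j * ldc] += sum;
                }
            }
        }
    }
}
```

The tile size would need to be chosen by experiment on the target; for very tall matrices, the k dimension would probably need blocking as well, which this sketch leaves out.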
 
All times are GMT - 5 Hours