P. Bientinesi, F.D. Igual, D. Kressner, E.S. Quintana-Orti
Abstract: We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem. The target architecture is a current general-purpose multi-core processor, where parallelism is extracted using a tuned multi-threaded implementation of BLAS. In addition, in response to advances in hardware accelerators, we modify the SBR code to accelerate the computation by off-loading a significant part of the operations to a graphics processor (GPU). Our results on a system with two Intel QuadCore processors and a Tesla C1060 GPU illustrate the performance and scalability delivered by these architectures.
Paper: Available as PDF (242 KB), or as a hard copy on request from reports@sam.math.ethz.ch.