A memory efficient and fast sparse matrix vector product on a GPU

Adam Dziekoński; Adam Lamęcki; Michał Mrozowski

Abstract

This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests have shown that the performance of the new implementation reaches 69 GFLOPS in complex single-precision arithmetic. Compared to an optimized implementation on a six-core Central Processing Unit (CPU) (Intel Xeon 5680), this performance implies a speedup by a factor of six. In terms of speed, the new format is as fast as the best format published so far, and at the same time it does not introduce the redundant zero elements that otherwise have to be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can be handled using low-cost commodity GPUs with a limited amount of on-board memory.
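The abstract does not describe the layout of Sliced ELLR-T, so the following is only a minimal sketch of the general idea behind slicing an ELLPACK-style format, not the authors' format or their GPU kernel. It assumes a generic layout in which rows are grouped into slices and each slice is padded only to its own longest row. All names (`build_sliced_ell`, `stored_entries`, `spmv`, `slice_height`) are hypothetical. It compares the number of stored entries against a single-slice (plain ELLPACK-like) layout and computes a complex-valued product y = A x on the CPU.

```python
# Illustrative sketch only: a generic sliced-ELLPACK-style layout.
# This is NOT the paper's Sliced ELLR-T format; all names are hypothetical.

def build_sliced_ell(rows, slice_height):
    """rows: list of rows, each a list of (column, value) pairs.
    Each slice pads its rows only to the longest row inside that slice."""
    slices = []
    for start in range(0, len(rows), slice_height):
        block = rows[start:start + slice_height]
        width = max((len(r) for r in block), default=0)
        cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in block]
        vals = [[v for _, v in r] + [0j] * (width - len(r)) for r in block]
        row_len = [len(r) for r in block]  # true lengths, so padding is skipped
        slices.append((start, width, cols, vals, row_len))
    return slices

def stored_entries(slices):
    """Number of value slots stored, including padding."""
    return sum(width * len(row_len) for _, width, _, _, row_len in slices)

def spmv(slices, x, n_rows):
    """Sequential reference product y = A x over the sliced layout."""
    y = [0j] * n_rows
    for start, _, cols, vals, row_len in slices:
        for i, n in enumerate(row_len):
            acc = 0j
            for k in range(n):  # visit only the real nonzeros of the row
                acc += vals[i][k] * x[cols[i][k]]
            y[start + i] = acc
    return y

# A small 4x4 complex matrix with 8 nonzeros and one long row.
rows = [
    [(0, 1 + 1j), (1, 2 + 0j), (2, 1j), (3, 1 + 0j)],
    [(1, 3 + 0j)],
    [(2, 2 - 1j)],
    [(0, 1j), (3, 4 + 0j)],
]
x = [1 + 0j, 1j, 2 + 0j, 1 - 1j]

plain = build_sliced_ell(rows, slice_height=4)   # one slice: ELLPACK-like
sliced = build_sliced_ell(rows, slice_height=2)  # two slices of two rows

print(stored_entries(plain), stored_entries(sliced))  # 16 12
print(spmv(sliced, x, len(rows)))
```

In this toy case the single long row forces every row of the one-slice layout to width 4, whereas slicing confines that padding to the slice containing the long row. How the paper's format handles padding and assigns GPU threads to rows is not stated in the abstract and is not modeled here.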

Details

Category: Journal publication
Type: article in a journal listed in JCR
Published in: Progress in Electromagnetics Research-PIER, no. 116, pages 49-63
ISSN: 1559-8985
Language: English
Year of publication: 2011
Bibliographic description: Dziekoński A., Lamęcki A., Mrozowski M.: A memory efficient and fast sparse matrix vector product on a GPU // Progress in Electromagnetics Research-PIER. Vol. 116 (2011), pp. 49-63
Verified by: Politechnika Gdańska


Publications that may interest you

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations

2011

A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM

2018

Finite element matrix generation on a GPU

2012

Tuning matrix-vector multiplication on GPU

2010
