Abstrakt
In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs and extended CUDA calls allow to manage CUDA objects, data and launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from the traditional OpenMP+CUDA API in a multi-node environment. For optimization, the implementation takes advantage of the MPI THREAD MULTIPLE mode allowing: multiple threads handling distinct GPUs as well as overlapping communication and computations transparently using multiple CUDA streams. The solution allows data parallelization across available GPUs in order to minimize execution time and supports a power-aware mode in which GPUs are automatically selected for computations using a greedy approach in order not to exceed an imposed power limit. We have implemented and benchmarked three parallel applications including: finding the largest divisors; verification of the Collatz conjecture; finding patterns in vectors. These were tested on three various systems: a GPU cluster with 16 nodes, each with NVIDIA GTX 1060 GPU; a powerful 2-node system – one node with 8x NVIDIA Quadro RTX 6000 GPUs, the second with 4x NVIDIA Quadro RTX 5000 GPUs; a heterogeneous environment with one node with 2x NVIDIA RTX 2080 and 2 nodes with NVIDIA GTX 1060 GPUs. We demonstrated effectiveness of the framework through execution times versus power caps within ranges of 100-1400W, 250-3000W and 125-600W for these systems respectively as well as gains from using two versus one CUDA streams per GPU. Finally, we have shown that for the testbed applications the solution allows to obtain high speed-ups between 89.3% to 97.4% of the theoretically assessed ideal ones, for 16 nodes and 2 CUDA streams, demonstrating very good parallel efficiency.
Cytowania
-
1
CrossRef
-
0
Web of Science
-
2
Scopus
Autor (1)
Cytuj jako
Pełna treść
- Wersja publikacji
- Accepted albo Published Version
- DOI:
- Cyfrowy identyfikator dokumentu elektronicznego (otwiera się w nowej karcie) 10.1002/cpe.7897
- Licencja
- Copyright (2023 John Wiley & Sons, Inc.)
Słowa kluczowe
Informacje szczegółowe
- Kategoria:
- Publikacja w czasopiśmie
- Typ:
- artykuły w czasopismach
- Opublikowano w:
-
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
nr 35,
ISSN: 1532-0626 - Język:
- angielski
- Rok wydania:
- 2023
- Opis bibliograficzny:
- Czarnul P.: A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems// CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE -Vol. 35,iss. 25 (2023), s.e7897-
- DOI:
- Cyfrowy identyfikator dokumentu elektronicznego (otwiera się w nowej karcie) 10.1002/cpe.7897
- Źródła finansowania:
-
- Działalność statutowa/subwencja
- Weryfikacja:
- Politechnika Gdańska
wyświetlono 186 razy