A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems - Publication - MOST Wiedzy

Abstract

In this paper, we propose a framework for programming parallel applications for multi-node systems with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used to launch threads responsible for managing particular GPUs, while extended CUDA calls manage CUDA objects and data and launch kernels. The framework hides inter-node MPI communication from the programmer, who can therefore use the traditional OpenMP+CUDA API in a multi-node environment. For optimization, the implementation takes advantage of the MPI_THREAD_MULTIPLE mode, which allows multiple threads to handle distinct GPUs and allows communication and computations to be overlapped transparently using multiple CUDA streams. The solution supports data parallelization across the available GPUs in order to minimize execution time, as well as a power-aware mode in which GPUs are selected for computations automatically, using a greedy approach, so as not to exceed an imposed power limit. We implemented and benchmarked three parallel applications: finding the largest divisors, verification of the Collatz conjecture, and finding patterns in vectors. These were tested on three different systems: a GPU cluster with 16 nodes, each with an NVIDIA GTX 1060 GPU; a powerful 2-node system, one node with 8x NVIDIA Quadro RTX 6000 GPUs and the other with 4x NVIDIA Quadro RTX 5000 GPUs; and a heterogeneous environment with one node with 2x NVIDIA RTX 2080 and 2 nodes with NVIDIA GTX 1060 GPUs. We demonstrate the effectiveness of the framework through execution times versus power caps within ranges of 100-1400 W, 250-3000 W and 125-600 W for these systems, respectively, as well as gains from using two CUDA streams per GPU rather than one. Finally, we show that for the testbed applications the solution achieves speed-ups between 89.3% and 97.4% of the theoretically assessed ideal ones, for 16 nodes and 2 CUDA streams, demonstrating very good parallel efficiency.
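The power-aware mode described in the abstract can be illustrated with a small sketch. The abstract only states that GPUs are chosen greedily so that an imposed power limit is not exceeded; it does not give the selection criterion. The version below assumes a performance-per-watt ordering, and the `GpuInfo` struct, its fields, and the `select_gpus_greedy` function are hypothetical names introduced for illustration, not part of the framework's API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one GPU (illustrative assumption, not the
// framework's data structure).
struct GpuInfo {
    std::string name;    // label used only to identify the device
    double perf;         // relative throughput, assumed measured beforehand
    double power_watts;  // assumed power draw under load, must be > 0
};

// Greedy selection under a power cap. ASSUMPTION: GPUs are considered in
// order of decreasing performance per watt; the paper may use a different
// greedy criterion. Each GPU is taken if it still fits in the remaining budget.
std::vector<GpuInfo> select_gpus_greedy(std::vector<GpuInfo> gpus,
                                        double power_cap_watts) {
    assert(power_cap_watts >= 0.0);
    // Compare perf/power ratios by cross-multiplication to avoid division.
    std::stable_sort(gpus.begin(), gpus.end(),
                     [](const GpuInfo& a, const GpuInfo& b) {
                         return a.perf * b.power_watts > b.perf * a.power_watts;
                     });
    std::vector<GpuInfo> selected;
    double used_watts = 0.0;
    for (const GpuInfo& gpu : gpus) {
        if (used_watts + gpu.power_watts <= power_cap_watts) {
            selected.push_back(gpu);
            used_watts += gpu.power_watts;
        }
    }
    return selected;
}
```

A caller would pass one `GpuInfo` entry per GPU across all nodes together with the cap, then distribute data only among the returned devices. Greedy selection of this kind is simple and fast but is not guaranteed to find the highest-throughput subset that fits under the cap.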

Citations

  • CrossRef: 1

  • Web of Science: 0

  • Scopus: 2

Details

Category:
Journal publication
Type:
journal articles
Published in:
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE no. 35,
ISSN: 1532-0626
Language:
English
Publication year:
2023
Bibliographic description:
Czarnul P.: A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems // CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Vol. 35, iss. 25 (2023), p. e7897-
DOI:
10.1002/cpe.7897
Funding sources:
  • Statutory activity/subsidy
Verified by:
Politechnika Gdańska
