A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

Paweł Czarnul

doi:10.1002/cpe.7897


Abstract

In this paper, we propose a framework for programming parallel applications for multi-node systems with one or more GPUs per node, using an OpenMP + extended CUDA API. OpenMP is used to launch threads, each responsible for managing a particular GPU, while the extended CUDA calls manage CUDA objects and data and launch kernels. The framework hides inter-node MPI communication from the programmer, who can therefore use the familiar OpenMP + CUDA API in a multi-node environment. For optimization, the implementation uses the MPI_THREAD_MULTIPLE mode, which allows multiple threads to handle distinct GPUs and allows communication and computations to be overlapped transparently using multiple CUDA streams. The solution parallelizes data across the available GPUs in order to minimize execution time, and it supports a power-aware mode in which GPUs are automatically selected for computations using a greedy approach so that an imposed power limit is not exceeded. We implemented and benchmarked three parallel applications: finding the largest divisors, verification of the Collatz conjecture, and finding patterns in vectors. These were tested on three different systems: a GPU cluster with 16 nodes, each with an NVIDIA GTX 1060 GPU; a powerful 2-node system, with one node containing 8x NVIDIA Quadro RTX 6000 GPUs and the second containing 4x NVIDIA Quadro RTX 5000 GPUs; and a heterogeneous environment with one node with 2x NVIDIA RTX 2080 GPUs and 2 nodes with NVIDIA GTX 1060 GPUs. We demonstrate the effectiveness of the framework through execution times versus power caps within the ranges of 100-1400 W, 250-3000 W and 125-600 W for these systems, respectively, as well as through the gains from using two versus one CUDA stream per GPU. Finally, we show that for the testbed applications the solution achieves high speed-ups of 89.3% to 97.4% of the theoretically assessed ideal ones for 16 nodes and 2 CUDA streams, demonstrating very good parallel efficiency.
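The abstract states only that GPUs are selected greedily so that their combined power stays under the imposed cap. The following is a minimal sketch of one plausible greedy rule, written in Python for brevity. It assumes that each GPU's expected power draw and relative throughput are known in advance and that GPUs are ranked by throughput per watt; this ranking criterion and all names (`GpuInfo`, `select_gpus`) are hypothetical illustrations, not the framework's actual API or selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfo:
    """Hypothetical description of one GPU in a multi-node system."""
    node: str          # host name (illustrative)
    device_id: int     # CUDA device index on that node
    throughput: float  # relative performance, assumed measured beforehand
    power_w: float     # assumed power draw under load, in watts


def select_gpus(gpus: list[GpuInfo], power_cap_w: float) -> list[GpuInfo]:
    """Greedily pick GPUs without exceeding power_cap_w.

    Assumption: GPUs are visited in order of decreasing throughput per watt,
    and a GPU that would push the total above the cap is skipped while
    cheaper ones later in the ranking may still be taken.
    """
    ranked = sorted(gpus, key=lambda g: g.throughput / g.power_w, reverse=True)
    selected: list[GpuInfo] = []
    total_w = 0.0
    for gpu in ranked:
        if total_w + gpu.power_w <= power_cap_w:
            selected.append(gpu)
            total_w += gpu.power_w
    return selected


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    cluster = [
        GpuInfo("n1", 0, 10.0, 200.0),
        GpuInfo("n1", 1, 8.0, 100.0),
        GpuInfo("n2", 0, 3.0, 120.0),
    ]
    for cap in (300.0, 250.0, 50.0):
        chosen = select_gpus(cluster, cap)
        print(cap, [(g.node, g.device_id) for g in chosen])
```

Skipping a GPU that does not fit, rather than stopping at the first one, lets a smaller device fill leftover headroom under the cap. With a 250 W cap in the example, the 200 W GPU is skipped and the 120 W one is used instead.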


Author

Paweł Czarnul dr hab. inż.


Full text


Publication version: Accepted or Published Version
DOI: 10.1002/cpe.7897
License: Copyright (2023 John Wiley & Sons, Inc.)


Details

Category:

Journal publication

Type:

Journal article

Published in:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 35
ISSN: 1532-0626

Language:

English

Year of publication:

2023

Bibliographic description:

Czarnul P.: A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 35, iss. 25 (2023), p. e7897.
