A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems - Publication - Bridge of Knowledge

Abstract

In this paper, we propose a framework for programming parallel applications for multi-node systems with one or more GPUs per node, using OpenMP together with an extended CUDA API. OpenMP launches the threads responsible for managing particular GPUs, while the extended CUDA calls manage CUDA objects and data and launch kernels. The framework hides inter-node MPI communication from the programmer, who can thus use the traditional OpenMP+CUDA API in a multi-node environment. For optimization, the implementation takes advantage of the MPI_THREAD_MULTIPLE mode, which allows multiple threads to handle distinct GPUs and allows communication and computations to overlap transparently using multiple CUDA streams. The solution supports data parallelization across the available GPUs in order to minimize execution time, as well as a power-aware mode in which GPUs are selected for computations automatically, using a greedy approach, so that an imposed power limit is not exceeded. We implemented and benchmarked three parallel applications: finding the largest divisors, verification of the Collatz conjecture, and finding patterns in vectors. These were tested on three different systems: a GPU cluster with 16 nodes, each with an NVIDIA GTX 1060 GPU; a powerful 2-node system, one node with 8x NVIDIA Quadro RTX 6000 GPUs and the other with 4x NVIDIA Quadro RTX 5000 GPUs; and a heterogeneous environment with one node with 2x NVIDIA RTX 2080 and 2 nodes with NVIDIA GTX 1060 GPUs. We demonstrate the effectiveness of the framework through execution times versus power caps within the ranges of 100-1400 W, 250-3000 W and 125-600 W for these systems, respectively, as well as through gains from using two CUDA streams per GPU instead of one. Finally, we show that, for the testbed applications, the solution achieves speed-ups of between 89.3% and 97.4% of the theoretically assessed ideal ones for 16 nodes and 2 CUDA streams, demonstrating very good parallel efficiency.
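The abstract does not say which criterion the greedy power-aware selection uses, so the following C++ fragment is only a minimal sketch under stated assumptions, not the framework's implementation. It assumes that each GPU is described by a relative performance figure and a power draw, that GPUs are ranked by performance per watt, and that a GPU is added whenever it still fits under the cap. The names `Gpu`, `perf`, `power_watts` and `select_gpus` are hypothetical and do not come from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one GPU; the fields are illustrative
// and are not names or values taken from the paper.
struct Gpu {
    int id;              // index of the GPU in the system
    double perf;         // relative throughput, e.g. from a benchmark
    double power_watts;  // power drawn while computing (assumed > 0)
};

// Greedy selection under a power cap.
// Assumed criterion: prefer GPUs with the highest performance per watt.
// Returns the ids of the chosen GPUs in the order they were picked.
std::vector<int> select_gpus(std::vector<Gpu> gpus, double power_cap_watts) {
    assert(power_cap_watts >= 0.0);

    // Sort by perf / power_watts, descending. Cross-multiplication avoids
    // the division; stable_sort keeps the input order for ties.
    std::stable_sort(gpus.begin(), gpus.end(),
                     [](const Gpu& a, const Gpu& b) {
                         return a.perf * b.power_watts > b.perf * a.power_watts;
                     });

    std::vector<int> chosen;
    double used_watts = 0.0;
    for (const Gpu& g : gpus) {
        // Take the GPU only if the running total stays within the cap.
        if (used_watts + g.power_watts <= power_cap_watts) {
            chosen.push_back(g.id);
            used_watts += g.power_watts;
        }
    }
    return chosen;
}
```

For instance, with three hypothetical GPUs of (performance, power) equal to (10, 100 W), (30, 200 W) and (12, 150 W) and a cap of 320 W, this sketch picks the second GPU and then the first, and skips the third because it would raise the total to 450 W. A greedy pass like this is simple and fast but is not guaranteed to find the best possible subset for every cap.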

Citations

  • CrossRef: 1
  • Web of Science: 0
  • Scopus: 2

Details

Category:
Articles
Type:
Journal articles
Published in:
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE vol. 35
ISSN: 1532-0626
Language:
English
Publication year:
2023
Bibliographic description:
Czarnul P.: A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems // CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Vol. 35, iss. 25 (2023), p. e7897
DOI:
10.1002/cpe.7897
Sources of funding:
  • Statutory activity/subsidy
Verified by:
Gdańsk University of Technology
