A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

Paweł Czarnul

doi:10.1002/cpe.7897

Abstract

In this paper, we propose a framework for programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used to launch threads responsible for managing particular GPUs, while the extended CUDA calls manage CUDA objects and data and launch kernels. The framework hides inter-node MPI communication from the programmer, who can thus use the traditional OpenMP+CUDA API in a multi-node environment. For optimization, the implementation takes advantage of the MPI_THREAD_MULTIPLE mode, which allows multiple threads to handle distinct GPUs and allows communication and computations to be overlapped transparently using multiple CUDA streams. The solution parallelizes data across the available GPUs in order to minimize execution time and supports a power-aware mode in which GPUs are selected for computations automatically, using a greedy approach, so that an imposed power limit is not exceeded. We have implemented and benchmarked three parallel applications: finding the largest divisors; verification of the Collatz conjecture; and finding patterns in vectors. These were tested on three different systems: a GPU cluster with 16 nodes, each with an NVIDIA GTX 1060 GPU; a powerful 2-node system (one node with 8x NVIDIA Quadro RTX 6000 GPUs, the second with 4x NVIDIA Quadro RTX 5000 GPUs); and a heterogeneous environment with one node with 2x NVIDIA RTX 2080 and 2 nodes with NVIDIA GTX 1060 GPUs. We demonstrate the effectiveness of the framework through execution times versus power caps within ranges of 100-1400 W, 250-3000 W and 125-600 W for these systems, respectively, as well as through gains from using two CUDA streams per GPU instead of one. Finally, we show that for the testbed applications the solution achieves speed-ups between 89.3% and 97.4% of the theoretically assessed ideal ones, for 16 nodes and 2 CUDA streams, demonstrating very good parallel efficiency.
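The power-aware mode mentioned in the abstract can be illustrated with a minimal C++ sketch. The abstract states only that GPUs are selected greedily so that an imposed power limit is not exceeded; it does not give the selection criterion. The ordering by performance per watt, the `Gpu` structure, its fields and the function name `select_gpus` are therefore assumptions made for illustration, not the framework's actual API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of one GPU; all fields are illustrative assumptions.
struct Gpu {
    std::string name;
    double perf;         // relative performance (e.g. measured throughput)
    double power_watts;  // power drawn when the GPU is used; assumed > 0
};

// Greedy selection under a power cap (assumed criterion: performance per watt).
// GPUs are visited from the most to the least power-efficient one, and a GPU
// is taken only if adding it keeps the total power within the cap.
std::vector<Gpu> select_gpus(std::vector<Gpu> gpus, double power_cap_watts) {
    std::sort(gpus.begin(), gpus.end(), [](const Gpu& a, const Gpu& b) {
        return a.perf / a.power_watts > b.perf / b.power_watts;
    });
    std::vector<Gpu> chosen;
    double used_watts = 0.0;
    for (const Gpu& g : gpus) {
        if (used_watts + g.power_watts <= power_cap_watts) {
            chosen.push_back(g);
            used_watts += g.power_watts;
        }
    }
    return chosen;
}
```

Calling `select_gpus` with a list of devices and a cap returns the subset that would be used for computations. Being greedy, the result is not guaranteed to be the optimal subset for a given cap.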

Citations

CrossRef: 2
Web of Science: 0
Scopus: 2

Author (1)

Paweł Czarnul dr hab. inż.

Publication version: Accepted or Published Version
DOI: 10.1002/cpe.7897
License: Copyright (2023 John Wiley & Sons, Inc.)

Details

Category:

Articles

Type:

journal articles

Published in:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol. 35
ISSN: 1532-0626

Language:

English

Publication year:

2023

Bibliographic description:

Czarnul P.: A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems // CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 35, iss. 25 (2023), e7897
