Didn't find any results in this catalog!
But we have some results in other catalogs.Filters
total: 1592
-
Catalog
displaying 1000 best results Help
Search results for: OVERLAPPING COMPUTATIONS AND COMMUNICATION
-
Benchmarking overlapping communication and computations with multiple streams for modern GPUs
PublicationThe paper presents benchmarking a multi-stream application processing a set of input data arrays. Tests have been performed and execution times measured for various numbers of streams and various compute intensities measured as the ratio of kernel compute time and data transfer time. As such, the application and benchmarking is representative of frequently used operations such as vector weighted sum, matrix multiplication etc....
-
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
PublicationThe paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...
-
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
PublicationThe paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...
-
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
PublicationThe paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
PublicationIn the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs and extended CUDA calls allow to manage CUDA objects, data and launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...
-
Parallel Programming for Modern High Performance Computing Systems
PublicationIn view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...
-
Performance and Power-Aware Modeling of MPI Applications for Cluster Computing
PublicationThe paper presents modeling of performance and power consumption when running parallel applications on modern cluster-based systems. The model includes basic so-called blocks representing either computations or communication. The latter includes both point-to-point and collective communication. Real measurements were performed using MPI applications and routines run on three different clusters with both Infiniband and Gigabit Ethernet...
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
PublicationIn the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
ENGINEERING COMPUTATIONS
Journals -
Modeling SPMD Application Execution Time
PublicationParallel applications in a Single Process Multiple Data paradigm assume splitting huge amounts of data to multiple processors working in parallel at small data packets. As the individual data packets are not independent, the processors must interact with each other to exchange results of the calculations with their adjacent partners and take these results into account in their own computations. An example of SPMD is geometric parallelism...