Wyniki wyszukiwania dla: parallel programming
-
Testing for conformance of parallel programming pattern languages
PublikacjaThis paper reports on the project being run by TUG and IMAG, aimed at reducing the volume of tests required to exercise parallel programming language compilers and libraries. The idea is to use the ISO STEP standard scheme for conformance testing of software products. A detailed example illustrating the ongoing work is presented.
-
Parallel Programming for Modern High Performance Computing Systems
PublikacjaIn view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...
-
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems
PublikacjaThis paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals,...
-
Effective methods for functional confermance testing of parallel and distributed programming libraries.
PublikacjaRozprawa przedstawia kompletna metodykę tworzenia Zestawów Testów Zgodności dla języków programowania, bibliotek i API, ze szczególnym uwzględnieniem języków i bibliotek programowania równoleglego i rozproszonego. Autor rozpoczął badania w dziedzinie testowania zgodności dla bibliotek programowania równoleglego i rozproszonego, ale Metodyka Kolejnych zawężeń (ang. Consecutive Confinenments Method -CoCoM, stworzona przez Autora,...
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
PublikacjaIn the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
Czasopisma -
Principles and Practice of Parallel Programming
Konferencje -
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
PublikacjaThe paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
International workshop on High-Level Parallel Programming and Applications
Konferencje -
International Workshop on Formal Methods for Parallel Programming: Theory and Applications
Konferencje -
International Workshop on High-Level Parallel Programming Models and Supportive Environments
Konferencje -
Paweł Czarnul dr hab. inż.
OsobyPaweł Czarnul uzyskał stopień doktora habilitowanego w dziedzinie nauk technicznych w dyscyplinie informatyka w roku 2015 zaś stopień doktora nauk technicznych w zakresie informatyki(z wyróżnieniem) nadany przez Radę Wydziału Elektroniki, Telekomunikacji i Informatyki Politechniki Gdańskiej w roku 2003. Dziedziny jego zainteresowań obejmują: przetwarzanie równoległei rozproszone w tym programowanie równoległe na klastrach obliczeniowych,...
-
Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi
PublikacjaParallel algorithms are popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, proof-of-concept implementation and practical experiments are often required. In order to speed up the development and provide simple and easily accessible testing environment that enables execution of reliable experiments, the paper proposes a platform with multi-core computational accelerator:...
-
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
PublikacjaMany important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power of such systems requires programming parallel applications that are hybrid in two meanings: they can utilize parallelism on multiple levels at the same time and combine together programming interfaces...
-
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
PublikacjaThis paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...
-
Jerzy Proficz dr hab. inż.
OsobyJerzy Proficz – dyrektor Centrum Informatycznego Trójmiejskiej Akademickiej Sieci Komputerowej (CI TASK) na Politechnice Gdańskiej. Uzyskał stopień naukowy doktora habilitowanego (2022) w dyscyplinie: Informatyka techniczna i telekomunikacja. Autor i współautor ponad 50 artykułów w czasopismach i na konferencjach naukowych związanych głównie z równoległym przetwarzaniem danych na komputerach dużej mocy (HPC, chmura obliczeniowa). Udział...
-
Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications
PublikacjaThe aim of this paper is to evaluate performance of new CUDA mechanisms—unified memory and dynamic parallelism for real parallel applications compared to standard CUDA API versions. In order to gain insight into performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically,...
-
Acceleration of the discrete Green's function computations
PublikacjaResults of the acceleration of the 3-D discrete Green's function (DGF) computations on the multicore processor are presented. The code was developed in the multiple precision arithmetic with use of the OpenMP parallel programming interface. As a result, the speedup factor of three orders of magnitude compared to the previous implementation was obtained thus applicability of the DGF in FDTD simulations was significantly improved.
-
Scheduling with Complete Multipartite Incompatibility Graph on Parallel Machines
PublikacjaIn this paper we consider a problem of job scheduling on parallel machines with a presence of incompatibilities between jobs. The incompatibility relation can be modeled as a complete multipartite graph in which each edge denotes a pair of jobs that cannot be scheduled on the same machine. Our research stems from the works of Bodlaender, Jansen, and Woeginger (1994) and Bodlaender and Jansen (1993). In particular, we pursue the...
-
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
PublikacjaIn the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs and extended CUDA calls allow to manage CUDA objects, data and launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...
-
Fast implementation of FDTD-compatible green's function on multicore processor
PublikacjaIn this letter, numerically efficient implementation of the finite-difference time domain (FDTD)-compatible Green's function on a multicore processor is presented. Recently, closed-form expression of this discrete Green's function (DGF) was derived, which simplifies its application in the FDTD simulations of radiation and scattering problems. Unfortunately, the new DGF expression involves binomial coefficients, whose computations...
-
Scheduling with Complete Multipartite Incompatibility Graph on Parallel Machines: Complexity and Algorithms
PublikacjaIn this paper, the problem of scheduling on parallel machines with a presence of incompatibilities between jobs is considered. The incompatibility relation can be modeled as a complete multipartite graph in which each edge denotes a pair of jobs that cannot be scheduled on the same machine. The paper provides several results concerning schedules, optimal or approximate with respect to the two most popular criteria of optimality:...
-
Towards an efficient multi-stage Riemann solver for nuclear physics simulations
PublikacjaRelativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....
-
Scheduling of compatible jobs on parallel machines
PublikacjaThe dissertation discusses the problems of scheduling compatible jobs on parallel machines. Some jobs are incompatible, which is modeled as a binary relation on the set of jobs; the relation is often modeled by an incompatibility graph. We consider two models of machines. The first model, more emphasized in the thesis, is a classical model of scheduling, where each machine does one job at time. The second one is a model of p-batching...
-
An facile Fortran-95 algorithm to simulate complex instabilities in three-dimensional hyperbolic systems
Dane BadawczeIt is well know that the simulation of fractional systems is a difficult task from all points of view. In particular, the computer implementation of numerical algorithms to simulate fractional systems of partial differential equations in three dimensions is a hard task which has no been solved satisfactorily. Here, we provide a Fortran-95 code to solve...
-
Use of ICT infrastructure for teaching HPC
PublikacjaIn this paper we look at modern ICT infrastructure as well as curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, node level using OpenMP and CUDA. We present...
-
Tuning matrix-vector multiplication on GPU
PublikacjaA matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
-
Accurate modeling of quasi-resonant inverter fed IM drive
PublikacjaIn this paper wide-band modeling methodology of a parallel quasi-resonant dc link inverter (PQRDCLI) fed induction machine (IM) is presented. The modeling objective is early-design stage prediction of conductive electromagnetic interference (EMI) emissions of the considered converter fed IM drive system. Operation principles of the selected topology of PQRDCLI feeding IM drive are given. Modeling of the converter drive system is...
-
Performance Analysis of the OpenCL Environment on Mobile Platforms
PublikacjaToday’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...
-
Laboratory investigation with subbottom parametric echosounder SES-2000 standard with an emphasis on reflected pure signals analysis
PublikacjaThe main goal of the paper is to describe correlations between measurements results of trials taken on Gulf of Gdańsk bottom sounded with parametric echosounder SES-2000 Standard and laboratory research where collected during survey sediments were measured. Stationary tests took place at Gdansk University of Technology where 30 meters long 1.8 meter deep and 3 meters wide water tank is located. Main lobe of antenna was directed...
-
Application of mechanistic and data-driven models for nitrogen removal in wastewater treatment systems
PublikacjaIn this dissertation, the application of mechanistic and data-driven models in nitrogen removal systems including nitrification and deammonification processes was evaluated. In particular, the influential parameters on the activity of the Nitrospira activity were assessed using response surface methodology (RSM). Various long-term biomass washout experiments were operated in two parallel sequencing batch reactor (SBR) with a different...
-
Nieliniowa statyka 6-parametrowych powłok sprężysto plastycznych. Efektywne obliczenia MES
PublikacjaGłównym zagadnieniem omawianym w monografii jest sformułowanie sprężysto-plastycznego prawa konstytutywnego w nieliniowej 6-parametrowej teorii powłok. Wyróżnikiem tej teorii jest występujący w niej w naturalny sposób tzw. stopień 6 swobody, czyli owinięcie (drilling rotation). Podstawowe założenie pracy to przyjęcie płaskiego stanu naprężenia uogólnionego na ośrodek typu Cosseratów. Takie podejście stanowi oryginalny aspekt opracowania....
-
Computer controlled systems - 2022/2023
Kursy Onlinemateriały wspierające wykład na studiach II stopnia na kierunku ACR pod tytułem komputerowe systemy automatyki 1. Computer system – controlled plant interfacing technique; simple interfacing and with both side acknowledgement; ideas, algorithms, acknowledge passing. 2. Methods of acknowledgement passing: software checking and passing, using interrupt techniques, using readiness checking (ready – wait lines). The best solution...
-
CCS-lecture-2023-2024
Kursy Onlinemateriały wspierające wykład na studiach II stopnia na kierunku ACR pod tytułem komputerowe systemy automatyki 1. Computer system – controlled plant interfacing technique; simple interfacing and with both side acknowledgement; ideas, algorithms, acknowledge passing. 2. Methods of acknowledgement passing: software checking and passing, using interrupt techniques, using readiness checking (ready – wait lines). The best solution optimization...
-
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
PublikacjaThe paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
-
Parallelization of Selected Algorithms on Multi-core CPUs, a Cluster and in a Hybrid CPU+Xeon Phi Environment
PublikacjaIn the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication...
-
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
PublikacjaThe paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...