Search results for: openmp

Total: 20

Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication
- P. Czarnul
- Electronics - Year 2021
The paper investigates various implementations of the master–slave paradigm using the popular OpenMP API and compares their relative performance on modern multi-core workstation CPUs. It is assumed that a master partitions the available input into a batch of a predefined number of data chunks, which are then processed in parallel by a set of slaves, and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available for download in the portal
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
Publication
- B. Gawrych
- P. Czarnul
- Year 2023
Considering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures as well as compilers there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...

Full text available for download from an external service
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
Publication
- P. Czarnul
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023
In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls make it possible to manage CUDA objects and data and to launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Full text available for download in the portal
Multi-core and Multiprocessor Implementation of Numerical Integration in Finite Element Method
Publication
- Year 2012
The paper presents techniques for accelerating the numerical integration process that appears in the Finite Element Method. The acceleration is achieved by taking advantage of multi-core and multiprocessor devices. It is shown that a multi-core implementation with OpenMP and GPU acceleration using the CUDA architecture achieve speedups by factors of 5 and 10 on a CPU and GPUs, respectively.
Acceleration of the discrete Green's function computations
Publication
- T. Stefański
- Year 2012
Results of the acceleration of 3-D discrete Green's function (DGF) computations on a multicore processor are presented. The code was developed in multiple-precision arithmetic with the use of the OpenMP parallel programming interface. As a result, a speedup of three orders of magnitude over the previous implementation was obtained, which significantly improved the applicability of the DGF in FDTD simulations.

Full text available for download from an external service
An facile Fortran-95 algorithm to simulate complex instabilities in three-dimensional hyperbolic systems
Research Data
open access
- J. Macias-Diaz
- G. Graff
It is well known that the simulation of fractional systems is a difficult task from all points of view. In particular, the computer implementation of numerical algorithms to simulate fractional systems of partial differential equations in three dimensions is a hard task which has not been solved satisfactorily. Here, we provide a Fortran-95 code to solve...
Performance Evaluation of the Parallel Codebook Algorithm for Background Subtraction in Video Stream
Publication
- G. Szwoch
- Communications in Computer and Information Science - Year 2011
A background subtraction algorithm based on the codebook approach was implemented on a multi-core processor in a parallel form, using the OpenMP system. The aim of the experiments was to evaluate performance of the multithreaded algorithm in processing video streams recorded from monitoring cameras, depending on a number of computer cores used, method of task scheduling, image resolution and degree of image content variability....

Full text available for download from an external service
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download in the portal
Fast implementation of FDTD-compatible green's function on multicore processor
Publication
- T. Stefański
- IEEE Antennas and Wireless Propagation Letters - Year 2012
In this letter, numerically efficient implementation of the finite-difference time domain (FDTD)-compatible Green's function on a multicore processor is presented. Recently, closed-form expression of this discrete Green's function (DGF) was derived, which simplifies its application in the FDTD simulations of radiation and scattering problems. Unfortunately, the new DGF expression involves binomial coefficients, whose computations...

Full text available for download from an external service
Use of ICT infrastructure for teaching HPC
Publication
- P. Czarnul
- M. Matuszek
- Year 2019
In this paper we look at modern ICT infrastructure as well as the curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, node level using OpenMP and CUDA. We present...

Full text available for download from an external service
BeesyCluster as Front-End for High Performance Computing Services
Publication
- P. Czarnul
- TASK Quarterly - Year 2015
The paper presents the BeesyCluster system as a middleware allowing invocation of services on high performance computing resources within the NIWA Centre of Competence project. Access is possible through both WWW and SOAP Web Service interfaces. The former allows non-experienced users to invoke both simple and complex services exposed through easy-to-use servlets. The latter is meant for integration of external applications with...

Full text available for download in the portal
JMATRIX - a package for relativistic J-matrix calculations in elastic scattering of electrons from model potentials
Publication
- P. Syty
- J. E. Sienkiewicz
- TASK Quarterly - Year 2017
We present a software package, JMATRIX, consisting of two computer codes written in FORTRAN 95 and parallelized with OpenMP, implementing the so-called J-matrix method, applied to elastic scattering of electrons on a radial potential vanishing faster than the Coulomb one. In the J-matrix method, the physical scattering problem is replaced by a well-defined model, which is solved analytically. The presented software implements both non-relativistic...

Full text available for download in the portal
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
Publication
- T. Stefański
- Progress in Electromagnetics Research-PIER - Year 2013
This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously applies the computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Full text available for download from an external service
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text available for download from an external service
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
Publication
- A. Sieradzan
- J. Sans‐Duñó
- E. Lubecka
- C. Czaplewski
- A. Lipska
- H. Leszczyński
- K. Ocetkiewicz
- J. Proficz
- P. Czarnul
- H. Krawczyk
- A. Liwo
- JOURNAL OF COMPUTATIONAL CHEMISTRY - Year 2023
We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...

Full text available for download in the portal
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication
- P. Czarnul
- INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING - Year 2016
The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...

Full text available for download in the portal
Parallel Programming for Modern High Performance Computing Systems
Publication
- P. Czarnul
- Year 2018
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text available for download from an external service
DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing
Publication
- SOFTWARE-PRACTICE & EXPERIENCE - Year 2022
In the article we propose an automatic power capping software tool DEPO that allows one to perform runtime optimization of performance and energy related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning engaging one of the two exploration algorithms—linear search (LS) and golden section search (GSS), finds...

Full text available for download from an external service
Multi-GPU-powered UNRES package for physics-based coarse-grained simulations of structure, dynamics, and thermodynamics of protein systems at biological size- and timescales
Publication
- C. Czaplewski
- P. Czarnul
- H. Krawczyk
- A. Lipska
- E. Lubecka
- K. Ocetkiewicz
- J. Proficz
- A. Sieradzan
- R. Ślusarz
- J. Liwo
- BIOPHYSICAL JOURNAL - Year 2024
Coarse-grained models are nowadays extensively used in biomolecular simulations owing to the tremendous extension of size- and time-scale of simulations. The physics-based UNRES (UNited RESidue) model of proteins developed in our laboratory has only two interaction sites per amino-acid residue (united peptide groups and united side chains) and implicit solvent. However, owing to rigorous physics-based derivation, which enabled...

Full text available for download from an external service
Massively parallel linear-scaling Hartree–Fock exchange and hybrid exchange–correlation functionals with plane wave basis set accuracy
Publication
- J. Dziedzic
- J. C. Womack
- R. Ali
- C. Skylaris
- JOURNAL OF CHEMICAL PHYSICS - Year 2021
We extend our linear-scaling approach for the calculation of Hartree–Fock exchange energy using localized in situ optimized orbitals [Dziedzic et al., J. Chem. Phys. 139, 214103 (2013)] to leverage massive parallelism. Our approach has been implemented in the ONETEP (Order-N Electronic Total Energy Package) density functional theory framework, which employs a basis of non-orthogonal generalized Wannier functions (NGWFs) to achieve...

Full text available for download in the portal
