Search results for: GPU COMPUTING (total: 24)
-
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool
Publication: GPU accelerators have become essential to the recent advance in computational power of high-performance computing (HPC) systems. Current HPC systems reaching a power demand of approximately 20–30 megawatts has resulted in increasing CO2 emissions and energy costs and necessitates increasingly complex cooling systems. This is a very real challenge. To address it, new mechanisms of software power control could be employed. In this...
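The DEPO tool's own code is not part of this abstract; as a rough illustration of what software GPU power capping involves, the following minimal C++/NVML sketch queries a device's allowed power-limit range and applies a cap. The 80% target is an arbitrary example, not a value from the paper, and setting the limit requires administrative privileges.

#include <nvml.h>
#include <cstdio>

int main() {
    // Initialize NVML and take a handle to the first GPU.
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    // Query the admissible power-limit range (values are in milliwatts).
    unsigned int minLimit = 0, maxLimit = 0;
    nvmlDeviceGetPowerManagementLimitConstraints(dev, &minLimit, &maxLimit);

    // Example cap: 80% of the maximum limit (illustrative choice only).
    unsigned int cap = (unsigned int)(0.8 * maxLimit);
    if (cap < minLimit) cap = minLimit;

    // Applying the cap typically requires root privileges.
    nvmlReturn_t rc = nvmlDeviceSetPowerManagementLimit(dev, cap);
    printf("Requested cap %u mW: %s\n", cap, nvmlErrorString(rc));

    nvmlShutdown();
    return 0;
}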
-
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication: In the paper we present a modern, efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes, including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and which CPU–GPU load balancing settings should be preferred for various domain...
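The abstract does not show the implementation itself; a minimal sketch of the general idea, splitting the agent array between a CUDA kernel and an OpenMP loop according to a load-balancing fraction, could look as follows (the names updateAgentsGPU/updateAgentCPU, the placeholder update and the 0.8 split are illustrative assumptions, not the paper's code).

#include <cuda_runtime.h>

// Illustrative kernel: one thread updates one agent (placeholder body).
__global__ void updateAgentsGPU(float *pos, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pos[i] += 0.1f;   // stand-in for the real crowd-model update
}

void updateAgentCPU(float *pos, int i) { pos[i] += 0.1f; }

void stepHybrid(float *hostPos, float *devPos, int n, float gpuShare /* e.g. 0.8 */) {
    int nGpu = (int)(gpuShare * n);   // agents assigned to the GPU

    // Launch the GPU part asynchronously...
    int block = 256, grid = (nGpu + block - 1) / block;
    updateAgentsGPU<<<grid, block>>>(devPos, nGpu);

    // ...and process the remaining agents on the CPU with OpenMP in the meantime.
    #pragma omp parallel for
    for (int i = nGpu; i < n; ++i) updateAgentCPU(hostPos, i);

    cudaDeviceSynchronize();          // wait for the GPU portion to finish
}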
-
Paweł Czarnul dr hab. inż.
People: Paweł Czarnul obtained a D.Sc. degree in computer science in 2015 and a Ph.D. in computer science, granted by the council of the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, in 2003. His research interests include: parallel and distributed processing, including clusters, accelerators and coprocessors; distributed information systems; architectures of distributed systems; programming mobile devices....
-
Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms
Publication: An implementation of the background subtraction algorithm using the OpenCL platform is presented. The algorithm processes a live stream of video frames from a surveillance camera in online mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...
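The paper's own kernels are written in OpenCL and are not reproduced in the abstract; for orientation only, a structurally similar CUDA sketch of a per-pixel running-average background model (a common baseline technique, with alpha and threshold as illustrative parameters) is given below.

// Per-pixel running-average background model (baseline technique; not the
// paper's actual OpenCL algorithm or parameters).
__global__ void backgroundSubtract(const unsigned char *frame, float *background,
                                   unsigned char *mask, int numPixels,
                                   float alpha, float threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;

    float diff = fabsf((float)frame[i] - background[i]);
    mask[i] = (diff > threshold) ? 255 : 0;          // foreground if far from the model
    background[i] = (1.0f - alpha) * background[i]   // slowly adapt the background model
                  + alpha * (float)frame[i];
}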
-
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
Publication: We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...
-
Tuning matrix-vector multiplication on GPU
Publication: A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres) and the generalized minimal residual method (gmres), and exerts an influence on the overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
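For context, a textbook scalar CSR matvec kernel (one thread per row) is sketched below; the tuned variants studied in the paper are not reproduced here. Tuned GPU implementations typically move to vector-per-row schemes or other sparse formats to improve memory coalescing.

// One thread per row of a CSR matrix: y = A * x.
__global__ void spmv_csr(int rows, const int *rowPtr, const int *colIdx,
                         const float *val, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float sum = 0.0f;
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        sum += val[j] * x[colIdx[j]];   // irregular reads of x are the usual bottleneck
    y[row] = sum;
}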
-
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
Publication: Auto-tuning of configuration and application parameters allows for significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful for pruning the search space, and new methodologies are needed in the face of emerging heterogeneous...
-
How to render FDTD computations more effective using a graphics accelerator
Publication: Graphics processing units (GPUs) have for years been dedicated mostly to real-time rendering. Recently, leading GPU manufacturers have extended their research area and decided to also support graphics computing. In this paper, we describe the impact of new GPU features on the development process of an efficient finite difference time domain (FDTD) implementation.
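The abstract does not include code; to illustrate the memory-bound stencil character of FDTD on a GPU, a minimal 1D Yee-style leapfrog update is sketched below (the paper targets the full 3D formulation and GPU-specific optimizations that are not shown here).

// Minimal 1D Yee-style FDTD update step (illustrative only).
__global__ void updateE(float *ez, const float *hy, int n, float ce) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n) ez[i] += ce * (hy[i] - hy[i - 1]);   // E from curl of H
}

__global__ void updateH(const float *ez, float *hy, int n, float ch) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1) hy[i] += ch * (ez[i + 1] - ez[i]);        // H from curl of E
}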
-
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
Publication: In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large systems of sparse equations with multiple right-hand sides (RHSs), which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...
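A generic way to see why blocking helps on a GPU (the paper's exact algorithm and multilevel preconditioner are not reproduced here): with k right-hand sides collected as columns, the dominant per-iteration kernels become block products that keep the GPU busier than k independent vector operations.

\[
X,\;B \in \mathbb{R}^{n\times k},\qquad R = B - AX,\qquad
W = AP \;\;(\text{one sparse-matrix} \times \text{dense-block product instead of } k \text{ SpMVs}),\qquad
G = P^{\mathsf T} W \in \mathbb{R}^{k\times k}.
\]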
-
Towards an efficient multi-stage Riemann solver for nuclear physics simulations
Publication: Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....
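As recalled from the standard finite-volume literature (not quoted from this paper), the FORCE flux on which the multi-stage MUSTA-FORCE scheme iterates is the average of the Lax-Friedrichs and Lax-Wendroff fluxes:

\[
F^{\mathrm{FORCE}}_{i+1/2} \;=\; \tfrac{1}{2}\Bigl(F^{\mathrm{LF}}_{i+1/2} + F^{\mathrm{LW}}_{i+1/2}\Bigr),
\]
\[
F^{\mathrm{LF}}_{i+1/2} = \tfrac{1}{2}\bigl(F(U_i)+F(U_{i+1})\bigr) - \tfrac{\Delta x}{2\,\Delta t}\bigl(U_{i+1}-U_i\bigr),
\qquad
F^{\mathrm{LW}}_{i+1/2} = F\!\Bigl(\tfrac{1}{2}(U_i+U_{i+1}) - \tfrac{\Delta t}{2\,\Delta x}\bigl(F(U_{i+1})-F(U_i)\bigr)\Bigr).
\]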
-
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
Publication: High-performance computing (HPC) is, as its name suggests, traditionally oriented toward performance, especially the execution time and scalability of computations. However, due to high costs and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...
-
Parallel multithread computing for spectroscopic analysis in optical coherence tomography
Publication: Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample. It is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan. Additionally, further processing of acquired spectroscopic information...
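The hundreds of Fourier transforms per A-scan mentioned above map naturally onto batched GPU FFTs; a minimal cuFFT sketch is shown below (the sizes are placeholders and the actual SOCT processing pipeline is not reproduced).

#include <cufft.h>

// Run `batch` complex-to-complex FFTs of length `n` in a single call
// (e.g. many short-time windows of one A-scan; sizes are placeholders).
void batchedFFT(cufftComplex *devData, int n, int batch) {
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, batch);             // one plan for the whole batch
    cufftExecC2C(plan, devData, devData, CUFFT_FORWARD); // in-place forward transforms
    cufftDestroy(plan);
}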
-
Modern concepts of service integration in the BeesyCluster system
Publication: The functions of the current version of the BeesyCluster system as a middleware layer for accessing distributed resources are described, together with its subsystems for service integration, service selection and service execution. Extensions of the service integration subsystem oriented towards green computing are presented. The problems of intelligent service search, the use of GPUs, cooperation with mobile devices and processing in smart spaces are discussed. Additionally...
-
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
Publication: Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...
-
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
Publication: Modern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of the utilized hardware as well as software solutions. The problems that we wish to solve efficiently using those systems differ in complexity, not only in magnitude but also in kind: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
-
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication: In the paper we present an approach and results from applying the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as the NAS Parallel Benchmarks BT, SP and LU, as well as the cublasgemm-benchmark, which are widely used for assessment of high performance computing systems' performance. Specifically, depending on the benchmark, various power cap configurations are best for the desired trade-off of performance...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication: In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
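Parameter averaging of the kind described above can be expressed with a single collective; the sketch below (plain C++/MPI, with names chosen for illustration, not Kaldi's actual training code) averages a flat parameter buffer across all workers.

#include <mpi.h>

// Average a flat parameter vector across all MPI ranks (illustrative sketch).
void averageParameters(float *params, int count) {
    int worldSize;
    MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

    // Sum the parameters element-wise over all workers, in place...
    MPI_Allreduce(MPI_IN_PLACE, params, count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    // ...then divide by the number of workers to obtain the average.
    for (int i = 0; i < count; ++i) params[i] /= (float)worldSize;
}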
-
Modelling and simulation of GPU processing in the MERPSYS environment
Publication: In this work, we evaluate an analytical GPU performance model based on Little's law, which expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...
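Little's law itself, on which the model is based, relates the amount of work kept in flight to latency and throughput; in GPU terms (a generic statement of the law, not the MERPSYS equations themselves):

\[
\text{work in flight} \;=\; \text{throughput} \times \text{latency},
\]
so that, for a kernel with a given occupancy (work kept in flight) and a given total amount of work,
\[
\text{achieved throughput} \;\approx\; \min\!\left(\frac{\text{work in flight}}{\text{latency}},\; \text{peak throughput}\right),
\qquad
t_{\text{kernel}} \;\approx\; \frac{\text{total work}}{\text{achieved throughput}}.
\]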
-
Advanced Potential Energy Surfaces for Molecular Simulation
Publication: Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...
-
TensorHive: Management of Exclusive GPU Access for Distributed Machine Learning Workloads
Publication: TensorHive is a tool for organizing the work of research and engineering teams that use servers with GPUs for machine learning workloads. In a comprehensive web interface, it supports reservation of GPUs for exclusive usage, hardware monitoring, as well as configuring, executing and queuing distributed computational jobs. Focusing on easy installation and simple configuration, the tool automatically detects the available computing...
-
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
Publication: In this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products, designed to run on Graphics Processing Units (GPUs), is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....
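One way to see the memory saving from symmetry (a generic illustration, not the paper's task-scheduling scheme): storing only the upper triangle roughly halves the off-diagonal data, at the cost of scattered updates for the mirrored contributions, sketched below with atomics.

// CSR SpMV over the stored upper triangle only: each stored a_ij (i <= j)
// contributes to y_i directly and to y_j via an atomic update.
// The output vector y must be zero-initialized before the launch.
__global__ void spmv_sym_upper(int rows, const int *rowPtr, const int *colIdx,
                               const float *val, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows) return;

    float sum = 0.0f;
    for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k) {
        int j = colIdx[k];
        float a = val[k];
        sum += a * x[j];                         // contribution of a_ij to y_i
        if (j != i) atomicAdd(&y[j], a * x[i]);  // mirrored contribution a_ji to y_j
    }
    atomicAdd(&y[i], sum);
}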
-
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
Publication: The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded to GPUs using multiple CUDA streams. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...
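The overlap pattern investigated above typically looks like the following minimal sketch: chunked asynchronous copies and per-stream kernel launches (the kernel, chunk sizes and buffer layout are placeholders; host buffers should be pinned, e.g. allocated with cudaHostAlloc, for the copies to actually overlap).

#include <cuda_runtime.h>

// Placeholder kernel standing in for the real per-chunk computation.
__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Process `chunks` independent chunks, overlapping H2D copies, kernels and
// D2H copies across `numStreams` CUDA streams (one device slot per stream).
void runStreamed(float *hostIn, float *hostOut, float *devBuf,
                 int chunks, int chunkLen, int numStreams) {
    cudaStream_t *streams = new cudaStream_t[numStreams];
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = streams[c % numStreams];
        float *d = devBuf + (size_t)(c % numStreams) * chunkLen;
        size_t bytes = (size_t)chunkLen * sizeof(float);

        cudaMemcpyAsync(d, hostIn + (size_t)c * chunkLen, bytes,
                        cudaMemcpyHostToDevice, st);
        process<<<(chunkLen + 255) / 256, 256, 0, st>>>(d, chunkLen);
        cudaMemcpyAsync(hostOut + (size_t)c * chunkLen, d, bytes,
                        cudaMemcpyDeviceToHost, st);
    }

    cudaDeviceSynchronize();   // wait for all streams before cleanup
    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
    delete[] streams;
}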
-
Mobile Cloud computing architecture for massively parallelizable geometric computation
Publication: Cloud Computing is one of the most disruptive technologies of this century. This technology has been widely adopted in many areas of society. In the field of the manufacturing industry, it can be used to provide advantages in the execution of the complex geometric computation algorithms involved in CAD/CAM processes. The idea proposed in this research consists in outsourcing part of the load to be computed in the client machines...
-
GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method
Publication: This paper discusses a strategy for speeding up the mesh deformation process in the design-by-optimization of high-frequency components involving electromagnetic field simulations using the 3D finite element method (FEM). The mesh deformation is assumed to be described by a linear elasticity model of a rigid body; therefore, each time the shape of the device is changed, an auxiliary elasticity finite-element problem must be solved....