Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING

Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING

results on page:
embed this view on your website

Filters

total: 121

clear all filters disabled

Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
Publication
- A. Dziekoński
- M. Mrozowski
- RADIOENGINEERING - Year 2018
This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few loworder resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Full text available to download
Jacobi and gauss-seidel preconditioned complex conjugate gradient method with GPU acceleration for finite element method
Publication
- Year 2010
In this paper two implementations of iterative solvers for solving complex symmetric and sparse systems resulting from finite element method applied to wave equation are discussed. The problem under investigation is a dielectric resonator antenna (DRA) discretized by FEM with vector elements of the second order (LT/QN). The solvers use the preconditioned conjugate gradient (pcg) method implemented on Graphics Processing Unit (GPU)...

Full text to download in external service
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
Publication
- P. Czarnul
- P. Rościszewski
- Year 2020
Auto-tuning of configuration and application param- eters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Full text available to download
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Access - Year 2018
The paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve nonHermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...

Full text available to download
Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation
Publication
- IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES - Year 2017
This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as...

Full text to download in external service
Tuning matrix-vector multiplication on GPU
Publication
- A. Dziekoński
- M. Mrozowski
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010
A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
Paweł Czarnul dr hab. inż.

People

Dział Usług Chmurowych, Faculty of Electronics, Telecommunications and Informatics, Department of Computer Architecture

Paweł Czarnul obtained a D.Sc. degree in computer science in 2015, a Ph.D. in computer science granted by a council at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology in 2003. His research interests include:parallel and distributed processing including clusters, accelerators, coprocessors; distributed information systems; architectures of distributed systems; programming mobile devices....
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text to download in external service
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Antennas and Wireless Propagation Letters - Year 2018
In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large system of sparse equations with multiple right hand-sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text to download in external service
GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method
Publication
- IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS - Year 2011
The letter discusses a fast implementation of the conjugate gradient iterative method with ${rm E}$-field multilevel preconditioner applied to solving real symmetric and sparse systems obtained with vector finite element method. In order to accelerate computations, a graphics processing unit (GPU) was used and significant speed-up (2.61 fold) was achieved comparing to a central processing unit (CPU) based approach. These results...

Full text to download in external service
Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms
Publication
- G. Szwoch
- Year 2014
Implementation of the background subtraction algorithm using OpenCL platform is presented. The algorithm processes live stream of video frames from the surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...

Full text to download in external service
Performance evaluation of parallel background subtraction on GPU platforms
Publication
- G. Szwoch
- Elektronika : konstrukcje, technologie, zastosowania - Year 2015
Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...

Full text to download in external service
Krylov Space Iterative Solvers on Graphics Processing Units
Publication
- A. Dziekoński
- M. Mrozowski
- Year 2010
CUDA architecture was introduced by Nvidia three years ago and since then there have been many promising publications demonstrating a huge potential of Graphics Processing Units (GPUs) in scientific computations. In this paper, we investigate the performance of iterative methods such as cg, minres, gmres, bicg that may be used to solve large sparse real and complex systems of equations arising in computational electromagnetics.

Full text to download in external service
Parallel Programming for Modern High Performance Computing Systems
Publication
- P. Czarnul
- Year 2018
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text to download in external service
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
Publication
- T. Stefański
- Progress in Electromagnetics Research-PIER - Year 2013
This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Full text to download in external service
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool
Publication
- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Year 2023
GPU accelerators have become essential to the recent advance in computational power of high- performance computing (HPC) systems. Current HPC systems’ reaching an approximately 20–30 mega-watt power demand has resulted in increasing CO2 emissions, energy costs and necessitate increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Full text to download in external service
Parallel multithread computing for spectroscopic analysis in optical coherence tomography
Publication
- Year 2014
Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample. It is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan. Additionally, further processing of acquired spectroscopic information...

Full text to download in external service
Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform
Publication
- G. Szwoch
- M. Szczodrak
- Year 2017
Performance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...

Full text to download in external service
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
Publication
- P. Czarnul
- COMPUTING AND INFORMATICS - Year 2020
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Full text available to download
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems
Publication
- Scientific Programming - Year 2020
This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals,...

Full text available to download
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
Publication
- Communications in Computational Physics - Year 2017
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higherorder FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...

Full text to download in external service
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
Publication
- P. Rościszewski
- Year 2018
Many important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power of such systems requires programming parallel applications that are hybrid in two meanings: they can utilize parallelism on multiple levels at the same time and combine together programming interfaces...

Full text to download in external service
Highly parallel distributed computing systems with optical interconnections
Publication
- J. Just
- R. Romaniuk
- R. S. Romaniuk
- Microprocessing and Microprogramming - Year 1989
Full text to download in external service
Highly Parallel Distributed Computing System With Optical Interconnections
Publication
- J. Just
- R. Romaniuk
- R. S. Romaniuk
- Year 1990
Full text to download in external service
Review of parallel computing methods and tools for FPGA technology
Publication
- R. Cieszewski
- M. Linczuk
- K. Pozniak
- R. Romaniuk
- R. S. Romaniuk
- Year 2013
Full text to download in external service
Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment
Publication
- Year 2014
The paper presents design, implementation and real life uses of a visualization subsystem for a distributed framework for parallelization of workflow-based computations among clusters with nodes that feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs of load, memory available, fan speeds etc. Secondly,...

Full text to download in external service
Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology
Publication
- Year 2016
The discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...

Full text to download in external service
Jerzy Konorski dr hab. inż.

People

Department of Computer Communications

Jerzy Konorski received his M. Sc. degree in telecommunications from Gdansk University of Technology, Poland, and his Ph. D. degree in computer science from the Polish Academy of Sciences, Warsaw, Poland. In 2007, he defended his D. Sc. thesis at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology. He has authored over 150 papers, led scientific projects funded by the European Union,...
ACM Transactions on Parallel Computing

Journals

ISSN: 2329-4949 , eISSN: 2329-4957
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
Publication
- IEEE Antennas and Wireless Propagation Letters - Year 2011
This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...

Full text to download in external service
PARALLEL COMPUTING

Journals

ISSN: 0167-8191 , eISSN: 1872-7336
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Journals

ISSN: 0743-7315 , eISSN: 1096-0848
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
Publication
- Year 2015
We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...

Full text to download in external service
Towards an efficient multi-stage Riemann solver for nuclear physics simulations
Publication
- S. Cygert
- J. Porter-Sobieraj
- D. Kikoła
- J. Sikorski
- M. Słodkowski
- Year 2013
Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....

Full text to download in external service
Paweł Rościszewski dr inż.

People

Paweł Rościszewski received his PhD in Computer Science at Gdańsk University of Technology in 2018 based on PhD thesis entitled: "Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption". Currently, he is an Assistant Professor at the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Poland....
Jerzy Proficz dr hab. inż.

People

Academic Computer Centre TASK, Department of Computer Architecture

Jerzy Proficz, Ph.D. is the director of the Centre of Informatics – Tricity Academic Supercomputer & networK (CI TASK) at Gdansk University of Technology, Poland. He earned his Ph.D. (2012) in HPC (High Performance Computing) in the subject of supercomputer resource provisioning and management for on-line data processing D.Sc. (2022) in the discipline: Information and Communication Technology. Author and co-author of over 50...
PPAM 2022

Events

11-09-2022 07:00 - 14-09-2022 13:56

The PPAM 2022 conference, will cover topics in parallel and distributed computing, including theory and applications, as well as applied mathematics.
How to render FDTD computations more effective using agraphics accelerator.
Publication
- IEEE TRANSACTIONS ON MAGNETICS - Year 2009
Graphics processing units (GPUs) for years have been dedicated mostly to real time rendering. Recently leading GPU manufactures have extended their research area and decided to support also graphics computing. In this paper, we describe an impact of new GPU features on development process of an efficient finite difference time domain (FDTD) implementation.

Full text to download in external service
Machine Learning in Multi-Agent Systems using Associative Arrays
Publication
- P. Spychalski
- R. Arendt
- PARALLEL COMPUTING - Year 2018
In this paper, a new machine learning algorithm for multi-agent systems is introduced. The algorithm is based on associative arrays, thus it becomes less complex and more efficient substitute of artificial neural networks and Bayesian networks, which is confirmed by performance measurements. Implementation of machine learning algorithm in multi-agent system for aided design of selected control systems allowed to improve the performance...

Full text available to download
Parallel Computing

Conferences
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available to download
Performance evaluation of the parallel object tracking algorithm employing the particle filter
Publication
- G. Szwoch
- Year 2016
An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing speed depending on the number...
GPU-accelerated finite element method
Publication
- Year 2016
In this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...

Full text to download in external service
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
Publication
- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Year 2014
This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Full text to download in external service
Accuracy, Memory and Speed Strategies in GPU-based Finite-Element Matrix-Generation
Publication
- IEEE Antennas and Wireless Propagation Letters - Year 2012
This paper presents strategies on how to optimize GPU-based finite-element matrix-generation that occurs in the finite-element method (FEM) using higher order curvilinear elements. The goal of the optimization is to increase the speed of evaluation and assembly of large finite-element matrices on a single GPU (Graphics Processing Unit) while maintaining the accuracy of numerical integration at the desired level. For this reason,...

Full text to download in external service
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
Publication
- Year 2014
Modern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems, that we wish to efficiently solve using those systems have different complexity, not only considering magnitude, but also the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
Publication
- Year 2024
In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...

Full text to download in external service
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication
- A. Krzywaniak
- P. Czarnul
- Year 2020
In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to the bench- marks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm- benchmark which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...

Full text available to download
Inverse Analysis as a Key Element of Safety Assessment under the Snow Load For The Large Suspension Roofs Structure
Publication
- K. Żółtowski
- M. Drawc
- Archives of Civil Engineering - Year 2020
The paper presents a concept and realization of monitoring system for the Silesian Stadium in Chorzow. The idea of the system lies in fusion of structure monitoring with a calibrated numerical FEM model [1]. The inverse problem is solved. On the base of measured selected displacements, the numerical FEM model of the structure combined with iterative method, develops the current snow load distribution. Knowing the load, we can calculate...

Full text available to download
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
Publication
- A. Dziekoński
- G. Fotyga
- M. Mrozowski
- IEEE Access - Year 2018
This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...

Full text available to download

Search

Filters

Catalog

Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING

Paweł Czarnul dr hab. inż.

Jerzy Konorski dr hab. inż.

Paweł Rościszewski dr inż.

Jerzy Proficz dr hab. inż.