Search results for: gpu

All results: 78

Finite element matrix generation on a GPU
Publication
- Progress in Electromagnetics Research-PIER - Year 2012
This paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x Tesla C2075) and a CPU (2x twelve-core...

Full text available from an external service
Tuning matrix-vector multiplication on GPU
Publication
- A. Dziekoński
- M. Mrozowski
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010
A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods for solving large sparse systems of equations, such as the conjugate gradient method (cg), the minimal residual method (minres) and the generalized minimal residual method (gmres), and it strongly influences the overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
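To make the matvec concrete, the sketch below is a minimal CPU reference for a sparse matrix-vector product in the compressed sparse row (CSR) format, a common baseline storage scheme for such matrices. The function name `csr_matvec` and the example matrix are illustrative assumptions; this is not the GPU kernel tuned in the paper.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a sparse matrix A stored in CSR format.

    values  -- nonzero entries, stored row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Rows are independent; GPU kernels typically map one thread (or a group
        # of threads) to each row, which is why the storage layout matters so much.
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[4, 0, 1], [0, 3, 0], [1, 0, 2]]
values = [4.0, 1.0, 3.0, 1.0, 2.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 7.0]
```

Because an iterative solver performs at least one such product per iteration, its cost weighs directly on the solver's overall run time.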
GPU-accelerated finite element method
Publication
- Year 2016
In this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...

Full text available from an external service
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool
Publication
- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Year 2023
GPU accelerators have become essential to the recent advance in computational power of high-performance computing (HPC) systems. Current HPC systems reach a power demand of approximately 20–30 megawatts, which increases CO2 emissions and energy costs and necessitates increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Full text available from an external service
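Choosing a power cap ultimately rests on simple arithmetic over measured runs: energy is average power times execution time, and the energy-delay product (EDP) multiplies energy by time once more to penalize slowdowns. The sketch below shows only that selection step, with a hypothetical `pick_power_cap` helper and made-up measurements; it is not the DEPO tool's online tracing algorithm.

```python
def pick_power_cap(measurements, metric="energy"):
    """Return the power cap (W) that minimizes energy or EDP.

    measurements: list of (cap_watts, avg_power_watts, time_s) from measured runs.
    energy = avg_power * time (J); EDP = energy * time (J*s).
    """
    def score(m):
        _, avg_power, time_s = m
        energy = avg_power * time_s
        return energy if metric == "energy" else energy * time_s

    return min(measurements, key=score)[0]

# Hypothetical measurements, for illustration only.
runs = [(300, 280.0, 10.0), (200, 190.0, 11.0), (150, 145.0, 14.0)]
print(pick_power_cap(runs))         # 150
print(pick_power_cap(runs, "edp"))  # 200
```

The two metrics can pick different caps: the lowest cap saves the most energy here, while EDP prefers a middle setting because it also charges for the longer run time.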
Modelling and simulation of GPU processing in the MERPSYS environment
Publication
- T. Gajger
- P. Czarnul
- Scalable Computing: Practice and Experience - Year 2018
In this work, we evaluate an analytical GPU performance model based on Little's law, which expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...

Full text available in the portal
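Little's law states that the concurrency needed to sustain a given throughput equals throughput times latency. The sketch below shows how that relation can turn into a time estimate with a latency bound and a throughput bound, assuming a single latency value and a single peak rate. The paper's model is more detailed; the function names are hypothetical, and `ops_in_flight` merely stands in for the concurrency that achieved occupancy provides.

```python
def required_concurrency(latency_cycles, peak_ops_per_cycle):
    """Little's law: operations that must be in flight to reach the peak rate."""
    return latency_cycles * peak_ops_per_cycle

def estimate_kernel_time(total_ops, latency_cycles, peak_ops_per_cycle, ops_in_flight, clock_hz):
    """Estimated kernel time in seconds.

    With too little concurrency the rate is latency bound (ops_in_flight / latency);
    otherwise it is capped by the hardware peak (throughput bound).
    """
    latency_bound_rate = ops_in_flight / latency_cycles
    rate = min(latency_bound_rate, peak_ops_per_cycle)
    return total_ops / rate / clock_hz

# Illustrative numbers, not measurements: 400-cycle latency, peak 2 ops/cycle, 1 GHz clock.
print(required_concurrency(400, 2))                 # 800
print(estimate_kernel_time(1e6, 400, 2, 200, 1e9))  # 0.002 (latency bound)
print(estimate_kernel_time(1e6, 400, 2, 1600, 1e9)) # 0.0005 (throughput bound)
```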
Performance evaluation of parallel background subtraction on GPU platforms
Publication
- G. Szwoch
- Elektronika : konstrukcje, technologie, zastosowania - Year 2015
Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...

Full text available from an external service
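The per-pixel nature of background subtraction can be sketched with the simplest possible model: a running-average background and a fixed difference threshold. This model and the names `update_background`, `foreground_mask`, `alpha` and `threshold` are assumptions chosen for illustration (the abstract does not state which background model is used); the point is that every pixel is processed independently, which is what maps well onto GPU work items.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average model, per pixel: B <- (1 - alpha) * B + alpha * F."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]

def foreground_mask(background, frame, threshold=25.0):
    """A pixel is foreground when it differs from the background by more than `threshold`."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]

# Grayscale pixels flattened into lists; on a GPU each pixel would be one work item.
background = [100.0, 100.0, 100.0, 100.0]
frame = [100.0, 102.0, 180.0, 99.0]
print(foreground_mask(background, frame))  # [False, False, True, False]
background = update_background(background, frame)
```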
A memory efficient and fast sparse matrix vector product on a GPU
Publication
- Progress in Electromagnetics Research-PIER - Year 2011
This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text available from an external service
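The idea behind slicing can be sketched in a few lines: rows are grouped into slices, each slice is padded only to its own longest row, and per-row lengths let the product skip the padding. This is a simplified, sequential illustration of a sliced ELLPACK layout with row lengths, using hypothetical function names; the T threads per row that give Sliced ELLR-T its suffix, and complex-valued arithmetic, are not modeled.

```python
def build_sliced_ellr(rows, slice_height):
    """rows: list of rows, each a list of (column, value) pairs.

    Returns a list of slices (first_row, row_lengths, cols, vals), where cols[j][r]
    and vals[j][r] hold the j-th entry of row r in the slice (column-major, so
    entry j of consecutive rows is contiguous, as coalesced GPU loads prefer).
    """
    slices = []
    for start in range(0, len(rows), slice_height):
        block = rows[start:start + slice_height]
        width = max((len(r) for r in block), default=0)  # padding is per slice only
        cols = [[0] * len(block) for _ in range(width)]
        vals = [[0.0] * len(block) for _ in range(width)]
        for r, row in enumerate(block):
            for j, (c, v) in enumerate(row):
                cols[j][r] = c
                vals[j][r] = v
        slices.append((start, [len(r) for r in block], cols, vals))
    return slices

def sliced_ellr_matvec(slices, x, n_rows):
    y = [0.0] * n_rows
    for start, row_len, cols, vals in slices:
        for r, length in enumerate(row_len):
            acc = 0.0
            for j in range(length):  # row lengths let us skip padded slots
                acc += vals[j][r] * x[cols[j][r]]
            y[start + r] = acc
    return y

def stored_entries(slices):
    """Number of stored slots, padding included."""
    return sum(len(cols) * len(row_len) for _, row_len, cols, _ in slices)

rows = [[(0, 4.0), (2, 1.0)], [(1, 3.0)], [(0, 1.0), (2, 2.0)]]
slices = build_sliced_ellr(rows, slice_height=2)
print(sliced_ellr_matvec(slices, [1.0, 2.0, 3.0], len(rows)))  # [7.0, 6.0, 7.0]
```

For a 4x4 matrix with a single row of 4 nonzeros and three rows of 1, a slice height of 2 stores 10 slots, whereas plain ELLPACK, padded to the longest row everywhere, would store 16.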
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
Publication
- Year 2015
We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...

Full text available from an external service
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents the design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computing the similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on the currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available in the portal
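The structure of the all-pairs problem can be sketched as follows: n vectors give n(n-1)/2 independent pair computations that can be divided between CPU threads and the GPU. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as is the simple static split of the pair list by a tunable fraction (`split_pairs` is hypothetical).

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def all_pairs_similarity(vectors):
    """Similarity for every unordered pair (i, j) with i < j."""
    n = len(vectors)
    return {(i, j): cosine_similarity(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}

def split_pairs(n, cpu_fraction):
    """Hypothetical static split: the first share of pairs goes to the CPU, the rest to the GPU."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    cut = int(len(pairs) * cpu_fraction)
    return pairs[:cut], pairs[cut:]

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(all_pairs_similarity(vectors))
cpu_pairs, gpu_pairs = split_pairs(len(vectors), cpu_fraction=0.3)
```

In a real hybrid code the fraction would be tuned to the relative speeds of the devices, which is the kind of parameter the paper's tuning addresses.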
Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology
Publication
- Year 2016
The discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...

Full text available from an external service
Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms
Publication
- G. Szwoch
- Year 2014
Implementation of the background subtraction algorithm using the OpenCL platform is presented. The algorithm processes a live stream of video frames from a surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...

Full text available from an external service
Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment
Publication
- Year 2014
The paper presents the design, implementation and real-life uses of a visualization subsystem for a distributed framework for parallelization of workflow-based computations among clusters with nodes that feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs of load, memory available, fan speeds etc. Secondly,...

Full text available from an external service
TensorHive: Management of Exclusive GPU Access for Distributed Machine Learning Workloads
Publication
- JOURNAL OF MACHINE LEARNING RESEARCH - Year 2021
TensorHive is a tool for organizing work of research and engineering teams that use servers with GPUs for machine learning workloads. In a comprehensive web interface, it supports reservation of GPUs for exclusive usage, hardware monitoring, as well as configuring, executing and queuing distributed computational jobs. Focusing on easy installation and simple configuration, the tool automatically detects the available computing...

Full text available in the portal
Accuracy, Memory and Speed Strategies in GPU-based Finite-Element Matrix-Generation
Publication
- IEEE Antennas and Wireless Propagation Letters - Year 2012
This paper presents strategies on how to optimize GPU-based finite-element matrix-generation that occurs in the finite-element method (FEM) using higher order curvilinear elements. The goal of the optimization is to increase the speed of evaluation and assembly of large finite-element matrices on a single GPU (Graphics Processing Unit) while maintaining the accuracy of numerical integration at the desired level. For this reason,...

Full text available from an external service
GPU based implementation of Temperature-Vegetation Dryness Index for AVHRR3 Satellite Data
Publication
- T. Bieliński
- A. Chybicki
- Year 2014
The paper presents an implementation of the TVDI (Temperature-Vegetation Dryness Index) algorithm on a GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). The discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for the area of Poland. All phases of the TVDI implementation on the GPU are modified with respect to the CUDA platform...
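TVDI is commonly defined from a dry edge and a wet edge fitted to the LST/NDVI scatter plot: TVDI = (LST - LST_min) / (LST_max - LST_min), with both edges taken as linear functions of NDVI. The per-pixel sketch below assumes that definition and takes the edge coefficients as given inputs; the edge-fitting step and the CUDA kernels themselves are not shown, and the function names are hypothetical.

```python
def tvdi(lst, ndvi, dry_edge, wet_edge):
    """Temperature-Vegetation Dryness Index for one pixel.

    dry_edge -- (a, b) with LST_max = a + b * ndvi
    wet_edge -- (a, b) with LST_min = a + b * ndvi
    Result: 0 at the wet edge, 1 at the dry edge.
    """
    lst_max = dry_edge[0] + dry_edge[1] * ndvi
    lst_min = wet_edge[0] + wet_edge[1] * ndvi
    return (lst - lst_min) / (lst_max - lst_min)

def tvdi_image(lst_pixels, ndvi_pixels, dry_edge, wet_edge):
    # Pixels are independent, so on a GPU each one can be handled by its own CUDA thread.
    return [tvdi(t, v, dry_edge, wet_edge) for t, v in zip(lst_pixels, ndvi_pixels)]

# Illustrative edge coefficients (kelvin), not fitted to real AVHRR3 data.
print(tvdi(300.0, 0.5, dry_edge=(320.0, -20.0), wet_edge=(290.0, 0.0)))  # 0.5
```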
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
Publication
- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015
In this paper, a task-scheduling approach for efficiently calculating sparse symmetric matrix-vector products on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....

Full text available from an external service
GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method
Publication
- IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS - Year 2011
The letter discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner applied to solving real symmetric and sparse systems obtained with the vector finite element method. In order to accelerate computations, a graphics processing unit (GPU) was used and a significant speed-up (2.61-fold) was achieved compared to a central processing unit (CPU) based approach. These results...

Full text available from an external service
GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method
Publication
- RADIOENGINEERING - Year 2017
This paper discusses a strategy for speeding up the mesh deformation process in the design-by-optimization of high-frequency components involving electromagnetic field simulations using the 3D finite element method (FEM). The mesh deformation is assumed to be described by a linear elasticity model of a rigid body; therefore, each time the shape of the device is changed, an auxiliary elasticity finite-element problem must be solved....

Full text available in the portal
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
Publication
- T. Stefański
- Progress in Electromagnetics Research-PIER - Year 2013
This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes the computational power of the central processing unit (CPU) and the graphics processing unit (GPU), assigning to each architecture the computational tasks best suited to it. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Full text available from an external service
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Antennas and Wireless Propagation Letters - Year 2018
In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large systems of sparse equations with multiple right-hand sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text available from an external service
Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform
Publication
- G. Szwoch
- M. Szczodrak
- Year 2017
Performance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...

Full text available from an external service
Implementation of algebraic procedures on the GPU using CUDA architecture on the example of generalized eigenvalue problem
Publication
- Ł. Syrocki
- G. Pestka
- Open Computer Science - Year 2016
Full text available from an external service
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
Publication
- Year 2024
In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...

Full text available from an external service
Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation
Publication
- IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES - Year 2017
This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as...

Full text available from an external service
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
Publication
- P. Czarnul
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023
In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Full text available from an external service
Jacobi and gauss-seidel preconditioned complex conjugate gradient method with GPU acceleration for finite element method
Publication
- Year 2010
In this paper two implementations of iterative solvers for solving complex symmetric and sparse systems resulting from finite element method applied to wave equation are discussed. The problem under investigation is a dielectric resonator antenna (DRA) discretized by FEM with vector elements of the second order (LT/QN). The solvers use the preconditioned conjugate gradient (pcg) method implemented on Graphics Processing Unit (GPU)...

Full text available from an external service
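A minimal sketch of a Jacobi (diagonal) preconditioned conjugate gradient iteration follows, written with dense Python lists for readability rather than GPU sparse kernels. It deliberately uses the unconjugated inner product, which is the usual choice (the COCG variant) for complex symmetric systems like those described here; whether the paper uses exactly this variant is an assumption, and the Gauss-Seidel preconditioner is not shown. The name `jacobi_pcg` is hypothetical.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric matrix A (dense list of lists).

    The inner product is unconjugated (sum of u_i * v_i): for real symmetric
    positive-definite A this is ordinary PCG, for complex symmetric A (A == A^T)
    it is the COCG variant.
    """
    n = len(b)

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def norm(v):
        return sum(abs(vi) ** 2 for vi in v) ** 0.5

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi: M^-1 = diag(A)^-1
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = norm(b)
    for _ in range(max_iter):
        if norm(r) <= tol * b_norm:
            break
        Ap = [dot(row, p) for row in A]           # the matvec, the costly step on large problems
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x = jacobi_pcg(A, [1.0, 2.0, 3.0])
print(x)
```

The Jacobi preconditioner is attractive on a GPU because applying it is a purely element-wise operation, whereas Gauss-Seidel sweeps carry sequential dependencies.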
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
Publication
- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Year 2014
This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Full text available from an external service
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings should be preferred for various domain...

Full text available from an external service
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
Publication
- P. Czarnul
- COMPUTING AND INFORMATICS - Year 2020
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Full text available in the portal
UNRES-GPU for Physics-Based Coarse-Grained Simulations of Protein Systems at Biological Time- and Size-Scales
Publication
- BIOINFORMATICS - Year 2023
The dynamics of the virus like particles (VLPs) corresponding to the GII.4 Houston, GII.2 SMV, and GI.1 Norwalk strains of human noroviruses (HuNoV) that cause gastroenteritis was investigated by means of long-time (about 30 μs in the laboratory timescale) molecular dynamics simulations with the coarse-grained UNRES force field. The main motion of VLP units turned out to be the bending at the junction between the P1 subdomain (that...

Full text available in the portal
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
Publication
- P. Czarnul
- P. Rościszewski
- Year 2020
Auto-tuning of configuration and application parameters allows achieving significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Full text available in the portal
GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition
Publication
- Year 2022
In the paper we present a performance-energy trade-off investigation of training Deep Convolutional Neural Networks for image recognition. Several representative and widely adopted network models, such as AlexNet, VGG-19, Inception V3, Inception V4, ResNet50 and ResNet152 were tested using systems with Nvidia Quadro RTX 6000 as well as Nvidia V100 GPUs. Using GPU power capping we found non-default configurations minimizing...

Full text available in the portal
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Access - Year 2018
The paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve non-Hermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...

Full text available in the portal
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
Publication
- IEEE Antennas and Wireless Propagation Letters - Year 2011
This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...

Full text available from an external service
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
Publication
- Communications in Computational Physics - Year 2017
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact null-space filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...

Full text available from an external service
Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
Publication
- A. Dziekoński
- M. Mrozowski
- RADIOENGINEERING - Year 2018
This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few low-order resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Full text available in the portal
Using GPUs for Parallel Stencil Computations in Relativistic Hydrodynamic Simulation
Publication
- S. Cygert
- D. Kikoła
- J. Porter-Sobieraj
- J. Sikorski
- M. Słodkowski
- Year 2014
This paper explores the possibilities of using a GPU for complex 3D finite difference computation. We propose a new approach to this topic using surface memory and compare it with 3D stencil computations carried out via shared memory, which is currently considered to be the best approach. The case study was performed for the extensive computation of collisions between heavy nuclei in terms of relativistic hydrodynamics.

Full text available from an external service
Quality of Cryptocurrency Mining on Previous Generation NVIDIA GTX GPUs
Publication
- Year 2022
Currently, there are many previous-generation NVIDIA GTX graphics processing units (GPUs) available on the market, which have been displaced by next-generation RTX units. As a result, numerous fully operational devices remain underused and are available at an affordable price. First, this paper presents an analysis of the cryptocurrency market. Next, in this context, the results of research on the performance of NVIDIA graphics...

Full text available from an external service
Benchmarking overlapping communication and computations with multiple streams for modern GPUs
Publication
- P. Czarnul
- Annals of Computer Science and Information Systems - Year 2018
The paper presents benchmarking of a multi-stream application processing a set of input data arrays. Tests have been performed and execution times measured for various numbers of streams and various compute intensities measured as the ratio of kernel compute time and data transfer time. As such, the application and benchmarking are representative of frequently used operations such as vector weighted sum, matrix multiplication etc....

Full text available in the portal
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication
- A. Krzywaniak
- P. Czarnul
- Year 2020
In the paper we present an approach and results from applying the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark, which are widely used for assessing the performance of high performance computing systems. Specifically, depending on the benchmarks, various power cap configurations are best for the desired trade-off of performance...

Full text available in the portal
Modeling the performance, reliability and energy consumption of large-scale multilevel parallel systems, taking into account CPUs and GPUs

Projects

Project leader: dr hab. inż. Paweł Czarnul; funding programme: OPUS

Project carried out at the Faculty of Electronics, Telecommunications and Informatics under agreement UMO-2012/07/B/ST6/01516 of 2013-07-17
Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs
Publication
- P. Czarnul
- P. Rościszewski
- Year 2014
The paper proposes an approach for parallelization of computations across a collection of clusters with heterogeneous nodes with both GPUs and CPUs. The proposed system partitions input data into chunks and assigns them to particular devices for processing using OpenCL kernels defined by the user. The system is able to minimize the execution time of the application while maintaining the power consumption of the utilized GPUs and...

Full text available from an external service
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
Publication
- Year 2022
In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...

Full text available in the portal
KernelHive: a new workflow-based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs
Publication
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2016
The paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available....

Full text available from an external service
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
Publication
- M. Knap
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2019
The paper presents an assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...

Full text available in the portal
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
Publication
- A. Dziekoński
- G. Fotyga
- M. Mrozowski
- IEEE Access - Year 2018
This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...

Full text available in the portal
Paweł Czarnul dr hab. inż.

People

Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications and Informatics

Paweł Czarnul received the habilitation degree (dr hab.) in technical sciences, in the discipline of computer science, in 2015, and the doctoral degree in technical sciences in computer science (with distinction), conferred by the Council of the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology, in 2003. His research interests include parallel and distributed processing, including parallel programming on compute clusters,...
Implementation of TVDI calculation for coastal zone
Publication
- T. Bieliński
- Year 2015
The paper shows an implementation of the TVDI (Temperature-Vegetation Dryness Index) algorithm on a GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). The discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for the area of Poland, especially the region of the Gdańsk coastal zone. All phases of TVDI implementation...
How to render FDTD computations more effective using a graphics accelerator
Publication
- IEEE TRANSACTIONS ON MAGNETICS - Year 2009
For years, graphics processing units (GPUs) have been dedicated mostly to real-time rendering. Recently, leading GPU manufacturers have extended their research area and decided to also support graphics computing. In this paper, we describe the impact of new GPU features on the development process of an efficient finite difference time domain (FDTD) implementation.

Full text available from an external service
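The data-parallel structure that makes FDTD a good fit for GPUs shows up even in a minimal 1D Yee (leapfrog) scheme: within each half-step, every grid cell is updated independently from its neighbours' values. The sketch below assumes normalized units, free space, field-zero (PEC) ends and a soft Gaussian source; the function name and parameters are illustrative and not taken from the paper.

```python
import math

def fdtd_1d(n_cells, n_steps, source_cell, courant=0.5):
    """Minimal 1D FDTD (Yee) leapfrog in normalized units, with ez = 0 at both ends.

    ez sits on integer grid points and hy on half-integer points. Each step updates
    hy from the spatial difference of ez, then ez from the difference of hy.
    A Courant number <= 1 keeps this 1D scheme stable.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for t in range(n_steps):
        # H update: every index is independent, so a GPU can assign one thread per cell.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # E update: likewise independent; the end points are never updated and stay 0.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft Gaussian source added at one cell.
        ez[source_cell] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez

field = fdtd_1d(n_cells=200, n_steps=150, source_cell=100)
print(max(abs(v) for v in field))
```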
Performance evaluation of the parallel object tracking algorithm employing the particle filter
Publication
- G. Szwoch
- Year 2016
An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing speed depending on the number...
Multi-core and Multiprocessor Implementation of Numerical Integration in Finite Element Method
Publication
- Year 2012
The paper presents techniques for accelerating a numerical integration process which appears in the Finite Element Method. The acceleration is achieved by taking advantage of multi-core and multiprocessor devices. It is shown that a multi-core implementation with OpenMP and GPU acceleration using the CUDA architecture allow one to achieve speedups by factors of 5 and 10 on a CPU and GPUs, respectively.
Programowanie równoległe na architekturach wielordzeniowych
Kursy Online
- A. Brzeski
- P. Czarnul
- R. Kałaska
Kurs poświęcony zagadnieniom programowania równoległego na maszynach z pamięcią współdzieloną, w tym na wielordzeniowych CPU oraz GPU.
Parallel Programming on Multicore Architectures (2023-24)
Online Courses
- H. A. Mojeed
- P. Czarnul
- R. Kałaska
A course on parallel programming for shared-memory machines, including multicore CPUs and GPUs.
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
Publication
- T. M. Boiński
- P. Czarnul
- COMPUTER JOURNAL - Year 2021
In this paper we investigate a practical approach to applying integer linear programming to optimize the assignment of data to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
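To make the optimization target concrete, the toy sketch below assigns N identical data chunks to devices with different per-chunk processing times and minimizes the makespan (the finishing time of the slowest device). It deliberately replaces the integer linear programming model of the paper with exhaustive search over integer splits and ignores communication costs, so it illustrates only the objective; the device timings are hypothetical.

```python
from itertools import product


def best_assignment(n_chunks, time_per_chunk):
    """Exhaustively split n_chunks among devices to minimize the makespan.

    time_per_chunk[i] is the (hypothetical) time device i needs per chunk.
    Returns (makespan, split), where split[i] chunks go to device i.
    The search is exponential in the number of devices: tiny examples only.
    """
    best = None
    k = len(time_per_chunk)
    for counts in product(range(n_chunks + 1), repeat=k - 1):
        rest = n_chunks - sum(counts)
        if rest < 0:
            continue  # more chunks assigned than exist
        split = counts + (rest,)
        makespan = max(c * t for c, t in zip(split, time_per_chunk))
        if best is None or makespan < best[0]:
            best = (makespan, split)
    return best
```

For example, with 10 chunks, one device at 1 time unit per chunk and another at 2, the best split is 7 and 3 chunks, for a makespan of 7.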
Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA
Publication
- M. J. Adiletta
- J. J. Tithi
- E. Farsarakis
- G. Gerogiannis
- R. Adolf
- R. Benke
- S. Kashyap
- S. Hsia
- K. Lakhotia
- F. Petrini... and 2 others
- Year 2023
Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel’s Programmable Integrated Unified Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark...
Optimization of the Computational Performance of the Finite Element Method on the CUDA Architecture
Publication
- A. Dziekoński
- Year 2015
The aim of this dissertation, and of the fellowship completed within the project, was to develop a numerically efficient algorithmic and hardware solution that accelerates the analysis of electromagnetic problems by the finite element method (FEM) with higher-order basis functions. The frequency-domain finite element method is an efficient and versatile tool for the analysis of microwave circuits (Fig....
Implementation of FDTD-Compatible Green's Function on Graphics Processing Unit
Publication
- T. Stefański
- K. Krzyżanowska
- IEEE Antennas and Wireless Propagation Letters - Year 2012
In this letter, an implementation of the finite-difference time-domain (FDTD)-compatible Green's function on a graphics processing unit (GPU) is presented. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates its application in FDTD simulations of radiation and scattering problems. Unfortunately, implementing the new DGF formula in software requires a multiple precision...
Generation of large finite-element matrices on multiple graphics processors
Publication
- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Year 2013
This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the limited memory of GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis iteratively on several GPUs and to use the graphics...
Piotr Szczuko dr hab. inż.

People

Department of Multimedia Systems

Piotr Szczuko, DSc, Eng. (dr hab. inż.), graduated in 2002 from the Faculty of Electronics, Telecommunications and Informatics of the Gdańsk University of Technology with the degree of MSc in Engineering. His diploma thesis investigated the simultaneous perception of digital images and surround sound. In 2008 he defended his doctoral dissertation entitled "Application of fuzzy rules in computer character animation", for which he received the award of the President of the Council...
APPLICATIONS OF DRONES AND VISION AND ACOUSTIC SENSORS FOR REMOTE DETECTION AND LOCALIZATION OF OBJECTS AND EVENTS
Publication
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2016
The paper presents selected acoustic and vision sensors and proposals for using them to detect and localize objects and events from on board a drone. The stream-analysis algorithms employed are briefly described, and the results of tests of the developed prototypes and methods, implemented on high-performance GPUs, are presented.
Towards an efficient multi-stage Riemann solver for nuclear physics simulations
Publication
- S. Cygert
- J. Porter-Sobieraj
- D. Kikoła
- J. Sikorski
- M. Słodkowski
- Year 2013
Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....
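MUSTA-FORCE builds on the FORCE numerical flux, which at each cell interface averages a Lax-Friedrichs flux and a two-step Lax-Wendroff (Richtmyer) flux; the MUSTA variant repeats this over several predictor stages, which is not shown here. The sketch below computes the plain single-stage FORCE flux for the scalar Burgers equation, chosen only as a simple stand-in for the relativistic hydrodynamics equations of the paper. Every interface flux is independent, which is what a CUDA implementation can exploit.

```python
def burgers_flux(u):
    """Physical flux of the inviscid Burgers equation, f(u) = u^2 / 2."""
    return 0.5 * u * u


def force_flux(u_left, u_right, dx, dt, f=burgers_flux):
    """Single-stage FORCE flux at one cell interface."""
    f_l, f_r = f(u_left), f(u_right)
    # Lax-Friedrichs flux
    flux_lf = 0.5 * (f_l + f_r) - 0.5 * (dx / dt) * (u_right - u_left)
    # Richtmyer (two-step Lax-Wendroff) flux
    u_half = 0.5 * (u_left + u_right) - 0.5 * (dt / dx) * (f_r - f_l)
    flux_ri = f(u_half)
    # FORCE is the arithmetic mean of the two
    return 0.5 * (flux_lf + flux_ri)


def force_update(u, dx, dt):
    """One conservative update of interior cells; boundary cells are held fixed."""
    # fluxes[i] is the flux through the interface between cells i and i + 1
    fluxes = [force_flux(u[i], u[i + 1], dx, dt) for i in range(len(u) - 1)]
    return [u[0]] + [u[i] - dt / dx * (fluxes[i] - fluxes[i - 1])
                     for i in range(1, len(u) - 1)] + [u[-1]]
```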
Application of GPGPU technology to support engineering numerical computations, exemplified by the analysis of flow through a two-phase fluid - solid medium
Publication
- A. Butterweck
- M. H. Ghaemi
- Mechanik - Year 2011
After presenting basic information on GPGPU technology and the NVIDIA CUDA architecture, the article describes the conservation equations governing the flows and their numerical discretization. It then examines the possibility of using GPGPU technology to reduce the execution time of numerical computations of flow through a two-phase medium (fluid - solid particles) resembling a porous medium. To this end,...
Modern concepts of service integration in the BeesyCluster system
Publication
- P. Czarnul
- Year 2010
The functions of the current version of the BeesyCluster system as middleware for access to distributed resources are described, together with its subsystems for service integration, service selection and service execution. Extensions of the service integration subsystem oriented towards green computing are presented. The problems of intelligent service search, the use of GPUs, cooperation with mobile devices and processing in smart spaces are discussed. Additionally...
Application of GPGPU technology to support engineering numerical computations, exemplified by the analysis of flow through a two-phase fluid-solid medium
Publication
- A. Butterweck
- M. H. Ghaemi
- Year 2011
After presenting basic information on GPGPU technology and the NVIDIA CUDA architecture, the article describes the conservation equations governing the flows and their numerical discretization. It then examines the possibility of using GPGPU technology to reduce the execution time of numerical computations of flow through a two-phase medium (fluid - solid particles) resembling a porous medium. To this end,...
Parallel multithread computing for spectroscopic analysis in optical coherence tomography
Publication
- Year 2014
Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample. It is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan. Additionally, further processing of acquired spectroscopic information...
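The computational load described here comes from applying a windowed Fourier transform at many positions along a single interferometric signal. The sketch below, a naive short-time Fourier transform with a Hann window in pure Python, shows why one A-scan turns into many independent transforms. Window length and hop size are hypothetical parameters, and a real implementation would use an FFT library (or cuFFT on a GPU) instead of the O(n^2) DFT used here for clarity.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft_magnitudes(signal, window_len=64, hop=16):
    """Magnitude spectra of Hann-windowed segments of a 1D signal.

    Every segment is transformed independently, so segments can be
    distributed across threads or GPU blocks.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window_len - 1))
            for i in range(window_len)]
    spectra = []
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = [s * w for s, w in zip(signal[start:start + window_len], hann)]
        spectra.append([abs(c) for c in dft(segment)])
    return spectra
```

A 256-sample signal analysed with these defaults already yields 13 separate 64-point transforms.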
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
Publication
- Year 2014
Modern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of both the hardware and the software solutions they use. The problems that we wish to solve efficiently using those systems differ in complexity, not only in magnitude but also in type: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
Publication
- P. Rościszewski
- M. Iwański
- P. Czarnul
- Year 2020
Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
Publication
- ENERGIES - Year 2023
High-performance computing (HPC), as its name suggests, is traditionally oriented toward performance, especially the execution time and scalability of computations. However, due to high costs and environmental concerns, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in modern HPC environments, starting with the...
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication
- P. Rościszewski
- J. Kaliski
- Year 2017
In this paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
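In parameter averaging, several workers train copies of the model on different data shards and periodically replace their parameters with the element-wise mean across workers. The sketch below shows only that averaging step on flat parameter lists, with a hypothetical `local_step` callback standing in for local training. The MPI communication proposed in the paper (where an allreduce would typically compute the sum) and Kaldi's actual data structures are not modelled.

```python
def average_parameters(worker_params):
    """Element-wise mean of parameter vectors from several workers.

    worker_params: list of equally long lists of floats, one per worker.
    """
    n_workers = len(worker_params)
    if n_workers == 0:
        raise ValueError("need at least one worker")
    length = len(worker_params[0])
    if any(len(p) != length for p in worker_params):
        raise ValueError("all workers must have the same number of parameters")
    return [sum(values) / n_workers for values in zip(*worker_params)]


def training_round(worker_params, local_step):
    """One round: each worker updates locally, then all continue from the average.

    local_step(worker_index, params) -> new params is a hypothetical callback.
    """
    updated = [local_step(i, p) for i, p in enumerate(worker_params)]
    averaged = average_parameters(updated)
    return [list(averaged) for _ in worker_params]
```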
Neural Architecture Search for Skin Lesion Classification
Publication
- IEEE Access - Year 2020
Deep neural networks have achieved great success in many domains. However, the successful deployment of such systems depends on the proper manual selection of the neural architecture. This is a tedious and time-consuming process that requires expert knowledge. Different tasks need very different architectures to obtain satisfactory results. The group of methods called neural architecture search (NAS) helps to find effective architecture...
Advanced Potential Energy Surfaces for Molecular Simulation
Publication
- A. Albaugh
- H. Boateng
- R. Bradshaw
- O. Demerdash
- J. Dziedzic
- Y. Mao
- D. Margul
- J. Swails
- Q. Zeng
- D. Case... and 10 others
- JOURNAL OF PHYSICAL CHEMISTRY B - Year 2016
Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...
Comparing Apples and Oranges: A Mobile User Experience Study of iOS and Android Consumer Devices
Publication
- P. Falkowski-Gilski
- T. Uhl
- Year 2023
With the rapid development of wireless networks and the spread of broadband access around the world, the number of active mobile user devices continues to grow. Each year more and more terminals are released on the market, with the smartphone being the most popular among them. They include low-end, mid-range, and of course high-end devices, with top hardware specifications. They do vary in build quality, utilized type of material,...
Krzysztof Bikonis dr inż.

People

Department of Geoinformatics Systems
Modeling of Performance, Reliability and Energy Efficiency in Large-Scale Computational Environment
Publication
- J. Kuchta
- Year 2016
The large scale and complexity of distributed computational systems impose special challenges for predicting quality in such systems. Existing quality models for lower-scale systems include functionality, performance, reliability, flexibility and usability. Among these attributes, performance and reliability are particularly significant for quality modeling of large-scale computing systems due to their strong dependence on the system...
Block-based Representation of Application Execution on Modern Parallel Systems
Publication
- P. Czarnul
- Year 2013
The chapter shows how to model the execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses previously identified attributes and factors representative of modern high performance systems, including multicore CPUs, GPUs and dedicated accelerators such as Intel Phi.
Krylov Space Iterative Solvers on Graphics Processing Units
Publication
- A. Dziekoński
- M. Mrozowski
- Year 2010
CUDA architecture was introduced by Nvidia three years ago and since then there have been many promising publications demonstrating a huge potential of Graphics Processing Units (GPUs) in scientific computations. In this paper, we investigate the performance of iterative methods such as cg, minres, gmres, bicg that may be used to solve large sparse real and complex systems of equations arising in computational electromagnetics.
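Of the Krylov methods listed, conjugate gradients (cg) is the simplest and applies to symmetric positive definite systems. The pure-Python sketch below uses a dense matrix for readability; each iteration is dominated by one matrix-vector product plus a few dot products and vector updates, which are exactly the operations a GPU implementation would accelerate. The tolerance and iteration limit are arbitrary choices, and the complex-valued and non-symmetric cases treated in the paper (minres, gmres, bicg) are not covered.

```python
def matvec(a, x):
    """Dense matrix-vector product (a GPU version would use a sparse format)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - a x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For the 2x2 system [[4, 1], [1, 3]] x = [1, 2], CG reaches the solution x = [1/11, 7/11] in two iterations, as expected for a 2x2 SPD matrix in exact arithmetic.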
Mobile Cloud computing architecture for massively parallelizable geometric computation
Publication
- V. Sánchez Ribes
- H. Mora-Mora
- A. Sobecki
- F. José Mora Gimeno
- COMPUTERS IN INDUSTRY - Year 2020
Cloud Computing is one of the most disruptive technologies of this century. This technology has been widely adopted in many areas of society. In the manufacturing industry, it can be used to provide advantages in the execution of the complex geometric computation algorithms involved in CAD/CAM processes. The idea proposed in this research consists in outsourcing part of the load to be computed in the client machines...
Using CUDA technology for real-time compression of data from multibeam sonars
Publication
- A. Chybicki
- K. Laskowski
- M. Moszyński
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010
The paper presents the design and implementation of a system for compressing multibeam sonar data that runs on CUDA technology. Lossless data compression methods and parallel processing techniques are discussed and applied. The resulting application was tested for speed and compression ratio and compared with other solutions that enable compression of this type of information.