Search results for: gpu

Finite element matrix generation on a GPU

Publication

- Progress in Electromagnetics Research-PIER - Year 2012

This paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x Tesla C2075) and a CPU (2x twelve-core...

Full text available for download from an external service

Tuning matrix-vector multiplication on GPU

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

Matrix-vector multiplication (matvec) is a cornerstone operation in iterative methods for solving large sparse systems of equations, such as the conjugate gradient method (CG), the minimal residual method (MINRES) and the generalized minimal residual method (GMRES), and it influences the overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...

GPU-accelerated finite element method

Publication

- Year 2016

In this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...

Full text available for download from an external service

Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool

Publication

- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Year 2023

GPU accelerators have become essential to recent advances in the computational power of high-performance computing (HPC) systems. Current HPC systems reach a power demand of approximately 20–30 megawatts, which has resulted in increasing CO2 emissions and energy costs and necessitates increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Full text available for download from an external service

Modelling and simulation of GPU processing in the MERPSYS environment

Publication

- Scalable Computing: Practice and Experience - Year 2018

In this work, we evaluate an analytical GPU performance model based on Little's law, that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...

Full text available for download in the portal

Performance evaluation of parallel background subtraction on GPU platforms

Publication

G. Szwoch

- Elektronika : konstrukcje, technologie, zastosowania - Year 2015

Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...

Full text available for download from an external service

A memory efficient and fast sparse matrix vector product on a GPU

Publication

- Progress in Electromagnetics Research-PIER - Year 2011

This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text available for download from an external service

Acceleration of the DGF-FDTD method on GPU using the CUDA technology

Publication

- Year 2015

We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...

Full text available for download from an external service

Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Publication

P. Czarnul

- JOURNAL OF SUPERCOMPUTING - Year 2018

The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download in the portal

Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology

Publication

- Year 2016

The discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...

Full text available for download from an external service

Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms

Publication

G. Szwoch

- Year 2014

An implementation of the background subtraction algorithm using the OpenCL platform is presented. The algorithm processes a live stream of video frames from a surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...

Full text available for download from an external service

Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment

Publication

- Year 2014

The paper presents design, implementation and real life uses of a visualization subsystem for a distributed framework for parallelization of workflow-based computations among clusters with nodes that feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs of load, memory available, fan speeds etc. Secondly,...

Full text available for download from an external service

TensorHive: Management of Exclusive GPU Access for Distributed Machine Learning Workloads

Publication

- JOURNAL OF MACHINE LEARNING RESEARCH - Year 2021

TensorHive is a tool for organizing work of research and engineering teams that use servers with GPUs for machine learning workloads. In a comprehensive web interface, it supports reservation of GPUs for exclusive usage, hardware monitoring, as well as configuring, executing and queuing distributed computational jobs. Focusing on easy installation and simple configuration, the tool automatically detects the available computing...

Full text available for download in the portal

Accuracy, Memory and Speed Strategies in GPU-based Finite-Element Matrix-Generation

Publication

- IEEE Antennas and Wireless Propagation Letters - Year 2012

This paper presents strategies on how to optimize GPU-based finite-element matrix-generation that occurs in the finite-element method (FEM) using higher order curvilinear elements. The goal of the optimization is to increase the speed of evaluation and assembly of large finite-element matrices on a single GPU (Graphics Processing Unit) while maintaining the accuracy of numerical integration at the desired level. For this reason,...

Full text available for download from an external service

GPU based implementation of Temperature-Vegetation Dryness Index for AVHRR3 Satellite Data

Publication

- Year 2014

The paper presents an implementation of the TVDI (Temperature-Vegetation-Dryness Index) algorithm on a GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). The results discussed are based on multi-spectral imagery retrieved from AVHRR3 sensors for the area of Poland. All phases of the TVDI implementation on the GPU are modified with respect to the CUDA platform....

A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU

Publication

- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015

In this paper, a task-scheduling approach for efficiently calculating sparse symmetric matrix-vector products, designed to run on Graphics Processing Units (GPUs), is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....

Full text available for download from an external service

GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method

Publication

- IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS - Year 2011

The letter discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner applied to solving real symmetric and sparse systems obtained with the vector finite element method. In order to accelerate computations, a graphics processing unit (GPU) was used and a significant speed-up (2.61-fold) was achieved compared to a central processing unit (CPU) based approach. These results...

Full text available for download from an external service

GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method

Publication

- RADIOENGINEERING - Year 2017

This paper discusses a strategy for speeding up the mesh deformation process in the design-by-optimization of high-frequency components involving electromagnetic field simulations using the 3D finite element method (FEM). The mesh deformation is assumed to be described by a linear elasticity model of a rigid body; therefore, each time the shape of the device is changed, an auxiliary elasticity finite-element problem must be solved....

Full text available for download in the portal

Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system

Publication

T. Stefański

- Progress in Electromagnetics Research-PIER - Year 2013

This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes the computational power of the central processing unit (CPU) and the graphics processing unit (GPU), assigning to each the computational tasks best suited to its architecture. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Full text available for download from an external service

Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics

Publication

- IEEE Antennas and Wireless Propagation Letters - Year 2018

In this paper, a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large sparse systems of equations with multiple right-hand sides (RHSs), which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text available for download from an external service
