Search results for: gpu

Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform

Publication

- Year 2017

Performance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...

Full text available for download from an external service

Implementation of algebraic procedures on the GPU using CUDA architecture on the example of generalized eigenvalue problem

Publication

Ł. Syrocki
G. Pestka

- Open Computer Science - Year 2016

Full text available for download from an external service

Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping

Publication

- Year 2024

In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...

Full text available for download from an external service

Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation

Publication

- IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES - Year 2017

This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as...

Full text available for download from an external service

A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

Publication

P. Czarnul

- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023

In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Full text available for download from an external service

Jacobi and gauss-seidel preconditioned complex conjugate gradient method with GPU acceleration for finite element method

Publication

- Year 2010

In this paper, two implementations of iterative solvers for complex symmetric and sparse systems resulting from the finite element method applied to the wave equation are discussed. The problem under investigation is a dielectric resonator antenna (DRA) discretized by FEM with vector elements of the second order (LT/QN). The solvers use the preconditioned conjugate gradient (PCG) method implemented on a Graphics Processing Unit (GPU)...

Full text available for download from an external service

GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]

Publication

- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Year 2014

This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Full text available for download from an external service

Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023

In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text available for download from an external service

Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams

Publication

P. Czarnul

- COMPUTING AND INFORMATICS - Year 2020

The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Full text available for download in the portal

UNRES-GPU for Physics-Based Coarse-Grained Simulations of Protein Systems at Biological Time- and Size-Scales

Publication

- BIOINFORMATICS - Year 2023

The dynamics of the virus-like particles (VLPs) corresponding to the GII.4 Houston, GII.2 SMV, and GI.1 Norwalk strains of human noroviruses (HuNoV) that cause gastroenteritis was investigated by means of long-time (about 30 μs in the laboratory timescale) molecular dynamics simulations with the coarse-grained UNRES force field. The main motion of VLP units turned out to be the bending at the junction between the P1 subdomain (that...

Full text available for download in the portal

Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge

Publication

- Year 2020

Auto-tuning of configuration and application parameters makes it possible to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Full text available for download in the portal

GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition

Publication

- Year 2022

In the paper we present a performance-energy trade-off investigation of training Deep Convolutional Neural Networks for image recognition. Several representative and widely adopted network models, such as Alexnet, VGG-19, Inception V3, Inception V4, Resnet50 and Resnet152, were tested using systems with Nvidia Quadro RTX 6000 as well as Nvidia V100 GPUs. Using GPU power capping, we found non-default configurations minimizing...

Full text available for download in the portal

A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM

Publication

- IEEE Access - Year 2018

The paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve non-Hermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...

Full text available for download in the portal

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations

Publication

- IEEE Antennas and Wireless Propagation Letters - Year 2011

This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...

Full text available for download from an external service

GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM

Publication

- Communications in Computational Physics - Year 2017

This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact null-space filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...

Full text available for download from an external service

Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method

Publication

- RADIOENGINEERING - Year 2018

This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few low-order resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Full text available for download in the portal

Using GPUs for Parallel Stencil Computations in Relativistic Hydrodynamic Simulation

Publication

S. Cygert
D. Kikoła
J. Porter-Sobieraj
J. Sikorski
M. Słodkowski

- Year 2014

This paper explores the possibilities of using a GPU for complex 3D finite difference computation. We propose a new approach to this topic using surface memory and compare it with 3D stencil computations carried out via shared memory, which is currently considered to be the best approach. The case study was performed for the extensive computation of collisions between heavy nuclei in terms of relativistic hydrodynamics.

Full text available for download from an external service

Quality of Cryptocurrency Mining on Previous Generation NVIDIA GTX GPUs

Publication

- Year 2022

Currently, many previous-generation NVIDIA GTX graphics processing units (GPUs), which were ousted by next-gen RTX units, are available on the market. As a result, numerous fully operational devices that are available at an affordable price remain underused. First, this paper presents an analysis of the cryptocurrency market. Next, in this context, the results of research on the performance of NVIDIA graphics...

Full text available for download from an external service

Benchmarking overlapping communication and computations with multiple streams for modern GPUs

Publication

P. Czarnul

- Annals of Computer Science and Information Systems - Year 2018

The paper presents benchmarking of a multi-stream application processing a set of input data arrays. Tests have been performed and execution times measured for various numbers of streams and various compute intensities measured as the ratio of kernel compute time and data transfer time. As such, the application and benchmarking are representative of frequently used operations such as vector weighted sum, matrix multiplication etc....

Full text available for download in the portal

Performance/energy aware optimization of parallel applications on GPUs under power capping

Publication

- Year 2020

In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark, which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...

Full text available for download in the portal
