Wyniki wyszukiwania dla: MULTI GPU

Multi-GPU UNRES for scalable coarse-grained simulations of very large protein systems

Publikacja

K. Ocetkiewicz
C. Czaplewski
H. Krawczyk
A. Lipska
A. Liwo
J. Proficz
A. K. Sieradzan
P. Czarnul

- COMPUTER PHYSICS COMMUNICATIONS - Rok 2024

Graphical Processor Units (GPUs) are nowadays widely used in all-atom molecular simulations because of the advantage of efficient partitioning of atom pairs between the kernels to compute the contributions to energy and forces, thus enabling the treatment of very large systems. Extension of time- and size-scale of computations is also sought through the development of coarse-grained (CG) models, in which atoms are merged into extended...

Pełny tekst do pobrania w serwisie zewnętrznym

Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping

Publikacja

- Rok 2024

In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...

Pełny tekst do pobrania w serwisie zewnętrznym

Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation

Publikacja

- IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES - Rok 2017

This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as...

Pełny tekst do pobrania w serwisie zewnętrznym

Multi-GPU-powered UNRES package for physics-based coarse-grained simulations of structure, dynamics, and thermodynamics of protein systems at biological size- and timescales

Publikacja

- BIOPHYSICAL JOURNAL - Rok 2024

Coarse-grained models are nowadays extensively used in biomolecular simulations owing to the tremendous extension of size- and time-scale of simulations. The physics-based UNRES (UNited RESidue) model of proteins developed in our laboratory has only two interaction sites per amino-acid residue (united peptide groups and united side chains) and implicit solvent. However, owing to rigorous physics-based derivation, which enabled...

Pełny tekst do pobrania w serwisie zewnętrznym

A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

Publikacja

P. Czarnul

- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Rok 2023

In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs and extended CUDA calls allow to manage CUDA objects, data and launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Pełny tekst do pobrania w portalu

Finite element matrix generation on a GPU

Publikacja

- Progress in Electromagnetics Research-PIER - Rok 2012

This paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x Tesla C2075) and a CPU (2x twelve-core...

Pełny tekst do pobrania w serwisie zewnętrznym

Multi-core and Multiprocessor Implementation of Numerical Integration in Finite Element Method

Publikacja

- Rok 2012

The paper presents techniques for accelerating a numerical integration process which appears in the Finite Element Method. The acceleration is achieved by taking advantages of multi-core and multiprocessor devices. It is shown that using multi-core implementation with OpenMP and a GPU acceleration using CUDA architecture allows one to achieve the speedups by a factor of 5 and 10 on a CPU and GPUs, respectively.

GPU based implementation of Temperature-Vegetation Dryness Index for AVHRR3 Satellite Data

Publikacja

- Rok 2014

Paper presents an implementation of TVDI (Temperature-Vegetation-Dryness Index) algorithm on GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). Discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for area of Poland. All phases of TVDI implementation on GPU are modified in respect to CUDA platform....

Implementation of TVDI calculation for coastal zone

Publikacja

T. Bieliński

- Rok 2015

Paper will show an implementation of TVDI (Temperature-Vegetation-Dryness Index) algorithm on GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). Discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for area of Poland, especially from region of Gdańsk coastal zone. All phases of TVDI implementation...

Towards an efficient multi-stage Riemann solver for nuclear physics simulations

Publikacja

S. Cygert
J. Porter-Sobieraj
D. Kikoła
J. Sikorski
M. Słodkowski

- Rok 2013

Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....

Pełny tekst do pobrania w serwisie zewnętrznym

Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming

Publikacja

- COMPUTER JOURNAL - Rok 2021

In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...

Pełny tekst do pobrania w portalu

Performance evaluation of the parallel object tracking algorithm employing the particle filter

Publikacja

G. Szwoch

- Rok 2016

An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing speed depending on the number...

GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]

Publikacja

- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Rok 2014

This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Pełny tekst do pobrania w serwisie zewnętrznym

A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM

Publikacja

- IEEE Access - Rok 2018

The paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve nonHermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...

Pełny tekst do pobrania w portalu

Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Publikacja

P. Czarnul

- JOURNAL OF SUPERCOMPUTING - Rok 2018

The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Pełny tekst do pobrania w portalu

Jacobi and gauss-seidel preconditioned complex conjugate gradient method with GPU acceleration for finite element method

Publikacja

- Rok 2010

In this paper two implementations of iterative solvers for solving complex symmetric and sparse systems resulting from finite element method applied to wave equation are discussed. The problem under investigation is a dielectric resonator antenna (DRA) discretized by FEM with vector elements of the second order (LT/QN). The solvers use the preconditioned conjugate gradient (pcg) method implemented on Graphics Processing Unit (GPU)...

Pełny tekst do pobrania w serwisie zewnętrznym

Accuracy, Memory and Speed Strategies in GPU-based Finite-Element Matrix-Generation

Publikacja

- IEEE Antennas and Wireless Propagation Letters - Rok 2012

This paper presents strategies on how to optimize GPU-based finite-element matrix-generation that occurs in the finite-element method (FEM) using higher order curvilinear elements. The goal of the optimization is to increase the speed of evaluation and assembly of large finite-element matrices on a single GPU (Graphics Processing Unit) while maintaining the accuracy of numerical integration at the desired level. For this reason,...

Pełny tekst do pobrania w serwisie zewnętrznym

Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method

Publikacja

- RADIOENGINEERING - Rok 2018

This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few loworder resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Pełny tekst do pobrania w portalu

How to render FDTD computations more effective using agraphics accelerator.

Publikacja

- IEEE TRANSACTIONS ON MAGNETICS - Rok 2009

Graphics processing units (GPUs) for years have been dedicated mostly to real time rendering. Recently leading GPU manufactures have extended their research area and decided to support also graphics computing. In this paper, we describe an impact of new GPU features on development process of an efficient finite difference time domain (FDTD) implementation.

Pełny tekst do pobrania w serwisie zewnętrznym

Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Publikacja

- SIMULATION MODELLING PRACTICE AND THEORY - Rok 2023

In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Pełny tekst do pobrania w serwisie zewnętrznym

Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system

Publikacja

T. Stefański

- Progress in Electromagnetics Research-PIER - Rok 2013

This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Pełny tekst do pobrania w serwisie zewnętrznym

Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology

Publikacja

- Rok 2016

The discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...

Pełny tekst do pobrania w serwisie zewnętrznym

Acceleration of the DGF-FDTD method on GPU using the CUDA technology

Publikacja

- Rok 2015

We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...

Pełny tekst do pobrania w serwisie zewnętrznym

Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams

Publikacja

P. Czarnul

- COMPUTING AND INFORMATICS - Rok 2020

The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Pełny tekst do pobrania w portalu

Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA

Publikacja

M. J. Adiletta
J. J. Tithi
E. Farsarakis
G. Gerogiannis
R. Adolf
R. Benke
S. Kashyap
S. Hsia
K. Lakhotia
F. Petrini... i 2 innych

- Rok 2023

Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel’s Programmable Integrated Unffied Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark...

Pełny tekst do pobrania w serwisie zewnętrznym

Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms

Publikacja

G. Szwoch

- Rok 2014

Implementation of the background subtraction algorithm using OpenCL platform is presented. The algorithm processes live stream of video frames from the surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...

Pełny tekst do pobrania w serwisie zewnętrznym

Tuning matrix-vector multiplication on GPU

Publikacja

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Rok 2010

A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...

Implementation of FDTD-Compatible Green's Function on Graphics Processing Unit

Publikacja

T. Stefański
K. Krzyżanowska

- IEEE Antennas and Wireless Propagation Letters - Rok 2012

In this letter, implementation of the finite-difference time domain (FDTD)-compatible Green's function on a graphics processing unit (GPU) is presented. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates its applications in the FDTD simulations of radiation and scattering problems. Unfortunately, implementation of the new DGF formula in software requires a multiple precision...

Pełny tekst do pobrania w serwisie zewnętrznym

Generation of large finite-element matrices on multiple graphics processors

Publikacja

- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Rok 2013

This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...

Pełny tekst do pobrania w serwisie zewnętrznym

ZASTOSOWANIA DRONÓW I SENSORÓW WIZYJNYCH I AKUSTYCZNYCH DO ZDALNEJ DETEKCJI I LOKALIZACJI OBIEKTÓW I ZDARZEŃ

Publikacja

- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Rok 2016

W referacie przedstawiono wybrane sensory akustyczne i wizyjne i propozycje ich zastosowania do wykrywania i lokalizacji obiektów i zdarzeń z pokładu drona. Opisano pokrótce zastosowane algorytmy analizy strumieni, przedstawiono wyniki badań stworzonych prototypów i metod, zaimplementowanych na wydajnych układach GPU

Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics

Publikacja

- IEEE Antennas and Wireless Propagation Letters - Rok 2018

In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large system of sparse equations with multiple right hand-sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Pełny tekst do pobrania w serwisie zewnętrznym

Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge

Publikacja

- Rok 2020

Auto-tuning of configuration and application param- eters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Pełny tekst do pobrania w portalu

Using GPUs for Parallel Stencil Computations in Relativistic Hydrodynamic Simulation

Publikacja

S. Cygert
D. Kikoła
J. Porter-Sobieraj
J. Sikorski
M. Słodkowski

- Rok 2014

This paper explores the possibilities of using a GPU for complex 3D finite difference computation. We propose a new approach to this topic using surface memory and compare it with 3D stencil computations carried out via shared memory, which is currently considered to be the best approach. The case study was performed for the extensive computation of collisions between heavy nuclei in terms of relativistic hydrodynamics.

Pełny tekst do pobrania w serwisie zewnętrznym

Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations

Publikacja

- IEEE Antennas and Wireless Propagation Letters - Rok 2011

This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...

Pełny tekst do pobrania w serwisie zewnętrznym

GPU-accelerated finite element method

Publikacja

- Rok 2016

In this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...

Pełny tekst do pobrania w serwisie zewnętrznym

Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool

Publikacja

- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Rok 2023

GPU accelerators have become essential to the recent advance in computational power of high- performance computing (HPC) systems. Current HPC systems’ reaching an approximately 20–30 mega-watt power demand has resulted in increasing CO2 emissions, energy costs and necessitate increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Pełny tekst do pobrania w serwisie zewnętrznym

GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method

Publikacja

- IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS - Rok 2011

The letter discusses a fast implementation of the conjugate gradient iterative method with ${rm E}$-field multilevel preconditioner applied to solving real symmetric and sparse systems obtained with vector finite element method. In order to accelerate computations, a graphics processing unit (GPU) was used and significant speed-up (2.61 fold) was achieved comparing to a central processing unit (CPU) based approach. These results...

Pełny tekst do pobrania w serwisie zewnętrznym

Nowoczesne koncepcje integracji usług w systemie BeesyCluster

Publikacja

P. Czarnul

- Rok 2010

Opisano funkcje aktualnej wersji systemu BeesyCluster jakowarstwy pośredniej w dostępie do rozproszonych zasobów wraz podsystemami integracji usług, wyboru usług oraz ich wykonania. Zaprezentowano rozszerzenia podsystemu integracji usług zorientowane na green computing. Omówiono problemy inteligentnego wyszukiwania usług, wykorzystanie GPU, współpracę z urządzeniami mobilnymi oraz przetwarzanie w przestrzeniach inteligentnych.Dodatkowo...

Performance evaluation of parallel background subtraction on GPU platforms

Publikacja

G. Szwoch

- Elektronika : konstrukcje, technologie, zastosowania - Rok 2015

Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...

Pełny tekst do pobrania w serwisie zewnętrznym

Optymalizacja wydajności obliczeniowej metody elementów skończonych w architekturze CUDA

Publikacja

A. Dziekoński

- Rok 2015

Celem niniejszej rozprawy oraz stypendium odbytego w ramach projektu było opracowanie numerycznie efektywnego rozwiązania algorytmicznego i sprzętowego, które umożliwia przyspieszenie analizy problemów elektromagnetycznych metodą elementów skończonych (MES) z funkcjami bazowymi wysokiego rzędu. Metoda elementów skończonych w dziedzinie częstotliwości stanowi wydajne i uniwersalne narzędzie analizy układów mikrofalowych (rys....

GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM

Publikacja

- Communications in Computational Physics - Rok 2017

This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higherorder FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...

Pełny tekst do pobrania w serwisie zewnętrznym

GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method

Publikacja

- RADIOENGINEERING - Rok 2017

This paper discusses a strategy for speeding up the mesh deformation process in the design-byoptimization of high-frequency components involving electromagnetic field simulations using the 3D finite element method (FEM). The mesh deformation is assumed to be described by a linear elasticity model of a rigid body; therefore, each time the shape of the device is changed, an auxiliary elasticity finite-element problem must be solved....

Pełny tekst do pobrania w portalu

Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs

Publikacja

A. Dziekoński
G. Fotyga
M. Mrozowski

- IEEE Access - Rok 2018

This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...

Pełny tekst do pobrania w portalu

A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems

Publikacja

- Rok 2014

Modern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems, that we wish to efficiently solve using those systems have different complexity, not only considering magnitude, but also the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...

Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform

Publikacja

- Rok 2017

Performance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...

Pełny tekst do pobrania w serwisie zewnętrznym

Sign Language Recognition Using Convolution Neural Networks

Publikacja

- Rok 2024

The objective of this work was to provide an app that can automatically recognize hand gestures from the American Sign Language (ASL) on mobile devices. The app employs a model based on Convolutional Neural Network (CNN) for gesture classification. Various CNN architectures and optimization strategies suitable for devices with limited resources were examined. InceptionV3 and VGG-19 models exhibited negligibly higher accuracy than...

Pełny tekst do pobrania w portalu

Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs

Publikacja

- Rok 2022

In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...

Pełny tekst do pobrania w portalu

Performance/energy aware optimization of parallel applications on GPUs under power capping

Publikacja

- Rok 2020

In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to the bench- marks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm- benchmark which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...

Pełny tekst do pobrania w portalu

A memory efficient and fast sparse matrix vector product on a Gpu

Publikacja

- Progress in Electromagnetics Research-PIER - Rok 2011

This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Pełny tekst do pobrania w serwisie zewnętrznym

Cryptocurrencies as a Speculative Asset: How Much Uncertainty is Included in Cryptocurrency Price?

Publikacja

T. Ahsan
K. Zawadzki
K. Mubashir

- SAGE Open - Rok 2024

The aim of this paper is to examine the relationship between uncertainty indices (Geopolitical Uncertainty Index and Global Economic Policy Uncertainty Index) and cryptocurrencies. This study evaluated the behavior of cryptocurrencies with the evolution of uncertainties (GPU, EPU) on returns and volatility in terms of safe heaven as in traditional specualtive assets it increases their volaitility and reduces risk. For this purpose,...

Pełny tekst do pobrania w portalu

Wyszukiwarka

Filtry

Katalog

Wyniki wyszukiwania dla: MULTI GPU