Search results for: CUDA TECHNOLOGY

Acceleration of the DGF-FDTD method on GPU using the CUDA technology

Publication

- Year 2015

We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...

Full text available for download from an external service

Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology

Publication

- Year 2016

The discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...

Full text available for download from an external service

High performance filtering for big datasets from Airborne Laser Scanning with CUDA technology

Publication

W. Błaszczak-Bąk
A. Janowski
P. Srokosz

- SURVEY REVIEW - Year 2018

There are many studies on the problems of processing big datasets provided by Airborne Laser Scanning (ALS). The processing of point clouds is often executed in stages or on fragments of the measurement set. Therefore, solutions that enable the processing of the entire cloud at once in a simple, fast and efficient way are the subject of much research. In this paper, the authors propose to use General-Purpose computation...

Full text available for download from an external service

Use of ICT infrastructure for teaching HPC

Publication

- Year 2019

In this paper we look at modern ICT infrastructure as well as the curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, and at the node level using OpenMP and CUDA. We present...

Full text available for download from an external service

Implementation of FDTD-Compatible Green's Function on Graphics Processing Unit

Publication

T. Stefański
K. Krzyżanowska

- IEEE Antennas and Wireless Propagation Letters - Year 2012

In this letter, an implementation of the finite-difference time-domain (FDTD)-compatible Green's function on a graphics processing unit (GPU) is presented. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates its application in FDTD simulations of radiation and scattering problems. Unfortunately, implementation of the new DGF formula in software requires a multiple precision...

Full text available for download from an external service

Parallel multithread computing for spectroscopic analysis in optical coherence tomography

Publication

- Year 2014

Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample. It is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan. Additionally, further processing of acquired spectroscopic information...

Full text available for download from an external service

Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams

Publication

P. Czarnul

- COMPUTING AND INFORMATICS - Year 2020

The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Full text available for download in the portal

A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

Publication

P. Czarnul

- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023

In the paper, we propose a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for the management of particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer, who can benefit from...

Full text available for download in the portal

Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023

In the paper we present a modern, efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and which CPU–GPU load balancing settings should be preferred for various domain...

Full text available for download from an external service

Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications

Publication

- JOURNAL OF SUPERCOMPUTING - Year 2017

The aim of this paper is to evaluate performance of new CUDA mechanisms—unified memory and dynamic parallelism for real parallel applications compared to standard CUDA API versions. In order to gain insight into performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically,...

Full text available for download in the portal

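Latająca Kawiarenka Naukowa [Flying Science Café]

Publication

M. Rucka

- Pismo PG - Year 2014

The article describes a meeting of the Latająca Kawiarenka Naukowa (Flying Science Café), an initiative aimed at popularizing science in the field of structural mechanics and bridges. The session, titled „Mosty: cuda architektury i techniki” ("Bridges: wonders of architecture and engineering"), was organized by Akademia Młodych Uczonych PAN (the Polish Young Academy of the Polish Academy of Sciences) and Koło Naukowe Mechaniki Budowli KoMBo (the KoMBo Structural Mechanics Student Research Group).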

Multi-core and Multiprocessor Implementation of Numerical Integration in Finite Element Method

Publication

- Year 2012

The paper presents techniques for accelerating the numerical integration process that appears in the Finite Element Method. The acceleration is achieved by taking advantage of multi-core and multiprocessor devices. It is shown that a multi-core implementation with OpenMP and GPU acceleration using the CUDA architecture achieve speedups by factors of 5 and 10 on a CPU and GPUs, respectively.
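As a rough illustration of why numerical integration in the Finite Element Method parallelizes well, the sketch below integrates a function over a 1D mesh with a two-point Gauss-Legendre rule. It is not the paper's implementation: the function names, the 1D setting and the choice of rule are assumptions made for this example. The point it shows is that each element's integral is independent of the others, so the per-element work can be spread across CPU cores or GPU threads.

```python
import math

# Two-point Gauss-Legendre rule on the reference interval [-1, 1]:
# (node, weight) pairs. This rule is exact for polynomials up to degree 3.
GAUSS_2PT = [(-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0)]


def integrate_element(f, a, b, rule=GAUSS_2PT):
    """Integrate f over one element [a, b] by mapping the reference rule."""
    half = 0.5 * (b - a)   # Jacobian of the map from [-1, 1] to [a, b]
    mid = 0.5 * (a + b)
    return half * sum(w * f(mid + half * xi) for xi, w in rule)


def integrate_mesh(f, nodes):
    """Sum independent per-element integrals over consecutive node pairs.

    Every term of this sum can be computed in isolation, which is the
    property a multi-core or GPU implementation would exploit.
    """
    return sum(integrate_element(f, a, b) for a, b in zip(nodes, nodes[1:]))


# Hypothetical usage: integrate x^3 over [0, 2] on a two-element mesh.
total = integrate_mesh(lambda x: x ** 3, [0.0, 1.0, 2.0])
```

In a real FEM code the integrand would involve basis functions and material coefficients on 2D or 3D elements, but the loop structure stays the same.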

Krylov Space Iterative Solvers on Graphics Processing Units

Publication

- Year 2010

CUDA architecture was introduced by Nvidia three years ago and since then there have been many promising publications demonstrating a huge potential of Graphics Processing Units (GPUs) in scientific computations. In this paper, we investigate the performance of iterative methods such as cg, minres, gmres, bicg that may be used to solve large sparse real and complex systems of equations arising in computational electromagnetics.

Full text available for download from an external service
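For readers unfamiliar with the solvers named in this abstract, here is a minimal sketch of the conjugate gradient (cg) method in plain Python. It is an illustration only and not the paper's GPU code: it assumes a real, symmetric positive definite matrix, and the dictionary-based sparse row format and the function names are hypothetical. The dominant cost is the sparse matrix-vector product, which is the operation a GPU implementation would typically parallelize.

```python
def sparse_matvec(rows):
    """Build a matvec closure from rows given as {column: value} dicts
    (an assumed toy sparse format, one dict per matrix row)."""
    def matvec(v):
        return [sum(val * v[j] for j, val in row.items()) for row in rows]
    return matvec


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a real symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero start
    p = list(r)                      # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: done
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical usage on a 2x2 system [[4, 1], [1, 3]] x = [1, 2].
A = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 3.0}]
x = conjugate_gradient(sparse_matvec(A), [1.0, 2.0])
```

The other methods listed (minres, gmres, bicg) relax the symmetry or definiteness assumption made here, and complex systems would need conjugated inner products.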

Wykorzystanie technologii CUDA do kompresji w czasie rzeczywistym danych pochodzących z sonarów wielowiązkowych. [Using CUDA technology for real-time compression of data from multibeam sonars]

Publication

A. Chybicki
K. Laskowski
M. Moszyński

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

The paper presents the design and implementation of a system for compressing data from multibeam sonars that operates using CUDA technology. Lossless data compression methods and parallel processing techniques are discussed and applied. The resulting application was tested for compression speed and compression ratio and compared with other solutions that enable compression of this type of data.

Towards an efficient multi-stage Riemann solver for nuclear physics simulations

Publication

S. Cygert
J. Porter-Sobieraj
D. Kikoła
J. Sikorski
M. Słodkowski

- Year 2013

Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....

Full text available for download from an external service

GPU based implementation of Temperature-Vegetation Dryness Index for AVHRR3 Satellite Data

Publication

- Year 2014

The paper presents an implementation of the TVDI (Temperature-Vegetation Dryness Index) algorithm on a GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). The results discussed are based on multi-spectral imagery retrieved from AVHRR3 sensors for the area of Poland. All phases of the TVDI implementation on GPU are modified with respect to the CUDA platform....
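To make the dependence of TVDI on LST and NDVI concrete, the sketch below evaluates the commonly used formulation TVDI = (LST - LST_min) / (LST_max - LST_min), where LST_max is a "dry edge" modeled as a linear function of NDVI. The abstract does not give the exact variant used, so the constant wet edge, the parameter names and the sample values are assumptions for illustration. Each pixel is computed independently, which is why the index maps naturally onto one GPU thread per pixel.

```python
def tvdi(lst, ndvi, dry_a, dry_b, lst_min):
    """Per-pixel TVDI under an assumed linear dry edge.

    lst      land surface temperature of the pixel
    ndvi     vegetation index of the pixel
    dry_a    intercept of the dry edge, LST_max = dry_a + dry_b * ndvi
    dry_b    slope of the dry edge (typically negative)
    lst_min  wet edge, taken here as a constant (an assumption)
    Returns None where the two edges meet or cross, since the index
    is undefined there.
    """
    lst_max = dry_a + dry_b * ndvi
    span = lst_max - lst_min
    if span <= 0:
        return None
    return (lst - lst_min) / span


def tvdi_image(lst_pixels, ndvi_pixels, dry_a, dry_b, lst_min):
    """Apply the per-pixel formula to two flat, equally sized pixel lists."""
    return [tvdi(t, v, dry_a, dry_b, lst_min)
            for t, v in zip(lst_pixels, ndvi_pixels)]


# Hypothetical values in kelvin: dry edge 320 - 20 * NDVI, wet edge 290.
sample = tvdi(300.0, 0.5, 320.0, -20.0, 290.0)
```

Values near 0 indicate pixels close to the wet edge and values near 1 indicate pixels close to the dry edge. Fitting the edge parameters from the image is a separate step not shown here.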

Performance evaluation of parallel background subtraction on GPU platforms

Publication

G. Szwoch

- Elektronika : konstrukcje, technologie, zastosowania - Year 2015

Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...

Full text available for download from an external service

Performance evaluation of the parallel object tracking algorithm employing the particle filter

Publication

G. Szwoch

- Year 2016

An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with the CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing speed depending on the number...

Optimizing the computation of a parallel 3D finite difference algorithm for graphics processing units

Publication

J. Porter-Sobieraj
S. Cygert
K. Daniel
J. Sikorski
M. Słodkowski

- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2015

This paper explores the possibilities of using a graphics processing unit for complex 3D finite difference computation via MUSTA‐FORCE and WENO algorithms. We propose a novel algorithm based on the new properties of CUDA surface memory optimized for 2D spatial locality and compare it with 3D stencil computations carried out via shared memory, which is currently considered to be the best approach. A case study was performed for...

Full text available for download from an external service

Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool

Publication

- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Year 2023

GPU accelerators have become essential to the recent advance in computational power of high-performance computing (HPC) systems. Current HPC systems have reached a power demand of approximately 20–30 megawatts, which has resulted in increasing CO2 emissions and energy costs and necessitates increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Full text available for download from an external service

Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs

Publication

- JOURNAL OF SUPERCOMPUTING - Year 2019

The paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...

Full text available for download in the portal

Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system

Publication

T. Stefański

- Progress in Electromagnetics Research-PIER - Year 2013

This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes the computational power of the central processing unit (CPU) and the graphics processing unit (GPU), assigning to each the computational tasks best suited to its architecture. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Full text available for download from an external service

Zastosowanie technologii GPGPU do wspomagania inżynierskich obliczeń numerycznych na przykładzie analizy przepływu przez ośrodek dwufazowy płyn - ciało stałe [Application of GPGPU technology to support engineering numerical computations, illustrated by the analysis of flow through a two-phase fluid-solid medium]

Publication

- Mechanik - Year 2011

After presenting basic information on GPGPU technology and the NVIDIA CUDA architecture, the article describes the conservation equations governing flows and their numerical discretization. It then investigates the possibility of using GPGPU technology to optimize the execution time of numerical computations of flow through a two-phase medium (fluid and solid particles) resembling a porous medium. To this end,...

Full text available for download in the portal

Zastosowanie technologii GPGPU do wspomagania inżynierskich obliczeń numerycznych na przykładzie analizy przepływu przez ośrodek dwufazowy płyn-ciało stałe [Application of GPGPU technology to support engineering numerical computations, illustrated by the analysis of flow through a two-phase fluid-solid medium]

Publication

- Year 2011

After presenting basic information on GPGPU technology and the NVIDIA CUDA architecture, the article describes the conservation equations governing flows and their numerical discretization. It then investigates the possibility of using GPGPU technology to optimize the execution time of numerical computations of flow through a two-phase medium (fluid and solid particles) resembling a porous medium. To this end,...

Modelling and simulation of GPU processing in the MERPSYS environment

Publication

- Scalable Computing: Practice and Experience - Year 2018

In this work, we evaluate an analytical GPU performance model based on Little's law, that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...

Full text available for download in the portal

Parallel Programming for Modern High Performance Computing Systems

Publication

P. Czarnul

- Year 2018

In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text available for download from an external service

Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Publication

P. Czarnul

- JOURNAL OF SUPERCOMPUTING - Year 2018

The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download in the portal

Optymalizacja wydajności obliczeniowej metody elementów skończonych w architekturze CUDA [Optimizing the computational performance of the finite element method on the CUDA architecture]

Publication

A. Dziekoński

- Year 2015

The aim of this dissertation, and of the scholarship completed as part of the project, was to develop a numerically efficient algorithmic and hardware solution that accelerates the analysis of electromagnetic problems using the finite element method (FEM) with higher-order basis functions. The frequency-domain finite element method is an efficient and versatile tool for analyzing microwave circuits (Fig....

Algorytmy analizy i przetwarzania danych z sonarów wielowiązkowych w rozproszonych systemach GIS [Algorithms for the analysis and processing of multibeam sonar data in distributed GIS systems]

Publication

A. Chybicki

- Year 2011

Marine telemonitoring and broadly understood marine research are an important part of human activity in research, science and the economy. Activities such as seabed mapping, inspection of quays and fortifications, and the study of marine fauna help us understand the processes taking place in the marine environment and contribute to the development of many sectors of the economy, such as maritime transport, security, port protection and others. As part of...
