Search results for: gpus


Total: 50


Benchmarking overlapping communication and computations with multiple streams for modern GPUs
Publication
- P. Czarnul
- Annals of Computer Science and Information Systems - Year 2018
The paper presents benchmarking a multi-stream application processing a set of input data arrays. Tests have been performed and execution times measured for various numbers of streams and various compute intensities measured as the ratio of kernel compute time and data transfer time. As such, the application and benchmarking is representative of frequently used operations such as vector weighted sum, matrix multiplication etc....

Full text available for download in the portal
Quality of Cryptocurrency Mining on Previous Generation NVIDIA GTX GPUs
Publication
- Year 2022
Currently, there are many previous generation NVIDIA GTX graphical processing units (GPUs) available on the market, which were ousted by next-gen RTX units. Due to this fact, numerous fully operational devices, available at an affordable price, remain underused. First, this paper presents an analysis of the cryptocurrency market. Next, in this context, the results of research on the performance of NVIDIA graphics...

Full text available for download from an external service
Using GPUs for Parallel Stencil Computations in Relativistic Hydrodynamic Simulation
Publication
- S. Cygert
- D. Kikoła
- J. Porter-Sobieraj
- J. Sikorski
- M. Słodkowski
- Year 2014
This paper explores the possibilities of using a GPU for complex 3D finite difference computation. We propose a new approach to this topic using surface memory and compare it with 3D stencil computations carried out via shared memory, which is currently considered to be the best approach. The case study was performed for the extensive computation of collisions between heavy nuclei in terms of relativistic hydrodynamics.

Full text available for download from an external service
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication
- A. Krzywaniak
- P. Czarnul
- Year 2020
In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark, which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for the desired trade-off of performance...

Full text available for download in the portal
Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs
Publication
- P. Czarnul
- P. Rościszewski
- Year 2014
The paper proposes an approach for parallelization of computations across a collection of clusters with heterogeneous nodes with both GPUs and CPUs. The proposed system partitions input data into chunks and assigns them to particular devices for processing using OpenCL kernels defined by the user. The system is able to minimize the execution time of the application while maintaining the power consumption of the utilized GPUs and...

Full text available for download from an external service
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
Publication
- Year 2022
In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...

Full text available for download in the portal
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
Publication
- M. Knap
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2019
The paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...

Full text available for download in the portal
KernelHive: a new workflow-based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs
Publication
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2016
The paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available....

Full text available for download from an external service
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
Publication
- A. Dziekoński
- G. Fotyga
- M. Mrozowski
- IEEE Access - Year 2018
This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...

Full text available for download in the portal
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
Publication
- P. Czarnul
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023
In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Full text available for download in the portal
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
Publication
- P. Czarnul
- COMPUTING AND INFORMATICS - Year 2020
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Full text available for download in the portal
Piotr Sypek, PhD, Eng.

People

Department of Microwave and Antenna Engineering

Piotr Sypek received his MSc in engineering from Gdańsk University of Technology in 2003 and his PhD in technical sciences (with distinction) in 2012. He currently works in the Department of Microwave and Antenna Engineering at the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. His research covers the design and implementation of parallel algorithms used for constructing and computing solutions of...
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
Publication
- P. Rościszewski
- M. Iwański
- P. Czarnul
- Year 2020
Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...

Full text available for download from an external service
Generation of large finite-element matrices on multiple graphics processors
Publication
- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Year 2013
This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...

Full text available for download from an external service
Performance evaluation of parallel background subtraction on GPU platforms
Publication
- G. Szwoch
- Elektronika : konstrukcje, technologie, zastosowania - Year 2015
Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...

Full text available for download from an external service
TensorHive: Management of Exclusive GPU Access for Distributed Machine Learning Workloads
Publication
- JOURNAL OF MACHINE LEARNING RESEARCH - Year 2021
TensorHive is a tool for organizing work of research and engineering teams that use servers with GPUs for machine learning workloads. In a comprehensive web interface, it supports reservation of GPUs for exclusive usage, hardware monitoring, as well as configuring, executing and queuing distributed computational jobs. Focusing on easy installation and simple configuration, the tool automatically detects the available computing...

Full text available for download in the portal
Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation
Publication
- IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES - Year 2017
This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as...

Full text available for download from an external service
Multi-GPU UNRES for scalable coarse-grained simulations of very large protein systems
Publication
- K. Ocetkiewicz
- C. Czaplewski
- H. Krawczyk
- A. Lipska
- A. Liwo
- J. Proficz
- A. K. Sieradzan
- P. Czarnul
- COMPUTER PHYSICS COMMUNICATIONS - Year 2024
Graphical Processor Units (GPUs) are nowadays widely used in all-atom molecular simulations because of the advantage of efficient partitioning of atom pairs between the kernels to compute the contributions to energy and forces, thus enabling the treatment of very large systems. Extension of time- and size-scale of computations is also sought through the development of coarse-grained (CG) models, in which atoms are merged into extended...

Full text available for download from an external service
Paweł Rościszewski, PhD, Eng.

People

Paweł Rościszewski received his PhD in Computer Science at Gdańsk University of Technology in 2018 based on PhD thesis entitled: "Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption". Currently, he is an Assistant Professor at the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Poland....
Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment
Publication
- Year 2014
The paper presents design, implementation and real life uses of a visualization subsystem for a distributed framework for parallelization of workflow-based computations among clusters with nodes that feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs of load, memory available, fan speeds etc. Secondly,...

Full text available for download from an external service
High performance filtering for big datasets from Airborne Laser Scanning with CUDA technology
Publication
- W. Błaszczak-Bąk
- A. Janowski
- P. Srokosz
- SURVEY REVIEW - Year 2018
There are many studies on the problems of processing big datasets provided by Airborne Laser Scanning (ALS). The processing of point clouds is often executed in stages or on fragments of the measurement set. Therefore, solutions that enable the processing of the entire cloud at the same time in a simple, fast, efficient way are the subject of much research. In this paper, the authors propose to use General-Purpose computation...

Full text available for download from an external service
Block-based Representation of Application Execution on Modern Parallel Systems
Publication
- P. Czarnul
- Year 2013
The chapter presents how to model execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses previously attributes and factors representative of modern high performance systems including multicore CPUs, GPUs, and dedicated accelerators such as Intel Phi.
How to render FDTD computations more effective using a graphics accelerator.
Publication
- IEEE TRANSACTIONS ON MAGNETICS - Year 2009
Graphics processing units (GPUs) for years have been dedicated mostly to real time rendering. Recently leading GPU manufacturers have extended their research area and decided to support also graphics computing. In this paper, we describe the impact of new GPU features on the development process of an efficient finite difference time domain (FDTD) implementation.

Full text available for download from an external service
Multi-core and Multiprocessor Implementation of Numerical Integration in Finite Element Method
Publication
- Year 2012
The paper presents techniques for accelerating a numerical integration process which appears in the Finite Element Method. The acceleration is achieved by taking advantage of multi-core and multiprocessor devices. It is shown that using a multi-core implementation with OpenMP and GPU acceleration using the CUDA architecture allows one to achieve speedups by factors of 5 and 10 on a CPU and GPUs, respectively.
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
Publication
- Year 2024
In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...

Full text available for download from an external service
Krylov Space Iterative Solvers on Graphics Processing Units
Publication
- A. Dziekoński
- M. Mrozowski
- Year 2010
CUDA architecture was introduced by Nvidia three years ago and since then there have been many promising publications demonstrating a huge potential of Graphics Processing Units (GPUs) in scientific computations. In this paper, we investigate the performance of iterative methods such as cg, minres, gmres, bicg that may be used to solve large sparse real and complex systems of equations arising in computational electromagnetics.

Full text available for download from an external service
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text available for download from an external service
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool
Publication
- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Year 2023
GPU accelerators have become essential to the recent advance in computational power of high-performance computing (HPC) systems. The power demand of current HPC systems, reaching approximately 20–30 megawatts, has resulted in increasing CO2 emissions and energy costs and necessitates increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Full text available for download from an external service
Paweł Czarnul, DSc, PhD, Eng.

People

Cloud Services Department, Faculty of Electronics, Telecommunications and Informatics, Department of Computer Systems Architecture

Paweł Czarnul received his habilitation (DSc) in technical sciences in the discipline of computer science in 2015, and his PhD in technical sciences in computer science (with distinction), awarded by the Council of the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology, in 2003. His areas of interest include parallel and distributed processing, including parallel programming on computing clusters,...
GPU based implementation of Temperature-Vegetation Dryness Index for AVHRR3 Satellite Data
Publication
- T. Bieliński
- A. Chybicki
- Year 2014
The paper presents an implementation of the TVDI (Temperature-Vegetation-Dryness Index) algorithm on a GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). The discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for the area of Poland. All phases of the TVDI implementation on the GPU are modified with respect to the CUDA platform....
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication
- P. Rościszewski
- Procedia Computer Science - Year 2017
In the paper we tackle the bi-objective execution time and power consumption optimization problem concerning the execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

Full text available for download in the portal
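
The power/time trade-off mentioned in the abstract above is described as a Pareto front. As a rough illustration of what that means, and not the method used in the paper, the sketch below filters a set of configurations down to those that no other configuration beats in both execution time and power. The function name `pareto_front`, the configuration labels, and all numbers are hypothetical and made up for illustration.

```python
from typing import List, Tuple

# (label, execution time in seconds, average power in watts)
Config = Tuple[str, float, float]


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that are not dominated by any other one.

    A configuration is dominated if another one is no worse in both
    time and power and strictly better in at least one of them.
    Lower is better for both objectives.
    """
    front = []
    for name, t, p in configs:
        dominated = any(
            (t2 <= t and p2 <= p) and (t2 < t or p2 < p)
            for _, t2, p2 in configs
        )
        if not dominated:
            front.append((name, t, p))
    # Sort by execution time so the front reads from fastest to slowest.
    return sorted(front, key=lambda c: c[1])


# Hypothetical configurations, invented for this example only.
configs = [
    ("A", 100.0, 900.0),
    ("B", 120.0, 700.0),
    ("C", 130.0, 750.0),  # slower and more power-hungry than B
    ("D", 200.0, 400.0),
    ("E", 210.0, 650.0),  # slower and more power-hungry than D
]
front = pareto_front(configs)
print([name for name, _, _ in front])
```

The quadratic scan is adequate for a handful of configurations; a larger search space would likely call for sorting by one objective first.
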
Simulation of parallel similarity measure computations for large data sets
Publication
- Year 2015
The paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...

Full text available for download from an external service
Implementation of FDTD-Compatible Green's Function on Graphics Processing Unit
Publication
- T. Stefański
- K. Krzyżanowska
- IEEE Antennas and Wireless Propagation Letters - Year 2012
In this letter, implementation of the finite-difference time domain (FDTD)-compatible Green's function on a graphics processing unit (GPU) is presented. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates its applications in the FDTD simulations of radiation and scattering problems. Unfortunately, implementation of the new DGF formula in software requires a multiple precision...

Full text available for download from an external service
NVRAM as Main Storage of Parallel File System
Publication
- A. Malinowski
- Journal of Computer Science and Control Systems - Year 2016
The main problem of modern cluster environments used to be a lack of computational power provided by CPUs and GPUs, but recently they have suffered more and more from insufficient performance of input and output operations. Apart from better network infrastructure and more sophisticated processing algorithms, many solutions are based on emerging memory technologies. This paper presents evaluation of using non-volatile random-access memory as a...

Full text available for download from an external service
Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
Publication
- A. Dziekoński
- M. Mrozowski
- RADIOENGINEERING - Year 2018
This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few low-order resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Full text available for download in the portal
Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform
Publication
- G. Szwoch
- M. Szczodrak
- Year 2017
Performance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...

Full text available for download from an external service
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
Publication
- RADIOENGINEERING - Year 2014
In this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...

Full text available for download in the portal
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Antennas and Wireless Propagation Letters - Year 2018
In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large sparse systems of equations with multiple right-hand sides (RHSs), which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text available for download from an external service
A memory efficient and fast sparse matrix vector product on a GPU
Publication
- Progress in Electromagnetics Research-PIER - Year 2011
This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text available for download from an external service
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
Publication
- IEEE Antennas and Wireless Propagation Letters - Year 2011
This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...

Full text available for download from an external service
Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments
Publication
- Scientific Programming - Year 2019
The paper presents state of the art of energy-aware high-performance computing (HPC), in particular identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single device, clusters, grids, and clouds while considered device types include CPUs, GPUs, multiprocessor, and hybrid systems. Optimization goals include various combinations of...

Full text available for download in the portal
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
Publication
- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015
In this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....

Full text available for download from an external service
GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition
Publication
- Year 2022
In the paper we present performance-energy trade-off investigation of training Deep Convolutional Neural Networks for image recognition. Several representative and widely adopted network models, such as Alexnet, VGG-19, Inception V3, Inception V4, Resnet50 and Resnet152 were tested using systems with Nvidia Quadro RTX 6000 as well as Nvidia V100 GPUs. Using GPU power capping we found other than default configurations minimizing...

Full text available for download in the portal
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication
- P. Rościszewski
- J. Kaliski
- Year 2017
In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...

Full text available for download from an external service
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
Publication
- T. M. Boiński
- P. Czarnul
- COMPUTER JOURNAL - Year 2021
In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...

Full text available for download in the portal
Modelling and simulation of GPU processing in the MERPSYS environment
Publication
- T. Gajger
- P. Czarnul
- Scalable Computing: Practice and Experience - Year 2018
In this work, we evaluate an analytical GPU performance model based on Little's law, that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...

Full text available for download in the portal
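
The abstract above refers to a performance model based on Little's law. The model itself is not reproduced in this listing, so the sketch below only illustrates the law in its generic form (mean items in flight = mean latency × mean throughput) applied to memory traffic. The function names and all device figures are assumptions chosen for illustration, not values from the paper or from any particular GPU.

```python
def required_concurrency(latency_s: float, throughput_per_s: float) -> float:
    """Little's law: mean items in flight = mean latency x mean throughput."""
    return latency_s * throughput_per_s


def achievable_throughput(in_flight: float, latency_s: float, peak_per_s: float) -> float:
    """Throughput sustained with a given amount in flight, capped at the peak."""
    return min(peak_per_s, in_flight / latency_s)


# Hypothetical device figures, assumed for this example only.
latency_s = 500e-9          # assumed memory latency: 500 ns
bandwidth_bytes_s = 900e9   # assumed peak memory bandwidth: 900 GB/s

# Bytes that must be in flight to keep the memory system saturated.
bytes_in_flight = required_concurrency(latency_s, bandwidth_bytes_s)

# With only half that many bytes in flight, throughput drops to about half of peak.
half_rate = achievable_throughput(bytes_in_flight / 2, latency_s, bandwidth_bytes_s)
print(bytes_in_flight, half_rate)
```

Under these assumed figures, the amount of data in flight is what links occupancy to the throughput bound: too few outstanding requests leaves the kernel latency bound.
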
Parallel Programming for Modern High Performance Computing Systems
Publication
- P. Czarnul
- Year 2018
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text available for download from an external service
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
Publication
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available for download in the portal
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download in the portal
Advanced Potential Energy Surfaces for Molecular Simulation
Publication
- A. Albaugh
- H. Boateng
- R. Bradshaw
- O. Demerdash
- J. Dziedzic
- Y. Mao
- D. Margul
- J. Swails
- Q. Zeng
- D. Case... and 10 others
- JOURNAL OF PHYSICAL CHEMISTRY B - Year 2016
Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...

Full text available for download in the portal
