Paweł Czarnul - Publications

dr hab. inż. Paweł Czarnul

Employment

Chief Specialist, Cloud Services Department
Vice-Dean for Cooperation and Promotion, Faculty of Electronics, Telecommunications and Informatics
Head of Department, Department of Computer Systems Architecture

Publications

Year 2024

Dataset Characteristics and Their Impact on Offline Policy Learning of Contextual Multi-Armed Bandits
Publication
- Year 2024
The Contextual Multi-Armed Bandits (CMAB) framework is pivotal for learning to make decisions. However, due to challenges in deploying online algorithms, there is a shift towards offline policy learning, which relies on pre-existing datasets. This study examines the relationship between the quality of these datasets and the performance of offline policy learning algorithms, specifically, Neural Greedy and NeuraLCB. Our results...

Full text available for download in the portal
Investigation of Performance and Energy Consumption of Tokenization Algorithms on Multi-core CPUs Under Power Capping
Publication
- Year 2024
In this paper we investigate performance-energy optimization of tokenizer algorithm training using power capping. We focus on parallel, multi-threaded implementations of Byte Pair Encoding (BPE), Unigram, WordPiece, and WordLevel run on two systems with different multi-core CPUs: Intel Xeon 6130 and desktop Intel i7-13700K. We analyze execution times and energy consumption for various numbers of threads and various power caps and...

Full text available for download in the portal
Multi-GPU UNRES for scalable coarse-grained simulations of very large protein systems
Publication
- K. Ocetkiewicz
- C. Czaplewski
- H. Krawczyk
- A. Lipska
- A. Liwo
- J. Proficz
- A. K. Sieradzan
- P. Czarnul
- COMPUTER PHYSICS COMMUNICATIONS - Year 2024
Graphical Processor Units (GPUs) are nowadays widely used in all-atom molecular simulations because of the advantage of efficient partitioning of atom pairs between the kernels to compute the contributions to energy and forces, thus enabling the treatment of very large systems. Extension of time- and size-scale of computations is also sought through the development of coarse-grained (CG) models, in which atoms are merged into extended...

Full text available for download from an external service
Multi-GPU-powered UNRES package for physics-based coarse-grained simulations of structure, dynamics, and thermodynamics of protein systems at biological size- and timescales
Publication
- C. Czaplewski
- P. Czarnul
- H. Krawczyk
- A. Lipska
- E. Lubecka
- K. Ocetkiewicz
- J. Proficz
- A. Sieradzan
- R. Ślusarz
- J. Liwo
- BIOPHYSICAL JOURNAL - Year 2024
Coarse-grained models are nowadays extensively used in biomolecular simulations owing to the tremendous extension of size- and time-scale of simulations. The physics-based UNRES (UNited RESidue) model of proteins developed in our laboratory has only two interaction sites per amino-acid residue (united peptide groups and united side chains) and implicit solvent. However, owing to rigorous physics-based derivation, which enabled...

Full text available for download from an external service
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
Publication
- Year 2024
In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...

Full text available for download in the portal
Teaching High–performance Computing Systems – A Case Study with Parallel Programming APIs: MPI, OpenMP and CUDA
Publication
- Year 2024
High performance computing (HPC) education has become essential in recent years, especially that parallel computing on high performance computing systems enables modern machine learning models to grow in scale. This significant increase in the computational power of modern supercomputers relies on a large number of cores in modern CPUs and GPUs. As a consequence, parallel program development based on parallel thinking has become...

Full text available for download from an external service

Year 2023

A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
Publication
- P. Czarnul
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023
In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs and extended CUDA calls allow to manage CUDA objects, data and launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Full text available for download in the portal
Dataset Related Experimental Investigation of Chess Position Evaluation Using a Deep Neural Network
Publication
- D. Wieczerzak
- P. Czarnul
- Year 2023
The idea of training Artificial Neural Networks to evaluate chess positions has been widely explored in the last ten years. In this paper we investigated dataset impact on chess position evaluation. We created two datasets with over 1.6 million unique chess positions each. In one of those we also included randomly generated positions resulting from consideration of potentially unpredictable chess moves. Each position was evaluated...

Full text available for download in the portal
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool
Publication
- Future Generation Computer Systems-The International Journal of Grid Computing-Theory Methods and Applications - Year 2023
GPU accelerators have become essential to the recent advance in computational power of high-performance computing (HPC) systems. Current HPC systems’ reaching an approximately 20–30 mega-watt power demand has resulted in increasing CO2 emissions, energy costs and necessitate increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...

Full text available for download from an external service
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text available for download in the portal
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
Publication
- ENERGIES - Year 2023
High-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...

Full text available for download in the portal
Long‐time scale simulations of virus‐like particles from three human‐norovirus strains
Publication
- A. Lipska
- A. Sieradzan
- C. Czaplewski
- A. D. Lipińska
- K. Ocetkiewicz
- J. Proficz
- P. Czarnul
- H. Krawczyk
- J. Liwo
- JOURNAL OF COMPUTATIONAL CHEMISTRY - Year 2023
The dynamics of the virus like particles (VLPs) corresponding to the GII.4 Houston, GII.2 SMV, and GI.1 Norwalk strains of human noroviruses (HuNoV) that cause gastroenteritis was investigated by means of long-time (about 30 μs in the laboratory timescale) molecular dynamics simulations with the coarse-grained UNRES force field. The main motion of VLP units turned out to be the bending at the junction between the P1 subdomain (that...

Full text available for download in the portal
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
Publication
- A. Sieradzan
- J. Sans‐Duñó
- E. Lubecka
- C. Czaplewski
- A. Lipska
- H. Leszczyński
- K. Ocetkiewicz
- J. Proficz
- P. Czarnul
- H. Krawczyk
- A. Liwo
- JOURNAL OF COMPUTATIONAL CHEMISTRY - Year 2023
We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...

Full text available for download in the portal
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
Publication
- B. Gawrych
- P. Czarnul
- Year 2023
Considering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures as well as compilers there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...

Full text available for download from an external service
The Idea of a Student Research Project as a Method of Preparing a Student for Professional and Scientific Work
Publication
- Year 2023
In the paper we present the idea and implementation of a student research project course within the master’s program at the Faculty of Electronics, Telecommunications and Informatics, Gdansk Tech. It aims at preparing students for performing research and scientific tasks in future professional work. We outline the evolution from group projects into research project and the current deployment of both at bachelor’s and master’s levels...

Full text available for download in the portal
UNRES-GPU for Physics-Based Coarse-Grained Simulations of Protein Systems at Biological Time- and Size-Scales
Publication
- BIOINFORMATICS - Year 2023

Full text available for download in the portal

Year 2022

DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing
Publication
- SOFTWARE-PRACTICE & EXPERIENCE - Year 2022
In the article we propose an automatic power capping software tool DEPO that allows one to perform runtime optimization of performance and energy related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning engaging one of the two exploration algorithms—linear search (LS) and golden section search (GSS), finds...

Full text available for download from an external service
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
Publication
- Year 2022
In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...

Full text available for download in the portal
GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition
Publication
- Year 2022
In the paper we present performance-energy trade-off investigation of training Deep Convolutional Neural Networks for image recognition. Several representative and widely adopted network models, such as Alexnet, VGG-19, Inception V3, Inception V4, Resnet50 and Resnet152 were tested using systems with Nvidia Quadro RTX 6000 as well as Nvidia V100 GPUs. Using GPU power capping we found other than default configurations minimizing...

Full text available for download in the portal
Investigation of Performance and Configuration of a Selected IoT System—Middleware Deployment Benchmarking and Recommendations
Publication
- R. Kałaska
- P. Czarnul
- Applied Sciences-Basel - Year 2022
Nowadays Internet of Things is gaining more and more focus all over the world. As a concept it gives many opportunities for applications for society and it is expected that the number of software services deployed in this area will still grow fast. Especially important in this context are properties connected with deployment such as portability, scalability and balance between software requirements and hardware capabilities. In...

Full text available for download in the portal
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
Publication
- T. Kononowicz
- P. Czarnul
- Applied Sciences-Basel - Year 2022
In the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...

Full text available for download in the portal

Year 2021

Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication
- P. Czarnul
- Electronics - Year 2021
The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available for download in the portal
Benchmarking Scalability and Security Configuration Impact for A Distributed Sensors-Server IOT Use Case
Publication
- R. Kałaska
- P. Czarnul
- Year 2021
Internet of Things has been getting more and more attention and found numerous practical applications. Especially important in this context are performance, security and ability to cope with failures. Especially crucial is to find good trade-off between these. In this article we present results of practical tests with multiple clients representing sensors sending notifications to an IoT middleware – DeviceHive. We investigate performance...

Full text available for download in the portal
Human awareness versus Autonomous Vehicles view: comparison of reaction times during emergencies
Publication
- A. Rydzewski
- P. Czarnul
- Year 2021
Human safety is one of the most critical factors when a new technology is introduced to the everyday use. It was no different in the case of Autonomous Vehicles (AV), designed to replace generally available Conventional Vehicles (CV) in the future. AV rules, from the start, focus on guaranteeing safety for passengers and other road users, and these assumptions usually work during normal traffic conditions. However, there is still...

Full text available for download from an external service
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
Publication
- T. M. Boiński (formerly: T. Boiński)
- P. Czarnul
- COMPUTER JOURNAL - Year 2021
In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...

Full text available for download in the portal

Year 2020

Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
Publication
- P. Czarnul
- P. Rościszewski
- Year 2020
Auto-tuning of configuration and application parameters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Full text available for download in the portal
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
Publication
- P. Czarnul
- K. Jabłońska
- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020
In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...

Full text available for download from an external service
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
Publication
- P. Czarnul
- COMPUTING AND INFORMATICS - Year 2020
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...

Full text available for download in the portal
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication
- A. Krzywaniak
- P. Czarnul
- Year 2020
In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to the benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...

Full text available for download in the portal
Recent advances in traffic optimisation: systematic literature review of modern models, methods and algorithms
Publication
- A. Rydzewski
- P. Czarnul
- IET Intelligent Transport Systems - Year 2020
Over the past few decades, the increasing number of vehicles and imperfect road traffic management have been sources of congestion in cities and reasons for deteriorating health of its inhabitants. With the help of computer simulations, transport engineers optimise and improve the capacity of city streets. However, with an enormous number of possible simulation types, it is difficult to grasp valuable, innovative solutions which...

Full text available for download from an external service
Some Security Features of Selected IoT Platforms
Publication
- R. Kałaska
- P. Czarnul
- TASK Quarterly - Year 2020
IoT (Internet of Things) is certainly one of the leading current and future trends for processing in the current distributed world. It is changing our life and society. IoT allows new ubiquitous applications and processing, but, on the other hand, it introduces potentially serious security threats. Nowadays researchers in IoT areas should, without a doubt, consider and focus on security aspects. This paper is aimed at a high-level...

Full text available for download in the portal
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems
Publication
- Scientific Programming - Year 2020
This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals,...

Full text available for download in the portal
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
Publication
- P. Rościszewski
- M. Iwański
- P. Czarnul
- Year 2020
Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...

Full text available for download from an external service

Year 2019

Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments
Publication
- Scientific Programming - Year 2019
The paper presents state of the art of energy-aware high-performance computing (HPC), in particular identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single device, clusters, grids, and clouds while considered device types include CPUs, GPUs, multiprocessor, and hybrid systems. Optimization goals include various combinations of...

Full text available for download in the portal
Extended investigation of performance-energy trade-offs under power capping in HPC environments
Publication
- Year 2019
In the paper we present investigation of performance-energy trade-offs under power capping using modern processors. The results are presented for systems targeted at both server and client markets and were collected from Intel Xeon E5 and Intel Xeon Phi server processors as well as from desktop and mobile Intel Core i7 processors. The results, when using power capping, show that we can find various interesting combinations of...
Multi-agent large-scale parallel crowd simulation with NVRAM-based distributed cache
Publication
- A. Malinowski
- P. Czarnul
- Journal of Computational Science - Year 2019
This paper presents the architecture, main components and performance results for a parallel and modular agent-based environment aimed at crowd simulation. The environment allows to simulate thousands or more agents on maps of square kilometers or more, features a modular design and incorporates non-volatile RAM (NVRAM) with a fail-safe mode that can be activated to allow to continue computations from a recently analyzed state in...

Full text available for download from an external service
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
Publication
- M. Knap
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2019
The paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...

Full text available for download in the portal
Use of ICT infrastructure for teaching HPC
Publication
- P. Czarnul
- M. Matuszek
- Year 2019
In this paper we look at modern ICT infrastructure as well as curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, node level using OpenMP and CUDA. We present...

Full text available for download from an external service

Year 2018

A Solution to Image Processing with Parallel MPI I/O and Distributed NVRAM Cache
Publication
- A. Malinowski
- P. Czarnul
- Scalable Computing: Practice and Experience - Year 2018
The paper presents a new approach to parallel image processing using byte addressable, non-volatile memory (NVRAM). We show that our custom built MPI I/O implementation of selected functions that use a distributed cache that incorporates NVRAMs located in cluster nodes can be used for efficient processing of large images. We demonstrate performance benefits of such a solution compared to a traditional implementation without NVRAM...

Full text available for download in the portal
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
Publication
- Annals of Computer Science and Information Systems - Year 2018
In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPU tested represent both multi-core type CPUs such as Intel® Xeon® E5, desktop and mobile i7 as well as many-core Intel® Xeon Phi™ x200 but also server, desktop...

Full text available for download in the portal
Benchmarking overlapping communication and computations with multiple streams for modern GPUs
Publication
- P. Czarnul
- Annals of Computer Science and Information Systems - Year 2018
The paper presents benchmarking a multi-stream application processing a set of input data arrays. Tests have been performed and execution times measured for various numbers of streams and various compute intensities measured as the ratio of kernel compute time and data transfer time. As such, the application and benchmarking is representative of frequently used operations such as vector weighted sum, matrix multiplication etc....

Full text available for download in the portal
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication
- P. Czarnul
- Year 2018
The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...

Full text available for download from an external service
From Sequential to Parallel Implementation of NLP Using the Actor Model
Publication
- Advances in Intelligent Systems and Computing - Year 2018
The article focuses on presenting methods allowing easy parallelization of an existing, sequential Natural Language Processing (NLP) application within a multi-core system. The actor-based solution implemented with the Akka framework has been applied and compared to an application based on Task Parallel Library (TPL) and to the original sequential application. Architectures, data and control flows are described along with execution...

Full text available for download in the portal
Modelling and simulation of GPU processing in the MERPSYS environment
Publication
- T. Gajger
- P. Czarnul
- Scalable Computing: Practice and Experience - Year 2018
In this work, we evaluate an analytical GPU performance model based on Little's law, that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...

Full text available for download in the portal
Parallel Programming for Modern High Performance Computing Systems
Publication
- P. Czarnul
- Year 2018
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text available for download from an external service
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download in the portal
Three levels of fail-safe mode in MPI I/O NVRAM distributed cache
Publication
- A. Malinowski
- P. Czarnul
- Procedia Computer Science - Year 2018
The paper presents architecture and design of three versions for fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...

Full text available for download in the portal

Year 2017

A distributed system for conducting chess games in parallel
Publication
- A. Rydzewski
- P. Czarnul
- Procedia Computer Science - Year 2017
This paper proposes a distributed and scalable cloud based system designed to play chess games in parallel. Games can be played between chess engines alone or between clusters created by combined chess engines. The system has a built-in mechanism that compares engines, based on Elo ranking which finally presents the strength of each tested approach. If an approach needs more computational power, the design of the system allows...

Full text available for download in the portal
Distributed NVRAM Cache – Optimization and Evaluation with Power of Adjacency Matrix
Publication
- A. Malinowski
- P. Czarnul
- Year 2017
In this paper we build on our previously proposed MPI I/O NVRAM distributed cache for high performance computing. In each cluster node it incorporates NVRAMs which are used as an intermediate cache layer between an application and a file for fast read/write operations supported through wrappers of MPI I/O functions. In this paper we propose optimizations of the solution including handling of write requests with a synchronous mode,...

Full text available for download from an external service
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
Publication
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available for download in the portal
