dr hab. inż. Paweł Czarnul
Employment
- Head of Department, Department of Computer Systems Architecture (Katedra Architektury Systemów Komputerowych)
- Vice-Dean for Cooperation and Promotion, Faculty of Electronics, Telecommunications and Informatics
- University Professor, Department of Computer Systems Architecture
Publications
Total: 117
Publication Catalog
Year 2024
-
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
In this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...
-
Dataset Characteristics and Their Impact on Offline Policy Learning of Contextual Multi-Armed Bandits
The Contextual Multi-Armed Bandits (CMAB) framework is pivotal for learning to make decisions. However, due to challenges in deploying online algorithms, there is a shift towards offline policy learning, which relies on pre-existing datasets. This study examines the relationship between the quality of these datasets and the performance of offline policy learning algorithms, specifically, Neural Greedy and NeuraLCB. Our results...
Year 2023
-
UNRES-GPU for Physics-Based Coarse-Grained Simulations of Protein Systems at Biological Time- and Size-Scales
The dynamics of the virus-like particles (VLPs) corresponding to the GII.4 Houston, GII.2 SMV, and GI.1 Norwalk strains of human noroviruses (HuNoV) that cause gastroenteritis was investigated by means of long-time (about 30 μs in the laboratory timescale) molecular dynamics simulations with the coarse-grained UNRES force field. The main motion of VLP units turned out to be the bending at the junction between the P1 subdomain (that...
-
The Idea of a Student Research Project as a Method of Preparing a Student for Professional and Scientific Work
In the paper we present the idea and implementation of a student research project course within the master’s program at the Faculty of Electronics, Telecommunications and Informatics, Gdansk Tech. It aims at preparing students for performing research and scientific tasks in future professional work. We outline the evolution from group projects into research project and the current deployment of both at bachelor’s and master’s levels...
-
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
Considering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures, as well as compilers, there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...
-
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...
-
Long‐time scale simulations of virus‐like particles from three human‐norovirus strains
The dynamics of the virus-like particles (VLPs) corresponding to the GII.4 Houston, GII.2 SMV, and GI.1 Norwalk strains of human noroviruses (HuNoV) that cause gastroenteritis was investigated by means of long-time (about 30 μs in the laboratory timescale) molecular dynamics simulations with the coarse-grained UNRES force field. The main motion of VLP units turned out to be the bending at the junction between the P1 subdomain (that...
-
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
High-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...
-
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...
-
Dynamic GPU power capping with online performance tracing for energy efficient GPU computing using DEPO tool
GPU accelerators have become essential to the recent advance in computational power of high-performance computing (HPC) systems. Current HPC systems reaching a power demand of approximately 20–30 megawatts has resulted in increasing CO2 emissions and energy costs, and necessitates increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this...
-
Dataset Related Experimental Investigation of Chess Position Evaluation Using a Deep Neural Network
The idea of training Artificial Neural Networks to evaluate chess positions has been widely explored in the last ten years. In this paper we investigated dataset impact on chess position evaluation. We created two datasets with over 1.6 million unique chess positions each. In one of those we also included randomly generated positions resulting from consideration of potentially unpredictable chess moves. Each position was evaluated...
-
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls allow one to manage CUDA objects and data and to launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...
Year 2022
-
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
In the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...
-
Investigation of Performance and Configuration of a Selected IoT System—Middleware Deployment Benchmarking and Recommendations
Nowadays the Internet of Things is gaining more and more focus all over the world. As a concept it gives many opportunities for applications for society and it is expected that the number of software services deployed in this area will still grow fast. Especially important in this context are properties connected with deployment such as portability, scalability and balance between software requirements and hardware capabilities. In...
-
GPU Power Capping for Energy-Performance Trade-Offs in Training of Deep Convolutional Neural Networks for Image Recognition
In the paper we present performance-energy trade-off investigation of training Deep Convolutional Neural Networks for image recognition. Several representative and widely adopted network models, such as Alexnet, VGG-19, Inception V3, Inception V4, Resnet50 and Resnet152 were tested using systems with Nvidia Quadro RTX 6000 as well as Nvidia V100 GPUs. Using GPU power capping we found other than default configurations minimizing...
-
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...
-
DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing
In the article we propose an automatic power capping software tool DEPO that allows one to perform runtime optimization of performance and energy related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning engaging one of the two exploration algorithms—linear search (LS) and golden section search (GSS), finds...
Year 2021
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
Human awareness versus Autonomous Vehicles view: comparison of reaction times during emergencies
Human safety is one of the most critical factors when a new technology is introduced to everyday use. It was no different in the case of Autonomous Vehicles (AV), designed to replace generally available Conventional Vehicles (CV) in the future. AV rules, from the start, focus on guaranteeing safety for passengers and other road users, and these assumptions usually work during normal traffic conditions. However, there is still...
-
Benchmarking Scalability and Security Configuration Impact for A Distributed Sensors-Server IOT Use Case
The Internet of Things has been getting more and more attention and has found numerous practical applications. Especially important in this context are performance, security and ability to cope with failures. It is especially crucial to find a good trade-off between these. In this article we present results of practical tests with multiple clients representing sensors sending notifications to an IoT middleware – DeviceHive. We investigate performance...