Search results for: PARALLEL APPLICATIONS

total: 498

Modeling Parallel Applications in the MERPSYS Environment
Publication
- P. Czarnul
- Year 2016
The chapter presents how to model parallel computational applications for which simulation of execution in a large-scale parallel or distributed environment is performed within the MERPSYS environment. Specifically, it is shown what approaches can be adopted to model key paradigms often used for parallel applications: master-slave, geometric parallelism (single program multiple data), pipelined and divide-and-conquer applications....
Modeling energy consumption of parallel applications
Publication
- Annals of Computer Science and Information Systems - Year 2016
The paper presents modeling and simulation of energy consumption of two types of parallel applications: geometric Single Program Multiple Data (SPMD) and divide-and-conquer (DAC). Simulation is performed in a new MERPSYS environment. Model of an application uses the Java language with extension representing message exchange between processes working in parallel. Simulation is performed by running threads representing distinct process...

Full text available to download
Simulation of Parallel Applications on Large-scale Distributed Systems
Publication
- P. Rościszewski
- P. Sidorczak
- Year 2014
This chapter has a form of a review article in the field of simulating High-Performance Computing systems. We justify the need for a new versatile simulator considering heterogeneity, energy efficiency and reliability of HPC systems. We sketch the problems that need to be solved by such simulator and rationalize using discrete-event simulation for this purpose. Based on a review of existing discrete-event HPC simulation solutions...
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication
- P. Czarnul
- Electronics - Year 2021
The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available to download
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication
- A. Krzywaniak
- P. Czarnul
- Year 2020
In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark, which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...

Full text available to download
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
Publication
- Annals of Computer Science and Information Systems - Year 2018
In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPUs tested represent both multi-core type CPUs such as Intel® Xeon® E5, desktop and mobile i7 as well as many-core Intel® Xeon Phi™ x200 but also server, desktop...

Full text available to download
Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications
Publication
- Ł. Jarząbek
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2017
The aim of this paper is to evaluate performance of new CUDA mechanisms—unified memory and dynamic parallelism for real parallel applications compared to standard CUDA API versions. In order to gain insight into performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically,...

Full text available to download
New user-guided and ckpt-based checkpointing libraries for parallel MPI applications
Publication
- P. Czarnul
- M. Frączak
- Year 2005
The work presents design and implementation details as well as performance results of two new checkpointing libraries developed by the authors for parallel MPI applications. The first library, called user-guided, requires the programmer to supply functions that pack and unpack the process state, but provides an easy-to-use API based on MPI constants. It uses MPI-2 I/O functions or a dedicated master process...
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
Publication
- T. Kononowicz
- P. Czarnul
- Applied Sciences-Basel - Year 2022
In the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...

Full text available to download
A Fail-Safe NVRAM Based Mechanism for Efficient Creation and Recovery of Data Copies in Parallel MPI Applications
Publication
- A. Malinowski
- P. Czarnul
- M. Maciejewski
- P. Skowron
- Year 2016
The paper presents a fail-safe NVRAM based mechanism for creation and recovery of data copies during parallel MPI application runtime. Specifically, we target a cluster environment in which each node has an NVRAM installed in it. Our previously developed extension to the MPI I/O API can take advantage of NVRAM regions in order to provide an NVRAM based cache like mechanism to significantly speed up I/O operations and allow to preload...

Full text to download in external service
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
Publication
- M. Knap
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2019
The paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...

Full text available to download
Checkpointing of Parallel MPI Applications using MPI One-sided API with Support for Byte-addressable Non-volatile RAM
Publication
- P. Dorożyński
- P. Czarnul
- A. Malinowski
- K. Czuryło
- Ł. Dorau
- M. Maciejewski
- P. Skowron
- Year 2016
The increasing size of computational clusters results in an increasing probability of failures, which in turn requires application checkpointing in order to survive those failures. Traditional checkpointing requires data to be copied from application memory into persistent storage medium, which increases application execution time as it is usually done in a separate step. In this paper we propose to use emerging byte-addressable...

Full text to download in external service
International Conference on Parallel and Distributed Computing, Applications and Technologies

Conferences
IEEE International Symposium on Parallel and Distributed Processing with Applications

Conferences
International Conference on Parallel and Distributed Processing Techniques and Applications

Conferences
International workshop on High-Level Parallel Programming and Applications

Conferences
International Workshop on Formal Methods for Parallel Programming: Theory and Applications

Conferences
Parallel processing of multimedia streams
Publication
- Computer Applications in Electrical Engineering - Year 2010
The chapter presents the KASKADA platform for processing multimedia streams. Its design is described: UML class and sequence diagrams illustrating the stream processing mechanisms, and communication details. A specialized framework supporting the creation and execution of algorithms, as well as the definition of service scenarios, is also presented, together with an assessment of their usefulness.
Parallel implementation of a Sailing Assistance Application in a Cloud Environment
Publication
- IEEE Access - Year 2023
Sailboat weather routing is a highly complex problem in terms of both the computational time and memory. The reason for this is a large search resulting in a multitude of possible routes and a variety of user preferences. Analysing all possible routes is only feasible for small sailing regions, low-resolution maps, or sailboat movements on a grid. Therefore, various heuristic approaches are often applied, which can find solutions...

Full text available to download
Parallel Programming for Modern High Performance Computing Systems
Publication
- P. Czarnul
- Year 2018
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text to download in external service
Paweł Czarnul dr hab. inż.

People

Dział Usług Chmurowych, Faculty of Electronics, Telecommunications and Informatics, Department of Computer Architecture

Paweł Czarnul obtained a D.Sc. degree in computer science in 2015 and a Ph.D. in computer science granted by a council at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology in 2003. His research interests include: parallel and distributed processing including clusters, accelerators, coprocessors; distributed information systems; architectures of distributed systems; programming mobile devices....
Simulation of parallel similarity measure computations for large data sets
Publication
- Year 2015
The paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...

Full text to download in external service
A Workflow Application for Parallel Processing of Big Data from an Internet Portal
Publication
- P. Czarnul
- Year 2014
The paper presents a workflow application for efficient parallel processing of data downloaded from an Internet portal. The workflow partitions input files into subdirectories which are further split for parallel processing by services installed on distinct computer nodes. This way, analysis of the first ready subdirectories can start fast and is handled by services implemented as parallel multithreaded applications using multiple...

Full text to download in external service
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
Publication
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available to download
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
Publication
- P. Rościszewski
- Year 2014
The rapid development of diverse computer architectures and hardware accelerators means that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows applications to be run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text to download in external service
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
Publication
- T. Stefański
- Progress in Electromagnetics Research-PIER - Year 2013
This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...

Full text to download in external service
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication
- P. Rościszewski
- Procedia Computer Science - Year 2017
In the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

Full text available to download
A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache
Publication
- A. Malinowski
- P. Czarnul
- P. Dorożyński
- K. Czuryło
- Ł. Dorau
- M. Maciejewski
- P. Skowron
- Annals of Computer Science and Information Systems - Year 2016
While many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...

Full text available to download
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
Publication
- P. Czarnul
- P. Rościszewski
- Year 2020
Auto-tuning of configuration and application parameters allows significant performance gains to be achieved in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Full text available to download
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
Publication
- P. Czarnul
- Year 2002
This work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...

Full text to download in external service
Parallelization of Compute Intensive Applications into Workflows based on Services in BeesyCluster
Publication
- P. Czarnul
- Year 2011
The paper presents an approach for modeling, optimization and execution of workflow applications based on services that incorporates both service selection and partitioning of input data for parallel processing by parallel workflow paths. A compute-intensive workflow application for parallel integration is presented. An impact of the input data partitioning on the scalability is presented. The paper shows a comparison of the theoretical...

Full text available to download
Adaptive system for recognition of sounds indicating threats to security of people and property employing parallel processing of audio data streams
Publication
- K. Łopatka
- Year 2015
A system for recognition of threatening acoustic events employing parallel processing on a supercomputing cluster is featured. The methods for detection, parameterization and classification of acoustic events are introduced. The recognition engine is based on threshold-based detection with adaptive threshold and Support Vector Machine classification. Spectral, temporal and mel-frequency descriptors are used as signal features. The...
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
Publication
- P. Rościszewski
- Year 2018
Many important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power of such systems requires programming parallel applications that are hybrid in two meanings: they can utilize parallelism on multiple levels at the same time and combine together programming interfaces...

Full text to download in external service
Parallel Cooperating A-Teams
Publication
- D. Barbucha
- I. Czarnowski
- P. Jędrzejowicz
- E. Ratajczak-Ropel
- I. Wierzbowska
- Year 2011
Full text to download in external service
Integration of Services into Workflow Applications
Publication
- P. Czarnul
- Year 2015
Describing state-of-the-art solutions in distributed system architectures, Integration of Services into Workflow Applications presents a concise approach to the integration of loosely coupled services into workflow applications. It discusses key challenges related to the integration of distributed systems and proposes solutions, both in terms of theoretical aspects such as models and workflow scheduling algorithms, and technical...

Full text to download in external service
Performance and Power-Aware Modeling of MPI Applications for Cluster Computing
Publication
- J. Proficz
- P. Czarnul
- Year 2016
The paper presents modeling of performance and power consumption when running parallel applications on modern cluster-based systems. The model includes basic so-called blocks representing either computations or communication. The latter includes both point-to-point and collective communication. Real measurements were performed using MPI applications and routines run on three different clusters with both Infiniband and Gigabit Ethernet...

Full text available to download
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication
- P. Czarnul
- INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING - Year 2016
The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...

Full text available to download
A Concept of Modeling and Optimization of Applications in Large Scale Systems
Publication
- P. Czarnul
- Year 2013
The chapter presents the idea that includes modeling and subsequent optimization of application execution on large scale parallel and distributed systems. The model considers performance, reliability and power consumption. It should allow easy modeling of various classes of applications while reflecting key parameters of both the applications and two classes of target systems: clusters and volunteer based systems. The chapter presents...
Conformance testing of parallel languages
Publication
- Year 2002
A proposal is presented for formalizing the description of the process of generating, executing and evaluating conformance tests for parallel programming languages and libraries, with respect to functional and performance conformance. The examples illustrating the proposed formalism use the Athapascan programming platform.
Parallel scheduling by graph ranking
Publication
- D. Dereniowski
- Year 2006
Document no.: 73017. The work concerns one of the non-classical models of graph coloring: ordered coloring. The aim was to obtain results that can be used in practical applications of this model, which include parallel processing of queries in relational databases, parallel Cholesky factorization of matrices, and parallel assembly of a product from its component parts. The work indicates generalizations...
Parallel processing of multimedia streams
Publication
- Year 2010
The article presents a new library supporting the creation of computational tasks, which is part of the KASKADA platform. The design of the library is presented, including a diagram of the main classes and a sequence diagram. The second diagram shows how the main classes cooperate in processing multimedia streams. The details of the inter-task communication mechanism are then discussed and a graph is presented...
Parallel immune system for graph coloring
Publication
- J. Dąbrowski
- Year 2008
This paper presents a parallel artificial immune system designed for graph coloring. The algorithm is based on the clonal selection principle. Each processor operates on its own pool of antibodies and a migration mechanism is used to allow processors to exchange information. Experimental results show that migration improves the performance of the algorithm. The experiments were performed using a high performance cluster on a set...

Full text to download in external service
Scheduling of compatible jobs on parallel machines
Publication
- T. Pikies
- Year 2021
The dissertation discusses the problems of scheduling compatible jobs on parallel machines. Some jobs are incompatible, which is modeled as a binary relation on the set of jobs; the relation is often modeled by an incompatibility graph. We consider two models of machines. The first model, more emphasized in the thesis, is a classical model of scheduling, where each machine does one job at time. The second one is a model of p-batching...
Parallel Computations of Text Similarities for Categorization Task
Publication
- J. Szymański
- Year 2013
In this chapter we describe an approach to the parallel implementation of similarity computations in high dimensional spaces. The similarity computations have been used for textual data categorization. We create the test datasets from Wikipedia articles, which together with their hyper references form a graph used in our experiments. Similarities based on Euclidean distance and the cosine measure have been used to process the data using the k-means algorithm....
NVRAM as Main Storage of Parallel File System
Publication
- A. Malinowski
- Journal of Computer Science and Control Systems - Year 2016
The main trouble of modern cluster environments used to be a lack of computational power provided by CPUs and GPUs, but recently they suffer more and more from insufficient performance of input and output operations. Apart from better network infrastructure and more sophisticated processing algorithms, many solutions are based on emerging memory technologies. This paper presents evaluation of using non-volatile random-access memory as a...

Full text to download in external service
Testing for conformance of parallel programming pattern languages
Publication
- Ł. Garstecki
- P. Kaczmarek
- J. C. D. Kergommeaux
- H. Krawczyk
- B. Wiszniewski
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2002
This paper reports on the project being run by TUG and IMAG, aimed at reducing the volume of tests required to exercise parallel programming language compilers and libraries. The idea is to use the ISO STEP standard scheme for conformance testing of software products. A detailed example illustrating the ongoing work is presented.
Bounds on the Cover Time of Parallel Rotor Walks
Publication
- D. Dereniowski
- A. Kosowski
- D. Pająk
- P. Uznański
- Year 2014
The rotor-router mechanism was introduced as a deterministic alternative to the random walk in undirected graphs. In this model, a set of k identical walkers is deployed in parallel, starting from a chosen subset of nodes, and moving around the graph in synchronous steps. During the process, each node maintains a cyclic ordering of its outgoing arcs, and successively propagates walkers which visit it along its outgoing arcs in...

Full text to download in external service
The parallel environment for endoscopic image analysis
Publication
- H. Krawczyk
- A. Neyman
- M. Nowikowski
- J. Saif
- Year 2002
The jPVM-oriented environment to support high performance computing required for the Endoscopy Recommender System (ERS) is defined. An SPMD model of image matching is considered and two implementations of it are proposed: Lexicographical Searching Algorithm (LSA) and Gradient Searching Algorithm (GSA). Three classes of experiments are considered and the relative degree of similarity and execution time of each algorithm are analysed....

Full text to download in external service
Coordination in serial-parallel image processing
Publication
- W. Wójcik
- V. Dubovoi
- M. Duda
- R. Romaniuk
- L. Yesmakhanova
- A. Kozbakova
- R. S. Romaniuk
- Year 2015
Full text to download in external service
Bounds on the cover time of parallel rotor walks
Publication
- D. Dereniowski
- A. Kosowski
- D. Pająk
- P. Uznański
- JOURNAL OF COMPUTER AND SYSTEM SCIENCES - Year 2016
The rotor-router mechanism was introduced as a deterministic alternative to the random walk in undirected graphs. In this model, a set of k identical walkers is deployed in parallel, starting from a chosen subset of nodes, and moving around the graph in synchronous steps. During the process, each node successively propagates walkers visiting it along its outgoing arcs in round-robin fashion, according to a fixed ordering. We consider...

Full text available to download
