Search results for: cluster intel xeon phi

Search results for: cluster intel xeon phi

results on page:
embed this view on your website

Filters

total: 24

clear all filters disabled

Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication
- P. Czarnul
- Year 2018
The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...

Full text to download in external service
Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi
Publication
- A. Malinowski
- International Journal of Information Technology and Computer Science - Year 2015
Parallel algorithms are popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, proof-of-concept implementation and practical experiments are often required. In order to speed up the development and provide simple and easily accessible testing environment that enables execution of reliable experiments, the paper proposes a platform with multi-core computational accelerator:...

Full text to download in external service
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication
- P. Czarnul
- INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING - Year 2016
The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...

Full text available to download
Parallelization of Selected Algorithms on Multi-core CPUs, a Cluster and in a Hybrid CPU+Xeon Phi Environment
Publication
- A. Krzywaniak
- P. Czarnul
- Advances in Intelligent Systems and Computing - Year 2017
In the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication...

Full text available to download
Extended investigation of performance-energy trade-offs under power capping in HPC environments
Publication
- Year 2019
—In the paper we present investigation of performance-energy trade-offs under power capping using modern processors. The results are presented for systems targeted at both server and client markets and were collected from Intel Xeon E5 and Intel Xeon Phi server processors as well as from desktop and mobile Intel Core i7 processors. The results, when using power capping, show that we can find various interesting combinations of...
Parallel Programming for Modern High Performance Computing Systems
Publication
- P. Czarnul
- Year 2018
In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text to download in external service
Superkomputer Tryton

Research Equipment
Dział Komputerów Dużej Mocy

Superkomputer o architekturze klastrowej
Paweł Czarnul dr hab. inż.

People

Dział Usług Chmurowych, Faculty of Electronics, Telecommunications and Informatics, Department of Computer Architecture

Paweł Czarnul obtained a D.Sc. degree in computer science in 2015, a Ph.D. in computer science granted by a council at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology in 2003. His research interests include:parallel and distributed processing including clusters, accelerators, coprocessors; distributed information systems; architectures of distributed systems; programming mobile devices....
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
Publication
- B. Gawrych
- P. Czarnul
- Year 2023
Considering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures as well as compilers there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...

Full text to download in external service
Block-based Representation of Application Execution on Modern Parallel Systems
Publication
- P. Czarnul
- Year 2013
The chapter presents how to model execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses pre- viously attributes and factors representative of modern high performance systems including multicore CPUs, GPUs, dedicated accelerators such as Intel Phi.
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
Publication
- Communications in Computational Physics - Year 2017
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higherorder FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...

Full text to download in external service
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available to download
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication
- P. Czarnul
- Electronics - Year 2021
The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available to download
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
Publication
- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Year 2014
This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Full text to download in external service
Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
Publication
- A. Dziekoński
- M. Mrozowski
- RADIOENGINEERING - Year 2018
This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few loworder resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Full text available to download
DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing
Publication
- SOFTWARE-PRACTICE & EXPERIENCE - Year 2022
In the article we propose an automatic power capping software tool DEPO that allows one to perform runtime optimization of performance and energy related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning engaging one of the two exploration algorithms—linear search (LS) and golden section search (GSS), finds...

Full text to download in external service
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Antennas and Wireless Propagation Letters - Year 2018
In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large system of sparse equations with multiple right hand-sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text to download in external service
A memory efficient and fast sparse matrix vector product on a Gpu
Publication
- Progress in Electromagnetics Research-PIER - Year 2011
This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text to download in external service
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text to download in external service
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
Publication
- T. M. Boiński
- P. Czarnul
- COMPUTER JOURNAL - Year 2021
In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...

Full text available to download
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Access - Year 2018
The paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve nonHermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...

Full text available to download
Generation of large finite-element matrices on multiple graphics processors
Publication
- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Year 2013
This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...

Full text to download in external service
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
Publication
- A. Dziekoński
- G. Fotyga
- M. Mrozowski
- IEEE Access - Year 2018
This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...

Full text available to download
Optymalizacja wydajności obliczeniowej metody elementów skończonych w architekturze CUDA
Publication
- A. Dziekoński
- Year 2015
Celem niniejszej rozprawy oraz stypendium odbytego w ramach projektu było opracowanie numerycznie efektywnego rozwiązania algorytmicznego i sprzętowego, które umożliwia przyspieszenie analizy problemów elektromagnetycznych metodą elementów skończonych (MES) z funkcjami bazowymi wysokiego rzędu. Metoda elementów skończonych w dziedzinie częstotliwości stanowi wydajne i uniwersalne narzędzie analizy układów mikrofalowych (rys....

Search

Filters

Catalog

Search results for: cluster intel xeon phi

Paweł Czarnul dr hab. inż.