Search results for: CPU
-
Evaluation the effectiveness of virtual machine integrated with CPU
PublicationIn the paper effectiveness of example CPU with integrated virtual machine is presented. The idea and implementation of virtual machine is shown. In next sections reference CPU and sample virtual machine is described. Finally optimality of the translation process is analysed.
-
Parallelization of Selected Algorithms on Multi-core CPUs, a Cluster and in a Hybrid CPU+Xeon Phi Environment
PublicationIn the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication...
-
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
PublicationThe paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...
-
Study on CPU and RAM Resource Consumption of Mobile Devices using Streaming Services
PublicationStreaming multimedia services have become very popular in recent years, due to the development of wireless networks. With the growing number of mobile devices worldwide, service providers offer dedicated applications that allow to deliver on-demand audio and video content anytime and everywhere. The aim of this study was to compare different streaming services and investigate their impact on the CPU and RAM resources, with respect...
-
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
PublicationThis paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...
-
The Speedup Analysis in GEM Detector Based Acquisition System Algorithms with CPU and PCIe Cards
Publication -
CPU-e Revista de Investigacion Educativa
Journals -
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
PublicationIn the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...
-
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
PublicationThe paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...
-
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
PublicationAuto-tuning of configuration and application param- eters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...
-
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
PublicationThis letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...
-
Multi Queue Approach for Network Services Implemented for Multi Core CPUs
PublicationMultiple core processors have already became the dominant design for general purpose CPUs. Incarnations of this technology are present in solutions dedicated to such areas like computer graphics, signal processing and also computer networking. Since the key functionality of network core components is fast package servicing, multicore technology, due to multi tasking ability, seems useful to support packet processing. Dedicated...
-
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
PublicationConsidering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures as well as compilers there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...
-
Modelowanie wydajności, niezawodności i zużycia energii wilopoziomowych systemów równoległych wielkiej skali z uwzględnieniem CPU oraz GPU
ProjectsProject realized in Faculty of Electronics, Telecommunications and Informatics according to UMO-2012/07/B/ST6/01516 agreement from 2013-07-17
-
Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs
PublicationThe paper proposes an approach for parallelization of computations across a collection of clusters with heterogeneous nodes with both GPUs and CPUs. The proposed system partitions input data into chunks and assigns to par- ticular devices for processing using OpenCL kernels defined by the user. The sys- tem is able to minimize the execution time of the application while maintaining the power consumption of the utilized GPUs and...
-
KernelHive: a new workflow-based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs
PublicationThe paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available....
-
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
PublicationThis paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...
-
Programowanie równoległe na architekturach wielordzeniowych
e-Learning CoursesKurs poświęcony zagadnieniom programowania równoległego na maszynach z pamięcią współdzieloną, w tym na wielordzeniowych CPU oraz GPU.
-
Programowanie równoległe na architekturach wielordzeniowych (2023-24)
e-Learning CoursesKurs poświęcony zagadnieniom programowania równoległego na maszynach z pamięcią współdzieloną, w tym na wielordzeniowych CPU oraz GPU.
-
Paweł Czarnul dr hab. inż.
PeoplePaweł Czarnul obtained a D.Sc. degree in computer science in 2015, a Ph.D. in computer science granted by a council at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology in 2003. His research interests include:parallel and distributed processing including clusters, accelerators, coprocessors; distributed information systems; architectures of distributed systems; programming mobile devices....
-
Implementation of Coprocessor for Integer Multiple Precision Arithmetic on Zynq Ultrascale+ MPSoC
PublicationRecently, we have opened the source code of coprocessor for multiple-precision arithmetic (MPA). In this contribution, the implementation and benchmarking results for this MPA coprocessor are presented on modern Zynq Ultrascale+ multiprocessor system on chip, which combines field-programmable gate array with quad-core ARM Cortex-A53 64-bit central processing unit (CPU). In our benchmark, a single coprocessor can be up to 4.5 times...
-
Finite element matrix generation on a GPU
PublicationThis paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x Tesla C2075) and a CPU (2x twelve-core...
-
Jacobi and gauss-seidel preconditioned complex conjugate gradient method with GPU acceleration for finite element method
PublicationIn this paper two implementations of iterative solvers for solving complex symmetric and sparse systems resulting from finite element method applied to wave equation are discussed. The problem under investigation is a dielectric resonator antenna (DRA) discretized by FEM with vector elements of the second order (LT/QN). The solvers use the preconditioned conjugate gradient (pcg) method implemented on Graphics Processing Unit (GPU)...
-
Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA
PublicationLarge-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel’s Programmable Integrated Unffied Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark...
-
Acceleration of Electromagnetic Simulations on Reconfigurable FPGA Card
PublicationIn this contribution, the hardware acceleration of electromagnetic simulations on the reconfigurable field-programmable-gate-array (FPGA) card is presented. In the developed implementation of scientific computations, the matrix-assembly phase of the method of moments (MoM) is accelerated on the Xilinx Alveo U200 card. The computational method involves discretization of the frequency-domain mixed potential integral equation using...
-
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
PublicationThis paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higherorder FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...
-
Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology
PublicationThe discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...
-
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
PublicationWe present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...
-
FPGA Acceleration of Matrix-Assembly Phase of RWG-Based MoM
PublicationIn this letter, the field-programmable-gate-array accelerated implementation of matrix-assembly phase of the method of moments (MoM) is presented. The solution is based on a discretization of the frequency-domain mixed potential integral equation using the Rao-Wilton-Glisson basis functions and their extension to wire-to-surface junctions. To take advantage of the given hardware resources (i.e., Xilinx Alveo U200 accelerator card),...
-
Serce dzielnicy w stanie embrionalnego rozwoju - śrómieście reurbanizowanego wielkiego osiedla w polskim mieście metropolitalnym = Heart of the districts in embrional faze of development -city core structures for the large scale housing in Polish metropolitan city
PublicationW ostatnich 10 latach obserwowana jest intensyfikacja procesu uzupełniania zabudowy śródmieścia Gdańska. Jest to z jednej strony kontynuacja odbudowy historycznego centrum miasta, z drugiej zaś rewitalizacja dzielnic o przewadze zabudowy z początku wieku obejmująca śródmiejskie tereny powojskowe i poprzemysłowe m.in dawnej Stoczni Gdańskiej, koszaty we Wrzeszczu. Liczne komercyjne projekty prowadzą do istotnych dogęszczeń funcjami...
-
Generation of large finite-element matrices on multiple graphics processors
PublicationThis paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...
-
Open-Source Coprocessor for Integer Multiple Precision Arithmetic
PublicationThis paper presents an open-source digital circuit of the coprocessor for an integer multiple-precision arithmetic (MPA). The purpose of this coprocessor is to support a central processing unit (CPU) by offloading computations requiring integer precision higher than 32/64 bits. The coprocessor is developed using the very high speed integrated circuit hardware description language (VHDL) as an intellectual property (IP) core. Therefore,...
-
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
PublicationIn the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPU tested represent both multi-core type CPUs such as Intel⃝R Xeon⃝R E5, desktop and mobile i7 as well as many-core Intel⃝R Xeon PhiTM x200 but also server, desktop...
-
Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
PublicationThis paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few loworder resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...
-
Multi-core and Multiprocessor Implementation of Numerical Integration in Finite Element Method
PublicationThe paper presents techniques for accelerating a numerical integration process which appears in the Finite Element Method. The acceleration is achieved by taking advantages of multi-core and multiprocessor devices. It is shown that using multi-core implementation with OpenMP and a GPU acceleration using CUDA architecture allows one to achieve the speedups by a factor of 5 and 10 on a CPU and GPUs, respectively.
-
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
PublicationIn this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large system of sparse equations with multiple right hand-sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...
-
Fast clutter cancellation for noise radars via waveform design
PublicationCanceling clutter is an important, but very expensive part of signal processing in noise radars. It is shown that considerable improvements can be made to a simple least squares canceler if minor constraints are imposed onto noise waveform. Using a combination of FPGA and CPU, the proposed scheme is capable of canceling both stationary clutter and moving targets in real-time, even for high sampling rates.
-
Document Agents with the Intelligent Negotiations Capability
PublicationThe paper focus is on augmenting proactive document-agents with built -in intelligence to enable them to recognize execution context provided by devices visited durning the business process, and to reach collaboration agreement despite of their conflicting requirements. We propose a solution based on neural networks to improve simple multi-issue negotiation between the document and the device, practically with no excessive cost...
-
Multi-level Virtualization and Its Impact on System Performance in Cloud Computing
PublicationThe results of benchmarking tests of multi-level virtualized environments are presented. There is analysed the performance impact of hardware virtualization, container-type isolation and programming level abstraction. The comparison is made on the basis of a proposed score metric that allows you to compare different aspects of performance. There is general performance (CPU and memory), networking, disk operations and application-like...
-
Towards an efficient multi-stage Riemann solver for nuclear physics simulations
PublicationRelativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....
-
AUTOMATED NEGOTIATIONS OVER COLLABORATION PROTOCOL AGREEMENTS
PublicationThe dissertation focuses on the augmentation of proactive document - agents with built-in intelligence to recognize execution context provided by devices visited during a business process, and to reach collaboration agreement despite conflicting requirements. The proposed solution, based on intelligent bargaining using neural networks to improve simple multi-issue negotiation between the document and thedevice, requires practically...
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
PublicationIn the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method
PublicationThe letter discusses a fast implementation of the conjugate gradient iterative method with ${rm E}$-field multilevel preconditioner applied to solving real symmetric and sparse systems obtained with vector finite element method. In order to accelerate computations, a graphics processing unit (GPU) was used and significant speed-up (2.61 fold) was achieved comparing to a central processing unit (CPU) based approach. These results...
-
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
PublicationThe paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
GPU based implementation of Temperature-Vegetation Dryness Index for AVHRR3 Satellite Data
PublicationPaper presents an implementation of TVDI (Temperature-Vegetation-Dryness Index) algorithm on GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). Discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for area of Poland. All phases of TVDI implementation on GPU are modified in respect to CUDA platform....
-
Using Disparity Map for Moving Object Position Estimation in Pan Tilt Camera Images
PublicationIn this paper we present the algorithm for rapid moving object position estimation in an images acquired from pan tilt camera. Detection of a moving object in a image acquired from a moving camera might be quite challenging. Standard methods that relay on analyzing two consecutive frames are not applicable due to the changing background. To overtake this problem we decided to evaluate the possibility of calculating a disparity...
-
DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing
PublicationIn the article we propose an automatic power capping software tool DEPO that allows one to perform runtime optimization of performance and energy related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning engaging one of the two exploration algorithms—linear search (LS) and golden section search (GSS), finds...
-
Implementation of TVDI calculation for coastal zone
PublicationPaper will show an implementation of TVDI (Temperature-Vegetation-Dryness Index) algorithm on GPU (Graphics Processing Unit). Calculation of this index is based on LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index). Discussed results are based on multi-spectral imagery retrieved from AVHRR3 sensors for area of Poland, especially from region of Gdańsk coastal zone. All phases of TVDI implementation...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
PublicationIn the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Optymalizacja zasobów chmury obliczeniowej z wykorzystaniem inteligentnych agentów w zdalnym nauczaniu
PublicationRozprawa dotyczy optymalizacji zasobów chmury obliczeniowej, w której zastosowano inteligentne agenty w zdalnym nauczaniu. Zagadnienie jest istotne w edukacji, gdzie wykorzystuje się nowoczesne technologie, takie jak Internet Rzeczy, rozszerzoną i wirtualną rzeczywistość oraz deep learning w środowisku chmury obliczeniowej. Zagadnienie jest istotne również w sytuacji, gdy pandemia wymusza stosowanie zdalnego nauczania na dużą skalę...
-
Big Data and the Internet of Things in Edge Computing for Smart City
PublicationRequests expressing collective human expectations and outcomes from city service tasks can be partially satisfied by processing Big Data provided to a city cloud via the Internet of Things. To improve the efficiency of the city clouds an edge computing has been introduced regarding Big Data mining. This intelligent and efficient distributed system can be developed for citizens that are supposed to be informed and educated by the...
-
Tuning matrix-vector multiplication on GPU
PublicationA matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
-
Linux scheduler improvement for time demanding network applications, running on Communication Platform Systems
PublicationCommunication Platform Systems as ex. ATCA standard blades located in standardized chassis provides high level communication services between system peripherals. Each ATCA blade brings dedicated functionality to the system but can as well exist as separated host responsible for servicing set of task. According to platform philosophy these parts of system can be quite independent against another solutions provided by competitors....
-
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
PublicationThis paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...
-
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
PublicationModern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems, that we wish to efficiently solve using those systems have different complexity, not only considering magnitude, but also the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
-
Design of a Multidomain IMS/NGN Service Stratum
PublicationThe paper continues our research concerning the Next Generation Network (NGN), which is standardized for delivering multimedia services with strict quality and includes elements of the IP Multimedia Subsystem (IMS). A design algorithm for a multidomain IMS/NGN service stratum is proposed, which calculates the necessary CSCF servers CPU message processing times and link bandwidths with respect to the given maximum values of mean...
-
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
PublicationIn this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...
-
Scalability of surrogate-assisted multi-objective optimization of antenna structures exploiting variable-fidelity electromagnetic simulation models
PublicationMulti-objective optimization of antenna structures is a challenging task due to high-computational cost of evaluating the design objectives as well as large number of adjustable parameters. Design speedup can be achieved by means of surrogate-based optimization techniques. In particular, a combination of variable-fidelity electromagnetic (EM) simulations, design space reduction techniques, response surface approximation (RSA) models,...
-
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
PublicationIn this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...
-
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
PublicationIn the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...
-
Cost-Efficient Design Methodology for Compact Rat-Race Couplers
PublicationIn this article, a reliable and low-cost design methodology for simulation-driven optimization of miniaturized rat-race couplers (RRCs) is presented. We exploit a two-stage design approach, where a composite structure (a basic building block of the RRC structure) is first optimized using a pattern search algorithm, and, subsequently, the entire coupler is tuned by means of surrogate-based optimization (SBO) procedure. SBO is executed...
-
A memory efficient and fast sparse matrix vector product on a Gpu
PublicationThis paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...
-
A highly-efficient technique for evaluating bond-orientational order parameters
PublicationWe propose a novel, highly-efficient approach for the evaluation of bond-orientational order parameters (BOPs). Our approach exploits the properties of spherical harmonics and Wigner 3jj-symbols to reduce the number of terms in the expressions for BOPs, and employs simultaneous interpolation of normalised associated Legendre polynomials and trigonometric functions to dramatically reduce the total number of arithmetic operations....
-
Expedited Gradient-Based Design Closure of Antennas Using Variable-Resolution Simulations and Sparse Sensitivity Updates
PublicationNumerical optimization has been playing an increasingly important role in the design of contemporary antenna systems. Due to the shortage of design-ready theoretical models, optimization is mainly based on electromagnetic (EM) analysis, which tends to be costly. Numerous techniques have evolved to abate this cost, including surrogate-assisted frameworks for global optimization, or sparse sensitivity updates for speeding up local...
-
Reduced-Cost Constrained Modeling of Microwave and Antenna Components: Recent Advances
PublicationElectromagnetic (EM) simulation models are ubiquitous in the design of microwave and antenna components. EM analysis is reliable but CPU intensive. In particular, multiple simulations entailed by parametric optimization or uncertainty quantification may considerably slow down the design processes. In order to address this problem, it is possible to employ fast metamodels. Here, the popular solution approaches are approximation...
-
Optymalizacja wydajności obliczeniowej metody elementów skończonych w architekturze CUDA
PublicationCelem niniejszej rozprawy oraz stypendium odbytego w ramach projektu było opracowanie numerycznie efektywnego rozwiązania algorytmicznego i sprzętowego, które umożliwia przyspieszenie analizy problemów elektromagnetycznych metodą elementów skończonych (MES) z funkcjami bazowymi wysokiego rzędu. Metoda elementów skończonych w dziedzinie częstotliwości stanowi wydajne i uniwersalne narzędzie analizy układów mikrofalowych (rys....
-
A distributed system for conducting chess games in parallel
PublicationThis paper proposes a distributed and scalable cloud based system designed to play chess games in parallel. Games can be played between chess engines alone or between clusters created by combined chess engines. The system has a built-in mechanism that compares engines, based on Elo ranking which finally presents the strength of each tested approach. If an approach needs more computational power, the design of the system allows...
-
Triangulation-based Constrained Surrogate Modeling of Antennas
PublicationDesign of contemporary antenna structures is heavily based on full-wave electromagnetic (EM) simulation tools. They provide accuracy but are CPU-intensive. Reduction of EM-driven design procedure cost can be achieved by using fast replacement models (surrogates). Unfortunately, standard modeling techniques are unable to ensure sufficient predictive power for real-world antenna structures (multiple parameters, wide parameter ranges,...
-
Two-Stage Variable-Fidelity Modeling of Antennas with Domain Confinement
PublicationSurrogate modeling has become the method of choice in solving an increasing number of antenna design tasks, especially those involving expensive full-wave electromagnetic (EM) simulations. Notwithstanding, the curse of dimensionality considerably affects conventional metamodeling methods, and their capability to efficiently handle nonlinear antenna characteristics over broad ranges of the system parameters is limited. Performance-driven...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublicationIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...
-
Zastosowanie technologii GPGPU do wspomagania inżynierskich obliczeń numerycznych na przykładzie analizy przepływu przez ośrodek dwufazowy płyn - ciało stałe
PublicationW artykule po przedstawieniu podstawowych informacji na temat technologii GPGPU oraz struktury NVIDIA CUDA opisano równania zachowania rządzące przepływami oraz ich dyskretyzację numeryczna. Następnie zbadano możliwości wykorzystania technologii GPGPU w celu zoptymalizowania czasu wykonywania obliczeń numerycznych przepływu przez ośrodek dwufazowy (płyn - cząsteczki ciała stała stałego) zbliżony do ośrodka porowatego. W tym celu,...
-
Expedited Simulation-Driven Multi-Objective Design Optimization of Quasi-Isotropic Dielectric Resonator Antenna
PublicationMajority of practical engineering design problems require simultaneous handling of several criteria. Although many of design tasks can be turned into single-objective problems using sufficient formulations, in some situations, acquiring comprehensive knowledge about possible trade-offs between conflicting objectives may be necessary. This calls for multi-objective optimization that aims at identifying a set of alternative, Pareto-optimal...
-
Fast EM-Driven Parameter Tuning of Microwave Circuits with Sparse Sensitivity Updates via Principal Directions
PublicationNumerical optimization has become more important than ever in the design of microwave components and systems, primarily as a consequence of increasing performance demands and growing complexity of the circuits. As the parameter tuning is more and more often executed using full-wave electromagnetic (EM) models, the CPU cost of the overall process tends to be excessive even for local optimization. Some ways of alleviating these issues...
-
Design-oriented computationally-efficient feature-based surrogate modelling of multi-band antennas with nested kriging
PublicationDesign of modern antenna structures heavily depends on electromagnetic (EM) simulation tools. EM analysis provides reliable evaluation of increasingly complex designs but tends to be CPU intensive. When multiple simulations are needed (e.g., for parameters tuning), the aggregated simulation cost may become a serious bottleneck. As one possible way of mitigating the issue, the recent literature fosters utilization of faster representations,...
-
Zastosowanie technologii GPGPU do wspomagania inżynierskich obliczeń numerycznych na przykładzie analizy przepływu przez ośrodek dwufazowy płyn-ciało stałe
PublicationW artykule po przedstawieniu podstawowych informacji na temat technologii GPGPU oraz struktury NVIDIA CUDA opisano równania zachowania rządzące przepływami oraz ich dyskretyzację numeryczna. Następnie zbadano możliwości wykorzystania technologii GPGPU w celu zoptymalizowania czasu wykonywania obliczeń numerycznych przepływu przez ośrodek dwufazowy (płyn - cząsteczki ciała stała stałego) zbliżony do ośrodka porowatego. W tym celu,...
-
Reliable Surrogate Modeling of Antenna Input Characteristics by Means of Domain Confinement and Principal Components
PublicationA reliable design of contemporary antenna structures necessarily involves full-wave electromagnetic (EM) analysis which is the only tool capable of accounting, for example, for element coupling or the effects of connectors. As EM simulations tend to be CPU-intensive, surrogate modeling allows for relieving the computational overhead of design tasks that require numerous analyses, for example, parametric optimization or uncertainty...
-
Efficient Simulation-Based Global Antenna Optimization Using Characteristic Point Method and Nature-Inspired Metaheuristics
PublicationAntenna structures are designed nowadays to fulfil rigorous demands, including multi-band operation, where the center frequencies need to be precisely allocated at the assumed targets while improving other features, such as impedance matching. Achieving this requires simultaneous optimization of antenna geometry parameters. When considering multimodal problems or if a reasonable initial design is not at hand, one needs to rely...
-
Simulation-Driven Antenna Modeling by Means of Response Features and Confined Domains of Reduced Dimensionality
PublicationIn recent years, the employment of full-wave electromagnetic (EM) simulation tools has become imperative in the antenna design mainly for reliability reasons. While the CPU cost of a single simulation is rarely an issue, the computational overhead associated with EM-driven tasks that require massive EM analyses may become a serious bottleneck. A widely used approach to lessen this cost is the employment of surrogate models, especially...
-
Smaller Representation of Finite State Automata
PublicationThis paper is a follow-up to Jan Daciuk's experiments on space-effcient finite state automata representation that can be used directly for traversals in main memory. We investigate several techniques of reducing memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve a gain of around 20-30%...
-
Implementation of Addition and Subtraction Operations in Multiple Precision Arithmetic
PublicationIn this paper, we present a digital circuit of arithmetic unit implementing addition and subtraction operations in multiple-precision arithmetic (MPA). This adder-subtractor unit is a part of MPA coprocessor supporting and offloading the central processing unit (CPU) in computations requiring precision higher than 32/64 bits. Although addition and subtraction operations of two n-digit numbers require O(n) operations, the efficient...
-
Expedited EM-Driven Design of Miniaturized Microwave Hybrid Couplers Using Surrogate-Based Optimization
PublicationMiniaturization of microwave hybrid couplers is important for contemporary wireless communication engineering. Using standard computer-aided design methods for development of compact structures is extremely challenging due to a general lack of computationally efficient and accurate simulation models. Poor accuracy of available equivalent circuits results from neglecting parasitic cross-couplings that greatly affect the performance...
-
Surrogate-assisted EM-driven miniaturization of wideband microwave couplers by means of co-simulation low-fidelity models
PublicationThis article proposes a methodology for rapid design optimization of miniaturized wideband couplers. More specifically, a class of circuits is considered, in which conventional transmission lines are replaced by their abbreviated counterparts referred to as slow-wave compact cells. Our focus is on explicit reduction of the structure size as well as on reducing the CPU cost of the design process. For the sake of computational feasibility,...
-
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
PublicationThe paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve nonHermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...
-
Reduced-Cost Microwave Modeling Using Constrained Domains and Dimensionality Reduction
PublicationDevelopment of modern microwave devices largely exploits full-wave electromagnetic (EM) simulations. Yet, simulation-driven design may be problematic due to the incurred CPU expenses. Addressing the high-cost issues stimulated the development of surrogate modeling methods. Among them, data-driven techniques seem to be the most widespread owing to their flexibility and accessibility. Nonetheless, applicability of approximation-based...
-
Reduced-cost optimization-based miniaturization of microwave passives by multi-resolution EM simulations for internet of things and space-limited applications
PublicationStringent performance specifications along with constraints imposed on physical dimensions, make the design of contemporary microwave components a truly onerous task. In recent years, the latter demand has been growing in importance, with the innovative application areas such as Internet of Things coming into play. The need to employ full-wave electromagnetic (EM) simu-lations for response evaluation, reliable yet CPU heavy, only...
-
Rapid Multi-Criterial Antenna Optimization by Means of Pareto Front Triangulation and Interpolative Design Predictors
PublicationModern antenna systems are designed to meet stringent performance requirements pertinent to both their electrical and field properties. The objectives typically stay in conflict with each other. As the simultaneous improvement of all performance parameters is rarely possible, compromise solutions have to be sought. The most comprehensive information about available design trade-offs can be obtained through multi-objective optimization...
-
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
PublicationThe paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
-
Stanowisko badawczo-dydaktyczne "Ploter 3-osiowy"
Research Equipment -
Smaller representation of finite state automata
PublicationThis paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory (Daciuk, 2000)[4]. We investigate several techniques for reducing memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve...
-
Advanced Potential Energy Surfaces for Molecular Simulation
PublicationAdvanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...
-
Variable‐fidelity modeling of antenna input characteristics using domain confinement and two‐stage Gaussian process regression surrogates
PublicationThe major bottleneck of electromagnetic (EM)-driven antenna design is the high CPU cost of massive simulations required by parametric optimization, uncertainty quantification, or robust design procedures. Fast surrogate models may be employed to mitigate this issue to a certain extent. Unfortunately, the curse of dimensionality is a serious limiting factor, hindering the construction of conventional data-driven models valid over...
-
DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution
PublicationThe solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential -- a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...
-
IP Core of Coprocessor for Multiple-Precision-Arithmetic Computations
PublicationIn this paper, we present an IP core of coprocessor supporting computations requiring integer multiple-precision arithmetic (MPA). Whilst standard 32/64-bit arithmetic is sufficient to solve many computing problems, there are still applications that require higher numerical precision. Hence, the purpose of the developed coprocessor is to support and offload central processing unit (CPU) in such computations. The developed digital...
-
Expedited Re-Design of Multi-Band Passive Microwave Circuits Using Orthogonal Scaling Directions and Gradient-Based Tuning
PublicationGeometry scaling of microwave circuits is an essential but challenging task. In particular, the employment of a given passive structure in a different application area often requires re-adjustment of the operating frequencies/bands while maintaining top performance. Achieving this necessitates utilization of numerical optimization methods. Nonetheless, if the intended frequencies are distant from the ones at the starting point,...
-
Expedited Metaheuristic-Based Antenna Optimization Using EM Model Resolution Management
PublicationDesign of modern antenna systems heavily relies on numerical opti-mization methods. Their primary purpose is performance improvement by tun-ing of geometry and material parameters of the antenna under study. For relia-bility, the process has to be conducted using full-wave electromagnetic (EM) simulation models, which are associated with sizable computational expendi-tures. The problem is aggravated in the case of global optimization,...
-
Knowledge-Based Expedited Parameter Tuning of Microwave Passives by Means of Design Requirement Management and Variable-Resolution EM Simulations
PublicationThe importance of numerical optimization techniques has been continually growing in the design of microwave components over the recent years. Although reasonable initial designs can be obtained using circuit theory tools, precise parameter tuning is still necessary to account for effects such as electromagnetic (EM) cross coupling or radiation losses. EM-driven design closure is most often realized using gradient-based procedures,...
-
Expedited Acquisition of Database Designs for Reduced-Cost Performance-Driven Modeling and Rapid Dimension Scaling of Antenna Structures
PublicationFast replacement models have been playing an increasing role in high-frequency electronics, including the design of antenna structures. Their role is to improve computational efficiency of the procedures that normally entail large numbers of expensive full-wave electromagnetic (EM) simulations, e.g., parametric optimization or uncertainty quantification. Recently introduced performance-driven modeling methods, such as the nested...
-
Improved Modeling of Microwave Structures Using Performance-Driven Fully-Connected Regression Surrogate
PublicationFast replacement models (or surrogates) have been widely applied in the recent years to accelerate simulation-driven design procedures in microwave engineering. The fundamental reason is a considerable—and often prohibitive—CPU cost of massive full-wave electromagnetic (EM) analyses related to solving common tasks such as parametric optimization or uncertainty quantification. The most popular class of surrogates are data-driven...
-
Multi-objective optimization of expensive electromagnetic simulation models
PublicationVast majority of practical engineering design problems require simultaneous handling of several criteria. For the sake of simplicity and through a priori preference articulation one can turn many design tasks into single-objective problems that can be handled using conventional numerical optimization routines. However, in some situations, acquiring comprehensive knowledge about the system at hand, in particular, about possible...
-
Rapid tolerance‐aware design of miniaturized microwave passives by means of confined‐domain surrogates
PublicationThe effects of uncertainties, primarily manufacturing tolerances but also incomplete information about operating conditions or material parameters, can be detrimental to the performance of microwave components. Quantification of such effects is essential to ensure a meaningful evaluation of the structure, in particular, its reliability under imperfect fabrication procedures. The improvement of the circuit robustness can be achieved...