Search results for: cpu
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
PublicationIn the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
Optymalizacja zasobów chmury obliczeniowej z wykorzystaniem inteligentnych agentów w zdalnym nauczaniu [Optimization of cloud computing resources using intelligent agents in remote learning]
PublicationThe dissertation concerns the optimization of cloud computing resources in which intelligent agents are applied to remote learning. The issue is important in education, which makes use of modern technologies such as the Internet of Things, augmented and virtual reality, and deep learning in a cloud computing environment. It is also important when a pandemic forces the use of remote learning on a large scale...
Big Data and the Internet of Things in Edge Computing for Smart City
PublicationRequests expressing collective human expectations and outcomes from city service tasks can be partially satisfied by processing Big Data provided to a city cloud via the Internet of Things. To improve the efficiency of city clouds, edge computing has been introduced for Big Data mining. This intelligent and efficient distributed system can be developed for citizens that are supposed to be informed and educated by the...
Tuning matrix-vector multiplication on GPU
PublicationA matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
Analysis of cores affinity within the containerized environment based on selected IOT middleware - observations and recommendations
PublicationThe Internet of Things is attracting an ever larger audience. The topic is popular in both science and industry, and it offers many fields for research. One of them is efficient deployment with respect to resource utilization. Another is containerization within IoT platforms. What these two topics have in common is the choice of CPU affinity on containerized platforms to get the best performance. There were plenty of papers...
Linux scheduler improvement for time demanding network applications, running on Communication Platform Systems
PublicationCommunication Platform Systems, e.g. ATCA standard blades located in standardized chassis, provide high-level communication services between system peripherals. Each ATCA blade brings dedicated functionality to the system but can also exist as a separate host responsible for servicing a set of tasks. According to the platform philosophy, these parts of the system can be quite independent of other solutions provided by competitors....
Design of a Multidomain IMS/NGN Service Stratum
PublicationThe paper continues our research concerning the Next Generation Network (NGN), which is standardized for delivering multimedia services with strict quality and includes elements of the IP Multimedia Subsystem (IMS). A design algorithm for a multidomain IMS/NGN service stratum is proposed, which calculates the necessary CSCF servers CPU message processing times and link bandwidths with respect to the given maximum values of mean...
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
PublicationThis paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
PublicationModern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems that we wish to solve efficiently using those systems differ in complexity, not only in magnitude, but also in the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
PublicationIn this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...
Scalability of surrogate-assisted multi-objective optimization of antenna structures exploiting variable-fidelity electromagnetic simulation models
PublicationMulti-objective optimization of antenna structures is a challenging task due to high-computational cost of evaluating the design objectives as well as large number of adjustable parameters. Design speedup can be achieved by means of surrogate-based optimization techniques. In particular, a combination of variable-fidelity electromagnetic (EM) simulations, design space reduction techniques, response surface approximation (RSA) models,...
Sign Language Recognition Using Convolution Neural Networks
PublicationThe objective of this work was to provide an app that can automatically recognize hand gestures from the American Sign Language (ASL) on mobile devices. The app employs a model based on Convolutional Neural Network (CNN) for gesture classification. Various CNN architectures and optimization strategies suitable for devices with limited resources were examined. InceptionV3 and VGG-19 models exhibited negligibly higher accuracy than...
Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping
PublicationIn this paper we demonstrate that it is possible to obtain considerable improvement of performance and energy aware metrics for training of deep neural networks using a modern parallel multi-GPU system, by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high performance workstation with 8 GPUs, we were...
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
PublicationIn the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...
Cost-Efficient Design Methodology for Compact Rat-Race Couplers
PublicationIn this article, a reliable and low-cost design methodology for simulation-driven optimization of miniaturized rat-race couplers (RRCs) is presented. We exploit a two-stage design approach, where a composite structure (a basic building block of the RRC structure) is first optimized using a pattern search algorithm, and, subsequently, the entire coupler is tuned by means of surrogate-based optimization (SBO) procedure. SBO is executed...
A memory efficient and fast sparse matrix vector product on a GPU
PublicationThis paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...
A highly-efficient technique for evaluating bond-orientational order parameters
PublicationWe propose a novel, highly-efficient approach for the evaluation of bond-orientational order parameters (BOPs). Our approach exploits the properties of spherical harmonics and Wigner 3j-symbols to reduce the number of terms in the expressions for BOPs, and employs simultaneous interpolation of normalised associated Legendre polynomials and trigonometric functions to dramatically reduce the total number of arithmetic operations....
Expedited Gradient-Based Design Closure of Antennas Using Variable-Resolution Simulations and Sparse Sensitivity Updates
PublicationNumerical optimization has been playing an increasingly important role in the design of contemporary antenna systems. Due to the shortage of design-ready theoretical models, optimization is mainly based on electromagnetic (EM) analysis, which tends to be costly. Numerous techniques have evolved to abate this cost, including surrogate-assisted frameworks for global optimization, or sparse sensitivity updates for speeding up local...
Reduced-Cost Constrained Modeling of Microwave and Antenna Components: Recent Advances
PublicationElectromagnetic (EM) simulation models are ubiquitous in the design of microwave and antenna components. EM analysis is reliable but CPU intensive. In particular, multiple simulations entailed by parametric optimization or uncertainty quantification may considerably slow down the design processes. In order to address this problem, it is possible to employ fast metamodels. Here, the popular solution approaches are approximation...
Optymalizacja wydajności obliczeniowej metody elementów skończonych w architekturze CUDA [Optimization of the computational performance of the finite element method on the CUDA architecture]
PublicationThe aim of this dissertation, and of the scholarship completed as part of the project, was to develop a numerically efficient algorithmic and hardware solution that accelerates the analysis of electromagnetic problems using the finite element method (FEM) with higher-order basis functions. The finite element method in the frequency domain is an efficient and universal tool for the analysis of microwave circuits (fig....
Two-Stage Variable-Fidelity Modeling of Antennas with Domain Confinement
PublicationSurrogate modeling has become the method of choice in solving an increasing number of antenna design tasks, especially those involving expensive full-wave electromagnetic (EM) simulations. Notwithstanding, the curse of dimensionality considerably affects conventional metamodeling methods, and their capability to efficiently handle nonlinear antenna characteristics over broad ranges of the system parameters is limited. Performance-driven...
A distributed system for conducting chess games in parallel
PublicationThis paper proposes a distributed and scalable cloud based system designed to play chess games in parallel. Games can be played between chess engines alone or between clusters created by combined chess engines. The system has a built-in mechanism that compares engines, based on Elo ranking which finally presents the strength of each tested approach. If an approach needs more computational power, the design of the system allows...
Triangulation-based Constrained Surrogate Modeling of Antennas
PublicationDesign of contemporary antenna structures is heavily based on full-wave electromagnetic (EM) simulation tools. They provide accuracy but are CPU-intensive. Reduction of EM-driven design procedure cost can be achieved by using fast replacement models (surrogates). Unfortunately, standard modeling techniques are unable to ensure sufficient predictive power for real-world antenna structures (multiple parameters, wide parameter ranges,...
Design-oriented computationally-efficient feature-based surrogate modelling of multi-band antennas with nested kriging
PublicationDesign of modern antenna structures heavily depends on electromagnetic (EM) simulation tools. EM analysis provides reliable evaluation of increasingly complex designs but tends to be CPU intensive. When multiple simulations are needed (e.g., for parameters tuning), the aggregated simulation cost may become a serious bottleneck. As one possible way of mitigating the issue, the recent literature fosters utilization of faster representations,...
Zastosowanie technologii GPGPU do wspomagania inżynierskich obliczeń numerycznych na przykładzie analizy przepływu przez ośrodek dwufazowy płyn - ciało stałe [Application of GPGPU technology to support engineering numerical computations, using the example of flow analysis through a two-phase fluid-solid medium]
PublicationAfter presenting basic information on GPGPU technology and the structure of NVIDIA CUDA, the article describes the conservation equations governing flows and their numerical discretization. It then examines the possibility of using GPGPU technology to reduce the execution time of numerical computations of flow through a two-phase medium (fluid and solid particles) resembling a porous medium. To this end,...
Fast EM-Driven Parameter Tuning of Microwave Circuits with Sparse Sensitivity Updates via Principal Directions
PublicationNumerical optimization has become more important than ever in the design of microwave components and systems, primarily as a consequence of increasing performance demands and growing complexity of the circuits. As the parameter tuning is more and more often executed using full-wave electromagnetic (EM) models, the CPU cost of the overall process tends to be excessive even for local optimization. Some ways of alleviating these issues...
Expedited Simulation-Driven Multi-Objective Design Optimization of Quasi-Isotropic Dielectric Resonator Antenna
PublicationThe majority of practical engineering design problems require simultaneous handling of several criteria. Although many design tasks can be turned into single-objective problems using suitable formulations, in some situations, acquiring comprehensive knowledge about possible trade-offs between conflicting objectives may be necessary. This calls for multi-objective optimization that aims at identifying a set of alternative, Pareto-optimal...
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublicationIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
Reliable Surrogate Modeling of Antenna Input Characteristics by Means of Domain Confinement and Principal Components
PublicationA reliable design of contemporary antenna structures necessarily involves full-wave electromagnetic (EM) analysis which is the only tool capable of accounting, for example, for element coupling or the effects of connectors. As EM simulations tend to be CPU-intensive, surrogate modeling allows for relieving the computational overhead of design tasks that require numerous analyses, for example, parametric optimization or uncertainty...
Zastosowanie technologii GPGPU do wspomagania inżynierskich obliczeń numerycznych na przykładzie analizy przepływu przez ośrodek dwufazowy płyn-ciało stałe [Application of GPGPU technology to support engineering numerical computations, using the example of flow analysis through a two-phase fluid-solid medium]
PublicationAfter presenting basic information on GPGPU technology and the structure of NVIDIA CUDA, the article describes the conservation equations governing flows and their numerical discretization. It then examines the possibility of using GPGPU technology to reduce the execution time of numerical computations of flow through a two-phase medium (fluid and solid particles) resembling a porous medium. To this end,...
Efficient Simulation-Based Global Antenna Optimization Using Characteristic Point Method and Nature-Inspired Metaheuristics
PublicationAntenna structures are designed nowadays to fulfil rigorous demands, including multi-band operation, where the center frequencies need to be precisely allocated at the assumed targets while improving other features, such as impedance matching. Achieving this requires simultaneous optimization of antenna geometry parameters. When considering multimodal problems or if a reasonable initial design is not at hand, one needs to rely...
Simulation-Driven Antenna Modeling by Means of Response Features and Confined Domains of Reduced Dimensionality
PublicationIn recent years, the employment of full-wave electromagnetic (EM) simulation tools has become imperative in the antenna design mainly for reliability reasons. While the CPU cost of a single simulation is rarely an issue, the computational overhead associated with EM-driven tasks that require massive EM analyses may become a serious bottleneck. A widely used approach to lessen this cost is the employment of surrogate models, especially...
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
PublicationThe paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve non-Hermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...
Smaller Representation of Finite State Automata
PublicationThis paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory. We investigate several techniques for reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve a gain of around 20-30%...
Expedited EM-Driven Design of Miniaturized Microwave Hybrid Couplers Using Surrogate-Based Optimization
PublicationMiniaturization of microwave hybrid couplers is important for contemporary wireless communication engineering. Using standard computer-aided design methods for development of compact structures is extremely challenging due to a general lack of computationally efficient and accurate simulation models. Poor accuracy of available equivalent circuits results from neglecting parasitic cross-couplings that greatly affect the performance...
Surrogate-assisted EM-driven miniaturization of wideband microwave couplers by means of co-simulation low-fidelity models
PublicationThis article proposes a methodology for rapid design optimization of miniaturized wideband couplers. More specifically, a class of circuits is considered, in which conventional transmission lines are replaced by their abbreviated counterparts referred to as slow-wave compact cells. Our focus is on explicit reduction of the structure size as well as on reducing the CPU cost of the design process. For the sake of computational feasibility,...
Reduced-Cost Microwave Modeling Using Constrained Domains and Dimensionality Reduction
PublicationDevelopment of modern microwave devices largely exploits full-wave electromagnetic (EM) simulations. Yet, simulation-driven design may be problematic due to the incurred CPU expenses. Addressing the high-cost issues stimulated the development of surrogate modeling methods. Among them, data-driven techniques seem to be the most widespread owing to their flexibility and accessibility. Nonetheless, applicability of approximation-based...
Rapid Multi-Criterial Antenna Optimization by Means of Pareto Front Triangulation and Interpolative Design Predictors
PublicationModern antenna systems are designed to meet stringent performance requirements pertinent to both their electrical and field properties. The objectives typically stay in conflict with each other. As the simultaneous improvement of all performance parameters is rarely possible, compromise solutions have to be sought. The most comprehensive information about available design trade-offs can be obtained through multi-objective optimization...
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
PublicationThe paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
Implementation of Addition and Subtraction Operations in Multiple Precision Arithmetic
PublicationIn this paper, we present a digital circuit of arithmetic unit implementing addition and subtraction operations in multiple-precision arithmetic (MPA). This adder-subtractor unit is a part of MPA coprocessor supporting and offloading the central processing unit (CPU) in computations requiring precision higher than 32/64 bits. Although addition and subtraction operations of two n-digit numbers require O(n) operations, the efficient...
Reduced-cost optimization-based miniaturization of microwave passives by multi-resolution EM simulations for internet of things and space-limited applications
PublicationStringent performance specifications, along with constraints imposed on physical dimensions, make the design of contemporary microwave components a truly onerous task. In recent years, the latter demand has been growing in importance, with innovative application areas such as the Internet of Things coming into play. The need to employ full-wave electromagnetic (EM) simulations for response evaluation, reliable yet CPU heavy, only...
Stanowisko badawczo-dydaktyczne "Ploter 3-osiowy" [Research and teaching station "3-axis plotter"]
Research Equipment
IP Core of Coprocessor for Multiple-Precision-Arithmetic Computations
PublicationIn this paper, we present an IP core of coprocessor supporting computations requiring integer multiple-precision arithmetic (MPA). Whilst standard 32/64-bit arithmetic is sufficient to solve many computing problems, there are still applications that require higher numerical precision. Hence, the purpose of the developed coprocessor is to support and offload central processing unit (CPU) in such computations. The developed digital...
Variable‐fidelity modeling of antenna input characteristics using domain confinement and two‐stage Gaussian process regression surrogates
PublicationThe major bottleneck of electromagnetic (EM)-driven antenna design is the high CPU cost of massive simulations required by parametric optimization, uncertainty quantification, or robust design procedures. Fast surrogate models may be employed to mitigate this issue to a certain extent. Unfortunately, the curse of dimensionality is a serious limiting factor, hindering the construction of conventional data-driven models valid over...
Smaller representation of finite state automata
PublicationThis paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory (Daciuk, 2000)[4]. We investigate several techniques for reducing memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve...
Advanced Potential Energy Surfaces for Molecular Simulation
PublicationAdvanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...
DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution
PublicationThe solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential -- a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...
Variable Resolution Machine Learning Optimization of Antennas Using Global Sensitivity Analysis
PublicationThe significance of rigorous optimization techniques in antenna engineering has grown significantly in recent years. For many design tasks, parameter tuning must be conducted globally, presenting a challenge due to associated computational costs. The popular bio-inspired routines often necessitate thousands of merit function calls to converge, generating prohibitive expenses whenever the design process relies on electromagnetic...
Optimization of Microwave Components Using Machine Learning and Rapid Sensitivity Analysis
PublicationRecent years have witnessed a tremendous popularity growth of optimization methods in high-frequency electronics, including microwave design. With the increasing complexity of passive microwave components, meticulous tuning of their geometry parameters has become imperative to fulfill demands imposed by the diverse application areas. More and more often, achieving the best possible performance requires global optimization. Unfortunately,...
Expedited Re-Design of Multi-Band Passive Microwave Circuits Using Orthogonal Scaling Directions and Gradient-Based Tuning
PublicationGeometry scaling of microwave circuits is an essential but challenging task. In particular, the employment of a given passive structure in a different application area often requires re-adjustment of the operating frequencies/bands while maintaining top performance. Achieving this necessitates utilization of numerical optimization methods. Nonetheless, if the intended frequencies are distant from the ones at the starting point,...