Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING
-
GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method
PublicationThis paper discusses a strategy for speeding up the mesh deformation process in the design-by-optimization of high-frequency components involving electromagnetic field simulations using the 3D finite element method (FEM). The mesh deformation is assumed to be described by a linear elasticity model of a rigid body; therefore, each time the shape of the device is changed, an auxiliary elasticity finite-element problem must be solved....
-
An MOR Algorithm Based on the Immittance Zero and Pole Eigenvectors for Fast FEM Simulations of Two-Port Microwave Structures
PublicationThe aim of this article is to present a novel model-order reduction (MOR) algorithm for fast finite-element frequency-domain simulations of microwave two-port structures. The projection basis used to construct the reduced-order model (ROM) comprises two sets: singular vectors and regular vectors. The first set is composed of the eigenvectors associated with the poles of the finite-element method (FEM) state-space system, while...
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
PublicationIn the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
Kamil Andrzej Rybacki mgr inż.
PeopleBorn on 23 October 1993 in Gdańsk. In 2017, I received the M.Sc. degree at the Faculty of Applied Physics and Mathematics, Gdańsk University of Technology, Poland. My main fields of interest include computer simulations of molecular systems, parallel computing applied to computational physics methods, and development of various simulation software. Currently, my research is focused on the development of hybrid Molecular...
-
International Parallel Computing Workshop
Conferences -
PODEJŚCIE WARIANTOWE WE WSTĘPNYM PROJEKTOWANIU STATKÓW Variant methods approach to the preliminary ship design.
PublicationThe classical ship design method is an iterative one, based on experience accumulated from ships already built. For a ship of an entirely new type, however, without a "dowry of earlier experience", design consists of developing a series of parallel, variant solutions using optimization. The article points out selected optimization-based design methods used in preliminary...
-
Nowoczesne koncepcje integracji usług w systemie BeesyCluster (Modern concepts of service integration in the BeesyCluster system)
PublicationThe functions of the current version of the BeesyCluster system are described as a middleware layer for accessing distributed resources, together with its subsystems for service integration, service selection, and service execution. Extensions of the service integration subsystem oriented toward green computing are presented. Problems of intelligent service discovery, the use of GPUs, cooperation with mobile devices, and processing in smart spaces are discussed. Additionally...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublicationIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
-
Online sound restoration system for digital library applications
PublicationAudio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
Mobile devices and computing cloud resources allocation for interactive applications
PublicationUsing mobile devices such as smartphones or iPads for various interactive applications is currently very common. In the case of complex applications, e.g. chess games, the capabilities of these devices are insufficient to run the application in real time. One of the solutions is to use cloud computing. However, there is an optimization problem of mobile device and cloud resources allocation. An iterative heuristic algorithm for...
-
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
PublicationIn the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...
-
Drawing maps with advice
PublicationWe consider the following computational problem. An agent is placed at a vertex of a graph unknown to it. The vertices of the graph are indistinguishable, whereas the edges have port numbers. The agent's task is to determine a map, i.e., to compute an isomorphic copy of the graph, or to compute an arbitrary spanning tree of the graph. Without additional information these tasks cannot be accomplished. In the article we determine bounds on the minimum number...
-
Generation of large finite-element matrices on multiple graphics processors
PublicationThis paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...
-
International Symposium on Parallel and Distributed Computing
Conferences -
Further Developments of the Online Sound Restoration System for Digital Library Applications
PublicationNew signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
PublicationThis work presents implementations and tuning experiences with parallel irregular applications developed using the object-oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster. Rozpoznawanie niebezpiecznych zdarzeń dźwiękowych z wykorzystaniem równoległego przetwarzania na klastrze superkomputerowym
PublicationA method for automatic recognition of hazardous acoustic events operating on a supercomputing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided: both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...
-
On the influence of shell element properties on the response of car model in crash test
PublicationIt goes without saying that numerical simulations play an important role in modern engineering practice. Contemporary CAD environments combined with FEM solvers, along with the computing power of modern processors, give the engineer a fast and efficient tool. Ultimately, however, it is the user alone who is responsible for the correctness of the results. As long as the FEM calculations remain in the sphere of academic exercise, the inevitable...
-
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
PublicationPractical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...
-
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
PublicationIn this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...
-
International European Conference on Parallel and Distributed Computing
Conferences -
IFIP International Conference on Network and Parallel Computing
Conferences -
International Conference on Massively Parallel Computing Systems
Conferences -
Australasian Symposium on Parallel and Distributed Computing (was AusGrid)
Conferences -
General Provisioning Strategy for Local Specialized Cloud Computing Environments
PublicationThe well-known management strategies in cloud computing based on SLA requirements are considered. A deterministic parallel provisioning algorithm has been prepared and used to show its behavior for three different requirements: load balancing, consolidation, and fault tolerance. The impact of these strategies on the total execution time of different sets of services is analyzed for randomly chosen sets of data. This makes it possible...
-
Performance Analysis of the OpenCL Environment on Mobile Platforms
PublicationToday’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...
-
A Stand for Measurement and Prediction of Scattering Properties of Diffusers
PublicationIn this paper we present a set of solutions which may be used for prototyping and simulation of acoustic scattering devices. The proposed system is capable of measuring the sound field. A way to use an open source solution for simulating scattering phenomena occurring in the proximity of acoustic diffusers is also shown. The results of our work are a measurement procedure and a prototype of the simulation script based on FEniCS - an open source...
-
Influence of nonlinearities on the efficiency and accuracy of FEM calculations on the example of a steel build-up thin-walled column
PublicationDue to the increase of computing capabilities of standard processing units, it is possible to perform complex analyses, considering a number of nonlinearities, such as geometric, material and boundary (contact) even on personal computers. In the paper, the authors have analysed the efficiency and accuracy of standard PC’s FEM calculations performed in Abaqus CAE 2017 software on the example of a critical load assessment of a thin-walled...
-
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
PublicationThe paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to the 1-thread version on both Intel Xeon and the latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented, along with how these change for growing numbers of threads. Impact of using particular...
-
The Quick Measure of a Nurbs Surface Curvature for Accurate Triangular Meshing
PublicationNURBS surfaces are the most widely used surfaces for three-dimensional models in CAD/CAE programs. Because a model for FEM calculations is prepared in a CAD program, it must eventually be meshed. There are many algorithms for meshing planar regions. Some of them may be used for meshing surfaces, but it is necessary to take the curvature of the surface into consideration to avoid a poor-quality mesh. The mesh must be denser in the...
-
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
PublicationRapid development of diverse computer architectures and hardware accelerators has meant that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows applications to be run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
A method to determine the tightening sequence for standing rigging of a mast
PublicationThe article proposes an alternative method to determine the sequence of generation of pre-tension forces in standing rigging of a mast. The proposed approach has been verified on both a virtual simulation experiment and laboratory tests. In this method, the desired tension values are obtained using the influence matrix, which allows calculating the effect of a tension change in an individual rope on the tension distribution in the...
-
International Conference on Parallel and Distributed Computing, Applications and Technologies
Conferences -
A memory efficient and fast sparse matrix vector product on a Gpu
PublicationThis paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...
-
Three levels of fail-safe mode in MPI I/O NVRAM distributed cache
PublicationThe paper presents the architecture and design of three versions of fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering of write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...
-
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
PublicationWe report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...
-
Simulating propagation of coherent light in random media using the Fredholm type integral equation
PublicationStudying the propagation of light in random scattering materials is important for both basic and applied research. Such studies often require the use of numerical methods for simulating the behavior of light beams in random media. However, if such simulations require consideration of the coherence properties of light, they may become a complex numerical problem. There are well-established methods for simulating multiple scattering of light (e.g....
-
Euro-Par: International European Conference on Parallel and Distributed Computing
Conferences -
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
PublicationIn this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...
-
Network-aware Data Prefetching Optimization of Computations in a Heterogeneous HPC Framework
PublicationRapid development of diverse computer architectures and hardware accelerators has meant that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows applications to be run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
PublicationIn the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...
-
Video Analytics-Based Algorithm for Monitoring Egress from Buildings
PublicationA concept and practical implementation of an algorithm for detecting potentially dangerous crowding in passages is presented. An example of such a situation is a crush, which may be caused by an obstructed pedestrian pathway. Online analysis of the surveillance video camera signal is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of the implemented algorithm, which uses...
-
Use of ICT infrastructure for teaching HPC
PublicationIn this paper we look at modern ICT infrastructure as well as the curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, and at the node level using OpenMP and CUDA. We present...
-
New potential functions for greedy independence and coloring
PublicationA potential function $f_G$ of a finite, simple and undirected graph $G=(V,E)$ is an arbitrary function $f_G : V(G) \rightarrow \mathbb{N}_0$ that assigns a nonnegative integer to every vertex of a graph $G$. In this paper we define the iterative process of computing the step potential function $q_G$ such that $q_G(v)\leq d_G(v)$ for all $v\in V(G)$. We use this function in the development of new Caro-Wei-type and Brooks-type...
-
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
PublicationBecause of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...
-
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
PublicationHigh-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...
-
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
PublicationIn this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....
-
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
PublicationThe paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
-
International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
Conferences