Filters
total: 121
filtered: 97
Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING
-
Drawing maps with advice
PublicationRozważamy następujący problem obliczeniowy. Agent zostaje umieszczony w wierzchołku nieznanego mu grafu. Wierzchołki grafu są nierozróżnialne, natomiast krawędzie posiadają numery portów. Zadaniem agenta jest wyznaczenie mapy, tzn. obliczenie izomorficznej kopii grafu, lub obliczenie dowolnego drzewa spinającego grafu. Bez dodatkowej informacji zadań tych nie można wykonać. W artykule wyznaczamy oszacowania na minimalną liczbę...
-
Generation of large finite-element matrices on multiple graphics processors
PublicationThis paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...
-
Further Developments of the Online Sound Restoration System for Digital Library Applications
PublicationNew signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Jannsen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...
-
Online sound restoration system for digital library applications.
PublicationAudio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Jannsen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
PublicationThis work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAM-PVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster . Rozpoznawanie niebezpiecznych zdarzeń dźwiękowych z wykorzystaniem równoległego przetwarzania na klastrze superkomputerowym
PublicationA method for automatic recognition of hazardous acoustic events operating on a super computing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided: both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...
-
On the influence of shell element properties on the response of car model in crash test
PublicationIt goes without saying that numerical simulations play important role in the modern engineering practice. Contemporary CAD environments combined with FEM solvers, along with computer power of modern processors, give the engineer fast and efficient tool. Ultimately, however it is the user alone who is responsible for the correctness of the results. As long as the FEM calculations remain in the sphere of academic exercise, the inevitable...
-
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
PublicationPractical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...
-
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
PublicationIn this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...
-
General Provisioning Strategy for Local Specialized Cloud Computing Environments
PublicationThe well-known management strategies in cloud computing based on SLA requirements are considered. A deterministic parallel provisioning algorithm has been prepared and used to show its behavior for three different requirements: load balancing, consolidation, and fault tolerance. The impact of these strategies on the total execution time of different sets of services is analyzed for randomly chosen sets of data. This makes it possible...
-
Performance Analysis of the OpenCL Environment on Mobile Platforms
PublicationToday’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...
-
A Stand for Measurement and Prediction of Scattering Properties of Diffusers
PublicationIn this paper we present a set of solutions which may be used for prototyping and simulation of acoustic scattering devices. A system proposed is capable of measuring sound field. Also a way to use an open source solution for simulation of scattering phenomena occurring in proximity of acoustic diffusers is shown. The result of our work are measurement procedure and a prototype of the simulation script based on FEniCS - an open source...
-
Influence of nonlinearities on the efficiency and accuracy of FEM calculations on the example of a steel build-up thin-walled column
PublicationDue to the increase of computing capabilities of standard processing units, it is possible to perform complex analyses, considering a number of nonlinearities, such as geometric, material and boundary (contact) even on personal computers. In the paper, the authors have analysed the efficiency and accuracy of standard PC’s FEM calculations performed in Abaqus CAE 2017 software on the example of a critical load assessment of a thin-walled...
-
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
PublicationThe paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...
-
The Quick Measure of a Nurbs Surface Curvature for Accurate Triangular Meshing
PublicationNURBS surfaces are the most widely used surfaces for three-dimensional models in CAD/CAE programs. As a model for FEM calculation is prepared with a CAD program it is inevitable to mesh it finally. There are many algorithms for meshing planar regions. Some of them may be used for meshing surfaces but it is necessary to take the curvature of the surface under consideration to avoid poor quality mesh. The mesh must be denser in the...
-
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
PublicationRapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
A method to determine the tightening sequence for standing rigging of a mast
PublicationThe article proposes an alternative method to determine the sequence of generation of pre-tension forces in standing rigging of a mast. The proposed approach has been verified on both a virtual simulation experiment and laboratory tests. In this method, the desired tension values are obtained using the influence matrix which allows to calculate the effect of tension change in an individual rope on the tension distribution in the...
-
A memory efficient and fast sparse matrix vector product on a Gpu
PublicationThis paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...
-
Three levels of fail-safe mode in MPI I/O NVRAM distributed cache
PublicationThe paper presents architecture and design of three versions for fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...
-
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
PublicationWe report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...
-
Simulating propagation of coherent light in random media using the Fredholm type integral equation
PublicationStudying propagation of light in random scattering materials is important for both basic and applied research. Such studies often require usage of numerical method for simulating behavior of light beams in random media. However, if such simulations require consideration of coherence properties of light, they may become a complex numerical problems. There are well established methods for simulating multiple scattering of light (e.g....
-
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
PublicationIn this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...
-
Network-aware Data Prefetching Optimization of Computations in a Heterogeneous HPC Framework
PublicationRapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
PublicationIn the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...
-
Video Analytics-Based Algorithm for Monitoring Egress from Buildings
PublicationA concept and practical implementation of the algorithm for detecting of potentially dangerous situations of crowding in passages is presented. An example of such situation is a crush which may be caused by obstructed pedestrian pathway. Surveillance video camera signal analysis performed on line is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of implemented algorithm which uses...
-
Use of ICT infrastructure for teaching HPC
PublicationIn this paper we look at modern ICT infrastructure as well as curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, node level using OpenMP and CUDA. We present...
-
New potential functions for greedy independence and coloring
PublicationA potential function $f_G$ of a finite, simple and undirected graph $G=(V,E)$ is an arbitrary function $f_G : V(G) \rightarrow \mathbb{N}_0$ that assigns a nonnegative integer to every vertex of a graph $G$. In this paper we define the iterative process of computing the step potential function $q_G$ such that $q_G(v)\leq d_G(v)$ for all $v\in V(G)$. We use this function in the development of new Caro-Wei-type and Brooks-type...
-
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
PublicationBecause of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...
-
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
PublicationHigh-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...
-
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
PublicationIn this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....
-
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
PublicationThe paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
-
Testing of the longest span soil-steel bridge in Europe – new quality in measurements
PublicationThe article describes interdisciplinary and comprehensive diagnostic tests of final bridge inspection and acceptance proposed for a soil – steel bridge made of corrugated sheets, being the European span length record holder (25.74 m). As an effect of an original concept a detailed and precise information about the structure response was collected. The load test design was based on the nonlinear numerical simulations performed by...
-
Behavior Analysis and Dynamic Crowd Management in Video Surveillance System
PublicationA concept and practical implementation of a crowd management system which acquires input data by the set of monitoring cameras is presented. Two leading threads are considered. First concerns the crowd behavior analysis. Second thread focuses on detection of a hold-ups in the doorway. The optical flow combined with soft computing methods (neural network) is employed to evaluate the type of crowd behavior, and fuzzy logic aids detection...
-
Distributed NVRAM Cache – Optimization and Evaluation with Power of Adjacency Matrix
PublicationIn this paper we build on our previously proposed MPI I/O NVRAM distributed cache for high performance computing. In each cluster node it incorporates NVRAMs which are used as an intermediate cache layer between an application and a file for fast read/write operations supported through wrappers of MPI I/O functions. In this paper we propose optimizations of the solution including handling of write requests with a synchronous mode,...
-
Modelling and simulation of GPU processing in the MERPSYS environment
PublicationIn this work, we evaluate an analytical GPU performance model based on Little's law, that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...
-
Nieliniowa statyka 6-parametrowych powłok sprężysto plastycznych. Efektywne obliczenia MES
PublicationGłównym zagadnieniem omawianym w monografii jest sformułowanie sprężysto-plastycznego prawa konstytutywnego w nieliniowej 6-parametrowej teorii powłok. Wyróżnikiem tej teorii jest występujący w niej w naturalny sposób tzw. stopień 6 swobody, czyli owinięcie (drilling rotation). Podstawowe założenie pracy to przyjęcie płaskiego stanu naprężenia uogólnionego na ośrodek typu Cosseratów. Takie podejście stanowi oryginalny aspekt opracowania....
-
Acceleration of Electromagnetic Simulations on Reconfigurable FPGA Card
PublicationIn this contribution, the hardware acceleration of electromagnetic simulations on the reconfigurable field-programmable-gate-array (FPGA) card is presented. In the developed implementation of scientific computations, the matrix-assembly phase of the method of moments (MoM) is accelerated on the Xilinx Alveo U200 card. The computational method involves discretization of the frequency-domain mixed potential integral equation using...
-
Advanced Potential Energy Surfaces for Molecular Simulation
PublicationAdvanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...
-
NUMERICAL ESTIMATION OF HULL HYDRODYNAMIC DERIVATIVES IN SHIP MANOUVERING PREDICTION
PublicationOperating in crowded waterways pose a risk of accidents and disasters due to maneuvering limitations of the ship. In order to predict ship’s maneuvering characteristics at the design stage, model tests are often executed as the most accurate prediction tool. Two approaches can be distinguished here: free running model tests and numerical simulations based on planar motion model with the use of hydrodynamic derivatives obtained...
-
Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications
PublicationThe aim of this paper is to evaluate performance of new CUDA mechanisms—unified memory and dynamic parallelism for real parallel applications compared to standard CUDA API versions. In order to gain insight into performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically,...
-
Molecular dynamics simulations reveal the balance of forces governing the formation of a guanine tetrad—a common structural unit of G-quadruplex DNA
PublicationG-quadruplexes (G4) are nucleic acid conformations of guanine-rich sequences, in which guanines are arranged in the square-planar G-tetrads, stacked on one another. G4 motifs form in vivo and are implicated in regulation of such processes as gene expression and chromosome maintenance. The structure and stability of various G4 topologies were determined experimentally; however, the driving forces for their formation are not fully...
-
A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache
PublicationWhile many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...
-
DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution
PublicationThe solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential -- a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...
-
Wpływ kontekstu na efektywność wykonania interaktywnych aplikacji iteracyjnych w dedykowanej przestrzeni usług
PublicationTematyka rozprawy dotyczy aplikacji kontekstowych wykonywanych w środowisku czasu rzeczywistego typu *pervasive computing*. To środowisko nazywane jest przestrzenią inteligentną a aplikacje w niej wykonywane określane są jako Interaktywne Aplikacje Iteracyjne (IAI). IAI analizuje w sposób ciągły sytuacje (wyrażone przez kontekst) zachodzące w przestrzeni i w zależności od bieżącego kontekstu podejmuje określone działania. W skład...
-
Sensitivity of the Baltic Sea level prediction to spatial model resolution
Publicationhe three-dimensional hydrodynamic model of the Baltic Sea (M3D) and...
-
DATABASE AND BIGDATA PROCESSING SYSTEM FOR ANALYSIS OF AIS MESSAGES IN THE NETBALTIC RESEARCH PROJECT
PublicationA specialized database and a software tool for graphical and numerical presentation of maritime measurement results has been designed and implemented as part of the research conducted under the netBaltic project (Internet over the Baltic Sea – the implementation of a multi-system, self-organizing broadband communications network over the sea for enhancing navigation safety through the development of e-navigation services.) The...
-
Processing of Satellite Data in the Cloud
PublicationThe dynamic development of digital technologies, especially those dedicated to devices generating large data streams, such as all kinds of measurement equipment (temperature and humidity sensors, cameras, radio-telescopes and satellites – Internet of Things) enables more in-depth analysis of the surrounding reality, including better understanding of various natural phenomenon, starting from atomic level reactions, through macroscopic...