Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING

Drawing maps with advice

Publication

D. Dereniowski
A. Pelc

- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING - Year 2012

We consider the following computational problem. An agent is placed at a vertex of a graph unknown to it. The vertices of the graph are indistinguishable, while the edges have port numbers. The agent's task is to draw a map, i.e., to compute an isomorphic copy of the graph, or to compute an arbitrary spanning tree of the graph. Without additional information, these tasks cannot be accomplished. In the paper we establish bounds on the minimum number of...

Full text to download in external service

Generation of large finite-element matrices on multiple graphics processors

Publication

- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Year 2013

This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...

Full text to download in external service

Further Developments of the Online Sound Restoration System for Digital Library Applications

Publication

- Year 2014

New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...

Full text to download in external service

Online sound restoration system for digital library applications.

Publication

- Journal of the Acoustical Society of America - Year 2013

Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...

Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC

Publication

P. Czarnul

- Year 2002

This work presents implementations and tuning experiences with parallel irregular applications developed using the object-oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...

Full text to download in external service

Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster . Rozpoznawanie niebezpiecznych zdarzeń dźwiękowych z wykorzystaniem równoległego przetwarzania na klastrze superkomputerowym

Publication

- Year 2015

A method for automatic recognition of hazardous acoustic events operating on a supercomputing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. An evaluation of the recognition engine is provided, both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...

On the influence of shell element properties on the response of car model in crash test

Publication

- Year 2017

It goes without saying that numerical simulations play an important role in modern engineering practice. Contemporary CAD environments combined with FEM solvers, along with the computing power of modern processors, give the engineer a fast and efficient tool. Ultimately, however, it is the user alone who is responsible for the correctness of the results. As long as the FEM calculations remain in the sphere of academic exercise, the inevitable...

Full text available to download

The impact of the AC922 Architecture on Performance of Deep Neural Network Training

Publication

- Year 2020

Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...

Full text to download in external service

An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method

Publication

- RADIOENGINEERING - Year 2014

In this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...

Full text available to download

General Provisioning Strategy for Local Specialized Cloud Computing Environments

Publication

- Year 2023

The well-known management strategies in cloud computing based on SLA requirements are considered. A deterministic parallel provisioning algorithm has been prepared and used to show its behavior for three different requirements: load balancing, consolidation, and fault tolerance. The impact of these strategies on the total execution time of different sets of services is analyzed for randomly chosen sets of data. This makes it possible...

Full text available to download

Performance Analysis of the OpenCL Environment on Mobile Platforms

Publication

- Year 2022

Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text to download in external service

A Stand for Measurement and Prediction of Scattering Properties of Diffusers

Publication

- Year 2018

In this paper we present a set of solutions which may be used for prototyping and simulation of acoustic scattering devices. The proposed system is capable of measuring the sound field. A way to use an open source solution for simulating scattering phenomena occurring in the proximity of acoustic diffusers is also shown. The results of our work are a measurement procedure and a prototype of the simulation script based on FEniCS - an open source...

Full text to download in external service

Influence of nonlinearities on the efficiency and accuracy of FEM calculations on the example of a steel build-up thin-walled column

Publication

- MATEC Web of Conferences - Year 2018

Due to the increase in computing capabilities of standard processing units, it is possible to perform complex analyses that consider a number of nonlinearities, such as geometric, material and boundary (contact) nonlinearities, even on personal computers. In the paper, the authors have analysed the efficiency and accuracy of FEM calculations performed on a standard PC in Abaqus CAE 2017 software on the example of a critical load assessment of a thin-walled...

Full text available to download

Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors

Publication

P. Czarnul

- Year 2018

The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to a 1-thread version on both Intel Xeon and the latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented, along with how these change for growing numbers of threads. Impact of using particular...

Full text to download in external service

The Quick Measure of a Nurbs Surface Curvature for Accurate Triangular Meshing

Publication

A. Kniat

- Polish Maritime Research - Year 2014

NURBS surfaces are the most widely used surfaces for three-dimensional models in CAD/CAE programs. When a model for FEM calculation is prepared with a CAD program, it must eventually be meshed. There are many algorithms for meshing planar regions. Some of them may be used for meshing surfaces, but it is necessary to take the curvature of the surface into consideration to avoid a poor-quality mesh. The mesh must be denser in the...

Full text available to download

Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems

Publication

P. Rościszewski

- Year 2014

The rapid development of diverse computer architectures and hardware accelerators means that designers of parallel systems face new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive makes it possible to run applications efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text to download in external service

A method to determine the tightening sequence for standing rigging of a mast

Publication

- Polish Maritime Research - Year 2019

The article proposes an alternative method to determine the sequence of generation of pre-tension forces in the standing rigging of a mast. The proposed approach has been verified in both a virtual simulation experiment and laboratory tests. In this method, the desired tension values are obtained using the influence matrix, which makes it possible to calculate the effect of a tension change in an individual rope on the tension distribution in the...

Full text available to download

A memory efficient and fast sparse matrix vector product on a Gpu

Publication

- Progress in Electromagnetics Research-PIER - Year 2011

This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text to download in external service

Three levels of fail-safe mode in MPI I/O NVRAM distributed cache

Publication

- Procedia Computer Science - Year 2018

The paper presents the architecture and design of three versions of fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering of write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...

Full text available to download

Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins

Publication

A. Sieradzan
J. Sans‐Duñó
E. Lubecka
C. Czaplewski
A. Lipska
H. Leszczyński
K. Ocetkiewicz
J. Proficz
P. Czarnul
H. Krawczyk
A. Liwo

- JOURNAL OF COMPUTATIONAL CHEMISTRY - Year 2023

We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...

Full text available to download

Simulating propagation of coherent light in random media using the Fredholm type integral equation

Publication

- Year 2017

Studying the propagation of light in random scattering materials is important for both basic and applied research. Such studies often require the use of numerical methods for simulating the behavior of light beams in random media. However, if such simulations require consideration of the coherence properties of light, they may become complex numerical problems. There are well established methods for simulating multiple scattering of light (e.g....

Full text available to download

MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017

In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available to download

Network-aware Data Prefetching Optimization of Computations in a Heterogeneous HPC Framework

Publication

P. Rościszewski

- International Journal of Computer Networks & Communications (IJCNC) - Year 2014

The rapid development of diverse computer architectures and hardware accelerators means that designers of parallel systems face new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive makes it possible to run applications efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text available to download

Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors

Publication

- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020

In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...

Full text to download in external service

Video Analytics-Based Algorithm for Monitoring Egress from Buildings

Publication

- Year 2013

A concept and practical implementation of an algorithm for detecting potentially dangerous situations of crowding in passages is presented. An example of such a situation is a crush, which may be caused by an obstructed pedestrian pathway. Online analysis of the surveillance video camera signal is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of the implemented algorithm which uses...

Full text to download in external service

Use of ICT infrastructure for teaching HPC

Publication

- Year 2019

In this paper we look at modern ICT infrastructure as well as curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, node level using OpenMP and CUDA. We present...

Full text to download in external service

New potential functions for greedy independence and coloring

Publication

P. Borowiecki
D. Rautenbach

- DISCRETE APPLIED MATHEMATICS - Year 2015

A potential function $f_G$ of a finite, simple and undirected graph $G=(V,E)$ is an arbitrary function $f_G : V(G) \rightarrow \mathbb{N}_0$ that assigns a nonnegative integer to every vertex of a graph $G$. In this paper we define the iterative process of computing the step potential function $q_G$ such that $q_G(v)\leq d_G(v)$ for all $v\in V(G)$. We use this function in the development of new Caro-Wei-type and Brooks-type...

Full text available to download

Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations

Publication

- Computational and Structural Biotechnology Journal - Year 2021

Because of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...

Full text available to download

Energy-Aware Scheduling for High-Performance Computing Systems: A Survey

Publication

- ENERGIES - Year 2023

High-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...

Full text available to download

A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU

Publication

- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015

In this paper, a task-scheduling approach for efficiently calculating sparse symmetric matrix-vector products, designed to run on Graphics Processing Units (GPUs), is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....

Full text to download in external service

Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications

Publication

P. Czarnul

- Electronics - Year 2021

The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available to download

Testing of the longest span soil-steel bridge in Europe – new quality in measurements

Publication

- Year 2021

The article describes interdisciplinary and comprehensive diagnostic tests for the final bridge inspection and acceptance, proposed for a soil-steel bridge made of corrugated sheets that holds the European span length record (25.74 m). As a result of an original concept, detailed and precise information about the structure's response was collected. The load test design was based on the nonlinear numerical simulations performed by...

Full text to download in external service

Behavior Analysis and Dynamic Crowd Management in Video Surveillance System

Publication

- Year 2011

A concept and practical implementation of a crowd management system which acquires input data from a set of monitoring cameras is presented. Two leading threads are considered. The first concerns crowd behavior analysis. The second focuses on the detection of hold-ups in a doorway. Optical flow combined with soft computing methods (a neural network) is employed to evaluate the type of crowd behavior, and fuzzy logic aids detection...

Full text to download in external service

Distributed NVRAM Cache – Optimization and Evaluation with Power of Adjacency Matrix

Publication

- Year 2017

In this paper we build on our previously proposed MPI I/O NVRAM distributed cache for high performance computing. In each cluster node it incorporates NVRAMs which are used as an intermediate cache layer between an application and a file for fast read/write operations supported through wrappers of MPI I/O functions. In this paper we propose optimizations of the solution including handling of write requests with a synchronous mode,...

Full text to download in external service

Modelling and simulation of GPU processing in the MERPSYS environment

Publication

- Scalable Computing: Practice and Experience - Year 2018

In this work, we evaluate an analytical GPU performance model based on Little's law, that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...

Full text available to download

Nieliniowa statyka 6-parametrowych powłok sprężysto plastycznych. Efektywne obliczenia MES

Publication

S. Burzyński

- Year 2021

The main topic of the monograph is the formulation of an elastic-plastic constitutive law within the nonlinear 6-parameter shell theory. The distinguishing feature of this theory is the sixth degree of freedom, the so-called drilling rotation, which arises in it naturally. The basic assumption of the work is a plane stress state generalized to a Cosserat-type medium. This approach constitutes an original aspect of the study....

Full text to download in external service

Acceleration of Electromagnetic Simulations on Reconfigurable FPGA Card

Publication

T. Topa
A. Noga
T. Stefański

- Year 2023

In this contribution, the hardware acceleration of electromagnetic simulations on the reconfigurable field-programmable-gate-array (FPGA) card is presented. In the developed implementation of scientific computations, the matrix-assembly phase of the method of moments (MoM) is accelerated on the Xilinx Alveo U200 card. The computational method involves discretization of the frequency-domain mixed potential integral equation using...

Full text to download in external service

Advanced Potential Energy Surfaces for Molecular Simulation

Publication

A. Albaugh
H. Boateng
R. Bradshaw
O. Demerdash
J. Dziedzic
Y. Mao
D. Margul
J. Swails
Q. Zeng
D. Case... and 10 others

- JOURNAL OF PHYSICAL CHEMISTRY B - Year 2016

Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models...

Full text available to download

NUMERICAL ESTIMATION OF HULL HYDRODYNAMIC DERIVATIVES IN SHIP MANOUVERING PREDICTION

Publication

R. Kołodziej
P. Hoffmann

- Year 2021

Operating in crowded waterways poses a risk of accidents and disasters due to the maneuvering limitations of the ship. In order to predict a ship’s maneuvering characteristics at the design stage, model tests are often executed as the most accurate prediction tool. Two approaches can be distinguished here: free running model tests and numerical simulations based on a planar motion model with the use of hydrodynamic derivatives obtained...

Full text to download in external service

Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications

Publication

- JOURNAL OF SUPERCOMPUTING - Year 2017

The aim of this paper is to evaluate performance of new CUDA mechanisms—unified memory and dynamic parallelism for real parallel applications compared to standard CUDA API versions. In order to gain insight into performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically,...

Full text available to download

Molecular dynamics simulations reveal the balance of forces governing the formation of a guanine tetrad—a common structural unit of G-quadruplex DNA

Publication

- NUCLEIC ACIDS RESEARCH - Year 2016

G-quadruplexes (G4) are nucleic acid conformations of guanine-rich sequences, in which guanines are arranged in the square-planar G-tetrads, stacked on one another. G4 motifs form in vivo and are implicated in regulation of such processes as gene expression and chromosome maintenance. The structure and stability of various G4 topologies were determined experimentally; however, the driving forces for their formation are not fully...

Full text available to download

A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache

Publication

A. Malinowski
P. Czarnul
P. Dorożyński
K. Czuryło
Ł. Dorau
M. Maciejewski
P. Skowron

- Annals of Computer Science and Information Systems - Year 2016

While many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...

Full text available to download

DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution

Publication

J. Womack
L. Anton
J. Dziedzic
P. Hasnip
M. Probert
C. Skylaris

- Journal of Chemical Theory and Computation - Year 2018

The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential -- a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...

Full text available to download

Wpływ kontekstu na efektywność wykonania interaktywnych aplikacji iteracyjnych w dedykowanej przestrzeni usług

Publication

S. Nasiadka

- Year 2013

The dissertation concerns context-aware applications executed in a real-time *pervasive computing* environment. This environment is called a smart space, and the applications executed in it are referred to as Interactive Iterative Applications (IAI). An IAI continuously analyzes situations (expressed by context) occurring in the space and takes specific actions depending on the current context...

Sensitivity of the Baltic Sea level prediction to spatial model resolution

Publication

M. Kowalewski

- Year 2017

The three-dimensional hydrodynamic model of the Baltic Sea (M3D) and...

Full text to download in external service

DATABASE AND BIGDATA PROCESSING SYSTEM FOR ANALYSIS OF AIS MESSAGES IN THE NETBALTIC RESEARCH PROJECT

Publication

M. Lewczuk
P. Cichocki
J. Woźniak

- TASK Quarterly - Year 2017

A specialized database and a software tool for graphical and numerical presentation of maritime measurement results have been designed and implemented as part of the research conducted under the netBaltic project (Internet over the Baltic Sea – the implementation of a multi-system, self-organizing broadband communications network over the sea for enhancing navigation safety through the development of e-navigation services). The...

Full text available to download

Processing of Satellite Data in the Cloud

Publication

- TASK Quarterly - Year 2017

The dynamic development of digital technologies, especially those dedicated to devices generating large data streams, such as all kinds of measurement equipment (temperature and humidity sensors, cameras, radio-telescopes and satellites – Internet of Things), enables more in-depth analysis of the surrounding reality, including better understanding of various natural phenomena, starting from atomic level reactions, through macroscopic...

Full text available to download
