Search results for: FEM, ITERATIVE SOLVERS, GPU, PARALLEL COMPUTING

total: 121

GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method
Publication
- RADIOENGINEERING - Year 2017
This paper discusses a strategy for speeding up the mesh deformation process in the design-by-optimization of high-frequency components involving electromagnetic field simulations using the 3D finite element method (FEM). The mesh deformation is assumed to be described by a linear elasticity model of a rigid body; therefore, each time the shape of the device is changed, an auxiliary elasticity finite-element problem must be solved....

Full text available to download
An MOR Algorithm Based on the Immittance Zero and Pole Eigenvectors for Fast FEM Simulations of Two-Port Microwave Structures
Publication
- G. Fotyga
- D. Szypulski
- A. Lamęcki
- P. Sypek
- M. Rewieński
- V. de la Rubia
- M. Mrozowski
- IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES - Year 2022
The aim of this article is to present a novel model-order reduction (MOR) algorithm for fast finite-element frequency-domain simulations of microwave two-port structures. The projection basis used to construct the reduced-order model (ROM) comprises two sets: singular vectors and regular vectors. The first set is composed of the eigenvectors associated with the poles of the finite-element method (FEM) state-space system, while...

Full text available to download
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
Publication
- T. M. Boiński
- P. Czarnul
- COMPUTER JOURNAL - Year 2021
In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...

Full text available to download
Kamil Andrzej Rybacki mgr inż.

People

Born on 23 October 1993 in Gdańsk. In 2017, I received the M.Sc. degree from the Faculty of Applied Physics and Mathematics, Gdańsk University of Technology, Poland. My main fields of interest include computer simulations of molecular systems, parallel computing applied to computational physics methods, and the development of various simulation software. Currently, my research is focused on the development of hybrid Molecular...
International Parallel Computing Workshop

Conferences
PODEJŚCIE WARIANTOWE WE WSTĘPNYM PROJEKTOWANIU STATKÓW (Variant methods approach to the preliminary ship design)
Publication
- A. Karczewski
- J. Kozak
- Mechanik - Year 2017
The classical method of ship design is iterative and relies on experience accumulated from ships already built. For a ship of an entirely new type, without a "dowry of earlier experience", design instead consists of developing a series of parallel variant solutions using optimization. The article indicates selected design methods employing optimization that are used in preliminary...

Full text available to download
Nowoczesne koncepcje integracji usług w systemie BeesyCluster (Modern concepts of service integration in the BeesyCluster system)
Publication
- P. Czarnul
- Year 2010
The functions of the current version of the BeesyCluster system are described as a middleware layer for access to distributed resources, together with subsystems for service integration, service selection and service execution. Extensions of the service integration subsystem oriented toward green computing are presented. Problems of intelligent service discovery, the use of GPUs, cooperation with mobile devices and processing in smart spaces are discussed. Additionally...
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication
- P. Rościszewski
- J. Kaliski
- Year 2017
In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...

Full text to download in external service
Online sound restoration system for digital library applications
Publication
- Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive (AR) model combined with the iterative restoration of missing signal samples. Since the distortion...

Full text to download in external service
Mobile devices and computing cloud resources allocation for interactive applications
Publication
- H. Krawczyk
- M. Nykiel
- Archives of Control Sciences - Year 2017
Using mobile devices such as smartphones or iPads for various interactive applications is currently very common. In the case of complex applications, e.g. chess games, the capabilities of these devices are insufficient to run the application in real time. One of the solutions is to use cloud computing. However, there is an optimization problem of mobile device and cloud resources allocation. An iterative heuristic algorithm for...

Full text available to download
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
Publication
- P. Czarnul
- CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE - Year 2023
In the paper, we propose a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for managing particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...

Full text available to download
Drawing maps with advice
Publication
- D. Dereniowski
- A. Pelc
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING - Year 2012
We consider the following computational problem. An agent is placed at a node of a graph that is unknown to it. The nodes of the graph are indistinguishable, while the edges have port numbers. The agent's task is to draw a map, i.e., to compute an isomorphic copy of the graph, or to compute an arbitrary spanning tree of the graph. Without additional information these tasks cannot be accomplished. In the article we establish bounds on the minimum number of...

Full text to download in external service
Generation of large finite-element matrices on multiple graphics processors
Publication
- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Year 2013
This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...

Full text to download in external service
International Symposium on Parallel and Distributed Computing

Conferences
Online sound restoration system for digital library applications.
Publication
- Journal of the Acoustical Society of America - Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive (AR) model combined with the iterative restoration of missing signal samples. Since the distortion...
Further Developments of the Online Sound Restoration System for Digital Library Applications
Publication
- Year 2014
New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive (AR) model combined with the iterative completion of signal samples. Since the interpolation algorithm is computationally...

Full text to download in external service
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
Publication
- P. Czarnul
- Year 2002
This work presents implementations of, and tuning experiences with, parallel irregular applications developed using the object-oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors, taking into account their speeds and even loads from other user processes. New implementations of parallel applications...

Full text to download in external service
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster (Rozpoznawanie niebezpiecznych zdarzeń dźwiękowych z wykorzystaniem równoległego przetwarzania na klastrze superkomputerowym)
Publication
- K. Łopatka
- A. Czyżewski
- Year 2015
A method for automatic recognition of hazardous acoustic events operating on a supercomputing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. An evaluation of the recognition engine is provided, both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...
On the influence of shell element properties on the response of car model in crash test
Publication
- Year 2017
It goes without saying that numerical simulations play an important role in modern engineering practice. Contemporary CAD environments combined with FEM solvers, along with the computing power of modern processors, give the engineer a fast and efficient tool. Ultimately, however, it is the user alone who is responsible for the correctness of the results. As long as the FEM calculations remain in the sphere of academic exercise, the inevitable...

Full text available to download
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
Publication
- P. Rościszewski
- M. Iwański
- P. Czarnul
- Year 2020
Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report...

Full text to download in external service
An Efficient Framework For Fast Computer Aided Design of Microwave Circuits Based on the Higher-Order 3D Finite-Element Method
Publication
- RADIOENGINEERING - Year 2014
In this paper, an efficient computational framework for the full-wave design by optimization of complex microwave passive devices, such as antennas, filters, and multiplexers, is described. The framework consists of a computational engine, a 3D object modeler, and a graphical user interface. The computational engine, which is based on a finite element method with curvilinear higher-order tetrahedral elements, is coupled with built-in...

Full text available to download
International European Conference on Parallel and Distributed Computing

Conferences
IFIP International Conference on Network and Parallel Computing

Conferences
International Conference on Massively Parallel Computing Systems

Conferences
Australasian Symposium on Parallel and Distributed Computing (was AusGrid)

Conferences
General Provisioning Strategy for Local Specialized Cloud Computing Environments
Publication
- P. Orzechowski
- H. Krawczyk
- Year 2023
The well-known management strategies in cloud computing based on SLA requirements are considered. A deterministic parallel provisioning algorithm has been prepared and used to show its behavior for three different requirements: load balancing, consolidation, and fault tolerance. The impact of these strategies on the total execution time of different sets of services is analyzed for randomly chosen sets of data. This makes it possible...

Full text available to download
Performance Analysis of the OpenCL Environment on Mobile Platforms
Publication
- P. Falkowski-Gilski
- M. Plewka
- Year 2022
Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text to download in external service
A Stand for Measurement and Prediction of Scattering Properties of Diffusers
Publication
- Year 2018
In this paper we present a set of solutions which may be used for prototyping and simulation of acoustic scattering devices. The proposed system is capable of measuring the sound field. A way to use an open source solution for simulating scattering phenomena occurring in the proximity of acoustic diffusers is also shown. The results of our work are a measurement procedure and a prototype of the simulation script based on FEniCS - an open source...

Full text to download in external service
Influence of nonlinearities on the efficiency and accuracy of FEM calculations on the example of a steel build-up thin-walled column
Publication
- P. Deniziak
- K. Winkelmann
- MATEC Web of Conferences - Year 2018
Due to the increase in the computing capabilities of standard processing units, it is possible to perform complex analyses that consider a number of nonlinearities, such as geometric, material and boundary (contact) nonlinearities, even on personal computers. In the paper, the authors analyse the efficiency and accuracy of FEM calculations performed on a standard PC in Abaqus CAE 2017 software, using the example of a critical load assessment of a thin-walled...

Full text available to download
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication
- P. Czarnul
- Year 2018
The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to that of a 1-thread version on both Intel Xeon and the latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented, along with how these change for growing numbers of threads. Impact of using particular...

Full text to download in external service
The Quick Measure of a Nurbs Surface Curvature for Accurate Triangular Meshing
Publication
- A. Kniat
- Polish Maritime Research - Year 2014
NURBS surfaces are the most widely used surfaces for three-dimensional models in CAD/CAE programs. Since a model for FEM calculation is prepared in a CAD program, it must eventually be meshed. There are many algorithms for meshing planar regions. Some of them may be used for meshing surfaces, but it is necessary to take the curvature of the surface into consideration to avoid a poor-quality mesh. The mesh must be denser in the...

Full text available to download
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
Publication
- P. Rościszewski
- Year 2014
The rapid development of diverse computer architectures and hardware accelerators means that the design of parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive makes it possible to run applications efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text to download in external service
A method to determine the tightening sequence for standing rigging of a mast
Publication
- L. Samson
- M. Kahsin
- Polish Maritime Research - Year 2019
The article proposes an alternative method to determine the sequence of generation of pre-tension forces in the standing rigging of a mast. The proposed approach has been verified in both a virtual simulation experiment and laboratory tests. In this method, the desired tension values are obtained using the influence matrix, which makes it possible to calculate the effect of a tension change in an individual rope on the tension distribution in the...

Full text available to download
International Conference on Parallel and Distributed Computing, Applications and Technologies

Conferences
A memory efficient and fast sparse matrix vector product on a Gpu
Publication
- Progress in Electromagnetics Research-PIER - Year 2011
This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text to download in external service
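As a rough illustration of the padded row-wise storage family that the abstract above refers to, the following minimal sketch builds a plain ELLPACK-R layout (padded values, column indices, and a per-row length array) and uses it for a complex-valued matrix-vector product. It is not the Sliced ELLR-T format from the paper, it runs sequentially on the CPU, and the names `to_ellpack_r` and `spmv_ellpack_r` are hypothetical.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) into ELLPACK-R style arrays.

    Returns (values, cols, row_len): every row is padded to the width of the
    longest row, and row_len[i] records how many entries of row i are real.
    """
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len, default=0)
    values, cols = [], []
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        cols.append([j for j, _ in nonzeros] + [0] * pad)
        values.append([v for _, v in nonzeros] + [0] * pad)
    return values, cols, row_len


def spmv_ellpack_r(values, cols, row_len, x):
    """Compute y = A x, visiting only the row_len[i] stored entries per row."""
    y = []
    for i in range(len(values)):
        acc = 0
        for k in range(row_len[i]):
            acc += values[i][k] * x[cols[i][k]]
        y.append(acc)
    return y


# Small complex symmetric example (illustrative data only).
A = [[4, 0, 1j],
     [0, 2, 0],
     [1j, 0, 3]]
x = [1, 2, 1j]
vals, cols, row_len = to_ellpack_r(A)
y = spmv_ellpack_r(vals, cols, row_len, x)
```

The per-row length array lets the inner loop skip padding, which is the property that row-length-aware ELLPACK variants exploit; on a GPU each row (or part of a row) would typically be handled by a separate thread.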
Three levels of fail-safe mode in MPI I/O NVRAM distributed cache
Publication
- A. Malinowski
- P. Czarnul
- Procedia Computer Science - Year 2018
The paper presents the architecture and design of three versions of fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering of write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...

Full text available to download
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
Publication
- A. Sieradzan
- J. Sans‐Duñó
- E. Lubecka
- C. Czaplewski
- A. Lipska
- H. Leszczyński
- K. Ocetkiewicz
- J. Proficz
- P. Czarnul
- H. Krawczyk
- A. Liwo
- JOURNAL OF COMPUTATIONAL CHEMISTRY - Year 2023
We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...

Full text available to download
Simulating propagation of coherent light in random media using the Fredholm type integral equation
Publication
- M. Kraszewski
- J. Pluciński
- Year 2017
Studying the propagation of light in random scattering materials is important for both basic and applied research. Such studies often require the use of numerical methods for simulating the behavior of light beams in random media. However, if such simulations require consideration of the coherence properties of light, they may become a complex numerical problem. There are well established methods for simulating multiple scattering of light (e.g....

Full text available to download
Euro-Par: International European Conference on Parallel and Distributed Computing

Conferences
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
Publication
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available to download
Network-aware Data Prefetching Optimization of Computations in a Heterogeneous HPC Framework
Publication
- P. Rościszewski
- International Journal of Computer Networks & Communications (IJCNC) - Year 2014
The rapid development of diverse computer architectures and hardware accelerators means that the design of parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive makes it possible to run applications efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text available to download
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
Publication
- P. Czarnul
- K. Jabłońska
- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020
In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...

Full text to download in external service
Video Analytics-Based Algorithm for Monitoring Egress from Buildings
Publication
- M. Szczodrak
- A. Czyżewski
- Year 2013
A concept and practical implementation of an algorithm for detecting potentially dangerous situations of crowding in passages is presented. An example of such a situation is a crush, which may be caused by an obstructed pedestrian pathway. Online analysis of the surveillance video camera signal is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of the implemented algorithm which uses...

Full text to download in external service
Use of ICT infrastructure for teaching HPC
Publication
- P. Czarnul
- M. Matuszek
- Year 2019
In this paper we look at modern ICT infrastructure as well as the curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI and at the node level using OpenMP and CUDA. We present...

Full text to download in external service
New potential functions for greedy independence and coloring
Publication
- P. Borowiecki
- D. Rautenbach
- DISCRETE APPLIED MATHEMATICS - Year 2015
A potential function $f_G$ of a finite, simple and undirected graph $G=(V,E)$ is an arbitrary function $f_G : V(G) \rightarrow \mathbb{N}_0$ that assigns a nonnegative integer to every vertex of a graph $G$. In this paper we define the iterative process of computing the step potential function $q_G$ such that $q_G(v)\leq d_G(v)$ for all $v\in V(G)$. We use this function in the development of new Caro-Wei-type and Brooks-type...

Full text available to download
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
Publication
- K. A. Hossain
- M. Jurkowski
- J. Czub
- M. Kogut
- Computational and Structural Biotechnology Journal - Year 2021
Because of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...

Full text available to download
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
Publication
- ENERGIES - Year 2023
High-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...

Full text available to download
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
Publication
- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015
In this paper, a task-scheduling approach for efficiently calculating sparse symmetric matrix-vector products, designed to run on Graphics Processing Units (GPUs), is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....

Full text to download in external service
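The memory saving mentioned in the abstract above comes from storing only one triangle of a symmetric matrix. A minimal sequential sketch of that idea follows; it does not reproduce the paper's task-scheduling scheme, and the function name `sym_spmv` and the triplet layout are assumptions made for illustration.

```python
def sym_spmv(n, upper, x):
    """Compute y = A x for a symmetric n x n matrix A.

    `upper` holds (i, j, value) triplets for the upper triangle only (i <= j).
    Each off-diagonal entry contributes twice: once as A[i][j], once as A[j][i].
    """
    y = [0.0] * n
    for i, j, v in upper:
        y[i] += v * x[j]
        if i != j:
            # Mirrored entry; in a parallel version this scattered write to
            # y[j] is where conflicts between workers can arise.
            y[j] += v * x[i]
    return y


# Upper triangle of the tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]].
upper = [(0, 0, 2.0), (0, 1, -1.0), (1, 1, 2.0), (1, 2, -1.0), (2, 2, 2.0)]
x = [1.0, 2.0, 3.0]
y = sym_spmv(3, upper, x)
```

Only five of the seven nonzeros are stored here. The mirrored update is the part that makes a parallel implementation harder than the unsymmetric case, since different rows may write to the same output element.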
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication
- P. Czarnul
- Electronics - Year 2021
The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and the relative performance of these implementations on modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of a predefined number of data chunks which are then processed in parallel by a set of slaves, and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available to download
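The batch-wise master–slave procedure described in the abstract above can be sketched as follows. The paper uses OpenMP; this sketch swaps in a Python thread pool purely for illustration, and the names `process_chunk` and `master` as well as the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Work done by a slave on one data chunk (placeholder: sum of squares)."""
    return sum(v * v for v in chunk)


def master(data, chunk_size, batch_size, workers):
    """Partition input into chunks, hand them out batch by batch, merge results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(chunks), batch_size):
            batch = chunks[start:start + batch_size]
            # The master waits for the whole batch before issuing the next one.
            total += sum(pool.map(process_chunk, batch))
    return total


result = master(list(range(10)), chunk_size=2, batch_size=3, workers=2)
```

Waiting for each batch to finish before issuing the next keeps the sketch simple; overlapping the master's partitioning work with the slaves' processing is one of the design choices that such comparisons typically explore.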
International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

Conferences
