Search results for: parallel computational implementation
-
Executing Multiple Simulations in the MERPSYS Environment
Publication: The chapter investigates the steps necessary to perform a simulation instance in the MERPSYS environment and discusses potential limitations when vast numbers of simulations are required. An extended architecture is proposed which includes a JMS-based simulation queue and multiple distributed simulators, overcoming the potential bottlenecks. The chapter also introduces methods for preparing suites of multiple simulations...
-
ACM SIGPLAN Workshop on Types in Language Design and Implementation (was TIC)
Conferences
-
Bounds on the cover time of parallel rotor walks
Publication: The rotor-router mechanism was introduced as a deterministic alternative to the random walk in undirected graphs. In this model, a set of k identical walkers is deployed in parallel, starting from a chosen subset of nodes, and moving around the graph in synchronous steps. During the process, each node successively propagates walkers visiting it along its outgoing arcs in round-robin fashion, according to a fixed ordering. We consider...
-
Infrared techniques for natural convection investigations in channels between two vertical, parallel, isothermal and symmetrically heated plates
Publication: The effect of the gap width between two symmetrically heated vertical, parallel, isothermal plates on intensity of natural convective heat transfer in a gas (Pr = 0.71) was experimentally studied using the balance and gradient methods. In the former method heat fluxes were determined based on measurements of the voltage and electric current supplying the heaters placed inside the walls. In the latter, heat fluxes were calculated...
-
Performance evaluation of the parallel object tracking algorithm employing the particle filter
Publication: An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with the CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing speed depending on the number...
-
Scheduling of identical jobs with bipartite incompatibility graphs on uniform machines. Computational experiments
Publication: We consider the problem of scheduling unit-length jobs on three or four uniform parallel machines to minimize the schedule length or total completion time. We assume that the jobs are subject to some types of mutual exclusion constraints, modeled by a bipartite graph of a bounded degree. The edges of the graph correspond to the pairs of jobs that cannot be processed on the same machine. Although the problem is generally NP-hard,...
-
Online sound restoration system for digital library applications
Publication: Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
Multi-agent large-scale parallel crowd simulation with NVRAM-based distributed cache
Publication: This paper presents the architecture, main components and performance results for a parallel and modular agent-based environment aimed at crowd simulation. The environment allows simulating thousands or more agents on maps of square kilometers or more, features a modular design and incorporates non-volatile RAM (NVRAM) with a fail-safe mode that can be activated to allow computations to continue from a recently analyzed state in...
-
Tuning matrix-vector multiplication on GPU
Publication: A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), and the generalized minimal residual method (gmres), and exerts an influence on the overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
-
Parallelization of Compute Intensive Applications into Workflows based on Services in BeesyCluster
Publication: The paper presents an approach for modeling, optimization and execution of workflow applications based on services that incorporates both service selection and partitioning of input data for parallel processing by parallel workflow paths. A compute-intensive workflow application for parallel integration is presented. The impact of the input data partitioning on the scalability is presented. The paper shows a comparison of the theoretical...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication: In the paper we tackle the bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication: The paper presents the design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...
-
Mariusz Figurski prof. dr hab. inż.
People: Mariusz Józef Figurski (born 27 April 1964 in Łasinie, Poland) - Polish geodesist, professor of technical sciences, professor at the Gdańsk University of Technology. Early life and education: He passed the matriculation examination in 1983 after finishing Jan III Sobieski High School in Grudziądz. He graduated from the Military University of Technology in an individual mode of study at the Faculty of Electromechanics and Civil Engineering...
-
A Power-Efficient Digital Technique for Gain and Offset Correction in Slope ADCs
Publication: In this brief, a power-efficient digital technique for gain and offset correction in slope analog-to-digital converters (ADCs) has been proposed. The technique is especially useful for imaging arrays with massively parallel image acquisition where simultaneous compensation of dark signal non-uniformity (DSNU) as well as photo-response non-uniformity (PRNU) is critical. The presented approach is based on stopping the ADC clock by...
-
Process arrival pattern aware algorithms for acceleration of scatter and gather operations
Publication: Imbalanced process arrival patterns (PAPs) are ubiquitous in many parallel and distributed systems, especially in HPC ones. The collective operations, e.g. in MPI, are designed for equal process arrival times (PATs), and are not optimized for deviations in their appearance. We propose eight new PAP-aware algorithms for the scatter and gather operations. They are binomial or linear tree adaptations introducing additional process...
-
Theory and implementation of a virtualisation level Future Internet defence in depth architecture
Publication: An EU Future Internet Engineering project currently underway in Poland defines three parallel internets (PIs). The emerging IIP system (IIPS, abbreviating the project’s Polish name) has a four-level architecture, with level 2 responsible for creation of virtual resources of the PIs. This paper proposes a three-tier security architecture to address level 2 threats of unauthorised traffic injection and IIPS traffic manipulation...
-
Scheduling with Complete Multipartite Incompatibility Graph on Parallel Machines
Publication: In this paper we consider a problem of job scheduling on parallel machines in the presence of incompatibilities between jobs. The incompatibility relation can be modeled as a complete multipartite graph in which each edge denotes a pair of jobs that cannot be scheduled on the same machine. Our research stems from the works of Bodlaender, Jansen, and Woeginger (1994) and Bodlaender and Jansen (1993). In particular, we pursue the...
-
A Fail-Safe NVRAM Based Mechanism for Efficient Creation and Recovery of Data Copies in Parallel MPI Applications
Publication: The paper presents a fail-safe NVRAM-based mechanism for creation and recovery of data copies during parallel MPI application runtime. Specifically, we target a cluster environment in which each node has an NVRAM installed in it. Our previously developed extension to the MPI I/O API can take advantage of NVRAM regions in order to provide an NVRAM-based cache-like mechanism to significantly speed up I/O operations and allow preloading...
-
Behavior Analysis and Dynamic Crowd Management in Video Surveillance System
Publication: A concept and practical implementation of a crowd management system which acquires input data from a set of monitoring cameras is presented. Two leading threads are considered. The first concerns the crowd behavior analysis. The second focuses on detection of hold-ups in the doorway. The optical flow combined with soft computing methods (neural network) is employed to evaluate the type of crowd behavior, and fuzzy logic aids detection...
-
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
Publication: Modern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems that we wish to solve efficiently using those systems have different complexity, not only considering magnitude, but also the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
-
Distributed NVRAM Cache – Optimization and Evaluation with Power of Adjacency Matrix
Publication: In this paper we build on our previously proposed MPI I/O NVRAM distributed cache for high performance computing. In each cluster node it incorporates NVRAMs which are used as an intermediate cache layer between an application and a file for fast read/write operations supported through wrappers of MPI I/O functions. In this paper we propose optimizations of the solution including handling of write requests with a synchronous mode,...
-
A novel hybrid adaptive framework for support vector machine-based reliability analysis: A comparative study
Publication: This study presents an innovative hybrid Adaptive Support Vector Machine - Monte Carlo Simulation (ASVM-MCS) framework for reliability analysis in complex engineering structures. These structures often involve highly nonlinear implicit functions, making traditional gradient-based first or second order reliability algorithms and Monte Carlo Simulation (MCS) time-consuming. The application of surrogate models has proven effective...
-
Planning optimised multi-tasking operations under the capability for parallel machining
Publication: The advent of advanced multi-tasking machines (MTMs) in the metalworking industry has provided the opportunity for more efficient parallel machining as compared to traditional sequential processing. It entailed the need for developing appropriate reasoning schemes for efficient process planning to take advantage of machining capabilities inherent in these machines. This paper addresses an adequate methodical approach for a non-linear...
-
Implementation of spatial/polarization diversity for improved-performance circularly polarized multiple-input-multiple-output ultra-wideband antenna
Publication: In this paper, spatial and polarization diversities are simultaneously implemented in an ultra-wideband (UWB) multiple-input-multiple-output (MIMO) antenna to reduce the correlation between the parallel-placed radiators. The keystone of the antenna is systematically modified coplanar ground planes that enable excitation of circular polarization (CP). To realize one sense of circular polarization as well as ultra-wideband operation,...
-
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication: In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark, which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for the desired trade-off of performance...
-
Multi-source-supplied parallel hybrid propulsion of the inland passenger ship STA.H. Research work on energy efficiency of a hybrid propulsion system operating in the electric motor drive mode
Publication: In the Faculty of Ocean Engineering and Ship Technology, Gdansk University of Technology, a design of a small inland ship with a hybrid propulsion and supply system has recently been developed. The ship will be propelled by a specially designed, so-called parallel hybrid propulsion system. The work was aimed at carrying out the energy efficiency analysis of a hybrid propulsion system operating in the electric motor drive mode and at...
-
Hydrodynamic reanalysis of currents in the Baltic Sea using the PM3D model
Open Research Data: The dataset contains the results of numerical modeling of currents in the Baltic Sea since 1998. A long-term reanalysis was performed using a three-dimensional hydrodynamic model PM3D (Kowalewski and Kowalewska-Kalkowska, 2017), a new version of the M3D model (Kowalewski, 1997).
-
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
Publication: In the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs, and extended CUDA calls allow managing CUDA objects and data and launching kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...
-
Investigation of Mechanical and Microstructural Properties of Welded Specimens of AA6061-T6 Alloy with Friction Stir Welding and Parallel Friction Stir Welding Methods
Publication: The present study investigates the effect of two parameters, process type and tool offset, on the tensile, microhardness, and microstructure properties of AA6061-T6 aluminum alloy joints. Three methods of Friction Stir Welding (FSW), Advancing Parallel-Friction Stir Welding (AP-FSW), and Retreating Parallel-Friction Stir Welding (RP-FSW) were used. In addition, four modes of 0.5, 1, 1.5, and 2 mm of tool offset were used in two welding...
-
Experimental Research on the Energy Efficiency of a Parallel Hybrid Drive for an Inland Ship
Publication: The growing requirements for limiting the negative impact of all modes of transport on the natural environment mean that clean technologies are becoming more and more important. The global trend of e-mobility also applies to sea and inland water transport. This article presents the results of experimental tests carried out on a life-size, parallel diesel-electric hybrid propulsion system. The efficiency of the propulsion system was...
-
Scheduling with Complete Multipartite Incompatibility Graph on Parallel Machines: Complexity and Algorithms
Publication: In this paper, the problem of scheduling on parallel machines in the presence of incompatibilities between jobs is considered. The incompatibility relation can be modeled as a complete multipartite graph in which each edge denotes a pair of jobs that cannot be scheduled on the same machine. The paper provides several results concerning schedules, optimal or approximate with respect to the two most popular criteria of optimality:...
-
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
Publication: In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPUs tested represent both multi-core type CPUs such as Intel® Xeon® E5, desktop and mobile i7 as well as many-core Intel® Xeon Phi™ x200 but also server, desktop...
-
Long-term hindcast simulation of sea ice in the Baltic Sea
Open Research Data: The data set contains the results of numerical modeling of sea ice over a period of 50 years (1958-2007) in the Baltic Sea. A long-term hindcast simulation was performed using a three-dimensional hydrodynamic model PM3D (Kowalewski and Kowalewska-Kalkowska, 2017), a new version of the M3D model (Kowalewski, 1997). A numerical dynamic-thermodynamic model...
-
Hydrodynamic reanalysis of ice conditions in the Baltic Sea using the PM3D model
Open Research Data: The dataset contains the results of numerical modeling of sea ice in the Baltic Sea since 1998. A long-term reanalysis was performed using the three-dimensional hydrodynamic model PM3D (Kowalewski and Kowalewska-Kalkowska, 2017), a new version of the M3D model (Kowalewski, 1997). A numerical dynamic-thermodynamic model of sea ice (Herman et al. 2011)...
-
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
Publication: The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...
-
Identification of nonstationary processes using noncausal bidirectional lattice filtering
Publication: The problem of off-line identification of a nonstationary autoregressive process with a time-varying order and a time-varying degree of nonstationarity is considered and solved using the parallel estimation approach. The proposed parallel estimation scheme is made up of several bidirectional (noncausal) exponentially weighted lattice algorithms with different estimation memory and order settings. It is shown that optimization of...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
Publication: This work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Rigid finite elements and multibody modeling in analyses of a robot shaped elastic/plastic deformations of a beam
Publication: Dynamics analysis of a system composed of a parallel manipulator and of an elastic beam is presented in the paper. A classic 3RRR parallel manipulator is considered and used to deform the beam. Elasto-plastic deformations are investigated. The rigid-finite-elements technique is employed to deal with dynamics of the beam. A multibody structure is associated with the introduced hybrid system in order to model its dynamics. Idea of the...
-
A Novel Coupling Matrix Synthesis Technique for Generalized Chebyshev Filters With Resonant Source–Load Connection
Publication: This paper reports a novel synthesis method for microwave bandpass filters with resonant source–load connection. In effect, a network realizing N+1 transmission zeros (where N is the number of reflection zeros) is obtained. The method is based on a prototype transversal coupling matrix (N+2, N+2) with source and load connected by a resonant circuit formed by a capacitor in parallel with a frequency-invariant susceptance. To complement...
-
Multipulse inverter structures with low voltage distortion
Publication: A novel approach to the construction of voltage source inverters (VSIs) is presented in the paper. The invented inverter structures allow operating several DC/AC converters in parallel, resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSIs, which is possible thanks to the use of coupled inductors. The eighteen-pulse three-level...
-
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
Publication: This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...
-
Modular multipulse voltage source inverters with integrating coupled reactors
Publication: A novel approach to the construction of voltage source inverters (VSIs) is presented in the paper. The invented inverter structures allow operating several DC/AC converters in parallel, resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSIs, which is possible thanks to the use of coupled inductors. The eighteen-pulse and twenty-four-pulse...
-
Scalable Measurement System for Multiple Impedance Gas Sensors
Publication: The author proposes a scalable architecture of a measurement system for gas sensors whose impedance depends on the gas concentration. The main part of the system is a single-board impedance analyser. The number of analysers working in parallel can be configured according to the specific application. The system is controlled by a single computer which organises the measurement cycle and stores the acquired measurement data. The system is...
-
Scheduling of compatible jobs on parallel machines
Publication: The dissertation discusses the problems of scheduling compatible jobs on parallel machines. Some jobs are incompatible, which is modeled as a binary relation on the set of jobs; the relation is often modeled by an incompatibility graph. We consider two models of machines. The first model, more emphasized in the thesis, is a classical model of scheduling, where each machine does one job at a time. The second one is a model of p-batching...
-
A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache
Publication: While many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...
-
Hydrodynamic reanalysis of sea level in the Baltic Sea using the PM3D model
Open Research Data: The data set contains the results of numerical modelling of sea level fluctuations in the Baltic Sea since 1998. A long-term reanalysis was performed using a three-dimensional hydrodynamic model PM3D (Kowalewski and Kowalewska-Kalkowska, 2017), a new version of the M3D model (Kowalewski, 1997).
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication: In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
-
Unraveling the Interplay between DNA and Proteins: A Computational Exploration of Sequence and Structure-Specific Recognition Mechanisms
Publication: My PhD dissertation focused on DNA-protein interactions and the recognition of specific DNA sequences and structures. I discovered that acidic amino acid residues (Asp/Glu) play a crucial role by exhibiting a preference for cytosine. Their contribution to binding affinity depends on nearby cytosines, balancing electrostatic repulsion with specific interactions. Acidic residues act as negative selectors, discouraging non-cytosine...
-
Self-optimizing generalized adaptive notch filters - comparison of three optimization strategies
Publication: The paper provides a comparison of three different approaches to on-line tuning of generalized adaptive notch filters (GANFs), the algorithms used for identification/tracking of quasi-periodically varying dynamic systems. Tuning is needed to adjust adaptation gains, which control tracking performance of ANF algorithms, to the unknown and/or time-varying rate of system nonstationarity. Two out of three compared approaches are classical...
-
Hydrodynamic reanalysis of water temperature and salinity in the Baltic Sea using the PM3D model
Open Research Data: The dataset contains the results of numerical modeling of water temperature and salinity in the Baltic Sea since 1998. A long-term reanalysis was performed using the three-dimensional hydrodynamic model PM3D (Kowalewski and Kowalewska-Kalkowska, 2017), a new version of the M3D model (Kowalewski, 1997). A numerical dynamic-thermodynamic model of sea...