Search results for: parallel and dispersed systems
-
Performance/energy aware optimization of parallel applications on GPUs under power capping
Publication: In the paper we present an approach and results from applying the modern power capping mechanism available for NVIDIA GPUs to benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark, which are widely used to assess the performance of high performance computing systems. Specifically, depending on the benchmark, various power cap configurations are best for the desired trade-off of performance...
-
Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications
Publication: The aim of this paper is to evaluate the performance of new CUDA mechanisms, unified memory and dynamic parallelism, for real parallel applications compared to standard CUDA API versions. In order to gain insight into the performance of these mechanisms, we decided to implement three applications with control and data flow typical of SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically,...
-
A Parallel Corpus-Based Approach to the Crime Event Extraction for Low-Resource Languages
Publication: These days, a lot of crime-related events take place all over the world. Most of them are reported in news portals and social media. Crime-related event extraction from the published texts can allow monitoring, analysis, and comparison of police or criminal activities in different countries or regions. Existing approaches to event extraction mainly suggest processing texts in English, French, Chinese, and some other resource-rich...
-
The influence of type of dispersed phase on rolling contact fatigue of lubricating greases on mineral base oil
Publication -
High power, zero ripples active filtering system with power modules operating in parallel
Publication -
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
Publication: This paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously uses the computational power of the central processing unit (CPU) and the graphics processing unit (GPU) for the computational tasks best suited to each architecture. Recently, a closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...
-
Efficient parallel algorithms in global optimization of potential energy functions for peptides, proteins, and crystals
Publication -
Parallel implementation of background subtraction algorithms for real-time video processing on a supercomputer platform
Publication: Results of the evaluation of background subtraction algorithms implemented in parallel on a supercomputer platform are presented in the paper. The aim of the work is to choose an algorithm, a number of threads and a task scheduling method that together provide satisfactory accuracy and efficiency of real-time processing of high resolution camera images, while keeping the cost of resource usage at a reasonable level. Two...
-
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication: The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...
-
Parallel simulations of electrophysiological phenomena in myocardium on large 32 and 64-bit Linux clusters.
Publication: The work investigates and simulates electrophysiological phenomena in the myocardium using parallel MPI-based software developed for this purpose. Code improvements leading to good scalability were implemented and studied, and performance tests were carried out on the latest 32- and 64-bit Linux clusters. The work is an attempt at a parallel implementation of a known approach...
-
Portable parallel simulator using MPI for 2D and 3D domains: design and performance testing
Publication: In the article we present design and implementation details of our modular MPI-based simulation code, including overlapping of computation and communication. We emphasize the modularity of our implementation, which allows the code to be easily adapted to other applications. We present the relationship between the speed-up of computations and the size and shapes of three-dimensional domains with various ratios of the number of nodes updated by a processor...
-
Multi-agent large-scale parallel crowd simulation with NVRAM-based distributed cache
Publication: This paper presents the architecture, main components and performance results for a parallel and modular agent-based environment aimed at crowd simulation. The environment can simulate thousands or more agents on maps of square kilometers or more, features a modular design and incorporates non-volatile RAM (NVRAM) with a fail-safe mode that can be activated to allow computations to continue from a recently analyzed state in...
-
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
Publication: We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...
-
Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform
Publication: Performance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...
-
Graphical presentation of the power of energy losses and power developed in the elements of hydrostatic drive and control system. Part II. Rotational hydraulic motor speed parallel throttling control and volumetric control systems
Publication: A graphical interpretation is presented of the power of energy losses occurring in the elements of hydrostatic drive and control systems, as well as of the power developed by these elements. The analysis covers an individual system with parallel throttling control of rotational hydraulic motor speed, an individual system with volumetric control of rotational hydraulic motor speed by a variable capacity pump,...
-
Construction of highly stable parallel two-step Runge-Kutta methods for delay differential equations
Publication: The paper shows that every A-stable two-step Runge-Kutta method for ordinary differential equations of order p1 and stage order q=p1 can be generalized to a P-stable method for delay differential equations that converges uniformly with order p=p1.
-
Modelling of First- and Second-order Chemical Reactions on ARUZ – Massively-parallel FPGA-based Machine
Publication -
Carbonized Lanthanum-Based Metal-Organic Framework with Parallel Arranged Channels for Azo-Dye Adsorption
Publication -
DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution
Publication: The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential, a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...
-
Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs
Publication: The paper proposes an approach for parallelization of computations across a collection of clusters with heterogeneous nodes with both GPUs and CPUs. The proposed system partitions input data into chunks and assigns them to particular devices for processing using OpenCL kernels defined by the user. The system is able to minimize the execution time of the application while maintaining the power consumption of the utilized GPUs and...
-
Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit
Publication: Parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid enabling the perfect hybridization of FDTD...
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
Publication: In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
Publication: In the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...
-
Feedline Alterations for Optimization-Based Design of Compact Super-Wideband MIMO Antennas in Parallel Configuration
Publication: This letter presents a technique for size reduction of wideband multiple-input-multiple-output (MIMO) antennas. Our approach is a two-stage procedure. At the first stage, the antenna structure is modified to improve its impedance matching. This is achieved through incorporation of an n-section tapered feedline, followed by reoptimization of geometry parameters. Reducing the maximum in-band reflection well beyond the acceptance...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication: In the paper we tackle the bi-objective problem of optimizing execution time and power consumption for parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Electro oxidation of methanol and ethanol on poly(3,4-ethylenedioxythiophene) with dispersed Pt, Pt+Sn, and Pt+Pb particles.
Publication: The influence of tin and lead microparticles on the catalytic activity of platinum dispersed on a poly(3,4-ethylenedioxythiophene) polymer support towards the electrode oxidation of ethanol and methanol was determined using the chronovoltammetric method, X-ray photoelectron spectroscopy (XPS) and an electrochemical microbalance. It was shown that both tin and lead increase the rate of the process...
-
A Fail-Safe NVRAM Based Mechanism for Efficient Creation and Recovery of Data Copies in Parallel MPI Applications
Publication: The paper presents a fail-safe NVRAM based mechanism for creation and recovery of data copies during parallel MPI application runtime. Specifically, we target a cluster environment in which each node has an NVRAM installed in it. Our previously developed extension to the MPI I/O API can take advantage of NVRAM regions in order to provide an NVRAM based cache like mechanism to significantly speed up I/O operations and allow to preload...
-
A CMOS Pixel With Embedded ADC, Digital CDS and Gain Correction Capability for Massively Parallel Imaging Array
Publication: In the paper, a CMOS pixel has been proposed for imaging arrays with massively parallel image acquisition and simultaneous compensation of dark signal nonuniformity (DSNU) as well as photoresponse nonuniformity (PRNU). In our solution the pixel contains all necessary functional blocks: a photosensor and an analog-to-digital converter (ADC) with built-in correlated double sampling (CDS) integrated together. It is implemented in...
-
A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache
Publication: While many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...
-
Taking advantage of the shared explicit cache system based critical sections in the shared memory parallel architectures
Publication: The article presents a new method of implementing critical sections in shared-memory parallel architectures, such as integrated multithreaded multiprocessor systems. The method is a modification and extension of the method called Folding, available in network processors, and is in principle similar to the technique called cache-based locking. Compared to the available methods, the new method removes the problems of scalability and...
-
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
Publication: Because of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...
-
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication: In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication: In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
-
Specificity of automatic control of micro-turbines (steam or gas -driven and expanders) in dispersed generation system of heat and electric power
Publication: This paper presents specific problems of automatic control of steam micro-turbines and expanders intended for dispersed, combined generation of heat and electric power. The investigations concern ensuring a reliable energy supply of the required quality.
-
Experimental analysis of wear resistance of compacts of fine-dispersed iron powder and tungsten monocarbide nanopowder produced by impulse pressing
Publication: The paper presents the results of studying the structure and wear resistance of compacts produced from fine dispersed reduced iron powder (average particle size 3–μm) with the addition of tungsten carbide (WC) nanopowder with the average particle size of 25–30 nm. The mass fraction of tungsten carbide (wolfram carbide) in the powder composition was 5% and 10% of the total mass. Impulse pressing was conducted using the modified...
-
Molecular Simulations Using Boltzmann’s Thermally Activated Diffusion - Implementation on ARUZ – Massively-parallel FPGA-based Machine
Publication -
Infrared techniques for natural convection investigations in channels between two vertical, parallel, isothermal and symmetrically heated plates
Publication: The effect of the gap width between two symmetrically heated vertical, parallel, isothermal plates on intensity of natural convective heat transfer in a gas (Pr = 0.71) was experimentally studied using the balance and gradient methods. In the former method heat fluxes were determined based on measurements of the voltage and electric current supplying the heaters placed inside the walls. In the latter, heat fluxes were calculated...
-
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
Publication: In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications: 2D heat distribution, numerical integration and Fast Fourier Transform. The CPUs tested represent both multi-core CPUs such as Intel® Xeon® E5, desktop and mobile i7 as well as many-core Intel® Xeon Phi™ x200 but also server, desktop...
-
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
Publication: The paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...
-
Effective configuration of a double triad planar parallel manipulator for precise positioning of heavy details during their assembling process
Publication: In the paper, a dynamics analysis of a parallel manipulator is presented. It is an atypical manipulator, intended to help in the assembly of heavy industrial constructions. A few atypical properties are required: small workspace, low velocities and high loads. Initially, a short discussion of the definition of parallel manipulators is presented, as well as a sketch of the proposed structure. In parallel, some definitions, assumptions...
-
Low-Power Receivers for Wireless Capacitive Coupling Transmission in 3-D-Integrated Massively Parallel CMOS Imager
Publication: The paper presents pixel receivers for massively parallel transmission of video signal between capacitive coupled integrated circuits (ICs). The receivers meet the key requirements for massively parallel transmission, namely low-power consumption below a single μW, small area of less than 205 μm², high sensitivity better than 160 mV, and good immunity to crosstalk. The receivers were implemented and measured in a 3-D IC (two face-to-face...
-
Air Pollution Research Based on Spider Web and Parallel Continuous Particulate Monitoring—A Comparison Study Coupled with Identification of Sources
Publication -
Improved conformational space annealing method to treat β-structure with the UNRES force-field and to enhance scalability of parallel implementation
Publication -
Checkpointing of Parallel MPI Applications using MPI One-sided API with Support for Byte-addressable Non-volatile RAM
Publication: The increasing size of computational clusters results in an increasing probability of failures, which in turn requires application checkpointing in order to survive those failures. Traditional checkpointing requires data to be copied from application memory into persistent storage medium, which increases application execution time as it is usually done in a separate step. In this paper we propose to use emerging byte-addressable...
-
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication: The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
Adaptive system for recognition of sounds indicating threats to security of people and property employing parallel processing of audio data streams
Publication: A system for recognition of threatening acoustic events employing parallel processing on a supercomputing cluster is featured. The methods for detection, parameterization and classification of acoustic events are introduced. The recognition engine is based on threshold-based detection with adaptive threshold and Support Vector Machine classification. Spectral, temporal and mel-frequency descriptors are used as signal features. The...
-
Measurements of Dispersed Phase Velocity in Two-Phase Flows in Pipelines Using Gamma-Absorption Technique and Phase of the Cross-Spectral Density Function
Publication: This paper concerns the application of the gamma radiation absorption method in the measurements of dispersed phase velocity in two-phase flows: liquid–gas flow in a horizontal pipeline and liquid–solid particles in a vertical pipe. Radiometric sets containing two linear ²⁴¹Am gamma radiation sources and two NaI(Tl) scintillation detectors were used in the research. Due to the stochastic nature of the signals obtained from the...
-
Transcriptome profiling reveals distinctive traits of retinol metabolism and neonatal parallels in the MRL/MpJ mouse
Publication: Background: The MRL/MpJ mouse is a laboratory inbred strain known for regenerative abilities which are manifested by scarless closure of ear pinna punch holes. Enhanced healing responses have been reported in other organs. A remarkable feature of the strain is that the adult MRL/MpJ mouse retains several embryonic biochemical characteristics, including increased expression of stem cell markers. Results: We explored the transcriptome...
-
On possible lowering fuel oil consumption by differentiating loads on ship diesel engines running in parallel.
Publication: Possibilities are presented for reducing the fuel consumption of marine engines running in parallel by differentiating their operating loads. The theoretical basis and the procedure for real-time fuel consumption measurements are presented.