Filtry
wszystkich: 756
-
Katalog
Wyniki wyszukiwania dla: PARALLEL
-
ARUZ — Large-scale, massively parallel FPGA-based analyzer of real complex systems
Publikacja -
Parallel simulations of electrophysiological phenomena in myocardium on large 32 and 64-bit Linux clusters.
PublikacjaW pracy podjęto badania i przeprowadzono symulacje zjawisk elektrofizjologicznych w mięśniu sercowym z wykorzystaniem wytworzonego w tym celu oprogramowania równoległego opartego na MPI. Zaimplementowano i zbadano ulepszenia kodu prowadzące do uzyskania dobrej skalowalności oraz przeprowadzono testy wydajności na najnowszych 32 i 64-bitowych klastrach linuksowych. Praca stanowi próbę równoległej implementacji znanego podejścia...
-
Portable parallel simulator using MPI for 2D and 3D domains: design and performance testing
PublikacjaW artykule prezentujemy szczegóły projektowo-implementacyjne naszego modularnego kodu symulacyjnego z wykorzystaniem MPI, w tym nakładaniem obliczeń i komunikacji. Podkreślamy modularność naszej implementacji pozwalającą na łatwą adaptację kodu dla innych zasotosowań. Prezentujemy związek pomiędzy przyspieszeniem obliczeń, rozmiarem i kształtami trójwymiarowych domen z różnymi stosunkami liczby węzłów aktualizowanych przez procesor...
-
Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit
PublikacjaParallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid enabling the perfect hybridization of FDTD...
-
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
PublikacjaIn the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...
-
Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs
PublikacjaThe paper proposes an approach for parallelization of computations across a collection of clusters with heterogeneous nodes with both GPUs and CPUs. The proposed system partitions input data into chunks and assigns to par- ticular devices for processing using OpenCL kernels defined by the user. The sys- tem is able to minimize the execution time of the application while maintaining the power consumption of the utilized GPUs and...
-
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
PublikacjaRapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
PublikacjaIn the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming
PublikacjaIn the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...
-
DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution
PublikacjaThe solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential -- a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...
-
Feedline Alterations for Optimization-Based Design of Compact Super-Wideband MIMO Antennas in Parallel Configuration
PublikacjaThis letter presents a technique for size reduction of wideband multiple-input-multiple-output (MIMO) antennas. Our approach is a two-stage procedure. At the first stage, the antenna structure is modified to improve its impedance matching. This is achieved through incorporation of an n-section tapered feedline, followed by reoptimization of geometry parameters. Reducing the maximum in-band reflection well beyond the acceptance...
-
Construction of highly stable parallel two-step Runge-Kutta methods for delay differential equations
PublikacjaW pracy pokazano, że każda A-stabilna dwukrokowa metoda Rungego-Kutty dla równań różniczkowych zwyczajnych rzędu p1 i rzędu etapowego q=p1 może być uogólniona do P-stabilnej metody dla równań różniczkowych z opóźnieniem zbieżnej jednostajnie z rzędem p=p1.
-
Modelling of First- and Second-order Chemical Reactions on ARUZ – Massively-parallel FPGA-based Machine
Publikacja -
Carbonized Lanthanum-Based Metal-Organic Framework with Parallel Arranged Channels for Azo-Dye Adsorption
Publikacja -
A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache
PublikacjaWhile many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...
-
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
PublikacjaBecause of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...
-
A CMOS Pixel With Embedded ADC, Digital CDS and Gain Correction Capability for Massively Parallel Imaging Array
PublikacjaIn the paper, a CMOS pixel has been proposed for imaging arrays with massively parallel image acquisition and simultaneous compensation of dark signal nonuniformity (DSNU) as well as photoresponse nonuniformity (PRNU). In our solution the pixel contains all necessary functional blocks: a photosensor and an analog-to-digital converter (ADC) with built-in correlated double sampling (CDS) integrated together. It is implemented in...
-
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
PublikacjaIn the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...
-
A Fail-Safe NVRAM Based Mechanism for Efficient Creation and Recovery of Data Copies in Parallel MPI Applications
PublikacjaThe paper presents a fail-safe NVRAM based mechanism for creation and recovery of data copies during parallel MPI application runtime. Specifically, we target a cluster environment in which each node has an NVRAM installed in it. Our previously developed extension to the MPI I/O API can take advantage of NVRAM regions in order to provide an NVRAM based cache like mechanism to significantly speed up I/O operations and allow to preload...
-
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
PublikacjaThe paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublikacjaIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...
-
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
Publikacja -
Taking advantage of the shared explicit cache system based critical sections in the shared memory parallel architectures
PublikacjaArtykuł prezentuje nową metodę implementacji sekcji krytycznych w równoległych architekturach z pamięcią współdzieloną, takich jak systemy zintegrowane wielowątkowe wieloprocesorowe. Metoda stanowi modyfikację i rozbudowanie metody zwanej Folding, dostępnej w procesorach sieciowych oraz jest w założeniach podobna do techniki zwanej cache-based locking. W porównaniu do dostępnych metod, nowa metoda usuwa problemy skalowalności i...
-
Przetwarzanie Równoległe CUDA/Parallel processing on CUDA
Kursy Online -
Parallel Processing Letters
Czasopisma -
Effective configuration of a double triad planar parallel manipulator for precise positioning of heavy details during their assembling process
PublikacjaIn the paper, dynamics analysis of a parallel manipulator is presented. It is an atypical manipulator, devoted to help in assembling of heavy industrial constructions. Few atypical properties are required: small workspace; slow velocities; high loads. Initially, a short discussion about definition of the parallel manipulators is presented, as well as the sketch of the proposed structure. In parallel, some definitions, assumptions...
-
Low-Power Receivers for Wireless Capacitive Coupling Transmission in 3-D-Integrated Massively Parallel CMOS Imager
PublikacjaThe paper presents pixel receivers for massively parallel transmission of video signal between capacitive coupled integrated circuits (ICs). The receivers meet the key requirements for massively parallel transmission, namely low-power consumption below a single μW, small area of less than 205 μm2, high sensitivity better than 160 mV, and good immunity to crosstalk. The receivers were implemented and measured in a 3-D IC (two face-to-face...
-
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
PublikacjaAuto-tuning of configuration and application param- eters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...
-
Infrared techniques for natural convection investigations in channels between two vertical, parallel, isothermal and symmetrically heated plates
PublikacjaThe effect of the gap width between two symmetrically heated vertical, parallel, isothermal plates on intensity of natural convective heat transfer in a gas (Pr = 0.71) was experimentally studied using the balance and gradient methods. In the former method heat fluxes were determined based on measurements of the voltage and electric current supplying the heaters placed inside the walls. In the latter, heat fluxes were calculated...
-
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
PublikacjaIn the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPU tested represent both multi-core type CPUs such as Intel⃝R Xeon⃝R E5, desktop and mobile i7 as well as many-core Intel⃝R Xeon PhiTM x200 but also server, desktop...
-
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
PublikacjaThe paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...
-
Molecular Simulations Using Boltzmann’s Thermally Activated Diffusion - Implementation on ARUZ – Massively-parallel FPGA-based Machine
Publikacja -
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
PublikacjaThe paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
Adaptive system for recognition of sounds indicating threats to security of people and property employing parallel processing of audio data streams
PublikacjaA system for recognition of threatening acoustic events employing parallel processing on a supercomputing cluster is featured. The methods for detection, parameterization and classication of acoustic events are introduced. The recognition engine is based onthreshold-based detection with adaptive threshold and Support Vector Machine classifcation. Spectral, temporal and mel-frequency descriptors are used as signal features. The...
-
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
PublikacjaMany important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power of such systems requires programming parallel applications that are hybrid in two meanings: they can utilize parallelism on multiple levels at the same time and combine together programming interfaces...
-
Improved conformational space annealing method to treat β-structure with the UNRES force-field and to enhance scalability of parallel implementation
Publikacja -
Air Pollution Research Based on Spider Web and Parallel Continuous Particulate Monitoring—A Comparison Study Coupled with Identification of Sources
Publikacja -
Checkpointing of Parallel MPI Applications using MPI One-sided API with Support for Byte-addressable Non-volatile RAM
PublikacjaThe increasing size of computational clusters results in an increasing probability of failures, which in turn requires application checkpointing in order to survive those failures. Traditional checkpointing requires data to be copied from application memory into persistent storage medium, which increases application execution time as it is usually done in a separate step. In this paper we propose to use emerging byte-addressable...
-
Numerical Study on Mitigation of Flow Maldistribution in Parallel Microchannel Heat Sink: Channels Variable Width Versus Variable Height Approach
PublikacjaMicrochannel heat sink on one hand enjoys benefits of intensified several folds heat transfer performance but on the other hand has to suffer aggravated form of trifling limitations associated with imperfect hydrodynamics and heat transfer behavior. Flow maldistribution is one of such limitation that exaggerates temperature nonuniformity across parallel microchannels leading to increase in maximum base temperature. Recently, variable...
-
Massively parallel linear-scaling Hartree–Fock exchange and hybrid exchange–correlation functionals with plane wave basis set accuracy
PublikacjaWe extend our linear-scaling approach for the calculation of Hartree–Fock exchange energy using localized in situ optimized orbitals [Dziedzic et al., J. Chem. Phys. 139, 214103 (2013)] to leverage massive parallelism. Our approach has been implemented in the ONETEP (Order-N Electronic Total Energy Package) density functional theory framework, which employs a basis of non-orthogonal generalized Wannier functions (NGWFs) to achieve...
-
Redundantly Actuated 3RRR Parallel Planar Manipulator - Numerical Analyses of its Dynamics Sensitivity on Modifications of its Platform’s Inertia Parameters
PublikacjaIn the paper, numerical analyses, as well as dynamics of a complex mechanism, are presented. Two objectives are crucial for the paper: inverse dynamic model is needed (dedicated to be use in the model predictive controller); an identification method is searched (some trajectory parameters are controlled, when specific trajectory is tracked under an open-loop model-based control), as selected parameters must be identified for the...
-
Investigation of Mechanical and Microstructural Properties of Welded Specimens of AA6061-T6 Alloy with Friction Stir Welding and Parallel Friction Stir Welding Methods
PublikacjaThe present study investigates the effect of two parameters of process type and tool offset on tensile, microhardness, and microstructure properties of AA6061-T6 aluminum alloy joints. Three methods of Friction Stir Welding (FSW), Advancing Parallel-Friction Stir Welding (AP-FSW), and Retreating Parallel-Friction Stir Welding (RP-FSW) were used. In addition, four modes of 0.5, 1, 1.5, and 2 mm of tool offset were used in two welding...
-
Grid Implementation of a Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution Systems: Chojnice Case Study
PublikacjaSolving multiobjective optimization problems requires suitable algorithms to find a satisfactory approximation of a globally optimal Pareto front. Furthermore, it is a computationally demanding task. In this paper, the grid implementation of a distributed multiobjective genetic algorithm is presented. The distributed version of the algorithm is based on the island algorithm with forgetting island elitism used instead of a genetic...
-
Neural, Parallel and Scientific Computations
Czasopisma -
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
Czasopisma -
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
Czasopisma -
Algorytmy równoległe i rozproszone/Parallel and distributed >> algorithms
Kursy Online -
Multiprocessor implementation of Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution System - a new water quality model approach
Publikacja -
Multiprocessor Implementation of Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution System a New Water Quality Model Approach
PublikacjaThe Critical Infrastructure Systems (CISs) have received in recent years a considerable attention due to their heavy impact on sustainable development of modern societies. Most CISs may be classied as large scale complex systems of network structure, in uenced by strong interactions form the surrounding environment, internal and external interconnections. The later is a result of inter-CIS dependencies. The control, monitoring...
-
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster . Rozpoznawanie niebezpiecznych zdarzeń dźwiękowych z wykorzystaniem równoległego przetwarzania na klastrze superkomputerowym
PublikacjaA method for automatic recognition of hazardous acoustic events operating on a super computing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided: both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...