Filters
total: 420
filtered: 410
Search results for: PARALLEL ALGORITHMS
-
Generalized regression neural network and fitness dependent optimization: Application to energy harvesting of centralized TEG systems
PublicationThe thermoelectric generator (TEG) system has attracted extensive attention because of its applications in centralized solar heat utilization and recoverable heat energy. The operating efficiency of the TEG system is highly affected by operating conditions. In a series-parallel structure, due to diverse temperature differences, the TEG modules show non-linear performance. Due to the non-uniform temperature distribution (NUTD) condition,...
-
Grid Implementation of a Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution Systems: Chojnice Case Study
PublicationSolving multiobjective optimization problems requires suitable algorithms to find a satisfactory approximation of a globally optimal Pareto front. Furthermore, it is a computationally demanding task. In this paper, the grid implementation of a distributed multiobjective genetic algorithm is presented. The distributed version of the algorithm is based on the island algorithm with forgetting island elitism used instead of a genetic...
-
Energy consumption optimization in wastewater treatment plants: Machine learning for monitoring incineration of sewage sludge
PublicationBiomass management in terms of energy consumption optimization has become a recent challenge for developed countries. Nevertheless, the multiplicity of materials and operating parameters controlling energy consumption in wastewater treatment plants necessitates the need for sophisticated well-organized disciplines in order to minimize energy consumption and dissipation. Sewage sludge (SS) disposal management is the key stage of...
-
Parallelization of Compute Intensive Applications into Workflows based on Services in BeesyCluster
PublicationThe paper presents an approach for modeling, optimization and execution of workflow applications based on services that incorporates both service selection and partitioning of input data for parallel processing by parallel workflow paths. A compute-intensive workflow application for parallel integration is presented. An impact of the input data partitioning on the scalability is presented. The paper shows a comparison of the theoretical...
-
Modeling Parallel Applications in the MERPSYS Environment
PublicationThe chapter presents how to model parallel computational applications for which simulation of execution in a large-scale parallel or distributed environment is performed within the MERPSYS environment. Specifically, it is shown what approaches can be adopted to model key paradigms often used for parallel applications: master-slave, geometric parallelism (single program multiple data), pipelined and divide-and-conquer applications....
-
Multi-agent large-scale parallel crowd simulation
PublicationThis paper presents design, implementation and performance results of a new modular, parallel, agent-based and large scale crowd simulation environment. A parallel application, implemented with C and MPI, was implemented and run in this parallel environment for simulation and visualization of an evacuation scenario at Gdansk University of Technology, Poland and further in the area of districts of Gdansk. The application uses a...
-
Block-based Representation of Application Execution on Modern Parallel Systems
PublicationThe chapter presents how to model execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses pre- viously attributes and factors representative of modern high performance systems including multicore CPUs, GPUs, dedicated accelerators such as Intel Phi.
-
Simulation of parallel similarity measure computations for large data sets
PublicationThe paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...
-
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
PublicationWe present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...
-
Rigid finite elements and multibody modeling in analyses of a robot shaped elastic/plastic deformations of a beam
PublicationDynamics analysis of a system composed of a parallel manipulator and of an elastic beam is presented in the paper. Classic 3RRR parallel manipulator is considered and used to deform the beam. Elasto-plastic deformations are investigated. Rigid-finite-elements technique is employed to deal with dynamics of the beam. A multibody structure is associated with the introduced hybrid system in order to model its dynamics. Idea of the...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
PublicationThis work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAM-PVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Multipulse inverter structures with low voltage distortion
PublicationA novel approach to the voltage source inverters (VSI) construction is presented in the paper. The invented inverter structures allow to operate several DC/AC converters in parallel resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSI’s which is possible thanks to the use of coupled inductors. The eighteen-pulse three-level...
-
Executing Multiple Simulations in the MERPSYS Environment
PublicationThe chapter investigates the steps necessary to perform a simulation instance in the MERPSYS environment and discusses potential limitations in case when vast numbers of simulations are required. An extended architecture is proposed which includes a JMS-based simulation queue and multiple distributed simulators, overcoming the potential bottlenecks. The chapter introduces also methods for preparing suites of multiple simulations...
-
Effective configuration of a double triad planar parallel manipulator for precise positioning of heavy details during their assembling process
PublicationIn the paper, dynamics analysis of a parallel manipulator is presented. It is an atypical manipulator, devoted to help in assembling of heavy industrial constructions. Few atypical properties are required: small workspace; slow velocities; high loads. Initially, a short discussion about definition of the parallel manipulators is presented, as well as the sketch of the proposed structure. In parallel, some definitions, assumptions...
-
Scalable Measurement System for Multiple Impedance Gas Sensors
PublicationAuthor proposes scalable architecture of the measurement system for gas sensor with impedance dependance of the gas concentration. The main part of the system is a single-board impedance analyser. The number of analysers working in parallel can be configured according to specific application. The system is controlled by a single computer which organises the measurement cycle and store the acquired measurement data. The system is...
-
Modular multipulse voltage source inverters with integrating coupled reactors
PublicationA novel approach to the voltage source inverters (VSI) construction is presented in the paper. The invented inverter structures allow to operate several DC/AC converters in parallel resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSI’s which is possible thanks to the use of coupled inductors. The eighteen-pulse and twenty-four-pulse...
-
50’ Sail Catamaran with Hybrid Propulsion, Design, Theoretical and Experimental Studies
PublicationThe development of modern lithium batteries and propulsion systems now allows the use of complex propulsion systems for vessels of various sizes. As part of the research and implementation project, a parallel hybrid drive system was designed, built and then tested in the laboratory. The experimental studies conducted allowed for the measurements of power, fuel consumption and electric power distribution in various operating modes...
-
A Workflow Application for Parallel Processing of Big Data from an Internet Portal
PublicationThe paper presents a workflow application for efficient parallel processing of data downloaded from an Internet portal. The workflow partitions input files into subdirectories which are further split for parallel processing by services installed on distinct computer nodes. This way, analysis of the first ready subdirectories can start fast and is handled by services implemented as parallel multithreaded applications using multiple...
-
Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit
PublicationParallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid enabling the perfect hybridization of FDTD...
-
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
PublicationIn the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...
-
Low-Power Receivers for Wireless Capacitive Coupling Transmission in 3-D-Integrated Massively Parallel CMOS Imager
PublicationThe paper presents pixel receivers for massively parallel transmission of video signal between capacitive coupled integrated circuits (ICs). The receivers meet the key requirements for massively parallel transmission, namely low-power consumption below a single μW, small area of less than 205 μm2, high sensitivity better than 160 mV, and good immunity to crosstalk. The receivers were implemented and measured in a 3-D IC (two face-to-face...
-
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
PublicationThis paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...
-
Performance evaluation of parallel background subtraction on GPU platforms
PublicationImplementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...
-
Parallel Computations of Text Similarities for Categorization Task
PublicationIn this chapter we describe the approach to parallel implementation of similarities in high dimensional spaces. The similarities computation have been used for textual data categorization. A test datasets we create from Wikipedia articles that with their hyper references formed a graph used in our experiments. The similarities based on Euclidean distance and Cosine measure have been used to process the data using k-means algorithm....
-
Modeling SPMD Application Execution Time
PublicationParallel applications in a Single Process Multiple Data paradigm assume splitting huge amounts of data to multiple processors working in parallel at small data packets. As the individual data packets are not independent, the processors must interact with each other to exchange results of the calculations with their adjacent partners and take these results into account in their own computations. An example of SPMD is geometric parallelism...
-
Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs
PublicationThe paper proposes an approach for parallelization of computations across a collection of clusters with heterogeneous nodes with both GPUs and CPUs. The proposed system partitions input data into chunks and assigns to par- ticular devices for processing using OpenCL kernels defined by the user. The sys- tem is able to minimize the execution time of the application while maintaining the power consumption of the utilized GPUs and...
-
Image Processing Techniques for Distributed Grid Applications
PublicationParallel approaches to 2D and 3D convolution processing of series of images have been presented. A distributed, practically oriented, 2D spatial convolution scheme has been elaborated and extended into the temporal domain. Complexity of the scheme has been determined and analysed with respect to coefficients in convolution kernels. Possibilities of parallelisation of the convolution operations have been analysed and the results...
-
Two Stage SVM and kNN Text Documents Classifier
PublicationThe paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
-
Tryton Supercomputer Capabilities for Analysis of Massive Data Streams
PublicationThe recently deployed supercomputer Tryton, located in the Academic Computer Center of Gdansk University of Technology, provides great means for massive parallel processing. Moreover, the status of the Center as one of the main network nodes in the PIONIER network enables the fast and reliable transfer of data produced by miscellaneous devices scattered in the area of the whole country. The typical examples of such data are streams...
-
The complexity of bicriteria tree-depth
PublicationThe tree-depth problem can be seen as finding an elimination tree of minimum height for a given input graph G. We introduce a bicriteria generalization in which additionally the width of the elimination tree needs to be bounded by some input integer b. We are interested in the case when G is the line graph of a tree, proving that the problem is NP-hard and obtaining a polynomial-time additive 2b-approximation algorithm. This particular...
-
In-ADC, Rank-Order Filter for Digital Pixel Sensors
PublicationThis paper presents a new implementation of the rank-order filter, which is established on a parallel-operated array of single-slope (SS) analog-to-digital converters (ADCs). The SS ADCs use an “on-the-ramp processing” technique, i.e., filtration is performed along with analog-to-digital conversion, so the final states of the converters represent a filtered image. A proof-of-concept 64 × 64 array of SS ADCs, integrated with MOS...
-
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems
PublicationIn this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...
-
Parallel Programming for Modern High Performance Computing Systems
PublicationIn view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...
-
ENERGY EFFICIENT AND ENVIRONMENTALLY FRIENDLY HYBRID CONVERSION OF INLAND PASSENGER VESSEL
PublicationThe development and growing availability of modern technologies, along with more and more severe environment protection standards which frequently take a form of legal regulations, are the reason why attempts are made to find a quiet and economical propulsion system not only for newly built watercraft units, but also for modernised ones. Correct selection of the propulsion and supply system for a given vessel affects significantly...
-
Numerical Study on Mitigation of Flow Maldistribution in Parallel Microchannel Heat Sink: Channels Variable Width Versus Variable Height Approach
PublicationMicrochannel heat sink on one hand enjoys benefits of intensified several folds heat transfer performance but on the other hand has to suffer aggravated form of trifling limitations associated with imperfect hydrodynamics and heat transfer behavior. Flow maldistribution is one of such limitation that exaggerates temperature nonuniformity across parallel microchannels leading to increase in maximum base temperature. Recently, variable...
-
Modeling energy consumption of parallel applications
PublicationThe paper presents modeling and simulation of energy consumption of two types of parallel applications: geometric Single Program Multiple Data (SPMD) and divide-and-conquer (DAC). Simulation is performed in a new MERPSYS environment. Model of an application uses the Java language with extension representing message exchange between processes working in parallel. Simulation is performed by running threads representing distinct process...
-
Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment
PublicationThe paper presents design, implementation and real life uses of a visualization subsystem for a distributed framework for parallelization of workflow-based computations among clusters with nodes that feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs of load, memory available, fan speeds etc. Secondly,...
-
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
PublicationRapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
Performance and Power-Aware Modeling of MPI Applications for Cluster Computing
PublicationThe paper presents modeling of performance and power consumption when running parallel applications on modern cluster-based systems. The model includes basic so-called blocks representing either computations or communication. The latter includes both point-to-point and collective communication. Real measurements were performed using MPI applications and routines run on three different clusters with both Infiniband and Gigabit Ethernet...
-
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems
PublicationThis paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals,...
-
A Solution to Image Processing with Parallel MPI I/O and Distributed NVRAM Cache
PublicationThe paper presents a new approach to parallel image processing using byte addressable, non-volatile memory (NVRAM). We show that our custom built MPI I/O implementation of selected functions that use a distributed cache that incorporates NVRAMs located in cluster nodes can be used for efficient processing of large images. We demonstrate performance benefits of such a solution compared to a traditional implementation without NVRAM...
-
A New Approach for the Mitigating of Flow Maldistribution in Parallel Microchannel Heat Sink
PublicationThe problem of flow maldistribution is very critical in microchannel heat sinks (MCHS). It induces temperature nonuniformity, which may ultimately lead to the breakdown of associated system. In the present communication, a novel approach for the mitigation of flow maldistribution problem in parallel MCHS has been proposed using variable width microchannels. Numerical simulation of copper made parallel MCHS consisting of 25 channels...
-
Testing for conformance of parallel programming pattern languages
PublicationThis paper reports on the project being run by TUG and IMAG, aimed at reducing the volume of tests required to exercise parallel programming language compilers and libraries. The idea is to use the ISO STEP standard scheme for conformance testing of software products. A detailed example illustrating the ongoing work is presented.
-
Bounds on the Cover Time of Parallel Rotor Walks
PublicationThe rotor-router mechanism was introduced as a deterministic alternative to the random walk in undirected graphs. In this model, a set of k identical walkers is deployed in parallel, starting from a chosen subset of nodes, and moving around the graph in synchronous steps. During the process, each node maintains a cyclic ordering of its outgoing arcs, and successively propagates walkers which visit it along its outgoing arcs in...
-
Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations
PublicationBecause of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...
-
Machine Learning in Multi-Agent Systems using Associative Arrays
PublicationIn this paper, a new machine learning algorithm for multi-agent systems is introduced. The algorithm is based on associative arrays, thus it becomes less complex and more efficient substitute of artificial neural networks and Bayesian networks, which is confirmed by performance measurements. Implementation of machine learning algorithm in multi-agent system for aided design of selected control systems allowed to improve the performance...
-
Acceleration of the discrete Green's function computations
PublicationResults of the acceleration of the 3-D discrete Green's function (DGF) computations on the multicore processor are presented. The code was developed in the multiple precision arithmetic with use of the OpenMP parallel programming interface. As a result, the speedup factor of three orders of magnitude compared to the previous implementation was obtained thus applicability of the DGF in FDTD simulations was significantly improved.
-
Propagation in rectangular waveguides with a pseudochiral Ω slab
PublicationThe transfer matrix approach is applied for analysis of waveguides loaded with a uniaxial pseudochiral Ω slab. In particular a pseudochiral parallel plate and rectangular guides are investigated. Based on the numerical analysis the influence of the pseudochirality on propagation characteristics and field distribution are examined. Other feature such as a field displacement phenomenon appearing in the both considered structures...
-
Three levels of fail-safe mode in MPI I/O NVRAM distributed cache
PublicationThe paper presents architecture and design of three versions for fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...
-
Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge
PublicationAuto-tuning of configuration and application param- eters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...