Filters
total: 3465
filtered: 2751
-
Catalog
- Publications 2751 available results
- Journals 167 available results
- Conferences 115 available results
- Publishing Houses 1 available results
- People 88 available results
- Projects 4 available results
- e-Learning Courses 58 available results
- Events 2 available results
- Open Research Data 279 available results
Chosen catalog filters
displaying 1000 best results Help
Search results for: parallel computational implementation
-
Parallel implementation of a Sailing Assistance Application in a Cloud Environment
PublicationSailboat weather routing is a highly complex problem in terms of both the computational time and memory. The reason for this is a large search resulting in a multitude of possible routes and a variety of user preferences. Analysing all possible routes is only feasible for small sailing regions, low-resolution maps, or sailboat movements on a grid. Therefore, various heuristic approaches are often applied, which can find solutions...
-
From Sequential to Parallel Implementation of NLP Using the Actor Model
PublicationThe article focuses on presenting methods allowing easy parallelization of an existing, sequential Natural Language Processing (NLP) application within a multi-core system. The actor-based solution implemented with the Akka framework has been applied and compared to an application based on Task Parallel Library (TPL) and to the original sequential application. Architectures, data and control flows are described along with execution...
-
Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology
PublicationThe discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...
-
Implementation of FDTD-compatible Green's function on heterogeneous CPU-GPU parallel processing system
PublicationThis paper presents an implementation of the FDTD-compatible Green's function on a heterogeneous parallel processing system. The developed implementation simultaneously utilizes computational power of the central processing unit (CPU) and the graphics processing unit (GPU) to the computational tasks best suited to each architecture. Recently, closed-form expression for this discrete Green's function (DGF) was derived, which facilitates...
-
Parallel implementation of background subtraction algorithms for real-time video processing on a supercomputer platform
PublicationResults of evaluation of the background subtraction algorithms implemented on a supercomputer platform in a parallel manner are presented in the paper. The aim of the work is to chose an algorithm, a number of threads and a task scheduling method, that together provide satisfactory accuracy and efficiency of a real-time processing of high resolution camera images, maintaining the cost of resources usage at a reasonable level. Two...
-
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
PublicationWe report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...
-
Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit
PublicationParallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid enabling the perfect hybridization of FDTD...
-
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
PublicationIn the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...
-
Molecular Simulations Using Boltzmann’s Thermally Activated Diffusion - Implementation on ARUZ – Massively-parallel FPGA-based Machine
Publication -
Improved conformational space annealing method to treat β-structure with the UNRES force-field and to enhance scalability of parallel implementation
Publication -
Grid Implementation of a Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution Systems: Chojnice Case Study
PublicationSolving multiobjective optimization problems requires suitable algorithms to find a satisfactory approximation of a globally optimal Pareto front. Furthermore, it is a computationally demanding task. In this paper, the grid implementation of a distributed multiobjective genetic algorithm is presented. The distributed version of the algorithm is based on the island algorithm with forgetting island elitism used instead of a genetic...
-
Advanced Control With PLC—Code Generator for aMPC Controller Implementation and Cooperation With External Computational Server for Dealing With Multidimensionality, Constraints and LMI Based Robustness
PublicationThe manufacturers of Programmable Logic Controllers (PLC) usually equip their products with extremely simple control algorithms, such as PID and on-off regulators. However, modern PLCs have much more efficient processors and extensive memory, which enables implementing more sophisticated controllers. The paper discusses issues related to the implementation of matrix operations, time limitations for code execution within one PLC...
-
Multiprocessor Implementation of Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution System a New Water Quality Model Approach
PublicationThe Critical Infrastructure Systems (CISs) have received in recent years a considerable attention due to their heavy impact on sustainable development of modern societies. Most CISs may be classied as large scale complex systems of network structure, in uenced by strong interactions form the surrounding environment, internal and external interconnections. The later is a result of inter-CIS dependencies. The control, monitoring...
-
Multiprocessor implementation of Parallel Multiobjective Genetic Algorithm for Optimized Allocation of Chlorination Stations in Drinking Water Distribution System - a new water quality model approach
Publication -
Implementation of Molecular Dynamics and Its Extensions with the Coarse-Grained UNRES Force Field on Massively Parallel Systems: Toward Millisecond-Scale Simulations of Protein Structure, Dynamics, and Thermodynamics
Publication -
Architecture and implementation of distributed data storage using Web Services, CORBA i PVM. W: Proceedings. PPAM 2003. Parallel Processing and Applied Mathematics. Fifth International Conference. Częstochowa, 7-10 September 2003. Architektura i implementacja rozproszonego zarządzania danymi używając systemów Web Services, CORBA i PVN.
PublicationProponujemy architekturę i jej implementację PVMWeb Cluster I/O przeznaczoną do rozproszonego zarządzania danymi. Dane zapisywane są w systemie Web Services z geograficznie odległych klientów lub przez wywołania CORBA z wewnątrz danego klastra co oferuje lepsze osiągi.
-
The chapter analyses the K-Means algorithm in its parallel setting. We provide detailed description of the algorithm as well as the way we paralellize the computations. We identified complexity of the particular steps of the algorithm that allows us to build the algorithm model in MERPSYS system. The simulations with the MERPSYS have been performed for different size of the data as well as for different number of the processors used for the computations. The results we got using the model have been compared to the results obtained from real computational environment.
PublicationThe chapter analyses the K-Means algorithm in its parallel setting. We provide detailed description of the algorithm as well as the way we paralellize the computations. We identified complexity of the particular steps of the algorithm that allows us to build the algorithm model in MERPSYS system. The simulations with the MERPSYS have been performed for different size of the data as well as for different number of the processors used...
-
Simulation of parallel similarity measure computations for large data sets
PublicationThe paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...
-
Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi
PublicationParallel algorithms are popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, proof-of-concept implementation and practical experiments are often required. In order to speed up the development and provide simple and easily accessible testing environment that enables execution of reliable experiments, the paper proposes a platform with multi-core computational accelerator:...
-
Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment
PublicationThe paper presents design, implementation and real life uses of a visualization subsystem for a distributed framework for parallelization of workflow-based computations among clusters with nodes that feature both CPUs and GPUs. Firstly, the proposed system presents a graphical view of the infrastructure with clusters, nodes and compute devices along with parameters and runtime graphs of load, memory available, fan speeds etc. Secondly,...
-
A distributed system for conducting chess games in parallel
PublicationThis paper proposes a distributed and scalable cloud based system designed to play chess games in parallel. Games can be played between chess engines alone or between clusters created by combined chess engines. The system has a built-in mechanism that compares engines, based on Elo ranking which finally presents the strength of each tested approach. If an approach needs more computational power, the design of the system allows...
-
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
PublicationThis paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact nullspace filtering approach to find eigenvalues in electromagnetics analysis with higherorder FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...
-
Computationally Effcient Solution of a 2D Diffusive Wave Equation Used for Flood Inundation Problems
PublicationThis paper presents a study dealing with increasing the computational efficiency in modeling floodplain inundation using a two-dimensional diffusive wave equation. To this end, the domain decomposition technique was used. The resulting one-dimensional diffusion equations were approximated in space with the modified finite element scheme, whereas time integration was carried out using the implicit two-level scheme. The proposed...
-
Acceleration of Electromagnetic Simulations on Reconfigurable FPGA Card
PublicationIn this contribution, the hardware acceleration of electromagnetic simulations on the reconfigurable field-programmable-gate-array (FPGA) card is presented. In the developed implementation of scientific computations, the matrix-assembly phase of the method of moments (MoM) is accelerated on the Xilinx Alveo U200 card. The computational method involves discretization of the frequency-domain mixed potential integral equation using...
-
KernelHive: a new workflow-based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs
PublicationThe paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available....
-
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
PublicationThe paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve nonHermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...
-
Modeling Parallel Applications in the MERPSYS Environment
PublicationThe chapter presents how to model parallel computational applications for which simulation of execution in a large-scale parallel or distributed environment is performed within the MERPSYS environment. Specifically, it is shown what approaches can be adopted to model key paradigms often used for parallel applications: master-slave, geometric parallelism (single program multiple data), pipelined and divide-and-conquer applications....
-
Parallel Computations of Text Similarities for Categorization Task
PublicationIn this chapter we describe the approach to parallel implementation of similarities in high dimensional spaces. The similarities computation have been used for textual data categorization. A test datasets we create from Wikipedia articles that with their hyper references formed a graph used in our experiments. The similarities based on Euclidean distance and Cosine measure have been used to process the data using k-means algorithm....
-
Block-based Representation of Application Execution on Modern Parallel Systems
PublicationThe chapter presents how to model execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses pre- viously attributes and factors representative of modern high performance systems including multicore CPUs, GPUs, dedicated accelerators such as Intel Phi.
-
Multi-agent large-scale parallel crowd simulation
PublicationThis paper presents design, implementation and performance results of a new modular, parallel, agent-based and large scale crowd simulation environment. A parallel application, implemented with C and MPI, was implemented and run in this parallel environment for simulation and visualization of an evacuation scenario at Gdansk University of Technology, Poland and further in the area of districts of Gdansk. The application uses a...
-
Performance evaluation of parallel background subtraction on GPU platforms
PublicationImplementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...
-
A Solution to Image Processing with Parallel MPI I/O and Distributed NVRAM Cache
PublicationThe paper presents a new approach to parallel image processing using byte addressable, non-volatile memory (NVRAM). We show that our custom built MPI I/O implementation of selected functions that use a distributed cache that incorporates NVRAMs located in cluster nodes can be used for efficient processing of large images. We demonstrate performance benefits of such a solution compared to a traditional implementation without NVRAM...
-
Performance Evaluation of Selected Parallel Object Detection and Tracking Algorithms on an Embedded GPU Platform
PublicationPerformance evaluation of selected complex video processing algorithms, implemented on a parallel, embedded GPU platform Tegra X1, is presented. Three algorithms were chosen for evaluation: a GMM-based object detection algorithm, a particle filter tracking algorithm and an optical flow based algorithm devoted to people counting in a crowd flow. The choice of these algorithms was based on their computational complexity and parallel...
-
NVRAM as Main Storage of Parallel File System
PublicationModern cluster environments' main trouble used to be lack of computational power provided by CPUs and GPUs, but recently they suffer more and more from insufficient performance of input and output operations. Apart from better network infrastructure and more sophisticated processing algorithms, a lot of solutions base on emerging memory technologies. This paper presents evaluation of using non-volatile random-access memory as a...
-
Modeling energy consumption of parallel applications
PublicationThe paper presents modeling and simulation of energy consumption of two types of parallel applications: geometric Single Program Multiple Data (SPMD) and divide-and-conquer (DAC). Simulation is performed in a new MERPSYS environment. Model of an application uses the Java language with extension representing message exchange between processes working in parallel. Simulation is performed by running threads representing distinct process...
-
Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems
PublicationRapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...
-
Fast implementation of FDTD-compatible green's function on multicore processor
PublicationIn this letter, numerically efficient implementation of the finite-difference time domain (FDTD)-compatible Green's function on a multicore processor is presented. Recently, closed-form expression of this discrete Green's function (DGF) was derived, which simplifies its application in the FDTD simulations of radiation and scattering problems. Unfortunately, the new DGF expression involves binomial coefficients, whose computations...
-
Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms
PublicationImplementation of the background subtraction algorithm using OpenCL platform is presented. The algorithm processes live stream of video frames from the surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...
-
Performance Assessment of Using Docker for Selected MPI Applications in a Parallel Environment Based on Commodity Hardware
PublicationIn the paper, we perform detailed performance analysis of three parallel MPI applications run in a parallel environment based on commodity hardware, using Docker and bare-metal configurations. The testbed applications are representative of the most typical parallel processing paradigms: master–slave, geometric Single Program Multiple Data (SPMD) as well as divide-and-conquer and feature characteristic computational and communication...
-
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
PublicationWe present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...
-
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
PublicationMany important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power of such systems requires programming parallel applications that are hybrid in two meanings: they can utilize parallelism on multiple levels at the same time and combine together programming interfaces...
-
Numerical Study on Mitigation of Flow Maldistribution in Parallel Microchannel Heat Sink: Channels Variable Width Versus Variable Height Approach
PublicationMicrochannel heat sink on one hand enjoys benefits of intensified several folds heat transfer performance but on the other hand has to suffer aggravated form of trifling limitations associated with imperfect hydrodynamics and heat transfer behavior. Flow maldistribution is one of such limitation that exaggerates temperature nonuniformity across parallel microchannels leading to increase in maximum base temperature. Recently, variable...
-
An Implementation of a Compact Smart Resistive Sensor Based on a Microcontroller with an Internal ADC
PublicationIn the paper a new implementation of a compact smart resistive sensor based on a microcontroller with internal ADCs is proposed and analysed. The solution is based only on a (already existing in the system) microcontroller and a simple sensor interface circuit working as a voltage divider consisting of a reference resistor and the resistive sensor connected in parallel with an interference suppression capacitor. The measurement...
-
A Parallel Genetic Algorithm for Creating Virtual Portraits of Historical Figures
PublicationIn this paper we present a genetic algorithm (GA) for creating hypothetical virtual portraits of historical figures and other individuals whose facial appearance is unknown. Our algorithm uses existing portraits of random people from specific historical period and social background to evolve a set of face images potentially resembling the person whose image is to be found. We then use portraits of the person's relatives to judge...
-
Energy consumption optimization in wastewater treatment plants: Machine learning for monitoring incineration of sewage sludge
PublicationBiomass management in terms of energy consumption optimization has become a recent challenge for developed countries. Nevertheless, the multiplicity of materials and operating parameters controlling energy consumption in wastewater treatment plants necessitates the need for sophisticated well-organized disciplines in order to minimize energy consumption and dissipation. Sewage sludge (SS) disposal management is the key stage of...
-
A Parallel Corpus-Based Approach to the Crime Event Extraction for Low-Resource Languages
PublicationThese days, a lot of crime-related events take place all over the world. Most of them are reported in news portals and social media. Crime-related event extraction from the published texts can allow monitoring, analysis, and comparison of police or criminal activities in different countries or regions. Existing approaches to event extraction mainly suggest processing texts in English, French, Chinese, and some other resource-rich...
-
50’ Sail Catamaran with Hybrid Propulsion, Design, Theoretical and Experimental Studies
PublicationThe development of modern lithium batteries and propulsion systems now allows the use of complex propulsion systems for vessels of various sizes. As part of the research and implementation project, a parallel hybrid drive system was designed, built and then tested in the laboratory. The experimental studies conducted allowed for the measurements of power, fuel consumption and electric power distribution in various operating modes...
-
DL_MG: A Parallel Multigrid Poisson and Poisson–Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution
PublicationThe solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential -- a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the...
-
Computer experiments with a parallel clonal selection algorithm for the graph coloring problem
PublicationArtificial immune systems (AIS) are algorithms that are based on the structure and mechanisms of the vertebrate immune system. Clonal selection is a process that allows lymphocytes to launch a quick response to known pathogens and to adapt to new, previously unencountered ones. This paper presents a parallel island model algorithm based on the clonal selection principles for solving the Graph Coloring Problem. The performance of...
-
Acceleration of the discrete Green's function computations
PublicationResults of the acceleration of the 3-D discrete Green's function (DGF) computations on the multicore processor are presented. The code was developed in the multiple precision arithmetic with use of the OpenMP parallel programming interface. As a result, the speedup factor of three orders of magnitude compared to the previous implementation was obtained thus applicability of the DGF in FDTD simulations was significantly improved.