Search results for: PARALLEL PROCESSING
-
Controlled grafting of vinylic monomers on polyolefins: a robust mathematical modeling approach
PublicationExperimental and mathematical modeling analyses were used for controlling melt free-radical grafting of vinylic monomers on polyolefins and, thereby, reducing the disturbance of undesired cross-linking of polyolefins. Response surface, desirability function, and artificial intelligence methodologies were combined to model and optimize the grafting reaction in terms of vinylic monomer content, peroxide initiator concentration,...
-
FPGA Acceleration of Matrix-Assembly Phase of RWG-Based MoM
PublicationIn this letter, the field-programmable-gate-array accelerated implementation of matrix-assembly phase of the method of moments (MoM) is presented. The solution is based on a discretization of the frequency-domain mixed potential integral equation using the Rao-Wilton-Glisson basis functions and their extension to wire-to-surface junctions. To take advantage of the given hardware resources (i.e., Xilinx Alveo U200 accelerator card),...
-
Parallelization of Selected Algorithms on Multi-core CPUs, a Cluster and in a Hybrid CPU+Xeon Phi Environment
PublicationIn the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication...
-
Performance Analysis of the OpenCL Environment on Mobile Platforms
PublicationToday’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...
-
Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
PublicationThis paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few low-order resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...
-
Behavior Analysis and Dynamic Crowd Management in Video Surveillance System
PublicationA concept and practical implementation of a crowd management system which acquires input data from a set of monitoring cameras is presented. Two leading threads are considered. The first concerns crowd behavior analysis. The second focuses on detection of hold-ups in the doorway. The optical flow combined with soft computing methods (neural network) is employed to evaluate the type of crowd behavior, and fuzzy logic aids detection...
-
Pipelined Two-Operand Modular Adders
PublicationThe pipelined two-operand modular adder (TOMA) is one of the basic components used in digital signal processing (DSP) systems that use the residue number system (RNS). Such modular adders are used in binary/residue and residue/binary converters, residue multipliers and scalers as well as within residue processing channels. The structure of pipelined TOMAs is usually obtained by inserting an appropriate number of pipeline register layers within...
-
Three levels of fail-safe mode in MPI I/O NVRAM distributed cache
PublicationThe paper presents the architecture and design of three versions of fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering of write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...
-
Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology
PublicationThe discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...
-
Massively parallel linear-scaling Hartree–Fock exchange and hybrid exchange–correlation functionals with plane wave basis set accuracy
PublicationWe extend our linear-scaling approach for the calculation of Hartree–Fock exchange energy using localized in situ optimized orbitals [Dziedzic et al., J. Chem. Phys. 139, 214103 (2013)] to leverage massive parallelism. Our approach has been implemented in the ONETEP (Order-N Electronic Total Energy Package) density functional theory framework, which employs a basis of non-orthogonal generalized Wannier functions (NGWFs) to achieve...
-
Shared processor scheduling
PublicationWe study the shared processor scheduling problem with a single shared processor to maximize total weighted overlap, where an overlap for a job is the amount of time it is processed on its private and shared processor in parallel. A polynomial-time optimization algorithm has been given for the problem with equal weights in the literature. This paper extends that result by showing an O(n log n)-time optimization algorithm for a class...
-
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
PublicationThis paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact null-space filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...
-
Acceleration of Electromagnetic Simulations on Reconfigurable FPGA Card
PublicationIn this contribution, the hardware acceleration of electromagnetic simulations on the reconfigurable field-programmable-gate-array (FPGA) card is presented. In the developed implementation of scientific computations, the matrix-assembly phase of the method of moments (MoM) is accelerated on the Xilinx Alveo U200 card. The computational method involves discretization of the frequency-domain mixed potential integral equation using...
-
Video Analytics-Based Algorithm for Monitoring Egress from Buildings
PublicationA concept and practical implementation of an algorithm for detecting potentially dangerous crowding in passages is presented. An example of such a situation is a crush, which may be caused by an obstructed pedestrian pathway. Online analysis of the surveillance video camera signal is employed to detect hold-ups near bottlenecks such as doorways or staircases. The details of the implemented algorithm which uses...
-
A self-optimization mechanism for generalized adaptive notch smoother
PublicationTracking of nonstationary narrowband signals is often accomplished using algorithms called adaptive notch filters (ANFs). Generalized adaptive notch smoothers (GANSs) extend the concepts of adaptive notch filtering in two directions. Firstly, they are designed to estimate coefficients of nonstationary quasi-periodic systems, rather than signals. Secondly, they employ noncausal processing, which greatly improves their accuracy and...
-
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
PublicationIn this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....
-
Tuning matrix-vector multiplication on GPU
PublicationA matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
-
Parallel frequency tracking with built-in performance evaluation
PublicationThe problem of estimation of instantaneous frequency of a nonstationary complex sinusoid (cisoid) buried in wideband noise is considered. The proposed approach employs a bank of adaptive notch filters, extended with a nontrivial performance assessment mechanism which automatically chooses the best performing filter in the bank. Additionally, a computationally attractive method of implementing the bank is proposed. The new structure...
-
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
PublicationThe paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve non-Hermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...
-
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
PublicationThis paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...
-
Online sound restoration system for digital library applications
PublicationAudio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
Total Completion Time Minimization for Scheduling with Incompatibility Cliques
PublicationThis paper considers parallel machine scheduling with incompatibilities between jobs. The jobs form a graph equivalent to a collection of disjoint cliques. No two jobs in a clique are allowed to be assigned to the same machine. Scheduling with incompatibilities between jobs represents a well-established line of research in scheduling theory and the case of disjoint cliques has received increasing attention in recent...
-
Thermal and technological aspects of double face grinding of Al2O3 ceramic materials
PublicationDouble face grinding with planetary kinematics is a process for manufacturing workpieces with plane-parallel functional surfaces, such as bearing rings or sealing shims. In order to increase the economic efficiency of this process, it has to be improved continuously. The temperature in the contact zone of most grinding processes has a huge influence on the process efficiency and the workpiece qualities. In contrast to most grinding...
-
Thermal and technological aspects of double face grinding of C45 carbon steel
PublicationIn grinding, the contact zone temperature is a decisive factor influencing the achievable surface quality and the grinding tool wear. In contrast to other grinding processes, little information is available to date regarding double face grinding with planetary kinematics when processing steel. Since the successive substitution of industrial double-sided lapping processes by double-sided grinding, it has become necessary to the...
-
An Ultra-Low-Energy Analog Comparator for A/D Converters in CMOS Image Sensors
PublicationThis paper proposes a new design of an ultra-low-energy analog comparator, dedicated to slope analog-to-digital converters (ADCs) and particularly suited for CMOS image sensors (CISs) featuring a large number of ADCs. For massively parallel imaging arrays, this number may be as high as tens to hundreds of thousands of ADCs. As each ADC includes an analog comparator, the number of these comparators in a CIS is always high. Detailed analysis...
-
Online sound restoration system for digital library applications.
PublicationAudio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
Further Developments of the Online Sound Restoration System for Digital Library Applications
PublicationNew signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...
-
Identification of nonstationary multivariate autoregressive processes – Comparison of competitive and collaborative strategies for joint selection of estimation bandwidth and model order
PublicationThe problem of identification of multivariate autoregressive processes (systems or signals) with unknown and possibly time-varying model order and time-varying rate of parameter variation is considered and solved using the parallel estimation approach. Under this approach, several local estimation algorithms, with different order and bandwidth settings, are run simultaneously and compared based on their predictive performance. First,...
-
Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs
PublicationThe paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation,...
-
Locally Adaptive Cooperative Kalman Smoothing and Its Application to Identification of Nonstationary Stochastic Systems
PublicationOne of the central problems of the stochastic approximation theory is the proper adjustment of the smoothing algorithm to the unknown, and possibly time-varying, rate and mode of variation of the estimated signals/parameters. In this paper we propose a novel locally adaptive parallel estimation scheme which can be used to solve the problem of fixed-interval Kalman smoothing in the presence of model uncertainty. The proposed solution...
-
Fire Protection and Materials Flammability Control by Artificial Intelligence
PublicationFire safety has become a major challenge for materials developers because of the massive production of organic materials, which are often combustible, and their use for different purposes. In this sense, fire safety is critically considered in the development of engineering materials [1, 2]. The multiplicity of parameters contributing to the development of formulation of flame-retardant materials from one side and the sustainability concerns...
-
L1 Cell Adhesion Molecule Overexpression Down Regulates Phosphacan and Up Regulates Structural Plasticity-Related Genes Rostral and Caudal to the Complete Spinal Cord Transection
PublicationL1 cell adhesion molecule (L1CAM) supports spinal cord cellular milieu after contusion and compression lesions, contributing to neuroprotection, promoting axonal outgrowth, and reducing outgrowth-inhibitory molecules in lesion proximity. We extended investigations into L1CAM molecular targets and explored long-distance effects of L1CAM rostral and caudal to complete spinal cord transection (SCT) in...
-
Modeling of Passive and Forced Convection Heat Transfer in Channels with Rib Turbulators
PublicationThe main goal of the research presented in this paper was the experimental and numerical analysis of heat enhancement and aerodynamic phenomena during air flow in a channel equipped with flow turbulators in the form of properly configured ribs. The use of ribs intensifies the heat transfer and at the same time increases not only the flow resistance but also the energy costs. Therefore, designing modern heat exchangers with optimal...
-
Modeling Parallel Applications in the MERPSYS Environment
PublicationThe chapter presents how to model parallel computational applications for which simulation of execution in a large-scale parallel or distributed environment is performed within the MERPSYS environment. Specifically, it is shown what approaches can be adopted to model key paradigms often used for parallel applications: master-slave, geometric parallelism (single program multiple data), pipelined and divide-and-conquer applications....
-
Multi-agent large-scale parallel crowd simulation
PublicationThis paper presents the design, implementation and performance results of a new modular, parallel, agent-based and large-scale crowd simulation environment. A parallel application, written in C with MPI, was developed and run in this parallel environment for simulation and visualization of an evacuation scenario at Gdansk University of Technology, Poland and further in the area of districts of Gdansk. The application uses a...
-
Block-based Representation of Application Execution on Modern Parallel Systems
PublicationThe chapter presents how to model execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses previously attributes and factors representative of modern high performance systems including multicore CPUs, GPUs, dedicated accelerators such as Intel Phi.
-
Simulation of parallel similarity measure computations for large data sets
PublicationThe paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...
-
Identification of nonstationary processes using noncausal bidirectional lattice filtering
PublicationThe problem of off-line identification of a nonstationary autoregressive process with a time-varying order and a time-varying degree of nonstationarity is considered and solved using the parallel estimation approach. The proposed parallel estimation scheme is made up of several bidirectional (noncausal) exponentially weighted lattice algorithms with different estimation memory and order settings. It is shown that optimization of...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
PublicationThis work presents implementations and tuning experiences with parallel irregular applications developed using the object-oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Rigid finite elements and multibody modeling in analyses of a robot shaped elastic/plastic deformations of a beam
PublicationA dynamics analysis of a system composed of a parallel manipulator and an elastic beam is presented in the paper. A classic 3RRR parallel manipulator is considered and used to deform the beam. Elasto-plastic deformations are investigated. The rigid-finite-elements technique is employed to deal with the dynamics of the beam. A multibody structure is associated with the introduced hybrid system in order to model its dynamics. Idea of the...
-
Multipulse inverter structures with low voltage distortion
PublicationA novel approach to the construction of voltage source inverters (VSIs) is presented in the paper. The invented inverter structures allow several DC/AC converters to be operated in parallel, resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSIs, which is possible thanks to the use of coupled inductors. The eighteen-pulse three-level...
-
Computer experiments with a parallel clonal selection algorithm for the graph coloring problem
PublicationArtificial immune systems (AIS) are algorithms that are based on the structure and mechanisms of the vertebrate immune system. Clonal selection is a process that allows lymphocytes to launch a quick response to known pathogens and to adapt to new, previously unencountered ones. This paper presents a parallel island model algorithm based on the clonal selection principles for solving the Graph Coloring Problem. The performance of...
-
Executing Multiple Simulations in the MERPSYS Environment
PublicationThe chapter investigates the steps necessary to perform a simulation instance in the MERPSYS environment and discusses potential limitations when vast numbers of simulations are required. An extended architecture is proposed which includes a JMS-based simulation queue and multiple distributed simulators, overcoming the potential bottlenecks. The chapter also introduces methods for preparing suites of multiple simulations...
-
Effective configuration of a double triad planar parallel manipulator for precise positioning of heavy details during their assembling process
PublicationIn the paper, a dynamics analysis of a parallel manipulator is presented. It is an atypical manipulator, designed to help in the assembly of heavy industrial constructions. A few atypical properties are required: small workspace, low velocities, and high loads. Initially, a short discussion of the definition of parallel manipulators is presented, as well as a sketch of the proposed structure. In parallel, some definitions, assumptions...
-
Modular multipulse voltage source inverters with integrating coupled reactors
PublicationA novel approach to the construction of voltage source inverters (VSIs) is presented in the paper. The invented inverter structures allow several DC/AC converters to be operated in parallel, resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSIs, which is possible thanks to the use of coupled inductors. The eighteen-pulse and twenty-four-pulse...
-
Scalable Measurement System for Multiple Impedance Gas Sensors
PublicationThe author proposes a scalable architecture of a measurement system for gas sensors whose impedance depends on the gas concentration. The main part of the system is a single-board impedance analyser. The number of analysers working in parallel can be configured according to the specific application. The system is controlled by a single computer which organises the measurement cycle and stores the acquired measurement data. The system is...
-
50’ Sail Catamaran with Hybrid Propulsion, Design, Theoretical and Experimental Studies
PublicationThe development of modern lithium batteries and propulsion systems now allows the use of complex propulsion systems for vessels of various sizes. As part of the research and implementation project, a parallel hybrid drive system was designed, built and then tested in the laboratory. The experimental studies conducted allowed for the measurements of power, fuel consumption and electric power distribution in various operating modes...
-
Self-optimizing generalized adaptive notch filters - comparison of three optimization strategies
PublicationThe paper provides a comparison of three different approaches to on-line tuning of generalized adaptive notch filters (GANFs), the algorithms used for identification/tracking of quasi-periodically varying dynamic systems. Tuning is needed to adjust adaptation gains, which control tracking performance of ANF algorithms, to the unknown and/or time-varying rate of system nonstationarity. Two out of the three compared approaches are classical...
-
Low-Power Receivers for Wireless Capacitive Coupling Transmission in 3-D-Integrated Massively Parallel CMOS Imager
PublicationThe paper presents pixel receivers for massively parallel transmission of video signal between capacitive coupled integrated circuits (ICs). The receivers meet the key requirements for massively parallel transmission, namely low-power consumption below a single μW, small area of less than 205 μm², high sensitivity better than 160 mV, and good immunity to crosstalk. The receivers were implemented and measured in a 3-D IC (two face-to-face...
-
A Parallel Genetic Algorithm for Creating Virtual Portraits of Historical Figures
PublicationIn this paper we present a genetic algorithm (GA) for creating hypothetical virtual portraits of historical figures and other individuals whose facial appearance is unknown. Our algorithm uses existing portraits of random people from a specific historical period and social background to evolve a set of face images potentially resembling the person whose image is to be found. We then use portraits of the person's relatives to judge...