Search results for: SPARSE MATRIX TIMES VECTOR MULTIPLICATION (SPMV)
-
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
PublicationIn this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....
-
Tuning matrix-vector multiplication on GPU
PublicationA matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
-
A memory efficient and fast sparse matrix vector product on a GPU
PublicationThis paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...
-
Sparse vector autoregressive modeling of audio signals and its application to the elimination of impulsive disturbances
PublicationArchive audio files are often corrupted by impulsive disturbances, such as clicks, pops and record scratches. This paper presents a new method for elimination of impulsive disturbances from stereo audio signals. The proposed approach is based on a sparse vector autoregressive signal model, made up of two components: one taking care of short-term signal correlations, and the other one taking care of long-term correlations. The method...
-
Benchmarking overlapping communication and computations with multiple streams for modern GPUs
PublicationThe paper presents benchmarking a multi-stream application processing a set of input data arrays. Tests have been performed and execution times measured for various numbers of streams and various compute intensities measured as the ratio of kernel compute time and data transfer time. As such, the application and benchmarking is representative of frequently used operations such as vector weighted sum, matrix multiplication etc....
-
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
PublicationIn this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large sparse systems of equations with multiple right-hand sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...
-
Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA
PublicationLarge-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel’s Programmable Integrated Unified Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark...
-
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
PublicationThis letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...
-
Parallelization of Selected Algorithms on Multi-core CPUs, a Cluster and in a Hybrid CPU+Xeon Phi Environment
PublicationIn the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication...
-
Relativity of arithmetic as a fundamental symmetry of physics
PublicationArithmetic operations can be defined in various ways, even if one assumes commutativity and associativity of addition and multiplication, and distributivity of multiplication with respect to addition. In consequence, whenever one encounters ‘plus’ or ‘times’ one has certain freedom of interpreting this operation. This leads to some freedom in definitions of derivatives, integrals and, thus, practically all equations occurring in...
-
A graph coloring approach to scheduling of multiprocessor tasks on dedicated machines with availability constraints
PublicationWe address a generalization of the classical 1- and 2-processor unit execution time scheduling problem on dedicated machines. In our chromatic model of scheduling machines have non-simultaneous availability times and tasks have arbitrary release times and due dates. Also, the versatility of our approach makes it possible to generalize all known classical criteria of optimality. Under these stipulations we show that the problem...
-
A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM
PublicationThe paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve non-Hermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...
-
Finite element matrix generation on a GPU
PublicationThis paper presents an efficient technique for fast generation of sparse systems of linear equations arising in computational electromagnetics in a finite element method using higher order elements. The proposed approach employs a graphics processing unit (GPU) for both numerical integration and matrix assembly. The performance results obtained on a test platform consisting of a Fermi GPU (1x Tesla C2075) and a CPU (2x twelve-core...
-
GPU-accelerated finite element method
PublicationIn this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...
-
Reduction of Computational Complexity in Simulations of the Flow Process in Transmission Pipelines
PublicationThe paper addresses the problem of computational efficiency of the pipe-flow model used in leak detection and identification systems. Analysis of the model brings attention to its specific structure, where all matrices are sparse. With certain rearrangements, the model can be reduced to a set of equations with tridiagonal matrices. Such equations can be solved using the Thomas algorithm. This method provides almost the same values...
-
GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method
PublicationThe letter discusses a fast implementation of the conjugate gradient iterative method with ${\rm E}$-field multilevel preconditioner applied to solving real symmetric and sparse systems obtained with vector finite element method. In order to accelerate computations, a graphics processing unit (GPU) was used and significant speed-up (2.61 fold) was achieved compared to a central processing unit (CPU) based approach. These results...
-
Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation
PublicationThis paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral meshes that lead to sparse matrices too large to be dealt with on a typical workstation using direct methods. We discuss the problem of rapid matrix generation and assembly, as well as...
-
Application of Barycentric Coordinates in Space Vector PWM Computations
PublicationThis paper proposes the use of barycentric coordinates in the development and implementation of space-vector pulse-width modulation (SVPWM) methods, especially for inverters with deformed space-vector diagrams. The proposed approach is capable of explicit calculation of vector duty cycles, independent of whether they assume ideal positions or are displaced due to the DC-link voltage imbalance. The use of barycentric coordinates also...
-
Jacobi and Gauss-Seidel preconditioned complex conjugate gradient method with GPU acceleration for finite element method
PublicationIn this paper two implementations of iterative solvers for solving complex symmetric and sparse systems resulting from finite element method applied to wave equation are discussed. The problem under investigation is a dielectric resonator antenna (DRA) discretized by FEM with vector elements of the second order (LT/QN). The solvers use the preconditioned conjugate gradient (pcg) method implemented on Graphics Processing Unit (GPU)...
-
A vector-enzymatic DNA fragment amplification-expression technology for construction of artificial, concatemeric DNA, RNA and proteins for novel biomaterials, biomedical and industrial applications
PublicationA DNA fragment amplification/expression technology for the production of new generation biomaterials for scientific, industrial and biomedical applications is described. The technology enables the formation of artificial Open Reading Frames (ORFs) encoding concatemeric RNAs and proteins. It recruits the Type IIS SapI restriction endonuclease (REase) for an assembling of DNA fragments in an ordered head-to-tail-orientation. The...
-
Geometric analogue of holographic reduced representation
PublicationHolographic reduced representations (HRRs) are distributed representations of cognitive structures based on superpositions of convolution-bound n-tuples. Restricting HRRs to n-tuples consisting of ±1, one reinterprets the variable binding as a representation of the additive group of binary n-tuples with addition modulo 2. Since convolutions are not defined for vectors, the HRRs cannot be directly associated with geometric structures....
-
A comparison of geometric analogues of holographic reduced representations, original holographic reduced representations and binary spatter codes
PublicationGeometric Analogues of Holographic Reduced Representations (GA HRR) employ role-filler binding based on geometric products. Atomic objects are real-valued vectors in n-dimensional Euclidean space and complex statements belong to a hierarchy of multivectors. The paper reports a battery of tests aimed at comparison of GA HRR with Holographic Reduced Representation (HRR) and Binary Spatter Codes (BSC). Firstly, we perform a test of...
-
Generation of large finite-element matrices on multiple graphics processors
PublicationThis paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...
-
Compressive Sensing Approach to Harmonics Detection in the Ship Electrical Network
PublicationThe contribution of this paper is to show the opportunities for using the compressive sensing (CS) technique for detecting harmonics in a frequency sparse signal. The signal in a ship’s electrical network, polluted by harmonic distortions, can be modeled as a superposition of a small number of sinusoids and the discrete Fourier transform (DFT) basis forms its sparse domain. According to the theory of CS, a signal may be reconstructed...
-
Three solvers for MIMO noise radar clutter cancellation - a performance comparison
PublicationThe problem of canceling strong clutter echoes in a MIMO noise radar is considered. Execution times of three algorithms are compared. The first solution is a standard Least Squares approach employing Cholesky decomposition of the transmitted signal sample autocorrelation matrix. The second approach is based on careful waveform design which guarantees that the signal sample autocorrelation matrix has Toeplitz structure. This enables...
-
Accuracy, Memory and Speed Strategies in GPU-based Finite-Element Matrix-Generation
PublicationThis paper presents strategies on how to optimize GPU-based finite-element matrix-generation that occurs in the finite-element method (FEM) using higher order curvilinear elements. The goal of the optimization is to increase the speed of evaluation and assembly of large finite-element matrices on a single GPU (Graphics Processing Unit) while maintaining the accuracy of numerical integration at the desired level. For this reason,...
-
Convergence to equilibrium under a random Hamiltonian
PublicationWe analyze equilibration times of subsystems of a larger system under a random total Hamiltonian, in which the basis of the Hamiltonian is drawn from the Haar measure. We obtain that the time of equilibration is of the order of the inverse of the arithmetic average of the Bohr frequencies. To compute the average over a random basis, we compute the inverse of a matrix of overlaps of operators which permute four systems. We first...
-
Matrix-based robust joint fingerprinting and decryption method for multicast distribution of multimedia
PublicationThis paper addresses the problem of unauthorized redistribution of multimedia content by malicious users (pirates). The solution proposed here is a new joint fingerprinting and decryption method which meets the requirements for both imperceptibility and robustness of fingerprints and scalability in terms of design and distribution of fingerprinted multimedia content. The proposed method uses a simple block cipher based on matrix...
-
Application of the Pseudo-Inverse Matrix in Thruster Allocation Methods for a Ship Dynamic Positioning System (ZASTOSOWANIE MACIERZY PSEUDO ODWROTNEJ W METODACH ALOKACJI PĘDNIKÓW UKŁADU DYNAMICZNEGO POZYCJONOWANIA STATKU)
PublicationThruster control allocation systems are an important part of dynamic positioning systems on a ship. They determine the control signals for the thruster settings on the basis of the generalized vector of surge force, sway force and yaw moment obtained from the control law. The article presents selected thruster control allocation algorithms, which differ in the way the pseudo-inverse matrix is determined, as well as an algorithm of direct...
-
Study of the Effect of Filling Thermoplastic Medical Polyurethane with PVA, PLA or Diatomite on the Relaxation Times Distributions of 1H NMR
PublicationIn this work, to characterize the mobility of different sections of the macromolecules of polyurethane (PUR), polyvinyl alcohol (PVA), and polylactic acid (PLA), as well as the density of crosslinks of the polymer chains when using fillers, we used the distributions of spin–lattice and spin–spin relaxation times for the protons. It is shown that the rigidity of the thermoplastic polymers depends on the sizes of the granules of...
-
Multistatic Doppler System for Determining the Position and Velocity of Moving Targets in Water (Multistatyczny, Dopplerowski System określania położenia i prędkości ruchomych celów w wodzie)
PublicationIn the multistatic Doppler system for determining the position and velocity of moving targets in water discussed in this work, the signal source consists of two transmitters emitting sinusoidal continuous acoustic waves of different frequencies, which, after reflection from a moving target, are received by four hydrophones. The article presents a theoretical analysis of the Doppler effect, on which the operation of the system is based, and a method of solving the main...
-
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
PublicationConsidering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures as well as compilers there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...
-
Two examples of Quantum Dynamical Semigroups
PublicationThe Hamiltonians of the considered bi-partite systems are of the form $$ H_{S,R} = H_S \times 1_R + Q_{S} \times M_R + 1_S \times H_R $$ Subindex $S$ corresponds to the observed system and $R$ to the reservoir (the environment of $S$). Two classes of systems are distinguished: the discrete-continuous...
-
Analysis of Corrugated Coaxial Line with the Use of Body of Revolution and Finite Element Method
PublicationA combination of the body-of-revolution and finite element methods is applied to the analysis of coaxial lines with corrugated rod and wall. Both periodic and non-periodic structures can be investigated. As the structure is axially symmetrical, the two-dimensional scalar-vector finite element method can be used, which allows for the investigation of complex geometries and is computationally efficient. A generalized impedance matrix...
-
A few steps more towards NPT bound entanglement
PublicationIn this paper, existence of bound entangled states with nonpositive partial transpose (NPT) is considered. As one knows, existence of such states would in particular imply nonadditivity of distillable entanglement. Moreover, it would rule out a simple mathematical description of the set of distillable states. The particular state, known to be 1-copy nondistillable and supposed to be bound entangled, is considered. The problem of...
-
Various types of semiconductor photocatalysts modified by CdTe QDs and Pt NPs for toluene photooxidation in the gas phase under visible light
PublicationA novel synthesis process was used to prepare TiO2 microspheres, TiO2 P-25, SrTiO3 and KTaO3 decorated by CdTe QDs and/or Pt NPs. The effect of semiconductor matrix, presence of CdTe QDs and/or Pt NPs on the semiconductor surface as well as deposition technique of Pt NPs (photodeposition or radiolysis) on the photocatalytic activity were investigated. The as-prepared samples were characterized by X-ray powder diffractometry (XRD),...
-
Performance Analysis of the OpenCL Environment on Mobile Platforms
PublicationToday’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...
-
A New Approach to the PWM Modulation for the Multiphase Matrix Converters Supplying Loads with Open-End Winding.
PublicationThis article presents three variants of the Pulse Width Modulation (PWM) for the Double Square Multiphase type Conventional Matrix Converters (DSM-CMC) supplying loads with the open-end winding. The first variant of PWM offers the ability to obtain zero value of the common-mode voltage at the load's terminals and applies only six switches within the modulation period. The second proposal achieves less Total Harmonic Distortion...
-
Larmor diamagnetism and Van Vleck paramagnetism in relativistic quantum theory: the Gordon decomposition approach
PublicationWe consider a charged Dirac particle bound in a scalar potential perturbed by a classical magnetic field derivable from a vector potential A(r). Using a procedure based on the Gordon decomposition of a field-induced current, we identify diamagnetic and paramagnetic contributions to the second-order perturbation-theory correction to the particle's energy. In contradiction to earlier findings, based on the sum-over-states approach,...
-
Zero-range potentials for Dirac particles: Bound-state problems
PublicationA model in which a massive Dirac particle in $\mathbb{R}^{3}$ is bound by $N\geqslant1$ spatially distributed zero-range potentials is presented. Interactions between the particle and the potentials are modeled by subjecting a particle's bispinor wave function to certain limiting conditions at the potential centers. Each of these conditions is parametrized by a $2\times2$ Hermitian matrix (or, equivalently, a real scalar and a...
-
Application of Analytic Signal and Smooth Interpolation in Pulse Width Modulation for Conventional Matrix Converters
PublicationThe paper proposes an alternative and novel approach to the PWM duty cycles computation for Conventional Matrix Converters (CMC) fed by balanced, unbalanced or non–sinusoidal AC voltage sources. The presented solution simplifies the prototyping of direct modulation algorithms. PWM duty cycles are calculated faster by the smooth interpolation technique, using only vector coordinates, without trigonometric functions and angles. Both...
-
A Simplified SVPWM Technique for Five-leg Inverter with Dual Three-phase Output
PublicationThis article proposes a simplified space vector pulse-width modulation (SVPWM) technique for a five-leg inverter with dual three-phase output. The idea of feeding a dual three-phase machine from multiphase voltage source inverters (VSIs) is not new. Dual- and multi-motor drive systems are widely used in industrial applications. The most popular fields are electric vehicles (EVs) and traction systems. Moreover, the specific characteristic...
-
Computing methods for fast and precise body surface area estimation of selected body parts
PublicationCurrently used body surface area (BSA) formulas give satisfactory results only for individuals with typical physique, while for elderly, obese or anorectic people accurate results cannot be expected. Particularly noteworthy are the results for individuals with severe obesity (body-mass index greater than 35 kg/m2), for which BSA estimation errors reached 80%. The main goal of our study is the development of precise BSA models for...
-
Hybridized Space-Vector Pulsewidth Modulation for Multiphase Two-Level Voltage Source Inverter
PublicationIn space vector pulsewidth modulation (SVPWM) algorithms for multiphase two-level voltage source inverters (VSI), the components of active vectors in all orthogonal spaces have to be calculated within the processor and stored in its memory. These necessitate intensive computational efforts of the processor and large memory space. This article presents a hybridized SVPWM for multiphase two-level VSI. In this algorithm, elements...
-
Determination of Local Dye Concentration in Hybrid Porous Silica Thin Films
PublicationThe idea of determination of local dye concentration in a nanoporous matrix is proposed based on donor − acceptor energy transfer. The method was tested for a Rhodamine 110 − Rhodamine 101 system in silica and methylated silica nanolayers. Evaluation of acceptor (Rhodamine 101) local concentration was carried out by comparing the results of Monte Carlo simulation of energy transfer from donor (Rhodamine 110) to acceptor (Rhodamine...
-
Reduced-cost electromagnetic-driven optimisation of antenna structures by means of trust-region gradient-search with sparse Jacobian updates
PublicationNumerical optimisation plays an increasingly important role in antenna design. Because of the lack of design-ready theoretical models, electromagnetic (EM)-simulation-driven adjustment of geometry parameters is a necessary step of the design process. At the same time, traditional parameter sweeping cannot handle complex topologies and large numbers of design variables. On the other hand, high computational cost of the conventional...
-
Discrete-time estimation of nonlinear continuous-time stochastic systems
PublicationIn this paper we consider the problem of state estimation of a dynamic system whose evolution is described by a nonlinear continuous-time stochastic model. We also assume that the system is observed by a sensor in discrete-time moments. To perform state estimation using uncertain discrete-time data, the system model needs to be discretized. We compare two methods of discretization. The first method uses the classical forward Euler...
-
A Direct Modulation for Matrix Converters based on the Onecycle Atomic operation developed in Verilog HDL.
PublicationThis paper presents a fast direct Pulse Width Modulation (PWM) algorithm for the Conventional Matrix Converters (CMC) developed in Verilog Hardware Description Language (HDL). All PWM duty cycle calculations are performed in one cycle by an atomic operation designed as a digital module using FPGA basic blocks. The algorithm can be extended to any number of output phases. The improved version of the discontinuous Direct Analytic...
-
Influence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classification
PublicationWe present a comprehensive evaluation of the influence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classification. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...