Filters
total: 3467
filtered: 2753
-
Catalog
- Publications 2753 available results
- Journals 167 available results
- Conferences 115 available results
- Publishing Houses 1 available results
- People 88 available results
- Projects 4 available results
- e-Learning Courses 58 available results
- Events 2 available results
- Open Research Data 279 available results
Chosen catalog filters
displaying 1000 best results Help
Search results for: parallel computational implementation
-
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
PublicationThe paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...
-
Process arrival pattern aware algorithms for acceleration of scatter and gather operations
PublicationImbalanced process arrival patterns (PAPs) are ubiquitous in many parallel and distributed systems, especially in HPC ones. The collective operations, e.g. in MPI, are designed for equal process arrival times (PATs), and are not optimized for deviations in their appearance. We propose eight new PAP-aware algorithms for the scatter and gather operations. They are binomial or linear tree adaptations introducing additional process...
-
A Power-Efficient Digital Technique for Gain and Offset Correction in Slope ADCs
PublicationIn this brief, a power-efficient digital technique for gain and offset correction in slope analog-to-digital converters (ADCs) has been proposed. The technique is especially useful for imaging arrays with massively parallel image acquisition where simultaneous compensation of dark signal non-uniformity (DSNU) as well as photo-response non-uniformity (PRNU) is critical. The presented approach is based on stopping the ADC clock by...
-
Theory and implementation of a virtualisation level Future Internet defence in depth architecture
PublicationAn EU Future Internet Engineering project currently underway in Poland defines three parallel internets (PIs). The emerging IIP system (IIPS, abbreviating the project’s Polish name), has a four-level architecture, with level 2 responsible for creation of virtual resources of the PIs. This paper proposes a three-tier security architecture to address level 2 threats of unauthorised traffic injection and IIPS traffic manipulation...
-
Modeling the effect of external load variations on single, serie and parallel connected microbial fuel cells
PublicationThis paper presents a microbial fuel cell (MFC) model designed to analyze the effect of the external load on MFC performance. The model takes into account the voltage and the chemical oxygen demand (COD) dependence on the external load. The value of the model parameters were calibrated by means of the voltage relaxation method tests using a controlled load current. Laboratory measurements and MATLAB Simulink model computations...
-
Scheduling with Complete Multipartite Incompatibility Graph on Parallel Machines
PublicationIn this paper we consider a problem of job scheduling on parallel machines with a presence of incompatibilities between jobs. The incompatibility relation can be modeled as a complete multipartite graph in which each edge denotes a pair of jobs that cannot be scheduled on the same machine. Our research stems from the works of Bodlaender, Jansen, and Woeginger (1994) and Bodlaender and Jansen (1993). In particular, we pursue the...
-
A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems
PublicationModern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems, that we wish to efficiently solve using those systems have different complexity, not only considering magnitude, but also the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...
-
Behavior Analysis and Dynamic Crowd Management in Video Surveillance System
PublicationA concept and practical implementation of a crowd management system which acquires input data by the set of monitoring cameras is presented. Two leading threads are considered. First concerns the crowd behavior analysis. Second thread focuses on detection of a hold-ups in the doorway. The optical flow combined with soft computing methods (neural network) is employed to evaluate the type of crowd behavior, and fuzzy logic aids detection...
-
Distributed NVRAM Cache – Optimization and Evaluation with Power of Adjacency Matrix
PublicationIn this paper we build on our previously proposed MPI I/O NVRAM distributed cache for high performance computing. In each cluster node it incorporates NVRAMs which are used as an intermediate cache layer between an application and a file for fast read/write operations supported through wrappers of MPI I/O functions. In this paper we propose optimizations of the solution including handling of write requests with a synchronous mode,...
-
A Fail-Safe NVRAM Based Mechanism for Efficient Creation and Recovery of Data Copies in Parallel MPI Applications
PublicationThe paper presents a fail-safe NVRAM based mechanism for creation and recovery of data copies during parallel MPI application runtime. Specifically, we target a cluster environment in which each node has an NVRAM installed in it. Our previously developed extension to the MPI I/O API can take advantage of NVRAM regions in order to provide an NVRAM based cache like mechanism to significantly speed up I/O operations and allow to preload...
-
A novel hybrid adaptive framework for support vector machine-based reliability analysis: A comparative study
PublicationThis study presents an innovative hybrid Adaptive Support Vector Machine - Monte Carlo Simulation (ASVM-MCS) framework for reliability analysis in complex engineering structures. These structures often involve highly nonlinear implicit functions, making traditional gradient-based first or second order reliability algorithms and Monte Carlo Simulation (MCS) time-consuming. The application of surrogate models has proven effective...
-
Planning optimised multi-tasking operations under the capability for parallel machining
PublicationThe advent of advanced multi-tasking machines (MTMs) in the metalworking industry has provided the opportunity for more efficient parallel machining as compared to traditional sequential processing. It entailed the need for developing appropriate reasoning schemes for efficient process planning to take advantage of machining capabilities inherent in these machines. This paper addresses an adequate methodical approach for a non-linear...
-
Implementation of spatial/polarization diversity for improved-performance circularly polarized multiple-input-multiple-output ultra-wideband antenna
PublicationIn this paper, spatial and polarization diversities are simultaneously implemented in an ultra-wideband (UWB) multiple-input-multiple-output (MIMO) antenna to reduce the correlation between the parallel-placed radiators. The keystone of the antenna is systematically modified coplanar ground planes that enable excitation of circular polarization (CP). To realize one sense of circular polarization as well as ultra-wideband operation,...
-
Performance/energy aware optimization of parallel applications on GPUs under power capping
PublicationIn the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to the bench- marks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm- benchmark which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...
-
Multi-source-supplied parallel hybrid propulsion of the inland passenger ship STA.H. Research work on energy efficiency of a hybrid propulsion system operating in the electric motor drive mode
PublicationIn the Faculty of Ocean Engineering and Ship Technology, Gdansk University of Technology, design has recently been developed of a small inland ship with hybrid propulsion and supply system. The ship will be propelled by a specially designed so called parallel hybrid propulsion system. The work was aimed at carrying out the energy efficiency analysis of a hybrid propulsion system operating in the electric motor drive mode and at...
-
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
PublicationIn the paper, we have proposed a framework that allows programming a parallel application for a multi-node system, with one or more GPUs per node, using an OpenMP+extended CUDA API. OpenMP is used for launching threads responsible for management of particular GPUs and extended CUDA calls allow to manage CUDA objects, data and launch kernels. The framework hides inter-node MPI communication from the programmer who can benefit from...
-
Investigation of Mechanical and Microstructural Properties of Welded Specimens of AA6061-T6 Alloy with Friction Stir Welding and Parallel Friction Stir Welding Methods
PublicationThe present study investigates the effect of two parameters of process type and tool offset on tensile, microhardness, and microstructure properties of AA6061-T6 aluminum alloy joints. Three methods of Friction Stir Welding (FSW), Advancing Parallel-Friction Stir Welding (AP-FSW), and Retreating Parallel-Friction Stir Welding (RP-FSW) were used. In addition, four modes of 0.5, 1, 1.5, and 2 mm of tool offset were used in two welding...
-
Experimental Research on the Energy Efficiency of a Parallel Hybrid Drive for an Inland Ship
PublicationThe growing requirements for limiting the negative impact of all modes of transport on the natural environment mean that clean technologies are becoming more and more important. The global trend of e-mobility also applies to sea and inland water transport. This article presents the results of experimental tests carried out on a life-size, parallel diesel-electric hybrid propulsion system. The eciency of the propulsion system was...
-
Scheduling with Complete Multipartite Incompatibility Graph on Parallel Machines: Complexity and Algorithms
PublicationIn this paper, the problem of scheduling on parallel machines with a presence of incompatibilities between jobs is considered. The incompatibility relation can be modeled as a complete multipartite graph in which each edge denotes a pair of jobs that cannot be scheduled on the same machine. The paper provides several results concerning schedules, optimal or approximate with respect to the two most popular criteria of optimality:...
-
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
PublicationIn the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPU tested represent both multi-core type CPUs such as Intel⃝R Xeon⃝R E5, desktop and mobile i7 as well as many-core Intel⃝R Xeon PhiTM x200 but also server, desktop...
-
Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams
PublicationThe paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams,...
-
Identification of nonstationary processes using noncausal bidirectional lattice filtering
PublicationThe problem of off-line identification of a nonstationary autoregressive process with a time-varying order and a time-varying degree of nonstationarity is considered and solved using the parallel estimation approach. The proposed parallel estimation scheme is made up of several bidirectional (noncausal) exponentially weighted lattice algorithms with different estimation memory and order settings. It is shown that optimization of...
-
Rigid finite elements and multibody modeling in analyses of a robot shaped elastic/plastic deformations of a beam
PublicationDynamics analysis of a system composed of a parallel manipulator and of an elastic beam is presented in the paper. Classic 3RRR parallel manipulator is considered and used to deform the beam. Elasto-plastic deformations are investigated. Rigid-finite-elements technique is employed to deal with dynamics of the beam. A multibody structure is associated with the introduced hybrid system in order to model its dynamics. Idea of the...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
PublicationThis work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAM-PVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
A Novel Coupling Matrix Synthesis Technique for Generalized Chebyshev Filters With Resonant Source–Load Connection
PublicationThis paper reports a novel synthesis method for microwave bandpass filters with resonant source–load connection. In effect, a network realizing N+1 transmission zeros (where N is the number of reflection zeros) is obtained. The method is based on a prototype transversal coupling matrix (N+2, N+2) with source and load connected by a resonant circuit formed by a capacitor in parallel with a frequency-invariant susceptance. To complement...
-
Multipulse inverter structures with low voltage distortion
PublicationA novel approach to the voltage source inverters (VSI) construction is presented in the paper. The invented inverter structures allow to operate several DC/AC converters in parallel resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSI’s which is possible thanks to the use of coupled inductors. The eighteen-pulse three-level...
-
Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs
PublicationThis paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...
-
Scalable Measurement System for Multiple Impedance Gas Sensors
PublicationAuthor proposes scalable architecture of the measurement system for gas sensor with impedance dependance of the gas concentration. The main part of the system is a single-board impedance analyser. The number of analysers working in parallel can be configured according to specific application. The system is controlled by a single computer which organises the measurement cycle and store the acquired measurement data. The system is...
-
Modular multipulse voltage source inverters with integrating coupled reactors
PublicationA novel approach to the voltage source inverters (VSI) construction is presented in the paper. The invented inverter structures allow to operate several DC/AC converters in parallel resulting in lower voltage distortions at extremely low switching frequency. The research presented in the paper describes such a parallel operation of the VSI’s which is possible thanks to the use of coupled inductors. The eighteen-pulse and twenty-four-pulse...
-
Scheduling of compatible jobs on parallel machines
PublicationThe dissertation discusses the problems of scheduling compatible jobs on parallel machines. Some jobs are incompatible, which is modeled as a binary relation on the set of jobs; the relation is often modeled by an incompatibility graph. We consider two models of machines. The first model, more emphasized in the thesis, is a classical model of scheduling, where each machine does one job at time. The second one is a model of p-batching...
-
A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache
PublicationWhile many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublicationIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...
-
Unraveling the Interplay between DNA and Proteins: A Computational Exploration of Sequence and Structure-Specific Recognition Mechanisms
PublicationMy PhD dissertation focused on DNA-protein interactions and the recognition of specific DNA sequences and structures. I discovered that acidic amino acid residues (Asp/Glu) play a crucial role by exhibiting a preference for cytosine. Their contribution to binding affinity depends on nearby cytosines, balancing electrostatic repulsion with specific interactions. Acidic residues act as negative selectors, discouraging non-cytosine...
-
Self-optimizing generalized adaptive notch filters - comparison of three optimization strategies
PublicationThe paper provides comparison of three different approaches to on-line tuning of generalized adaptive notch filters (GANFs) the algorithms used for identification/tracking of quasi-periodically varying dynamic systems. Tuning is needed to adjust adaptation gains, which control tracking performance of ANF algorithms, to the unknown and/or time time-varying rate of system nonstationarity. Two out ofthree compared approaches are classical...
-
Circular polarization diversity implementation for correlation reduction in wideband low-cost multiple-input-multiple-output antenna
PublicationIn this paper, a multiple-input-multiple-output (MIMO) antenna featuring circular polarization diversity, and designed on a common coplanar ground is presented. The proposed antenna design utilizes a coplanar waveguide (CPW) feeding technique with three parallel coplanar ground planes, and two feedlines in-between. For circular polarization (CP), quasi-loops are created by etching slots on the outermost ground planes. With this...
-
Modeling SPMD Application Execution Time
PublicationParallel applications in a Single Process Multiple Data paradigm assume splitting huge amounts of data to multiple processors working in parallel at small data packets. As the individual data packets are not independent, the processors must interact with each other to exchange results of the calculations with their adjacent partners and take these results into account in their own computations. An example of SPMD is geometric parallelism...
-
Krytyczne działania i czynniki sukcesu wdrażania projektów informatycznych
PublicationCelem niniejszego artykułu jest określenie, jakie działania powinny podjąć organizacje gospodarcze, aby zidentyfikowane krytyczne czynniki sukcesu projektów informatycznych, pozwalające na zwiększenie prawdopodobieństwa pomyślnego zakończenia przedsięwzięcia, zmaterializowały się. Na potrzeby niniejszego artykułu przeprowadzono badania własne. Jako metodę wybrano analizę przypadków. Badanie przeprowadzono w organizacji, która prowadziła...
-
The complexity of bicriteria tree-depth
PublicationThe tree-depth problem can be seen as finding an elimination tree of minimum height for a given input graph G. We introduce a bicriteria generalization in which additionally the width of the elimination tree needs to be bounded by some input integer b. We are interested in the case when G is the line graph of a tree, proving that the problem is NP-hard and obtaining a polynomial-time additive 2b-approximation algorithm. This particular...
-
Image Processing Techniques for Distributed Grid Applications
PublicationParallel approaches to 2D and 3D convolution processing of series of images have been presented. A distributed, practically oriented, 2D spatial convolution scheme has been elaborated and extended into the temporal domain. Complexity of the scheme has been determined and analysed with respect to coefficients in convolution kernels. Possibilities of parallelisation of the convolution operations have been analysed and the results...
-
Tryton Supercomputer Capabilities for Analysis of Massive Data Streams
PublicationThe recently deployed supercomputer Tryton, located in the Academic Computer Center of Gdansk University of Technology, provides great means for massive parallel processing. Moreover, the status of the Center as one of the main network nodes in the PIONIER network enables the fast and reliable transfer of data produced by miscellaneous devices scattered in the area of the whole country. The typical examples of such data are streams...
-
Two Stage SVM and kNN Text Documents Classifier
PublicationThe paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
-
Redundantly Actuated 3RRR Parallel Planar Manipulator - Numerical Analyses of its Dynamics Sensitivity on Modifications of its Platform’s Inertia Parameters
PublicationIn the paper, numerical analyses, as well as dynamics of a complex mechanism, are presented. Two objectives are crucial for the paper: inverse dynamic model is needed (dedicated to be use in the model predictive controller); an identification method is searched (some trajectory parameters are controlled, when specific trajectory is tracked under an open-loop model-based control), as selected parameters must be identified for the...
-
ENERGY EFFICIENT AND ENVIRONMENTALLY FRIENDLY HYBRID CONVERSION OF INLAND PASSENGER VESSEL
PublicationThe development and growing availability of modern technologies, along with more and more severe environment protection standards which frequently take a form of legal regulations, are the reason why attempts are made to find a quiet and economical propulsion system not only for newly built watercraft units, but also for modernised ones. Correct selection of the propulsion and supply system for a given vessel affects significantly...
-
A Generalized Framework Towards Structural Mechanics of Three-layered Composite Structures
PublicationThree-layered composite structures find a broad application. Increasingly, composites are being used whose layer thicknesses and material properties diverge strongly. In the perspective of structural mechanics, classical approaches to analysis fail at such extraordinary composites. Therefore, emphasis of the present approach is on arbitrary transverse shear rigidities and structural thicknesses of the individual layers. Therewith...
-
The Quick Measure of a Nurbs Surface Curvature for Accurate Triangular Meshing
PublicationNURBS surfaces are the most widely used surfaces for three-dimensional models in CAD/CAE programs. As a model for FEM calculation is prepared with a CAD program it is inevitable to mesh it finally. There are many algorithms for meshing planar regions. Some of them may be used for meshing surfaces but it is necessary to take the curvature of the surface under consideration to avoid poor quality mesh. The mesh must be denser in the...
-
Parallelization of Selected Algorithms on Multi-core CPUs, a Cluster and in a Hybrid CPU+Xeon Phi Environment
PublicationIn the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication...
-
Studies of the Modelling Accuracy of Steam Turbine Control Systems for Diagnostic Tests of Automatic Synchronizers
PublicationOne of the important factors that affect the reliable operation of the power system and the rapid restitution after disaster is a quick and effective combining synchronous electric power facilities to operate in parallel. Hence, diagnostics of automatic synchronizers at every stage of their life, from building a prototype, through the whole life, until removing such devices from the operation, is an extremely important and responsible...
-
Genetic Positioning of Fire Stations Utilizing Grid-computing Platform
PublicationA chapter presents a model for determining near-optimal locations of fire stations based on topography of a given area and location of forests, rivers, lakes and other elements of the site. The model is based on principals of genetic algorithms and utilizes the power of the grid to distribute and execute in parallel most performance-demanding computations involved in the algorithm.
-
Propagation in rectangular waveguides with a pseudochiral Ω slab
PublicationThe transfer matrix approach is applied for analysis of waveguides loaded with a uniaxial pseudochiral Ω slab. In particular a pseudochiral parallel plate and rectangular guides are investigated. Based on the numerical analysis the influence of the pseudochirality on propagation characteristics and field distribution are examined. Other feature such as a field displacement phenomenon appearing in the both considered structures...
-
The ONETEP linear-scaling density functional theory program
PublicationWe present an overview of the ONETEP program for linear-scaling density functional theory (DFT) calculations with large basis set (planewave) accuracy on parallel computers. The DFT energy is computed from the density matrix, which is constructed from spatially localized orbitals we call Non-orthogonal Generalized Wannier Functions (NGWFs), expressed in terms of periodic sinc (psinc) functions. During the calculation, both the...