Search results for: cuda · unified memory · prefetching · memory oversubscription
-
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication: In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...
-
Intensive blood pressure lowering prevents mild cognitive impairment and possible dementia and slows development of white matter lesions in brain: the SPRINT Memory and Cognition IN Decreased Hypertension (SPRINT MIND) study
Publication
-
Towards an efficient multi-stage Riemann solver for nuclear physics simulations
Publication: Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....
-
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication: The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...
-
Performance evaluation of parallel background subtraction on GPU platforms
Publication: Implementation of the background subtraction algorithm on parallel GPUs is presented. The algorithm processes video streams and extracts foreground pixels. The work focuses on optimizing parallel algorithm implementation by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. The algorithm is implemented in both OpenCL and CUDA. Various optimizations of...
-
„Jeśli my zapomnimy, kto będzie pamiętał?". Dzieło sztuki jako manifestacja postpamięci ["If we forget, who will remember?" The work of art as a manifestation of postmemory]
Publication: The text attempts to capture the relationship between postmemory (or, put differently, "surrogate memory") and art, with particular emphasis on the visual arts. It analyzes works by artists of the younger generation who take up the subject of the memory of the Shoah (among others Libera, Bałka, Żmijewski and, to some extent, Betlejewski), treating them as forms of manifestation of postmemory. Starting from the assumption that the analysis of artistic phenomena...
-
Missing Puzzle Pieces in Dementia Research: HCN Channels and Theta Oscillations
Publication: Increasing evidence indicates a role of hyperpolarization activated cation (HCN) channels in controlling the resting membrane potential, pacemaker activity, memory formation, sleep, and arousal. Their dysfunction may be associated with the development of epilepsy and age-related memory decline. Neuronal hyperexcitability involved in epileptogenesis and EEG desynchronization occur in the course of dementia in human Alzheimer’s Disease...
-
NVRAM as Main Storage of Parallel File System
Publication: The main problem of modern cluster environments used to be a lack of computational power provided by CPUs and GPUs, but recently they suffer more and more from insufficient performance of input and output operations. Apart from better network infrastructure and more sophisticated processing algorithms, many solutions rely on emerging memory technologies. This paper presents evaluation of using non-volatile random-access memory as a...
-
How to meet when you forget: log-space rendezvous in arbitrary graphs
Publication: Two identical (anonymous) mobile agents start from arbitrary nodes in an a priori unknown graph and move synchronously from node to node with the goal of meeting. This rendezvous problem has been thoroughly studied, both for anonymous and for labeled agents, along with another basic task, that of exploring graphs by mobile agents. The rendezvous problem is known to be not easier than graph exploration. A well-known recent result...
-
MEMORYSCAPES OF EASTERN POLAND
Publication: The text investigates new phenomena emerging in the field of social memory and commemoration in contemporary Poland. On the basis of field analyses, case studies and theoretical, transdisciplinary approaches, the paper discusses the issue of contemporary memoryscapes in eastern Poland (Bialystok and Lublin). These emerging forms of remembrance are the result of the sophisticated interplay between different actors involved in the...
-
Using GPUs for Parallel Stencil Computations in Relativistic Hydrodynamic Simulation
Publication: This paper explores the possibilities of using a GPU for complex 3D finite difference computation. We propose a new approach to this topic using surface memory and compare it with 3D stencil computations carried out via shared memory, which is currently considered to be the best approach. The case study was performed for the extensive computation of collisions between heavy nuclei in terms of relativistic hydrodynamics.
-
Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins
Publication: We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...
-
Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA
Publication: Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel’s Programmable Integrated Unified Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark...
-
Polityki pamięci i tożsamości wobec (nie)chcianego dziedzictwa. Od Gdańska do Gdańzigu [Politics of memory and identity in the face of (un)wanted heritage. From Gdańsk to Gdańzig]
Publication: The article attempts to answer the question of how the physical space of the city and its image were shaped depending on the politics of memory, collective memory and historical culture. The analyses carried out so far show that the way in which identity narratives were built was essentially based on myth-making constructs, especially the myth of the 16th–17th-century "golden age". In Gdańsk, owing to its rich historical past...
-
Alternative Approach to Convolution Term of Viscoelasticity in Equations of Unsteady Pipe Flow
Publication: In the paper, selected aspects concerning the description of viscoelastic behavior of pipe walls during unsteady flow are analyzed. The alternative convolution expression of the viscoelastic term is presented and compared with the corresponding term referring to unsteady friction. Both approaches indicate similarities in the forms of impulse response functions and the parameter properties. The flow memory was introduced into convolution...
-
Neural network agents trained by declarative programming tutors
Publication: This paper presents an experimental study on the development of a neural network-based agent, trained using data generated using declarative programming. The focus of the study is the application of various agents to solve the classic logic task – The Wumpus World. The paper evaluates the effectiveness of neural-based agents across different map configurations, offering a comparative analysis to underline the strengths and limitations...
-
Time versus space trade-offs for rendezvous in trees
Publication: Two identical (anonymous) mobile agents start from arbitrary nodes of an unknown tree and have to meet at some node. Agents move in synchronous rounds: in each round an agent can either stay at the current node or move to one of its neighbors. We consider deterministic algorithms for this rendezvous task. The main result of this paper is a tight trade-off between the optimal time of completing rendezvous and the size of memory...
-
On thermal stability of topological qubit in Kitaev's 4D model
Publication: We analyse the stability of the four-dimensional Kitaev model, a candidate for scalable quantum memory, at finite temperature within the weak coupling Markovian limit. It is shown that, below a critical temperature, certain topological qubit observables X and Z possess relaxation times exponentially long in the size of the system. Their construction involves an algorithm, polynomial in system size, which uses as an input the results of...
-
Tożsamość i przestrzeń. Wokół gdańskich retoryk tożsamościowych [Identity and space. On Gdańsk's rhetorics of identity]
Publication: Memory and space play an important role in the long process of shaping Gdańsk's identity. In the new identity discourse they do not lose their significance, but alongside the permanent foundations of the identity narrative (multiculturalism, architecture, memory), new urban stories appear that cover various dimensions of reflection and point to the complexity of Gdańsk's identities: their multidimensional and multilayered character. In the collective memory of the inhabitants of Gdańsk...
-
Unsupervised machine-learning classification of electrophysiologically active electrodes during human cognitive task performance
Publication: Identification of active electrodes that record task-relevant neurophysiological activity is needed for clinical and industrial applications as well as for investigating brain functions. We developed an unsupervised, fully automated approach to classify active electrodes showing event-related intracranial EEG (iEEG) responses from 115 patients performing a free recall verbal memory task. Our approach employed new interpretable...
-
High performance filtering for big datasets from Airborne Laser Scanning with CUDA technology
Publication: There are many studies on the problems of processing big datasets provided by Airborne Laser Scanning (ALS). The processing of point clouds is often executed in stages or on fragments of the measurement set. Therefore, solutions that enable the processing of the entire cloud at the same time in a simple, fast, efficient way are the subject of much research. In this paper, the authors propose to use General-Purpose computation...
-
Context Search Algorithm for Lexical Knowledge Acquisition
Publication: A Context Search algorithm used for lexical knowledge acquisition is presented. Knowledge representation based on psycholinguistic theories of cognitive processes allows for implementation of a computational model of semantic memory in the form of a semantic network. Knowledge acquisition using supervised dialog templates has been performed in a word game designed to guess the concept a human user is thinking about. The game,...
-
Massively parallel linear-scaling Hartree–Fock exchange and hybrid exchange–correlation functionals with plane wave basis set accuracy
Publication: We extend our linear-scaling approach for the calculation of Hartree–Fock exchange energy using localized in situ optimized orbitals [Dziedzic et al., J. Chem. Phys. 139, 214103 (2013)] to leverage massive parallelism. Our approach has been implemented in the ONETEP (Order-N Electronic Total Energy Package) density functional theory framework, which employs a basis of non-orthogonal generalized Wannier functions (NGWFs) to achieve...
-
Gaining knowledge through experience: developing decisional DNA applications in robotics
Publication: An innovative approach to applying experience-based knowledge and the construction of decisional DNA in robotics-related areas is discussed. In this article, we explore an approach that integrates Decisional DNA, a domain-independent, flexible, and standard knowledge representation structure, with robots in order to test the usability and suitability of this novel knowledge representation structure. Core issues in using this Decisional...
-
Hybridized Space-Vector Pulsewidth Modulation for Multiphase Two-Level Voltage Source Inverter
Publication: In space vector pulsewidth modulation (SVPWM) algorithms for multiphase two-level voltage source inverters (VSI), the components of active vectors in all orthogonal spaces have to be calculated within the processor and stored in its memory. These necessitate intensive computational efforts of the processor and large memory space. This article presents a hybridized SVPWM for multiphase two-level VSI. In this algorithm, elements...
-
Smaller Representation of Finite State Automata
Publication: This paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory. We investigate several techniques for reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve a gain of around 20-30%...
-
Dilemmas of Identity in Contemporary Cities. The City of Gdansk as an Example
Publication: The article aims to answer the question of how, depending on the historical heritage, the collective memory, the physical space of the city and their images were shaped, through the politics of memory. All known cultures and languages distinguish the ‘self’ and the ‘other’, ‘us’ and ‘them’. Neither do we know cities which wish to differ in some particular way, although they can have numerous identities. Their multitude and diversity...
-
Modelling and simulation of GPU processing in the MERPSYS environment
Publication: In this work, we evaluate an analytical GPU performance model based on Little's law that expresses the kernel execution time in terms of latency bound, throughput bound, and achieved occupancy. We then combine it with the results of several research papers, introduce equations for data transfer time estimation, and finally incorporate it into the MERPSYS framework, which is a general-purpose simulator for parallel and distributed...
-
Parallel implementation of the DGF-FDTD method on GPU Using the CUDA technology
Publication: The discrete Green's function (DGF) formulation of the finite-difference time-domain method (FDTD) is accelerated on a graphics processing unit (GPU) by means of the Compute Unified Device Architecture (CUDA) technology. In the developed implementation of the DGF-FDTD method, a new analytic expression for dyadic DGF derived based on scalar DGF is employed in computations. The DGF-FDTD method on GPU returns solutions that are compatible...
-
Acceleration of the DGF-FDTD method on GPU using the CUDA technology
Publication: We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...
-
Smaller representation of finite state automata
Publication: This paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory (Daciuk, 2000)[4]. We investigate several techniques for reducing memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve...
-
Efficient model order reduction for FEM analysis of waveguide structures and resonators
Publication: An efficient model order reduction method for three-dimensional Finite Element Method (FEM) analysis of waveguide structures is proposed. The method is based on the Efficient Nodal Order Reduction (ENOR) algorithm for creating macro-elements in cascaded subdomains. The resulting macro-elements are represented by very compact submatrices, leading to significant reduction of the overall number of unknowns. The efficiency of the model...
-
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
Publication: This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...
-
FDTD Method for Electromagnetic Simulations in Media Described by Time-Fractional Constitutive Relations
Publication: In this paper, the finite-difference time-domain (FDTD) method is derived for electromagnetic simulations in media described by the time-fractional (TF) constitutive relations. TF Maxwell’s equations are derived based on these constitutive relations and the Grünwald–Letnikov definition of a fractional derivative. Then the FDTD algorithm, which includes memory effects and energy dissipation of the considered media, is introduced....
-
Recurrent Neural Network Based Adaptive Variable-Order Fractional PID Controller for Small Modular Reactor Thermal Power Control
Publication: This paper presents the synthesis of an adaptive PID type controller in which the variable-order fractional operators are used. Due to the implementation difficulties of fractional order operators, both with a fixed and variable order, on digital control platforms caused by the requirement of infinite memory resources, the fractional operators that are part of the discussed controller were approximated by recurrent neural networks...
-
An Approximation of the Zero Error Capacity by a Greedy Algorithm.
Publication: We present a greedy algorithm that determines a lower bound on the zero error capacity. The algorithm has many new advantages, e.g., it does not store a whole product graph in computer memory and it uses the so-called distributions in all dimensions to get a better approximation of the zero error capacity. We also show an additional application of our algorithm.
-
An Approximation of the Zero Error Capacity by a Greedy Algorithm
Publication: We present a greedy algorithm that determines a lower bound on the zero error capacity. The algorithm has many new advantages, e.g., it does not store a whole product graph in computer memory and it uses the so-called distributions in all dimensions to get a better approximation of the zero error capacity. We also show an additional application of our algorithm.
-
Incorporating Iris, Fingerprint and Face Biometric for Fraud Prevention in e-Passports Using Fuzzy Vault
Publication: A unified framework which provides a higher security level to e-passports is proposed. This framework integrates face, iris and fingerprint images. It involves three layers of security: the first layer maps a biometric image to another biometric image which is called a biostego image. Three mapping schemes are proposed: the first scheme maps a single biometric image to a single biostego image, the second scheme maps dual biometric images...
-
On zero-error codes produced by greedy algorithms
Publication: We present two greedy algorithms that determine zero-error codes and lower bounds on the zero-error capacity. These algorithms have many advantages, e.g., they do not store a whole product graph in computer memory and they use the so-called distributions in all dimensions to get better approximations of the zero-error capacity. We also show an additional application of our algorithms.
-
Tuning a Hybrid GPU-CPU V-Cycle Multilevel Preconditioner for Solving Large Real and Complex Systems of FEM Equations
Publication: This letter presents techniques for tuning an accelerated preconditioned conjugate gradient solver with a multilevel preconditioner. The solver is optimized for a fast solution of sparse systems of equations arising in computational electromagnetics in a finite element method using higher-order elements. The goal of the tuning is to increase the throughput while at the same time reducing the memory requirements in order to allow...
-
ОТТОКАР УЛЬ. IN MEMORIAM [Ottokar Uhl. In memoriam]
Publication: Ottokar Uhl is considered one of the most influential Austrian architects of the second half of the 20th century. He made a significant contribution to the development of participatory design methods and to the post-conciliar reform of sacred architecture. The article covers his biography, ideas and projects, as well as his publications. This architect seems to deserve a place in our memory.
-
Identification of nonstationary processes using noncausal bidirectional lattice filtering
Publication: The problem of off-line identification of a nonstationary autoregressive process with a time-varying order and a time-varying degree of nonstationarity is considered and solved using the parallel estimation approach. The proposed parallel estimation scheme is made up of several bidirectional (noncausal) exponentially weighted lattice algorithms with different estimation memory and order settings. It is shown that optimization of...
-
Zastosowanie programowania parametrycznego w planowaniu operacji obróbki elementów o powtarzalnej geometrii [Application of parametric programming in planning machining operations for parts with repetitive geometry]
Publication: Applications of the available techniques for programming the numerically controlled machining of workpieces with repeating design features were compared. The possibilities of using parametric programming in shop-floor-oriented programming mode and when working in a CAM-class system environment were analyzed. Attention was paid to the clarity of the resulting program with regard to the possibility of editing it and correcting its syntax, the form in which the program is stored in the machine tool's memory...
-
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
Publication: In this paper, a task-scheduling approach for efficiently calculating sparse symmetric matrix-vector products, designed to run on Graphics Processing Units (GPUs), is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....
-
Multi-level Virtualization and Its Impact on System Performance in Cloud Computing
Publication: The results of benchmarking tests of multi-level virtualized environments are presented. The performance impact of hardware virtualization, container-type isolation and programming-level abstraction is analysed. The comparison is made on the basis of a proposed score metric that allows different aspects of performance to be compared. These include general performance (CPU and memory), networking, disk operations and application-like...
-
Coherent-wave Monte Carlo method for simulating light propagation in tissue
Publication: Simulating propagation and scattering of coherent light in turbid media, such as biological tissues, is a complex problem. Numerical methods for solving the Helmholtz or wave equation (e.g. finite-difference or finite-element methods) require a large amount of computer memory and long computation time. This makes them impractical for simulating laser beam propagation into deep layers of tissue. Another group of methods, based on radiative...
-
How Can We Identify Electrophysiological iEEG Activities Associated with Cognitive Functions?
Publication: Electrophysiological activities of the brain are engaged in its various functions and give rise to a wide spectrum of low and high frequency oscillations in the intracranial EEG (iEEG) signals, commonly known as the brain waves. The iEEG spectral activities are distributed across networks of cortical and subcortical areas arranged into hierarchical processing streams. It remains a major challenge to identify these activities in...
-
Towards hardware built-in support for computer system safety
Publication: The article discusses available technologies for virtualizing memory resources and I/O systems in computer systems, such as the Execute Disable Bit (EDB) capability and Virtual Machine Architecture (VMA). It then introduces assumptions for extending these technologies to obtain Safe Call Execution functionality by means of Execution Disabling Policies (EDP) technology. Assumptions for Memory Virtualization functionality are also introduced...
-
The system for remote monitoring of a vertical axis wind farm
Publication: The article presents a system for remote monitoring of working parameters of a wind turbine with a vertical axis. The monitoring system was built using a Raspberry Pi 3 microcomputer with the Raspbian operating system and a MicroDAQ E2000 measuring card. The developed system enables monitoring the power output of the generator, torque on the turbine shaft, turbine speed and wind speed. The values of the monitored parameters are...
-
Kod fontannowy z przyrastającą liczbą symboli źródłowych [Fountain code with a growing number of source symbols]
Publication: Fountain codes, which protect transmission against erasures, are distinguished by the lack of a predetermined length and by the random generation of successive code packets. The paper shows how the properties of these codes depend on the distribution of the generator matrix for two variants of delivering packets to the encoder. Of particular interest is the case in which source packets reach the encoder gradually during transmission. The results obtained indicate...