Search results for: massively parallel computing

Mobile Cloud computing architecture for massively parallelizable geometric computation

Publication

V. Sánchez Ribes
H. Mora-Mora
A. Sobecki
F. José Mora Gimeno

- COMPUTERS IN INDUSTRY - Year 2020

Cloud Computing is one of the most disruptive technologies of this century. This technology has been widely adopted in many areas of the society. In the field of manufacturing industry, it can be used to provide advantages in the execution of the complex geometric computation algorithms involved on CAD/CAM processes. The idea proposed in this research consists in outsourcing part of the load to be computed in the client machines...

Full text available for download in the portal

Molecular Diffusion Simulation on ARUZ – Massively-parallel FPGA-based Machine

Publication

R. Kielbik
K. Halagan
K. Rudnicki
P. Polanowski
G. Jablonski
J. Jung

- Year 2021

Full text available for download from an external service

Parallel Programming for Modern High Performance Computing Systems

Publication

P. Czarnul

- Year 2018

In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text available for download from an external service

Highly parallel distributed computing systems with optical interconnections

Publication

J. Just
R. Romaniuk
R. S. Romaniuk

- Microprocessing and Microprogramming - Year 1989

Full text available for download from an external service

Highly Parallel Distributed Computing System With Optical Interconnections

Publication

J. Just
R. Romaniuk
R. S. Romaniuk

- Year 1990

Full text available for download from an external service

Review of parallel computing methods and tools for FPGA technology

Publication

R. Cieszewski
M. Linczuk
K. Pozniak
R. Romaniuk
R. S. Romaniuk

- Year 2013

Full text available for download from an external service

ARUZ — Large-scale, massively parallel FPGA-based analyzer of real complex systems

Publication

R. Kiełbik
K. Hałagan
W. Zatorski
J. Jung
J. Ulański
A. Napieralski
K. Rudnicki
P. Amrozik
G. Jabłoński
D. Stożek... and 4 others

- COMPUTER PHYSICS COMMUNICATIONS - Year 2018

Full text available for download from an external service

Parallel multithread computing for spectroscopic analysis in optical coherence tomography

Publication

- Year 2014

Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample. It is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan. Additionally, further processing of acquired spectroscopic information...

Full text available for download from an external service

Modelling of First- and Second-order Chemical Reactions on ARUZ – Massively-parallel FPGA-based Machine

Publication

P. Amrozik
K. Halagan
K. Rudnicki

- Year 2021

Full text available for download from an external service

A CMOS Pixel With Embedded ADC, Digital CDS and Gain Correction Capability for Massively Parallel Imaging Array

Publication

- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS - Year 2017

In the paper, a CMOS pixel has been proposed for imaging arrays with massively parallel image acquisition and simultaneous compensation of dark signal nonuniformity (DSNU) as well as photoresponse nonuniformity (PRNU). In our solution the pixel contains all necessary functional blocks: a photosensor and an analog-to-digital converter (ADC) with built-in correlated double sampling (CDS) integrated together. It is implemented in...

Full text available for download in the portal

Low-Power Receivers for Wireless Capacitive Coupling Transmission in 3-D-Integrated Massively Parallel CMOS Imager

Publication

- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS - Year 2020

The paper presents pixel receivers for massively parallel transmission of video signal between capacitive coupled integrated circuits (ICs). The receivers meet the key requirements for massively parallel transmission, namely low-power consumption below a single μW, small area of less than 205 μm², high sensitivity better than 160 mV, and good immunity to crosstalk. The receivers were implemented and measured in a 3-D IC (two face-to-face...

Full text available for download in the portal

Molecular Simulations Using Boltzmann’s Thermally Activated Diffusion - Implementation on ARUZ – Massively-parallel FPGA-based Machine

Publication

G. Jablonski
P. Amrozik
K. Halagan

- Year 2021

Full text available for download from an external service

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Publication

- Scientific Programming - Year 2020

This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals,...

Full text available for download in the portal

Massively parallel linear-scaling Hartree–Fock exchange and hybrid exchange–correlation functionals with plane wave basis set accuracy

Publication

J. Dziedzic
J. C. Womack
R. Ali
C. Skylaris

- JOURNAL OF CHEMICAL PHYSICS - Year 2021

We extend our linear-scaling approach for the calculation of Hartree–Fock exchange energy using localized in situ optimized orbitals [Dziedzic et al., J. Chem. Phys. 139, 214103 (2013)] to leverage massive parallelism. Our approach has been implemented in the ONETEP (Order-N Electronic Total Energy Package) density functional theory framework, which employs a basis of non-orthogonal generalized Wannier functions (NGWFs) to achieve...

Full text available for download in the portal

Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023

In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text available for download from an external service

Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption

Publication

P. Rościszewski

- Year 2018

Many important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power of such systems requires programming parallel applications that are hybrid in two meanings: they can utilize parallelism on multiple levels at the same time and combine together programming interfaces...

Full text available for download from an external service

Implementation of Molecular Dynamics and Its Extensions with the Coarse-Grained UNRES Force Field on Massively Parallel Systems: Toward Millisecond-Scale Simulations of Protein Structure, Dynamics, and Thermodynamics

Publication

A. Liwo
S. Ołdziej
C. Czaplewski
D. Kleinerman
P. Blood
H. Scheraga

- Journal of Chemical Theory and Computation - Year 2010

Full text available for download from an external service

Machine Learning in Multi-Agent Systems using Associative Arrays

Publication

P. Spychalski
R. Arendt

- PARALLEL COMPUTING - Year 2018

In this paper, a new machine learning algorithm for multi-agent systems is introduced. The algorithm is based on associative arrays, thus it becomes less complex and more efficient substitute of artificial neural networks and Bayesian networks, which is confirmed by performance measurements. Implementation of machine learning algorithm in multi-agent system for aided design of selected control systems allowed to improve the performance...

Full text available for download in the portal

Drawing maps with advice

Publication

D. Dereniowski
A. Pelc

- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING - Year 2012

We consider the following computational problem. An agent is placed at a node of a graph that is unknown to it. The nodes of the graph are indistinguishable, while the edges have port numbers. The agent's task is to draw a map, i.e., to compute an isomorphic copy of the graph, or to compute an arbitrary spanning tree of the graph. Without additional information these tasks cannot be accomplished. In the paper we establish bounds on the minimum number of...

Full text available for download from an external service

General Provisioning Strategy for Local Specialized Cloud Computing Environments

Publication

- Year 2023

The well-known management strategies in cloud computing based on SLA requirements are considered. A deterministic parallel provisioning algorithm has been prepared and used to show its behavior for three different requirements: load balancing, consolidation, and fault tolerance. The impact of these strategies on the total execution time of different sets of services is analyzed for randomly chosen sets of data. This makes it possible...

Full text available for download in the portal

Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge

Publication

- Year 2020

Auto-tuning of configuration and application parameters allows to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space and new methodologies are needed in the face of emerging heterogeneous...

Full text available for download in the portal

An Ultra-Low-Energy Analog Comparator for A/D Converters in CMOS Image Sensors

Publication

W. Jendernalik

- CIRCUITS SYSTEMS AND SIGNAL PROCESSING - Year 2017

This paper proposes a new solution of an ultra-low-energy analog comparator, dedicated to slope analog-to-digital converters (ADC), particularly suited for CMOS image sensors (CISs) featuring a large number of ADCs. For massively parallel imaging arrays, this number may be as high as tens-hundreds of thousands ADCs. As each ADC includes an analog comparator, the number of these comparators in CIS is always high. Detailed analysis...

Full text available for download in the portal

Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster

Publication

- Year 2015

A method for automatic recognition of hazardous acoustic events operating on a super computing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided: both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...

In-ADC, Rank-Order Filter for Digital Pixel Sensors

Publication

- Electronics - Year 2024

This paper presents a new implementation of the rank-order filter, which is established on a parallel-operated array of single-slope (SS) analog-to-digital converters (ADCs). The SS ADCs use an “on-the-ramp processing” technique, i.e., filtration is performed along with analog-to-digital conversion, so the final states of the converters represent a filtered image. A proof-of-concept 64 × 64 array of SS ADCs, integrated with MOS...

Full text available for download in the portal

Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors

Publication

P. Czarnul

- Year 2018

The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...

Full text available for download from an external service

Dynamic Data Management Among Multiple Databases for Optimization of Parallel Computations in Heterogeneous HPC Systems

Publication

P. Rościszewski

- Year 2014

Rapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text available for download from an external service

MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017

In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available for download in the portal

Optimization of parallel implementation of UNRES package for coarse‐grained simulations to treat large proteins

Publication

A. Sieradzan
J. Sans‐Duñó
E. Lubecka
C. Czaplewski
A. Lipska
H. Leszczyński
K. Ocetkiewicz
J. Proficz
P. Czarnul
H. Krawczyk
A. Liwo

- JOURNAL OF COMPUTATIONAL CHEMISTRY - Year 2023

We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms...

Full text available for download in the portal

Parallel Background Subtraction in Video Streams Using OpenCL on GPU Platforms

Publication

G. Szwoch

- Year 2014

Implementation of the background subtraction algorithm using OpenCL platform is presented. The algorithm processes live stream of video frames from the surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access,...

Full text available for download from an external service

Fixed Pattern Noise Reduction and Linearity Improvement in Time-Mode CMOS Image Sensors

Publication

M. Kłosowski
Y. Sun

- SENSORS - Year 2020

In the paper, a digital clock stopping technique for gain and offset correction in time-mode analog-to-digital converters (ADCs) has been proposed. The technique is dedicated to imagers with massively parallel image acquisition working in the time mode where compensation of dark signal non-uniformity (DSNU) as well as photo-response non-uniformity (PRNU) is critical. Fixed pattern noise (FPN) reduction has been experimentally validated...

Full text available for download in the portal

Towards an efficient multi-stage Riemann solver for nuclear physics simulations

Publication

S. Cygert
J. Porter-Sobieraj
D. Kikoła
J. Sikorski
M. Słodkowski

- Year 2013

Relativistic numerical hydrodynamics is an important tool in high energy nuclear science. However, such simulations are extremely demanding in terms of computing power. This paper focuses on improving the speed of solving the Riemann problem with the MUSTA-FORCE algorithm by employing the CUDA parallel programming model. We also propose a new approach to 3D finite difference algorithms, which employ a GPU that uses surface memory....

Full text available for download from an external service

Mechanism of recognition of parallel G-quadruplexes by DEAH/RHAU helicase DHX36 explored by molecular dynamics simulations

Publication

- Computational and Structural Biotechnology Journal - Year 2021

Because of high stability and slow unfolding rates of G-quadruplexes (G4), cells have evolved specialized helicases that disrupt these non-canonical DNA and RNA structures in an ATP-dependent manner. One example is DHX36, a DEAH-box helicase, which participates in gene expression and replication by recognizing and unwinding parallel G4s. Here, we studied the molecular basis for the high affinity and specificity of DHX36 for parallel-type...

Full text available for download in the portal

Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications

Publication

P. Czarnul

- Electronics - Year 2021

The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available for download in the portal

Acceleration of the DGF-FDTD method on GPU using the CUDA technology

Publication

- Year 2015

We present a parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method on a graphics processing unit (GPU). The compute unified device architecture (CUDA) parallel computing platform is applied in the developed implementation. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of DGF-FDTD on GPU. The efficiency of parallel computations...

Full text available for download from an external service

A Power-Efficient Digital Technique for Gain and Offset Correction in Slope ADCs

Publication

M. Kłosowski

- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS - Year 2020

In this brief, a power-efficient digital technique for gain and offset correction in slope analog-to-digital converters (ADCs) has been proposed. The technique is especially useful for imaging arrays with massively parallel image acquisition where simultaneous compensation of dark signal non-uniformity (DSNU) as well as photo-response non-uniformity (PRNU) is critical. The presented approach is based on stopping the ADC clock by...

Full text available for download in the portal

Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC

Publication

P. Czarnul

- Year 2002

This work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...

Full text available for download from an external service

Performance/energy aware optimization of parallel applications on GPUs under power capping

Publication

- Year 2020

In the paper we present an approach and results from application of the modern power capping mechanism available for NVIDIA GPUs to the benchmarks such as NAS Parallel Benchmarks BT, SP and LU as well as cublasgemm-benchmark which are widely used for assessment of high performance computing systems’ performance. Specifically, depending on the benchmarks, various power cap configurations are best for desired trade-off of performance...

Full text available for download in the portal

Performance Analysis of the OpenCL Environment on Mobile Platforms

Publication

- Year 2022

Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text available for download from an external service

A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache

Publication

A. Malinowski
P. Czarnul
P. Dorożyński
K. Czuryło
Ł. Dorau
M. Maciejewski
P. Skowron

- Annals of Computer Science and Information Systems - Year 2016

While many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind...

Full text available for download in the portal

Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

Publication

- Year 2017

In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...

Full text available for download from an external service

Three levels of fail-safe mode in MPI I/O NVRAM distributed cache

Publication

- Procedia Computer Science - Year 2018

The paper presents architecture and design of three versions for fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file which...

Full text available for download in the portal

Network-aware Data Prefetching Optimization of Computations in a Heterogeneous HPC Framework

Publication

P. Rościszewski

- International Journal of Computer Networks & Communications (IJCNC) - Year 2014

Rapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open for...

Full text available for download in the portal

Video Analytics-Based Algorithm for Monitoring Egress from Buildings

Publication

- Year 2013

A concept and practical implementation of the algorithm for detecting of potentially dangerous situations of crowding in passages is presented. An example of such situation is a crush which may be caused by obstructed pedestrian pathway. Surveillance video camera signal analysis performed on line is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of implemented algorithm which uses...

Full text available for download from an external service

Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors

Publication

- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020

In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...

Full text available for download from an external service

Surface diffusion and cluster formation of gold on the silicon (111)

Publication

- Journal of Achievements in Materials and Manufacturing Engineering - Year 2020

Purpose: Investigation of the gold atoms behaviour on the surface of silicon by molecular dynamics simulation method. The studies were performed for the case of one, two and four atoms, as well as incomplete and complete filling of gold atoms on the silicon surface. Design/methodology/approach: Investigations were performed by the method of molecular dynamics simulation using the Large-scale Atomic/Molecular Massively Parallel...

Full text available for download in the portal

Use of ICT infrastructure for teaching HPC

Publication

- Year 2019

In this paper we look at modern ICT infrastructure as well as curriculum used for conducting a contemporary course on high performance computing taught over several years at the Faculty of Electronics Telecommunications and Informatics, Gdansk University of Technology, Poland. We describe the infrastructure in the context of teaching parallel programming at the cluster level using MPI, node level using OpenMP and CUDA. We present...

Full text available for download from an external service

Tuning matrix-vector multiplication on GPU

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations such as the conjugate gradients method (cg), the minimal residual method (minres), the generalized residual method (gmres) and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...

Behavior Analysis and Dynamic Crowd Management in Video Surveillance System

Publication

- Year 2011

A concept and practical implementation of a crowd management system which acquires input data by the set of monitoring cameras is presented. Two leading threads are considered. First concerns the crowd behavior analysis. Second thread focuses on detection of a hold-ups in the doorway. The optical flow combined with soft computing methods (neural network) is employed to evaluate the type of crowd behavior, and fuzzy logic aids detection...

Full text available for download from an external service

A Regular Expression Matching Application with Configurable Data Intensity for Testing Heterogeneous HPC Systems

Publication

- Year 2014

Modern High Performance Computing (HPC) systems are becoming increasingly heterogeneous in terms of utilized hardware, as well as software solutions. The problems, that we wish to efficiently solve using those systems have different complexity, not only considering magnitude, but also the type of complexity: computation, data or communication intensity. Developing new mechanisms for dealing with those complexities or choosing an...

Distributed NVRAM Cache – Optimization and Evaluation with Power of Adjacency Matrix

Publication

- Year 2017

In this paper we build on our previously proposed MPI I/O NVRAM distributed cache for high performance computing. In each cluster node it incorporates NVRAMs which are used as an intermediate cache layer between an application and a file for fast read/write operations supported through wrappers of MPI I/O functions. In this paper we propose optimizations of the solution including handling of write requests with a synchronous mode,...

Full text available for download from an external service
