Search results for: INTEL OPTANE PMEM

Total results: 72

Electromagnetic Simulations with 3D FEM and Intel Optane Persistent Memory
Publication
- M. Jakubowski
- P. Sypek
- Year 2022
Intel Optane persistent memory has the potential to induce a change in how high-performance calculations requiring a large system memory capacity are conducted. This article presents what this change may look like in the case of factorization of large sparse matrices describing electromagnetic problems arising in the 3D FEM analysis of passive high-frequency components. In numerical tests, the Intel oneAPI MKL PARDISO was...

Full text available for download in the portal
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication
- P. Czarnul
- Year 2018
The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...

Full text available for download from an external service
Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi
Publication
- A. Malinowski
- International Journal of Information Technology and Computer Science - Year 2015
Parallel algorithms are a popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, proof-of-concept implementation and practical experiments are often required. In order to speed up the development and provide a simple and easily accessible testing environment that enables execution of reliable experiments, the paper proposes a platform with multi-core computational accelerator:...

Full text available for download from an external service
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication
- P. Czarnul
- INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING - Year 2016
The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...

Full text available for download in the portal
BENEFITS FROM BREAKING UP WITH LINUX NATIVE PACKET PROCESSING WHILE USING INTEL DPDK LIBRARIES
Publication
- M. Wieczerzycki
- M. Landowski
- S. Kaczmarek
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2015
The Intel Data Plane Development Kit (DPDK) is a set of libraries and drivers for fast packet processing in Linux. It is a dedicated framework for building efficient high-speed data plane applications supporting QoS features, with poll mode drivers that support virtual and physical NICs, so the environment can be used to build efficient data plane applications for packet networks. The results of test on Quality of Service Metering...
Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA
Publication
- M. J. Adiletta
- J. J. Tithi
- E. Farsarakis
- G. Gerogiannis
- R. Adolf
- R. Benke
- S. Kashyap
- S. Hsia
- K. Lakhotia
- F. Petrini... and 2 others
- Year 2023
Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel’s Programmable Integrated Unified Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark...

Full text available for download from an external service
Paweł Czarnul dr hab. inż.

People

Cloud Services Department, Faculty of Electronics, Telecommunications and Informatics, Department of Computer Architecture

Paweł Czarnul received his habilitation (doktor habilitowany) in technical sciences in the discipline of computer science in 2015, and his doctorate in technical sciences in computer science (with distinction), awarded by the Council of the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology, in 2003. His research interests include parallel and distributed processing, including parallel programming on computing clusters,...
Bogdan Pankiewicz dr hab. inż.

People

Department of Microelectronic Systems

Bogdan Pankiewicz graduated in 1993 from the Faculty of Electronics of Gdańsk University of Technology, specializing in electronic circuits, and in 2002 received a doctorate in electronics from the Faculty of ETI, Gdańsk University of Technology. He has been affiliated with Gdańsk University of Technology since the beginning of his career: first as an assistant (1994–2002), and then as an assistant professor (adiunkt, since 2002) at the Faculty of Electronics, Telecommunications and Informatics. He works on the design of analog and digital...
Marek Wójcikowski dr hab. inż.

People

Department of Microelectronic Systems

Marek Wójcikowski graduated in 1993 from the Faculty of Electronics of Gdańsk University of Technology, specializing in electronic circuits. In 2002 he received a doctorate in electronics, and in 2016 a habilitation at the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology. He has been affiliated with Gdańsk University of Technology since the beginning of his career: first as an assistant (1994–2002), and then as an assistant professor (adiunkt) (since...
Jerzy Proficz dr hab. inż.

People

IT Centre of the Tricity Academic Computer Network, Department of Computer Architecture

Jerzy Proficz is the director of the IT Centre of the Tricity Academic Computer Network (CI TASK) at Gdańsk University of Technology. He received his habilitation (2022) in the discipline of technical informatics and telecommunications. He is the author or co-author of over 50 papers in journals and at scientific conferences, mainly concerning parallel data processing on high-performance computers (HPC, cloud computing). Participation...
Implementacja w FPGA algorytmu detekcji krawędzi obrazu w czasie rzeczywistym [FPGA implementation of a real-time image edge detection algorithm]
Publication
- P. Kowalski
- R. Smyk
- Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej - Year 2018
The article presents the architecture design and hardware implementation of an image preprocessing pipeline with an edge detection module. The circuit was implemented in an Intel Cyclone FPGA. The module uses five selected edge detection algorithms, including Roberts, Sobel and Prewitt.

Full text available for download in the portal
Józef Woźniak prof. dr hab. inż.

People

Prof. dr hab. inż. Józef Woźniak, full professor (prof. zw.) at Gdańsk University of Technology, graduated from the Faculty of Electronics of Gdańsk University of Technology in 1971. In 1976 he received a doctorate in technical sciences, and in 1991 a habilitation in the discipline of telecommunications with a specialization in teleinformatics. In January 2002 he received the title of professor of technical sciences. In 1994 he was appointed associate professor (profesor nadzwyczajny) at the Politechnika...
Sprzętowa implementacja transformacji Hougha w czasie rzeczywistym [Real-time hardware implementation of the Hough transform]
Publication
- P. Kowalski
- R. Smyk
- Poznan University of Technology Academic Journals. Electrical Engineering - Year 2021
The article presents an FPGA hardware implementation of an algorithm for detecting shapes approximated by a set of straight lines during real-time digital image processing. In the developed hardware structure, processing efficiency was increased by using pipelined processing, lookup tables, integer-only arithmetic, and distributed voting memory. Experimentally...

Full text available for download in the portal
Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs
Publication
- B. Gawrych
- P. Czarnul
- Year 2023
Considering ongoing developments of both modern CPUs, especially in the context of increasing numbers of cores, cache memory and architectures as well as compilers there is a constant need for benchmarking representative and frequently run workloads. The key metric is speed-up as the computational power of modern CPUs stems mainly from using multiple cores. In this paper, we show and discuss results from running codes such as:...

Full text available for download from an external service
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM
Publication
- Communications in Computational Physics - Year 2017
This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with an inexact null-space filtering approach to find eigenvalues in electromagnetics analysis with higher-order FEM. The performance of the proposed approach is verified using the Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of the implementation based on functions from...

Full text available for download from an external service
Karol Zdzisław Zalewski mgr inż.

People
Extended investigation of performance-energy trade-offs under power capping in HPC environments
Publication
- Year 2019
In the paper we present an investigation of performance-energy trade-offs under power capping using modern processors. The results are presented for systems targeted at both server and client markets and were collected from Intel Xeon E5 and Intel Xeon Phi server processors as well as from desktop and mobile Intel Core i7 processors. The results, when using power capping, show that we can find various interesting combinations of...
Optimal programming of critical sections in modern network processors under performance requirements.
Publication
- H. Krawczyk
- T. Madajczak
- Year 2004
A review of the design and applications of methods for programming critical sections in modern network processors of the Intel IXP family. A performance comparison in tabular form.
SEM images of Ni-Mo2CTx/Mo2Ga2C before and after catalytic dry reforming of methane
Research Data
open access
- I. Frąckiewicz
The dataset includes SEM images of Ni-Mo2CTx/Mo2Ga2C catalysts before and after the dry reforming of methane.
Investigation of Performance and Energy Consumption of Tokenization Algorithms on Multi-core CPUs Under Power Capping
Publication
- Year 2024
In this paper we investigate performance-energy optimization of tokenizer algorithm training using power capping. We focus on parallel, multi-threaded implementations of Byte Pair Encoding (BPE), Unigram, WordPiece, and WordLevel run on two systems with different multi-core CPUs: Intel Xeon 6130 and desktop Intel i7-13700K. We analyze execution times and energy consumption for various numbers of threads and various power caps and...

Full text available for download in the portal
SEM images of Ni-Mo2CTx/Mo3AlC2 before and after catalytic dry reforming of methane
Research Data
open access
- I. Frąckiewicz
The dataset includes SEM images of Ni-Mo2CTx/Mo2Ga2C catalysts before and after the dry reforming of methane.
Block-based Representation of Application Execution on Modern Parallel Systems
Publication
- P. Czarnul
- Year 2013
The chapter presents how to model execution of a parallel computational application that is to be executed in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses previously attributes and factors representative of modern high performance systems including multicore CPUs, GPUs, dedicated accelerators such as Intel Phi.
Programowanie w Asemblerze oraz Oprogramowanie Mikrokomputerów [Assembly Programming and Microcomputer Software]
Online Courses
- K. Cisowski
Learning the basics of assembly programming for Intel 8086 family processors
TAJP Open Lectures

Events

24-05-2018 11:15 - 24-05-2018 13:00

"IoT requirements on autonomous mobile platforms": a lecture by Dr. Marek Zmuda of Intel Technologies, given as part of the course "Modern marine electronics systems".
A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU
Publication
- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015
In this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation....

Full text available for download from an external service
TAJP Open Lectures

Events

19-04-2018 11:05 - 19-04-2018 12:35

Cybersecurity of autonomous units: a lecture given by dr inż. Marek Zmuda of Intel Technology Poland as part of the course "Modern marine electronics systems".
Playing the Sprint Retrospective
Publication
- M. Wawryk
- Y. Y. Ng
- Annals of Computer Science and Information Systems - Year 2019
In agile software development, where great emphasis is put on effective informal communication, success depends heavily on human and social factors. However, Scrum does not specify any techniques that aid the human side of software development. In this paper we investigate the use of 6 collaborative games for the Sprint Retrospective. Each game was implemented twice in a Scrum team in Intel Technology Poland. The received feedback...

Full text available for download in the portal
KRICO Komponent rekomendacji dla inteligentnych chmur obliczeniowych [KRICO: Recommendation component for intelligent computing clouds]

Projects

Project manager: prof. dr hab. inż. Henryk Krawczyk. Funding programme: INNOTECH

Project carried out at the IT Centre of the Tricity Academic Computer Network, dated 2014-06-10
Optimization of the System for Determining the Volume of Tissue Needed for Breast Reconstruction
Publication
- J. Czałpińska
- A. Janicka
- J. Rzepkowski
- M. Kaczmarek
- T. Kocejko
- J. Kang-Hyun
- Year 2023
This article presents techniques for reconstructing surfaces and volume calculations using a point cloud generated from 3D imaging. The main objective of this article was to optimize the voxel size for the most accurate representation of the surface of the female breast. We experimented with different methods for determining volume using images from the Intel D435i camera. In addition, we designed application and measurement station...

Full text available for download from an external service
3D-Breast System for Determining the Volume of Tissue Needed for Breast Reconstruction
Publication
- Year 2024
3D imaging systems can be used to effectively determine breast volumes for surgical applications. This article presents methods for surface reconstruction and volume determination based on the point cloud created by 3D imaging. Such a system would be used to accurately estimate breast volume in patients classified for breast reconstruction surgery at plastic surgery centers. To develop such a system, various methods of determining...

Full text available for download from an external service
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication
- P. Czarnul
- Electronics - Year 2021
The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available for download in the portal
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment
Publication
- P. Czarnul
- JOURNAL OF SUPERCOMPUTING - Year 2018
The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download in the portal
Superkomputery do wspomagania procesów gospodarczych ze szczególnym uwzględnieniem sektora bankowego [Supercomputers for supporting business processes, with particular emphasis on the banking sector]
Publication
- H. Balicka
- J. Balicki
- W. Korłub
- J. Paluszak
- M. Zadroga
- Współczesna Gospodarka - Year 2014
The article discusses the use of supercomputers to support business processes, with particular emphasis on the banking sector. It refers to selected projects that support economic development based on supercomputers. In particular, it proposes using HPC to implement selected artificial intelligence methods in banking, including risk assessment of selected undertakings. The proposed approach enables...

Full text available for download in the portal
"3D-Breast System for Determining the Volume of Tissue Needed for Breast Reconstruction"
Publication
- G. Małyszko
- K. Ostrowska
- Year 2023
This article presents methods for surface reconstruction and volume determination based on the point cloud created by 3D imaging. Such a system would be used to accurately estimate breast volume in patients classified for breast reconstruction surgery at plastic surgery centers. To develop such a system, various methods of determining volume, based on images from the Intel D435i camera, were tested. In addition, an application...
DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing
Publication
- SOFTWARE-PRACTICE & EXPERIENCE - Year 2022
In the article we propose an automatic power capping software tool DEPO that allows one to perform runtime optimization of performance and energy related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning engaging one of the two exploration algorithms—linear search (LS) and golden section search (GSS), finds...

Full text available for download from an external service
Zespolone techniki informatyczne w analizie struktury produkcji i wyników ekonomicznych dla korporacji przemysłu elektronicznego [Combined information techniques in the analysis of production structure and economic results for electronics industry corporations]. Zamoj. Stud. i Mater. 2003, no. 1, pp. 109-116, 6 figs., 2 tables, 11 refs. Series: Informatyka. Proceedings of the conference "Informatyka w szkole".
Publication
- P. Brudło
- T. Ratajczak
- Year 2003
The article analyzes the production structure and economic results of four leading electronics industry corporations: Analog Devices, Intel, Texas Instruments and Motorola. A database containing basic information about the corporations was developed using Microsoft Access. Comparative financial and economic analyses were proposed and carried out, and forecasts were presented concerning...
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
Publication
- P. Czarnul
- K. Jabłońska
- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020
In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...

Full text available for download from an external service
Taking advantage of the shared explicit cache system based critical sections in the shared memory parallel architectures
Publication
- T. Madajczak
- Year 2006
The article presents a new method of implementing critical sections in shared-memory parallel architectures, such as integrated multithreaded multiprocessor systems. The method modifies and extends a method called Folding, available in network processors, and is in principle similar to a technique called cache-based locking. Compared with available methods, the new method removes scalability problems and...
Tuning matrix-vector multiplication on GPU
Publication
- A. Dziekoński
- M. Mrozowski
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010
A matrix times vector multiplication (matvec) is a cornerstone operation in iterative methods of solving large sparse systems of equations, such as the conjugate gradients method (cg), the minimal residual method (minres) and the generalized minimal residual method (gmres), and exerts an influence on overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...
Intelligent system supporting diagnosis of malignant melanoma
Publication
- Year 2017
Malignant melanomas are the most deadly type of skin cancers. Early diagnosis is a key for successful treatment and survival. The paper presents the system for supporting the process of diagnosis of skin lesions in order to detect a malignant melanoma. The paper describes the development process of an intelligent system purposed for the diagnosis of malignant melanoma. Presented system can be used as a decision support system...

Full text available for download from an external service
GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]
Publication
- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Year 2014
This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Full text available for download from an external service
Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method
Publication
- A. Dziekoński
- M. Mrozowski
- RADIOENGINEERING - Year 2018
This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few low-order resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphical processing...

Full text available for download in the portal
Making agile retrospectives more awesome
Publication
- A. Przybyłek
- D. Kotecka
- Year 2017
According to the textbook [23], Scrum exists only in its entirety, where every component is essential to Scrum’s success. However, in many organizational environments some of the components are omitted or modified in a way that is not aligned with the Scrum guidelines. Usually, such deviations result in missing the full benefits of Scrum [24]. Thereby, a Scrum process should be frequently inspected and any deviations should be...

Full text available for download in the portal
The management methods of the hardware and virtual threads in the integrated multiprocessor shared memory architectures
Publication
- T. Madajczak
- Year 2006
The doctoral dissertation focuses on the efficient direct management of hardware threads and processing units, as well as indirect management through virtual threads (concurrent tasks). It discusses available hardware thread technologies and systematizes the methodologies for using them. The main thesis of the work is that synchronization and management of hardware and virtual threads...
Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
Publication
- A. Dziekoński
- M. Mrozowski
- IEEE Antennas and Wireless Propagation Letters - Year 2018
In this paper a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large systems of sparse equations with multiple right-hand sides (RHSs) which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text available for download from an external service
Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs
Publication
- Year 2022
In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...

Full text available for download in the portal
A memory efficient and fast sparse matrix vector product on a GPU
Publication
- Progress in Electromagnetics Research-PIER - Year 2011
This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text available for download from an external service
Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments
Publication
- Scientific Programming - Year 2019
The paper presents state of the art of energy-aware high-performance computing (HPC), in particular identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single device, clusters, grids, and clouds while considered device types include CPUs, GPUs, multiprocessor, and hybrid systems. Optimization goals include various combinations of...

Full text available for download in the portal
Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system
Publication
- J. Skrzypczak
- P. Czarnul
- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023
In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings shall be preferred for various domain...

Full text available for download from an external service
Accurate Lightweight Calibration Methods for Mobile Low-Cost Particulate Matter Sensors
Publication
- P. Jørstad
- M. Wójcikowski
- T. Cao
- J. Lepioufle
- K. Wojtkiewicz
- P. H. Ha
- Year 2023
Monitoring air pollution is a critical step towards improving public health, particularly when it comes to identifying the primary air pollutants that can have an impact on human health. Among these pollutants, particulate matter (PM) with a diameter of up to 2.5 μm (or PM2.5) is of particular concern, making it important to continuously and accurately monitor pollution related to PM. The emergence of mobile low-cost PM sensors...

Full text available for download from an external service
