Search results for: INTEL

Block-based Representation of Application Execution on Modern Parallel Systems

Publication

P. Czarnul

- Year 2013

The chapter presents how to model the execution of a parallel computational application that is to be run in a large-scale parallel or distributed environment with potentially thousands to millions of execution units. The representation uses attributes and factors representative of modern high-performance systems, including multicore CPUs, GPUs, and dedicated accelerators such as the Intel Xeon Phi.

A Task-Scheduling Approach for Efficient Sparse Symmetric Matrix-Vector Multiplication on a GPU

Publication

- SIAM JOURNAL ON SCIENTIFIC COMPUTING - Year 2015

In this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products on Graphics Processing Units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation...

Full text available for download from an external service
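The abstract does not say how the matrix is prepared. A common way to cut memory for a symmetric matrix is to store only the upper triangle and apply each off-diagonal entry twice. This is offered as an illustration only, not as the paper's task-scheduling method. Below is a minimal sequential Python sketch, assuming CSR storage of the upper triangle; the function name is hypothetical.

```python
def sym_spmv_upper_csr(n, row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a symmetric n x n matrix A stored as its
    upper triangle (entries with column >= row) in CSR form."""
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = vals[k]
            y[i] += a * x[j]          # stored entry A[i][j]
            if j != i:
                y[j] += a * x[i]      # mirrored entry A[j][i] == A[i][j], not stored
    return y
```

In a parallel GPU version, the mirrored update of `y[j]` is issued while processing row `i`, so concurrent threads can collide on the same output element. This sketch is sequential and leaves that problem aside.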

Real-Time Hardware Implementation of the Hough Transform

Publication

- Poznan University of Technology Academic Journals. Electrical Engineering - Year 2021

The article presents an FPGA hardware implementation of an algorithm for detecting shapes approximated by a set of straight lines during real-time digital image processing. In the developed hardware structure, processing efficiency was increased by using pipelined processing, lookup tables, integer-only arithmetic, and a distributed voting memory. Experimentally...

Full text available for download on the portal
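The sketch below illustrates two of the listed techniques in software: lookup tables and integer-only arithmetic. It performs Hough voting for straight lines, with sine and cosine values precomputed as fixed-point integers. It is a hypothetical Python model, not the FPGA design from the article. The 10-bit fixed-point scale and the 180 angle steps are assumptions, and neither pipelining nor the distributed voting memory is modeled.

```python
import math

def hough_lines_int(points, width, height, n_theta=180, scale_bits=10):
    """Accumulate Hough votes for straight lines; the voting loop is integer-only."""
    scale = 1 << scale_bits
    # Lookup tables (built once): cos/sin of each angle step as fixed-point integers.
    cos_lut = [round(math.cos(math.pi * t / n_theta) * scale) for t in range(n_theta)]
    sin_lut = [round(math.sin(math.pi * t / n_theta) * scale) for t in range(n_theta)]
    rho_max = math.isqrt(width * width + height * height) + 1
    # rho lies in [-rho_max, rho_max]; the offset rho_max turns it into a valid index.
    acc = [[0] * (2 * rho_max + 1) for _ in range(n_theta)]
    for x, y in points:  # integer pixel coordinates inside the image
        for t in range(n_theta):
            # rho = x*cos(theta) + y*sin(theta) in fixed point, rounded to whole pixels.
            rho = (x * cos_lut[t] + y * sin_lut[t] + (scale >> 1)) >> scale_bits
            acc[t][rho + rho_max] += 1
    return acc, rho_max
```

For points on the vertical line x = 5, the cell for theta = 0 and rho = 5 collects one vote per point. Peaks in `acc` mark candidate lines.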

Playing the Sprint Retrospective

Publication

M. Wawryk
Y. Y. Ng

- Annals of Computer Science and Information Systems - Year 2019

In agile software development, where great emphasis is put on effective informal communication, success depends heavily on human and social factors. However, Scrum does not specify any techniques that aid the human side of software development. In this paper we investigate the use of 6 collaborative games for the Sprint Retrospective. Each game was implemented twice in a Scrum team in Intel Technology Poland. The received feedback...

Full text available for download on the portal

Optimization of the System for Determining the Volume of Tissue Needed for Breast Reconstruction

Publication

- Year 2023

This article presents techniques for reconstructing surfaces and calculating volume using a point cloud generated from 3D imaging. The main objective of this article was to optimize the voxel size for the most accurate representation of the surface of the female breast. We experimented with different methods for determining volume using images from the Intel D435i camera. In addition, we designed an application and a measurement station...

Full text available for download from an external service

3D-Breast System for Determining the Volume of Tissue Needed for Breast Reconstruction

Publication

- Year 2024

3D imaging systems can be used to effectively determine breast volumes for surgical applications. This article presents methods for surface reconstruction and volume determination based on the point cloud created by 3D imaging. Such a system would be used to accurately estimate breast volume in patients classified for breast reconstruction surgery at plastic surgery centers. To develop such a system, various methods of determining...

Full text available for download from an external service

Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications

Publication

P. Czarnul

- Electronics - Year 2021

The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API, and their relative performance on modern multi-core workstation CPUs. It is assumed that a master partitions the available input into a batch of a predefined number of data chunks, which are then processed in parallel by a set of slaves, and the procedure is repeated until all input data has been processed. The paper experimentally...

Full text available for download on the portal
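The batch scheme described in the abstract can be sketched as follows. This is a Python analogue that uses `concurrent.futures.ThreadPoolExecutor` in place of OpenMP threads. It is not one of the OpenMP implementations assessed in the paper, and the function and parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def master_slave(input_data, chunk_size, batch_chunks, process_chunk, n_workers=4):
    """Master cuts batches of up to `batch_chunks` chunks; workers ("slaves")
    process each batch in parallel; repeat until the input is exhausted."""
    results = []
    pos = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while pos < len(input_data):
            # Master: partition the next part of the input into a batch of chunks.
            batch = []
            while len(batch) < batch_chunks and pos < len(input_data):
                batch.append(input_data[pos:pos + chunk_size])
                pos += chunk_size
            # Slaves: process the batch in parallel; map() keeps chunk order, and
            # extend() waits for the whole batch before the master cuts the next one.
            results.extend(pool.map(process_chunk, batch))
    return results
```

For example, `master_slave(list(range(10)), 3, 2, sum)` forms two batches: first the chunks `[0, 1, 2]` and `[3, 4, 5]`, then `[6, 7, 8]` and `[9]`. It returns `[3, 12, 21, 9]`.

Each batch acts as a barrier: the master waits for the slowest chunk before cutting the next batch, which is where irregular chunk costs hurt. For CPU-bound pure-Python work, the GIL limits thread speedup, so a `ProcessPoolExecutor` would be the closer analogue.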

Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Publication

P. Czarnul

- JOURNAL OF SUPERCOMPUTING - Year 2018

The paper presents the design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computing the similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on the currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector...

Full text available for download on the portal
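This excerpt of the abstract does not name the similarity measure. Assuming cosine similarity, the underlying all-pairs computation looks like the sequential Python sketch below; the function name is hypothetical.

The outer loop over `i` is the natural unit of work to split between CPU threads and a GPU. However, row `i` has `n - i - 1` pairs, so rows carry unequal amounts of work and a naive equal split is unbalanced.

```python
import math

def all_pairs_cosine(vectors):
    """Return {(i, j): cosine similarity} for all i < j.
    Assumes no vector is all zeros (its norm would be 0)."""
    # Norms are computed once, not once per pair.
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in vectors]
    sims = {}
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims
```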

Supercomputers for Supporting Economic Processes, with Particular Emphasis on the Banking Sector

Publication

- Współczesna Gospodarka - Year 2014

The article discusses the use of supercomputers to support economic processes, with particular emphasis on the banking sector. Reference is made to selected supercomputer-based projects supporting economic development. In particular, the use of HPC is proposed to implement selected artificial intelligence methods in banking, including the risk assessment of selected ventures. The proposed approach enables...

Full text available for download on the portal

3D-Breast System for Determining the Volume of Tissue Needed for Breast Reconstruction

Publication

- Year 2023

This article presents methods for surface reconstruction and volume determination based on the point cloud created by 3D imaging. Such a system would be used to accurately estimate breast volume in patients classified for breast reconstruction surgery at plastic surgery centers. To develop such a system, various methods of determining volume, based on images from the Intel D435i camera, were tested. In addition, an application...

DEPO: A dynamic energy‐performance optimizer tool for automatic power capping for energy efficient high‐performance computing

Publication

- SOFTWARE-PRACTICE & EXPERIENCE - Year 2022

In the article we propose DEPO, an automatic power capping software tool that allows one to perform runtime optimization of performance- and energy-related metrics. For an assumed application model with an initialization phase followed by a running phase with uniform compute and memory intensity, the tool performs automatic tuning using one of two exploration algorithms, linear search (LS) or golden section search (GSS), finds...

Full text available for download from an external service
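Golden section search is one of the two exploration algorithms named above. It narrows an interval of power caps and reuses one of its two interior evaluations at each step. The sketch below is a generic Python version, not DEPO's code. It assumes the metric is unimodal in the power cap; the toy quadratic in the usage example is a hypothetical stand-in for a measurement such as energy at a given cap.

```python
import math

def golden_section_min(f, lo, hi, tol=1.0):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d (no re-evaluation).
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c (no re-evaluation).
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

For example, `golden_section_min(lambda cap: (cap - 120) ** 2, 60, 200, tol=0.5)` returns a value within 0.25 of 120. Each iteration costs only one new evaluation. That matters when every evaluation means running the application under a given cap and measuring it.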

Integrated IT Techniques in the Analysis of the Production Structure and Economic Results of Electronics Industry Corporations. Zamoj. Stud. i Mater. 2003, no. 1, pp. 109-116, 6 figs., 2 tables, 11 references. Series: Informatyka. Proceedings of the conference "Informatyka w szkole".

Publication

- Year 2003

The article analyzes the production structure and economic results of four leading electronics industry corporations: Analog Devices, Intel, Texas Instruments, and Motorola. Using Microsoft Access, a database containing basic information about the corporations was developed. Comparative financial and economic analyses were proposed and carried out, and forecasts were presented concerning...

Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors

Publication

- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020

In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...

Full text available for download from an external service

Taking advantage of the shared explicit cache system based critical sections in the shared memory parallel architectures

Publication

T. Madajczak

- Year 2006

The article presents a new method of implementing critical sections in shared-memory parallel architectures, such as integrated multithreaded multiprocessor systems. The method is a modification and extension of a method called Folding, available in network processors, and is similar in its assumptions to the technique known as cache-based locking. Compared to the available methods, the new method removes scalability problems and...

Tuning matrix-vector multiplication on GPU

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

Matrix-vector multiplication (matvec) is a cornerstone operation in iterative methods for solving large sparse systems of equations, such as the conjugate gradient method (CG), the minimal residual method (MINRES), and the generalized minimal residual method (GMRES), and it strongly influences the overall performance of those methods. An implementation of matvec is particularly demanding when one executes computations on a GPU (Graphics...

GPU-Accelerated Finite-Element Matrix Generation for Lossless, Lossy, and Tensor Media [EM Programmer's Notebook]

Publication

- IEEE ANTENNAS AND PROPAGATION MAGAZINE - Year 2014

This paper presents an optimization approach for limiting memory requirements and enhancing the performance of GPU-accelerated finite-element matrix generation applied in the implementation of the higher-order finite-element method (FEM). It emphasizes the details of the implementation of the matrix-generation algorithm for the simulation of electromagnetic wave propagation in lossless, lossy, and tensor media. Moreover, the impact...

Full text available for download from an external service

Single and Dual-GPU Generalized Sparse Eigenvalue Solvers for Finding a Few Low-Order Resonances of a Microwave Cavity Using the Finite-Element Method

Publication

- RADIOENGINEERING - Year 2018

This paper presents two fast generalized eigenvalue solvers for sparse symmetric matrices that arise when electromagnetic cavity resonances are investigated using the higher-order finite element method (FEM). To find a few low-order resonances, the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm with null-space deflation is applied. The computations are expedited by using one or two graphics processing...

Full text available for download on the portal

Making agile retrospectives more awesome

Publication

- Year 2017

According to the textbook [23], Scrum exists only in its entirety, where every component is essential to Scrum’s success. However, in many organizational environments some of the components are omitted or modified in a way that is not aligned with the Scrum guidelines. Usually, such deviations result in missing the full benefits of Scrum [24]. Thereby, a Scrum process should be frequently inspected and any deviations should be...

Full text available for download on the portal

The management methods of the hardware and virtual threads in the integrated multiprocessor shared memory architectures

Publication

T. Madajczak

- Year 2006

The doctoral dissertation focuses on the efficient direct management of hardware threads and processing units, as well as their indirect management through virtual threads (concurrent tasks). It discusses the available hardware threading technologies and systematizes the methodologies for using them. The main guiding idea of the work is the claim that the synchronization and management of hardware and virtual threads...

Food Classification from Images Using a Neural Network Based Approach with NVIDIA Volta and Pascal GPUs

Publication

- Year 2022

In the paper we investigate the problem of food classification from images, for the Food-101 dataset extended with 31 additional food classes from Polish cuisine. We adopted transfer learning and firstly measured training times for models such as MobileNet, MobileNetV2, ResNet50, ResNet50V2, ResNet101, ResNet101V2, InceptionV3, InceptionResNetV2, Xception, NasNetMobile and DenseNet, for systems with NVIDIA Tesla V100 (Volta) and...

Full text available for download on the portal

Block Conjugate Gradient Method with Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics

Publication

- IEEE Antennas and Wireless Propagation Letters - Year 2018

In this paper, a GPU-accelerated block conjugate gradient solver with multilevel preconditioning is presented for solving large systems of sparse equations with multiple right-hand sides (RHSs), which arise in the finite-element analysis of electromagnetic problems. We demonstrate that blocking significantly reduces the time to solution and allows for better utilization of the computing power of GPUs, especially when the system matrix...

Full text available for download from an external service

A memory efficient and fast sparse matrix vector product on a GPU

Publication

- Progress in Electromagnetics Research-PIER - Year 2011

This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising...

Full text available for download from an external service
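This excerpt does not describe the layout of Sliced ELLR-T. The sketch below assumes it combines three ingredients: an ELLPACK-style padded layout, a per-row length array, and slices of rows that are each padded only to their own longest row. Under those assumptions, a sequential Python sketch of such a storage scheme and its product looks like this. All names are hypothetical, and several GPU threads cooperating on one row (the "T") are not modeled.

```python
def to_sliced_ell(dense, slice_size):
    """Convert a dense matrix (list of rows) to a sliced ELLPACK layout
    with per-row lengths. Returns (vals, cols, slice_ptr, row_len)."""
    vals, cols, slice_ptr, row_len = [], [], [0], []
    for s in range(0, len(dense), slice_size):
        nz = [[(j, v) for j, v in enumerate(r) if v != 0] for r in dense[s:s + slice_size]]
        width = max((len(r) for r in nz), default=0)  # padding is per slice only
        # Column-major inside a slice: the k-th entries of all rows are adjacent.
        for k in range(width):
            for r in nz:
                j, v = r[k] if k < len(r) else (0, 0.0)  # zero padding
                cols.append(j)
                vals.append(v)
        row_len.extend(len(r) for r in nz)
        slice_ptr.append(len(vals))
    return vals, cols, slice_ptr, row_len

def sliced_ell_spmv(vals, cols, slice_ptr, row_len, slice_size, x):
    """y = A @ x using the layout produced by to_sliced_ell."""
    n = len(row_len)
    y = [0.0] * n
    for s_idx, start in enumerate(slice_ptr[:-1]):
        first = s_idx * slice_size
        rows_in_slice = min(slice_size, n - first)
        for r in range(rows_in_slice):
            i = first + r
            acc = 0.0
            # row_len lets each row stop early instead of multiplying padding.
            for k in range(row_len[i]):
                p = start + k * rows_in_slice + r
                acc += vals[p] * x[cols[p]]
            y[i] = acc
    return y
```

Padding is bounded by each slice's longest row rather than the longest row of the whole matrix, which is one way to keep the footprint low. The column-major order within a slice mirrors the coalesced memory access that ELLPACK-style formats aim for on GPUs.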

Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2023

In the paper we present a modern efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate its higher performance over CPU-only and GPU-only implementations for several problem sizes including 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies for various tile sizes and what CPU–GPU load balancing settings should be preferred for various domain...

Full text available for download from an external service

Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments

Publication

- Scientific Programming - Year 2019

The paper presents state of the art of energy-aware high-performance computing (HPC), in particular identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single device, clusters, grids, and clouds while considered device types include CPUs, GPUs, multiprocessor, and hybrid systems. Optimization goals include various combinations of...

Full text available for download on the portal

Accurate Lightweight Calibration Methods for Mobile Low-Cost Particulate Matter Sensors

Publication

P. Jørstad
M. Wójcikowski
T. Cao
J. Lepioufle
K. Wojtkiewicz
P. H. Ha

- Year 2023

Monitoring air pollution is a critical step towards improving public health, particularly when it comes to identifying the primary air pollutants that can have an impact on human health. Among these pollutants, particulate matter (PM) with a diameter of up to 2.5 μm (or PM2.5) is of particular concern, making it important to continuously and accurately monitor pollution related to PM. The emergence of mobile low-cost PM sensors...

Full text available for download from an external service

Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming

Publication

- COMPUTER JOURNAL - Year 2021

In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various compute units and takes into account computations, communication including...

Full text available for download on the portal

Parallel Programming for Modern High Performance Computing Systems

Publication

P. Czarnul

- Year 2018

In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...

Full text available for download from an external service

A GPU Solver for Sparse Generalized Eigenvalue Problems with Symmetric Complex-Valued Matrices Obtained Using Higher-Order FEM

Publication

- IEEE Access - Year 2018

The paper discusses a fast implementation of the stabilized locally optimal block preconditioned conjugate gradient (sLOBPCG) method, using a hierarchical multilevel preconditioner to solve non-Hermitian sparse generalized eigenvalue problems with large symmetric complex-valued matrices obtained using the higher-order finite-element method (FEM), applied to the analysis of a microwave resonator. The resonant frequencies of the low-order...

Full text available for download on the portal

Scaling scrum with a customized nexus framework: A report from a joint industry‐academia research project

Publication

A. Joskowski
A. Przybyłek
B. Marcinkowski

- SOFTWARE-PRACTICE & EXPERIENCE - Year 2023

Despite a wide range of scaling frameworks available, large-scale agile transformations are not straightforward undertakings. Few organizations have structures in place that fit the predefined workflows, and once one applies an off-the-shelf framework outside of its prescribed process, guidance quickly runs out. In this paper, we demonstrate how to instantiate a method configuration process using a lightweight experimental approach...

Full text available for download on the portal

Generation of large finite-element matrices on multiple graphics processors

Publication

- INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING - Year 2013

This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...

Full text available for download from an external service

Preconditioners with Low Memory Requirements for Higher-Order Finite-Element Method Applied to Solving Maxwell’s Equations on Multicore CPUs and GPUs

Publication

A. Dziekoński
G. Fotyga
M. Mrozowski

- IEEE Access - Year 2018

This paper discusses two fast implementations of the conjugate gradient iterative method using a hierarchical multilevel preconditioner to solve the complex-valued, sparse systems obtained using the higher order finite-element method applied to the solution of the time-harmonic Maxwell equations. In the first implementation, denoted PCG-V, a classical V-cycle is applied and the system of equations on the lowest level is solved...

Full text available for download on the portal

Optimization of the Computational Performance of the Finite Element Method in the CUDA Architecture

Publication

A. Dziekoński

- Year 2015

The aim of this dissertation, and of the scholarship completed within the project, was to develop a numerically efficient algorithmic and hardware solution that accelerates the analysis of electromagnetic problems using the finite element method (FEM) with higher-order basis functions. The frequency-domain finite element method is an efficient and versatile tool for analyzing microwave circuits (fig....

Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors

Publication

- Annals of Computer Science and Information Systems - Year 2018

In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications: 2D heat distribution, numerical integration and Fast Fourier Transform. The CPUs tested represent both multi-core CPUs such as Intel® Xeon® E5, desktop and mobile i7, and the many-core Intel® Xeon Phi™ x200, but also server, desktop...

Full text available for download on the portal
