Search results for: COPROCESSORS
-
Verification and Benchmarking in MPA Coprocessor Design Process
Publication: This paper presents verification and benchmarking required for the development of a coprocessor digital circuit for integer multiple-precision arithmetic (MPA). Its code is developed, with the use of very high speed integrated circuit hardware description language (VHDL), as an intellectual property core. Therefore, it can be used by a final user within their own computing system based on field-programmable gate arrays (FPGAs)....
-
IP Core of Coprocessor for Multiple-Precision-Arithmetic Computations
Publication: In this paper, we present an IP core of a coprocessor supporting computations requiring integer multiple-precision arithmetic (MPA). Whilst standard 32/64-bit arithmetic is sufficient to solve many computing problems, there are still applications that require higher numerical precision. Hence, the purpose of the developed coprocessor is to support and offload the central processing unit (CPU) in such computations. The developed digital...
-
Hardware cryptography coprocessor for system on chip soft processor
Publication: The paper presents hardware and software implementations of the AES encryption and decryption algorithm. Both implementations were realized on a Virtex II device and tested. Resource usage and performance were chosen as the comparison criteria. The hardware implementation performs encryption two orders of magnitude faster than the software version, but requires five times more resources. In this paper hardware and software...
-
Open-Source Coprocessor for Integer Multiple Precision Arithmetic
Publication: This paper presents an open-source digital circuit of the coprocessor for integer multiple-precision arithmetic (MPA). The purpose of this coprocessor is to support a central processing unit (CPU) by offloading computations requiring integer precision higher than 32/64 bits. The coprocessor is developed using the very high speed integrated circuit hardware description language (VHDL) as an intellectual property (IP) core. Therefore,...
-
Implementation of Coprocessor for Integer Multiple Precision Arithmetic on Zynq Ultrascale+ MPSoC
Publication: Recently, we have opened the source code of a coprocessor for multiple-precision arithmetic (MPA). In this contribution, the implementation and benchmarking results for this MPA coprocessor are presented on a modern Zynq Ultrascale+ multiprocessor system on chip, which combines a field-programmable gate array with a quad-core ARM Cortex-A53 64-bit central processing unit (CPU). In our benchmark, a single coprocessor can be up to 4.5 times...
-
Application of hybrid signals processors to speech and hearing aids
Publication: Thanks to advances in digital signal processor (DSP) technology, it has become possible to build miniature hearing and speech aids. Despite their small size, these processors are able to execute complex algorithms. An additional advantage is the ease of changing their software, and hence the ease of changing the application domain. The work focuses on issues related to the design and implementation of algorithms applicable...
-
Preemptive versus nonpreemptive scheduling of biprocessor tasks on dedicated processors
Publication -
Soft-core processors as SoC prototyping solution for cryptographic application
Publication: The article presents a method of using a soft-core processor in cryptographic applications. It discusses problems and issues that demonstrate the need for strong security mechanisms at the low level of a hardware-software system. Existing threats pose a challenge for designers of secure systems, and a hardware implementation of support for the AES cryptographic algorithm is a good example of solving these...
-
Evolution-based scheduling of fault-tolerant programs on multiple processors
Publication -
Generation of large finite-element matrices on multiple graphics processors
Publication: This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...
-
Application of TMS320c67xx signal processors for SONIC-self-optimizing narrowband interference canceler
Publication: The paper presents a laboratory system for testing active control algorithms of acoustic noise in ducts. The applied algorithm, a self-optimizing narrowband interference canceller (SONIC), allows one to remove narrowband disturbances of constant or slowly time-varying frequencies. Example experimental results of using the laboratory system for suppression of a sinusoidal disturbance are described. An electronic part of the system was...
-
Optimal programming of critical sections in modern network processors under performance requirements.
Publication: A review of the design and applications of methods for programming critical sections in modern network processors of the Intel IXP family. A performance comparison in tabular form.
-
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
Publication: In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...
-
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication: The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to the 1-thread version on both Intel Xeon and the latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented, along with how these change for growing numbers of threads. Impact of using particular...
-
MatLab script to C code converter for embedded processors of FLASH LLRF control system
Publication -
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
Publication: In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPUs tested represent both multi-core type CPUs such as Intel® Xeon® E5, desktop and mobile i7 as well as many-core Intel® Xeon Phi™ x200 but also server, desktop...
-
Parallel Programming for Modern High Performance Computing Systems
Publication: In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...
-
International Conference on Apps for Specific Array Processors
Conferences -
Bogdan Pankiewicz dr hab. inż.
People: Bogdan Pankiewicz graduated in 1993 from the Faculty of Electronics of Gdańsk University of Technology, specializing in electronic circuits, and in 2002 received a PhD in electronics at the Faculty of ETI, Gdańsk University of Technology. He has been associated with Gdańsk University of Technology since the beginning of his career: first as an assistant (1994–2002) and then as an assistant professor (since 2002) at the Faculty of Electronics, Telecommunications and Informatics. He works on the design of analog and digital...
-
Grzegorz Szwoch dr hab. inż.
People: Grzegorz Szwoch was born in 1972 in Gdańsk. From 1991 to 1996 he studied at the Faculty of Electronics of Gdańsk University of Technology. In 1996 he graduated from the Sound Engineering Department (now the Department of Multimedia Systems), defending a diploma thesis entitled "Modelowanie fizyczne wybranych instrumentów muzycznych" (Physical modeling of selected musical instruments). In the same year he joined the Department's research team as a doctoral student. Since January 2001...
-
Modeling DAC Application Execution Time
Publication: An application written in the Divide And Conquer paradigm is more difficult to model than an SPMD application because of its complex algorithm, which requires many coefficients in the computational complexity function. Processors are divided into various layers, and each layer contains a different number of processors. Data packets processed in different layers and transferred between layers have different lengths. Moreover first layer processors use...
-
Marek Wójcikowski dr hab. inż.
People: Marek Wójcikowski graduated in 1993 from the Faculty of Electronics of Gdańsk University of Technology, specializing in electronic circuits. In 2002 he received a PhD in electronics, and in 2016 he received the degree of doktor habilitowany at the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology. He has been associated with Gdańsk University of Technology since the beginning of his career: first as an assistant (1994–2002) and then as an assistant professor (since...
-
Modeling SPMD Application Execution Time
Publication: Parallel applications in a Single Process Multiple Data paradigm assume splitting huge amounts of data among multiple processors working in parallel on small data packets. As the individual data packets are not independent, the processors must interact with each other to exchange results of the calculations with their adjacent partners and take these results into account in their own computations. An example of SPMD is geometric parallelism...
-
Paweł Czarnul dr hab. inż.
People: Paweł Czarnul received the degree of doktor habilitowany in technical sciences, in the discipline of computer science, in 2015, and a PhD in technical sciences in computer science (with distinction), awarded by the Council of the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology, in 2003. His areas of interest include: parallel and distributed processing, including parallel programming on computing clusters,...
-
Asynchronous distributed state estimation for continuous-time stochastic processes
Publication: We consider the problem of state estimation of a continuous-time stochastic process using an asynchronous distributed multi-sensor estimation system (ADES). In an ADES the state of a process of interest is estimated by a group of local estimators. Each local estimator based, for example, on a Kalman filter, performs single sensor filtration but also fusion of its local results and results from other (remote) processors to compute...
-
Implementation of Addition and Subtraction Operations in Multiple Precision Arithmetic
Publication: In this paper, we present a digital circuit of an arithmetic unit implementing addition and subtraction operations in multiple-precision arithmetic (MPA). This adder-subtractor unit is a part of the MPA coprocessor supporting and offloading the central processing unit (CPU) in computations requiring precision higher than 32/64 bits. Although addition and subtraction operations of two n-digit numbers require O(n) operations, the efficient...
-
Shared processor scheduling of multiprocessor jobs
Publication: We study a problem of shared processor scheduling of multiprocessor weighted jobs. Each job can be executed on its private processor and simultaneously on possibly many processors shared by all jobs. This simultaneous execution reduces their completion times due to the processing time overlap. Each of the m shared processors may charge a different fee but otherwise the processors are identical. The goal is to maximize the total...
-
Extended investigation of performance-energy trade-offs under power capping in HPC environments
Publication: In the paper we present an investigation of performance-energy trade-offs under power capping using modern processors. The results are presented for systems targeted at both server and client markets and were collected from Intel Xeon E5 and Intel Xeon Phi server processors as well as from desktop and mobile Intel Core i7 processors. The results, when using power capping, show that we can find various interesting combinations of...
-
Analog CMOS processor for early vision processing with highly reduced power consumption
Publication: A new approach to an analog ultra-low power vision chip design is presented. The prototype chip performs low-level convolutional image processing algorithms in real time. The circuit is implemented in 0.35 μm CMOS technology and contains a 64 x 64 SIMD matrix with embedded analogue processors APE (Analogue Processing Element). The photo-sensitive matrix is of 2.2 μm x 2.2 μm size, giving the density of 877 processors per mm². The matrix dissipates...
-
Characteristics of an image sensor with early-vision processing fabricated in standard 0.35 µm CMOS technology
Publication: The article presents measurement results of prototype integrated circuits for acquisition and processing of images in real time. In order to verify a new concept of circuit solutions of analogue image processors, experimental integrated circuits were fabricated. The integrated circuits, designed in a standard 0.35 µm CMOS technology, contain the image sensor and analogue processors that perform low-level convolution-based image...
-
The surface of a fragment of the structure of an integrated circuit in the semi-contact mode.
Research Data: The surface of a fragment of the structure of an integrated circuit. Topographic measurements in the semi-contact mode. NTEGRA Prima (NT-MDT) device. NSG 01 probe.
-
Multi Queue Approach for Network Services Implemented for Multi Core CPUs
Publication: Multiple core processors have already become the dominant design for general purpose CPUs. Incarnations of this technology are present in solutions dedicated to such areas as computer graphics, signal processing and also computer networking. Since the key functionality of network core components is fast package servicing, multicore technology, due to its multitasking ability, seems useful to support packet processing. Dedicated...
-
Dedicated scheduling of tasks to minimize mean flow time
Publication: This paper investigates the complexity of scheduling biprocessor tasks on dedicated processors to minimize mean flow time. Since the general problem is strongly NP-hard, we assume some restrictions on task lengths and the structure of associated scheduling graphs. Of particular interest are acyclic graphs. In this way we identify a borderline between NP-hard and polynomially solvable special cases.
-
Equitable and semi-equitable coloring of cubic graphs and its application in batch scheduling
Publication: In the paper we consider the problems of equitable and semi-equitable coloring of vertices of cubic graphs. We show that in contrast to the equitable coloring, which is easy, the problem of semi-equitable coloring is NP-complete within a broad spectrum of graph parameters. This affects the complexity of batch scheduling of unit-length jobs with cubic incompatibility graph on three uniform processors to minimize...
-
Publication: The chapter analyses the K-Means algorithm in its parallel setting. We provide a detailed description of the algorithm as well as the way we parallelize the computations. We identified the complexity of the particular steps of the algorithm, which allows us to build the algorithm model in the MERPSYS system. The simulations with MERPSYS have been performed for different sizes of the data as well as for different numbers of processors used for the computations. The results we got using the model have been compared to the results obtained from a real computational environment.
-
GPU-accelerated finite element method
Publication: In this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...
-
Parallel immune system for graph coloring
Publication: This paper presents a parallel artificial immune system designed for graph coloring. The algorithm is based on the clonal selection principle. Each processor operates on its own pool of antibodies and a migration mechanism is used to allow processors to exchange information. Experimental results show that migration improves the performance of the algorithm. The experiments were performed using a high performance cluster on a set...
-
On-Line Partitioning for On-Line Scheduling with Resource Conflicts
Publication: Within this paper, we consider the problem of on-line partitioning the sequence of jobs which are competing for non-sharable resources. As a result of partitioning we get the subsets of jobs that form separate instances of the on-line scheduling problem. The objective is to generate a partition into the minimum number of instances such that the response time of any job in each instance is bounded by a given constant. Our research...
-
A solution of the integrated µBIST for functional and diagnostic testing in mixed-signal electronic embedded systems
Publication: The main problem addressed in the paper is the testing of analog circuits and blocks in mixed-signal electronic embedded systems (EESs), using the built-in self-test (BIST) technique. The integrated µBIST based on reusing signal blocks already present in an EES, such as processors, memories, ADCs, is presented. The novelty of the solution is the extended functionality of the µBIST. It can perform 2 testing functions: functional testing and fault...
-
CMOS realisation of analogue processor for early vision processing
Publication: The architecture concept of a high-speed low-power analogue vision chip, which performs low-level real-time image algorithms, is presented. The proof-of-concept prototype vision chip containing a 32 × 32 photosensor array and 32 analogue processors is fabricated using a 0.35 μm CMOS technology. The prototype can be configured to register and process images with very high speed, reaching 2000 frames per second, or achieve very low power...
-
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication: The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication: The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and their relative performance on modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of a predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
-
CMOS implementation of an analogue median filter for image processing in real time
Publication: An analogue median filter, realised in a 0.35 μm CMOS technology, is presented in this paper. The key advantages of the filter are: high speed of image processing (50 frames per second), low-power operation (below 1.25 mW under 3.3 V supply) and relatively high accuracy of signal processing. The presented filter is a part of an integrated circuit for image processing (a vision chip), containing: a photo-sensor matrix, a set of...
-
Probe signal processing for channel estimation in underwater acoustic communication system
Publication: Underwater acoustic communication channels are characterized by a large variety of propagation conditions. Designing a reliable communication system requires knowledge of the transmission parameters of the channel, namely multipath delay spread, Doppler spread, coherence time, and coherence bandwidth. However, the possibilities of its estimation in a real-time underwater communication system are limited, mainly due to the computational...
-
Shared multi-processor scheduling
Publication: We study the shared multi-processor scheduling problem where each job can be executed on its private processor and simultaneously on one of many processors shared by all jobs in order to reduce the job’s completion time due to processing time overlap. The total weighted overlap of all jobs is to be maximized. The problem models subcontracting scheduling in supply chains and divisible load scheduling in computing. We show that synchronized...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
Publication: This work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Improved magnitude estimation of complex numbers using alpha max and beta min algorithm
Publication: The paper presents an improved algorithm for calculating the magnitude of complex numbers. This problem, which is a special case of square rooting, occurs, for example, in FFT processors and complex FIR filters. The proposed method of magnitude calculation makes use of the modified alpha max and beta min algorithm. The improved version of the algorithm makes it possible to control the maximum magnitude approximation error by using an adequate...
-
Real and Virtual Instruments in Machine Learning – Training and Comparison of Classification Results
Publication: The continuous growth of the computing power of processors, as well as the fact that computational clusters can be created from combined machines, allows for increasing the complexity of algorithms that can be trained. The process, however, requires expanding the basis of the training sets. One of the main obstacles in music classification is the lack of a high-quality, real-life recording database for every instrument with a variety...
-
On simplification of residue scaling process in pipelined Radix-4 MQRNS FFT processor
Publication: Residue scaling is needed in pipelined FFT radix-4 processors based on the Modified Quadratic Residue Number System (MQRNS) at the output of each butterfly. Such a processor uses a serial connection of radix-4 butterflies. Each butterfly comprises n subunits, one for each modulus of the RNS base, and generates four complex residue numbers. In order to prevent arithmetic overflow, intermediate results after each butterfly have to be...
-
On configuration of residue scaling process in pipelined radix-4 MQRNS FFT processor
Publication: Residue scaling is needed in pipelined FFT radix-4 processors based on the Modified Quadratic Residue Number System (MQRNS) at the output of each butterfly. Such a processor uses a serial connection of radix-4 butterflies. Each butterfly comprises n subunits, one for each modulus of the RNS base, and outputs four complex residue numbers. In order to prevent arithmetic overflow in the successive stage, every number has to be scaled,...