Search results for: COPROCESSORS
-
Verification and Benchmarking in MPA Coprocessor Design Process
Publication: This paper presents verification and benchmarking required for the development of a coprocessor digital circuit for integer multiple-precision arithmetic (MPA). Its code is developed, with the use of very high speed integrated circuit hardware description language (VHDL), as an intellectual property core. Therefore, it can be used by a final user within their own computing system based on field-programmable gate arrays (FPGAs)....
-
Hardware cryptography coprocessor for system on chip soft processor
Publication: The article presents hardware and software implementations of the AES encryption and decryption algorithm. Both implementations were realized using a Virtex II device and tested. Resource utilization and performance were chosen as the comparison criteria. The hardware implementation performs encryption two orders of magnitude faster than the software version, but requires five times more resources.
-
Open-Source Coprocessor for Integer Multiple Precision Arithmetic
Publication: This paper presents an open-source digital circuit of the coprocessor for an integer multiple-precision arithmetic (MPA). The purpose of this coprocessor is to support a central processing unit (CPU) by offloading computations requiring integer precision higher than 32/64 bits. The coprocessor is developed using the very high speed integrated circuit hardware description language (VHDL) as an intellectual property (IP) core. Therefore,...
-
IP Core of Coprocessor for Multiple-Precision-Arithmetic Computations
Publication: In this paper, we present an IP core of coprocessor supporting computations requiring integer multiple-precision arithmetic (MPA). Whilst standard 32/64-bit arithmetic is sufficient to solve many computing problems, there are still applications that require higher numerical precision. Hence, the purpose of the developed coprocessor is to support and offload central processing unit (CPU) in such computations. The developed digital...
-
Implementation of Coprocessor for Integer Multiple Precision Arithmetic on Zynq Ultrascale+ MPSoC
Publication: Recently, we have opened the source code of coprocessor for multiple-precision arithmetic (MPA). In this contribution, the implementation and benchmarking results for this MPA coprocessor are presented on modern Zynq Ultrascale+ multiprocessor system on chip, which combines field-programmable gate array with quad-core ARM Cortex-A53 64-bit central processing unit (CPU). In our benchmark, a single coprocessor can be up to 4.5 times...
-
Application of hybrid signals processors to speech and hearing aids
Publication: Thanks to advances in digital signal processor (DSP) technology, it has become possible to build miniature hearing and speech prostheses. Despite their small size, these processors are able to execute complex algorithms. An additional advantage is the ease of changing the software and, consequently, the ease of changing the application domain. The work focuses on issues related to the design and implementation of algorithms applicable...
-
Preemptive versus nonpreemptive scheduling of biprocessor tasks on dedicated processors
Publication
-
Soft-core processors as SoC prototyping solution for cryptographic application
Publication: The article presents a method of using a soft-core processor in cryptographic applications. It discusses problems and issues that demonstrate the need to introduce strong security measures at the low level of a hardware-software system. Existing threats pose a challenge for designers of secure systems, and a hardware implementation of support for the AES cryptographic algorithm is a good example of a solution to these...
-
Evolution-based scheduling of fault-tolerant programs on multiple processors
Publication
-
Generation of large finite-element matrices on multiple graphics processors
Publication: This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics...
-
Application of TMS320c67xx signal processors for SONIC-self-optimizing narrowband interference canceler
Publication: The paper presents a laboratory system for testing active control algorithms of acoustic noise in ducts. The applied algorithm, the self-optimizing narrowband interference canceller (SONIC), allows one to remove narrowband disturbances of constant or slowly time-varying frequencies. Example experimental results of using the laboratory system for suppression of a sinusoidal disturbance are described. An electronic part of the system was...
-
Optimal programming of critical sections in modern network processors under performance requirements.
Publication: A review of the design and applications of methods for programming critical sections in modern network processors of the Intel IXP family. A performance comparison in tabular form.
-
Benchmarking Deep Neural Network Training Using Multi- and Many-Core Processors
Publication: In the paper we provide thorough benchmarking of deep neural network (DNN) training on modern multi- and many-core Intel processors in order to assess performance differences for various deep learning as well as parallel computing parameters. We present performance of DNN training for Alexnet, Googlenet, Googlenet_v2 as well as Resnet_50 for various engines used by the deep learning framework, for various batch sizes. Furthermore,...
-
Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors
Publication: The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to 1-thread version on both Intel Xeon and latest Intel Xeon Phi x200 processors. Results such as the number of wins, losses and draws are presented and how these change for growing numbers of threads. Impact of using particular...
-
MatLab script to C code converter for embedded processors of FLASH LLRF control system
Publication
-
Analyzing energy/performance trade-offs with power capping for parallel applications on modern multi and many core processors
Publication: In the paper we present extensive results from analyzing energy/performance trade-offs with power capping observed on four different modern CPUs, for three different parallel applications such as 2D heat distribution, numerical integration and Fast Fourier Transform. The CPUs tested represent both multi-core type CPUs such as Intel® Xeon® E5, desktop and mobile i7 as well as many-core Intel® Xeon Phi™ x200 but also server, desktop...
-
Parallel Programming for Modern High Performance Computing Systems
Publication: In view of the growing presence and popularity of multicore and manycore processors, accelerators, and coprocessors, as well as clusters using such computing devices, the development of efficient parallel applications has become a key challenge to be able to exploit the performance of such systems. This book covers the scope of parallel programming for modern high performance computing systems. It first discusses selected and...
-
Modeling DAC Application Execution Time
Publication: An application written in the Divide And Conquer paradigm is more difficult to model than an SPMD application because of its complex algorithm, which requires many coefficients in the computational complexity function. Processors are divided into various layers, and each layer contains a different number of processors. Data packets processed in different layers and transferred between layers have different lengths. Moreover, first layer processors use...
-
Modeling SPMD Application Execution Time
Publication: Parallel applications in a Single Process Multiple Data paradigm assume splitting huge amounts of data among multiple processors working in parallel on small data packets. As the individual data packets are not independent, the processors must interact with each other to exchange results of the calculations with their adjacent partners and take these results into account in their own computations. An example of SPMD is geometric parallelism...
-
Asynchronous distributed state estimation for continuous-time stochastic processes
Publication: We consider the problem of state estimation of a continuous-time stochastic process using an asynchronous distributed multi-sensor estimation system (ADES). In an ADES the state of a process of interest is estimated by a group of local estimators. Each local estimator, based for example on a Kalman filter, performs single sensor filtration but also fusion of its local results and results from other (remote) processors to compute...
-
Implementation of Addition and Subtraction Operations in Multiple Precision Arithmetic
Publication: In this paper, we present a digital circuit of arithmetic unit implementing addition and subtraction operations in multiple-precision arithmetic (MPA). This adder-subtractor unit is a part of MPA coprocessor supporting and offloading the central processing unit (CPU) in computations requiring precision higher than 32/64 bits. Although addition and subtraction operations of two n-digit numbers require O(n) operations, the efficient...
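The O(n) cost of multiple-precision addition mentioned in this abstract can be illustrated with a minimal software sketch. This is not the coprocessor's VHDL design: the 32-bit limb width, the little-endian list layout, and the names `mpa_add` and `limbs_to_int` are assumptions chosen for illustration only.

```python
LIMB_BITS = 32                      # assumed limb (digit) width
LIMB_MASK = (1 << LIMB_BITS) - 1    # mask selecting the low 32 bits


def mpa_add(a, b):
    """Add two non-negative numbers stored as little-endian lists of limbs.

    Each limb is visited once, so the cost is O(n) in the number of limbs.
    """
    n = max(len(a), len(b))
    result = []
    carry = 0
    for i in range(n):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        s = x + y + carry
        result.append(s & LIMB_MASK)   # low bits stay in this limb
        carry = s >> LIMB_BITS         # overflow moves to the next limb
    if carry:
        result.append(carry)
    return result


def limbs_to_int(limbs):
    """Convert a little-endian limb list to a Python integer (for checking)."""
    value = 0
    for i, limb in enumerate(limbs):
        value |= limb << (i * LIMB_BITS)
    return value
```

For example, adding `[0xFFFFFFFF, 0xFFFFFFFF]` and `[1]` ripples the carry through both limbs and yields `[0, 0, 1]`.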
-
Shared processor scheduling of multiprocessor jobs
Publication: We study a problem of shared processor scheduling of multiprocessor weighted jobs. Each job can be executed on its private processor and simultaneously on possibly many processors shared by all jobs. This simultaneous execution reduces their completion times due to the processing time overlap. Each of the m shared processors may charge a different fee but otherwise the processors are identical. The goal is to maximize the total...
-
Extended investigation of performance-energy trade-offs under power capping in HPC environments
Publication: In the paper we present investigation of performance-energy trade-offs under power capping using modern processors. The results are presented for systems targeted at both server and client markets and were collected from Intel Xeon E5 and Intel Xeon Phi server processors as well as from desktop and mobile Intel Core i7 processors. The results, when using power capping, show that we can find various interesting combinations of...
-
Analog CMOS processor for early vision processing with highly reduced power consumption
Publication: A new approach to an analog ultra-low power vision chip design is presented. The prototype chip performs low-level convolutional image processing algorithms in real time. The circuit is implemented in 0.35 μm CMOS technology, contains 64 x 64 SIMD matrix with embedded analogue processors APE (Analogue Processing Element). The photo-sensitive-matrix is of 2.2 μm x 2.2 μm size, giving the density of 877 processors per mm². The matrix dissipates...
-
Characteristics of an image sensor with early-vision processing fabricated in standard 0.35 µm CMOS technology
Publication: The article presents measurement results of prototype integrated circuits for acquisition and processing of images in real time. In order to verify a new concept of circuit solutions of analogue image processors, experimental integrated circuits were fabricated. The integrated circuits, designed in a standard 0.35 µm CMOS technology, contain the image sensor and analogue processors that perform low-level convolution-based image...
-
Multi Queue Approach for Network Services Implemented for Multi Core CPUs
Publication: Multiple core processors have already become the dominant design for general purpose CPUs. Incarnations of this technology are present in solutions dedicated to such areas as computer graphics, signal processing and also computer networking. Since the key functionality of network core components is fast packet servicing, multicore technology, due to its multitasking ability, seems useful to support packet processing. Dedicated...
-
Dedicated scheduling of tasks to minimize mean flow time
Publication: This paper investigates the complexity of scheduling biprocessor tasks on dedicated processors to minimize mean flow time. Since the general problem is strongly NP-hard, we assume some restrictions on task lengths and the structure of associated scheduling graphs. Of particular interest are acyclic graphs. In this way we identify a borderline between NP-hard and polynomially solvable special cases.
-
Equitable and semi-equitable coloring of cubic graphs and its application in batch scheduling
Publication: In the paper we consider the problems of equitable and semi-equitable coloring of vertices of cubic graphs. We show that in contrast to the equitable coloring, which is easy, the problem of semi-equitable coloring is NP-complete within a broad spectrum of graph parameters. This affects the complexity of batch scheduling of unit-length jobs with cubic incompatibility graph on three uniform processors to minimize...
-
Publication: The chapter analyses the K-Means algorithm in its parallel setting. We provide a detailed description of the algorithm as well as the way we parallelize the computations. We identified the complexity of the particular steps of the algorithm, which allows us to build the algorithm model in the MERPSYS system. The simulations with MERPSYS have been performed for different sizes of the data as well as for different numbers of processors used for the computations. The results we got using the model have been compared to the results obtained from a real computational environment.
-
GPU-accelerated finite element method
Publication: In this paper the results of the acceleration of computations involved in analysing electromagnetic problems by means of the finite element method (FEM), obtained with graphics processors (GPU), are presented. A 4.7-fold acceleration was achieved thanks to the massive parallelization of the most time-consuming steps of FEM, namely finite-element matrix-generation and the solution of a sparse system of linear equations with the...
-
Parallel immune system for graph coloring
Publication: This paper presents a parallel artificial immune system designed for graph coloring. The algorithm is based on the clonal selection principle. Each processor operates on its own pool of antibodies and a migration mechanism is used to allow processors to exchange information. Experimental results show that migration improves the performance of the algorithm. The experiments were performed using a high performance cluster on a set...
-
On-Line Partitioning for On-Line Scheduling with Resource Conflicts
Publication: Within this paper, we consider the problem of on-line partitioning the sequence of jobs which are competing for non-sharable resources. As a result of partitioning we get the subsets of jobs that form separate instances of the on-line scheduling problem. The objective is to generate a partition into the minimum number of instances such that the response time of any job in each instance is bounded by a given constant. Our research...
-
A solution of the integrated µBIST for functional and diagnostic testing in mixed-signal electronic embedded systems
Publication: The main problem addressed in the paper is the testing of analog circuits and blocks in mixed-signal electronic embedded systems (EESs) using the built-in self-test (BIST) technique. The integrated µBIST, based on reusing signal blocks already present in an EES, such as processors, memories, ADCs, is presented. The novelty of the solution is the extended functionality of the µBIST. It can perform 2 testing functions: functional testing and fault...
-
CMOS realisation of analogue processor for early vision processing
Publication: The architecture concept of a high-speed low-power analogue vision chip, which performs low-level real-time image algorithms is presented. The proof-of-concept prototype vision chip containing 32 × 32 photosensor array and 32 analogue processors is fabricated using a 0.35 μm CMOS technology. The prototype can be configured to register and process images with very high speed, reaching 2000 frames per second, or achieve very low power...
-
Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors
Publication: The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability...
-
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
Publication: The paper investigates various implementations of a master–slave paradigm using the popular OpenMP API and relative performance of the former using modern multi-core workstation CPUs. It is assumed that a master partitions available input into a batch of predefined number of data chunks which are then processed in parallel by a set of slaves and the procedure is repeated until all input data has been processed. The paper experimentally...
-
CMOS implementation of an analogue median filter for image processing in real time
Publication: An analogue median filter, realised in a 0.35 μm CMOS technology, is presented in this paper. The key advantages of the filter are: high speed of image processing (50 frames per second), low-power operation (below 1.25 mW under 3.3 V supply) and relatively high accuracy of signal processing. The presented filter is a part of an integrated circuit for image processing (a vision chip), containing: a photo-sensor matrix, a set of...
-
Probe signal processing for channel estimation in underwater acoustic communication system
Publication: Underwater acoustic communication channels are characterized by a large variety of propagation conditions. Designing a reliable communication system requires knowledge of the transmission parameters of the channel, namely multipath delay spread, Doppler spread, coherence time, and coherence bandwidth. However, the possibilities of its estimation in a real-time underwater communication system are limited, mainly due to the computational...
-
Shared multi-processor scheduling
Publication: We study the shared multi-processor scheduling problem, where each job can be executed on its private processor and simultaneously on one of many processors shared by all jobs in order to reduce the job’s completion time due to processing time overlap. The total weighted overlap of all jobs is to be maximized. The problem models subcontracting scheduling in supply chains and divisible load scheduling in computing. We show that synchronized...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
Publication: This work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Improved magnitude estimation of complex numbers using alpha max and beta min algorithm
Publication: The paper presents an improved algorithm for calculating the magnitude of complex numbers. This problem, which is a special case of square rooting, occurs, for example, in FFT processors and complex FIR filters. The proposed method of magnitude calculation makes use of the modified alpha max and beta min algorithm. The improved version of the algorithm makes it possible to control the maximum magnitude approximation error by using an adequate...
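As background for this abstract, the classic alpha max plus beta min approximation estimates the magnitude of a complex number as alpha * max(|re|, |im|) + beta * min(|re|, |im|), avoiding the square root. The sketch below shows only that textbook form, not the paper's modified version, which the truncated abstract does not describe. The coefficient pair is a commonly quoted choice and, like the function names, is an assumption made for illustration.

```python
import math

# Commonly quoted coefficient pair for the classic algorithm (assumed here;
# the paper's modified algorithm may use different values).
ALPHA = 0.96043387
BETA = 0.39782473


def approx_magnitude(re, im, alpha=ALPHA, beta=BETA):
    """Estimate |re + j*im| with one max, one min and two multiplications."""
    a, b = abs(re), abs(im)
    return alpha * max(a, b) + beta * min(a, b)


def max_relative_error(steps=3600):
    """Sweep unit-magnitude inputs over a full turn and return the worst error."""
    worst = 0.0
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        estimate = approx_magnitude(math.cos(angle), math.sin(angle))
        worst = max(worst, abs(estimate - 1.0))   # true magnitude is 1
    return worst
```

With these coefficients the estimate for 3 + 4j is close to the exact value 5, and the sweep in `max_relative_error` stays below 4 percent. Other coefficient pairs trade accuracy for cheaper arithmetic, for example alpha = 1 and beta = 1/2, which needs only a shift and an add.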
-
Real and Virtual Instruments in Machine Learning – Training and Comparison of Classification Results
Publication: The continuous growth of the computing power of processors, as well as the fact that computational clusters can be created from combined machines, allows for increasing the complexity of algorithms that can be trained. The process, however, requires expanding the basis of the training sets. One of the main obstacles in music classification is the lack of high-quality, real-life recording database for every instrument with a variety...
-
On simplification of residue scaling process in pipelined Radix-4 MQRNS FFT processor
Publication: Residue scaling is needed in pipelined FFT radix-4 processors based on the Modified Quadratic Residue Number System (MQRNS) at the output of each butterfly. Such processor uses serial connection of radix-4 butterflies. Each butterfly comprises n subunits, one for each modulus of the RNS base and generates four complex residue numbers. In order to prevent arithmetic overflow intermediate results after each butterfly have to be...
-
On configuration of residue scaling process in pipelined radix-4 MQRNS FFT processor
Publication: Residue scaling is needed in pipelined FFT radix-4 processors based on the Modified Quadratic Residue Number System (MQRNS) at the output of each butterfly. Such processor uses serial connection of radix-4 butterflies. Each butterfly comprises n subunits, one for each modulus of the RNS base and outputs four complex residue numbers. In order to prevent the arithmetic overflow in the successive stage, every number has to be scaled,...
-
Standard deviation as the optimization criterion in the OptD method and its influence on the generated DTM
Publication: Reduction of the measurement dataset is one of the current issues related to constantly developing technologies that provide large datasets, e.g. laser scanning. It might seem that the presence and evolution of computer processors, increases in hard drive capacity, etc. are the solution for handling such large datasets. And in fact they are; however, the “lighter” datasets are easier to work with. Additionally, reduced datasets can...
-
Hybrid quantum-classical approach for atomistic simulation of metallic systems
Publication: The learn-on-the-fly (LOTF) method [G. Csanyi et al., Phys. Rev. Lett. 93, 175503 (2004)] serves to seamlessly embed quantum-mechanical computations within a molecular-dynamics framework by continual local retuning of the potential's parameters so that it reproduces the quantum-mechanical forces. In its current formulation, it is suitable for systems where the interaction is short-ranged, such as covalently bonded semiconductors....
-
Using Rule-Based System for Monitoring Marine Navigation Data Processing
Publication: Processing marine navigational data requires sophisticated software solutions. Typically, specialized tools called processors analyze raw data from different sensors. It becomes important to create monitoring software that is able to validate and verify processing components integrated into the final system. The Drools® business rule management platform provides a core business rules engine, web authoring and rules management...
-
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
Publication: High-performance computing (HPC), according to its name, is traditionally oriented toward performance, especially the execution time and scalability of the computations. However, due to the high cost and environmental issues, energy consumption has already become a very important factor that needs to be considered. The paper presents a survey of energy-aware scheduling methods used in a modern HPC environment, starting with the...
-
Thermal Image Processing for Respiratory Estimation from Cubical Data with Expandable Depth
Publication: As healthcare costs continue to rise, finding affordable and non-invasive ways to monitor vital signs is increasingly important. One of the key metrics for assessing overall health and identifying potential issues early on is respiratory rate (RR). Most of the existing methods require multiple steps that consist of image and signal processing. This might be difficult to deploy on edge devices that often do not have specialized...
-
Advanced Control With PLC—Code Generator for aMPC Controller Implementation and Cooperation With External Computational Server for Dealing With Multidimensionality, Constraints and LMI Based Robustness
Publication: The manufacturers of Programmable Logic Controllers (PLC) usually equip their products with extremely simple control algorithms, such as PID and on-off regulators. However, modern PLCs have much more efficient processors and extensive memory, which enables implementing more sophisticated controllers. The paper discusses issues related to the implementation of matrix operations, time limitations for code execution within one PLC...