Wyniki wyszukiwania dla: AUDIO-VISUAL SPEECH RECOGNITION - MOST Wiedzy

Wyszukiwarka

Wyniki wyszukiwania dla: AUDIO-VISUAL SPEECH RECOGNITION

Wyniki wyszukiwania dla: AUDIO-VISUAL SPEECH RECOGNITION

  • Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

    Publikacja

    In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Visual Features for Endoscopic Bleeding Detection

    Aims: To define a set of high-level visual features of endoscopic bleeding and evaluate their capabilities for potential use in automatic bleeding detection. Study Design: Experimental study. Place and Duration of Study: Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, between March 2014 and May 2014. Methodology: The features have...

    Pełny tekst do pobrania w portalu

  • Energy Efficiency Study of Audio-video Content Consumption on Selected Android Mobile Terminals

    Publikacja

    Mobile devices are widely used by billions of users worldwide. Thanks to their main advantage, which is portability, they should be fully operational as long as possible, without the need to recharge or connect them to external power sources. This paper describes a study, carried out on four different mobile devices, with different hardware and software parameters, running the Android operating system. The research campaign involved...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Noise profiling for speech enhancement employing machine learning models

    Publikacja

    - Journal of the Acoustical Society of America - Rok 2022

    This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

    Pełny tekst do pobrania w portalu

  • Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems

    Publikacja

    - Polish Maritime Research - Rok 2018

    The article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • From Linear Classifier to Convolutional Neural Network for Hand Pose Recognition

    Publikacja

    Recently gathered image datasets and the new capabilities of high-performance computing systems have allowed developing new artificial neural network models and training algorithms. Using the new machine learning models, computer vision tasks can be accomplished based on the raw values of image pixels instead of specific features. The principle of operation of deep neural networks resembles more and more what we believe to be happening...

    Pełny tekst do pobrania w portalu

  • Multimodal system for diagnosis and polysensory stimulation of subjects with communication disorders

    An experimental multimodal system, designed for polysensory diagnosis and stimulation of persons with impaired communication skills or even non-communicative subjects is presented. The user interface includes an eye tracking device and the EEG monitoring of the subject. Furthermore, the system consists of a device for objective hearing testing and an autostereoscopic projection system designed to stimulate subjects through their...

  • Study Analysis of Transmission Efficiency in DAB+ Broadcasting System

    Publikacja

    - Rok 2018

    DAB+ is a very innovative and universal multimedia broadcasting system. Thanks to its updated multimedia technologies and metadata options, digital radio keeps pace with changing consumer expectations and the impact of media convergence. Broadcasting analog and digital radio services does vary, concerning devices on both transmitting and receiving side, as well as content processing mechanisms. However, the biggest difference is...

    Pełny tekst do pobrania w portalu

  • Separability Assessment of Selected Types of Vehicle-Associated Noise

    Music Information Retrieval (MIR) area as well as development of speech and environmental information recognition techniques brought various tools in-tended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publikacja

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Creating a Remote Choir Performance Recording Based on an Ambisonic Approach

    The aim of this paper is three-fold. First, the basics of binaural and ambisonic techniques are briefly presented. Then, details related to audio-visual recordings of a remote performance of the Academic Choir of the Gdańsk University of Technology are shown. Due to the COVID-19 pandemic, artists had a choice, namely, to stay at home and not perform or stay at home and perform. In fact, staying at home brought in the possibility...

    Pełny tekst do pobrania w portalu

  • Karolina Zielińska-Dąbkowska dr inż. arch.

    Karolina M. Zielinska-Dabkowska (dr inż. arch., Dipl.-Ing. Arch.[FH]) jest adiunktem na Wydziale Architektury Politechniki Gdańskiej. W roku 2002 ukończyła studia na Wydziale Architektury i Urbanistyki  Politechniki Gdańskiej a w 2004 inżynierii architektonicznej na HAWK Hochschule für angewandte Wissenschaft und Kunst Hildesheim w Niemczech. Po studiach pracowała dla kilku firm o światowej renomie w Berlinie, Londynie, Nowym Jorku...

  • Smart Virtual Bass Synthesis Algorithm Based on Music Genre Classification

    Publikacja

    The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The proposed algorithm employed automatic music genre recognition to determine the optimum parameters for the synthesis of additional frequencies. The synthesis was carried out using the non-linear device (NLD) and phase vocoder (PV) methods depending on the music excerpt genre. Classification of musical...

  • From Knowledge based Vision Systems to Cognitive Vision Systems: A Review

    Publikacja

    - Rok 2018

    Computer vision research and applications have their origins in 1960s. Limitations in computational resources inherent of that time, among other reasons, caused research to move away from artificial intelligence and generic recognition goals to accomplish simple tasks for constrained scenarios. In the past decades, the development in machine learning techniques has contributed to noteworthy progress in vision systems. However,...

    Pełny tekst do pobrania w portalu

  • The data exchange between smart glasses and healthcare information systems using the HL7 FHIR standard

    Publikacja

    - Rok 2016

    In this study we evaluated system architecture for the use of smart glasses as a viewer of information, as a source of medical data (vital sign measurements: temperature, pulse rate, and respiration rate), and as a filter of healthcare information. All activities were based on patient/device identification procedures using graphical markers or features based on visual appearance. The architecture and particular use cases were implemented...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model

    Publikacja

    - Rok 2013

    Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel- Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...

  • Video content analysis in the urban area telemonitoring system

    Publikacja

    The task of constant monitoring of video streams from a large number of cameras and reviewing the recordings in order to find a specified event requires a considerable amount of time and effort from the system operators and it is prone to errors. A solution to this problem is an automatic system for constant analysis of camera images being able to raise an alarm if a predefined event is detected. The chapter presents various aspects...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Fully Automated AI-powered Contactless Cough Detection based on Pixel Value Dynamics Occurring within Facial Regions

    Publikacja

    - Rok 2021

    Increased interest in non-contact evaluation of the health state has led to higher expectations for delivering automated and reliable solutions that can be conveniently used during daily activities. Although some solutions for cough detection exist, they suffer from a series of limitations. Some of them rely on gesture or body pose recognition, which might not be possible in cases of occlusions, closer camera distances or impediments...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Network oscillations modulate interictal epileptiform spike rate during human memory

    Publikacja
    • J. Matsumoto
    • M. Stead
    • M. T. Kucewicz
    • A. Matsumoto
    • P. Peters
    • B. Brinkmann
    • J. C. Danstrom
    • S. Goerss
    • W. Marsh
    • F. Meyer
    • G. Worrell

    - Brain: A Journal of Neurology - Rok 2013

    Eleven patients being evaluated with intracranial electroencephalography for medically resistant temporal lobe epilepsy participated in a visual recognition memory task. Interictal epileptiform spikes were manually marked and their rate of occurrence compared between baseline and three 2 s periods spanning a 6 s viewing period. During successful, but not unsuccessful, encoding of the images there was a significant reduction in...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • “Shadow” vs. “Phase 3D” method within endoscopic examinations of marine engines

    Publikacja

    A visual investigation of surfaces creating internal, working spaces of marine combustion engines by means of specialized view-finders so called endoscopes is at present almost a basic method of technical diag-nostics. The surface structure of constructional material is visible during investigations like through the magnifying glass (usually with a precisely determined magnification), which makes possible a detection, recognition...

    Pełny tekst do pobrania w portalu

  • Convolutional Neural Networks for C. Elegans Muscle Age Classification Using Only Self-Learned Features

    Nematodes Caenorhabditis elegans (C. elegans) have been used as model organisms in a wide variety of biological studies, especially those intended to obtain a better understanding of aging and age-associated diseases. This paper focuses on automating the analysis of C. elegans imagery to classify the muscle age of nematodes based on the known and well established IICBU dataset. Unlike many modern classification methods, the proposed...

    Pełny tekst do pobrania w portalu

  • Metoda i algorytmy sterowania procesami miksowania dźwięku za pomocą gestów w oparciu o analizę obrazu wizyjnego

    Publikacja

    - Rok 2013

    Głównym celem rozprawy było opracowanie systemu miksowania dźwięku za pomocą gestów rąk wykonywanych w powietrzu oraz zbadanie możliwości oferowanych przez takie rozwiązanie w porównaniu ze współczesną metodą miksowania sygnałów fonicznych, wykorzystującą środowisko komputera. Opracowany system rozpoznaje zarówno dynamiczne jak i statyczne gesty rąk. Rozpoznawanie gestów dynamicznych zrealizowano w oparciu o metody logiki rozmytej...

  • SYNAT_MUSIC_GENRE_FV_173

    Dane Badawcze

    This is the original dataset containing 51582 music tracks (22 music genres) and 173 element-feature vector [1-6,9]. A collection of more than 50000 music excerpts described with a set of descriptors obtained through the analysis of 30-second mp3 recordings was gathered in a database called SYNAT. The SYNAT database was realized by the Gdansk University...

  • SYNAT Music Genre Parameters PCA 19

    Dane Badawcze

    The dataset contains feature vector after  Principal Component Analysis (PCA) performing, so there are 11 music genres and 19-element vector derived from music excerpts. Originally, a feature vector containing 173 elements was conceived in earlier research studies carried out by the team of authors [1-6]. A collection of 52532 music excerpts described...

  • SYNAT_PCA_48

    Dane Badawcze

    There is a series of datasets containing feature vectors derived from music tracks. The dataset contains 51582 music tracks (22 music genres) and feature vector after  Principal Component Analysis (PCA) performing, so there are 48-element vectors derived from music excerpts. Originally, a feature vector containing 173 elements was conceived in earlier...

  • SYNAT_PCA_11

    Dane Badawcze

    The dataset contains 51582 music tracks (22 music genres) and feature vector after  Principal Component Analysis (PCA) performing, so there are 11-element vectors derived from music excerpts. Originally, a feature vector containing 173 elements was conceived in earlier research studies carried out by the team of authors [1-6]. A collection of more than...