Filters
total: 564
filtered: 361
Search results for: audio-visual speech recognition system
-
Audio content analysis in the urban area telemonitoring system
PublicationArtykuł przedstawia możliwości rozwinięcie monitoringu miejskiego o automatyczną analizę dźwięku. Przedstawiono metody parametryzacji dźwięku, które możliwe są do zastosowania w takim systemie oraz omówiono aspekty techniczne implementacji. W kolejnej części przedstawiono system decyzyjny oparty na drzewach zastosowany w systemie. System ten rozpoznaje dźwięki niebezpieczne (strzał, rozbita szyba, krzyk) wśród dźwięków zarejestrowanych...
-
Quality Analysis of Audio-Video Transmission in an OFDM-Based Communication System
PublicationApplication of a reliable audio-video communication system, brings many advantages. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. With the availability of visual information one can monitor the surrounding, working environment, etc. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission. Currently, orthogonal frequency...
-
System Supporting Speech Perception in Special Educational Needs Schoolchildren
PublicationThe system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.
-
Guido: a musical score recognition system
PublicationThis paper presents an optical music recognition system Guido that can automatically recognize the main musical symbols of music scores that were scanned or taken by a digital camera. The application is based on object model of musical notation and uses linguistic approach for symbol interpretation and error correction. The system offers musical editor with a partially automatic error correction.
-
Vowel recognition based on acoustic and visual features
PublicationW artykule zaprezentowano metodę, która może ułatwić naukę mowy dla osób z wadami słuchu. Opracowany system rozpoznawania samogłosek wykorzystuje łączną analizę parametrów akustycznych i wizualnych sygnału mowy. Parametry akustyczne bazują na współczynnikach mel-cepstralnych. Do wyznaczenia parametrów wizualnych z kształtu i ruchu ust zastosowano Active Shape Models. Jako klasyfikator użyto sztuczną sieć neuronową. Działanie systemu...
-
Examining Classifiers Applied to Static Hand Gesture Recognition in Novel Sound Mixing System
PublicationThe main objective of the chapter is to present the methodology and results of examining various classifiers (Nearest Neighbor-like algorithm with non-nested generalization (NNge), Naive Bayes, C4.5 (J48), Random Tree, Random Forests, Artificial Neural Networks (Multilayer Perceptron), Support Vector Machine (SVM) used for static gesture recognition. A problem of effective gesture recognition is outlined in the context of the system...
-
Automatic Image and Speech Recognition Based on Neural Network
Publication -
Audiovisual speech recognition for training hearing impaired patients
PublicationPraca przedstawia system rozpoznawania izolowanych głosek mowy wykorzystujący dane wizualne i akustyczne. Modele Active Shape Models zostały wykorzystane do wyznaczania parametrów wizualnych na podstawie analizy kształtu i ruchu ust w nagraniach wideo. Parametry akustyczne bazują na współczynnikach melcepstralnych. Sieć neuronowa została użyta do rozpoznawania wymawianych głosek na podstawie wektora cech zawierającego oba typy...
-
System for automatic singing voice recognition
PublicationW artykule przedstawiono system automatycznego rozpoznawania jakości i typu głosu śpiewaczego. Przedstawiono bazę danych oraz zaimplementowane parametry. Algorytmem decyzyjnym jest algorytm sztucznych sieci neuronowych. Wytrenowany system decyzyjny osiąga skuteczność ok. 90% w obydwu kategoriach rozpoznawania. Dodatkowo wykazano przy pomocy metod statystycznych, że wyniki działania systemu automatycznej oceny jakości technicznej...
-
Comparison of Language Models Trained on Written Texts and Speech Transcripts in the Context of Automatic Speech Recognition
Publication -
Combined Single Neuron Unit Activity and Local Field Potential Oscillations in a Human Visual Recognition Memory Task
PublicationGOAL: Activities of neuronal networks range from action potential firing of individual neurons, coordinated oscillations of local neuronal assemblies, and distributed neural populations. Here, we describe recordings using hybrid electrodes, containing both micro- and clinical macroelectrodes, to simultaneously sample both large-scale network oscillations and single neuron spiking activity in the medial temporal lobe structures...
-
Automatic audio signal mixing system based on one-dimensional Wave-U-Net autoencoders
PublicationThe purpose of this dissertation is to develop an automatic song mixing system that is capable of automatically mixing a song with good quality in any music genre. This work recalls first the audio signal processing methods used in audio mixing, and it describes selected methods for automatic audio mixing. Then, a novel architecture built based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. Models...
-
Auditory-model based robust feature selection for speech recognition
Publication -
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System
PublicationThe broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...
-
Human-Computer Interface Based on Visual Lip Movement and Gesture Recognition
PublicationThe multimodal human-computer interface (HCI) called LipMouse is presented, allowing a user to work on a computer using movements and gestures made with his/her mouth only. Algorithms for lip movement tracking and lip gesture recognition are presented in details. User face images are captured with a standard webcam. Face detection is based on a cascade of boosted classifiers using Haar-like features. A mouth region is located in...
-
Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System
PublicationAlthough there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...
-
A new approach to visual system testing
PublicationOpisano budowę laboratoryjnego stanowiska prac bawczych nad perymetrią obiektywną. Przedstawiono zasadę działania algorytmu VEPDA oraz wyniki działania VEPDA na danych eksperymentalnych.
-
A system for multitask noisy speech enhancement.
PublicationW artykule przedstawiono ogolną charakterystyke opracowanego systemu rejestracji i rekonstrukcji mowy. Artykuł zawiera opis składników systemu, ktory jest oprogramowaniem zawierającym zaawansowane narzędzia służące poprawie zrozumiałości mowy. Zaimplementowane narzędzia systemu umożliwiają wyszukiwanie nagrań dźwiękowych i ich obróbkę przy pomocy zaimplementowanych pluginów. W artykule przedstawione wykorzystane w systemie algorytmy...
-
Multitask Noisy Speech Enhancement System
PublicationW referacie opisano Wielozadaniowy System Poprawy Jakości Sygnału Mowy. Jest to wyspecjalizowany pakiet oprogramowania przeznaczony do rejestrowania sygnału mowy i do poprawy jego jakości oraz zrozumiałości mowy, przy użyciu zaawansowanych procedur cyfrowego przetwarzania sygnału. Pakiet oprogramowania składa się z programów: Rejestrator, Przeglądarka oraz Rekonstruktor. Oprogramowanie to może być użyte w przypadkach, gdy zrozumiałość...
-
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
PublicationIn order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...
-
Wireless intelligent audio-video surveillance prototyping system
PublicationThe presented system is based on the Virtex6 FPGA and several supporting devices like a fast DDR3 memory, small HD camera, microphone with A/D converter, WiFi radio communication module, etc. The system is controlled by the Linux operating system. The Linux drivers for devices implemented in the system have been prepared. The system has been successfully verified in a H.264 compression accelerator prototype in which the most demanding...
-
AAM toolkit: a system for visual object appearance modeling
PublicationAktywne modele wyglądu (AAM) mogą być traktowane jako zaawansowana metoda analizy informacji multimedialnych, pozwalająca na lokalizowanie i rozpoznawanie obiektów w obrazach statycznych i sekwencjach wideo. Pomimo tego że ukazało się wiele publikacji dotyczących AAM, przejście od koncepcji teoretycznych do działającej implementacji stanowi nadal duże wyzwanie. W pracy przedstawiono przygotowany przez autorów pakiet oprogramowania...
-
Human Tracking in Multi-camera Visual Surveillance System
PublicationArtykuł prezentuje krótkie podsumowanie wykorzystywanych technologii z dziedziny śledzenia osób z wykorzystaniem inteligentnych systemów bezpieczeństwa. Opisane w niniejszym opracowaniu systemy rozpoznawania twarzy, w połączeniu ze śledzeniem osób, nie mają na celu rozpoznawania tożsamości osób. Nie powstaje żadna baza danych łącząca cechy biometryczne z konkretnymi osobami, co sprawia że przestrzegane jest prawo w zakresie ochrony...
-
Visual measurement system for chosen parameters of current collectors
PublicationA current collector is an element in a traction vehicle, used for movable, contact con-nection between the main circuit of this vehicle and the contact line. Bad technical condi-tion of these important elements in the main circuit of the vehicle result in its incorrect cooperation with contact line. Periodic control of technical condition of current collectors ensures infallible current collection. The pantograph diagnostics consists...
-
Real-time working gas recognition system based on the array of semiconductor gas sensors and portable computer Raspberry PI
PublicationThe gas-analyzing systems based on the array of partially selective gas sensors and pattern-recognition techniques are potentially fast and low-cost alternative for other devices, like gas analysers. They give the possibility of recognition the type and the concentration of measured volatile compounds in their working environment. In this work we present the implementation of gas recognition system, in which the signals from an...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
PublicationBrain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
System of speech signal processing and visualisation for linguistic purposes
Publication -
Advanced speech archiving and restoration system for aviation applications
PublicationW referacie przedstawiono opracowany System Rejestracji I Rekonstrukcji Mowy dla potrzeb lotnictwa. System ten umożliwia jednoczesny zapis, archiwizację i poprawę zrozumiałości sygnału mowy pochodzącego z wielu różnych kanałów komunikacji radiowej. Głównym celem systemu jest rejestracja i rekonstrukcja komunikatów słownych wymienianych drogą radiową pomiędzy pilotem samolotu a stacją kontroli lotów - jest to niezwykle istotne w...
-
Visual Object Tracking System Employing Fixed and PTZ Cameras
PublicationThe paper presents a video monitoring system utilizing fixed and PTZ cameras for tracking of moving objects. First type of camera provides image for background modelling, being employed for foreground objects localization. Estimated objects locations are then utilised for steering of PTZ cameras when observing targeted objects with high close-ups. Objects are classified into several classes, then basic event detection is being...
-
Secure transmission of visual image in the VTS system using fingerprinting
Publication -
Camera-based Automatic System for Tool Measurements and Recognition
Publication -
Versatile pattern recognition system based on Fisher criterion
PublicationZaprezentowano system rozpoznawania obrazów w postaci bitmap. Zaimplementowany algorytm ekstrakcji cech jest uniwersalny i może być używany do różnych obrazów. Cały system bazuje na kryterium Fishera.
-
Face recognition by humans with gaze-tracking system Cyber-Eye
PublicationW celu dokładniejszego zrozumienia sposobu rozpoznawania i zapamiętywania twarzy przez człowieka przeprowadzono doświadczenie na grupie 20 osób z wykorzystaniem wcześniej opracowanego systemu śledzenia fiksacji wzroku Cyber-Oko [3]. Wykorzystując diody i kamerę podczerwieni wraz z dedykowanym oprogramowaniem Cyber-Oko, które pozwala na śledzenie punktu skupienia wzroku na ekranie. Każdej osobie biorącej udział w doświadczeniu pokazano...
-
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results
PublicationW artykule przedstawiono sposób przetwarzania i wizualizacji sygnału mowy w formie prostego w obsłudze i relatywnie niedrogiego urządzenia do nagrywania sygnału akustycznego oraz przetwarzania cyfrowego wyselekcjonowanych fragmentów i wizualizacji uzyskanych rezultatów przekształceń. Zastosowano do tego celu komputer z kartą dźwiękową. Przetwarzanie cyfrowe oraz wizualizacja dokonywana była w oparciu o program MATLAB bezpośrednio...
-
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych [A system of speech signal processing and visualisation for linguistic purposes]
Publication -
Developing a Low SNR Resistant, Text Independent Speaker Recognition System for Intercom Solutions - A Case Study
PublicationThis article presents a case study on the development of a biometric voice verification system for an intercom solution, utilizing the DeepSpeaker neural network architecture. Despite the variety of solutions available in the literature, there is a noted lack of evaluations for "text-independent" systems under real conditions and with varying distances between the speaker and the microphone. This article aims to bridge this gap....
-
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
PublicationW referacie zaprezentowano audiowizualny korpus mowy zawierający 31 godzin nagrań mowy w języku angielskim. Korpus dedykowany jest do celów automatycznego audiowizualnego rozpoznawania mowy. Korpus zawiera nagrania wideo pochodzące z szybkoklatkowej kamery stereowizyjnej oraz dźwięk zarejestrowany przez matrycę mikrofonową i mikrofon komputera przenośnego. Dzięki uwzględnieniu nagrań zarejestrowanych w warunkach szumowych korpus...
-
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
PublicationThe purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...
-
Objectivization of audio-video correlation assessment experiments
PublicationThe purpose of this paper is to present a new method of conducting an audio-visual correlation analysis employing a head-motion-free gaze tracking system. First, a review of related works in the domain of sound and vision correlation is presented. Then assumptions concerning audio-visual scene creation are shortly described. The objectivization process of carrying out correlation tests employing gaze-tracking system is outlined....
-
Introduction to the special issue on machine learning in acoustics
PublicationWhen we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...
-
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
PublicationThis paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...
-
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
PublicationThe article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...
-
Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor
PublicationSpatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublicationThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
Investigating Feature Spaces for Isolated Word Recognition
PublicationMuch attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
-
Gesture-controlled Sound Mixing System With a Sonified Interface
PublicationIn this paper the Authors present a novel approach to sound mixing. It is materialized in a system that enables to mix sound with hand gestures recognized in a video stream. The system has been developed in such a way that mixing operations can be performed both with or without visual support. To check the hypothesis that the mixing process needs only an auditory display, the influence of audio information visualization on sound...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublicationThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE
PublicationW niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...
-
Intelligent multimedia solutions supporting special education needs.
PublicationThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Intelligent video and audio applications for learning enhancement
PublicationThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....