Search results for: speech recognition systems

total: 7231

displaying 1000 best results

Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems
Publication
- P. Sokólski
- T. A. Rutkowski
- Pomiary Automatyka Robotyka - Year 2013
The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...

Full text available to download
Language Models in Speech Recognition
Publication
- J. Daciuk
- Year 2022
This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

Full text to download in external service
Multimodal English corpus for automatic speech recognition
Publication
- Year 2013
A multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously, so the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...
Speech recognition system for hearing impaired people.
Publication
- P. Dalka
- A. Czyżewski
- Year 2005
The paper presents results of research on speech recognition. The system under development, which uses visual and acoustic data, will facilitate training in correct speech for people who have undergone cochlear implant surgery and for others with severe hearing impairment. Active Shape Models were used to determine visual parameters based on analysis of lip shape and movement in video recordings. The acoustic parameters are based on...
An audio-visual corpus for multimodal automatic speech recognition
Publication
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017
A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download
Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning
Publication
- A. Czyżewski
- Journal of the Acoustical Society of America - Year 2023
Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition (ASR) models. Medical speech data is sensitive and difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

Full text available to download
Examining Influence of Distance to Microphone on Accuracy of Speech Recognition
Publication
- Year 2015
The problem of a distant-talking speaker controlling a machine without the need for handheld or body-worn equipment is considered. A laboratory setup is introduced for examining the performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

Full text to download in external service
Visual Lip Contour Detection for the Purpose of Speech Recognition
Publication
- Year 2014
A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from the mouth region. Different Active Appearance Models are employed for finding lips in video frames and for statistical description of lip shape and texture. A search initialization procedure is proposed and error measure values are...
Automatic Image and Speech Recognition Based on Neural Network
Publication
- D. Król
- B. Szlachetko
- Journal of Information Technology Research - Year 2010
Full text to download in external service
Audiovisual speech recognition for training hearing impaired patients
Publication
- Year 2006
The paper presents a system for recognizing isolated speech phonemes using visual and acoustic data. Active Shape Models were used to determine visual parameters based on analysis of lip shape and movement in video recordings. The acoustic parameters are based on mel-cepstral coefficients. A neural network was used to recognize the uttered phonemes based on a feature vector containing both types...
Comparison of Language Models Trained on Written Texts and Speech Transcripts in the Context of Automatic Speech Recognition
Publication
- S. Dziadzio
- A. Nabożny
- A. Smywiński-Pohl
- B. Ziółko
- Year 2015
Full text to download in external service
Auditory-model based robust feature selection for speech recognition
Publication
- C. Koniaris
- M. Kuropatwinski
- W. Kleijn
- M. Kuropatwiński
- Journal of the Acoustical Society of America - Year 2010
Full text to download in external service
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
Publication
- Year 2016
The problem of accurately differentiating between the speaker's utterance and the noise parts in a speech signal is considered. The influence of applying voice activity detection to speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets
Publication
- Electronics - Year 2022
Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

Full text available to download
Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition
Publication
- G. Korvel
- P. Treigys
- G. Tamulevicus
- J. Bernataviciene
- B. Kostek
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2018
convolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was based on the fact that CNNs perform well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...
A survey of automatic speech recognition deep models performance for Polish medical terms
Publication
- Year 2023
Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Full text to download in external service
Broadband interference in speech reinforcement systems
Publication
- H. Lasota
- R. Mazurek
- Year 2008
The article addresses the underestimated problem of how the number and distribution of loudspeakers in sound reinforcement systems affect the quality of voice transmission, that is, speech intelligibility in auditoria. The superposition of time-shifted broadband signals of the same shape and slightly different magnitudes, which reach the listener from numerous coherent sources, is accompanied by interference that leads to a deep modification of the received signals...
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
Publication
- G. Tamulevicius
- G. Korvel
- A. B. Yayak
- P. Treigys
- J. Bernataviciene
- B. Kostek
- Electronics - Year 2020
In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

Full text available to download
Combining visual and acoustic modalities to ease speech recognition by hearing impaired people
Publication
- B. Kostek
- P. Dalka
- Year 2005
The article presents a system intended to facilitate the training of correct pronunciation for people with severe hearing impairments. Acoustic and visual parameters were used in the speech analysis. Active Shape Models were used to determine the visual parameters based on lip shape and movement. The acoustic parameters are based on mel-cepstral coefficients. The uttered phonemes were classified using...
Semantic Integration of Heterogeneous Recognition Systems
Publication
- P. Kaczmarek
- P. Raszkowski
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011
Computer perception of real-life situations is performed using a variety of recognition techniques, including video-based computer vision, biometric systems, RFID devices and others. The proliferation of recognition modules enables development of complex systems by integration of existing components, analogously to the Service Oriented Architecture technology. In the paper, we propose a method that enables integration of information...
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
Publication
- Year 2014
The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
Publication
- J. S. Garcia Salinas
- A. A. Torres-García
- C. A. Reyes-García
- L. Villaseñor-Pineda
- Biomedical Signal Processing and Control - Year 2023
Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

Full text to download in external service
Camera angle invariant shape recognition in surveillance systems
Publication
- D. Ellwart
- A. Czyżewski
- Year 2010
A method for human action recognition in surveillance systems is described. Problems within this task are discussed and a solution based on 3D object models is proposed. The idea is presented and some of its limitations are discussed. Shape description methods are introduced along with their main features. The parameterization algorithm used is presented. The classification problem, restricted to binary cases, is discussed. Support vector...
Mutual recognition of certification systems: The case of SERMO and ACLES
Publication
- J. Zabala-Delgado
- B. Sawicka
- Language Learning in Higher Education - Year 2019
Full text to download in external service
A Framework for Training and Testing of Complex Pattern Recognition Systems
Publication
- M. Smiatacz
- K. Przybycień
- Year 2011
The paper presents an application framework created to simplify the construction of pattern recognition systems and to provide a test environment in which algorithms can be evaluated on large data sets. A clearly defined architecture, together with many ready-to-use modules, makes it possible to concentrate on implementing the most important algorithms. The framework supports the creation of modules that can be reused,...
Oscillators with anionic surfactants as systems for molecular recognition of taste substances
Publication
- M. Szpakowska
- A. Magnuszewska
- Year 2005
The oscillation characteristics of three-phase systems that can be used to recognize taste substances are presented. The oscillatory changes in the electric potential difference between the aqueous phases were observed to depend on the kind of taste substance present in the system. Their different initial values are an important feature with regard to the possible use of these systems in taste sensors. Phase portraits were determined for each system...
Application of Syntactic Pattern Recognition Approach in Design and Optimisation of Group Machining Systems
Publication
- M. Siemiątkowski
- Solid State Phenomena - Year 2010
A concept was developed for building optimized structures of systems for the group manufacturing of a spectrum of parts, using a model of syntactic analysis of the operation sequences in their technological processes. A distance metric formula describing the degree of differentiation among the routings of individual processes was defined, and its effectiveness was tested with respect to multidimensional data exploration and the clustering of objects by features of their technological requirements....

Full text to download in external service
Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej
Publication
- A. Czyżewski
- B. Kostek
- T. Ciszewski
- D. Majewicz
- Year 2013
The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
On the new possibility of applying oscillating liquid membrane systems for molecular recognition of substances responsible for taste.
Publication
- M. Szpakowska
- E. Płocharska-Jankowska
- O. B. Nagy
- Year 2005
It was suggested earlier that oscillating systems with a liquid membrane could be used to develop a taste sensor. The influence of substances responsible for taste, belonging to four taste classes, on the oscillation characteristics of an oscillator with a liquid membrane and the cationic surfactant benzyldimethyltetradecylammonium chloride was investigated. It was shown that, regardless of the nature of the organic solvent in the liquid membrane, the oscillation characteristics...
Oscillating water-oil-water liquid membrane systems for molecular recognition of substances belonging to different taste classes
Publication
- Year 2005
Oscillations of the electrochemical potential difference between the aqueous phases were studied. One aqueous phase contains a cationic or anionic surfactant, while the other aqueous phase contains a substance responsible for taste. The two aqueous phases are separated by an oil phase. The oscillations were analyzed by constructing phase portraits using the time-delay method. The shape of the phase portraits differs for oscillators with a cationic...
Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems
Publication
- N. Wawrzyniak
- A. Stateczny
- Polish Maritime Research - Year 2018
The article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...

Full text to download in external service
IEEE Automatic Speech Recognition and Understanding Workshop

Conferences
ISCA Tutorial and Research Workshop Automatic Speech Recognition

Conferences
Badanie rozkładów parametrów sygnału mowy w zastosowaniach do prognozowania prawdopodobieństwa popełnienia błędów w systemach identyfikacji mówców = Examining distribution of speech signal parameters for the prognosis of error probability in speaker verification systems
Publication
- A. Kaczmarek
- Year 2010
The subject of the work is a text-dependent speaker identification system. Many different utterances by several dozen speakers were analyzed. The parameterization method used is based on the results of cepstral analysis of the speech signal. New parameters associated with elementary events in the speaker verification process were defined. On this basis, the probability density functions were estimated...
International Workshop on Pattern Recognition in Information Systems

Conferences
Andrzej Czyżewski prof. dr hab. inż.

People

Department of Multimedia Systems

Prof. zw. dr hab. inż. Andrzej Czyżewski is a graduate of the Faculty of Electronics of Gdańsk University of Technology (PG), where he completed his master's studies in 1982. He defended his doctoral thesis, on a topic related to digital sound, with distinction at the Faculty of Electronics of PG in 1987. In 1992 he presented his habilitation dissertation, entitled "Cyfrowe operacje na sygnałach fonicznych" ("Digital operations on audio signals"). His habilitation colloquium was accepted unanimously in June 1992 at the Akademia Górniczo-Hutnicza...
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
Publication
- Year 2016
Automatic speech recognition (ASR) is under constant development, especially for cases in which speech is casually produced, acquired in various environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is the main focus of the presented work. ASR is widely implemented in mobile devices technology, but...

Full text to download in external service
Artur Gańcza dr inż.

People

Department of Marine Electronic Systems

I received the M.Sc. degree from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2019. I am currently a Ph.D. student at GUT, with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics. My professional interests include speech recognition, system identification, adaptive signal processing and linear algebra.
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
Publication
- S. Zaporowski
- Year 2024
The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...

Full text available to download
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
Publication
- I. Kochańska
- H. Lasota
- Year 2015
The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
A comparative study of English viseme recognition methods and algorithms
Publication
- D. Jachimski
- A. Czyżewski
- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018
An elementary visual unit, the viseme, is considered in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature, and a comparative study of their effectiveness. In the course of the study an optimal feature vector...

Full text available to download
Voice command recognition using hybrid genetic algorithm
Publication
- M. Wroniszewska
- J. Dziedzic
- TASK Quarterly - Year 2010
Abstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

Full text available to download
Decoding imagined speech for EEG-based BCI
Publication
- C. A. Reyes-García
- A. A. Torres-García
- T. Hernández-del-Toro
- J. S. Garcia Salinas
- L. Villaseñor-Pineda
- Year 2024
Brain–computer interfaces (BCIs) are systems that transform the brain's electrical activity into commands to control a device. To create a BCI, it is necessary to establish the relationship between a certain stimulus, internal or external, and the brain activity it provokes. A common approach in BCIs is motor imagery, which involves imagining limb movement. Unfortunately, this approach allows few commands. As an alternative, this...

Full text to download in external service
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
Publication
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2016
The paper presents an audio-visual speech corpus containing 31 hours of English speech recordings. The corpus is intended for automatic audio-visual speech recognition. It contains video recordings from a high-frame-rate stereoscopic camera and audio recorded by a microphone array and a laptop microphone. Because recordings made in noisy conditions are included, the corpus...
Bożena Kostek prof. dr hab. inż.

People

Laboratorium Akustyki Fonicznej
Jan Daciuk dr hab. inż.

People

Department of Intelligent Interactive Systems

Jan Daciuk received his M.Sc. from the Faculty of Electronics of Gdańsk University of Technology in 1986, and his Ph.D. from the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology in 1999. He has been working at the Faculty since 1988. His research interests include finite state methods in natural language processing and computational linguistics including speech processing. Dr. Daciuk...
Vocalic Segments Classification Assisted by Mouth Motion Capture
Publication
- Year 2018
Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose, motion capture markers were placed on speakers' faces to obtain lip-tracking data during speaking. Different parameterization strategies were tested...

Full text to download in external service
