Search results for: speech recognition systems
-
Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems
PublicationThe aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...
-
Language Models in Speech Recognition
PublicationThis chapter describes language models used in speech recognition. It starts by indicating the role and place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given, accompanied by a list of popular software.
-
Multimodal English corpus for automatic speech recognition
PublicationA multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously, so the corpus makes it possible to examine the importance of the information provided by the different modalities. Based on the recordings, it is also possible to develop a speech...
-
Speech recognition system for hearing impaired people.
PublicationThe paper presents the results of research on speech recognition. The system under development, which uses visual and acoustic data, will facilitate training in correct speech for people after cochlear implant surgery and for others with severe hearing impairment. Active Shape Models were used to determine visual parameters based on analysis of the shape and movement of the lips in video recordings. The acoustic parameters are based on...
-
An audio-visual corpus for multimodal automatic speech recognition
PublicationA review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...
-
Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning
PublicationText-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition (ASR) models. Medical speech data is sensitive and difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in medical language. Reinforcement Learning (RL), in the...
-
Examining Influence of Distance to Microphone on Accuracy of Speech Recognition
PublicationThe problem of controlling a machine by a distant-talking speaker without the need for handheld or body-worn equipment is considered. A laboratory setup is introduced for examining the performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...
-
Visual Lip Contour Detection for the Purpose of Speech Recognition
PublicationA method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from the mouth region. Different Active Appearance Models are employed for finding lips in video frames and for statistical description of lip shape and texture. A search initialization procedure is proposed and error measure values are...
-
Automatic Image and Speech Recognition Based on Neural Network
Publication -
Audiovisual speech recognition for training hearing impaired patients
PublicationThe paper presents a system for recognizing isolated speech sounds using visual and acoustic data. Active Shape Models were used to determine visual parameters based on analysis of the shape and movement of the lips in video recordings. The acoustic parameters are based on mel-cepstral coefficients. A neural network was used to recognize the spoken sounds based on a feature vector containing both types...
-
Comparison of Language Models Trained on Written Texts and Speech Transcripts in the Context of Automatic Speech Recognition
Publication -
Auditory-model based robust feature selection for speech recognition
Publication -
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
PublicationThe problem of accurately differentiating between speaker utterances and noise in a speech signal is considered. The influence of voice activity detection on the accuracy of an automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
-
Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets
PublicationArtificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...
-
Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition
Publication...convolutional neural network (CNN), which is a class of deep, feed-forward artificial neural networks. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was based on the fact that CNNs perform well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...
-
A survey of automatic speech recognition deep models performance for Polish medical terms
PublicationAmong the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....
-
Broadband interference in speech reinforcement systems
PublicationThe article addresses the underestimated problem of how the number and distribution of loudspeakers in sound reinforcement systems affect the quality of voice transmission, that is, speech intelligibility in auditoria. The superposition of time-shifted broadband signals of the same shape and slightly different magnitudes, reaching the listener from numerous coherent sources, is accompanied by interference that leads to deep modification of the received signals...
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
PublicationIn this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data in different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
-
Combining visual and acoustic modalities to ease speech recognition by hearing impaired people
PublicationThe article presents a system intended to facilitate training in correct pronunciation for people with severe hearing impairments. Acoustic and visual parameters were used in the speech analysis. Active Shape Models were used to determine the visual parameters based on the shape and movement of the lips. The acoustic parameters are based on mel-cepstral coefficients. To classify the spoken sounds...
-
Semantic Integration of Heterogeneous Recognition Systems
PublicationComputer perception of real-life situations is performed using a variety of recognition techniques, including video-based computer vision, biometric systems, RFID devices and others. The proliferation of recognition modules enables development of complex systems by integration of existing components, analogously to the Service Oriented Architecture technology. In the paper, we propose a method that enables integration of information...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublicationThe problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublicationThe problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
PublicationBrain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
Camera angle invariant shape recognition in surveillance systems
PublicationA method for human action recognition in surveillance systems is described. Problems within this task are discussed and a solution based on 3D object models is proposed. The idea is presented and some of its limitations are discussed. Shape description methods are introduced along with their main features. The parameterization algorithm used is presented. The classification problem, restricted to binary cases, is discussed. Support vector...
-
Mutual recognition of certification systems: The case of SERMO and ACLES
Publication -
A Framework for Training and Testing of Complex Pattern Recognition Systems
PublicationThe paper presents an application framework created to simplify the construction of pattern recognition systems and to provide a test environment for evaluating algorithms on large data sets. A clearly defined architecture, together with many ready-to-use modules, makes it possible to concentrate on implementing the most important algorithms. The framework supports the creation of modules that can be reused,...
-
Oscillators with anionic surfactants as systems for molecular recognition of taste substances
PublicationOscillation characteristics of three-phase systems that can be used to recognize taste substances are presented. The oscillatory changes in the electric potential difference between the aqueous phases were observed to depend on the type of taste substance present in the system. Their different initial values are an important feature with regard to the possible use of these systems in taste sensors. Phase portraits were determined for each system...
-
Application of Syntactic Pattern Recognition Approach in Design and Optimisation of Group Machining Systems
Publication -
Application of Syntactic Pattern Recognition Approach in Design and Optimisation of Group Machining Systems
PublicationThe concept of building optimized structures of group manufacturing systems for a spectrum of parts is developed, using a model of syntactic analysis of the operation sequences of their technological processes. A distance metric formula describing the degree of differentiation between the routings of individual processes is defined, and its effectiveness is tested with respect to multidimensional data exploration and the clustering of objects by features of technological requirements....
-
Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej
PublicationThe bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
-
On the new possibility of applying oscillating liquid membrane systems for molecular recognition of substances responsible for taste.
PublicationIt was suggested earlier that oscillating liquid membrane systems could be used to develop a taste sensor. The influence of substances responsible for taste, belonging to four taste classes, on the oscillation characteristics of a liquid membrane oscillator with the cationic surfactant benzyldimethyltetradecylammonium chloride was examined. It was shown that, regardless of the nature of the organic solvent in the liquid membrane, the oscillation characteristics...
-
Oscillating water-oil-water liquid membrane systems for molecular recognition of substances belonging to different taste classes
PublicationOscillations of the electrochemical potential difference between the aqueous phases were studied. One aqueous phase contains a cationic or anionic surfactant, while the other contains a substance responsible for taste. The two aqueous phases are separated by an oil phase. The oscillations were analyzed by constructing phase portraits using the time-delay method. The shape of the phase portraits differs for oscillators with a cationic...
-
Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems
PublicationThe article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...
-
IEEE Automatic Speech Recognition and Understanding Workshop
Conferences -
ISCA Tutorial and Research Workshop Automatic Speech Recognition
Conferences -
Badanie rozkładów parametrów sygnału mowy w zastosowaniach do prognozowania prawdopodobieństwa popełnienia błędów w systemach identyfikacji mówców = Examining distribution of speech signal parameters for the prognosis of error probability in speaker verification systems
PublicationThe subject of the work is a text-dependent speaker identification system. Many different utterances from several dozen speakers were analyzed. The parameterization method applied is based on the results of cepstral analysis of the speech signal. New parameters associated with elementary events in the speaker verification process were defined. On this basis, an estimation was made of the probability density function...
-
International Workshop on Pattern Recognition in Information Systems
Conferences -
Andrzej Czyżewski prof. dr hab. inż.
PeopleProf. zw. dr hab. inż. Andrzej Czyżewski is a graduate of the Faculty of Electronics of Gdańsk University of Technology (PG), where he completed his master's studies in 1982. He defended his doctoral thesis, on a topic related to digital sound, with distinction at the Faculty of Electronics of PG in 1987. In 1992 he presented his habilitation thesis, entitled „Cyfrowe operacje na sygnałach fonicznych” ("Digital Operations on Audio Signals"). His habilitation colloquium was accepted unanimously in June 1992 at the AGH University of Science and Technology (Akademia Górniczo-Hutnicza)...
-
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
PublicationAutomatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...
-
Artur Gańcza dr inż.
PeopleI received the M.Sc. degree from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2019. I am currently a Ph.D. student at GUT, with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics. My professional interests include speech recognition, system identification, adaptive signal processing and linear algebra.
-
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
PublicationThe article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...
-
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
PublicationThe quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
-
A comparative study of English viseme recognition methods and algorithms
PublicationAn elementary visual unit, the viseme, is considered in the paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector construction...
-
A comparative study of English viseme recognition methods and algorithm
PublicationAn elementary visual unit, the viseme, is considered in the paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector...
-
Voice command recognition using hybrid genetic algorithm
PublicationSpeech recognition is the process of converting an acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...
-
Decoding imagined speech for EEG-based BCI
PublicationBrain–computer interfaces (BCIs) are systems that transform the brain's electrical activity into commands to control a device. To create a BCI, it is necessary to establish the relationship between a certain stimulus, internal or external, and the brain activity it provokes. A common approach in BCIs is motor imagery, which involves imagining limb movement. Unfortunately, this approach allows few commands. As an alternative, this...
-
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
PublicationThe paper presents an audio-visual speech corpus containing 31 hours of English speech recordings. The corpus is intended for automatic audio-visual speech recognition. It contains video recordings from a high-frame-rate stereoscopic camera and audio recorded by a microphone array and a laptop microphone. Because recordings made in noisy conditions are included, the corpus...
-
Bożena Kostek prof. dr hab. inż.
People -
Jan Daciuk dr hab. inż.
PeopleJan Daciuk received his M.Sc. from the Faculty of Electronics of Gdańsk University of Technology in 1986, and his Ph.D. from the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology in 1999. He has been working at the Faculty since 1988. His research interests include finite state methods in natural language processing and computational linguistics, including speech processing. Dr. Daciuk...
-
Vocalic Segments Classification Assisted by Mouth Motion Capture
PublicationVisual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose, motion capture markers were placed on speakers' faces to obtain lip-tracking data during speaking. Different parameterization strategies were tested...