Search results for: speech recognition, allophone, phonology, foreign language, audio features
-
Specialist English as a foreign language for European public health: evaluation of competencies and needs among Polish and Lithuanian students
Publication -
US-China Foreign Language
Journals -
Combining visual and acoustic modalities to ease speech recognition by hearing impaired people
PublicationArtykuł prezentuje system, którego celem działania jest ułatwienie procesu treningu poprawnej wymowy dla osób z poważnymi wadami słuchu. W analizie mowy wykorzystane zostały parametry akutyczne i wizualne. Do wyznaczenia parametrów wizualnych na podstawie kształtu i ruchu ust zostały wykorzystane modele Active Shape Models. Parametry akustyczne bazują na współczynnikach melcepstralnych. Do klasyfikacji wypowiadanych głosek została...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
PublicationBrain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
Chuzhdoezikovo Obuchenie-Foreign Language Teaching
Journals -
Electronic Journal of Foreign Language Teaching
Journals -
International Journal of Speech-Language Pathology (previously called Advances in Speech-Language Pathology)
Journals -
Puhe ja Kieli (Speech and Language)
Journals -
JOURNAL OF MEDICAL SPEECH-LANGUAGE PATHOLOGY
Journals -
LANGUAGE SPEECH AND HEARING SERVICES IN SCHOOLS
Journals -
AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY
Journals -
Journal of Speech, Language, and Hearing Research
Journals -
Andrzej Czyżewski prof. dr hab. inż.
PeopleProf. zw. dr hab. inż. Andrzej Czyżewski jest absolwentem Wydziału Elektroniki PG (studia magisterskie ukończył w 1982 r.). Pracę doktorską na temat związany z dźwiękiem cyfrowym obronił z wyróżnieniem na Wydziale Elektroniki PG w roku 1987. W 1992 r. przedstawił rozprawę habilitacyjną pt.: „Cyfrowe operacje na sygnałach fonicznych”. Jego kolokwium habilitacyjne zostało przyjęte jednomyślnie w czerwcu 1992 r. w Akademii Górniczo-Hutniczej...
-
Canadian Journal of Speech-Language Pathology and Audiology
Journals -
EURASIP Journal on Audio Speech and Music Processing
Journals -
Asian-Pacific Journal of Second and Foreign Language Education
Journals -
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
PublicationAutomatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...
-
Jan Daciuk dr hab. inż.
PeopleJan Daciuk received his M.Sc. from the Faculty of Electronics of Gdansk University of Technology in 1986, and his Ph.D. from the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology in 1999. He has been working at the Faculty from 1988. His research interests include finite state methods in natural language processing and computational linguistics including speech processing. Dr. Daciuk...
-
Advances in Speech-Language Pathology (correct title: IJSLP)
Journals -
International Journal of Speech, Language and the Law: Forensic Linguistics
Journals -
Methodology and technology for the polymodal allophonic speech transcription
PublicationA method for automatic audiovisual transcription of speech employing: acoustic and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...
-
Methodology and technology for the polymodal allophonic speech transcription
PublicationA method for automatic audiovisual transcription of speech employing: acoustic, electromagnetical articulography and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...
-
Comparative Study of Self-Organizing Maps vs. Subjective Evaluation of Quality of Allophone Pronunciation for Nonnative English Speakers
PublicationThe purpose of this study was to apply Self-Organizing Maps to differentiate between the correct and the incorrect allophone pronunciations and to compare the results with subjective evaluation. Recordings of a list of target words, containing selected allophones of English plosive consonants, the velar nasal and the lateral consonant, were made twice. First, the target words were read from the list by 9 non-native speakers and...
-
Introduction to the special issue on machine learning in acoustics
PublicationWhen we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...
-
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
PublicationW referacie zaprezentowano audiowizualny korpus mowy zawierający 31 godzin nagrań mowy w języku angielskim. Korpus dedykowany jest do celów automatycznego audiowizualnego rozpoznawania mowy. Korpus zawiera nagrania wideo pochodzące z szybkoklatkowej kamery stereowizyjnej oraz dźwięk zarejestrowany przez matrycę mikrofonową i mikrofon komputera przenośnego. Dzięki uwzględnieniu nagrań zarejestrowanych w warunkach szumowych korpus...
-
A comparative study of English viseme recognition methods and algorithms
PublicationAn elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector construction...
-
A comparative study of English viseme recognition methods and algorithm
PublicationAn elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector...
-
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
PublicationThis paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...
-
New approach for determining the QoS of MP3-coded voice signals in IP networks
PublicationPresent-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...
-
Analysis of allophones based on audio signal recordings and parameterization
PublicationThe aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...
-
Speech Analytics Based on Machine Learning
PublicationIn this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublicationThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
IEEE Automatic Speech Recognition and Understanding Workshop
Conferences -
Biometria i przetwarzanie mowy 2023
e-Learning Courses{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
-
Biometria i przetwarzanie mowy 2024
e-Learning Courses{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
-
Investigating Feature Spaces for Isolated Word Recognition
PublicationMuch attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
-
MODALITY corpus - SPEAKER 10 - SEQUENCE S1
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 10 - SEQUENCE S5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 10 - SEQUENCE S3
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 10 - SEQUENCE S2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 10 - SEQUENCE S4
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 10 - SEQUENCE S6
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results
PublicationThe goal of this research is to find a set of acoustic parameters that are related to differences between Polish and Lithuanian language consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as the vectors of acoustic parameters. Parameters known from the speech domain as well as those from the music information retrieval area are employed. These parameters are...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S1
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S4
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S3
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S6
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
Auditory Brainstem Responses recorded employing Audio ABR device
Open Research DataThe dataset consists of ABR measurements employing click, burst and speech stimuli. Parameters of the particular stimuli were as follows: