Search results for: SPEECH SIGNAL
-
Speech Intelligibility Measurements in Auditorium
PublicationSpeech intelligibility was measured in the Auditorium Novum at the Gdansk University of Technology (seating capacity 408, volume 3300 m³). Articulation tests were conducted; the Speech Transmission Index (STI) and Early Decay Time (EDT) were measured. The negative contribution of noise to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. A correlation was found between...
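Early Decay Time is typically read off the backward-integrated (Schroeder) decay curve of a measured impulse response: the time for the curve to fall from 0 dB to -10 dB, multiplied by six. A minimal sketch of that computation (an illustration only; the paper's measurement procedure is not detailed in the abstract):

```python
import math

def edt_seconds(ir, fs):
    """Early Decay Time: time for the Schroeder decay curve to fall
    from 0 dB to -10 dB, extrapolated to -60 dB (factor of 6)."""
    # Schroeder backward integration of the squared impulse response
    energy = [x * x for x in ir]
    total = sum(energy)
    decay, running = [], 0.0
    for e in reversed(energy):
        running += e
        decay.append(running)
    decay.reverse()
    # decay curve in dB relative to its initial value
    db = [10.0 * math.log10(d / total) for d in decay]
    for i, level in enumerate(db):
        if level <= -10.0:
            return 6.0 * i / fs  # extrapolate the -10 dB slope to -60 dB
    return None  # decay never reached -10 dB within the recording
```

For an ideally exponential impulse response the returned value coincides with the reverberation time implied by the decay rate.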
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublicationThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. Eighty synthesized sentences were prepared based on medical and everyday...
-
A non-uniform real-time speech time-scale stretching method
PublicationAn algorithm for non-uniform real-time speech stretching is presented. It combines the classic SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant and silence detectors. Based on the information about the content and the estimated rate of speech (ROS), the algorithm adapts the scaling factor. The feasibility of real-time speech stretching and the resulting voice quality were...
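The full SOLA algorithm additionally searches for the best overlap offset by cross-correlation before adding each frame; the core overlap-add mechanics, with the scaling factor as input, can be sketched as follows (a simplification for illustration, not the authors' implementation):

```python
import math

def ola_stretch(signal, rate, frame=512, hop_out=128):
    """Time-stretch by overlap-add: read frames rate*hop_out apart,
    write them hop_out apart. rate < 1 slows the speech down."""
    hop_in = max(1, int(round(hop_out * rate)))
    # Hann window for smooth cross-fading between overlapping frames
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1))
              for n in range(frame)]
    n_frames = max(0, (len(signal) - frame) // hop_in + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [1e-12] * out_len  # accumulated window weight, avoids /0
    for f in range(n_frames):
        src, dst = f * hop_in, f * hop_out
        for n in range(frame):
            out[dst + n] += signal[src + n] * window[n]
            norm[dst + n] += window[n]
    return [o / w for o, w in zip(out, norm)]
```

A non-uniform variant, as in the paper, would vary `rate` per frame according to the detected content (vowel, consonant or silence) and the estimated ROS.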
-
Language Models in Speech Recognition
PublicationThis chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.
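An n-gram model of the kind surveyed here assigns sentence probabilities from counted word co-occurrences and is usually compared via perplexity. A toy bigram model with add-one smoothing (an illustrative sketch, not code from the chapter):

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams/bigrams and return an add-one-smoothed P(word | prev)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(set(unigrams) | {"</s>"})
    return lambda prev, word: (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(prob, sentence):
    """Standard comparison measure: lower perplexity = better model fit."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logp = sum(math.log(prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))
    return math.exp(-logp / (len(tokens) - 1))
```

A word order seen in training receives a lower perplexity than a scrambled one, which is exactly how such models steer a recognizer toward plausible hypotheses.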
-
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
PublicationThis paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...
-
Applying utterance slowing to improve speech comprehension by children at school
PublicationThis paper presents time-scale modification algorithms that could be used in hearing-impairment therapy supported by real-time speech stretching. The OLA-based algorithms and the Phase Vocoder are described. In the experimental part, the usability of these algorithms for real-time speech stretching is discussed.
-
System Supporting Speech Perception in Special Educational Needs Schoolchildren
PublicationThe paper presents a system supporting speech perception during classes. The system combines a portable device, which enables real-time speech stretching, with a workstation designed to perform hearing tests. The system was designed to help children suffering from Central Auditory Processing Disorders.
-
Virtual keyboard controlled by eye gaze employing speech synthesis
PublicationThe article presents speech synthesis integrated into an eye-gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual (QWERTY) keyboard is an interface which allows entering text for the speech synthesizer. First, the article describes a methodology for determining the fixation point on a computer screen. Then it presents...
-
Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej
PublicationA bi-modal speech recognition system requires two language samples, for training and for testing algorithms, that precisely depict natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
-
An audio-visual corpus for multimodal automatic speech recognition
PublicationA review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, a depth imaging stream utilizing Time-of-Flight...
-
XVIII Międzynarodowe Sympozjum Inżynierii i Reżyserii Dźwięku
PublicationThe subjective assessment of speech signals takes into account the previous experiences and habits of an individual. Since the perception process deteriorates with age, differences should be noticeable among people from dissimilar age groups. In this work, we investigated differences in speech quality assessment between high school students and university students. The study involved 60 participants, with 30 people in both the adolescents...
-
Creating new voices using normalizing flows
PublicationCreating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...
-
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
PublicationThe quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
-
Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems
PublicationThe aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions, and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...
-
Ranking Speech Features for Their Usage in Singing Emotion Classification
PublicationThis paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel-Frequency Cepstral Coefficients (MFCC) and selected low-level MPEG-7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing by professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
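The paper's exact feature-selection algorithm is truncated above; one common way to rank features for such a classification task is the Fisher score, which prefers features whose class means are far apart relative to their within-class spread (a generic sketch, not the paper's method):

```python
def fisher_scores(X, y):
    """Return feature indices sorted by Fisher score (best first):
    between-class variance of the feature mean / within-class variance."""
    classes = sorted(set(y))
    n_feat = len(X[0])
    scores = []
    for j in range(n_feat):
        col = [row[j] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mean - overall) ** 2
            within += len(vals) * var
        scores.append(between / (within + 1e-12))
    return sorted(range(n_feat), key=lambda j: -scores[j])
```

Applied to a matrix of MFCC/MPEG-7 descriptors with emotion labels, the top of the returned list would give the descriptors most worth keeping.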
-
Auditory-visual attention stimulator
PublicationA new approach to the formation of lateralization irregularities is proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach, hearing is stimulated using time-scale modified speech, and sight is stimulated by rendering the text of the currently heard speech. Moreover, the displayed text is modified using several techniques, e.g. zooming and highlighting. In the experimental part of...
-
INVESTIGATION OF THE LOMBARD EFFECT BASED ON A MACHINE LEARNING APPROACH
PublicationThe Lombard effect is an involuntary increase in the speaker’s pitch, intensity, and duration in the presence of noise. It makes it possible to communicate in noisy environments more effectively. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the gender of the person on the detection process is examined. First, acoustic parameters...
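Since the Lombard effect raises the speaker's intensity (among other parameters), even a crude intensity comparison between a reference and a test utterance gives a first indicator; the study's actual acoustic parameters are truncated above. A hedged sketch with an illustrative threshold, not the paper's detector:

```python
import math

def mean_level_db(signal):
    """Mean RMS level of an utterance in dB (relative full scale)."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return 20.0 * math.log10(rms + 1e-12)

def lombard_flag(reference, test, min_increase_db=3.0):
    """Flag Lombard speech when the test utterance is noticeably louder
    than a quiet-condition reference by the same speaker."""
    return mean_level_db(test) - mean_level_db(reference) >= min_increase_db
```

A machine-learning detector, as in the study, would feed such per-utterance features (intensity, pitch, duration) into a trained classifier instead of a fixed threshold.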
-
Multimodal English corpus for automatic speech recognition
PublicationA multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously; therefore the corpus enables examination of the importance of the information provided by the different modalities. Based on the recordings, it is also possible to develop a speech...
-
Speech synthesis controlled by eye gazing
PublicationA method of communication based on eye-gaze control is presented. Investigations of using gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as ''talking by eyes'', providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially persons with so-called locked-in syndrome who cannot...
-
Auditory Brainstem Responses recorded employing Audio ABR device
Open Research DataThe dataset consists of ABR measurements employing click, burst and speech stimuli. Parameters of the particular stimuli were as follows:
-
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
PublicationThe article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...
-
Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology
PublicationA report on the visit of Prof. Haitham Abu-Rub to the Gdansk University of Technology: a speech on the Smart Grid Centre and a visit to the new smart grid laboratory of GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).
-
Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling
PublicationA common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...
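Comparing a recognized phoneme sequence against the expected native pronunciation is typically done with an edit-distance alignment; a minimal dynamic-programming sketch of that comparison step (not the paper's uncertainty model):

```python
def edit_ops(expected, produced):
    """Align two phoneme sequences with Levenshtein dynamic programming;
    return (edit distance, per-phoneme error rate)."""
    m, n = len(expected), len(produced)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if expected[i - 1] == produced[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n], d[m][n] / max(1, m)
```

The paper's point is precisely that this rigid comparison fails when recognition is noisy or several pronunciations are valid, which motivates modeling uncertainty instead.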
-
Modeling and Designing Acoustical Conditions of the Interior – Case Study
PublicationThe primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...
-
A comparative study of English viseme recognition methods and algorithms
PublicationAn elementary visual unit, the viseme, is considered in this paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature, and comparative research on their effectiveness. In the course of the study an optimal feature vector construction...
-
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
PublicationIn this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...
-
Silence/noise detection for speech and music signals
PublicationThis paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signal processing, e.g. noise reduction for speech enhancement. The algorithm is based on the frequency-domain characteristics of signals. Examples of different types of noisy signals are presented.
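The paper's detector works on frequency-domain characteristics; as a simpler time-domain baseline for the same task, each frame can be labeled silence or active by its RMS energy relative to the loudest frame (an energy-based simplification, not the proposed algorithm):

```python
import math

def detect_silence(signal, fs, frame_ms=20, threshold_db=-35.0):
    """Label each frame True (silence/noise-only) or False (active)
    by RMS energy relative to the loudest frame."""
    frame = max(1, int(fs * frame_ms / 1000))
    rms = []
    for start in range(0, len(signal) - frame + 1, frame):
        chunk = signal[start:start + frame]
        rms.append(math.sqrt(sum(x * x for x in chunk) / frame))
    peak = max(rms) or 1e-12
    return [20.0 * math.log10((r + 1e-12) / peak) < threshold_db for r in rms]
```

Frames labeled as silence can then serve as the noise patterns the abstract mentions, e.g. for spectral-subtraction speech enhancement.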
-
Database of speech and facial expressions recorded with optimized face motion capture settings
PublicationThe broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...
-
Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System
PublicationAlthough there has been a worldwide proliferation of multimedia platforms, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, and give aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...
-
Intelligent multimedia solutions supporting special education needs.
PublicationThe role of computers in school education is briefly discussed. The history of multimodal interface development is reviewed in brief. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with facial expressions, and a speech stretching audio interface representing the audio modality....
-
Evaluation Criteria for Affect-Annotated Databases
PublicationIn this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...
-
Intelligent video and audio applications for learning enhancement
PublicationThe role of computers in school education is briefly discussed. The history of multimodal interface development is reviewed in brief. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with facial expressions, and a speech stretching audio interface representing the audio modality....
-
Results of tests on speech intelligibility in reverberant conditions
Open Research DataThe dataset contains the results of tests that aimed to provide a relationship between the rate of speech (RoS) and reverberation conditions characterized by the Speech Transmission Index (STI).
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublicationThe problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets
PublicationArtificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...
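A mel-spectrogram, as compared against a plain spectrogram here, is a power spectrogram warped through triangular filters spaced evenly on the perceptual mel scale. The filterbank itself can be sketched as follows (a generic construction, not the study's exact configuration):

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale; multiplying a
    power spectrogram by this bank yields a mel-spectrogram."""
    top = hz_to_mel(fs / 2)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / fs) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[i - 1], bins[i]):           # rising slope
            filt[k] = (k - bins[i - 1]) / max(1, bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):           # falling slope
            filt[k] = (bins[i + 1] - k) / max(1, bins[i + 1] - bins[i])
        bank.append(filt)
    return bank
```

Because the filters widen toward high frequencies, the mel representation compresses spectral detail the way hearing does, which is one plausible reason it suited the CNN emotion classifier better in this study.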
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
PublicationThis study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublicationIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...
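Frequent parameter averaging means each worker runs a few local SGD steps from the shared weights, then all workers average their parameters (e.g., via an MPI allreduce divided by the worker count) and continue from the mean. A toy single-process simulation of the scheme (not the Kaldi/MPI code itself):

```python
def train_with_averaging(grad_fns, dim, rounds=20, local_steps=5, lr=0.1):
    """Each round: every worker takes local SGD steps from the shared
    weights, then the weights are averaged across all workers."""
    w = [0.0] * dim
    for _ in range(rounds):
        replicas = []
        for grad_fn in grad_fns:            # one gradient function per worker
            local = list(w)
            for _ in range(local_steps):
                g = grad_fn(local)
                local = [wi - lr * gi for wi, gi in zip(local, g)]
            replicas.append(local)
        # synchronization point: parameter averaging (allreduce / N)
        w = [sum(vals) / len(replicas) for vals in zip(*replicas)]
    return w
```

The communication cost the paper minimizes is exactly this synchronization point: more local steps per round means fewer averaging exchanges, at some cost in convergence fidelity.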
-
DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING
PublicationAn algorithm and software for preprocessing Polish-language film reviews were developed. The algorithm contains the following steps: text adaptation; tokenization; transforming words into byte format; part-of-speech tagging; stemming/lemmatization; representation of documents in vector form (Vector Space Model); forming...
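The first and last steps of such a pipeline, tokenization and the Vector Space Model, can be sketched in a few lines; POS tagging and lemmatization for Polish require a dedicated NLP toolkit and are omitted here (an illustrative sketch, not the developed software):

```python
import re

def preprocess(text, stopwords=frozenset()):
    """Tokenize, lowercase and drop stopwords; a stand-in for the fuller
    pipeline described in the paper."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]

def to_vector(tokens, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary (Vector Space Model)."""
    return [tokens.count(term) for term in vocabulary]
```

Each review becomes a fixed-length count vector, which is the form downstream classifiers (e.g., for sentiment) consume.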
-
Building Knowledge for the Purpose of Lip Speech Identification
PublicationConsecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...
-
MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES
PublicationAutomatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) algorithm and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers'...
-
Human-computer interactions in speech therapy using a blowing interface
PublicationIn this paper we present a new human-computer interface for the quantitative measurement of blowing activities. The interface can measure the air flow and air pressure during the blowing activity. The measured values are stored and used to control the state of the graphical objects in the graphical user interface. In speech therapy, children find it easier to play attractive therapeutic games than to perform repetitive and tedious,...
-
Vocalic Segments Classification Assisted by Mouth Motion Capture
PublicationVisual features convey important information for automatic speech recognition (ASR), especially in noisy environment. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose motion capture markers were placed on speakers' faces to obtain lips tracking data during speaking. Different parameterizations strategies were tested...
-
A Device for Measuring Auditory Brainstem Responses to Audio
PublicationStandard ABR devices use clicks and tone bursts to assess subjects’ hearing in an objective way. A new device was developed that extends the functionality of a standard ABR audiometer by collecting and analyzing auditory brainstem responses (ABR). The developed accessory allows for the use of complex sounds (e.g., speech or music excerpts) as stimuli. Therefore, it is possible to find out how efficiently different types of sounds...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
PublicationIn the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów
PublicationThe aim of this paper is to analyze the usability of descriptors originating in music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time- and frequency-domain parameters is selected and discussed in the context of the acoustical characteristics of stop consonants. A toolbox created for this purpose...
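Two of the simplest time- and frequency-domain descriptors used in such analyses, zero-crossing rate and spectral centroid, can be computed directly (a generic sketch, not the paper's toolbox; the direct DFT is O(n²) but fine for short frames):

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign; high for
    noisy/fricative-like segments, low for voiced ones."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def spectral_centroid(frame, fs):
    """Center of mass of the magnitude spectrum, in Hz."""
    n = len(frame)
    mags, freqs = [], []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(-2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(-2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im))
        freqs.append(k * fs / n)
    total = sum(mags) or 1e-12
    return sum(f * m for f, m in zip(freqs, mags)) / total
```

For stop consonants, the burst typically shows a high zero-crossing rate and a centroid well above that of neighboring vowels, which is what makes such descriptors candidates for the feature vector.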
-
Secured wired BPL voice transmission system
PublicationDesigning a secured voice transmission system is not a trivial task. Wired media, thanks to their reliability and resistance to mechanical damage, seem an ideal solution. The BPL (Broadband over Power Line) cable is resistant to electricity stoppage and partial damage of phase conductors, ensuring continuity of transmission in case of an emergency. It seems an appropriate tool for delivering critical data, mostly clear and understandable...
-
Communication Platform for Evaluation of Transmitted Speech Quality
PublicationA voice communication system that was designed and implemented is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmission of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessment via listening tests or objective evaluation employing...