Search results for: speech analysis

Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

Publication

M. Zamłyńska
P. Falkowski-Gilski
G. Debita
B. Miedziński

- Year 2021

Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

Full text to download in external service

Automated detection of pronunciation errors in non-native English speech employing deep learning

Publication

D. Korzekwa

- Year 2023

Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

Full text available to download

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
S. Calamaro
B. Kostek

- Year 2021

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

Full text available to download

The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish

Publication

S. Zaporowski

- Year 2024

The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...

Full text available to download

Estimation of the short-term predictor parameters of speech under noisy conditions

Publication

M. Kuropatwiński
W. Kleijn

- IEEE Transactions on Audio Speech and Language Processing - Year 2006

Full text to download in external service

Time-scale modification of speech signals for supporting hearing impaired schoolchildren

Publication

- Year 2009

A study of time-scale modification algorithms applied to supporting hearing impaired schoolchildren is presented. A variety of algorithms are considered, namely: overlap and add, two variations of synchronized overlap and add, and the phase vocoder. Their effectiveness as well as real-time processing capabilities are examined.

Speech formant frequency and pitch estimation using instantaneous complex frequency

Publication

M. Kaniewska

- Year 2008

The paper describes an algorithm for estimating the fundamental frequency as well as the center frequencies and bandwidths of speech formants using the instantaneous complex frequency. The article also presents the results of the algorithm for Polish vowels.

Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

Publication

- Journal of the Acoustical Society of America - Year 2018

A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

Full text to download in external service

Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

Publication

P. Falkowski-Gilski
G. Debita
M. Habrych
B. Miedziński
P. Jedlikowski
B. Polnik
J. Wandzio
X. Wang

- Year 2020

The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Full text to download in external service

Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students

Publication

P. Falkowski-Gilski

- Year 2021

The user perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in case of bilingual...

Full text to download in external service

Modeling and Designing Acoustical Conditions of the Interior – Case Study

Publication

- Archives of Acoustics - Year 2016

The primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...

Full text available to download

Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition

Publication

J. S. Garcia Salinas
A. A. Torres-García
C. A. Reyes-García
L. Villaseñor-Pineda

- Biomedical Signal Processing and Control - Year 2023

Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

Full text to download in external service

Combining visual and acoustic modalities to ease speech recognition by hearing impaired people

Publication

- Year 2005

The article presents a system intended to facilitate correct-pronunciation training for people with severe hearing impairments. Both acoustic and visual parameters are used in the speech analysis. Active Shape Models are used to determine the visual parameters from the shape and movement of the lips. The acoustic parameters are based on mel-cepstral coefficients. To classify the uttered speech sounds, ...

Improving signal quality of a speech codec using hybrid perceptual-parametric algorithm

Publication

- International Journal of Intelligent Information and Database Systems - Year 2008

The article presents a hybrid parametric-perceptual speech codec architecture. It is based on a CELP codec supported by a perceptual codec. The aim of the proposed method is to improve the quality of speech signal coding. Two architectures were studied; in one of them, the voiced parts of the CELP codec's residual signal are coded perceptually. The second of the proposed codecs performs...

Full text to download in external service

EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY

Publication

- Year 2014

The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...

Multimodal human-computer interfaces based on advanced video and audio analysis

Publication

- Year 2013

Multimodal interfaces development history is reviewed briefly in the introduction. Examples of applications of multimodal interfaces to education software and for the disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and the audio interface for speech stretching for hearing impaired and stuttering people. The Smart...

Full text to download in external service

Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

Publication

D. Piotrowski
R. Korzeniowski
A. Falai
S. Cygert
K. Pokora
G. Tinchev
Z. Zhang
K. Yanagisawa

- Year 2023

In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text to download in external service

Stochastic Integration and Long Term Predictor Estimation under Noisy Conditions for Speech Enhancement

Publication

M. Kuropatwiński
W. Kleijn

- Year 2005

Full text to download in external service

Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.

Publication

- Year 2018

In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, the audio signal is analyzed, indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video, i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

Full text to download in external service

PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

Publication

- Year 2015

The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów

Publication

- Year 2017

The aim of this paper is to analyze the usability of descriptors from music information retrieval for phoneme analysis. The case study consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

Performance Analysis of the OpenCL Environment on Mobile Platforms

Publication

- Year 2022

Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text to download in external service

Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

Publication

P. Falkowski-Gilski
G. Debita

- Archives of Acoustics - Year 2023

In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Full text available to download

Examining Feature Vector for Phoneme Recognition

Publication

G. Korvel
B. Kostek

- Year 2018

The aim of this paper is to analyze the usability of descriptors from music information retrieval for phoneme analysis. The case study consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

Analysis of a caustic formed by a spherical reflector: Impact of a caustic on architectural acoustics

Publication

A. Kulowski

- APPLIED ACOUSTICS - Year 2020

Focusing sound in rooms intended for listening to music or speech is an acoustic defect. Design recommendations provide remedial steps to effectively prevent this. However, there is a category of objects of high historical or architectural value in which the sound focus correction is limited or even abandoned. This also applies to indoor or outdoor concert shells, installations for teaching and acoustic presentations, etc. The...

Full text available to download

Akustyczny obraz słowa na tle mowy etnicznej [The acoustic image of ethnic speech words]

Publication

K. Wojan

- Year 2002

WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE [Using neural networks for the synthesis of speech expressing emotions]

Publication

- Year 2018

This article presents an analysis of speech-based emotion recognition solutions and the possibility of using them in emotional speech synthesis with neural networks. Current solutions for emotion recognition in speech and methods of speech synthesis using neural networks are presented. A significant increase in interest in and use of deep learning is currently observed in applications related to...

Creating new voices using normalizing flows

Publication

P. Biliński
T. Merritt
A. Ezzerg
K. Pokora
S. Cygert
K. Yanagisawa
R. Barra-Chicote
D. Korzekwa

- Year 2022

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available to download

Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

Publication

K. Kąkol

- Year 2023

The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

Full text available to download

Study Analysis of Transmission Efficiency in DAB+ Broadcasting System

Publication

P. Falkowski-Gilski

- Year 2018

DAB+ is a very innovative and universal multimedia broadcasting system. Thanks to its updated multimedia technologies and metadata options, digital radio keeps pace with changing consumer expectations and the impact of media convergence. Broadcasting analog and digital radio services does vary, concerning devices on both transmitting and receiving side, as well as content processing mechanisms. However, the biggest difference is...

Full text available to download

The development of speech in early childhood in children from twin pregnancies with twin-twin transfusion syndrome (TTTS)

Publication

M. Bidzan
Ł. Bieleninik
M. Lipowska

- Polish Psychological Bulletin - Year 2013

Full text to download in external service

Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions

Publication

M. Kuropatwiński
W. Kleijn

- Year 2003

Full text to download in external service

Intelligent multimedia solutions supporting special education needs.

Publication

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Intelligent video and audio applications for learning enhancement

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2011

The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Full text available to download

Extracting concepts from the software requirements specification using natural language processing

Publication

- Year 2018

Extracting concepts from the software requirements is one of the first steps on the way to automating the software development process. This task is difficult due to the ambiguity of the natural language used to express the requirements specification. The methods used so far consist mainly of statistical analysis of words and matching expressions with a specific ontology of the domain in which the planned software will be applicable....

Full text to download in external service

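Strategie treningu neuronowego estymatora częstotliwości tonu krtaniowego z użyciem generatora syntetycznych samogłosek [Training strategies for a neural fundamental frequency estimator using a synthetic vowel generator]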

Publication

- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2022

Many telecommunication applications involve processing or analyzing a speech signal, a task in which a fundamental frequency (pitch) estimator is used, often as one of the basic algorithms. The estimator considered in this work is based on a neural classifier that makes decisions on the basis of the instantaneous frequency and power determined in subbands of the analyzed speech signal. In this work we consider...

Full text available to download

DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

Publication

- Rocznik Naukowy Wydzialu Zarzadzania w Ciechanowie - Year 2017

The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

Full text available to download

Selection of Features for Multimodal Vocalic Segments Classification

Publication

- Year 2018

English speech recognition experiments are presented employing both: audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction for the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

Full text to download in external service

A study on signal processing methods applied to hearing aids

Publication

- Year 2016

This paper presents a short survey on current technology available in hearing aids with a focus on digital signal processing techniques used. First, factors influencing the hearing aid effectiveness are introduced. Then, examples of the present DSP methods and strategies are provided. Also, a description of current limitations of hearing aids and future trends of development are shown. Finally, the notion of computational auditory...

System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results

Publication

Z. Wojan
W. Lis
K. Wojan

- Year 2005

The article presents a method of processing and visualizing the speech signal in the form of an easy-to-use and relatively inexpensive device for recording the acoustic signal, digitally processing selected fragments, and visualizing the results of the transformations. A computer with a sound card was used for this purpose. Digital processing and visualization were carried out using MATLAB directly...

System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych [A system of speech signal processing and visualisation for linguistic purposes]

Publication

K. Wojan

- Year 2005

Cross-domain applications of multimodal human-computer interfaces

Publication

A. Czyżewski

- Year 2015

Developed multimodal interfaces for education applications and for disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and audio interface for speech stretching for hearing impaired and stuttering people and intelligent pen allowing for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...

Digital Transformation of Terrestrial Radio: An Analysis of Simulcasted Broadcasts in FM and DAB+ for a Smart and Successful Switchover

Publication

P. Falkowski-Gilski

- Applied Sciences-Basel - Year 2021

The process of digitizing radio is far from over. It is an important interdisciplinary aspect, involving Big Data and AI (Artificial Intelligence) when it comes to classifying and handling content, and an organizational challenge in the Industry 4.0 concept. There exist several methods for delivering audio signals, including terrestrial broadcasting and internet streaming. Among them, the DAB+ (Digital Audio Broadcasting plus)...

Full text available to download

Multimedia industrial and medical applications supported by machine learning

Publication

A. Czyżewski

- Year 2023

This article outlines a keynote paper presented at the Intelligent Decision Technologies conference providing a part of the KES Multi-theme Conference “Smart Digital Futures” organized in Rome on June 14–16, 2023. It briefly discusses projects related to traffic control using developed intelligent traffic signs and diagnosing the health of wind turbine mechanisms and multimodal biometric authentication for banking branches to provide...

Full text to download in external service

Ultrawideband transmission in physical channels: a broadband interference view

Publication

- HYDROACOUSTICS - Year 2014

The superposition of multipath components (MPC) of an emitted wave, formed by reflections from limiting surfaces and obstacles in the propagation area, strongly affects communication signals. In the case of modern wideband systems, the effect should be seen as a broadband counterpart of classical interference which is the cause of fading in narrowband systems. This paper shows that in wideband communications, the time- and frequency-domain...

Full text available to download

A Novel Approach to the Assessment of Cough Incidence

Publication

- Year 2013

In this paper we consider the problem of identification of cough events in patients suffering from chronic respiratory diseases. The information about frequency of cough events is necessary to medical treatment. The proposed approach is based on bidirectional processing of a measured vibration signal - cough events are localized by combining the results of forward-time and backward-time analysis. The signal is at first transformed...

Full text to download in external service

High quality speech coding using combined parametric and perceptual modules. [Kodowanie sygnału mowy z zachowaniem wysokiej jakości przy wykorzystaniu modułu parametrycznego i perceptualnego]

Publication

- Transaction on Engineering, Computation and Technology - Year 2006

The communication presents a new method of hybrid speech signal coding. Parametric and perceptual coding techniques were used to ensure high-quality speech signal coding. Results are presented for two codec architectures. One of them is based on an algorithm that separates voiced components, unvoiced components, and transients. The voiced components are coded with the perceptual method, the unvoiced...

Full text to download in external service

Improving signal quality in speech codec using hybrid perceptual-parametric algorithm. [Poprawa jakości sygnału w kodekach mowy przy użyciu hybrydowego, parametryczno-perceptualnego algorytmu kodowania]

Publication

- Year 2006

A hybrid parametric-perceptual codec architecture is presented. The basic structure of the parametric CELP codec was extended with perceptual coding. The aim of hybridizing the codec is to achieve a significant improvement in the subjective quality of the decoded signal. Two hybrid structures were proposed. The first consists in perceptual coding of the voiced elements of the CELP codec's residual signal. The second method divides the signal...

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

Publication

D. Korzekwa
R. Barra-Chicote
S. Zaporowski
G. Beringer
J. Lorenzo-Trueba
A. Serafinowicz
J. Droppo
T. Drugman
B. Kostek

- Year 2021

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Full text available to download
