Search results for: SPEECH ANALYSIS

Auditory-model based robust feature selection for speech recognition

Publication

C. Koniaris
M. Kuropatwiński
W. Kleijn

- Journal of the Acoustical Society of America - Year 2010

Full text available for download from an external service

Real-time speech streching for supporting hearing impaired schoolchildren

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2010

A study of time scale modification algorithms applied to support hearing impaired schoolchildren is presented. A variety of algorithms is considered, namely: overlap-and-add, two variations of synchronous overlap-and-add, and the phase vocoder. Their effectiveness as well as real-time processing capabilities are examined.

Full text available for download from an external service

Elimination of clicks from archive speech signals using sparse autoregressive modeling

Publication

- Year 2012

This paper presents a new approach to elimination of impulsive disturbances from archive speech signals. The proposed sparse autoregressive (SAR) signal representation is given in a factorized form: the model is a cascade of the so-called formant filter and pitch filter. Such a technique has been widely used in code-excited linear prediction (CELP) systems, as it guarantees model stability. After detection of noise pulses using linear...

Full text available for download from an external service

Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

Publication

M. Zamłyńska
P. Falkowski-Gilski
G. Debita
B. Miedziński

- Year 2021

Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

Full text available for download from an external service

Automated detection of pronunciation errors in non-native English speech employing deep learning

Publication

D. Korzekwa

- Year 2023

Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

Full text available for download in the portal

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
S. Calamaro
B. Kostek

- Year 2021

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

Full text available for download in the portal

The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish

Publication

S. Zaporowski

- Year 2024

The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...

Full text available for download in the portal

Estimation of the short-term predictor parameters of speech under noisy conditions

Publication

M. Kuropatwiński
W. Kleijn

- IEEE Transactions on Audio Speech and Language Processing - Year 2006

Full text available for download from an external service

Towards Computer-Based Automated Screening of Dementia Through Spontaneous Speech

Publication

K. Chlasta
K. Wołk

- Frontiers in Psychology - Year 2021

Full text available for download from an external service

Speech formant frequency and pitch estimation using instantaneous complex frequency

Publication

M. Kaniewska

- Year 2008

The paper describes an algorithm for estimating the fundamental frequency as well as the center frequencies and bandwidths of speech formants using the instantaneous complex frequency. The article also presents the results produced by the algorithm for Polish vowels.

Time-scale modification of speech signals for supporting hearing impaired schoolchildren

Publication

- Year 2009

A study of time scale modification algorithms applied to supporting hearing impaired schoolchildren is presented. A variety of algorithms is considered, namely: overlap and add, two variations of synchronized overlap and add, and the phase vocoder. Their effectiveness as well as real-time processing capabilities are examined.

Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

Publication

- Journal of the Acoustical Society of America - Year 2018

A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

Full text available for download from an external service

Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

Publication

P. Falkowski-Gilski
G. Debita
M. Habrych
B. Miedziński
P. Jedlikowski
B. Polnik
J. Wandzio
X. Wang

- Year 2020

The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Full text available for download from an external service

Modeling and Designing Acoustical Conditions of the Interior – Case Study

Publication

- Archives of Acoustics - Year 2016

The primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...

Full text available for download in the portal

Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students

Publication

P. Falkowski-Gilski

- Year 2021

The user perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in case of bilingual...

Full text available for download from an external service

Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition

Publication

J. S. Garcia Salinas
A. A. Torres-García
C. A. Reyes-García
L. Villaseñor-Pineda

- Biomedical Signal Processing and Control - Year 2023

Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

Full text available for download in the portal

Automated speech-based screening of depression using deep convolutional neural networks

Publication

K. Chlasta
K. Wołk
I. Krejtz

- Procedia Computer Science - Year 2019

Full text available for download from an external service

Improving signal quality of a speech codec using hybrid perceptual-parametric algorithm

Publication

- International Journal of Intelligent Information and Database Systems - Year 2008

The article presents a hybrid parametric-perceptual architecture of a speech codec. It is based on a CELP codec supported by a perceptual codec. The aim of the proposed method is to improve the quality of speech signal coding. Two architectures were studied; in one of them, the voiced parts of the CELP codec's residual signal are coded perceptually. The second of the proposed codecs performs...

Full text available for download from an external service

Combining visual and acoustic modalities to ease speech recognition by hearing impaired people

Publication

- Year 2005

The article presents a system intended to facilitate training in correct pronunciation for people with severe hearing impairments. Acoustic and visual parameters were used in the speech analysis. Active Shape Models were used to determine the visual parameters from the shape and movement of the lips. The acoustic parameters are based on mel-cepstral coefficients. For the classification of the uttered speech sounds...

EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY

Publication

- Year 2014

The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...

Multimodal human-computer interfaces based on advanced video and audio analysis

Publication

- Year 2013

Multimodal interfaces development history is reviewed briefly in the introduction. Examples of applications of multimodal interfaces to education software and for the disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and the audio interface for speech stretching for hearing impaired and stuttering people. The Smart...

Full text available for download from an external service

Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

Publication

D. Piotrowski
R. Korzeniowski
A. Falai
S. Cygert
K. Pokora
G. Tinchev
Z. Zhang
K. Yanagisawa

- Year 2023

In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text available for download from an external service

Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.

Publication

- Year 2018

In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video, i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

Full text available for download from an external service

Stochastic Integration and Long Term Predictor Estimation under Noisy Conditions for Speech Enhancement

Publication

M. Kuropatwiński
W. Kleijn

- Year 2005

Full text available for download from an external service

Performance Analysis of the OpenCL Environment on Mobile Platforms

Publication

- Year 2022

Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text available for download from an external service

PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

Publication

- Year 2015

The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów

Publication

- Year 2017

The aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

Publication

P. Falkowski-Gilski
G. Debita

- Archives of Acoustics - Year 2023

In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Full text available for download in the portal

Examining Feature Vector for Phoneme Recognition

Publication

G. Korvel
B. Kostek

- Year 2018

The aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

Analysis of a caustic formed by a spherical reflector: Impact of a caustic on architectural acoustics

Publication

A. Kulowski

- APPLIED ACOUSTICS - Year 2020

Focusing sound in rooms intended for listening to music or speech is an acoustic defect. Design recommendations provide remedial steps to effectively prevent this. However, there is a category of objects of high historical or architectural value in which the sound focus correction is limited or even abandoned. This also applies to indoor or outdoor concert shells, installations for teaching and acoustic presentations, etc. The...

Full text available for download in the portal

Akustyczny obraz słowa na tle mowy etnicznej [The acoustic image of ethnic speech words]

Publication

K. Wojan

- Year 2002

WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE [Using neural networks for the synthesis of speech expressing emotions]

Publication

- Year 2018

This article presents an analysis of speech-based emotion recognition solutions and the possibility of using them in emotional speech synthesis by means of neural networks. Current solutions for emotion recognition in speech and methods of speech synthesis using neural networks are presented. A significant increase in interest in and use of deep learning is currently observed in applications related to...

Creating new voices using normalizing flows

Publication

P. Biliński
T. Merritt
A. Ezzerg
K. Pokora
S. Cygert
K. Yanagisawa
R. Barra-Chicote
D. Korzekwa

- Year 2022

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available for download in the portal

Study Analysis of Transmission Efficiency in DAB+ Broadcasting System

Publication

P. Falkowski-Gilski

- Year 2018

DAB+ is a very innovative and universal multimedia broadcasting system. Thanks to its updated multimedia technologies and metadata options, digital radio keeps pace with changing consumer expectations and the impact of media convergence. Broadcasting analog and digital radio services does vary, concerning devices on both transmitting and receiving side, as well as content processing mechanisms. However, the biggest difference is...

Full text available for download in the portal

Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

Publication

K. Kąkol

- Year 2023

The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

Full text available for download in the portal

The development of speech in early childhood in children from twin pregnancies with twin-twin transfusion syndrome (TTTS)

Publication

M. Bidzan
Ł. Bieleninik
M. Lipowska

- Polish Psychological Bulletin - Year 2013

Full text available for download from an external service

Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions

Publication

M. Kuropatwiński
W. Kleijn

- Year 2003

Full text available for download from an external service

Intelligent multimedia solutions supporting special education needs.

Publication

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Intelligent video and audio applications for learning enhancement

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2011

The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Full text available for download in the portal

Separation of Simultaneous Speakers with Acoustic Vector Sensor

Publication

- SENSORS - Year 2025

This paper presents a method of sound source separation in live audio signals, based on sound intensity analysis. Sound pressure signals recorded with an acoustic vector sensor are analyzed, and the spectral distribution of sound intensity in two dimensions is calculated. Spectral components of the analyzed signal are selected based on the calculated source direction, which leads to a spatial filtration of the sound. The experiments...

Full text available for download from an external service

Extracting concepts from the software requirements specification using natural language processing

Publication

- Year 2018

Extracting concepts from the software requirements is one of the first steps on the way to automating the software development process. This task is difficult due to the ambiguity of the natural language used to express the requirements specification. The methods used so far consist mainly of statistical analysis of words and matching expressions with a specific ontology of the domain in which the planned software will be applicable....

Full text available for download from an external service

Strategie treningu neuronowego estymatora częstotliwości tonu krtaniowego z użyciem generatora syntetycznych samogłosek [Training strategies for a neural pitch estimator using a synthetic vowel generator]

Publication

- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2022

Many telecommunication applications involve processing or analyzing a speech signal, a task in which a pitch estimator is used, often as one of the basic algorithms. The estimator considered in this work is based on a neural classifier that makes decisions based on the instantaneous frequency and power determined in subbands of the analyzed speech signal. In this work we consider...

Full text available for download in the portal

DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

Publication

- Rocznik Naukowy Wydzialu Zarzadzania w Ciechanowie - Year 2017

The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

Full text available for download in the portal

Selection of Features for Multimodal Vocalic Segments Classification

Publication

- Year 2018

English speech recognition experiments are presented employing both: audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction for the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

Full text available for download from an external service

A study on signal processing methods applied to hearing aids

Publication

- Year 2016

This paper presents a short survey on current technology available in hearing aids with a focus on digital signal processing techniques used. First, factors influencing the hearing aid effectiveness are introduced. Then, examples of the present DSP methods and strategies are provided. Also, a description of current limitations of hearing aids and future trends of development are shown. Finally, the notion of computational auditory...

Automated Text Annotation Using Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Publication

S. Saifullah
R. Dreżewski
F. Dwiyanto
A. Aribowo
Y. Fauziah
N. Cahyana

- Year 2023

Full text available for download from an external service

Automated Text Annotation Using a Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Publication

S. Saifullah
R. Dreżewski
F. Dwiyanto
A. Aribowo
Y. Fauziah
N. Cahyana

- Applied Sciences-Basel - Year 2024

Full text available for download from an external service

System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results

Publication

Z. Wojan
W. Lis
K. Wojan

- Year 2005

The article presents a method of processing and visualizing the speech signal in the form of an easy-to-use and relatively inexpensive device for recording an acoustic signal, digitally processing selected fragments, and visualizing the results of the transformations. A computer with a sound card was used for this purpose. Digital processing and visualization were carried out using MATLAB directly...

Semi-supervised Text Annotation for Hate Speech Detection using K-Nearest Neighbors and Term Frequency-Inverse Document Frequency

Publication

N. Cahyana
S. Saifullah
Y. Fauziah
A. Aribowo
R. Dreżewski

- International Journal of Advanced Computer Science and Applications - Year 2022

Full text available for download from an external service
