Search results for: SPEECH REINFORMENT SYSTEMS
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
Publication: The aim of the work is to analyze the Lombard speech effect in recordings and then modify the speech signal in order to improve objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, and in particular on the effect of increasing the fundamental frequency...
-
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
Publication: Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by human hearing or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...
-
Database of speech and facial expressions recorded with optimized face motion capture settings
Publication: The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied by facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
Publication: Speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create mathematical models that allow for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
Corrupted speech intelligibility improvement using adaptive filter based algorithm
Publication: A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithm employing adaptive filtration is described and additional possibilities of speech intelligibility improvement are discussed. Results of the tests are presented.
-
Jarosław Sadowski dr hab. inż.
People
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
Publication: The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designed for an embedded system device. First, a critical literature review of state-of-the-art deep speech synthesis models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
A non-uniform real-time speech time-scale stretching method
Publication: An algorithm for non-uniform real-time speech stretching is presented. It combines the typical SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...
-
Emotions in Polish speech recordings
Research Data: The data set presents emotions recorded in sound files containing Polish speech. Statements were made by people aged 21-23, young voices of 5 men. Each person said the following words: nie (no), oddaj (give back), podaj (pass), stop (stop), tak (yes), trzymaj (hold), five times representing a specific emotion - one of three - anger (a),...
-
Intelligent processing of stuttered speech.
Publication: The article presents several methods for analysing and automatically counting articulation disfluencies associated with stuttering, based on learning algorithms: artificial neural networks and rough sets.
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
Publication: In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
-
Study on Speech Transmission under Varying QoS Parameters in an OFDM Communication System
Publication: Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, and aid another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...
-
Communication Platform for Evaluation of Transmitted Speech Quality
Publication: A voice communication system that was designed and implemented is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmitting of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessment by listening tests or to objective evaluation employing...
-
Transfer learning in imagined speech EEG-based BCIs
Publication: Brain–Computer Interfaces (BCI) based on electroencephalograms (EEG) are systems whose aim is to provide any person with a communication channel to a computer. Initially they were proposed to aid people with disabilities, but wider applications have since been proposed. These devices make it possible to send messages or to control devices using brain signals. There are different neuro-paradigms which evoke brain signals of interest...
-
Results of tests on speech intelligibility in reverberant conditions
Research Data: The dataset contains the results of tests that aimed to establish a relationship between the rate of speech (RoS) and reverberation conditions characterized by the Speech Transmission Index (STI).
-
Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency
Publication: In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The results obtained are compared with commonly used methods to demonstrate its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.
-
Automated detection of pronunciation errors in non-native English speech employing deep learning
Publication: Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...
-
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
Publication: The quality of voice messages in speech reinforcement and public address systems is often poor. Sound engineering designs for such systems account for sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
-
Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech
Publication: We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...
-
Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich [Hate speech and the liability of Internet service providers in the case law of European courts]
Publication: The article analyses the phenomenon of hate speech on the Internet, set against the problem of the responsibility of Internet Service Providers for such abuses of freedom of expression. The text provides an analysis of the jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...
-
Visual Lip Contour Detection for the Purpose of Speech Recognition
Publication: A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for statistical description of lip shape and texture. A search initialization procedure is proposed and error measure values are...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
Publication: This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
TELECOMMUNICATION SYSTEMS
Journals
-
Human-computer interactions in speech therapy using a blowing interface
Publication: In this paper we present a new human-computer interface for the quantitative measurement of blowing activities. The interface can measure the air flow and air pressure during the blowing activity. The measured values are stored and used to control the state of the graphical objects in the graphical user interface. In speech therapy children will find it easier to play attractive therapeutic games than to perform repetitive and tedious,...
-
Piotr Szczuko dr hab. inż.
People: Dr hab. inż. Piotr Szczuko graduated in 2002 from the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology with the degree of magister inżynier (M.Sc. Eng.). His diploma thesis examined the simultaneous perception of digital images and surround sound. In 2008 he defended his doctoral dissertation, entitled "Zastosowanie reguł rozmytych w komputerowej animacji postaci" (Application of fuzzy rules in computer character animation), for which he received the award of the President of the Council...
-
Speech and Drama
Journals
-
LANGUAGE AND SPEECH
Journals
-
Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets
Publication: Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how large the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...
-
Estimation of the short-term predictor parameters of speech under noisy conditions
Publication
-
Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
Publication: The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that are recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real time, as measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...
-
Estimation of the excitation variances of speech and noise AR-models for enhanced speech coding
Publication
-
Bimodal classification of English allophones employing acoustic speech signal and facial motion capture
Publication: A method for automatic transcription of English speech into the International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...
-
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System
Publication: The broadband over power line – power line communication (BPL-PLC) cable is resistant to power outages and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, especially clear and easily understandable voice messages, e.g. in an underground mine environment. This paper describes a subjective quality evaluation of...
-
Noise profiling for speech enhancement employing machine learning models
Publication: This paper proposes a noise profiling method, based on machine learning (ML), that can be performed in near real-time. To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed, which consists of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
Publication: Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Publication: In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as available bandwidth, bit error rate (BER), delay and latency, as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...
-
Rafał Leszczyna dr hab. inż.
People: Dr hab. inż. Rafał Leszczyna is a university professor at the Faculty of Management and Economics of Gdańsk University of Technology. In July 2020, on the basis of his scientific achievement in the area of cybersecurity management for critical infrastructures in the electric power sector, he obtained the degree of doktor habilitowany in the field of engineering and technical sciences, in the discipline of technical informatics and telecommunications. From 2004 to 2008 he worked...
-
Comparison of Language Models Trained on Written Texts and Speech Transcripts in the Context of Automatic Speech Recognition
Publication
-
Tomasz Zubowicz dr inż.
People: Tomasz Zubowicz received his M.Sc. Eng. degree in Control Engineering from the Faculty of Electrical and Control Engineering at the Gdańsk University of Technology (GUT) in 2008. He received his Ph.D. Eng. (Hons.) in the field of Control Engineering from the same faculty in 2019. In 2012 he became a permanent staff member at the Department of Intelligent Control and Decision Support Systems at GUT and a member of...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
Publication: The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
Publication: In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...
-
Magdalena Gajewska prof. dr hab. inż.
People: Magdalena Gajewska (born 1 June 1968 in Gdańsk) graduated in 1993 from the Faculty of Hydro-engineering of Gdańsk University of Technology. She is an assistant professor (adiunkt) in the Department of Water and Wastewater Technology at the Faculty of Civil and Environmental Engineering of Gdańsk University of Technology. She obtained her doctorate (2001) and habilitation (2013) in the discipline of environmental engineering. During the 2016-2020 term she serves as vice-dean for science. She specializes in technologies related to eco-engineering:...
-
SYSTEMS SCIENCE
Journals
-
A Comparison of STI Measured by Direct and Indirect Methods for Interiors Coupled with Sound Reinforcement Systems
Publication: This paper presents a comparison of STI (Speech Transmission Index) coefficient measurement results carried out by direct and indirect methods. First, acoustic parameters important in the context of public address and sound reinforcement systems are recalled. A measurement methodology is presented that employs various test signals to determine impulse responses. The process of evaluating sound system performance, signals enabling...
-
Novel approaches to wideband speech coding
Publication: Two methods of wideband speech coding are presented. The first method uses an algorithm for time-domain compression and expansion of the speech signal, which allows wideband coding of the speech signal using standardized codecs. This method is intended for use in adaptive speech coding algorithms. The second proposed solution concerns a new method of estimating the spectral envelope of the speech signal...
-
Integration of speech enhancement and coding techniques
Publication
-
A system for multitask noisy speech enhancement.
Publication: The article presents the general characteristics of the developed system for speech recording and reconstruction. It describes the components of the system, which is a software package containing advanced tools for improving speech intelligibility. The tools implemented in the system make it possible to search sound recordings and process them using the implemented plug-ins. The article presents the algorithms used in the system...
-
Multitask Noisy Speech Enhancement System
Publication: The paper describes the Multitask Noisy Speech Enhancement System. It is a specialized software package for recording speech signals and improving their quality and intelligibility using advanced digital signal processing procedures. The package consists of three programs: Rejestrator (Recorder), Przeglądarka (Browser) and Rekonstruktor (Reconstructor). The software can be used in cases where the intelligibility...