Search results for: SPEECH RECOGNITION, SPEECH ANALYSIS, PHONEME, ALLOPHONE.

Total: 569

Stochastic Integration and Long Term Predictor Estimation under Noisy Conditions for Speech Enhancement
Publication
- M. Kuropatwinski
- W. Kleijn
- M. Kuropatwiński
- Year 2005
Full text available for download from an external service
POPRAWA OBIEKTYWNYCH WSKAŹNIKÓW JAKOŚCI MOWY W WARUNKACH HAŁASU [Improving objective speech quality indicators in noisy conditions]
Publication
- K. Kąkol
- B. Kostek
- Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej - Year 2018
The aim of the work is to modify the speech signal so as to improve objective speech quality indicators after the useful signal has been mixed with noise or with an interfering signal. The modifications are based on features of Lombard speech, in particular on the raising of the fundamental frequency F0. The recording session comprised sets of words and sentences in Polish, recorded in quiet conditions as well as in...

Full text available for download in the portal
Akustyczny obraz słowa na tle mowy etnicznej [The acoustic image of ethnic speech words]
Publication
- K. Wojan
- Year 2002
Canadian Journal of Speech-Language Pathology and Audiology

Journals

ISSN: 1913-2018
International Journal on Document Analysis and Recognition

Journals

ISSN: 1433-2833, eISSN: 1433-2825
The development of speech in early childhood in children from twin pregnancies with twin-twin transfusion syndrome (TTTS)
Publication
- M. Bidzan
- Ł. Bieleninik
- M. Lipowska
- Polish Psychological Bulletin - Year 2013
Full text available for download from an external service
Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions
Publication
- M. Kuropatwinski
- W. Kleijn
- M. Kuropatwiński
- Year 2003
Full text available for download from an external service
Jan Daciuk dr hab. inż.

People

Department of Intelligent Interactive Systems

Jan Daciuk received his master's degree from the Faculty of Electronics of Gdańsk University of Technology in 1986 and his doctorate from the Faculty of Electronics, Telecommunications and Informatics of PG in 1999. He has worked at the Faculty since 1988. His research interests include applications of finite-state automata in natural language processing and speech processing. He has spent more than four years at European universities and research institutes, such as...
Introduction to the special issue on machine learning in acoustics
Publication
- Z. Michalopoulou
- P. Gerstoft
- B. Kostek
- M. A. Roch
- Journal of the Acoustical Society of America - Year 2021
When we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...

Full text available for download in the portal
SPEECH COMMUNICATION

Journals

ISSN: 0167-6393, eISSN: 1872-7182
Advances in Speech-Language Pathology (correct title: IJSLP)

Journals

ISSN: 1441-7049
IEEE-ACM Transactions on Audio Speech and Language Processing

Journals

ISSN: 2329-9290
International Journal of Speech, Language and the Law: Forensic Linguistics

Journals

ISSN: 1748-8885, eISSN: 1748-8893
IEEE Automatic Speech Recognition and Understanding Workshop

Conferences
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results
Publication
- Z. Wojan
- W. Lis
- K. Wojan
- Year 2005
The article presents a way of processing and visualising the speech signal in the form of an easy-to-use and relatively inexpensive device for recording an acoustic signal, digitally processing selected fragments, and visualising the results of the transformations. A computer with a sound card was used for this purpose. Digital processing and visualisation were carried out using MATLAB directly...
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych [A system of speech signal processing and visualisation for linguistic purposes]
Publication
- K. Wojan
- Year 2005
Multidimensional Scaling Analysis Applied to Music Mood Recognition
Publication
- B. Kostek
- M. Piotrowska
- Year 2013
The paper presents two experiments aimed at categorizing mood associated with music. Two parts of a listening test were designed and carried out with a group of students, most of whom were users of online social music services. The initial experiment was designed to evaluate the extent to which a given label describes the mood of the particular music excerpt. The second subjective test was conducted to collect the similarity data...
ISCA Tutorial and Research Workshop Automatic Speech Recognition

Conferences
Emotion Recognition - the need for a complete analysis of the phenomenon of expression formation
Publication
- K. Bobkowska
- M. Przyborski
- D. Skorupka
- E3S Web of Conferences - Year 2018
This article shows how complex emotions are. This has been proven by the analysis of the changes that occur on the face. The authors present the problem of image analysis for the purpose of identifying emotions. In addition, they point out the importance of recording the phenomenon of the development of emotions on the human face with the use of high-speed cameras, which allows the detection of micro expression. The work that was...

Full text available for download in the portal
Ontological Modeling for Contextual Data Describing Signals Obtained from Electrodermal Activity for Emotion Recognition and Analysis
Publication
- IEEE Access - Year 2023
Most of the research in the field of emotion recognition is based on datasets that contain data obtained during affective computing experiments. However, each dataset is described by different metadata, stored in various structures and formats. This research can be counted among those whose aim is to provide a structural and semantic pattern for affective computing datasets, which is an important step to solve the problem of data...

Full text available for download in the portal
Recognition of two-phase flow patterns with the use of dynamic image analysis
Publication
- R. Ulbrich
- M. Krótkiewicz
- N. Szmolke
- S. Anweiler
- M. Masiukiewicz
- D. Zajac
- PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART E-JOURNAL OF PROCESS MECHANICAL ENGINEERING - Year 2002
Full text available for download from an external service
On the possibility of molecular recognition of taste substances studied by Gabor analysis of oscillations
Publication
- M. Szpakowska
- E. Płocharska-Jankowska
- S. Matefi-Tempfli
- O. B. Nagy
- Year 2005
The influence of substances responsible for taste sensations belonging to four taste classes (sweet, salty, bitter and sour) on the oscillation characteristics of a nitromethane oscillator containing a cationic surfactant, benzyldimethyltetradecylammonium chloride, was studied. A new approach based on the Gabor transform was proposed, by means of which power spectra were obtained for the individual systems. It was shown that the two-dimensional form of these spectra can...
EURASIP Journal on Audio Speech and Music Processing

Journals

ISSN: 1687-4714, eISSN: 1687-4722
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
Publication
- D. Korzekwa
- R. Barra-Chicote
- S. Zaporowski
- G. Beringer
- J. Lorenzo-Trueba
- A. Serafinowicz
- J. Droppo
- T. Drugman
- B. Kostek
- Year 2021
This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Full text available for download in the portal
Ontological Model for Contextual Data Defining Time Series for Emotion Recognition and Analysis
Publication
- T. Zawadzka
- W. Waloszek
- A. Karpus
- S. Zapalowska
- M. Wróbel
- IEEE Access - Year 2021
One of the major challenges facing the field of Affective Computing is the reusability of datasets. Existing affective-related datasets are not consistent with each other, they store a variety of information in different forms, different formats, and the terms used to describe them are not unified. This paper proposes a new ontology, ROAD, as a solution to this problem, by formally describing the datasets and unifying the terms...

Full text available for download in the portal
Investigating Feature Spaces for Isolated Word Recognition
Publication
- G. Korvel
- G. Tamulevicius
- P. Treigys
- J. Bernataviciene
- B. Kostek
- Year 2018
Researchers have given much attention to the speech processing task in automatic speech recognition (ASR) over the past decades. The study investigates the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation...
Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor
Publication
- Year 2015
Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...

Full text available for download from an external service
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY [An English speech corpus for multimodal automatic speech recognition]
Publication
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2016
The paper presents an audiovisual speech corpus containing 31 hours of English speech recordings. The corpus is intended for automatic audiovisual speech recognition. It contains video recordings from a high-frame-rate stereo vision camera and audio recorded by a microphone array and a laptop microphone. Thanks to the inclusion of recordings made in noisy conditions, the corpus...
Australasian Speech Science and Technology

Conferences
IEEE Workshop on Speech Coding

Conferences
High quality speech coding using combined parametric and perceptual modules. [Kodowanie sygnału mowy z zachowaniem wysokiej jakości przy wykorzystaniu modułu parametrycznego i perceptualnego]
Publication
- Transaction on Engineering, Computation and Technology - Year 2006
The communication presents a new method of hybrid speech signal coding. Parametric and perceptual coding techniques were used to ensure high-quality coding of the speech signal. Results are presented for two codec architectures. One of them is based on an algorithm that separates voiced components, unvoiced components and transients. Voiced components are coded perceptually, unvoiced...

Full text available for download from an external service
Improving signal quality in speech codec using hybrid perceptual-parametric algorithm. [Poprawa jakości sygnału w kodekach mowy przy użyciu hybrydowego, parametryczno-perceptualnego algorytmu kodowania]
Publication
- Year 2006
A hybrid parametric-perceptual codec architecture is presented. The basic structure of the parametric CELP codec was extended with perceptual coding. The aim of hybridising the codec is to obtain a significant improvement in the subjective quality of the decoded signal. Two hybrid structures are proposed. The first consists in perceptual coding of the voiced elements of the CELP codec's residual signal. The second method divides the signal...
Biometria i przetwarzanie mowy 2023 [Biometrics and speech processing 2023]
Online Courses
- J. Daciuk
The aim of the course is to familiarize students with: methods of establishing and confirming people's identity based on measurable features of the body; features of human speech, in particular Polish; speech recognition methods; speech synthesis methods.
Biometria i przetwarzanie mowy 2024 [Biometrics and speech processing 2024]
Online Courses
- J. Daciuk
The aim of the course is to familiarize students with: methods of establishing and confirming people's identity based on measurable features of the body; features of human speech, in particular Polish; speech recognition methods; speech synthesis methods.
New approach for determining the QoS of MP3-coded voice signals in IP networks
Publication
- T. Uhl
- S. Paulsen
- K. Nowicki
- EURASIP Journal on Audio Speech and Music Processing - Year 2017
Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

Full text available for download in the portal
Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results
Publication
- G. Korvel
- O. Kurasova
- B. Kostek
- Archives of Acoustics - Year 2019
The goal of this research is to find a set of acoustic parameters that are related to differences between Polish and Lithuanian language consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as the vectors of acoustic parameters. Parameters known from the speech domain as well as those from the music information retrieval area are employed. These parameters are...

Full text available for download in the portal
Investigating Feature Spaces for Isolated Word Recognition
Publication
- P. Treigys
- G. Korvel
- G. Tamulevicius
- J. Bernataviciene
- B. Kostek
- Year 2020
The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

Full text available for download from an external service
Comparative analysis of various transformation techniques for voiceless consonants modeling
Publication
- G. Korvel
- B. Kostek
- O. Kurasova
- International Journal of Computers Communications & Control - Year 2018
In this paper, a comparison of various transformation techniques, namely Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh Hadamard Transform (DWHT), is performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted. These features are mean and derivative values of cepstrum coefficients, derived from each transformation....

Full text available for download in the portal
Voiceless Stop Consonant Modelling and Synthesis Framework Based on MISO Dynamic System
Publication
- G. Korvel
- B. Kostek
- Archives of Acoustics - Year 2017
A voiceless stop consonant phoneme modelling and synthesis framework based on a phoneme modelling in low-frequency range and high-frequency range separately is proposed. The phoneme signal is decomposed into the sums of simpler basic components and described as the output of a linear multiple-input and single-output (MISO) system. The impulse response of each channel is a third order quasi-polynomial. Using this framework, the...

Full text available for download in the portal
Badanie rozkładów parametrów sygnału mowy w zastosowaniach do prognozowania prawdopodobieństwa popełnienia błędów w systemach identyfikacji mówców = Examining distribution of speech signal parameters for the prognosis of error probability in speaker verification systems
Publication
- A. Kaczmarek
- Year 2010
The subject of the work is a text-dependent speaker identification system. Many different utterances by several dozen speakers were analysed. The parameterisation method used is based on the results of cepstral analysis of the speech signal. New parameters associated with elementary events in the speaker verification process were defined. On this basis, the probability density function was estimated...
Selection of Features for Multimodal Vocalic Segments Classification
Publication
- S. Zaporowski
- A. Czyżewski
- Year 2018
English speech recognition experiments are presented employing both audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction on the accuracy of vocalic segment classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

Full text available for download from an external service
AUTOMATYCZNA KLASYFIKACJA MOWY PATOLOGICZNEJ [Automatic classification of pathological speech]
Publication
- M. Włoszczyńska
- B. Kostek
- Year 2023
The application presented in this chapter is used for automatic detection of pathological speech based on a database of recordings. First, the assumptions underlying the research are presented, along with the choice of the pathological speech database. The algorithms used and the speech signal features that make it possible to distinguish undisturbed speech from pathological speech are also presented. The trained neural networks were then...

Full text available for download from an external service
IEEE International Conference on Acoustics, Speech and Signal Processing

Conferences
Vocalic Segments Classification Assisted by Mouth Motion Capture
Publication
- Year 2018
Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose motion capture markers were placed on speakers' faces to obtain lip tracking data during speaking. Different parameterization strategies were tested...

Full text available for download from an external service
Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization
Publication
- B. Kostek
- M. Piotrowska
- T. Ciszewski
- A. Czyżewski
- Year 2017
An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...
Discriminating macromolecular interactions based on an impedimetric fingerprint supported by multivariate data analysis for rapid and label-free Escherichia coli recognition in human urine
Publication
- A. Koterwa
- M. Pierpaoli
- B. Nejman-Faleńczyk
- S. Bloch
- A. Zieliński
- W. Adamus-Białek
- Z. Jeleniewska
- B. Trzaskowski
- R. Bogdanowicz
- G. Węgrzyn... and 2 others
- BIOSENSORS & BIOELECTRONICS - Year 2023
This manuscript presents a novel approach to address the challenges of electrode fouling and highly complex electrode nanoarchitecture, which are primary concerns for biosensors operating in real environments. The proposed approach utilizes multiparametric impedance discriminant analysis (MIDA) to obtain a fingerprint of the macromolecular interactions on flat glassy carbon surfaces, achieved through self-organized, drop-cast,...

Full text available for download in the portal
Metoda i algorytmy modyfikacji sygnału do celu wspomagania rozumienia mowy przez osoby z pogorszoną rozdzielczością czasową słuchu [A method and algorithms of signal modification for supporting speech understanding by people with impaired temporal resolution of hearing]
Publication
- A. Kupryjanow
- Year 2013
The subject of the research carried out in the dissertation is real-time methods of Time Scale Modification (TSM) of speech and the assessment of their influence on the understanding of utterances by people with impaired temporal resolution of hearing. Impaired hearing resolution is one of the symptoms associated with Central Auditory Processing Disorder (CAPD). In contrast to...
A comparative study of English viseme recognition methods and algorithms
Publication
- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018
An elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector construction...

Full text available for download in the portal
A comparative study of English viseme recognition methods and algorithm
Publication
- D. Jachimski
- A. Czyżewski
- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018
An elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector...

Full text available for download in the portal
Creating new voices using normalizing flows
Publication
- P. Biliński
- T. Merritt
- A. Ezzerg
- K. Pokora
- S. Cygert
- K. Yanagisawa
- R. Barra-Chicote
- D. Korzekwa
- Year 2022
Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available for download in the portal