Search results for: speech recognition, speech analysis, phoneme, allophone.

Search results for: speech recognition, speech analysis, phoneme, allophone.

results on page:
embed this view on your website

Filters

total: 574

clear all filters disabled

Creating new voices using normalizing flows
Publication
- P. Biliński
- T. Merritt
- A. Ezzerg
- K. Pokora
- S. Cygert
- K. Yanagisawa
- R. Barra-Chicote
- D. Korzekwa
- Year 2022
Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available to download
Strategie treningu neuronowego estymatora częstotliwości tonu krtaniowego z użyciem generatora syntetycznych samogłosek
Publication
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2022
W wielu zastosowaniach telekomunikacyjnych pojawia się problem przetwarzania lub analizy sygnału mowy, w ramach którego, często w obszarze podstawowych algorytmów, stosuje się estymator częstotliwości tonu krtaniowego. Estymator rozpatrywany w tej pracy bazuje na neuronowym klasyfikatorze podejmującym decyzje na podstawie częstotliwości oraz mocy chwilowej wyznaczanych w podpasmach analizowanego sygnału mowy. W pracy rozważamy...

Full text available to download
Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.
Publication
- Year 2018
In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video , i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

Full text to download in external service
Zastosowanie spowalniania wypowiedzi w celu poprawy rozumienia mowy przez dzieci w szkole
Publication
- A. Kupryjanow
- A. Czyżewski
- Year 2009
This paper presents a time-scale modification algorithms that could be used for hearing impairment therapy supported by real-time speech stretching. In this paper the OLA based algorithms and Phase Vocoder were described. In the experimental part usability of those algorithms for real-time speech stretching was discussed
Modeling and Designing Acoustical Conditions of the Interior – Case Study
Publication
- Archives of Acoustics - Year 2016
The primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...

Full text available to download
The Innovative Faculty for Innovative Technologies
Publication
- Year 2013
A leaflet describing Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. Multimedia Systems Department described laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...

Full text to download in external service
Instantaneous complex frequency for pipeline pitch estimation
Publication
- M. [. Kaniewska
- Year 2010
In the paper a pipeline algorithm for estimating the pitch of speech signal is proposed. The algorithm uses instantaneous complex frequencies estimated for four waveforms obtained by filtering the original speech signal through four bandpass complex Hilbert filters. The imaginary parts of ICFs from each channel give four candidates for pitch estimates. The decision regarding the final estimate is made based on the real parts of...
XVIII Międzynarodowe Sympozjum Inżynierii i Reżyserii Dźwięku
Publication
- P. Falkowski-Gilski
- S. Brachmański
- A. Dobrucki
- M. Kin
- Year 2021
The subjective assessment of speech signals takes into account previous experiences and habits of an individual. Since the perception process deteriorates with age, differences should be noticeable among people from dissimilar age groups. In this work, we investigated the difference of speech quality assessment between high school students and university students. The study involved 60 participants, with 30 people in both the adolescents...

Full text to download in external service
International Conference on Image Analysis and Recognition

Conferences
Automatic Emotion Recognition in Children with Autism: A Systematic Literature Review
Publication
- A. Landowska
- A. Karpus
- T. Zawadzka
- B. Robins
- D. Erol Barkana
- H. Kose
- T. Zorcec
- N. Cummins
- SENSORS - Year 2022
The automatic emotion recognition domain brings new methods and technologies that might be used to enhance therapy of children with autism. The paper aims at the exploration of methods and tools used to recognize emotions in children. It presents a literature review study that was performed using a systematic approach and PRISMA methodology for reporting quantitative and qualitative results. Diverse observation channels and modalities...

Full text available to download
Human voice modification using instantaneous complex frequency
Publication
- M. Kaniewska
- Year 2010
The paper presents the possibilities of changing human voice by modifying instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering voice without the necessity of finding fundamental frequency and formants' positions or detecting voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF it uses signal factorization into two factors: one fully...
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication
- P. Rościszewski
- J. Kaliski
- Year 2017
In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modiﬁcation of the training program which minimizes the...

Full text to download in external service
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication
- P. Rościszewski
- Procedia Computer Science - Year 2017
In the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

Full text available to download
Auditory-visual attention stimulator
Publication
- Year 2013
New approach to lateralization irregularities formation was proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach hearing is stimulated using time scale modified speech and sight is stimulated by rendering the text of the currently heard speech. Moreover, displayed text is modified using several techniques i.e. zooming, highlighting etc. In the experimental part of...

Full text to download in external service
INVESTIGATION OF THE LOMBARD EFFECT BASED ON A MACHINE LEARNING APPROACH
Publication
- G. Korvel
- P. Treigys
- K. Kąkol
- B. Kostek
- International Journal of Applied Mathematics and Computer Science - Year 2023
The Lombard effect is an involuntary increase in the speaker’s pitch, intensity, and duration in the presence of noise. It makes it possible to communicate in noisy environments more effectively. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the gender of the person on the detection process is examined. First, acoustic parameters...

Full text available to download
Intelligent multimedia solutions supporting special education needs.
Publication
- A. Czyżewski
- B. Kostek
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011
The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
Intelligent video and audio applications for learning enhancement
Publication
- A. Czyżewski
- B. Kostek
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2011
The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Full text available to download
IEEE International Conference on Document Analysis and Recognition

Conferences
Auditory Brainstem Responses recorded employing Audio ABR device
Open Research Data
open access
- P. Odya
- A. Czyżewski
The dataset consists of ABR measurements employing click, burst and speech stimuli. Parameters of the particular stimuli were as follows:
Voice command recognition using hybrid genetic algorithm
Publication
- M. Wroniszewska
- J. Dziedzic
- TASK Quarterly - Year 2010
Abstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

Full text available to download
DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING
Publication
- N. Rizun
- J. Taranenko
- Rocznik Naukowy Wydzialu Zarzadzania w Ciechanowie - Year 2017
The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

Full text available to download
A study on signal processing methods applied to hearing aids
Publication
- B. Kostek
- K. Kąkol
- Year 2016
This paper presents a short survey on current technology available in hearing aids with a focus on digital signal processing techniques used. First, factors influencing the hearing aid effectiveness are introduced. Then, examples of the present DSP methods and strategies are provided. Also, a description of current limitations of hearing aids and future trends of development are shown. Finally, the notion of computational auditory...
Separability Assessment of Selected Types of Vehicle-Associated Noise
Publication
- Advances in Intelligent Systems and Computing - Year 2016
Music Information Retrieval (MIR) area as well as development of speech and environmental information recognition techniques brought various tools in-tended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

Full text to download in external service
Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter
Publication
- M. Blok
- P. Drózda
- Archives of Acoustics - Year 2014
In this paper a sample rate conversion algorithm which allows for continuously changing resampling ratio has been presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...

Full text available to download
Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology
Publication
- J. Guziński
- IEEE Industrial Electronics Magazine - Year 2015
Report on visit of Prof. Haitham Abu-Rub in Gdansk University of Technology. Speech on the Smart Grid Centre. Visit in the new smart grid laboratory of the GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).

Full text available to download
Comparative Study of Self-Organizing Maps vs. Subjective Evaluation of Quality of Allophone Pronunciation for Nonnative English Speakers
Publication
- Year 2017
The purpose of this study was to apply Self-Organizing Maps to differentiate between the correct and the incorrect allophone pronunciations and to compare the results with subjective evaluation. Recordings of a list of target words, containing selected allophones of English plosive consonants, the velar nasal and the lateral consonant, were made twice. First, the target words were read from the list by 9 non-native speakers and...
A Comparison of STI Measured by Direct and Indirect Methods for Interiors Coupled with Sound Reinforcement Systems
Publication
- Year 2018
This paper presents a comparison of STI (Speech Transmission Index) coefficient measurement results carried out by direct and indirect methods. First, acoustic parameters important in the context of public address and sound reinforcement systems are recalled. A measurement methodology is presented that employs various test signals to determine impulse responses. The process of evaluating sound system performance, signals enabling...

Full text to download in external service
Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set
Publication
- P. Filipowicz
- B. Kostek
- Applied Sciences-Basel - Year 2023
This work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...

Full text available to download
Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model
Publication
- K. Leckey
- R. Neininger
- W. Szpankowski
- Year 2013
Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel- Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...
Analysis-by-synthesis paradigm evolved into a new concept
Publication
- B. Kostek
- Journal of the Acoustical Society of America - Year 2022
This work aims at showing how the well-known analysis-by-synthesis paradigm has recently been evolved into a new concept. However, in contrast to the original idea stating that the created sound should not fail to pass the foolproof synthesis test, the recent development is a consequence of the need to create new data. Deep learning models are greedy algorithms requiring a vast amount of data that, in addition, should be correctly...

Full text to download in external service
Extracting concepts from the software requirements specification using natural language processing
Publication
- J. Kuchta
- P. Padhiyar
- Year 2018
Extracting concepts from the software require¬ments is one of the first step on the way to automating the software development process. This task is difficult due to the ambiguity of the natural language used to express the requirements specification. The methods used so far consist mainly of statistical analysis of words and matching expressions with a specific ontology of the domain in which the planned software will be applicable....

Full text to download in external service
Playback detection using machine learning with spectrogram features approach
Publication
- J. Dembski
- J. Rumiński
- Year 2017
This paper presents 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), HAAR wavelets with AdaBoost classifier and deep convolutional neural networks (CNN) were compared on different data partitions in respect...

Full text available to download
Multimedia industrial and medical applications supported by machine learning
Publication
- A. Czyżewski
- Year 2023
This article outlines a keynote paper presented at the Intelligent DecisionTechnologies conference providing a part of the KES Multi-theme Conference “Smart Digital Futures” organized in Rome on June 14–16, 2023. It briefly discusses projects related to traffic control using developed intelligent traffic signs and diagnosing the health of wind turbine mechanisms and multimodal biometric authentication for banking branches to provide...

Full text to download in external service
Performance Analysis of the OpenCL Environment on Mobile Platforms
Publication
- P. Falkowski-Gilski
- M. Plewka
- Year 2022
Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text to download in external service
Evaluation Criteria for Affect-Annotated Databases
Publication
- Year 2015
In this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...

Full text to download in external service
Cross-domain applications of multimodal human-computer interfaces
Publication
- A. Czyżewski
- Year 2015
Developed multimodal interfaces for education applications and for disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and audio interface for speech stretching for hearing impaired and stuttering people and intelligent pen allowing for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...
Ultrawideband transmission in physical channels: a broadband interference view
Publication
- HYDROACOUSTICS - Year 2014
The superposition of multipath components (MPC) of an emitted wave, formed by reflections from limiting surfaces and obstacles in the propagation area, strongly affects communication signals. In the case of modern wideband systems, the effect should be seen as a broadband counterpart of classical interference which is the cause of fading in narrowband systems. This paper shows that in wideband communications, the time- and frequency-domain...

Full text available to download
Szymon Andrzejewski dr

People

Wydział Nauk Społecznych

Master’s degree at the University of Gdańsk in 2008 Major in political system and self-government. Overgraduate studies at the Gdańsk University of Technology „Management and evaluation of projects financed from EU funds” and at AGH University of Science and Technology Noise protection against noise and vibration. Student of sociology PhD studies at the University of Gdańsk from 2016. The research scope is democracy and institutions...
A Novel Approach to the Assessment of Cough Incidence
Publication
- Year 2013
In this paper we consider the problem of identication of cough events in patients suffering from chronic respiratory diseases. The information about frequency of cough events is necessary to medical treatment. The proposed approach is based on bidirectional processing of a measured vibration signal - cough events are localized by combining the results of forward-time and backward-time analysis. The signal is at rst transformed...

Full text to download in external service
ALOFON corpus
Open Research Data
The ALOFON corpus is one of the multimodal database of word recordings in English, available at http://www.modality-corpus.org/. The ALOFON corpus is oriented towards the recording of the speech equivalence variants. For this purpose, a total of 7 people who are or speak English with native speaker fluency and a variety of Standard Southern British...
Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing
Publication
- M. Niedźwiecki
- M. Ciołek
- IEEE Transactions on Audio Speech and Language Processing - Year 2013
In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

Full text available to download
Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering
Publication
- IEEE Transactions on Audio Speech and Language Processing - Year 2015
This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponential ly weighted least squares algo- rithm. Detection of noise pulses an d model-based interpolation of the irrevocably distorted sampl es is realized using an adaptive, variable-order...

Full text available to download
Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling
Publication
- S. Raczyński
- E. Vincent
- S. Sagayama
- IEEE Transactions on Audio Speech and Language Processing - Year 2013
Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of an- alyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch struc- ture. These models are formulated as linear or log-linear interpo- lations of up to fi ve sub-models, each of which is...

Full text to download in external service
Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation
Publication
- S. Raczyński
- E. Vincent
- IEEE Transactions on Audio Speech and Language Processing - Year 2014
In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor pr ocess priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bi- gram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of -grams with a topic model,...

Full text to download in external service
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
Publication
- D. Koszewski
- T. Görne
- G. Korvel
- B. Kostek
- EURASIP Journal on Audio Speech and Music Processing - Year 2023
The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...

Full text available to download
Secured wired BPL voice transmission system
Publication
- G. Debita
- P. Falkowski-Gilski
- M. Habrych
- B. Miedziński
- J. Wandzio
- P. Jedlikowski
- Scientific Journal of the Military University of Land Forces - Year 2020
Designing a secured voice transmission system is not a trivial task. Wired media, thanks to their reliability and resistance to mechanical damage, seem an ideal solution. The BPL (Broadband over Power Line) cable is resistant to electricity stoppage and partial damage of phase conductors, ensuring continuity of transmission in case of an emergency. It seems an appropriate tool for delivering critical data, mostly clear and understandable...

Full text available to download
A Device for Measuring Auditory Brainstem Responses to Audio
Publication
- Year 2018
Standard ABR devices use clicks and tone bursts to assess subjects’ hearing in an objective way. A new device was developed that extends the functionality of a standard ABR audiometer by collecting and analyzing auditory brainstem responses (ABR). The developed accessory allows for the use of complex sounds (e.g., speech or music excerpts) as stimuli. Therefore, it is possible to find out how efficiently different types of sounds...

Full text available to download
Discovery of Stylistic Patterns in Business Process Textual Descriptions: IT Ticket Case
Publication
- N. Rizun
- A. Revina
- V. Maister
- Year 2019
Growing IT complexity and related problems, which are reflected in IT tickets,create a need for new qualitative approaches. The goal isto automate the extraction of main topics described in tickets in order to provide high quality support for the IT process workers and enablea smooth service delivery to the end user. Present paper proposes a method of knowledge extraction in a form of stylistic patterns in business...

Full text available to download
Adaptacja akustyczna pomieszczenia wykładowego - studium przypadku
Publication
- M. Mańkowska
- Year 2018
W niniejszej pracy przedstawiono analizę rozkładu pola akustycznego sali wykładowej znajdującej się w budynku Wydziału Elektroniki i Telekomunikacji Politechniki Gdańskiej. Badania przeprowadzono metodą pomiarową oraz symulacyjną z wykorzystaniem programu Odeon. Wybór parametrów oceny akustyki wnętrz sugerowany jest wymaganiami stawianymi pomieszczeniom lekcyjnym z zaznaczeniem multimedialnego charakteru wykładów prowadzonych...
Chirp Rate and Instantaneous Frequency Estimation: Application to Recursive Vertical Synchrosqueezing
Publication
- D. Fourer
- F. Auger
- K. Czarnecki
- S. Meignen
- P. Flandrin
- IEEE SIGNAL PROCESSING LETTERS - Year 2017
This letter introduces new chirp rate and instantaneous frequency estimators designed for frequency-modulated signals. These estimators are first investigated from a deterministic point of view, then compared together in terms of statistical efficiency. They are also used to design new recursive versions of the vertically synchrosqueezed short-time Fourier transform, using a previously published method (D. Fourer, F. Auger, and...

Full text available to download

Search

Filters

Catalog

Search results for: speech recognition, speech analysis, phoneme, allophone.

Szymon Andrzejewski dr