Search results for: Query by Sketch

Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

Publication

- Electronics - Year 2022

Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

Full text available to download

Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Publication

- Year 2016

The problem of accurately differentiating between speaker utterances and noise segments in a speech signal is considered. The influence of applying voice activity detection to speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting voice activity in clean and noisy...

The influence of PET mechanical properties on Stretch Blow Molding (SBM) process

Publication

P. Wawrzyniak

- Year 2013

The paper discusses the influence of PET mechanical properties on changes in SBM process parameters. It also addresses the PET orientation and crystallization processes, which strongly influence the mechanical and thermal properties of PET material during the SBM process. All mechanical data of PET material and the changes of SBM process parameters over time were obtained from the collected literature, which...

Time-scale modification of speech signals for supporting hearing impaired schoolchildren

Publication

- Year 2009

A study of time scale modification algorithms applied to supporting hearing impaired schoolchildren is presented. A variety of algorithms are considered, namely: overlap and add, two variations of synchronized overlap and add, and the phase vocoder. Their effectiveness as well as real-time processing capabilities are examined.

A non-uniform real-time speech time-scale stretching method

Publication

- Year 2011

An algorithm for non-uniform real-time speech stretching is presented. It combines the typical SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant and silence detectors. Based on information about the content and the estimated rate of speech (ROS), the algorithm adapts the scaling factor. The ability of real-time speech stretching and the resultant quality of voice were...

Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition

Publication

G. Korvel
P. Treigys
G. Tamulevicus
J. Bernataviciene
B. Kostek

- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2018

convolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was based on the fact that CNNs perform well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...

An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

Publication

G. Korvel
O. Kurasova
B. Kostek

- Year 2019

The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

Full text available to download

Information Retrieval with the Use of Music Clustering by Directions Algorithm

Publication

A. Kaczmarek

- Year 2013

This paper introduces the Music Clustering by Directions (MCBD) algorithm. The algorithm is designed to support users of query by humming systems in formulating queries. Systems of this kind make it possible to retrieve songs and tunes on the basis of a melody recorded by the user. The Music Clustering by Directions algorithm is a kind of interactive query expansion method. On the basis of a query, the algorithm provides suggestions...

Full text to download in external service

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2008

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The obtained results are compared with commonly used methods to demonstrate its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.

Improving signal quality of a speech codec using hybrid perceptual-parametric algorithm

Publication

- International Journal of Intelligent Information and Database Systems - Year 2008

The article presents a hybrid parametric-perceptual architecture of a speech codec. It is based on a CELP codec supported by a perceptual codec. The purpose of the proposed method is to improve the quality of speech signal coding. Two architectures were examined; in one of them, the voiced parts of the residual signal of the CELP codec are coded perceptually. The second of the proposed codecs performs...

Full text to download in external service

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publication

- Year 2007

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The obtained results are compared with commonly used methods to demonstrate its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.

Combining visual and acoustic modalities to ease speech recognition by hearing impaired people

Publication

- Year 2005

The article presents a system intended to facilitate training in correct pronunciation for people with severe hearing impairments. Acoustic and visual parameters were used in the speech analysis. Active Shape Models were used to determine the visual parameters from the shape and movement of the lips. The acoustic parameters are based on mel-cepstral coefficients. For the classification of the uttered speech sounds...

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download

A survey of automatic speech recognition deep models performance for Polish medical terms

Publication

- Year 2023

Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Full text to download in external service

Elimination of clicks from archive speech signals using sparse autoregressive modeling

Publication

- Year 2012

This paper presents a new approach to the elimination of impulsive disturbances from archive speech signals. The proposed sparse autoregressive (SAR) signal representation is given in a factorized form: the model is a cascade of the so-called formant filter and pitch filter. Such a technique has been widely used in code-excited linear prediction (CELP) systems, as it guarantees model stability. After detection of noise pulses using linear...

Full text to download in external service

Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2018

The aim of the work is to analyze the Lombard speech effect in recordings and then modify the speech signal in order to improve objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, and in particular on the effect of increasing the fundamental frequency...

A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

Publication

G. Tamulevicius
G. Korvel
A. B. Yayak
P. Treigys
J. Bernataviciene
B. Kostek

- Electronics - Year 2020

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

Full text available to download

Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

Publication

M. Zamłyńska
P. Falkowski-Gilski
G. Debita
B. Miedziński

- Year 2021

Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, and give aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

Full text to download in external service

Database of speech and facial expressions recorded with optimized face motion capture settings

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2019

The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied by facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...

Full text available to download

An Efficient Noisy Binary Search in Graphs via Median Approximation

Publication

D. Dereniowski
A. Łukasiewicz
P. Uznański

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2021

Consider a generalization of the classical binary search problem in linearly sorted data to the graph-theoretic setting. The goal is to design an adaptive query algorithm, called a strategy, that identifies an initially unknown target vertex in a graph by asking queries. Each query is conducted as follows: the strategy selects a vertex q and receives a reply v: if q is the target, then v = q, and if q is not the target, then v is a...

Full text to download in external service

NLP Questions Answering Using DBpedia and YAGO

Publication

- Vietnam Journal of Computer Science - Year 2020

In this paper, we present results of employing DBpedia and YAGO as lexical databases for answering questions formulated in the natural language. The proposed solution has been evaluated for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for TREC conference). Our method uses dependency trees generated from the user query. The trees are browsed for paths leading from the root of the tree to the question...

Full text available to download

Novel Family of modified qZS buck-boost multilevel inverters with reduced switch count

Publication

O. Husev
R. Strzelecki
F. Blaabjerg
V. Chopyk
D. Vinnikov

- Year 2015

Full text to download in external service

Stochastic Integration and Long Term Predictor Estimation under Noisy Conditions for Speech Enhancement

Publication

M. Kuropatwiński
W. Kleijn

- Year 2005

Full text to download in external service

Automated detection of pronunciation errors in non-native English speech employing deep learning

Publication

D. Korzekwa

- Year 2023

Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

Full text available to download

Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems

Publication

- Pomiary Automatyka Robotyka - Year 2013

The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions, and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...

Full text available to download

Shaking table experimental study on damage mechanism of the disconnecting switch under seismic excitation

Publication

- Key Engineering Materials - Year 2011

The efficiency of the power network is a very important safety issue in regions affected by earthquakes. High voltage disconnecting switches are important elements of the power infrastructure used to separate electric circuits (e.g. during repairs), and they should remain undamaged and fully operational. The aim of the paper is to show the results of the shaking table experimental investigation focused on damage...

Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

Publication

- Journal of the Acoustical Society of America - Year 2018

A method for automatic transcription of English speech into the International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent visual data related to lip reading can enhance the recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

Full text to download in external service

Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

Publication

P. Falkowski-Gilski
G. Debita
M. Habrych
B. Miedziński
P. Jedlikowski
B. Polnik
J. Wandzio
X. Wang

- Year 2020

The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Full text to download in external service

Design of Intelligent Low-Voltage Load Switch for Remote Control System in Smart Grid

Publication

D. Xiong
X. Chen
R. Martinek
H. Wen
D. Luo
J. Smulko

- Iranian Journal of Science and Technology-Transactions of Electrical Engineering - Year 2021

Current low-voltage load switches do not support remote disconnect/connect and real-time monitoring of the disconnect/connect state. To address these issues, this paper presents a low-voltage load switch for a smart remote control system, which uses a one-chip microcontroller board and a DC step motor drive mechanism and also provides feedback on the switch status. Arrears disconnect and full-pay connect control is implemented...

Full text available to download

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
S. Calamaro
B. Kostek

- Year 2021

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

Full text available to download

Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students

Publication

P. Falkowski-Gilski

- Year 2021

The user perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in case of bilingual...

Full text to download in external service

EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY

Publication

- Year 2014

The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...

Akustyczny obraz słowa na tle mowy etnicznej [The acoustic image of ethnic speech words]

Publication

K. Wojan

- Year 2002

Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition

Publication

J. S. Garcia Salinas
A. A. Torres-García
C. A. Reyes-García
L. Villaseñor-Pineda

- Biomedical Signal Processing and Control - Year 2023

Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

Full text to download in external service

CLICK 'n' Sleep: Light-Switch Behavior of Triazole-Containing Tris(bipyridyl)ruthenium Complexes

Publication

M. Braumüller
M. Staniszewska
J. Guthmuller
S. Rau

- EUROPEAN JOURNAL OF INORGANIC CHEMISTRY - Year 2016

A set of Ru(II) complexes incorporating triazole subunits are presented. They show a solvent-dependent light-switch effect. Theoretical calculations revealed the excited states involved in the emission process. The findings are highly important for future design of light-switch sensors and suggest a severe restriction for functional photomolecular devices synthesized by CLICK chemistry.

Full text to download in external service

Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich [Hate speech and the liability of Internet service providers in the case law of European courts]

Publication

K. Kowalik-Bańczyk

- Year 2015

The article analyses the phenomenon of hate speech on the Internet, contrasted with the problem of the responsibility of Internet Service Providers for such abuses of freedom of expression. The text provides an analysis of the jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...

A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

Publication

- SENSORS - Year 2022

Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by human hearing or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

Full text available to download

DBpedia and YAGO Based System for Answering Questions in Natural Language

Publication

- Year 2018

In this paper we propose a method for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referenced further as fibers) are candidates for representation of actual user intention. The...

Full text available to download

The development of speech in early childhood in children from twin pregnancies with twin-twin transfusion syndrome (TTTS)

Publication

M. Bidzan
Ł. Bieleninik
M. Lipowska

- Polish Psychological Bulletin - Year 2013

Full text to download in external service

Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions

Publication

M. Kuropatwiński
W. Kleijn

- Year 2003

Full text to download in external service

Immune escape of B-cell lymphoblastic leukemic cells through a lineage switch to acute myeloid leukemia

Publication

K. Bełdzińska-Gądek
E. Zarzycka
K. Pastuszak
K. Borman
K. Lewandowski
J. M. Zaucha
W. Prejzner

- LEUKEMIA & LYMPHOMA - Year 2024

Acute leukemia (AL) with a lineage switch (LS) is associated with poor prognosis. The predisposing factors of LS are unknown, apart from KMT2A rearrangements that have been reported to be associated with LS. Herein, we present two cases and review all 104 published cases to identify risk factors for LS. Most of the patients (75.5%) experienced a switch from the lymphoid phenotype to the myeloid phenotype. Eighteen patients (17.0%)...

Full text to download in external service

Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

Publication

D. Piotrowski
R. Korzeniowski
A. Falai
S. Cygert
K. Pokora
G. Tinchev
Z. Zhang
K. Yanagisawa

- Year 2023

In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text to download in external service

Estimation of time-frequency complex phase-based speech attributes using narrow band filter banks

Publication

K. Abratkiewicz
K. Czarnecki
D. Fourer
F. Auger

- Year 2017

In this paper, we present nonlinear estimators of nonstationary and multicomponent signal attributes (parameters, properties) which are instantaneous frequency, spectral (or group) delay, and chirp-rate (also known as instantaneous frequency slope). We estimate all of these distributions in the time-frequency domain using both finite and infinite impulse response (FIR and IIR) narrow band filters for speech analysis. Then, we present...

Full text available to download

Cyfrowa analiza mowy etnicznej – ekstrakcja kodu informacji [A digital analysis of ethnic speech – deciphering the information code]

Publication

K. Wojan

- Year 2003

Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

Publication

P. Falkowski-Gilski
G. Debita

- Archives of Acoustics - Year 2023

In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Full text available to download

Pulse-Width Modulation Template for Five-Level Switch-Clamped H-Bridge-Based Cascaded Multilevel Inverter

Publication

C. I. Odeh
D. Kondratenko
A. Lewicki
M. Morawiec
A. Jąderko
J. Baran

- ENERGIES - Year 2021

This article presents a carrier-based pulse-width modulation (PWM) template for a 5-level, H-bridge-based cascaded multilevel inverter (MLI). The developed control concept generates an adequate modulation template for this inverter topology, wherein a sinusoidal modulating waveform is modified to fit in a single triangular carrier signal range. With this modulation approach, classical multiplicity and synchronization of the triangular...

Full text available to download

Novel Family of Single-Phase Modified Impedance-Source Buck-Boost Multilevel Inverters With Reduced Switch Count

Publication

O. Husev
R. Strzelecki
F. Blaabjerg
V. Chopyk
D. Vinnikov

- IEEE TRANSACTIONS ON POWER ELECTRONICS - Year 2016

This paper describes novel single-phase solutions with increased inverter voltage levels derived by means of a nonstandard inverter configuration and impedance source networks. Operation principles based on special modulation techniques are presented. Detailed component design guidelines along with simulation and experimental verification are also provided. Possible application fields are discussed, as well as advantages and disadvantages....

Full text to download in external service

Цифровой анализ сигналов речи как инструмент сравнительного языкознания [A digital analysis of speech signals as an instrument in comparative linguistics]

Publication

K. Wojan

- Year 2003

System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results

Publication

Z. Wojan
W. Lis
K. Wojan

- Year 2005

The article presents a method of processing and visualizing the speech signal in the form of an easy-to-use and relatively inexpensive device for recording an acoustic signal, digitally processing selected fragments, and visualizing the results of the transformations. A computer with a sound card was used for this purpose. Digital processing and visualization were performed using the MATLAB program directly...
