Search results for: AUTOMATIC SPEECH RECOGNITION

Towards Emotion Acquisition in IT Usability Evaluation Context

Publication

A. Landowska

- Year 2015

The paper concerns the extension of IT usability studies with automatic analysis of the emotional state of a user. Affect recognition methods and emotion representation models are reviewed and evaluated for applicability in usability testing procedures. Accuracy of emotion recognition, susceptibility to disturbances, independence of human will and interference with usability testing procedures are...

Full text to download in external service

Potential and Use of the Googlenet Ann for the Purposes of Inland Water Ships Classification

Publication

K. Bobkowska
I. Bodus-Olkowska

- Polish Maritime Research - Year 2020

This article presents an analysis of the possibilities of using the pre-degraded GoogLeNet artificial neural network to classify inland vessels. Inland water authorities monitor the intensity of the vessels via CCTV. Such classification seems to be an improvement in their statutory tasks. The automatic classification of inland vessels from video recordings is one of the main objectives of the Automatic Ship Recognition and...

Full text available to download

Classifying type of vehicles on the basis of data extracted from audio signal characteristics

Publication

- Journal of the Acoustical Society of America - Year 2017

The aim of this study is to find and optimize a feature vector, extracted from an audio signal, for automatic recognition of the type of vehicles. First, the influence of weather-based conditions of the road surface on the spectral characteristics of the audio signal recorded from a passing vehicle in close proximity to the road is discussed. Next, parameterization of the recorded audio signal is performed. For that purpose, the MIRtoolbox,...

Full text to download in external service

Playback detection using machine learning with spectrogram features approach

Publication

- Year 2017

This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as the speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with an AdaBoost classifier, and deep convolutional neural networks (CNN) were compared on different data partitions in respect...

Full text available to download

Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results

Publication

G. Korvel
O. Kurasova
B. Kostek

- Archives of Acoustics - Year 2019

The goal of this research is to find a set of acoustic parameters that are related to differences between Polish and Lithuanian language consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as the vectors of acoustic parameters. Parameters known from the speech domain as well as those from the music information retrieval area are employed. These parameters are...

Full text available to download

Analiza stanu nawierzchni i klas pojazdów na podstawie parametrów ekstrahowanych z sygnału fonicznego

Publication

- Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej - Year 2016

The aim of the research is to search for parameters of a feature vector extracted from an audio signal in the context of automatic recognition of road surface condition and vehicle type. First, the influence of weather conditions on the spectral characteristics of the audio signal recorded near passing vehicles is presented. Next, the audio signal was parameterized and a correlation analysis was carried out in order to...

Full text available to download

Detection and localization of selected acoustic events in 3D acoustic field for smart surveillance applications

Publication

- Communications in Computer and Information Science - Year 2011

A method for automatic determination of position of chosen sound events such as speech signals and impulse sounds in 3-dimensional space is presented. The events are localized in the presence of sound reflections employing acoustic vector sensors. Human voice and impulsive sounds are detected using adaptive detectors based on modified peak-valley difference (PVD) parameter and sound pressure level. Localization based on signals...

Full text to download in external service

Detection and localization of selected acoustic events in acoustic field for smart surveillance applications

Publication

- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2014

A method for automatic determination of position of chosen sound events such as speech signals and impulse sounds in 3-dimensional space is presented. The events are localized in the presence of sound reflections employing acoustic vector sensors. Human voice and impulsive sounds are detected using adaptive detectors based on modified peak-valley difference (PVD) parameter and sound pressure level. Localization based on signals...

Full text available to download

Multi-Stage Video Analysis Framework

Publication

- Year 2011

The chapter is organized as follows. Section 2 presents the general structure of the proposed framework and a method of data exchange between system elements. Section 3 describes the low-level analysis modules for detection and tracking of moving objects. In Section 4 we present the object classification module. Sections 5 and 6 describe specialized modules for detection and recognition of faces and license plates, respectively....

Full text to download in external service

Quality of graphical markers for the needs of eyewear devices

Publication

A. Kwaśniewska
J. Rumiński
J. Klimiuk-Myszk
F. Jérôme
M. Benoît
P. Isabelle

- Year 2015

In this paper we propose to cast the problem of identification of people, objects or places into an application for smart glasses that decodes information from graphical markers. We focus on analyzing different factors that can influence the process of automatic recognition of information from a code. The research we present aims at reviewing recognition performance as a function of: size of a marker, distance from/to...

Full text to download in external service

Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing

Publication

- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2020

Developing signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....

Full text available to download

Music Mood Visualization Using Self-Organizing Maps

Publication

- Archives of Acoustics - Year 2015

Due to an increasing amount of music being made available in digital form on the Internet, an automatic organization of music is sought. The paper presents an approach to graphical representation of the mood of songs based on Self-Organizing Maps. Parameters describing the mood of music are proposed and calculated and then analyzed employing correlation with mood dimensions based on Multidimensional Scaling. A map is created in which...

Full text available to download

Robot-Based Intervention for Children With Autism Spectrum Disorder: A Systematic Literature Review

Publication

K. D. Bartl-Pokorny
P. Uluer
D. E. Barkana
A. Baird
H. Kose
T. Zorcec
B. Robins
B. Schuller
A. Landowska
M. Pykała

- IEEE Access - Year 2021

Children with autism spectrum disorder (ASD) have deficits in the socio-communicative domain and frequently face severe difficulties in the recognition and expression of emotions. Existing literature suggested that children with ASD benefit from robot-based interventions. However, studies varied considerably in participant characteristics, applied robots, and trained skills. Here, we reviewed robot-based interventions targeting...

Full text available to download

Identification of acoustic event of selected noise sources in a long-term environmental monitoring systems

Publication

M. Kłaczyński
W. Cioch
T. Wszołek
W. Wszołek
D. Mleczko
P. Pawlik
A. Grzeczka

- Year 2014

Undertaking long-term acoustic measurements on sites located near an airport is related to a problem of large quantities of recorded data, which very often represent information not related to flight operations. In such areas, usually defined as zones of limited use, other sources of noise often exist, such as roads or railway lines, treated in such context as acoustic background. Manual verification of such recorded data...

Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC

Publication

P. Czarnul

- Year 2002

This work presents implementations and tuning experiences with parallel irregular applications developed using the object-oriented framework DAMPVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...

Full text to download in external service

Semi complex navigation with an active optical gesture sensor

Publication

- Year 2016

This paper presents the methods of diversified touchless interactions between a user and a mobile platform utilizing the optical gesture sensor. The sensor uses 8 photodiodes to measure the reflected light in the active mode (using embedded LEDs) or it measures shadows caused by fingers in the passive mode. Several algorithms were implemented: automatic mode switching, adaptive illumination level compensation, resolution improvements...

Full text to download in external service

Influence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classification

Publication

A. Rosner
F. Weninger
B. Schuller
M. Michalak
B. Kostek

- Year 2013

We present a comprehensive evaluation of the influence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classification. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...

Separability Assessment of Selected Types of Vehicle-Associated Noise

Publication

- Advances in Intelligent Systems and Computing - Year 2016

Music Information Retrieval (MIR) area as well as development of speech and environmental information recognition techniques brought various tools intended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

Full text to download in external service

Performance Analysis of the OpenCL Environment on Mobile Platforms

Publication

- Year 2022

Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text to download in external service

The Hough transform in the classification process of inland ships

Publication

K. Bobkowska
N. Wawrzyniak

- Zeszyty Naukowe Akademii Morskiej w Szczecinie - Year 2019

This article presents an analysis of the possibilities of using image processing methods for feature extraction that allows kNN classification based on a ship’s image delivered from an on-water video surveillance system. The subject of the analysis is the Hough transform which enables the detection of straight lines in an image. The recognized straight lines and the information about them serve as features in the classification...

Full text available to download

Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model

Publication

K. Leckey
R. Neininger
W. Szpankowski

- Year 2013

Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...

Detecting Apples in the Wild: Potential for Harvest Quantity Estimation

Publication

A. Janowski
R. Kaźmierczak
C. Kowalczyk
J. Szulwic

- Sustainability - Year 2021

Knowing the exact number of fruits and trees helps farmers to make better decisions in their orchard production management. The current practice of crop estimation often involves manual counting of fruits (before harvesting), which is an extremely time-consuming and costly process. Additionally, this is not practicable for large orchards. Thanks to the changes that have taken place in recent years in the field of image...

Full text available to download

Audio content analysis in the urban area telemonitoring system

Publication

- Year 2010

The article presents the possibilities of extending urban monitoring with automatic sound analysis. Sound parameterization methods that can be applied in such a system are presented, and technical aspects of the implementation are discussed. The next part presents the tree-based decision system used in the system. This system recognizes dangerous sounds (gunshot, broken glass, scream) among the recorded sounds...

Full text to download in external service

Smart Virtual Bass Synthesis Algorithm Based on Music Genre Classification

Publication

- Year 2014

The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The proposed algorithm employed automatic music genre recognition to determine the optimum parameters for the synthesis of additional frequencies. The synthesis was carried out using the non-linear device (NLD) and phase vocoder (PV) methods depending on the music excerpt genre. Classification of musical...

Selection of an artificial pre-training neural network for the classification of inland vessels based on their images

Publication

K. Bobkowska
I. Bodus-Olkowska

- Zeszyty Naukowe Akademii Morskiej w Szczecinie - Year 2021

Artificial neural networks (ANN) are the most commonly used algorithms for image classification problems. An image classifier takes an image or video as input and classifies it into one of the possible categories that it was trained to identify. They are applied in various areas such as security, defense, healthcare, biology, forensics, communication, etc. There is no need to create one’s own ANN because there are several pre-trained...

Full text available to download

Video content analysis in the urban area telemonitoring system

Publication

- Year 2010

The task of constant monitoring of video streams from a large number of cameras and reviewing the recordings in order to find a specified event requires a considerable amount of time and effort from the system operators and it is prone to errors. A solution to this problem is an automatic system for constant analysis of camera images being able to raise an alarm if a predefined event is detected. The chapter presents various aspects...

Full text to download in external service

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publication

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Year 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available to download

Visual Features for Endoscopic Bleeding Detection

Publication

A. Brzeski

- Current Journal of Applied Science and Technology (British Journal of Applied Science & Technology) - Year 2014

Aims: To define a set of high-level visual features of endoscopic bleeding and evaluate their capabilities for potential use in automatic bleeding detection. Study Design: Experimental study. Place and Duration of Study: Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, between March 2014 and May 2014. Methodology: The features have...

Full text available to download

Using Convolutional Neural Networks for Corneal Arcus Detection Towards Familial Hypercholesterolemia Screening

Publication

T. Kocejko
J. Rumiński
M. Mazur-Milecka
M. Romanowska-Kocejko
K. Chlebus
J. Kang-Hyun

- Journal of King Saud University-Computer and Information Sciences - Year 2022

Familial hypercholesterolemia (FH) is a highly undiagnosed disease. Among FH patients, the onset of premature coronary artery disease is 13 times higher than in the general population. Early diagnosis and treatment are essential to prevent cardiovascular diseases and their complications, and to prolong life. One of the clinical criteria of FH is the occurrence of a corneal arcus (CA) among patients, especially those under 45 years...

Full text available to download

Speech Intelligibility Measurements in Auditorium

Publication

K. Leo

- ACTA PHYSICA POLONICA A - Year 2010

Speech intelligibility was measured in Auditorium Novum at the Technical University of Gdansk (seating capacity 408, volume 3300 m³). Articulation tests were conducted; STI and Early Decay Time (EDT) coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. Correlation was found between...

Full text available to download

DIAGNOSIS OF MALIGNANT MELANOMA BY NEURAL NETWORK ENSEMBLE-BASED SYSTEM UTILISING HAND-CRAFTED SKIN LESION FEATURES

Publication

- Metrology and Measurement Systems - Year 2019

Malignant melanomas are the most deadly type of skin cancer but, when detected early, have high chances of successful treatment. In the last twenty years, interest in automated melanoma recognition, detection and classification has increased dynamically, partially because of public datasets appearing with dermatoscopic images of skin lesions. Automated computer-aided skin cancer detection in dermatoscopic images is a very challenging task...

Full text available to download

Transient detection for speech coding applications

Publication

- International Journal of Computer Science and Network Security - Year 2006

Signal quality in speech codecs may be improved by selecting transients from the speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in a speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

Full text to download in external service

Improving the quality of speech in the conditions of noise and interference

Publication

B. Kostek
K. Kąkol

- Journal of the Acoustical Society of America - Year 2018

The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

Full text to download in external service

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Improved method for real-time speech stretching

Publication

- Year 2012

An algorithm for real-time speech stretching is presented. It was designed to modify the input signal depending on its content and on its relation to the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

Full text to download in external service

The Impact of Weather on Traffic Speed in Urban Area

Publication

J. Chmielewski
M. Budzyński

- IOP Conference Series: Materials Science and Engineering - Year 2019

The issue of the impact of weather conditions on trip speed of vehicles has been studied for a long time and it is still the subject of many scientific studies. The impact of atmospheric conditions on the speed with which drivers drive their vehicles seems to be obvious. Good weather conditions, sunny weather with good visibility surely provoke higher speed while rainfall, wind...

Full text available to download

Real-time speech-rate modification experiments

Publication

- Year 2010

An algorithm designed for real-time speech time scale modification (stretching) is proposed, providing a combination of a typical synchronous overlap and add based time scale modification algorithm and signal redundancy detection algorithms that allow removing parts of the speech signal and replacing them with the stretched speech signal fragments. Effectiveness of the signal processing algorithms is examined experimentally together...

Full text to download in external service

Improving Objective Speech Quality Indicators in Noise Conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2020

This work aims at modifying speech signal samples and testing them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

Full text to download in external service

Detecting Lombard Speech Using Deep Learning Approach

Publication

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Year 2023

Robust Lombard speech-in-noise detection is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

Full text available to download

Speech synthesis controlled by eye gazing

Publication

- Year 2010

A method of communication based on eye gaze control is presented. Investigations of using gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons with so-called locked-in syndrome who cannot...

Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

Publication

- SENSORS - Year 2021

The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

Full text available to download

Time-domain prosodic modifications for text-to-speech synthesizer

Publication

- Year 2010

An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

A Method of Real-Time Non-uniform Speech Stretching

Publication

- Year 2012

A developed method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by the adjustment of time scaling factor values in accordance with the signal content. Depending on the speech unit (vowels/consonants), instantaneous rate of speech (ROS), and speech signal presence, values of the scaling factor...

Full text to download in external service

STANY NIEUSTALONE TOWARZYSZĄCE POMIAROWI IMPEDANCJI PĘTLI ZWARCIA W OBWODACH WYJŚCIOWYCH ZASILACZY BEZPRZERWOWYCH UPS

Publication

M. Olesz
J. Katarzyński

- Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej - Year 2018

The paper presents the methodology and results of measurements of transient states in an on-line uninterruptible power supply (UPS). Two time-synchronized power quality measuring instruments were used to record events on the supply side and at the UPS output. The recorders compress the acquired measurement data, which may introduce additional errors in the measured quantities, i.e. voltages and currents. Additional oscilloscope recordings...

Full text available to download

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Publication

D. Korzekwa
R. Barra-Chicote
B. Kostek
T. Drugman
M. Łajszczak

- Year 2019

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

Full text available to download

Comparison of various speech time-scale modification methods

Publication

- Archives of Acoustics - Year 2011

The objective of this work is to investigate the influence of different time-scale modification (TSM) methods on the quality of speech stretched using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

Speech codec enhancements utilizing time compression and perceptual coding

Publication

- Year 2007

A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, as well as experimental results concerning detection of tonal spectral components. The speech signal, sampled with a higher sampling rate than is suitable for the narrowband coding algorithm, is compressed in order to decrease the number of samples. Next, the time-compressed representation of the signal is encoded using a narrowband...
