Search results for: RECONSTRUCTION OF SPEECH SIGNALS

MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES

Publication

M. Piotrowska
G. Korvel
B. Kostek
T. Ciszewski
A. Czyżewski

- International Journal of Applied Mathematics and Computer Science - Year 2019

Automatic classification methods, such as artificial neural networks (ANNs), k-nearest neighbors (kNN), and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...

Full text available to download

Experimental and numerical identification of corrosion degradation of ageing structural components

Publication

B. Zima
K. Wołoszyk
Y. Garbatov

- OCEAN ENGINEERING - Year 2022

The study presents experimental and numerical identification of corrosion degradation of thin-walled structural components employing guided wave propagation. The steel structural components are subjected to through-thickness varying corrosion degradation levels. The developed approach using the non-destructive guided wave-propagation quantifies the equivalent average corrosion degradation level by exploring a limited number of...

Full text available to download

Subjective and Objective Comparative Study of DAB+ Broadcast System

Publication

- Archives of Acoustics - Year 2017

Broadcasting services seek to optimize their use of bandwidth in order to maximize users’ quality of experience. They aim to transmit high-quality digital speech and music signals at the lowest bitrate. They intend to offer the best quality under available conditions. Due to bandwidth limitations, audio quality is in conflict with the number of transmitted radio programs. This paper analyzes whether the quality of real-time digital...

Full text available to download

Digital Transformation of Terrestrial Radio: An Analysis of Simulcasted Broadcasts in FM and DAB+ for a Smart and Successful Switchover

Publication

P. Falkowski-Gilski

- Applied Sciences-Basel - Year 2021

The process of digitizing radio is far from over. It is an important interdisciplinary aspect, involving Big Data and AI (Artificial Intelligence) when it comes to classifying and handling content, and an organizational challenge in the Industry 4.0 concept. There exist several methods for delivering audio signals, including terrestrial broadcasting and internet streaming. Among them, the DAB+ (Digital Audio Broadcasting plus)...

Full text available to download

A low complexity double-talk detector based on the signal envelope

Publication

- SIGNAL PROCESSING - Year 2008

A new algorithm for double-talk detection, intended for use in the acoustic echo canceller for voice communication applications, is proposed. The communication system developed by the authors required the use of a double-talk detection algorithm with low complexity and good accuracy. The authors propose an approach to double-talk detection based on the signal envelopes. For each of three signals: the far-end speech, the microphone...

Full text available to download

Separability Assessment of Selected Types of Vehicle-Associated Noise

Publication

- Advances in Intelligent Systems and Computing - Year 2016

The Music Information Retrieval (MIR) area, as well as the development of speech and environmental information recognition techniques, has brought various tools intended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

Full text to download in external service

Automatic Emotion Recognition in Children with Autism: A Systematic Literature Review

Publication

A. Landowska
A. Karpus
T. Zawadzka
B. Robins
D. Erol Barkana
H. Kose
T. Zorcec
N. Cummins

- SENSORS - Year 2022

The automatic emotion recognition domain brings new methods and technologies that might be used to enhance therapy of children with autism. The paper aims at the exploration of methods and tools used to recognize emotions in children. It presents a literature review study that was performed using a systematic approach and PRISMA methodology for reporting quantitative and qualitative results. Diverse observation channels and modalities...

Full text available to download

Theoretical and experimental analysis of guided wave propagation in plate-like structures with sinusoidal thickness variations

Publication

B. Zima
J. Moll

- Archives of Civil and Mechanical Engineering - Year 2023

Guided waves have attracted significant attention for non-destructive testing (NDT) and structural health monitoring (SHM) due to their ability to travel relatively long distances without significant energy loss combined with their sensitivity to even small defects. Therefore, they are commonly used in damage detection and localization applications. The main idea of incorporating guided waves in NDT and SHM is based on processing...

Full text available to download

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download

BPL-PLC Voice Communication System for the Oil and Mining Industry

Publication

G. Debita
P. Falkowski-Gilski
M. Habrych
G. Wiśniewski
B. Miedziński
P. Jedlikowski
A. Waniewska
J. Wandzio
B. Polnik

- ENERGIES - Year 2020

Application of high-efficiency voice communication systems based on broadband over power line-power line communication (BPL-PLC) technology in medium voltage networks, including hazardous areas (like the oil and mining industry), as a redundant means of wired communication (apart from traditional fiber optics and electrical wires) can be beneficial. Due to the possibility of utilizing existing electrical infrastructure, it can...

Full text available to download

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publication

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Year 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available to download

Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

Publication

A. Czyżewski

- Journal of the Acoustical Society of America - Year 2023

Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition (ASR) models. Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

Full text available to download

Speech Intelligibility Measurements in Auditorium

Publication

K. Leo

- ACTA PHYSICA POLONICA A - Year 2010

Speech intelligibility was measured in the Auditorium Novum at the Technical University of Gdansk (seating capacity 408, volume 3300 m³). Articulation tests were conducted; STI and Early Decay Time (EDT) coefficients were measured. The negative contribution of noise to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. A correlation was found between...

Full text available to download

Language Models in Speech Recognition

Publication

J. Daciuk

- Year 2022

This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

Full text to download in external service

Transient detection for speech coding applications

Publication

- International Journal of Computer Science and Network Security - Year 2006

Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

Full text to download in external service

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Real-time speech-rate modification experiments

Publication

- Year 2010

An algorithm for real-time speech time-scale modification (stretching) is proposed. It combines a typical synchronous overlap-and-add based time-scale modification algorithm with signal redundancy detection algorithms that allow parts of the speech signal to be removed and replaced with stretched speech signal fragments. The effectiveness of the signal processing algorithms is examined experimentally together...

Full text to download in external service

Weighted 2-sections and hypergraph reconstruction

Publication

- THEORETICAL COMPUTER SCIENCE - Year 2022

In the paper we introduce the notion of weighted 2-sections of hypergraphs with integer weights and study the following hypergraph reconstruction problems: (1) Given a weighted graph G, is there a hypergraph H such that G is its weighted 2-section? (2) Given a weighted 2-section G, find a hypergraph H such that G is its weighted 2-section. We show that (1) is NP-hard even if G is a complete graph or integer weights w do not exceed...

Full text to download in external service

Speech Analytics Based on Machine Learning

Publication

- Year 2019

In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

Full text to download in external service

Speech synthesis controlled by eye gazing

Publication

- Year 2010

A method of communication based on eye-gaze control is presented. Investigations of using gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons with so-called locked-in syndrome who cannot...

Detecting Lombard Speech Using Deep Learning Approach

Publication

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Year 2023

Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

Full text available to download

4D Reconstruction and Visualisation of Krakow Fortress

Publication

E. G. Głowienka
K. Michałowska
P. Opaliński
B. Hejmanowska
S. Mikrut
P. Kramarczyk
A. Struś

- Year 2017

The specific aim of the European project named "Cultural Heritage Through Time" (CHT2), reported in this paper, is to fully integrate the fourth dimension (4D) into Cultural Heritage studies for analysing structures and landscapes over time. Krakow, the Fortress City (Poland), is one of the four case studies of CHT2, which are used for time-varying reconstruction, analysis, visualization, and preservation. The goal of...

Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

Publication

- SENSORS - Year 2021

The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

Full text available to download

Time-domain prosodic modifications for text-to-speech synthesizer

Publication

- Year 2010

An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

Using Physiological Signals for Emotion Recognition

Publication

W. Szwoch

- Year 2013

Recognizing users’ emotions is a promising area of research in the field of human-computer interaction. It is possible to recognize emotions using facial expressions, audio signals, body poses, gestures, etc., but physiological signals are very useful in this field because they are spontaneous and not controllable. In this paper, the problem of using physiological signals for emotion recognition is presented. The kinds of physiological...

Full text to download in external service

A Method of Real-Time Non-uniform Speech Stretching

Publication

- Year 2012

The developed method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by the adjustment of time scaling factor values in accordance with the signal content. Depending on the speech unit (vowels/consonants), instantaneous rate of speech (ROS), and speech signal presence, values of the scaling factor...

Full text to download in external service

Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

Publication

- Year 2015

The problem of a distant-talking speaker controlling a machine without the need for handheld or body-worn equipment is considered. A laboratory setup is introduced for examining the performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

Full text to download in external service

Multibeam data processing for 3D object shape reconstruction

Publication

- HYDROACOUSTICS - Year 2017

The technology of hydroacoustic scanning offers an efficient and widely-used source of geospatial information regarding underwater environments, providing measurement data which usually have the structure of irregular groups of points known as point clouds. Since this data model has known disadvantages, a different form of representation based on representing surfaces with simple geometric structures, such as edges and facets,...

Full text available to download

Reconstruction of 3D image of corona discharge streamer

Publication

M. Kocik
M. Tański
J. Mizeraczyk
R. Ichiki
S. Kanazawa
J. Dembski

- Year 2010

In this paper, a method of reconstructing the 3D structure of streamers in DC positive corona discharge in a nozzle-to-plate electrode configuration is presented. To reconstruct the 3D image of a corona discharge streamer, we propose a stereographical method, where streamers are observed from several directions simultaneously. The multi-directional observation made it possible to obtain fine positional coordinates of streamers for a...

Full text to download in external service

Comparison of various speech time-scale modification methods

Publication

- Archives of Acoustics - Year 2011

The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

Emotion Recognition Using Physiological Signals

Publication

W. Szwoch

- Year 2015

In this paper the problem of emotion recognition using physiological signals is presented. Firstly the problems with acquisition of physiological signals related to specific human emotions are described. It is not a trivial problem to elicit real emotions and to choose stimuli that always, and for all people, elicit the same emotion. Also different kinds of physiological signals for emotion recognition are considered. A set of...

Full text to download in external service

Speech codec enhancements utilizing time compression and perceptual coding

Publication

- Year 2007

A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, as well as experimental results concerning detection of tonal spectral components. The speech signal, sampled at a higher sampling rate than is suitable for a narrowband coding algorithm, is compressed in order to decrease the number of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

Publication

- Diagnostic Pathology - Year 2012

Methods developed for real-time time-scale modification (TSM) of speech signals are presented. They are based on the non-uniform, speech-rate-dependent SOLA algorithm (Synchronous Overlap and Add). The influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing-impaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

Full text available to download

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publication

- Year 2016

Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced, acquired in various environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is the main focus of the presented work. ASR is widely implemented in mobile device technology, but...

Full text to download in external service

New approach for determining the QoS of MP3-coded voice signals in IP networks

Publication

T. Uhl
S. Paulsen
K. Nowicki

- EURASIP Journal on Audio Speech and Music Processing - Year 2017

Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

Full text available to download

Processing of Hydroacoustic and LiDAR Data for Three-dimensional Surface Reconstruction

Publication

M. Kulawiak

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2018

The technologies of sonar and laser scanning are commonly used for obtaining spatial information about underwater and over ground environments in the form of point clouds. Since this data model has known disadvantages, a more practical solution of visualising such data involves the creation of solid three-dimensional meshes composed of edges and facets. In this paper, several methods for 3D shape reconstruction of data obtained...

An audio-visual corpus for multimodal automatic speech recognition

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download

Laboratory Stand for Wideband Analysis Radiocommunication Signals

Publication

R. Studański
J. Garus
R. Wąs
A. Czapiewska

- Year 2012

A laboratory stand for wideband analysis of radiocommunication signals is presented in the paper. The stand is designed for signal acquisition over a wide spectrum and for research in the field of digital signal processing. Procedures used for simultaneously acquiring many frequency channels in a selected wide band are described. The method of detection of direct sequence spread spectrum signals (DS SS) whose power spectral density is lower than...

Full text available to download

Laboratory stand for wideband analysis radiocommunication signals

Publication

R. Studański
R. Wąs
A. Czapiewska
J. Garus

- Year 2011

A laboratory stand for wideband analysis of radiocommunication signals is presented in the paper. The stand is designed for signal acquisition over a wide spectrum and for research in the field of digital signal processing. Procedures used for simultaneously acquiring many frequency channels in a selected wide band are described. The method of detection of direct sequence spread spectrum signals (DS SS) whose power spectral density is lower than...

Full text to download in external service

Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Year 2022

The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

Full text available to download

3D-Breast System for Determining the Volume of Tissue Needed for Breast Reconstruction

Publication

- Year 2024

3D imaging systems can be used to effectively determine breast volumes for surgical applications. This article presents methods for surface reconstruction and volume determination based on the point cloud created by 3D imaging. Such a system would be used to accurately estimate breast volume in patients classified for breast reconstruction surgery at plastic surgery centers. To develop such a system, various methods of determining...

Full text to download in external service

"3D-Breast System for Determining the Volume of Tissue Needed for Breast Reconstruction"

Publication

- Year 2023

This article presents methods for surface reconstruction and volume determination based on the point cloud created by 3D imaging. Such a system would be used to accurately estimate breast volume in patients classified for breast reconstruction surgery at plastic surgery centers. To develop such a system, various methods of determining volume, based on images from the Intel D435i camera, were tested. In addition, an application...

System Supporting Speech Perception in Special Educational Needs Schoolchildren

Publication

- Year 2012

A system supporting speech perception during classes is presented in the paper. The system combines a portable device, which enables real-time speech stretching, with a workstation designed to perform hearing tests. The system was designed to help children suffering from Central Auditory Processing Disorders.

Full text to download in external service

High quality speech codec employing sines+noise+transients model

Publication

- Archives of Acoustics - Year 2006

A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

Full text available to download

Virtual keyboard controlled by eye gaze employing speech synthesis

Publication

- Year 2010

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Virtual Keyboard controlled by eye gaze employing speech synthesis

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2011

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Full text to download in external service

Employing flowgraphs for forward route reconstruction in video surveillance system

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2014

Pawlak’s flowgraphs were utilized as a base idea and knowledge container for prediction and decision making algorithms applied to experimental video surveillance system. The system is used for tracking people inside buildings in order to obtain information about their appearance and movement. The fields of view of the cameras did not overlap. Therefore, when an object was moving through unsupervised areas, prediction was needed...

Full text available to download

An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

Publication

G. Korvel
O. Kurasova
B. Kostek

- Year 2019

The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

Full text available to download

Detection Range of Intercept Sonar for CWFM Signals

Publication

- Archives of Acoustics - Year 2014

Stealth in military sonar applications may be ensured through the use of low-power signals, making them difficult for the enemy to intercept. In recent years, silent sonar design has been investigated by the Department of Marine Electronic Systems of the Gdansk University of Technology. This article provides an analysis of how an intercept sonar operated by the enemy can detect silent sonar signals. To that end a theoretical intercept...

Full text available to download

Selection of excitation signals for high-impedance spectroscopy

Publication

M. Kowalewski

- Journal of Physics : Conference Series - Year 2013

A method of fast impedance spectroscopy of technical objects with high impedance (|Zx| > 1 GOhm) is evaluated in this paper. An object is excited with a signal generated by a digital-to-analog converter (DAC) located on the U2531A DAQ module. Response signals proportional to current flowing through and voltage across the measured object are sampled by analog-to-digital converters (ADC) in the DAQ module. The object impedance spectrum...

Full text available to download
