Wyniki wyszukiwania dla: speech-to-text technology

Comparison of various speech time-scale modificartion methods

Publikacja

- Archives of Acoustics - Rok 2011

The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

Speech codec enhancements utilizing time compression and perceptual coding

Publikacja

- Rok 2007

A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

Tensor Decomposition for Imagined Speech Discrimination in EEG

Publikacja

J. S. Garcia Salinas
L. Villaseñor-Pineda
C. A. Reyes-Garćia
A. A. Torres-García

- LECTURE NOTES IN COMPUTER SCIENCE - Rok 2018

Most of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

Pełny tekst do pobrania w serwisie zewnętrznym

Multimodal English corpus for automatic speech recognition

Publikacja

- Rok 2013

A multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database contains also thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus enables to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

Publikacja

- Diagnostic Pathology - Rok 2012

Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

Pełny tekst do pobrania w portalu

An audio-visual corpus for multimodal automatic speech recognition

Publikacja

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Rok 2017

review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Pełny tekst do pobrania w portalu

Machine Learning and Text Analysis in an Artificial Intelligent System for the Training of Air Traffic Controllers

Publikacja

T. Shmelova
Y. Sikirda
N. Rizun
V. Lazorenko
V. Kharchenko

- Rok 2020

This chapter presents the application of new information technology in education for the training of air traffic controllers (ATCs). Machine learning, multi-criteria decision analysis, and text analysis as the methods of artificial intelligence for ATCs training have been described. The authors have made an analysis of the International Civil Aviation Organization documents for modern principles of ATCs education. The prototype...

Pełny tekst do pobrania w portalu

Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

Publikacja

- Applied Sciences-Basel - Rok 2021

To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

Pełny tekst do pobrania w portalu

Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

Publikacja

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Rok 2022

The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

Pełny tekst do pobrania w portalu

Decoding imagined speech for EEG-based BCI

Publikacja

C. A. Reyes-García
A. A. Torres-García
T. Hernández-del-Toro
J. S. Garcia Salinas
L. Villaseñor-Pineda

- Rok 2024

Brain–computer interfaces (BCIs) are systems that transform the brain's electrical activity into commands to control a device. To create a BCI, it is necessary to establish the relationship between a certain stimulus, internal or external, and the brain activity it provokes. A common approach in BCIs is motor imagery, which involves imagining limb movement. Unfortunately, this approach allows few commands. As an alternative, this...

Pełny tekst do pobrania w serwisie zewnętrznym

Two Stage SVM and kNN Text Documents Classifier

Publikacja

- Rok 2015

The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

The Method of a Two-Level Text-Meaning Similarity Approximation of the Customers’ Opinions

Publikacja

N. Rizun
P. Kapłański
Y. Taranenko

- Studia Ekonomiczne. Zeszyty Naukowe Uniwersytetu Ekonomicznego w Katowicach - Rok 2016

The method of two-level text-meaning similarity approximation, consisting in the implementation of the classification of the stages of text opinions of customers and identifying their rank quality level was developed. Proposed and proved the significance of major hypotheses, put as the basis of the developed methodology, notably about the significance of suggestions about the existence of analogies between mathematical bases of...

Pełny tekst do pobrania w portalu

Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Publikacja

- Rok 2016

The problem of accurate differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

Ranking Speech Features for Their Usage in Singing Emotion Classification

Publikacja

- Rok 2020

This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

Pełny tekst do pobrania w portalu

System Supporting Speech Perception in Special Educational Needs Schoolchildren

Publikacja

- Rok 2012

The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

Pełny tekst do pobrania w serwisie zewnętrznym

Silence/noise detection for speech and music signals

Publikacja

M. Papaj

- Rok 2008

This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.

High quality speech codec employing sines+noise+transients model

Publikacja

- Archives of Acoustics - Rok 2006

A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

Pełny tekst do pobrania w portalu

Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

Publikacja

K. Kąkol
G. Korvel
B. Kostek

- Rok 2018

The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

Thresholding Strategies for Large Scale Multi-Label Text Classifier

Publikacja

- Rok 2013

This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classiﬁcation tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classiﬁer on medium scale text corpora extracted from Wikipedia. Obtained results show that the...

Pełny tekst do pobrania w serwisie zewnętrznym

Text-mining Similarity Approximation Operators for Opinion Mining in BI tools

Publikacja

N. Rizun
P. Kapłański
Y. Taranenko
S. Alessandro

- Rok 2016

The concept of the Text-mining Similarity Approximation Operators for Opinion Mining as extensions to Natural Language Interface Database is defined. The new operators: “keywords of” dimension; subsetting operator “about C is q”; aggregation operator “by similar C” are proposed. These operators are based on the Latent Semantic Analysis and Social Network Analysis

Pełny tekst do pobrania w portalu

An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

Publikacja

G. Korvel
O. Kurasova
B. Kostek

- Rok 2019

The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

Pełny tekst do pobrania w portalu

Corrupted speech intelligibility improvement using adaptive filter based algorithm

Publikacja

- Rok 2010

A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.

Distortion of speech signals in the listening area: its mechanism and measurements

Publikacja

- Rok 2014

The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

Pełny tekst do pobrania w serwisie zewnętrznym

Database of speech and facial expressions recorded with optimized face motion capture settings

Publikacja

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Rok 2019

The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...

Pełny tekst do pobrania w portalu

A non-uniform real-time speech time-scale stretching method

Publikacja

- Rok 2011

An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add ) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

Building Knowledge for the Purpose of Lip Speech Identification

Publikacja

- Advances in Intelligent Systems and Computing - Rok 2017

Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Pełny tekst do pobrania w serwisie zewnętrznym

What matters most to patients? On the Core Determinants of Patient Experience from Free Text Feedback

Publikacja

- Rok 2021

Free-text feedback from patients is increasingly used for improving the quality of healthcare services and systems. A major reason for the growing interest in harnessing free-text feedback is the belief that it provides richer information about what patients want and care about. The use of computational approaches such as structural topic modelling for analysing large unstructured textual data such as free-text feedback from patients...

Pełny tekst do pobrania w portalu

A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

Publikacja

G. Tamulevicius
G. Korvel
A. B. Yayak
P. Treigys
J. Bernataviciene
B. Kostek

- Electronics - Rok 2020

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

Pełny tekst do pobrania w portalu

Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology

Publikacja

J. Guziński

- IEEE Industrial Electronics Magazine - Rok 2015

Report on visit of Prof. Haitham Abu-Rub in Gdansk University of Technology. Speech on the Smart Grid Centre. Visit in the new smart grid laboratory of the GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).

Pełny tekst do pobrania w portalu

Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

Publikacja

P. Falkowski-Gilski
G. Debita
M. Habrych
B. Miedziński
P. Jedlikowski
B. Polnik
J. Wandzio
X. Wang

- Rok 2020

The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Pełny tekst do pobrania w serwisie zewnętrznym

Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary

Publikacja

- Rok 2018

This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...

Pełny tekst do pobrania w portalu

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publikacja

- Elektronika : konstrukcje, technologie, zastosowania - Rok 2008

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publikacja

- Rok 2007

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

Automated detection of pronunciation errors in non-native English speech employing deep learning

Publikacja

D. Korzekwa

- Rok 2023

Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

Pełny tekst do pobrania w portalu

Application of Text Analytics in Public Service Co-Creation: Literature Review and Research Framework

Publikacja

N. Rizun
A. Revina
N. Edelmann

- Rok 2023

The public sector faces several challenges, such as a number of external and internal demands for change, citizens' dissatisfaction and frustration with public sector organizations, that need to be addressed. An alternative to the traditional top-down development of public services is co-creation of public services. Co-creation promotes collaboration between stakeholders with the aim to create better public services and achieve...

Pełny tekst do pobrania w portalu

Exploring the Usability and User Experience of Social Media Apps through a Text Mining Approach

Publikacja

- Engineering Management in Production and Services - Rok 2023

This study aims to evaluate the applicability of a text mining approach for extracting UUX-related issues from a dataset of user comments and not to evaluate the Instagram (IG) app. This study analyses textual data mined from reviews in English written by IG mobile application users. The article’s authors used text mining (based on the LDA algorithm) to identify the main UUX-related topics. Next, they mapped the identified topics...

Pełny tekst do pobrania w portalu

A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

Publikacja

- SENSORS - Rok 2022

Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

Pełny tekst do pobrania w portalu

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

Publikacja

D. Korzekwa
J. Lorenzo-trueba
T. Drugman
S. Calamaro
B. Kostek

- Rok 2021

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

Pełny tekst do pobrania w portalu

Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

Publikacja

P. Falkowski-Gilski
G. Debita

- Archives of Acoustics - Rok 2023

In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Pełny tekst do pobrania w portalu

Text classifiers for automatic articles categorization

Publikacja

- Rok 2012

The article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.

Visual Lip Contour Detection for the Purpose of Speech Recognition

Publikacja

- Rok 2014

A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

Objectivization of phonological evaluation of speech elements by means of audio parametrization

Publikacja

- Rok 2018

This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...

Knowledge and technology transfer in the Center for Scientific and Technical Information of the Wrocław University of Technology

Publikacja

A. Wałek
K. Kozłowska

- Rok 2015

Knowledge and Technology Transfer between a university and economic operators affects innovation and growth of competitiveness, as well as the development of a knowledge-based society. In the structures of Wroclaw University of Technology, regarded as one of the best and the most innovative technical universities in Poland, a number of units responsible for a wide understanding cooperation with the economy have been established....

Pełny tekst do pobrania w portalu

Text Mining Algorithms for Extracting Brand Knowledge; The fashion Industry Case

Publikacja

- Rok 2018

Brand knowledge is determined by customer knowledge. The opportunity to develop brands based on customer knowledge management has never been greater. Social media as a set of leading communication platforms enable peer to peer interplays between customers and brands. A large stream of such interactions is a great source of information which, when thoroughly analyzed, can become a source of innovation and lead to competitive advantage....

Pełny tekst do pobrania w portalu

Elimination of clicks from archive speech signals using sparse autoregressive modeling

Publikacja

- Rok 2012

This paper presents a new approach to elimination of impulsivedisturbances from archive speech signals. The proposedsparse autoregressive (SAR) signal representation is given ina factorized form - the model is a cascade of the so-called formantfilter and pitch filter. Such a technique has been widelyused in code-excited linear prediction (CELP) systems, as itguarantees model stability. After detection of noise pulses usinglinear...

Pełny tekst do pobrania w serwisie zewnętrznym

A Text as a Set of Research Data. A Number of Aspects of Data Acquisition and Creation of Datasets in Neo-Latin Studies

Publikacja

- Rok 2022

In this paper, the authors, who specialise in part in neo-Latin studies and the his-tory of early modern education, share their experiences of collecting sources for Open Research Data sets under the Bridge of Data project. On the basis of inscription texts from St. Mary’s Church in Gdańsk, they created 29 Open Research Data sets. In turn, the text of the lectures of the Gdańsk scholar Michael Christoph Hanow, Praecepta de arte...

Pełny tekst do pobrania w portalu

Pozwolić odejść, czyli... Granice interwencji

Publikacja

G. Bukal

- Ochrona Dziedzictwa Kulturowego - Rok 2021

Celem tekstu jest zwrócenie uwagi na problem ostatecznej granicy interwencji konserwatorskich wobec zabytków architektury w kontekście holistycznej koncepcji ochrony dziedzictwa. Tekst adresowany jest do polskich czytelników ze względu na szczególne odniesienia do sytuacji polskiego systemu ochrony dziedzictwa. Choć architektura należy do sfery sztuk użytkowych, większość budynków nie powstawała...

Pełny tekst do pobrania w portalu

Endoscopic Videos Deinterlacing and On-Screen Text and Light Flashes Removal and Its Influence on Image Analysis Algorithms' Efficiency

Publikacja

- International Journal of Image Processing and Visual Communication - Rok 2013

In this article, deinterlacing and removing on- screen text and light flashes methods on endoscopic video images are discussed. The research is intended to improve disease recognition algorithms' performance. In the article, four configurations of deinterlacing methods and another four configurations of text and flashes removal methods are described and examined. The efficiency of endoscopic video analysis algorithms is measured...

Pełny tekst do pobrania w serwisie zewnętrznym

Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems

Publikacja

- Pomiary Automatyka Robotyka - Rok 2013

The aim of this paper is to present a hybrid algorithm that combines the advantages ofartificial neural networks and hidden Markov models in speech recognition for control purpos-es. The scope of the paper includes review of currently used solutions, description and analysis of implementation of selected artificial neural network (NN) structures and hidden Markov mod-els (HMM). The main part of the paper consists of a description...

Pełny tekst do pobrania w portalu

Human-computer interactions in speech therapy using a blowing interface

Publikacja

- Rok 2014

In this paper we present a new human-computer interface for the quantitative measurement of blowing activities. The interface can measure the air flow and air pressure during the blowing activity. The measured values are stored and used to control the state of the graphical objects in the graphical user interface. In speech therapy children will find easier to play attractive therapeutic games than to perform repetitive and tedious,...

Pełny tekst do pobrania w serwisie zewnętrznym

Wyszukiwarka

Filtry

Katalog

Kategoria

Rok

Opcje

Wyniki wyszukiwania dla: speech-to-text technology