Filters
total: 6802
filtered: 3597
-
Catalog
- Publications 3597 available results
- Journals 850 available results
- Conferences 80 available results
- Publishing Houses 1 available results
- People 968 available results
- Projects 76 available results
- Laboratories 1 available results
- Research Equipment 1 available results
- e-Learning Courses 414 available results
- Events 17 available results
- Open Research Data 797 available results
Chosen catalog filters
displaying 1000 best results Help
Search results for: speech-to-text technology
-
A Method of Real-Time Non-uniform Speech Stretching
PublicationDeveloped method of real-time non-uniform speech stretching is presented.The proposed solution is based on the well-known SOLA algorithm(Synchronous Overlap and Add). Non-uniform time-scale modification isachieved by the adjustment of time scaling factor values in accordance with thesignal content. Dependently on the speech unit (vowels/consonants), instantaneousrate of speech (ROS), and speech signal presence, values of the scalingfactor...
-
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
PublicationWe present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...
-
Examining Influence of Distance to Microphone on Accuracy of Speech Recognition
PublicationThe problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...
-
Comparison of various speech time-scale modificartion methods
PublicationThe objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...
-
Speech codec enhancements utilizing time compression and perceptual coding
PublicationA method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...
-
Tensor Decomposition for Imagined Speech Discrimination in EEG
PublicationMost of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...
-
Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit
PublicationMethods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...
-
Multimodal English corpus for automatic speech recognition
PublicationA multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database contains also thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus enables to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...
-
An audio-visual corpus for multimodal automatic speech recognition
Publicationreview of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...
-
Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically
PublicationThe aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...
-
Decoding imagined speech for EEG-based BCI
PublicationBrain–computer interfaces (BCIs) are systems that transform the brain's electrical activity into commands to control a device. To create a BCI, it is necessary to establish the relationship between a certain stimulus, internal or external, and the brain activity it provokes. A common approach in BCIs is motor imagery, which involves imagining limb movement. Unfortunately, this approach allows few commands. As an alternative, this...
-
Machine Learning and Text Analysis in an Artificial Intelligent System for the Training of Air Traffic Controllers
PublicationThis chapter presents the application of new information technology in education for the training of air traffic controllers (ATCs). Machine learning, multi-criteria decision analysis, and text analysis as the methods of artificial intelligence for ATCs training have been described. The authors have made an analysis of the International Civil Aviation Organization documents for modern principles of ATCs education. The prototype...
-
Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network
PublicationTo effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...
-
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
PublicationThe problem of accurate differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
-
Ranking Speech Features for Their Usage in Singing Emotion Classification
PublicationThis paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
-
System Supporting Speech Perception in Special Educational Needs Schoolchildren
PublicationThe system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.
-
High quality speech codec employing sines+noise+transients model
PublicationA method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...
-
Two Stage SVM and kNN Text Documents Classifier
PublicationThe paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
-
Silence/noise detection for speech and music signals
PublicationThis paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
PublicationThe aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...
-
The Method of a Two-Level Text-Meaning Similarity Approximation of the Customers’ Opinions
PublicationThe method of two-level text-meaning similarity approximation, consisting in the implementation of the classification of the stages of text opinions of customers and identifying their rank quality level was developed. Proposed and proved the significance of major hypotheses, put as the basis of the developed methodology, notably about the significance of suggestions about the existence of analogies between mathematical bases of...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublicationThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
Thresholding Strategies for Large Scale Multi-Label Text Classifier
PublicationThis article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...
-
Corrupted speech intelligibility improvement using adaptive filter based algorithm
PublicationA technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.
-
Text-mining Similarity Approximation Operators for Opinion Mining in BI tools
PublicationThe concept of the Text-mining Similarity Approximation Operators for Opinion Mining as extensions to Natural Language Interface Database is defined. The new operators: “keywords of” dimension; subsetting operator “about C is q”; aggregation operator “by similar C” are proposed. These operators are based on the Latent Semantic Analysis and Social Network Analysis
-
Distortion of speech signals in the listening area: its mechanism and measurements
PublicationThe paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...
-
Database of speech and facial expressions recorded with optimized face motion capture settings
PublicationThe broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...
-
A non-uniform real-time speech time-scale stretching method
PublicationAn algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add ) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...
-
Building Knowledge for the Purpose of Lip Speech Identification
PublicationConsecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
PublicationIn this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
-
What matters most to patients? On the Core Determinants of Patient Experience from Free Text Feedback
PublicationFree-text feedback from patients is increasingly used for improving the quality of healthcare services and systems. A major reason for the growing interest in harnessing free-text feedback is the belief that it provides richer information about what patients want and care about. The use of computational approaches such as structural topic modelling for analysing large unstructured textual data such as free-text feedback from patients...
-
Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology
PublicationReport on visit of Prof. Haitham Abu-Rub in Gdansk University of Technology. Speech on the Smart Grid Centre. Visit in the new smart grid laboratory of the GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).
-
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System
PublicationThe broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...
-
Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency
PublicationIn this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.
-
Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency
PublicationIn this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.
-
Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary
PublicationThis paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...
-
Automated detection of pronunciation errors in non-native English speech employing deep learning
PublicationDespite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...
-
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
PublicationObjective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...
-
Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech
PublicationWe propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...
-
Exploring the Usability and User Experience of Social Media Apps through a Text Mining Approach
PublicationThis study aims to evaluate the applicability of a text mining approach for extracting UUX-related issues from a dataset of user comments and not to evaluate the Instagram (IG) app. This study analyses textual data mined from reviews in English written by IG mobile application users. The article’s authors used text mining (based on the LDA algorithm) to identify the main UUX-related topics. Next, they mapped the identified topics...
-
Application of Text Analytics in Public Service Co-Creation: Literature Review and Research Framework
PublicationThe public sector faces several challenges, such as a number of external and internal demands for change, citizens' dissatisfaction and frustration with public sector organizations, that need to be addressed. An alternative to the traditional top-down development of public services is co-creation of public services. Co-creation promotes collaboration between stakeholders with the aim to create better public services and achieve...
-
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
PublicationIn order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...
-
Visual Lip Contour Detection for the Purpose of Speech Recognition
PublicationA method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
PublicationThis study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Knowledge and technology transfer in the Center for Scientific and Technical Information of the Wrocław University of Technology
PublicationKnowledge and Technology Transfer between a university and economic operators affects innovation and growth of competitiveness, as well as the development of a knowledge-based society. In the structures of Wroclaw University of Technology, regarded as one of the best and the most innovative technical universities in Poland, a number of units responsible for a wide understanding cooperation with the economy have been established....
-
Text classifiers for automatic articles categorization
PublicationThe article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.
-
Elimination of clicks from archive speech signals using sparse autoregressive modeling
PublicationThis paper presents a new approach to elimination of impulsivedisturbances from archive speech signals. The proposedsparse autoregressive (SAR) signal representation is given ina factorized form - the model is a cascade of the so-called formantfilter and pitch filter. Such a technique has been widelyused in code-excited linear prediction (CELP) systems, as itguarantees model stability. After detection of noise pulses usinglinear...
-
Pozwolić odejść, czyli... Granice interwencji
PublicationCelem tekstu jest zwrócenie uwagi na problem ostatecznej granicy interwencji konserwatorskich wobec zabytków architektury w kontekście holistycznej koncepcji ochrony dziedzictwa. Tekst adresowany jest do polskich czytelników ze względu na szczególne odniesienia do sytuacji polskiego systemu ochrony dziedzictwa. Choć architektura należy do sfery sztuk użytkowych, większość budynków nie powstawała...
-
Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems
PublicationThe aim of this paper is to present a hybrid algorithm that combines the advantages ofartificial neural networks and hidden Markov models in speech recognition for control purpos-es. The scope of the paper includes review of currently used solutions, description and analysis of implementation of selected artificial neural network (NN) structures and hidden Markov mod-els (HMM). The main part of the paper consists of a description...
-
Human-computer interactions in speech therapy using a blowing interface
PublicationIn this paper we present a new human-computer interface for the quantitative measurement of blowing activities. The interface can measure the air flow and air pressure during the blowing activity. The measured values are stored and used to control the state of the graphical objects in the graphical user interface. In speech therapy children will find easier to play attractive therapeutic games than to perform repetitive and tedious,...