Search results for: text-to-speech transcription - Bridge of Knowledge

Search results for: text-to-speech transcription

  • Speech synthesis controlled by eye gazing

    Publication

    - Year 2010

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as "talking by eyes" providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

    Publication

    The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

    Full text available to download

  • A Method of Real-Time Non-uniform Speech Stretching

    Publication

    The developed method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by the adjustment of time scaling factor values in accordance with the signal content. Depending on the speech unit (vowels/consonants), instantaneous rate of speech (ROS), and speech signal presence, values of the scaling factor...

    Full text to download in external service

  • Text

    Journals

    eISSN: 1327-9556

  • DNAffinity: a machine-learning approach to predict DNA binding affinities of transcription factors

    Publication
    • S. Barissi
    • A. Sala
    • M. Wieczór
    • F. Battistini
    • M. Orozco

    - NUCLEIC ACIDS RESEARCH - Year 2022

    We present a physics-based machine learning approach to predict in vitro transcription factor binding affinities from structural and mechanical DNA properties directly derived from atomistic molecular dynamics simulations. The method is able to predict affinities obtained with techniques as different as uPBM, gcPBM and HT-SELEX with an excellent performance, much better than existing algorithms. Due to its nature, the method can...

    Full text available to download

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publication
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Year 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Full text available to download

  • Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

    Publication

    The problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

    Full text to download in external service

  • Comparison of various speech time-scale modification methods

    The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

  • Speech codec enhancements utilizing time compression and perceptual coding

    Publication

    A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publication

    - LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

    Most research on electroencephalogram (EEG)-based Brain-Computer Interfaces (BCI) is focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Full text to download in external service

  • Hanow - Praecepta de arte disputandi - transcription and photographs

    Open Research Data
    version 1.1 open access

    Praecepta de arte disputandi by Enlightenment Gdańsk scholar Michael Christoph Hanow (1695-1773) are a combination of rhetorical theory and practical tips on how to effectively conduct discussions.   

  • Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based on the non-uniform, speech rate dependent SOLA algorithm (Synchronous Overlap and Add). Influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing impaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

    Full text available to download

  • Multimodal English corpus for automatic speech recognition

    A multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

  • An audio-visual corpus for multimodal automatic speech recognition

    A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

    Full text available to download

  • Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

    To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

    Full text available to download

  • Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

    Publication

    - Year 2022

    The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

    Full text available to download

  • Two Stage SVM and kNN Text Documents Classifier

    Publication

    - Year 2015

    The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

  • Description of the Dataset Rhetoric at School – a Selection of the Syllabi from the Academic Gymnasium in Gdańsk – Transcription and Photographs

    Publication

    - Year 2022

    The research dataset described in the article was based on photographs and transcription of a textual record from Latin syllabi for classes at the Gdańsk Academic Gymnasium. The syllabi concern the years 1645/1648/1652/1653. The original document is held in the collection of the Gdańsk Library of the Polish Academy of Sciences [reference number: Ma 3920 8o]. The collected research material can be used for studying the practical...

    Full text available to download

  • The Method of a Two-Level Text-Meaning Similarity Approximation of the Customers’ Opinions

    The method of two-level text-meaning similarity approximation, consisting in the implementation of the classification of the stages of text opinions of customers and identifying their rank quality level was developed. Proposed and proved the significance of major hypotheses, put as the basis of the developed methodology, notably about the significance of suggestions about the existence of analogies between mathematical bases of...

    Full text available to download

  • Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

    Publication

    The problem of accurately differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publication

    - Year 2020

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Full text available to download

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publication

    - Year 2012

    The system supporting speech perception during the classes is presented in the paper. The system is a combination of a portable device, which enables real-time speech stretching, with a workstation designed to perform hearing tests. The system was designed to help children suffering from Central Auditory Processing Disorders.

    Full text to download in external service

  • High quality speech codec employing sines+noise+transients model

    A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

    Full text available to download

  • Silence/noise detection for speech and music signals

    Publication

    - Year 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signal processing, i.e., noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. Examples of different types of noisy signals are presented.

  • Thresholding Strategies for Large Scale Multi-Label Text Classifier

    Publication

    This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...

    Full text to download in external service

  • Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

    Publication

    - Year 2018

    The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

  • Text-mining Similarity Approximation Operators for Opinion Mining in BI tools

    Publication

    - Year 2016

    The concept of the Text-mining Similarity Approximation Operators for Opinion Mining as extensions to Natural Language Interface Database is defined. The new operators: “keywords of” dimension; subsetting operator “about C is q”; aggregation operator “by similar C” are proposed. These operators are based on the Latent Semantic Analysis and Social Network Analysis.

    Full text available to download

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publication

    - Year 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Full text available to download

  • Corrupted speech intelligibility improvement using adaptive filter based algorithm

    Publication

    A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithm employing adaptive filtration is described and additional possibilities of speech intelligibility improvement are discussed. Results of the tests are presented.

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publication

    - Year 2014

    The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

    Full text to download in external service

  • A non-uniform real-time speech time-scale stretching method

    Publication

    An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

  • What matters most to patients? On the Core Determinants of Patient Experience from Free Text Feedback

    Publication

    - Year 2021

    Free-text feedback from patients is increasingly used for improving the quality of healthcare services and systems. A major reason for the growing interest in harnessing free-text feedback is the belief that it provides richer information about what patients want and care about. The use of computational approaches such as structural topic modelling for analysing large unstructured textual data such as free-text feedback from patients...

    Full text available to download

  • Emotions in Polish speech recordings

    Open Research Data
    open access

    The data set presents emotions recorded in sound files that are expressions of Polish speech. Statements were made by people aged 21-23, young voices of 5 men. Each person said the following words / nie - no, oddaj - give back, podaj - pass, stop - stop, tak - yes, trzymaj - hold / five times representing a specific emotion - one of three - anger (a),...

  • A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

    Publication
    • G. Tamulevicius
    • G. Korvel
    • A. B. Yayak
    • P. Treigys
    • J. Bernataviciene
    • B. Kostek

    - Electronics - Year 2020

    In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

    Full text available to download

  • Nina Rizun dr

    Nina Rizun is an assistant professor at the Faculty of Management and Economics at the Gdańsk University of Technology. In October 1999 she obtained a PhD degree in technical sciences in the Faculty of Enterprise Economy and Production Organization, National Mining Academy, Dnipropetrovsk, Ukraine. PhD thesis title: Development of Complex Subsystem of the Organization and Planning of Mining and Transport Processes. In the years...

  • Communication Platform for Evaluation of Transmitted Speech Quality

    A voice communication system designed and implemented is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmitting of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessments by listening tests or, objective evaluation employing...

    Full text available to download

  • Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary

    Publication

    - Year 2018

    This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...

    Full text available to download

  • Results of tests on speech intelligibility in reverberant conditions

    Open Research Data

    The dataset contains the results of tests that aimed to provide a relationship between the rate of speech (RoS) and reverberation conditions characterized by the Speech Transmission Index (STI).

  • Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

    Publication

    - Year 2007

    In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for analysis of ICF of speech signal is presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

  • Automated detection of pronunciation errors in non-native English speech employing deep learning

    Publication

    - Year 2023

    Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

    Full text available to download

  • Exploring the Usability and User Experience of Social Media Apps through a Text Mining Approach

    This study aims to evaluate the applicability of a text mining approach for extracting UUX-related issues from a dataset of user comments and not to evaluate the Instagram (IG) app. This study analyses textual data mined from reviews in English written by IG mobile application users. The article’s authors used text mining (based on the LDA algorithm) to identify the main UUX-related topics. Next, they mapped the identified topics...

    Full text available to download

  • Application of Text Analytics in Public Service Co-Creation: Literature Review and Research Framework

    Publication

    - Year 2023

    The public sector faces several challenges, such as a number of external and internal demands for change, citizens' dissatisfaction and frustration with public sector organizations, that need to be addressed. An alternative to the traditional top-down development of public services is co-creation of public services. Co-creation promotes collaboration between stakeholders with the aim to create better public services and achieve...

    Full text available to download

  • A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

    Publication

    Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

    Full text available to download

  • Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

    Publication
    • D. Korzekwa
    • J. Lorenzo-Trueba
    • T. Drugman
    • S. Calamaro
    • B. Kostek

    - Year 2021

    We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

    Full text available to download

  • Text (new title Text and Talk)

    Journals

    ISSN: 0165-4888

  • Text classifiers for automatic articles categorization

    Publication

    The article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.

  • Anna Baj-Rogowska dr

    Anna Baj-Rogowska is employed as an assistant professor at the Department of Informatics in Management at the Faculty of Management and Economics, Gdańsk University of Technology. Her higher education is connected with the University of Gdańsk, where she graduated from a master's degree in business informatics, doctoral studies and then obtained a PhD degree in economics in management science (Department of Business Informatics...

  • Visual Lip Contour Detection for the Purpose of Speech Recognition

    Publication

    A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

  • Third Text

    Journals

    ISSN: 0952-8822 , eISSN: 1475-5297