Search results for: text-to-speech transcription

Methodology and technology for the polymodal allophonic speech transcription

Publication

- Journal of the Acoustical Society of America - Year 2016

A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory setting of speech organs for...

Full text to download in external service

Methodology and technology for the polymodal allophonic speech transcription

Publication

- Journal of the Acoustical Society of America - Year 2016

A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography, and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Full text to download in external service

Time-domain prosodic modifications for text-to-speech synthesizer

Publication

- Year 2010

An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publication

- Year 2016

Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

Full text to download in external service

Evaluation and Irony in Text in the Light of Speech Act Theory

Publication

K. Kukowicz-Zarska

- Forum Filologiczne Ateneum - Year 2020

Full text to download in external service

Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language

Publication

K. Łopatka
P. Suchomski
A. Czyżewski

- Elektronika : konstrukcje, technologie, zastosowania - Year 2011

A Polish speech synthesis system with automatic modification of utterance prosody is presented. Methods for automatically determining the stress and intonation of an utterance are described. The application of speech signal processing algorithms to shaping prosody is presented. The influence of the applied modifications on the naturalness of the synthesized signal is discussed. The applied method is based on the TD-PSOLA algorithm. The developed...

SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

Publication

- Journal of the Acoustical Society of America - Year 2023

The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

Full text available to download

Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

Publication

D. Piotrowski
R. Korzeniowski
A. Falai
S. Cygert
K. Pokora
G. Tinchev
Z. Zhang
K. Yanagisawa

- Year 2023

In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text to download in external service

Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

Publication

A. Czyżewski

- Journal of the Acoustical Society of America - Year 2023

Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition models (ASR). Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

Full text available to download

A survey of automatic speech recognition deep models performance for Polish medical terms

Publication

- Year 2023

Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Full text to download in external service

Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich

Publication

K. Kowalik-Bańczyk

- Year 2015

The article analyses the phenomenon of hate speech on the Internet contrasted with the problem of the responsibility of Internet Service Providers for cases of such abuses of freedom of expression. The text provides an analysis of the jurisprudence of two European Courts. On the one hand, it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...

The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish

Publication

S. Zaporowski

- Year 2024

The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...

Full text available to download

Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

Publication

- Journal of the Acoustical Society of America - Year 2018

A method for automatic transcription of English speech into the International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

Full text to download in external service

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download

Virtual keyboard controlled by eye gaze employing speech synthesis

Publication

- Year 2010

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Virtual Keyboard controlled by eye gaze employing speech synthesis

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2011

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Full text to download in external service

Building Knowledge for the Purpose of Lip Speech Identification

Publication

- Advances in Intelligent Systems and Computing - Year 2017

Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Full text to download in external service

Auditory-visual attention stimulator

Publication

- Year 2013

A new approach to lateralization irregularities formation was proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach, hearing is stimulated using time-scale-modified speech and sight is stimulated by rendering the text of the currently heard speech. Moreover, the displayed text is modified using several techniques, e.g., zooming and highlighting. In the experimental part of...

Full text to download in external service

Marking the Allophones Boundaries Based on the DTW Algorithm

Publication

J. Rafałko

- Year 2018

The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking allophone boundaries in continuous speech is difficult due to the mutual influence of adjacent phonemes. On the one hand, it is this neighborhood that creates variants of phonemes, that is, allophones, and on the other hand it affects that the border...

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

Publication

D. Korzekwa
R. Barra-Chicote
S. Zaporowski
G. Beringer
J. Lorenzo-Trueba
A. Serafinowicz
J. Droppo
T. Drugman
B. Kostek

- Year 2021

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Full text available to download

Creating new voices using normalizing flows

Publication

P. Biliński
T. Merritt
A. Ezzerg
K. Pokora
S. Cygert
K. Yanagisawa
R. Barra-Chicote
D. Korzekwa

- Year 2022

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available to download

DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

Publication

- Rocznik Naukowy Wydzialu Zarzadzania w Ciechanowie - Year 2017

The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

Full text available to download

Analysis of allophones based on audio signal recordings and parameterization

Publication

- Journal of the Acoustical Society of America - Year 2017

The aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...

Full text to download in external service

Promocja zasobów Pomorskiej Biblioteki Cyfrowej na przykładzie XVIII-wiecznego rękopisu

Publication

- Z Badań nad Książką i Księgozbiorami Historycznymi - Year 2022

The aim of the article is to present a way of sharing and promoting manuscript collections, using the example of the 18th-century manuscript by Christian Gabriel Fisher available in the Pomorska Biblioteka Cyfrowa (Pomeranian Digital Library; hereafter: PBC). This manuscript inspired a collaboration between the Gdańsk University of Technology Library and the Instytut Kultury Miejskiej (City Culture Institute) in Gdańsk. Thanks to this joint initiative, work began on a transcription of the German text...

Full text available to download

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publication

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Year 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available to download

Agile Commerce in the light of Text Mining

Publication

A. Baj-Rogowska

- Przedsiębiorczość i Zarządzanie - Year 2017

The survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...

Full text available to download

Prioritising national healthcare service issues from free text feedback – A computational text analysis & predictive modelling approach

Publication

A. Ojo
N. Rizun
G. Walsh
M. I. Mashinchi
M. Venosa
M. N. Rao

- DECISION SUPPORT SYSTEMS - Year 2024

Patient experience surveys have become a key source of evidence for supporting decision-making and continuous quality improvement within healthcare services. To harness free-text feedback collected as part of these surveys for additional insights, text analytics methods are increasingly employed when the data collected is not amenable to traditional qualitative analysis due to volume. However, while text analytics techniques offer...

Full text available to download

Speech Intelligibility Measurements in Auditorium

Publication

K. Leo

- ACTA PHYSICA POLONICA A - Year 2010

Speech intelligibility was measured in the Auditorium Novum at the Technical University of Gdansk (seating capacity 408, volume 3300 m³). Articulation tests were conducted; STI and Early Decay Time (EDT) coefficients were measured. The negative contribution of noise to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. Correlation was found between...

Full text available to download

Language Models in Speech Recognition

Publication

J. Daciuk

- Year 2022

This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

Full text to download in external service

Transient detection for speech coding applications

Publication

- International Journal of Computer Science and Network Security - Year 2006

Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

Full text to download in external service

Development and Research of the Text Messages Semantic Clustering Methodology

Publication

N. Rizun
P. Kapłański
Y. Taranenko

- Year 2016

The methodology of semantic clustering analysis of customer’s text-opinions collection is developed. The author's version of the mathematical models of formalization and practical realization of short textual messages semantic clustering procedure is proposed, based on the customer’s text-opinions collection Latent Semantic Analysis knowledge extracting method. An algorithm for semantic clustering of the text-opinions is developed,...

Full text available to download

Generating actionable evidence from free-text feedback to improve maternity and acute hospital experiences: A computational text analytics & predictive modelling approach

Publication

A. Ojo
N. Rizun
M. Isazad Mashinchi
G. Walsh
J. Gruda
M. N. Narayana
M. Venosa
C. Foley
D. Rohde
R. Flynn

- EUROPEAN JOURNAL OF PUBLIC HEALTH - Year 2023

Background Patient experience surveys are a key source of evidence for supporting decision-making and quality improvement in healthcare services. These surveys contain two main types of questions: closed and open-ended, asking about patients’ care experiences. Apart from the knowledge obtained from analysing closed-ended questions, invaluable insights can be gleaned from free-text data. Advanced analytics techniques are increasingly...

Full text to download in external service

Improving the quality of speech in the conditions of noise and interference

Publication

B. Kostek
K. Kąkol

- Journal of the Acoustical Society of America - Year 2018

The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

Full text to download in external service

Text Categorization Improvement via User Interaction

Publication

J. Atroszko
J. Szymański
D. Gil
H. Mora

- Year 2018

In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

Full text to download in external service

Comparative Analysis of Text Representation Methods Using Classification

Publication

J. Szymański

- CYBERNETICS AND SYSTEMS - Year 2014

In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

Full text to download in external service

Applying the Lombard Effect to Speech-in-Noise Communication

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Electronics - Year 2023

This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

Full text available to download

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Improved method for real-time speech stretching

Publication

- Year 2012

An algorithm for real-time speech stretching is presented. It was designed to modify the input signal depending on its content and on its relation to the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e., voice, vowel/consonant, and stuttering detection, and a SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

Full text to download in external service

Semantic Analysis and Text Summarization in Socio-Technical Systems

Publication

N. Rizun

- Year 2018

In this chapter, the authors present the results of developing a methodology for increasing the reliability of the functioning of the Socio-Technical System. Existing methods and algorithms for processing unstructured (textual) information were studied. Taking into account the above-noted strengths and weaknesses of the Discriminant and Probabilistic approaches to Latent Semantic Relations analysis in of the summarization projection...

Full text to download in external service

Real-time speech-rate modification experiments

Publication

- Year 2010

An algorithm designed for real-time speech time scale modification (stretching) is proposed, providing a combination of a typical synchronous overlap-and-add based time scale modification algorithm and signal redundancy detection algorithms that allow removing parts of the speech signal and replacing them with stretched speech signal fragments. The effectiveness of the signal processing algorithms is examined experimentally together...

Full text to download in external service

Evaluation of Path Based Methods for Conceptual Representation of the Text

Publication

- Year 2014

Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

Full text to download in external service

Interactive Information Search in Text Data Collections

Publication

- Year 2013

This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test this idea. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used for evaluating results. The achieved results...

Full text to download in external service

Description of the Dataset Hanow – Praecepta de Arte Disputandi – Transcription and Photographs

Publication

J. Pokrzywnicki

- Year 2022

This article briefly characterises the “Hanow – Praecepta de arte disputandi – transcription and photographs” research dataset. The dataset was created based on photographs and transcriptions of the manuscript of the Latin lectures on the rules of effective discussion (the title of the manuscript: Praecepta de arte disputandi) by Michael Christoph Hanow (1695–1773), professor of Gdańsk Academic Gymnasium. The original document...

Full text available to download

Improving Objective Speech Quality Indicators in Noise Conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2020

This work aims at modifying speech signal samples and testing them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

Full text to download in external service

Selection of Relevant Features for Text Classification with K-NN

Publication

- Year 2013

In this paper, we describe five feature selection techniques used for text classification. Information gain, independent significance feature test, chi-squared test, odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method, we present the classification quality obtained on the test datasets using a K-NN based approach. A main advantage of evaluated...

Full text to download in external service

Speech Analytics Based on Machine Learning

Publication

- Year 2019

In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

Full text to download in external service

Speech synthesis controlled by eye gazing

Publication

- Year 2010

A method of communication based on eye gaze control is presented. Investigations of using gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The proposed application is dedicated to disabled people, especially to persons with so-called locked-in syndrome who cannot...

Detecting Lombard Speech Using Deep Learning Approach

Publication

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Year 2023

Robust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

Full text available to download

Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

Publication

- SENSORS - Year 2021

The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

Full text available to download

A Method of Real-Time Non-uniform Speech Stretching

Publication

- Year 2012

A method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by the adjustment of time scaling factor values in accordance with the signal content. Depending on the speech unit (vowels/consonants), instantaneous rate of speech (ROS), and speech signal presence, values of the scaling factor...

Full text to download in external service
