Wyniki wyszukiwania dla: AUTOMATIC SPEECH RECOGNITION, WHISPER, MEDICAL LANGUAGE RECOGNITION, SPEECH PROCESSING - MOST Wiedzy

Wyszukiwarka

Wyniki wyszukiwania dla: AUTOMATIC SPEECH RECOGNITION, WHISPER, MEDICAL LANGUAGE RECOGNITION, SPEECH PROCESSING

Wyniki wyszukiwania dla: AUTOMATIC SPEECH RECOGNITION, WHISPER, MEDICAL LANGUAGE RECOGNITION, SPEECH PROCESSING

  • Krzysztof Goczyła prof. dr hab. inż.

    Krzysztof Goczyła, profesor zwyczajny Politechniki Gdańskiej, informatyk, specjalista z inżynierii oprogramowania, inżynierii wiedzy i baz danych. Ukończył studia wyższe na  Wydziale Elektroniki Politechniki Gdańskiej w 1976 r. jako magister inżynier elektronik w specjalności automatyka. Na Politechnice Gdańskiej pracuje od 1976. Na Wydziale Elektroniki PG w 1982 r. uzyskał doktorat z informatyki, a w 1999 r. habilitację. W 2012...

  • Orken Mamyrbayev Professor

    Osoby

    1.  Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2.  Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...

  • Biometria i przetwarzanie mowy 2023

    Kursy Online
    • J. Daciuk

    {mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...

  • Biometria i przetwarzanie mowy 2024

    Kursy Online
    • J. Daciuk

    {mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publikacja

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training

    In the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

    Pełny tekst do pobrania w portalu

  • Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

    Publikacja

    In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation

    In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor pr ocess priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bi- gram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of -grams with a topic model,...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing

    In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

    Pełny tekst do pobrania w portalu

  • Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling

    Publikacja

    Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of an- alyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch struc- ture. These models are formulated as linear or log-linear interpo- lations of up to fi ve sub-models, each of which is...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering

    This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponential ly weighted least squares algo- rithm. Detection of noise pulses an d model-based interpolation of the irrevocably distorted sampl es is realized using an adaptive, variable-order...

    Pełny tekst do pobrania w portalu

  • Automatic sound recognition for security purposes

    Publikacja

    - Rok 2008

    In the paper an automatic sound recognition system is presented. It forms a part of a bigger security system developed in order to monitor outdoor places for non-typical audio-visual events. The analyzed audio signal is being recorded from a microphone mounted in an outdoor place thus a non stationary noise of a significant energy is present in it. In the paper an especially designed algorithm for outdoor noise reduction is presented,...

  • Dependable Integration of Medical Image Recognition Components

    Computer driven medical image recognition may support medical doctors in the diagnosis process, but requires high dependability considering potential consequences of incorrect results. The paper presentsa system that improves dependability of medical image recognition by integration of results from redundant components. The components implement alternative recognition algorithms of diseases in thefield of gastrointestinal endoscopy....

  • Computer-assisted pronunciation training—Speech synthesis is almost all you need

    Publikacja

    - SPEECH COMMUNICATION - Rok 2022

    The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

    Pełny tekst do pobrania w portalu

  • Sign Language Recognition Using Convolution Neural Networks

    Publikacja

    The objective of this work was to provide an app that can automatically recognize hand gestures from the American Sign Language (ASL) on mobile devices. The app employs a model based on Convolutional Neural Network (CNN) for gesture classification. Various CNN architectures and optimization strategies suitable for devices with limited resources were examined. InceptionV3 and VGG-19 models exhibited negligibly higher accuracy than...

    Pełny tekst do pobrania w portalu

  • Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

    Publikacja

    - IEEE Access - Rok 2020

    The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

    Pełny tekst do pobrania w portalu

  • Accelerometer signal pre-processing influence on human activity recognition

    A study of data pre-processing influence on accelerometer-based human activity recognition algorithms is presented. The frequency band used to filter-out the accelerometer signals and the number of accelerometers involved were considered in terms of their influence on the recognition accuracy.

  • System for automatic singing voice recognition

    W artykule przedstawiono system automatycznego rozpoznawania jakości i typu głosu śpiewaczego. Przedstawiono bazę danych oraz zaimplementowane parametry. Algorytmem decyzyjnym jest algorytm sztucznych sieci neuronowych. Wytrenowany system decyzyjny osiąga skuteczność ok. 90% w obydwu kategoriach rozpoznawania. Dodatkowo wykazano przy pomocy metod statystycznych, że wyniki działania systemu automatycznej oceny jakości technicznej...

  • Automatic Singing Voice Recognition EmployingNeural Networks and Rough Sets

    Publikacja

    Celem badań jest automatyczne rozpoznawanie głosów śpiewaczych w kategorii rodzaju i jakości technicznej śpiewu. W artykule opisano stworzoną bazę danych głosów, która zawiera próbki głosu śpiewaków profesjonalnych i amatorskich. W dalszej części opisano parametry zdefiniowane w oparciu o zjawiska biomechaniczne w narządzie głosu podczas śpiewania. W oparciu o stworzone macierze parametrów wytrenowano i porównano automatyczne klasyfikatory...

  • Automatic Classification of Polish Sign Language Words

    In the article we present the approach to automatic recognition of hand gestures using eGlove device. We present the research results of the system for detection and classification of static and dynamic words of Polish language. The results indicate the usage of eGlove allows to gain good recognition quality that additionally can be improved using additional data sources such as RGB cameras.

    Pełny tekst do pobrania w portalu

  • LANGUAGE AND SPEECH

    Czasopisma

    ISSN: 0023-8309 , eISSN: 1756-6053

  • Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

    Publikacja

    The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

    Pełny tekst do pobrania w portalu

  • Investigating Feature Spaces for Isolated Word Recognition

    Publikacja
    • P. Treigys
    • G. Korvel
    • G. Tamulevicius
    • J. Bernataviciene
    • B. Kostek

    - Rok 2020

    The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Intelligent processing of stuttered speech.

    W artykule zaprezentowano kilka metod analizy i automatycznego zliczania potknięć artykulacyjnych, związanych z jąkaniem się, opartych na wykorzystaniu algorytmów uczących się sztucznych sieci neuronowych i zbiorów przybliżonych.

  • Applying the Lombard Effect to Speech-in-Noise Communication

    Publikacja

    - Electronics - Rok 2023

    This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

    Pełny tekst do pobrania w portalu

  • Real-time speech-rate modification experiments

    Publikacja

    An algorithm designed for real-time speech time scale modification (stretching) is proposed, providing a combination of typical synchronous overlap and add based time scale modification algorithm and signal redundancy detection algorithms that allow to remove parts of the speech signal and replace them with the stretched speech signal fragments. Effectiveness of signal processing algorithms are examined experimentally together...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Influence of accelerometer signal pre-processing and classification method on human activity recognition

    A study of data pre-processing influence on accelerometer-based human activity recognition algorithms is presented. The frequency band used to filter-out the accelerometer signals and the number of accelerometers involved were considered in terms of their influence on the recognition accuracy. In the test four methods of classification were used: support vector machine, decision trees, neural network, k-nearest neighbor.

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Speech Intelligibility Measurements in Auditorium

    Publikacja

    Speech intelligibility was measured in Auditorium Novum on Technical University of Gdansk (seating capacity 408, volume 3300 m3). Articulation tests were conducted; STI and Early Decay Time EDT coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in auditorium. Correlation was found between...

    Pełny tekst do pobrania w portalu

  • Integration in Multichannel Emotion Recognition

    Publikacja

    - Rok 2018

    The paper concerns integration of results provided by automatic emotion recognition algorithms. It presents both the challenges and the approaches to solve them. Paper shows experimental results of integration. The paper might be of interest to researchers and practitioners who deal with automatic emotion recognition and use more than one solution or multichannel observation.

    Pełny tekst do pobrania w portalu

  • Time-domain prosodic modifications for text-to-speech synthesizer

    Publikacja

    - Rok 2010

    An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

  • Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

    Publikacja

    - Rok 2022

    The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

    Pełny tekst do pobrania w portalu

  • Automatic recognition of therapy progress among children with autism

    Publikacja

    - Scientific Reports - Rok 2017

    The article presents a research study on recognizing therapy progress among children with autism spectrum disorder. The progress is recognized on the basis of behavioural data gathered via five specially designed tablet games. Over 180 distinct parameters are calculated on the basis of raw data delivered via the game flow and tablet sensors - i.e. touch screen, accelerometer and gyroscope. The results obtained confirm the possibility...

    Pełny tekst do pobrania w portalu

  • Automatic singing quality recognition employing artificial neural networks

    Publikacja

    Celem artykułu jest udowodnienie możliwości automatycznej oceny jakości technicznej głosów śpiewaczych. Pokrótce zaprezentowano w nim stworzoną bazę danych głosów śpiewaczych oraz zaimplementowane parametry. Przy pomocy sztucznych sieci neuronowych zaprojektowano system decyzyjny, który oceniono w pięciostopniowej skali jakość techniczną głosu. Przy pomocy metod statystycznych udowodniono, że wyniki generowane przez ten system...

    Pełny tekst do pobrania w portalu

  • Transient detection for speech coding applications

    Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publikacja

    - Rok 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Pełny tekst do pobrania w portalu

  • Examining Feature Vector for Phoneme Recognition

    Publikacja

    - Rok 2018

    The aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems

    Publikacja

    - Polish Maritime Research - Rok 2018

    The article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Improving the quality of speech in the conditions of noise and interference

    Publikacja

    The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Uncertainty in emotion recognition

    Purpose–The purpose of this paper is to explore uncertainty inherent in emotion recognition technologiesand the consequences resulting from that phenomenon.Design/methodology/approach–The paper is a general overview of the concept; however, it is basedon a meta-analysis of multiple experimental and observational studies performed over the past couple of years.Findings–The mainfinding of the paper might be summarized as follows:...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publikacja

    - LECTURE NOTES IN COMPUTER SCIENCE - Rok 2018

    Most of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Constructing a Dataset of Speech Recordingswith Lombard Effect

    Publikacja

    - Rok 2020

    Thepurpose of therecordings was to create a speech corpus based on the ISLEdataset, extended with video and Lombard speech. Selected from a set of 165sentences, 10, evaluatedas having thehighest possibility to occur in the context ofthe Lombard effect,were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether,15speakers were recorded, and speech parameterswere...

  • Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

    Publikacja
    • K. Kąkol

    - Rok 2023

    The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

    Pełny tekst do pobrania w portalu

  • Improved method for real-time speech stretching

    Publikacja

    n algorithm for real-time speech stretching is presented. It was designed to modify input signal dependently on its content and on its relation with the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publikacja

    - Rok 2012

    The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

    Pełny tekst do pobrania w serwisie zewnętrznym

  • IEEE Transactions on Audio Speech and Language Processing

    Czasopisma

    ISSN: 1558-7916

  • Silence/noise detection for speech and music signals

    Publikacja

    - Rok 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.

  • Improving Objective Speech Quality Indicators in Noise Conditions

    Publikacja

    - Rok 2020

    This work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Rough Sets Applied to Mood of Music Recognition

    Publikacja

    With the growth of accessible digital music libraries over the past decade, there is a need for research into automated systems for searching, organizing and recommending music. Mood of music is considered as one of the most intuitive criteria for listeners, thus this work is focused on the emotional content of music and its automatic recognition. The research study presented in this work contains an attempt to music emotion recognition...

  • Speech synthesis controlled by eye gazing

    Publikacja

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as ''talking by eyes'' providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • Detecting Lombard Speech Using Deep Learning Approach

    Publikacja
    • K. Kąkol
    • G. Korvel
    • G. Tamulevicius
    • B. Kostek

    - SENSORS - Rok 2023

    Robust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

    Pełny tekst do pobrania w portalu