Search results for: SPEAKER IDENTIFICATION (26 total)
-
Open-Set Speaker Identification Using Closed-Set Pretrained Embeddings
Publication: The paper proposes an approach for extending deep neural network-based solutions to closed-set speaker identification toward the open-set problem. The idea is built on the characteristics of deep neural networks trained for classification tasks, where there is a layer consisting of a set of deep features extracted from the analyzed inputs. By extracting this vector and performing anomaly detection against the set of known...
-
Sensors integration in the smart home environment - a proposal to solve the problem with user identification
Publication: In this preliminary study, we investigate user recognition techniques suitable for smart home devices such as chairs and beds, aiming for low power, high accuracy, and quick response time. We propose two well-known techniques, voice-based speaker recognition and an accelerometer signal from a device mounted on the chair, and a third one, an optical system based on an IR LED transmitter/receiver circuit. The preliminary results proved...
-
Playback Attack Detection: The Search for the Ultimate Set of Antispoof Features
Publication: Automatic speaker verification systems are vulnerable to several kinds of spoofing attacks. Some of them can be quite simple: for example, the playback of an eavesdropped recording does not require any specialized equipment or knowledge, but may still pose a serious threat to a biometric identification module built into an e-banking application. In this paper we follow the recent approach and convert recordings to images, assuming...
-
Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice
Publication: The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project involved creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron,...
-
Speaker Recognition Using Convolutional Neural Network with Minimal Training Data for Smart Home Solutions
Publication: With technological advancements in the smart home sector, voice control and automation are key components that can make a real difference in people's lives. The voice recognition technology market continues to evolve rapidly, as almost all smart home devices provide speaker recognition capability today. However, most of them provide cloud-based solutions or use very deep neural networks for the speaker recognition task, which are...
-
Examining Influence of Distance to Microphone on Accuracy of Speech Recognition
Publication: The problem of controlling a machine by a distant-talking speaker without the need for handheld or body-worn equipment is considered. A laboratory setup is introduced for examining the performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...
-
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
Publication: In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of four stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...
-
Creating new voices using normalizing flows
Publication: Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...
-
Novel 5.1 Downmix Algorithm with Improved Dialogue Intelligibility
Publication: A new algorithm for 5.1 to stereo downmix is introduced, which addresses the problem of dialogue intelligibility. The algorithm utilizes proposed signal processing algorithms to enhance the intelligibility of movie dialogues, especially in difficult listening conditions or in a compromised speaker setup. To account for the latter, a playback configuration utilizing a portable device, i.e. an ultrabook, is examined. The experiments...
-
Developing a Low SNR Resistant, Text Independent Speaker Recognition System for Intercom Solutions - A Case Study
Publication: This article presents a case study on the development of a biometric voice verification system for an intercom solution, utilizing the DeepSpeaker neural network architecture. Despite the variety of solutions available in the literature, there is a noted lack of evaluations for "text-independent" systems under real conditions and with varying distances between the speaker and the microphone. This article aims to bridge this gap....
-
Application of dynamic time warping and cepstrograms to text-dependent speaker verification
Publication: This work provides a description of an automatic speaker verification (ASV) system. In particular, it documents the evolution of all individual stages of the proposed ASV system design, from the preprocessing phase to an operational decision-making system. The aim of this research was to obtain a system with the best safety and ease of use from the users' point of view. The objective estimation of this target has been accomplished by assessing...
-
Auto adaptation of mobile device characteristics to various acoustic conditions
Publication: The proposed methodology of auto adaptation of the mobile device characteristics to various acoustic conditions is presented in the paper. The first goal of this study was to determine the parameters of the acoustic path of the mobile device, for both the transmitter (speaker) and the receiver (microphone). Results of the measurement of characteristics of mobile devices were presented. Information about characteristics of individual parts...
-
Texture Features for the Detection of Playback Attacks: Towards a Robust Solution
Publication: This paper describes the new version of a method that is capable of protecting automatic speaker verification (ASV) systems from playback attacks. The presented approach uses computer vision techniques, such as the texture feature extraction based on Local Ternary Patterns (LTP), to identify spoofed recordings. Our goal is to make the algorithm as independent of the contents of the training set as possible; we look for the...
-
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
Publication: The problem of accurately differentiating between speaker utterances and noise segments in a speech signal is considered. The influence of using voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
-
Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization
Publication: An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert, and each examined speaker repeated a particular word, trying to imitate the correct pronunciation. The next step consisted...
-
Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling
Publication: A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...
-
Biometric identity verification
Publication: This chapter discusses methods which are capable of protecting automatic speaker verification (ASV) systems from playback attacks. Additionally, it presents a new approach, which uses computer vision techniques, such as the texture feature extraction based on Local Ternary Patterns (LTP), to identify spoofed recordings. We show that in this case training the system with large amounts of spectrogram patches may be difficult, and...
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
Publication: In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
-
Playback detection using machine learning with spectrogram features approach
Publication: This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as the speech signal representation. Three feature extraction and classification methods, histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with an AdaBoost classifier, and deep convolutional neural networks (CNN), were compared on different data partitions in respect...
-
Analysis of allophones based on audio signal recordings and parameterization
Publication: The aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...
-
Improving the quality of speech in the conditions of noise and interference
Publication: The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...
-
The shallow sea experiment with usage of linear hydrophone array
Publication: The purpose of this article is to present a linear hydrophone array that was designed and built, and the results obtained during in situ trials in the Gulf of Gdańsk. The measuring system made it possible to position hydrophones at selected points and to perform measurements with the antenna oriented both horizontally and vertically. Recordings made in this way allow accurate 3D imaging of sound intensity/propagation to be created. During the research, three floating objects...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
Publication: This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Detecting Lombard Speech Using Deep Learning Approach
Publication: Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...
-
How Machine Learning Contributes to Solve Acoustical Problems
Publication: Machine learning is the process of learning functional relationships between measured signals (called percepts in the artificial intelligence literature) and some output of interest. In some cases, we wish to learn very specific relationships from signals, such as identifying the language of a speaker (e.g. Zissman, 1996), which has direct applications such as in call center routing or performing a music information retrieval task...
-
Audio Feature Analysis for Precise Vocalic Segments Classification in English
Publication: An approach to identifying the most meaningful Mel-Frequency Cepstral Coefficients representing selected allophones and vocalic segments for their classification is presented in the paper. For this purpose, experiments were carried out using algorithms such as Principal Component Analysis, Feature Importance, and Recursive Parameter Elimination. The data used were recordings made within the ALOFON corpus containing audio signal...