Search results for: LIP-READING, FACIAL MOTION CAPTURE, SPEECH RECOGNITION, VOCALIC SEGMENTS

total: 296

Horizontally-split-drain MAGFET - a highly sensitive magnetic field sensor
Publication
- W. Kordalski
- M. Polowczyk
- M. Panek
- Bulletin of the Polish Academy of Sciences-Technical Sciences - Year 2007
We propose a novel magnetic field sensitive semiconductor device, viz., Horizontally-Split-Drain Magnetic-Field Sensitive Field-Effect Transistor (HSDMAGFET) which can be used to measure or detect steady or variable magnetic fields. Operating principle of the transistor is based on one of the galvanomagnetic phenomena and a Gradual Channel Detachment Effect (GCDE) and is very similar to that of Popovic and Baltes's SDMAGFET. The...

Full text available to download
Emotion monitoring system for drivers
Publication
- IFAC-PapersOnLine - Year 2019
This article describes a new approach to the issue of building a driver monitoring system. Actual systems focus, for example, on tracking eyelid and eyebrow movements that result from fatigue. We propose a different approach based on monitoring the state of emotions. Such a system assumes that by using the emotion model based on our own concept, referred to as the reverse Plutchik’s paraboloid of emotions, the recognition of emotions...

Full text available to download
Just look at to open it up: A biometric verification facility for password autofill to protect electronic documents
Publication
- M. Smiatacz
- B. Wiszniewski
- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2021
Electronic documents constitute specific units of information, and protecting them against unauthorized access is a challenging task. This is because a password protected document may be stolen from its host computer or intercepted while on transfer and exposed to unlimited offline attacks. The key issue is, therefore, making document passwords hard to crack. We propose to augment a common text password authentication interface...

Full text available to download
Remote Estimation of Video-Based Vital Signs in Emotion Invocation Studies
Publication
- Year 2018
The goal of this study is to examine the influence of various imitated and video invoked emotions on the vital signs (respiratory and pulse rates). We also perform an analysis of the possibility to extract signals from sequences acquired with cost-effective cameras. The preliminary results show that the respiratory rate allows for better separation of some emotions than the pulse rate, yet this relation highly depends...

Full text available to download
Fuzzy rule-based dynamic gesture recognition employing camera & multimedia projector
Publication
- M. Lech
- B. Kostek
- Year 2010
In the paper the system based on camera and multimedia projector enabling a user to control computer applications by dynamic hand gestures is presented. The main objective is to present the gesture recognition methodology which bases on representing hand movement trajectory by motion vectors analyzed using fuzzy rule-based inference. The approach was engineered in the system developed with J2SE and C++ / OpenCV technology. OpenCV...

Full text to download in external service
Influence of Thermal Imagery Resolution on Accuracy of Deep Learning based Face Recognition
Publication
- Year 2019
Human-system interactions frequently require a retrieval of the key context information about the user and the environment. Image processing techniques have been widely applied in this area, providing details about recognized objects, people and actions. Considering remote diagnostics solutions, e.g. non-contact vital signs estimation and smart home monitoring systems that utilize person’s identity, security is a very important factor....

Full text available to download
Thermal Images Analysis Methods using Deep Learning Techniques for the Needs of Remote Medical Diagnostics
Publication
- A. Kwaśniewska
- Year 2020
Remote medical diagnostic solutions have recently gained more importance due to global demographic shifts and play a key role in evaluation of health status during epidemic. Contactless estimation of vital signs with image processing techniques is especially important since it allows for obtaining health status without the use of additional sensors. Thermography enables us to reveal additional details, imperceptible in images acquired...

Full text available to download
Analysis of allophones based on audio signal recordings and parameterization
Publication
- Journal of the Acoustical Society of America - Year 2017
The aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...

Full text to download in external service
Orken Mamyrbayev Professor

People

1. Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2. Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...
Sensing Direction of Human Motion Using Single-Input-Single-Output (SISO) Channel Model and Neural Networks
Publication
- S. A. Bhat
- M. A. Dar
- P. Szczuko
- D. Alyahya
- F. Mustafa
- IEEE Access - Year 2022
Object detection Through-the-Walls enables localization and identification of hidden objects behind the walls. While numerous studies have exploited Channel State Information of Multiple Input Multiple Output (MIMO) WiFi and radar devices in association with Artificial Intelligence based algorithms (AI) to detect and localize objects behind walls, this study proposes a novel non-invasive Through-the-Walls human motion direction...

Full text available to download
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication
- P. Rościszewski
- Procedia Computer Science - Year 2017
In the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

Full text available to download
Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów
Publication
- G. Korvel
- B. Kostek
- Year 2017
The aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
Voice command recognition using hybrid genetic algorithm
Publication
- M. Wroniszewska
- J. Dziedzic
- TASK Quarterly - Year 2010
Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

Full text available to download
Data regarding a new, vector-enzymatic DNA fragment amplification-expression technology for the construction of artificial, concatemeric DNA, RNA and proteins, as well as biological effects of selected polypeptides obtained using this method
Publication
- P. Skowron
- N. Krawczun
- J. Żebrowska
- D. Krefft
- O. Żołnierkiewicz
- M. Bielawa
- J. Jeżewska-Frąckowiak
- Ł. Janus
- M. Witkowska
- M. Palczewska... and 10 others
- Data in Brief - Year 2020
Applications of bioactive peptides and polypeptides are emerging in areas such as drug development and drug delivery systems. These compounds are bioactive, biocompatible and represent a wide range of chemical properties, enabling further adjustments of obtained biomaterials. However, delivering large quantities of peptide derivatives is still challenging. Several methods have been developed for the production of concatemers –...

Full text available to download
Determination of Peak Impact Force for Buildings Exposed to Structural Pounding during Earthquakes
Publication
- S. M. Khatami
- H. Naderpour
- C. R. Barros
- A. Jakubczyk-Gałczyńska
- R. Jankowski
- Geosciences - Year 2019
Structural pounding between adjacent, insufficiently separated buildings, or bridge segments, has been repeatedly observed during seismic excitations. Such earthquake-induced collisions may cause severe structural damage or even lead to the collapse of colliding structures. The aim of the present paper was to show the results of the study focused on determination of peak impact forces during collisions between buildings exposed...

Full text available to download
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication
- P. Rościszewski
- J. Kaliski
- Year 2017
In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...

Full text to download in external service
Noise profiling for speech enhancement employing machine learning models
Publication
- K. Kąkol
- G. Korvel
- B. Kostek
- Journal of the Acoustical Society of America - Year 2022
This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

Full text available to download
Structural insights, biocatalytic characteristics, and application prospects of lignin-modifying enzymes for sustainable biotechnology
Publication
- A. Kumar Singh
- H. M. N. Iqbal
- N. Cardullo
- V. Muccilli
- J. Fernández-Lucas
- J. Ejbye Schmidt
- T. Jesionowski
- M. Bilal
- INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES - Year 2023
Lignin modifying enzymes (LMEs) have gained widespread recognition in depolymerization of lignin polymers by oxidative cleavage. LMEs are a robust class of biocatalysts that include lignin peroxidase (LiP), manganese peroxidase (MnP), versatile peroxidase (VP), laccase (LAC), and dye-decolorizing peroxidase (DyP). Members of the LMEs family act on phenolic, non-phenolic substrates and have been widely researched for valorization...

Full text available to download
Superresolution algorithm to video surveillance system
Publication
- T. Merta
- A. Czyżewski
- Year 2010
An application of a multiframe SR (superresolution) algorithm applied to video monitoring is described. The video signal is generated by various types of video cameras with different parameters and signal distortions, which may be very problematic for superresolution algorithms. The paper focuses on disadvantages in video signal which occur in video surveillance systems. Especially motion estimation and its influence on superresolution...
Transfer learning in imagined speech EEG-based BCIs
Publication
- J. S. Garcia Salinas
- L. Villaseñor-Pineda
- C. A. Reyes-García
- A. A. Torres-García
- Biomedical Signal Processing and Control - Year 2019
The Brain–Computer Interfaces (BCI) based on electroencephalograms (EEG) are systems whose aim is to provide a communication channel to any person with a computer. They were initially proposed to aid people with disabilities, but wider applications have now been proposed. These devices allow sending messages or controlling devices using brain signals. There are different neuro-paradigms which evoke brain signals of interest...

Full text available to download
Normalization of face illumination using basic knowledge and information extracted from a single image
Publication
- M. Smiatacz
- INFORMATION SCIENCES - Year 2018
This paper presents a method for face image normalization that can be applied to the extraction of illumination invariant facial features or used to remove bad lighting effects and produce high-quality, photorealistic results. Most of the existing approaches concentrate on separating the constant albedo from the variable light intensity; that concept, however, is based on the Lambertian model, which fails in the presence of specularities...

Full text to download in external service
Brownian Motion in Optical Tweezers, a Comparison between MD Simulations and Experimental Data in the Ballistic Regime
Publication
- K. Zembrzycki
- S. Pawłowska
- F. Pierini
- T. A. Kowalewski
- Polymers - Year 2023
The four most popular water models in molecular dynamics were studied in large-scale simulations of Brownian motion of colloidal particles in optical tweezers and then compared with experimental measurements in the same time scale. We present the most direct comparison of colloidal polystyrene particle diffusion in molecular dynamics simulations and experimental data on the same time scales in the ballistic regime. The four most...

Full text available to download
Multiscale model for blood flow after a bileaflet artificial aortic valve implantation
Publication
- M. L. Nowak
- E. Divo
- W. P. Adamczyk
- COMPUTERS IN BIOLOGY AND MEDICINE - Year 2023
Cardiovascular diseases are the leading cause of mortality in the world, mainly due to atherosclerosis and its consequences. The article presents the numerical model of the blood flow through artificial aortic valve. The overset mesh approach was applied to simulate the valve leaflets motion and to realize the moving mesh, in the aortic arch and the main branches of cardiovascular system. To capture the cardiac system’s response...

Full text to download in external service
Genetic programming extension to APF-based monocular human body pose estimation
Publication
- P. Szczuko
- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2012
New method of the human body pose estimation based on a single camera 2D observation is presented, aimed at smart surveillance related video analysis and action recognition. It employs 3D model of the human body, and genetic algorithm combined with annealed particle filter for searching the global optimum of model state, best matching the object's 2D observation. Additionally, new motion cost metric is employed, considering current...

Full text available to download
Separability Assessment of Selected Types of Vehicle-Associated Noise
Publication
- Advances in Intelligent Systems and Computing - Year 2016
Music Information Retrieval (MIR) area as well as development of speech and environmental information recognition techniques brought various tools intended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

Full text to download in external service
Performance Analysis of the OpenCL Environment on Mobile Platforms
Publication
- P. Falkowski-Gilski
- M. Plewka
- Year 2022
Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text to download in external service
Fully Automated AI-powered Contactless Cough Detection based on Pixel Value Dynamics Occurring within Facial Regions
Publication
- M. Szankin
- A. Kwaśniewska
- N. Kowalczyk
- J. Rumiński
- R. Nicolas
- D. Gamba
- Year 2021
Increased interest in non-contact evaluation of the health state has led to higher expectations for delivering automated and reliable solutions that can be conveniently used during daily activities. Although some solutions for cough detection exist, they suffer from a series of limitations. Some of them rely on gesture or body pose recognition, which might not be possible in cases of occlusions, closer camera distances or impediments...

Full text to download in external service
Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model
Publication
- K. Leckey
- R. Neininger
- W. Szpankowski
- Year 2013
Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...
Nutrient transport and acquisition by diatom chains in a moving fluid
Publication
- M. M. Musielak
- L. Karp-Boss
- P. Jumars
- L. Fauci
- JOURNAL OF FLUID MECHANICS - Year 2009
The role of fluid motion in delivery of nutrients to phytoplankton cells is a fundamental question in biological and chemical oceanography. In the study of mass transfer to phytoplankton, diatoms are of particular interest. They are non-motile, are often the most abundant components in aggregates and often form chains, so they are the ones expected to benefit most from enhancement of nutrient flux due to dissipating turbulence....

Full text to download in external service
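Metoda i algorytmy sterowania procesami miksowania dźwięku za pomocą gestów w oparciu o analizę obrazu wizyjnego [A method and algorithms for controlling sound mixing processes with gestures based on video image analysis]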
Publication
- M. Lech
- Year 2013
The main aim of the dissertation was to develop a system for mixing sound by means of hand gestures performed in the air and to examine the possibilities offered by such a solution in comparison with the contemporary method of mixing audio signals in a computer environment. The developed system recognizes both dynamic and static hand gestures. Dynamic gesture recognition was implemented using fuzzy logic methods...
Identification of Emotional States Using Phantom Miro M310 Camera
Publication
- M. Przyborski
- Internal Security - Year 2013
The purpose of this paper is to present the possibilities associated with the use of remote sensing methods in identifying human emotional states, and to present the results of the research conducted by the authors in this field. The studies presented involved the use of advanced image analysis to identify areas on the human face that change their activity along with emotional expression. Most of the research carried out in laboratories...
IMAGE CORRELATION AS A TOLL FOR TRACKING FACIAL CHANGES CAUSING BY EXTERNAL STIMULI
Publication
- K. Bobkowska
- A. Janowski
- M. Przyborski
- Year 2015
Expressions of the human face bring a lot of information, which are a valuable source in the areas of computer vision, remote sensing and affective computing. For years, by analyzing the movement of the skin and facial muscles scientists are trying to create the perfect tool, based on image analysis, allowing the recognition of emotional states of human beings. To create a reliable algorithm, it is necessary to explore and examine...

Full text to download in external service
Thermal Image Processing for Respiratory Estimation from Cubical Data with Expandable Depth
Publication
- M. Szankin
- A. Kwaśniewska
- J. Rumiński
- Journal of Imaging - Year 2023
As healthcare costs continue to rise, finding affordable and non-invasive ways to monitor vital signs is increasingly important. One of the key metrics for assessing overall health and identifying potential issues early on is respiratory rate (RR). Most of the existing methods require multiple steps that consist of image and signal processing. This might be difficult to deploy on edge devices that often do not have specialized...

Full text available to download
Nonlocal Vibration of Carbon/Boron-Nitride Nano-hetero-structure in Thermal and Magnetic Fields by means of Nonlinear Finite Element Method
Publication
- H. M. Sedighi
- M. Malikan
- A. Valipour
- K. Kamil Żur
- Journal of Machinery Manufacture and Reliability - Year 2020
Hybrid nanotubes composed of carbon and boron-nitride nanotubes have manifested as innovative building blocks to exploit the exceptional features of both structures simultaneously. On the other hand, by mixing with other types of materials, the fabrication of relatively large nanotubes would be feasible in the case of macroscale applications. In the current article, a nonlinear finite element formulation is employed to deal with...

Full text available to download
“Shadow” vs. “Phase 3D” method within endoscopic examinations of marine engines
Publication
- Z. Korczewski
- J. Rudnicki
- Combustion Engines - Year 2013
A visual investigation of surfaces creating internal, working spaces of marine combustion engines by means of specialized view-finders so called endoscopes is at present almost a basic method of technical diagnostics. The surface structure of constructional material is visible during investigations like through the magnifying glass (usually with a precisely determined magnification), which makes possible a detection, recognition...

Full text available to download
Combined Single Neuron Unit Activity and Local Field Potential Oscillations in a Human Visual Recognition Memory Task
Publication
- M. T. Kucewicz
- B. M. Berry
- M. R. Bower
- J. Cymbalnik
- V. Svehlik
- S. M. Stead
- G. A. Worrell
- IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING - Year 2016
GOAL: Activities of neuronal networks range from action potential firing of individual neurons, coordinated oscillations of local neuronal assemblies, and distributed neural populations. Here, we describe recordings using hybrid electrodes, containing both micro- and clinical macroelectrodes, to simultaneously sample both large-scale network oscillations and single neuron spiking activity in the medial temporal lobe structures...

Full text to download in external service
Multimodal learning application with interactive animated character. [Multimodalna aplikacja edukacyjna wykorzystująca interaktywną animowaną postać]
Publication
- P. Szczuko
- Year 2006
The aim of this study is to design a computer application that may assist teachers and therapists in multimodal manner in their work with impaired or disabled children. The application can be operated in many different ways, giving to a child with special educational needs a possibility to learn and train many skills or treat speech disorders. The main stress in this research is on the creation of animated character that will serve...
MODALITY corpus - SPEAKER 35 - COMMANDS C1
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 21 - SEQUENCE S6
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 21 - COMMANDS C5
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 21 - SEQUENCE S4
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 10 - SEQUENCE S1
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 01 - SEQUENCE S2
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 39 - COMMANDS C1
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 01 - SEQUENCE S3
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 01 - COMMANDS C3
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 21 - SEQUENCE S2
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 33 - SEQUENCE S1
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 01 - COMMANDS C2
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 21 - COMMANDS C3
Open Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
