Wyniki wyszukiwania dla: recordings
-
MODALITY corpus - SPEAKER 02 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 09 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 21 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 04 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 24 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 32 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 22 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 26 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 01 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 08 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 24 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 07 - COMMANDS C1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
Video analytics-based algorithm for monitoring egress from buildings
PublikacjaA concept and a practical implementation of the algorithm for detecting of potentially dangerous situations related to crowding in passages is presented. An example of such a situation is a crush which may be caused by an obstructed pedestrian pathway. The surveillance video camera signal analysis performed in the online mode is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of the...
-
Traffic Noise Analysis Applied to Automatic Vehicle Counting and Classification
PublikacjaProblems related to determining traffic noise characteristics are discussed in the context of automatic dynamic noise analysis based on noise level measurements and traffic prediction models. The obtained analytical results provide the second goal of the study, namely automatic vehicle counting and classification. Several traffic prediction models are presented and compared to the results of in-situ noise level measurements. Synchronized...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
The American Sign Language alphabet
Dane BadawczeThe American Sign Language dataset contains all static letters of the American alphabet, meaning those that do not require movement to perform (the entire alphabet except for the letters 'J' and 'Z', which are dynamic and require hand movement).
-
Visual Lip Contour Detection for the Purpose of Speech Recognition
PublikacjaA method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...
-
Video Analytics-Based Algorithm for Monitoring Egress from Buildings
PublikacjaA concept and practical implementation of the algorithm for detecting of potentially dangerous situations of crowding in passages is presented. An example of such situation is a crush which may be caused by obstructed pedestrian pathway. Surveillance video camera signal analysis performed on line is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of implemented algorithm which uses...
-
ALOFON corpus
Dane BadawczeThe ALOFON corpus is one of the multimodal database of word recordings in English, available at http://www.modality-corpus.org/. The ALOFON corpus is oriented towards the recording of the speech equivalence variants. For this purpose, a total of 7 people who are or speak English with native speaker fluency and a variety of Standard Southern British...
-
Analysis of the influence of external conditions on temperature readings in thermograms and adaptive adjustment of the measured temperature value
PublikacjaMeasuring human temperature is a crucial step in preventing the spread of diseases such as COVID-19. For the proper operation of an automatic body temperature measurement system throughout the year, it is necessary to consider outdoor conditions. In this paper, the effect of atmospheric factors on facial temperature readings using infrared thermography is investigated. A thorough analysis of the variation of facial temperature...
-
Localization of impulsive disturbances in audio signals using template matching
PublikacjaIn this paper, a new solution to the problem of elimination of impulsive disturbances from audio signals, based on the matched filtering technique, is proposed. The new approach stems from the observation that a large proportion of noise pulses corrupting audio recordings have highly repetitive shapes that match several typical “patterns”. In many cases a representative set of exemplary pulse waveforms can be extracted from the...
-
An audio-visual corpus for multimodal automatic speech recognition
Publikacjareview of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...
-
Column base fixity in steel moment frames: Observations from instrumented buildings
PublikacjaThe rotational fixity of column base connections in Steel Moment Resisting Frames (SMRFs) strongly influences their seismic response. However, approaches for estimating base fixity have been validated only against laboratory test data. These approaches are examined based on strong motion recordings from four instrumented SMRF buildings in California to informbest practices for seismic response simulation. These buildings represent...
-
Playback Attack Detection: The Search for the Ultimate Set of Antispoof Features
PublikacjaAutomatic speaker verification systems are vulnerable to several kinds of spoofing attacks. Some of them can be quite simple – for example, the playback of an eavesdropped recording does not require any specialized equipment nor knowledge, but still may pose a serious threat for a biometric identification module built into an e-banking application. In this paper we follow the recent approach and convert recordings to images, assuming...
-
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
PublikacjaAutomatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...
-
Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing
PublikacjaIn this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...
-
Compensation of Voltage Drops in Trolleybus Supply System Using Battery-Based Buffer Station
PublikacjaThis paper analyzes the results of a trial operation of a battery-based buffer station supporting a selected section of trolleybus power supply systems in Pilsen, Czech Republic. The buffer station aims to prevent the catenary from excessive voltage drops in a part of the route that is most remote from the traction substation. Compensation of voltage drops is carried out by continuously measuring the catenary voltage and injecting...
-
Analyzing the relationship between sound, color, and emotion based on subjective and machine-learning approaches
PublikacjaThe aim of the research is to analyze the relationship between sound, color, and emotion. For this purpose, a survey application was prepared, enabling the assignment of a color to a given speaker’s/singer’s voice recordings. Subjective tests were then conducted, enabling the respondents to assign colors to voice/singing samples. In addition, a database of voice/singing recordings of people speaking in a natural way and with expressed...
-
Seafloor Characterisation Using Underwater Acoustic Devices
PublikacjaThe problem of seafloor characterisation is important in the context of management as well as investigation and protection of the marine environment. In the first part of the paper, a review of underwater acoustic technology and methodology used in seafloor characterisation is presented. It consists of the techniques based on the use of singlebeam echosounders and seismic sources, along with those developed for the use of sidescan...
-
1D convolutional context-aware architectures for acoustic sensing and recognition of passing vehicle type
PublikacjaA network architecture that may be employed to sensing and recognition of a type of vehicle on the basis of audio recordings made in the proximity of a road is proposed in the paper. The analyzed road traffic consists of both passenger cars and heavier vehicles. Excerpts from recordings that do not contain vehicles passing sounds are also taken into account and marked as ones containing silence....
-
Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing
PublikacjaDeveloping signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....
-
Effects of Column Base Flexibility on Seismic Response of Steel Moment-Frame Buildings
PublikacjaSteel Moment Resisting Frames (SMRFs) are very popular lateral load resisting systems in many seismically active regions. However, their seismic response is strongly dependent on the rotational fixity of column base connections. Despite many studies (both experimental and numerical) in this particular area, available approaches for estimating column base flexibility have been validated only against laboratory test data. In the...
-
APPLICATION OF ENTROPY-BASED METHODS TO DISTINGUISH HEALTHY INDIVIDUALS WITH NORMAL SINUS RHYTHM FROM PATIENTS WITH CONGESTIVE HEART FAILURE
PublikacjaIn this paper, we examined whether entropy-based methods are able to differentiate healthy individuals from patients with congestive heart failure. To this aim, we applied two methods: Permutation Entropy and Block Entropy. Long-term ECG recordings (75 000 RR intervals) were analyzed. The results proved that both methods can distinguish those groups on condition that the parameters are appropriately chosen.
-
Rough Set-Based Classification of EEG Signals Related to Real and Imagery Motion
PublikacjaA rough set-based approach to classification of EEG signals registered while subjects were performing real and imagery motions is presented in the paper. The appropriate subset of EEG channels is selected, the recordings are segmented, and features are extracted, based on time-frequency decomposition of the signal. Rough set classifier is trained in several scenarios, comparing accuracy of classification for real and imagery motion....
-
Evaluation Criteria for Affect-Annotated Databases
PublikacjaIn this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...
-
Online sound restoration system for digital library applications
PublikacjaAudio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Jannsen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
FEEDB: A multimodal database of facial expressions and emotions
PublikacjaIn this paper a first version of a multimodal FEEDB database of facial expressions and emotions is presented. The database contains labeled RGB-D recordings of people expressing a specific set of expressions that have been recorded using Microsoft Kinect sensor. Such a database can be used for classifier training and testing in face recognition as well as in recognition of facial expressions and human emotions. Also initial experiences...
-
Automatic Analysis System of TV Commercial Emission Level
PublikacjaThe purpose of the study was to determine whether the commercial emission level is higher than the emission level of a regular program and to check if the commercials broadcasters follow the recommended levels of loudness. The paper shortly reviews some chosen methods of volume measurements specified in the ITU and EBU recommendations. Then, it describes a prototype of a system implemented in Embarcadero C++ Builder 2010 which...
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
PublikacjaThe aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...
-
Automatic Singing Voice Recognition EmployingNeural Networks and Rough Sets
PublikacjaCelem badań jest automatyczne rozpoznawanie głosów śpiewaczych w kategorii rodzaju i jakości technicznej śpiewu. W artykule opisano stworzoną bazę danych głosów, która zawiera próbki głosu śpiewaków profesjonalnych i amatorskich. W dalszej części opisano parametry zdefiniowane w oparciu o zjawiska biomechaniczne w narządzie głosu podczas śpiewania. W oparciu o stworzone macierze parametrów wytrenowano i porównano automatyczne klasyfikatory...
-
Entropy Measures in the Assessment of Heart Rate Variability in Patients with Cardiodepressive Vasovagal Syncope
PublikacjaSample entropy (SampEn) was reported to be useful in the assessment of the complexity of heart rate dynamics. Permutation entropy (PermEn) is a new measure based on the concept of order and was previously shown to be accurate for short, non-stationary datasets. The aim of the present study is to assess if SampEn and PermEn obtained from baseline recordings might differentiate patients with various outcomes of the head-up tilt test...
-
Detection of impulsive disturbances in archive audio signals
PublikacjaIn this paper the problem of detection of impulsive disturbances in archive audio signals is considered. It is shown that semi-causal/noncausal solutions based on joint evaluation of signal prediction errors and leave-one-out signal interpolation errors, allow one to noticeably improve detection results compared to the prediction-only based solutions. The proposed approaches are evaluated on a set of clean audio signals contaminated...
-
Selection of Features for Multimodal Vocalic Segments Classification
PublikacjaEnglish speech recognition experiments are presented employing both: audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction for the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...
-
Online sound restoration system for digital library applications.
PublikacjaAudio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Jannsen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
-
Further Developments of the Online Sound Restoration System for Digital Library Applications
PublikacjaNew signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Jannsen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...
-
Detecting Lombard Speech Using Deep Learning Approach
PublikacjaRobust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...
-
Face detection algorithms evaluation for the bank client verification
PublikacjaResults of investigation of face detection algorithms in the video sequences are presented in the paper. The recordings were made with a miniature industrial USB camera in real conditions met in three bank operating rooms. The aim of the experiments was to check the practical usability of the face detection method in the biometric bank client verification system. The main assumption was to provide as much as possible user interaction...
-
STEADY STATE VISUALLY EVOKED POTENTIALS FOR BRAIN COMPUTER INTERFACE
PublikacjaAn experiment conducted to validate a possibility of use a single active electrode EEG device for detecting Steady State Visually Evoked Potentials (SSVEP) is shown. A LED stimulator was applied to stimulate patients with two different frequencies - 13 Hz and 17 Hz. First, EEG signals were recorded and pre-processed using MATLAB software. In the next step recordings were analysed and classified employing the WEKA software. As indicated...
-
Expert System and Decision Support System for Electrocardiogram Interpretation and Diagnosis: Review, Challenges and Research Directions
PublikacjaElectrocardiography (ECG) is one of the most widely used recordings in clinical medicine. ECG deals with the recording of electrical activity that is generated by the heart through the surface of the body. The electrical activity generated by the heart is measured using electrodes that are attached to the body surface. The use of ECG in the diagnosis and management of cardiovascular disease (CVD) has been in existence for over...
-
Audio Feature Analysis for Precise Vocalic Segments Classification in English
PublikacjaAn approach to identifying the most meaningful Mel-Frequency Cepstral Coefficients representing selected allophones and vocalic segments for their classification is presented in the paper. For this purpose, experiments were carried out using algorithms such as Principal Component Analysis, Feature Importance, and Recursive Parameter Elimination. The data used were recordings made within the ALOFON corpus containing audio signal...