MODALITY corpus - SPEAKER 10 - SEQUENCE S2

Description

The MODALITY corpus is a multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. Because every utterance is labelled, the corpus can be used to develop an audio-visual speech recognition (AVSR) system. Recordings in noisy conditions can be used to test the robustness of speech recognition systems.

The language material is based on a remote control scenario and includes 231 words: numbers, names of months and days, and a set of verbs and nouns related to controlling a computer device. Speakers read them as isolated words and as sequences, resulting in a set of 12 recording sessions per speaker. Half of the sessions were recorded in quiet conditions; the other half contained three kinds of intrusive signals (traffic, babble and factory noise).

The corpus includes recordings of 42 speakers (33 male, 9 female). The participants include 20 students and staff of the Multimedia Systems Department of the Gdańsk University of Technology, 5 students of the Institute of English and American Studies of the University of Gdańsk, and 17 native English speakers.

The dataset consists of recordings and visual features for SPEAKER 10:

sex: male
native speaker: no
age: 26

The language material: SEQUENCE S2

All recordings for all speakers are available at http://www.modality-corpus.org/

Sample still from the corpus (SPEAKER 10)

Due to the size of the corpus (approx. 2.5 TB of data), each speaker’s recordings were placed in a separate ZIP file of approx. 4-7 GB.

The recordings are organized according to the speakers’ language skills. Group A (17 speakers) consists of native speakers. Recordings of non-native speakers (Polish nationals) were placed in Group B (25 speakers).

The audio files use the Waveform Audio File Format (.wav) and contain a single PCM audio stream sampled at 44.1 kSa/s with 16-bit depth. The video files use the Matroska Multimedia Container Format (.mkv), which holds a 1080p video stream captured at 100 fps and compressed with the H.264 codec (High 4:4:4 profile). The ‘.lab’ files are text files that give word positions in the audio files and follow the HTK label format. Each line of a ‘.lab’ file contains the start and end times (in 100 ns units) followed by the label, e.g.: 1239620000 1244790000 FIVE, which denotes the word “five” occurring between 123.962 s and 124.479 s of the audio.
Word-accurate SNR values calculated for every recording are also included in the ZIP file.
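
The label format described above can be read with a few lines of Python. The following is a minimal sketch, assuming each line holds exactly a start time, an end time and a label separated by whitespace; the names Label and parse_lab are hypothetical and not part of the corpus tooling.

```python
from typing import List, NamedTuple

# HTK label times are expressed in 100 ns units, i.e. 10,000,000 units per second.
HTK_UNITS_PER_SECOND = 10_000_000


class Label(NamedTuple):
    start_s: float  # word start, in seconds
    end_s: float    # word end, in seconds
    word: str       # the label text, e.g. "FIVE"


def parse_lab(text: str) -> List[Label]:
    """Parse the contents of a '.lab' file into a list of Label tuples."""
    labels = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        start, end, word = parts
        labels.append(
            Label(
                int(start) / HTK_UNITS_PER_SECOND,
                int(end) / HTK_UNITS_PER_SECOND,
                word.strip(),
            )
        )
    return labels


# The example line quoted in the description above.
labels = parse_lab("1239620000 1244790000 FIVE\n")
```

For a real file, the text can be obtained with open(path, encoding="utf-8").read(); the encoding is an assumption, as the description does not state it.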

Dataset file

SP10_SEQUENCE2.ZIP

561.4 MB, S3 ETag 4b449a6774b93f9ef4ddf8fa21b08498-2, downloads: 52

The file hash is calculated from the formula
hexmd5(md5(part1)+md5(part2)+...)-{parts_count} where a single part of the file is 512 MB in size.

Example script for calculation:
https://github.com/antespi/s3md5
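
As an alternative to the linked script, the formula above can be sketched in Python using only the standard library. This is an illustrative sketch, not the repository's own tool: the function name s3_etag is hypothetical, and "512 MB" is assumed to mean 512 × 1024 × 1024 bytes.

```python
import hashlib
from typing import BinaryIO

# Part size from the formula above; assumes 1 MB = 1024 * 1024 bytes.
PART_SIZE = 512 * 1024 * 1024


def s3_etag(stream: BinaryIO, part_size: int = PART_SIZE,
            chunk_size: int = 1024 * 1024) -> str:
    """Return hexmd5(md5(part1) + md5(part2) + ...)-{parts_count}."""
    part_digests = []
    while True:
        md5 = hashlib.md5()
        remaining = part_size
        # Read one part in small chunks to keep memory use low.
        while remaining > 0:
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break
            md5.update(chunk)
            remaining -= len(chunk)
        if remaining == part_size:
            break  # nothing was read for this part: end of file
        part_digests.append(md5.digest())
        if remaining > 0:
            break  # a short part is the last one
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


# Usage (path is an example; compare the result with the ETag listed above):
# with open("SP10_SEQUENCE2.ZIP", "rb") as f:
#     print(s3_etag(f))
```

The raw 16-byte digests of the parts are concatenated, not their hexadecimal strings, before the final MD5 is taken.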

File details

License: Custom

Details

Year of publication: 2016
Verification date: 2021-06-22
Dataset language: English
Fields of science: information and communication technology (Engineering and Technology)
DOI: 10.34808/rjc5-kf21
Series: MODALITY corpus
Verified by: Gdańsk University of Technology

References

An audio-visual corpus for multimodal automatic speech recognition
A comparative study of English viseme recognition methods and algorithms

Authors

Andrzej Czyżewski prof. dr hab. inż.
Politechnika Gdańska - Katedra Systemów Multimedialnych
ORCID: 0000-0001-9159-8658
Project Leader
Bożena Kostek prof. dr hab. inż.
Politechnika Gdańska - Laboratorium Akustyki Fonicznej
ORCID: 0000-0001-6288-2908
Creator
Piotr Bratoszewski mgr inż.

Creator
Marcin Szykulski mgr inż.

Creator
Józef Kotus dr hab. inż.
Politechnika Gdańska - Katedra Systemów Multimedialnych
ORCID: 0000-0001-8087-3095
Creator
Szymon Zaporowski mgr inż.
Politechnika Gdańska - Katedra Systemów Multimedialnych
ORCID: 0000-0003-0814-1097
Creator
Paweł Spaleniak mgr inż.
Politechnika Gdańska - Katedra Systemów Multimedialnych
ORCID: 0000-0002-2487-8956
Creator
Piotr Odya dr inż.
Politechnika Gdańska - Katedra Systemów Multimedialnych
ORCID: 0000-0003-0288-6178