A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, a depth imaging stream utilizing Time-of-Flight...
A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from the mouth region. Different Active Appearance Models are employed for finding lips in video frames and for the statistical description of lip shape and texture. A search initialization procedure is proposed and error measure values are...
In recent years, increasingly complex algorithms for automated analysis of surveillance data have been developed. The rapid growth in the number of monitoring installations and higher expectations for the quality parameters of the captured data result in an enormous computational cost of analyzing the massive volume of data. In this paper, a new model of online processing of surveillance data streams is proposed, which assumes the...