review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...
In recent years, increasingly complex algorithms for automated analysis of surveillance data are being developed. The rapid growth in the number of monitoring installations and higher expectations of the quality parameters of the captured data result in an enormous computational cost of analyzing the massive volume of data. In this paper a new model of online processing of surveillance data streams is proposed, which assumes the...
Method for profile view retrieving of the human face is presented. The depth data from the 3D camera is taken as an input. The preprocessing is, besides of standard filtration, extended by the process of filling of the holes which are present in depth data. The keypoints, defined as the nose tip and the chin are detected in user’s face and tracked. The Kalman filtering is applied to smooth the coordinates of those points which...
seen 98 times