Description
We introduce Vident-synth, a large dataset of synthetic dental videos with corresponding ground truth forward and backward optical flows and occlusion masks. It can be used for:
- evaluation of optical flow models in challenging dynamic scenes characterized by fast and complex motions, independently moving objects, variable illumination, specular reflections, fluid dynamics, and sparse textures
- evaluation of blind measures for unsupervised learning
- development of long temporal models for optical flow estimation and dense point trackers
- training models in a supervised or semi-supervised manner, including domain adaptation and transfer learning
- cross-domain learning
The simulated motions are complex, combining rotation, scaling, perspective changes due to the small distance between the camera and the observed objects, and frequent, fast changes in depth. The dataset is an order of magnitude larger than the Sintel dataset. We used Blender to manually craft animations whose similarity to real dental scene videos from the Vident-real dataset ranges from close to loose. To generate ground truth optical flow, we used the Vision Blender library. Creating the synthetic videos involved the following steps:
- manual preparation of models of the mouth interior with exact 3D scans of real, extracted teeth with different textures
- modelling of independently moving objects like dental tools, tongues, rolls, and tubes
- rendering the sequences with different kinds of artifacts
The modelled sequences were rendered in Blender in three variants. The first variant generated sequences with constant light and without blur or water. The second variant introduced motion and focus blur by changing camera parameters over time (depth of field, f-stop, and blades), with a point light source attached to the camera. The third variant generated water spills, which are treated as artifacts and thus excluded from the dense motion of the main dental scene. We used BlenderKit for skin and metal textures. Tooth textures were transferred from photos of real teeth and mapped onto the 3D tooth models.
Optical flow is integral to numerous video processing tasks such as restoration, super-resolution, and stabilization. Although recent advancements in optical flow estimation have shown significant efficacy in general scenes, their applicability to challenging medical scenarios, which exhibit unique domain-specific visual phenomena, remains limited. Supervised learning methods facilitate the robust training of motion estimators. However, the absence of ground truth optical flow in many medical video-assisted applications poses a significant barrier to their progress. This is particularly evident in Video-Assisted Dentistry (VAD), where enhanced and continual vision could improve the educational, training, and fully operational dental workflows. Therefore, the development of domain-specific synthetic datasets with available ground truth optical flow appears to be a natural first step towards adapting general-purpose optical flow models to domain-specific real scenes using domain adaptation techniques.
Dataset file
The file checksum is calculated as:
hexmd5(md5(part1)+md5(part2)+...)-{parts_count}
where a single part of the file is 512 MB in size.
Example script for calculation:
https://github.com/antespi/s3md5
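The checksum formula above can also be sketched in Python. This is a minimal illustration, not the repository's own tool. It assumes that the per-part MD5 values are concatenated as raw binary digests before the final hash (the convention used for S3-style multipart checksums) and that 512 MB means 512 * 1024 * 1024 bytes. The function name `multipart_md5` and its parameters are hypothetical.

```python
import hashlib

# Assumption: "512 MB" is interpreted as 512 * 1024 * 1024 bytes.
PART_SIZE = 512 * 1024 * 1024


def multipart_md5(path, part_size=PART_SIZE, block_size=1024 * 1024):
    """Return hexmd5(md5(part1)+md5(part2)+...)-{parts_count} for a file.

    Assumption: the per-part digests are concatenated in binary form.
    """
    part_digests = []
    with open(path, "rb") as f:
        while True:
            part_hash = hashlib.md5()
            remaining = part_size
            read_any = False
            # Stream one part in small blocks to keep memory use low.
            while remaining > 0:
                block = f.read(min(block_size, remaining))
                if not block:
                    break
                part_hash.update(block)
                remaining -= len(block)
                read_any = True
            if not read_any:
                break  # end of file, no further parts
            part_digests.append(part_hash.digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Calling `multipart_md5("downloaded_file")` on the downloaded archive (the file name here is a placeholder) returns a string that can be compared with the checksum published for the dataset file. If the values differ, the part-size interpretation above is the first assumption to check.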
File details
- License:
- CC BY-NC (Non-commercial)
- File embargo:
- 2025-09-30
Details
- Year of publication:
- 2024
- Verification date:
- 2024-07-01
- Dataset language:
- English
- Fields of science:
- information and communication technology (Engineering and Technology)
- automation, electronics, electrical engineering and space technologies (Engineering and Technology)
- medical sciences (Medical and Health Sciences)
- biomedical engineering (Engineering and Technology)
- DOI:
- 10.34808/8yba-cr72
- Verified by:
- Gdańsk University of Technology
References
- dataset Vident-lab: a dataset for multi-task video processing of phantom dental scenes
- dataset Vident-real: an intra-oral video dataset for multi-task learning