Deep Video Multi-task Learning Towards Generalized Visual Scene Enhancement and Understanding

Efkleidis Katsaros

Deep Video Multi-task Learning Towards Generalized Visual Scene Enhancement and Understanding

Abstract

The goal of this thesis was to develop efficient video multi-task convolutional architectures for a range of diverse vision tasks, on RGB scenes, leveraging i) task relationships and ii) motion information to improve multi-task performance. The approach we take starts from the integration of diverse tasks within video multi-task learning networks. We present the first two datasets of their kind in the existing literature, featuring frame-level annotations for both visual scene enhancement and understanding. This thesis proposes novel architectures, capable of accommodating multiple tasks across various hierarchy levels. The second contribution of this thesis extends those findings into the MOST (Multi-Output, -Scale, -Task) model which exploits the inherent multi-scale nature of convolutional networks in a manner that benefits video multi- tasking. Thereafter, we propose a principled pruning approach inspired by NAS (Neural Architecture Search), named NSS (Neural Structure Search). NSS discovers a more effective MOST network, which boosts performance while simultaneously reducing computational requirements and parameter count. Lastly, we introduce ATB (Adaptive Task Balancing), an efficient training method that ensures tasks are trained at consistent rates with almost no additional computational cost, enabling a more balanced multi-task training process.

Author (1)

Efkleidis Katsaros

Cite as

Full text

download paper

downloaded 37 times

Publication version: Accepted or Published Version
License: Copyright (Author(s))

Keywords

Details

Category:: Thesis, nostrification
Type:: praca doktorska pracowników zatrudnionych w PG oraz studentów studium doktoranckiego
Language:: English
Publication year:: 2024
Verified by:: Gdańsk University of Technology

seen 69 times

Recommended for you

Multi-task Video Enhancement for Dental Interventions

2022

From Linear Classifier to Convolutional Neural Network for Hand Pose Recognition

P. Rościszewski

2017

Concurrent Video Denoising and Deblurring for Dynamic Scenes

2021

Robust and Efficient Machine Learning Algorithms for Visual Recognition

S. Cygert

2022

Meta Tags