Title: Deformable Linear Object Perception Pipeline in 3D: Segmentation, Reconstruction, and Inverse Modelling.
Abstract: 3D perception of deformable linear objects (DLOs) is crucial for DLO manipulation. However, perceiving DLOs in 3D from a single RGBD image is challenging. Previous DLO perception methods sometimes fail to produce a decent 3D DLO model due to occlusions, sparse or false depth information, or physically unrealistic estimations. To address these problems and provide a better DLO state estimate for downstream tasks such as tracking and manipulation (e.g. DLO shape control, which requires a good initialization of the DLO state), we propose a 3D DLO perception pipeline that first segments the DLO in 2D images based on only 5 labelled images, with minimal human effort and without any pretraining; then reconstructs the DLO in 3D space, using geometric completion to predict the missing parts of the DLO; and finally inverse-models the DLO with Discrete Elastic Rods (DER) to obtain a more physically realistic perception.
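The reconstruction stage described above could be sketched roughly as follows: back-project the segmented 2D centerline into 3D using the RGBD depth, then fill in the parts where depth is missing or occluded. This is a minimal illustrative sketch only, assuming a pinhole camera model and using linear interpolation as a crude stand-in for the paper's geometric completion; the function names and interfaces are hypothetical, not the authors' implementation.

```python
import numpy as np

def backproject(centerline_px, depth, fx, fy, cx, cy):
    """Back-project 2D centerline pixels with depth into 3D camera coordinates.
    Depth entries of 0 mark missing or false depth, as is common with RGBD
    sensors on thin objects like DLOs."""
    pts = []
    for (u, v), z in zip(centerline_px, depth):
        if z > 0:
            pts.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        else:
            pts.append([np.nan] * 3)  # placeholder for missing depth
    return np.asarray(pts)

def complete_geometry(pts3d):
    """Fill missing 3D points by interpolating each coordinate along the
    curve parameter -- a simple stand-in for geometric completion."""
    t = np.arange(len(pts3d))
    valid = ~np.isnan(pts3d[:, 0])
    out = pts3d.copy()
    for d in range(3):
        out[:, d] = np.interp(t, t[valid], pts3d[valid, d])
    return out
```

A DER-based inverse model would then refine this completed curve toward an elastically consistent configuration.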
Title: EatSense: Human Centric, Action Recognition and Localization Dataset for Understanding Eating Behaviors and Quality of Motion Assessment.
Abstract: We introduce a new dataset named EatSense that targets both the computer vision and healthcare communities. EatSense is recorded while a person eats in an uncontrolled dining-room setting. Its key features are: First, it introduces challenging atomic actions for recognition. Second, the hugely varying lengths of actions in EatSense make it nearly impossible for current temporal action localization frameworks to localize them. Third, it enables modelling of complete eating behaviour as a chain of atomic actions. Lastly, it simulates minor changes in motion/performance. Moreover, we conduct extensive experiments on EatSense with baseline deep-learning-based approaches for benchmarking and hand-crafted feature-based approaches for explainable applications. We believe this dataset will help future researchers build robust temporal action localization networks and behaviour recognition and performance assessment models for eating.
Title: Towards Inner-Body SLAM in Lungs
Abstract: Bronchoscopy is a medical procedure that involves using a thin, flexible tube called a bronchoscope to examine the inside of a person's lungs and airways for diagnostic or treatment purposes. However, localizing the bronchoscope in the lungs for navigation can be a challenging task. In this presentation I will focus on the challenges faced when implementing vision-based localization approaches, including the lack of features and the limitations associated with monocular vision. Traditional SLAM approaches are often insufficient to address these challenges, and recent deep learning methods have shown promise but require extensive datasets for training. To overcome this, we developed a way to create synthetic bronchus-like structures. Using them, we can generate datasets of annotated bronchoscopy videos in the Unity environment. Furthermore, instead of local image features, the dataset allows us to focus on high-level airway geometry to aid navigation.
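The synthetic bronchus-like structures mentioned above are essentially bifurcating airway trees. As an illustration only (the abstract's pipeline uses Unity; this standalone Python sketch, with hypothetical function names, just shows the kind of recursive branching geometry involved), one can generate a 2D centerline tree by recursively splitting each branch into two shorter, rotated children:

```python
import numpy as np

def grow_airway_tree(origin, direction, length, depth,
                     branch_angle=np.radians(30), rng=None):
    """Recursively generate a bifurcating 2D centerline tree as a toy
    bronchus-like structure. Returns a list of (start, end) segments."""
    if depth == 0:
        return []
    if rng is None:
        rng = np.random.default_rng(0)
    end = origin + direction * length
    segments = [(origin, end)]
    # Spawn two child branches, rotated left/right with mild random jitter.
    for sign in (-1, 1):
        angle = sign * branch_angle * (1 + 0.2 * rng.standard_normal())
        c, s = np.cos(angle), np.sin(angle)
        new_dir = np.array([c * direction[0] - s * direction[1],
                            s * direction[0] + c * direction[1]])
        segments += grow_airway_tree(end, new_dir, length * 0.7, depth - 1,
                                     branch_angle, rng)
    return segments
```

Rendering such trees as tubular meshes inside a simulated camera then yields annotated video frames where the high-level branching topology, rather than local texture, carries the navigation signal.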