This project aims to improve real-world, wide-field-of-view depth estimation using monocular sensors. To this end, we will experiment with large deep learning models on a variety of indoor and outdoor scene geometries. Throughout, we will focus on data representation in order to investigate and identify the most efficient pipelines.
- Jerome Quenum, University of California - Berkeley
- Brent Yi, University of California - Berkeley
- Avideh Zakhor, University of California - Berkeley
- Austin Stone, Google
- Rico Jonschkowski, Google
- Carolina Parada, Google
Automated collision prediction and avoidance technology is an indispensable part of mobile robots. As an alternative to traditional approaches that rely on multi-modal sensors, purely image-based collision avoidance strategies have recently gained attention in robotics. The most straightforward way to detect obstacles with a monocular camera is to apply learned single-image depth estimation. The most successful of these techniques estimate relative depth, that is, depth known only up to an unknown scale factor. Nevertheless, this simple approach has several shortcomings. First, thin objects go undetected because learned depth estimation operates at a lower resolution than the high-resolution RGB image from the camera. Second, learned single-image depth estimation cannot detect moving objects until they are too close, which could result in a collision. Third, estimating the scale factor needed to compute absolute depth in practical situations is nontrivial and requires additional sensors and corresponding algorithms. In this work, we develop a framework for monocular obstacle detection for autonomous systems that finds the distance and movement of multiple dynamic objects in the field of view of a given monocular sensor.
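To make the third shortcoming concrete, here is a minimal sketch of one common way to recover absolute depth once a second sensor is available. It assumes that the network's output is depth up to a single multiplicative scale, as described above, and that a few sparse metric depth readings (for example, from a hypothetical LiDAR or stereo pair) are available at the same pixels. The single scale factor is then fit by least squares, minimizing $\sum_i (s\,r_i - m_i)^2$, which gives $s = \sum_i r_i m_i / \sum_i r_i^2$. The names `fit_scale` and `to_absolute` are hypothetical and for illustration only; this is not the method developed in this project.

```python
from typing import List, Sequence


def fit_scale(relative: Sequence[float], metric: Sequence[float]) -> float:
    """Least-squares scale s minimizing sum_i (s * relative[i] - metric[i])**2.

    Assumption (hypothetical setup): `relative` holds network depth predictions
    at pixels where an extra sensor supplied the metric depths in `metric`.
    """
    if len(relative) != len(metric) or len(relative) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    # Closed-form solution of the 1-D least-squares problem.
    num = sum(r * m for r, m in zip(relative, metric))
    den = sum(r * r for r in relative)
    if den == 0:
        raise ValueError("relative depths are all zero; scale is undefined")
    return num / den


def to_absolute(relative_map: List[List[float]], scale: float) -> List[List[float]]:
    """Apply the fitted scale to every pixel of a relative depth map."""
    return [[scale * d for d in row] for row in relative_map]


if __name__ == "__main__":
    # Three sparse correspondences between predicted (relative) and measured depth.
    s = fit_scale([1.0, 2.0, 4.0], [2.5, 5.0, 10.0])
    print(s)  # 2.5
    print(to_absolute([[1.0, 2.0], [4.0, 0.5]], s))
```

Without the metric readings, nothing constrains `s`, which is why monocular systems typically need extra sensing or additional assumptions to report distances in meters.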