Adaptive Long-Distance Navigation for Autonomous Drones


This project uses a Deep Reinforcement Learning (DRL) approach to enable a large drone to navigate toward goal positions in unknown outdoor settings while avoiding obstacles. Using state information and depth imagery, our method integrates pre-computed optimal trajectories, determined during a privileged learning phase, as a supervisory signal alongside the exploratory benefits of an RL agent.

Pipeline for Training and Deployment, showing how pre-generated optimal trajectories, depth images, and drone states are processed by the agent to obtain an obstacle-avoidance and navigation policy.


Drones, with their capability for agile 3D motion, are well suited to applications in last-mile delivery, search and rescue (SAR), and 3D mapping and imaging. Yet real-world environmental conditions, such as navigating tight spaces or counteracting unexpected wind disturbances, pose challenges for drone autonomy. Current autonomous drone navigation systems also show clear limitations: they struggle with dynamic environmental shifts and accumulating sensor noise, and their demonstrated capabilities are often restricted to simulation. These constraints become even more pronounced over extended flight durations.

Building on this motivation, our project aims to overcome these challenges. In contrast to traditional methodologies that separate perception, planning, and control, our work adopts a Deep Reinforcement Learning paradigm. To make real-world deployment feasible and efficient, our navigation algorithm is designed to run online and onboard the drone, without full prior knowledge of the environment. An essential distinction in our methodology is the use of privileged learning during training: full environment state, available in simulation, is used to compute obstacle-free and efficient flight trajectories ahead of time. As the RL agent trains, it is penalized for deviating from these optimal paths, balancing exploration with adherence to pre-determined optimal trajectories.
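The penalty structure described above can be sketched as a shaped reward that combines progress toward the goal with a deviation penalty against the privileged reference trajectory. This is a minimal illustrative sketch, not the project's actual reward function; the function name, weights, and the use of closest-point distance are assumptions for illustration.

```python
import numpy as np

def shaped_reward(drone_pos, goal_pos, ref_trajectory, prev_dist_to_goal,
                  progress_weight=1.0, deviation_weight=0.5):
    """Hypothetical shaped reward for privileged-learning-guided RL.

    drone_pos, goal_pos: 3D positions (np.ndarray, shape (3,)).
    ref_trajectory: pre-computed obstacle-free waypoints, shape (N, 3).
    prev_dist_to_goal: distance to goal at the previous timestep.
    Returns (reward, dist_to_goal) so the caller can carry the
    distance forward to the next step.
    """
    # Progress term: reduction in distance to the goal since last step.
    dist_to_goal = np.linalg.norm(goal_pos - drone_pos)
    progress = prev_dist_to_goal - dist_to_goal

    # Deviation term: distance to the closest waypoint on the
    # privileged reference trajectory (assumed dense enough that
    # nearest-waypoint distance approximates trajectory distance).
    deviation = np.min(np.linalg.norm(ref_trajectory - drone_pos, axis=1))

    reward = progress_weight * progress - deviation_weight * deviation
    return reward, dist_to_goal
```

At deployment time the reference trajectory is unavailable, so only the trained policy (conditioned on depth images and drone state) is used; the deviation term exists purely to supervise training.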

Notably, the trained policy transfers well from simulation to real-world environments. Preliminary real-world experiments have produced promising results, demonstrating the policy's ability to navigate to set targets and evade obstacles, even in unfamiliar terrain. We now aim to refine our policy to navigate cluttered forests, adapt to real-time environmental changes, and process natural language instructions, all while running onboard and online.


  • Kehan Li, University of California, Berkeley
  • Shiladitya Dutta, University of California, Berkeley
  • Aayush Gupta, University of California, Berkeley
  • Avideh Zakhor, University of California, Berkeley
  • Jimmy Yang, Meta AI