Control of Microrobots with Data- and Computationally Efficient Reinforcement Learning

The ideal method for generating a robot controller would be extremely data efficient, free of requirements on domain knowledge, and safe to run. Model-based reinforcement learning (MBRL) has been established as a compelling approach to synthesizing controllers, even for systems without analytic dynamics models and with a high cost per experiment, but it is still limited to preliminary applications in robotics. We propose to advance MBRL by designing algorithms that are more data efficient and computationally efficient on novel platforms.



Microrobotics & MBRL: Recent work has demonstrated progress towards a novel class of microrobots, with controlled walking and flying robots smaller than a quarter. Microrobots are a compelling platform for testing controller-synthesis methods because they force consideration of two current limitations of MBRL: data efficiency and hardware safety. Microrobots 1) lack analytical dynamics models that would provide strong priors for control and 2) have a high cost per test because experiments run in lab-bench test environments. We will focus on three areas of MBRL: computationally efficient planning, dynamics models that inform exploration, and sharing data across robots.

Computation Limits: Running the PETS MBRL algorithm on complex systems (e.g. halfcheetah) can take multiple days on a GPU. Current model-based RL algorithms also limit transfer from simulation to experimental deployment on robots: the PETS and MBPO algorithms run at about 1 Hz with a GPU, which is too slow for controlling real robots. We will address these computational limits by a) designing new non-sampling-based controllers (e.g. imitative neural network policies and gradient-based control, such as L-BFGS) and b) deploying models that can predict long-term behavior without computationally intensive, recursive dynamics predictions.
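As a rough illustration of the gradient-based control idea in (a), the sketch below optimizes an action sequence by following the gradient of the planning cost rather than sampling candidate sequences. Everything in it is an assumption made for illustration: the scalar linear model, the quadratic cost, the names rollout, cost, cost_gradient, and plan, and the use of plain gradient descent in place of L-BFGS to keep the example dependency-free.

```python
def rollout(x0, actions, a=0.9, b=0.5):
    """Roll a toy scalar model x_{t+1} = a*x_t + b*u_t forward (assumed dynamics)."""
    states = [x0]
    for u in actions:
        states.append(a * states[-1] + b * u)
    return states


def cost(states, actions, goal, lam=0.01):
    """Quadratic tracking cost plus a small penalty on control effort."""
    tracking = sum((x - goal) ** 2 for x in states[1:])
    effort = lam * sum(u ** 2 for u in actions)
    return tracking + effort


def cost_gradient(states, actions, goal, a=0.9, b=0.5, lam=0.01):
    """Gradient of the cost w.r.t. every action, via one backward pass."""
    grad = [0.0] * len(actions)
    adjoint = 0.0  # total derivative of the cost w.r.t. the next state
    for t in reversed(range(len(actions))):
        # states[t + 1] affects the cost directly and through later states
        adjoint = 2.0 * (states[t + 1] - goal) + a * adjoint
        grad[t] = b * adjoint + 2.0 * lam * actions[t]
    return grad


def plan(x0, goal, horizon=10, iters=500, step=0.02):
    """Gradient descent on the action sequence; step size tuned for this toy."""
    actions = [0.0] * horizon
    for _ in range(iters):
        states = rollout(x0, actions)
        grad = cost_gradient(states, actions, goal)
        actions = [u - step * g for u, g in zip(actions, grad)]
    return actions


zero_actions = [0.0] * 10
initial_cost = cost(rollout(0.0, zero_actions), zero_actions, goal=1.0)
planned = plan(0.0, goal=1.0)
final_cost = cost(rollout(0.0, planned), planned, goal=1.0)
```

Each planner iteration costs one forward rollout and one backward pass, whereas a sampling-based planner evaluates many candidate sequences per control step. With a learned neural network model, the gradient would typically come from automatic differentiation rather than the hand-written backward pass used here.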

Dynamics Model Mismatch: The end goal of building a model is effective control, yet current models in robot learning are designed to be accurate, not effective at tasks. We will redesign MBRL around this goal by optimizing models with respect to the task reward and matching them to the control method used. Building on recent work, one way to match the model to the task is to reframe dynamics prediction from discrete single-step predictions to long-term, trajectory-based models, which can improve planning effectiveness when using model predictive control.
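The contrast between single-step and trajectory-based prediction can be sketched on a toy system. The example assumes a scalar decay process and two hypothetical models that each carry the same 2% multiplicative error: a one-step model that is applied recursively, and a trajectory-based model that maps an initial state and a time index directly to the state at that time. The function names and the error level are illustrative assumptions, and trajectory_model stands in for a learned model rather than implementing one.

```python
TRUE_A = 0.95  # toy ground-truth dynamics: x_{t+1} = TRUE_A * x_t


def true_state(x0, t):
    """Closed-form state of the toy system after t steps."""
    return x0 * TRUE_A ** t


def one_step_model(x, rel_error=0.02):
    """Hypothetical learned one-step model with a small multiplicative error."""
    return TRUE_A * (1.0 + rel_error) * x


def recursive_prediction(x0, t):
    """Predict x_t with t sequential model calls; the error compounds each call."""
    x = x0
    for _ in range(t):
        x = one_step_model(x)
    return x


def trajectory_model(x0, t, rel_error=0.02):
    """Stand-in for a learned trajectory-based model: (x0, t) -> x_t in one call."""
    return true_state(x0, t) * (1.0 + rel_error)


def relative_error(prediction, truth):
    return abs(prediction - truth) / abs(truth)


horizon = 50
truth = true_state(1.0, horizon)
err_recursive = relative_error(recursive_prediction(1.0, horizon), truth)
err_direct = relative_error(trajectory_model(1.0, horizon), truth)
```

Under these assumptions the recursive prediction's relative error grows geometrically with the horizon, while the direct prediction keeps its single 2% error and needs one model call instead of fifty. Whether a real learned trajectory model achieves comparable per-prediction accuracy is an empirical question that this toy does not answer.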

Shared Dynamics for Multiple Tasks: In the real world, robots are not identical, and they break. Current MBRL algorithms have no way to reuse relevant past experience. We will address this by conditioning the state-action data used for prediction on basic robot properties such as mass, inertial moments, and equilibrium control voltages, so that the data and the robot properties together form a joint distribution.
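One simple way to realize this conditioning is to append the robot properties to every state-action input, so that data from several robots can be pooled into one training set. The sketch below shows only the data layout. The class RobotProperties, the helpers build_model_input and pool_transitions, and all numeric values are hypothetical and are not taken from an existing codebase or experiment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RobotProperties:
    """Per-robot constants used to condition a shared dynamics model."""
    mass: float
    inertia: Tuple[float, float, float]
    equilibrium_voltages: Tuple[float, ...]

    def as_features(self) -> List[float]:
        return [self.mass, *self.inertia, *self.equilibrium_voltages]


def build_model_input(state: Sequence[float], action: Sequence[float],
                      props: RobotProperties) -> List[float]:
    """Concatenate state, action, and robot properties into one input vector."""
    return [*state, *action, *props.as_features()]


def pool_transitions(logs):
    """Merge per-robot logs of (state, action, next_state) into one dataset."""
    inputs, targets = [], []
    for props, transitions in logs:
        for state, action, next_state in transitions:
            inputs.append(build_model_input(state, action, props))
            targets.append(list(next_state))
    return inputs, targets


# Two hypothetical robots with made-up properties and one transition each.
robot_a = RobotProperties(0.030, (1.0e-6, 1.0e-6, 2.0e-6), (3.1, 3.0))
robot_b = RobotProperties(0.034, (1.2e-6, 1.1e-6, 2.3e-6), (3.4, 3.3))
logs = [
    (robot_a, [([0.0, 0.1], [3.2, 3.0], [0.01, 0.12])]),
    (robot_b, [([0.0, 0.2], [3.5, 3.3], [0.02, 0.21])]),
]
inputs, targets = pool_transitions(logs)
```

A single model trained on the pooled inputs sees which robot produced each transition through the appended features, which is what would let it share experience across robots that differ in mass or actuator trim.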


Fall 2021 Update & Planning.