Learning Dexterous In-Hand Manipulation with Vision and Touch


Consider the task of stacking LEGO bricks or assembling IKEA furniture, as shown in the figure. Given a goal image of the desired configuration, humans can rapidly figure out a plan to accurately manipulate the LEGO bricks or furniture parts to achieve the goal. This ability rests on two factors: 1) through daily interaction with objects, humans acquire a good mental dynamics model and can use it for long-term planning by imagining the outcomes of possible actions; 2) humans make extensive use of multi-modal perception, especially vision and touch, to reduce uncertainty and keep their long-term predictions accurate.

Project Update (Sep 25, 2022):

Updated website: https://haozhi.io/hora

At the Conference on Robot Learning (CoRL) 2022, we demonstrate how to design and learn a simple adaptive controller that achieves in-hand object rotation using only the fingertips. The controller is trained entirely in simulation on only cylindrical objects, and can then, without any fine-tuning, be directly deployed to a real robot hand to rotate dozens of objects with diverse sizes, shapes, and weights about the z-axis. This is achieved via rapid online adaptation of the robot's controller to the object properties using only proprioception history. Furthermore, natural and stable finger gaits emerge automatically from training the control policy via reinforcement learning. The results are summarized in the following video:
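The core idea of online adaptation from proprioception history can be sketched as follows. This is a minimal, illustrative NumPy sketch, not the project's actual architecture: an adaptation module compresses a sliding window of proprioceptive readings into a low-dimensional "extrinsics" latent that implicitly encodes object properties, and the base policy conditions on this latent alongside the current observation. All dimensions, network shapes (plain linear maps here), and the random stand-in sensor readings are assumptions for the sake of the example.

```python
import numpy as np

# Illustrative placeholder dimensions (assumed, not from the paper).
PROPRIO_DIM = 16   # e.g. joint angles of a multi-fingered hand
LATENT_DIM = 8     # size of the extrinsics latent
WINDOW = 30        # length of the proprioception history
ACTION_DIM = 16    # e.g. joint position targets

rng = np.random.default_rng(0)

class AdaptationModule:
    """Maps a sliding window of proprioception to an extrinsics latent."""
    def __init__(self):
        # A single linear map stands in for a learned history encoder.
        self.W = rng.standard_normal((WINDOW * PROPRIO_DIM, LATENT_DIM)) * 0.01
        self.buffer = np.zeros((WINDOW, PROPRIO_DIM))

    def observe(self, proprio):
        # Push the newest reading into the sliding window.
        self.buffer = np.roll(self.buffer, -1, axis=0)
        self.buffer[-1] = proprio

    def latent(self):
        # Flatten the history and project it to the latent.
        return np.tanh(self.buffer.reshape(-1) @ self.W)

class Policy:
    """Base controller conditioned on current proprioception + latent."""
    def __init__(self):
        self.W = rng.standard_normal((PROPRIO_DIM + LATENT_DIM, ACTION_DIM)) * 0.01

    def act(self, proprio, z):
        return np.tanh(np.concatenate([proprio, z]) @ self.W)

# Control loop: the latent is re-estimated online at every step,
# so the same policy adapts as the (implicit) object properties change.
adapt, policy = AdaptationModule(), Policy()
for _ in range(50):
    proprio = rng.standard_normal(PROPRIO_DIM)  # stand-in sensor reading
    adapt.observe(proprio)
    action = policy.act(proprio, adapt.latent())
```

In practice both modules would be neural networks trained in simulation (the latent supervised against privileged object properties), but the data flow, history buffer into latent into policy, is the part this sketch is meant to show.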

Left: Our controller is trained only in simulation on simple cylindrical objects of different sizes and weights. Right: Without any real-world fine-tuning, the controller can be deployed to a real robot on a diverse set of objects with different shapes, sizes, and weights (object mass and the shortest/longest diameter axis length along the fingertips are shown in the figure) using only proprioceptive information.

Contacts: