Interactive Learning from Vision and Touch

The virtuoso plays the piano with passion, poetry and extraordinary technical ability. As Liszt said, a virtuoso “must call up scent and blossom, and breathe the breath of life.” Despite recent advances, how to enable a robot to play the piano accurately, naturally and poetically remains an open and largely unexplored question. While rule-based methods using vision alone have achieved some success, we believe learning-based approaches with multi-modal sensory data are ultimately needed to handle complex and delicate performance. In this work, we advocate a learning-based paradigm for robot piano playing that combines vision, audio and touch sensors and leverages state-of-the-art reinforcement learning algorithms. Compared with traditional methods, learning-based methods can draw on large amounts of readily accessible data and hence automatically learn interpretations of pieces that are hard to design manually.



We aim to enable piano playing with robot hands. While many pieces of sheet music provide detailed dynamic and articulation instructions, there is no fixed interpretation, because pianists differ in style. We aim to learn these styles with a robot hand equipped with multi-modal sensors. Toward this goal, we built our own piano simulator with DIGIT sensors. We present the first effort to let a robot play the piano stylistically.

We have built an environment suited to piano performance with human-like dexterous hands. In this environment, we also enabled a graphics-based touch sensor simulator that mimics the real-world DIGIT sensor.

On top of the simulator, we have trained an RL agent that can read music scores and perform them correctly in terms of notes and note velocity. This work serves as an initial step toward poetic music performance in robotics.
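
To make "correct in terms of notes and note velocity" concrete, the following is a minimal sketch of how a per-step reward for such an agent could be scored. It is an illustration only and not the reward used in this work: the `Note` type, the `step_reward` function, the `velocity_weight` parameter and the MIDI-style key and velocity ranges are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical note representation (an assumption, not from the paper)."""
    key: int       # MIDI-style key number
    velocity: int  # MIDI-style velocity, assumed to lie in 1..127


def step_reward(target, played, velocity_weight=0.5):
    """Score one time step in [0, 1] from note and velocity agreement.

    target: notes the score asks for at this step.
    played: notes the agent actually sounded at this step.
    velocity_weight: share of the reward that depends on velocity accuracy.
    """
    target_by_key = {n.key: n for n in target}
    played_by_key = {n.key: n for n in played}

    hits = target_by_key.keys() & played_by_key.keys()
    misses = target_by_key.keys() - played_by_key.keys()
    extras = played_by_key.keys() - target_by_key.keys()

    total = len(hits) + len(misses) + len(extras)
    if total == 0:
        # The score is silent and so is the agent.
        return 1.0

    # Note term: fraction of involved keys that were pressed correctly.
    note_score = len(hits) / total

    # Velocity term: mean closeness over correctly pressed keys.
    # 126 is the largest possible gap between two velocities in 1..127.
    if hits:
        vel_score = sum(
            1.0 - abs(target_by_key[k].velocity - played_by_key[k].velocity) / 126.0
            for k in hits
        ) / len(hits)
    else:
        vel_score = 0.0

    # Velocity only earns reward on keys that were pressed correctly.
    return note_score * ((1.0 - velocity_weight) + velocity_weight * vel_score)
```

In this sketch, velocity accuracy is multiplied by note accuracy, so an agent cannot collect reward for well-judged dynamics on wrong keys. A stylistic objective of the kind this work aims at would need further terms, for example for timing and articulation, which are omitted here.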