Jitendra Malik

On the Benefits of 3D Pose and Tracking for Human Action Recognition

In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view of analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stance allows us to use the tracklets of people to predict their actions. In this spirit, we first show the benefits of using 3D pose to infer actions, and study person-...

Learning Dexterous In-Hand Manipulation with Vision and Touch

Overview

Consider the task of stacking LEGO bricks or assembling IKEA furniture shown in the figure. Given a goal image configuration, humans can rapidly figure out a plan to accurately manipulate the LEGO bricks or furniture parts to achieve the goal. This is mainly because: 1) humans are already equipped with a good mental dynamics model through daily interaction with objects,...

Amazon-Berkeley Objects: A Large-Scale Dataset for 3D Object Understanding

Overview

Collecting large amounts of high-quality 3D annotations (such as voxels or meshes) for individual real-world objects poses a challenge. One way around this problem is to focus only on synthetic, computer-aided design models. This has the advantage that the data is large in scale, but most objects are untextured and there is no guarantee that the object may exist in...

Visual Locomotion: Synergistic Approach via Perception and Proprioception

Figure Description: We demonstrate the performance of our new algorithm RMA in several challenging environments. The robot successfully walks on sand, mud, hiking trails, tall grass, and a dirt pile without a single failure in all our trials. The robot was successful in 70% of the trials when walking down stairs along a hiking trail, and succeeded in 80% of the trials when walking across...

Long Term Video Understanding

While comprehending the long-range temporal structure of events in a video stream is a fundamental problem in computer vision, little progress has been made towards this goal beyond short-range comprehension. The roadblocks to progress include (1) severe memory constraints imposed by on-device RAM on GPUs and (2) a lack of effective learning techniques that can avoid...

Large-scale 3D Reconstruction from Multi-view Image Datasets

This ongoing project attempts to use large-scale multi-view datasets available online to build a multi-view 3D reconstruction approach that works on wide-baseline images.

Researchers: Shubham Goel, UC Berkeley, https://...