With current methods, having a robot solve complex multi-step manipulation tasks requires either numerous lengthy demonstrations or a sequence of carefully choreographed motion plans, which often renders such an approach impractical. We instead build a two-level hierarchical system that can be trained using short snippets of robotic interaction data collected via teleoperation, together with a natural-language dataset pairing high-level instructions with low-level tasks, which is significantly easier to create. Our proposed system features a high-level controller that accepts abstract natural-language instructions, such as "cook eggplant soup", and sends feasible short-term goals (e.g., "put pot on stove") to a low-level controller, which directly controls the robot's motors. With this system, we hope to generalize to new types of high-level instructions that were not part of the training set; this would constitute a significant step towards more broadly applicable long-horizon robotic manipulation.
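To make the division of labor between the two controllers concrete, the following is a minimal Python sketch of the two-level control loop. It is illustrative only and rests on stated assumptions: the class names, the `next_goal`/`execute` interface, the lookup table standing in for a trained high-level model, and every subgoal other than "put pot on stove" are hypothetical and are not part of the proposed system's specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HighLevelController:
    # Hypothetical stand-in for a learned high-level policy. A toy lookup table
    # maps each instruction to a fixed list of short-term goals; a trained model
    # would instead generate the next feasible goal from the instruction.
    plans: Dict[str, List[str]]

    def next_goal(self, instruction: str, completed: List[str]) -> Optional[str]:
        # Return the next short-term goal, or None once the plan is finished
        # (or the instruction is unknown).
        plan = self.plans.get(instruction, [])
        if len(completed) < len(plan):
            return plan[len(completed)]
        return None


@dataclass
class LowLevelController:
    # Hypothetical low-level controller. A real one would map the goal plus
    # sensor observations to motor commands; here we only record the goal.
    log: List[str] = field(default_factory=list)

    def execute(self, goal: str) -> bool:
        self.log.append(goal)  # placeholder for closed-loop motor control
        return True  # report success


def run_episode(instruction: str, high: HighLevelController,
                low: LowLevelController, max_steps: int = 20) -> List[str]:
    # The high-level controller proposes one short-term goal at a time; if the
    # low-level controller fails, the same goal is proposed again (a retry).
    completed: List[str] = []
    for _ in range(max_steps):
        goal = high.next_goal(instruction, completed)
        if goal is None:
            break
        if low.execute(goal):
            completed.append(goal)
    return completed


# Usage with a hypothetical decomposition of "cook eggplant soup".
high = HighLevelController({
    "cook eggplant soup": ["put pot on stove", "put eggplant in pot", "turn on stove"],
})
low = LowLevelController()
result = run_episode("cook eggplant soup", high, low)
print(result)
```

Keeping the interface between the levels as plain natural-language goal strings mirrors the design described above: the high-level controller can be trained on instruction/task pairs alone, while the low-level controller only ever needs to learn short, teleoperated skills.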