Learning-Driven Exploration For Search

Kevin Yang, Tianjun Zhang, Yuandong Tian, Dan Klein


Search/path planning is an important topic with applications in a variety of domains, e.g., RL, robotics, chemistry, and compiler optimization. However, random search is too inefficient for many applications. Classic global approaches such as Bayesian Optimization may struggle with high-dimensional spaces and limited sample budgets, while local approaches such as CEM and CMA-ES may struggle to escape local optima. We propose to explore and partition the search space in a learned manner; in fact, we have already demonstrated improvements in our upcoming NeurIPS paper https://arxiv.org/abs/2106.10544, as shown in the maze task below.
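To make the failure mode of local methods concrete, here is a minimal, illustrative sketch of the Cross-Entropy Method (CEM): it repeatedly fits a Gaussian to its elite samples, so once the distribution concentrates on one reward peak it rarely escapes to another. The objective and all hyperparameters below are arbitrary toy choices, not from our experiments.

```python
import numpy as np

def cem(objective, dim, iters=50, pop=100, elite_frac=0.1, seed=0):
    """Minimal Cross-Entropy Method (maximization): each iteration, sample
    from a Gaussian, keep the top elite_frac of samples by reward, and
    refit the Gaussian to those elites. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.full(dim, 2.0)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        rewards = np.array([objective(x) for x in samples])
        elites = samples[np.argsort(rewards)[-n_elite:]]  # highest rewards
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy bimodal reward: a wide local peak at x = -2 and a narrower,
# higher global peak at x = +3.
def reward(x):
    return np.exp(-((x[0] + 2) ** 2)) + 2.0 * np.exp(-10 * (x[0] - 3) ** 2)
```

Because the sampling distribution collapses onto whichever peak the early elites favor, CEM can settle on the wide local optimum and never discover the narrow global one, which is the behavior seen in panel (b) of the maze figure.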

In the above diagram, a modified version of the MazeS3 environment from MiniWorld, the agent navigates from the orange circle to the red circle with the final reward being proximity to the red circle. Dots indicate final agent positions over 2000 proposed trajectories (green: first iteration, blue: last iteration). (a) Random Shooting doesn’t improve over iterations. (b) CEM gets stuck in a local optimum. (c) Our approach PlaLaM explores using MCTS in a space of latent action sequences with dynamically learned partitioning, and solves the task. 

Novelty and Innovation

Compared to previous search approaches, ours partitions the search space in a fully learned manner. Previous space-partitioning approaches such as VOOT recursively partition the search space using a fixed rule-based scheme, while traditional MCTS on an action sequence partitions the search space one action at a time. In contrast, we treat the full concatenated action sequence (or possibly a latent encoding of it) as the search space, and directly partition this full space according to final reward. In our NeurIPS submission, we have already demonstrated that such an approach (PlaLaM) works well in some common RL environments.
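The core idea of reward-based learned partitioning can be sketched as follows. This is a simplified toy version, not the actual PlaLaM implementation: each node splits its samples into a high-reward half and a low-reward half, summarizes each half by its centroid, and recursively partitions the side closer to the high-reward centroid; the real method would additionally balance exploration over the resulting tree (e.g., with UCB) rather than descending greedily.

```python
import numpy as np

class Node:
    """One node of a learned space partition over full (concatenated or
    latent) action sequences. Splits samples by reward at the median,
    then assigns points by nearest centroid. Illustrative sketch only."""

    def __init__(self, xs, rs, depth=0, max_depth=3, min_size=8):
        self.xs, self.rs = xs, rs
        self.left = self.right = None
        order = np.argsort(rs)
        good, bad = order[len(order) // 2:], order[:len(order) // 2]
        self.c_good = xs[good].mean(axis=0)  # centroid of high-reward half
        self.c_bad = xs[bad].mean(axis=0)    # centroid of low-reward half
        if depth < max_depth and len(xs) >= 2 * min_size:
            side = self.assign(xs)
            if side.all() or (~side).all():
                return  # degenerate split; stay a leaf
            self.left = Node(xs[side], rs[side], depth + 1, max_depth, min_size)
            self.right = Node(xs[~side], rs[~side], depth + 1, max_depth, min_size)

    def assign(self, x):
        """True where x is closer to the high-reward centroid."""
        d_good = np.linalg.norm(x - self.c_good, axis=-1)
        d_bad = np.linalg.norm(x - self.c_bad, axis=-1)
        return d_good <= d_bad

    def best_leaf(self):
        """Greedy descent into the high-reward region (the full method
        would use a UCB-style rule here to keep exploring)."""
        return self if self.left is None else self.left.best_leaf()
```

Because the split boundary is learned from the samples' rewards rather than fixed in advance, the partition adapts to wherever high-reward regions actually lie in the full sequence space.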

Technical Objective

One major focus of ours is to extend learning-driven search to common NLP tasks, where prior work has found that the data representation can greatly influence final performance on many tasks. In syntactic parsing, there exist a variety of specialized methods, such as dynamic-programming-based chart parsers and left-to-right shift-reduce parsers. In text generation and translation/summarization, the predominant approach is autoregressive left-to-right generation, but there are also efforts toward generating according to, e.g., a tree structure. Fundamentally, the end result is a sequence of tokens, and we want to learn to search in this domain, similarly to how we showed our learning-driven search improves over traditional MCTS in some commonly used RL environments. For example, one possibility is to search in the space of concatenated word embeddings of all words in the decoded sequence, although we would likely also want to take further advantage of the structure of language. Another idea is to generate tokens in a different, non-left-to-right order, such as from "easy-to-predict" to "hard-to-predict." The eventual goal would be to match or improve on the performance of standard SOTA decoding approaches on one or more standard NLP tasks using learning-driven search approaches.
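The "easy-to-predict first" idea above can be illustrated with a toy decoding sketch. Assume a hypothetical non-autoregressive model that produces per-position logits over the vocabulary; we then commit to positions in order of model confidence rather than strictly left to right. This is only a one-shot illustration of the ordering: a real system would re-predict the remaining positions after each commitment.

```python
import numpy as np

def confidence_order_decode(logits):
    """Toy non-left-to-right decoding order: rank sequence positions by
    the model's confidence (max softmax probability) and commit to the
    most confident ("easiest") positions first. `logits` has shape
    (seq_len, vocab_size) and comes from a hypothetical model."""
    # Softmax over the vocabulary at each position.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    confidence = probs.max(axis=-1)          # peak probability per position
    order = np.argsort(-confidence)          # most confident positions first
    tokens = probs.argmax(axis=-1)           # greedy token at each position
    return order.tolist(), tokens.tolist()
```

In a learning-driven search framing, the commitment order itself (or the latent representation of the partial sequence) becomes part of the search space, rather than being fixed to left-to-right.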