Enabling Non-Experts to Annotate Complex Logical Forms at Scale

The goal of semantic parsing is to map natural language utterances into logical forms, which are then executed to fulfill the user's needs. For example, a user might seek information by asking “What’s the height of the highest mountain in the U.S.?”, and the semantic parser will produce the SQL query SELECT MAX(altitude) FROM Mountain WHERE country = 'U.S.' and execute it against a database to produce the answer. Semantic parsers can also be used to formally represent intended actions, track dialogue states, or process data, and they are widely deployed in production systems.
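The execution step in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Mountain table schema, the three rows, their approximate altitudes in meters, and the helper name answer_question are all hypothetical and are not part of the project. The parser itself is not shown, so the logical form is written out by hand.

```python
import sqlite3

# Hypothetical toy data: (name, country, altitude in meters, approximate).
ROWS = [
    ("Denali", "U.S.", 6190),
    ("Mount Whitney", "U.S.", 4421),
    ("Mont Blanc", "France", 4808),
]

# The logical form a semantic parser might produce for
# "What's the height of the highest mountain in the U.S.?"
LOGICAL_FORM = "SELECT MAX(altitude) FROM Mountain WHERE country = 'U.S.'"


def answer_question(logical_form, rows):
    """Execute a SQL logical form against an in-memory toy database."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema, chosen only to match the query in the text.
        conn.execute(
            "CREATE TABLE Mountain (name TEXT, country TEXT, altitude INTEGER)"
        )
        conn.executemany("INSERT INTO Mountain VALUES (?, ?, ?)", rows)
        # The query returns a single row with a single column.
        return conn.execute(logical_form).fetchone()[0]
    finally:
        conn.close()


answer = answer_question(LOGICAL_FORM, ROWS)
print(answer)
```

A non-expert annotator can judge whether the printed answer is right for the question without reading the SQL, which is the kind of indirect signal this project aims to exploit.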

However, this research area suffers from a lack of large, high-quality datasets. While question answering and machine translation datasets typically have more than 100K, or even 2M, data points, most semantic parsing datasets have fewer than 10K. To scale up data collection, we propose to use non-expert annotators, who do not understand logical forms. We hope our research can lead to better semantic parsers at lower cost, and that it can be incorporated into tools that allow non-experts to collect data and build their own semantic parsers. Our project will make use of recent advances in natural language processing, programming languages, and active learning to obtain indirect supervision from non-experts, thus significantly lowering the barrier to data annotation.

This project is generously supported by compute resources from Microsoft.

Researchers: Ruiqi Zhong, Jason Eisner, Dan Klein