Factorized language representations with knowledge and logic

Knowledge and logical reasoning are essential constituents of spoken and written language. Although more compact and accessible representations exist, such as knowledge graphs (KGs) and logic forms, modern NLU technologies predominantly encapsulate knowledge and logical reasoning in vector representations along with other linguistic patterns. Despite its successes, this encapsulation erects barriers between NLU and the symbolic technologies developed for knowledge representation and reasoning, which are inherently more transparent, robust, and scalable than their connectionist counterparts.

The purpose of this project is to develop models that avoid this encapsulation and instead learn to factorize natural language into two independent representations: vectors and KGs (or logic forms). In this scheme, the vector representations capture rich linguistic patterns that are infeasible to enumerate, while the KG (or logic-form) representations facilitate seamless integration with reasoning engines. Both representations are learned end-to-end from gradients. To overcome the challenge that the discrete nature of KG (or logic-form) representations poses to gradient-based learning, we propose to apply straight-through gradient estimation, a technique that has proven effective for learning discrete image representations. The proposed scheme can potentially enable, for example, empathetic chatbots that reason efficiently in the context of large-scale commonsense knowledge graphs. Such contextualized global reasoning will likely improve the chatbot’s generalization, and the transparency and robustness of the reasoning procedure can significantly boost interpretability and ease chatbot maintenance.
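To make the core trick concrete, here is a minimal sketch of straight-through gradient estimation on a toy problem. The quantizer, loss, and training loop are illustrative assumptions, not the project's actual model: a non-differentiable rounding step is used in the forward pass, while the backward pass treats it as the identity so gradients still reach the continuous parameter.

```python
import numpy as np

def quantize(x):
    # Non-differentiable discretization (here: rounding to the nearest integer).
    return np.round(x)

def ste_quantize(x):
    """Straight-through estimator: discrete forward, identity backward.

    The forward pass returns quantize(x); the backward rule pretends the
    operation was the identity, so upstream gradients flow to x unchanged.
    """
    y = quantize(x)
    def backward(grad_y):
        return grad_y  # identity surrogate gradient
    return y, backward

# Toy training loop (hypothetical): drive a continuous parameter w so that
# its discrete code quantize(w) matches a target code, even though the
# quantizer itself has zero gradient almost everywhere.
w, target, lr = 0.1, 3.0, 0.3
for _ in range(50):
    code, backward = ste_quantize(w)
    grad_code = code - target          # d/dcode of 0.5 * (code - target)**2
    w = w - lr * backward(grad_code)   # gradient passed straight through to w
```

After training, `quantize(w)` equals the target code; without the straight-through surrogate, the gradient of the loss with respect to `w` would be zero everywhere and learning would stall.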

This project is generously supported by compute resources from AWS.