In recent years, large-scale pre-trained language models (e.g., BERT, BART, GPT-3) have been widely adopted in text generation applications such as machine translation, document summarization, and question answering. However, as prior work has shown [1], a powerful language model tends to dominate the predictions of a conditional generation model, which is then likely to hallucinate based only on the target history. In summarization, for example, a conditional generation model may ignore the source text and generate content that does not appear in the input document. This phenomenon becomes much worse when language models are fine-tuned with limited supervision. Recent work has proposed detecting such hallucinated content in a self-supervised manner [2]; however, combating hallucination and fixing the generation model in a principled way remains a challenging problem.
The goal of this research project is first to build a theoretical framework for conditional sequence generation based on information theory and, on this basis, to analyze the mechanism behind hallucination in pre-trained language models. We will then develop approaches grounded in the proposed theory from several perspectives, including advanced modeling (such as coverage modeling [3]), training, and inference, to reduce hallucination in general applications.
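One information-theoretic quantity that captures how much the source contributes to each generated token is the conditional pointwise mutual information, PMI(y_t; x | y_<t) = log p(y_t | y_<t, x) − log p(y_t | y_<t): a token whose probability barely changes when the source is removed is predicted from the target history alone and is a candidate hallucination. A minimal sketch of this diagnostic, using hypothetical per-token probabilities in place of real model outputs:

```python
import math

# Toy per-token probabilities for a generated summary (hypothetical numbers,
# standing in for two forward passes of a real model):
#   p_cond[t]  : probability of token t given the source x and history y_<t
#   p_prior[t] : probability of the same token with the source removed,
#                i.e. from a language-model-only pass on the history y_<t
p_cond = {"the": 0.40, "company": 0.35, "tripled": 0.05, "profits": 0.30}
p_prior = {"the": 0.38, "company": 0.20, "tripled": 0.05, "profits": 0.10}

def token_pmi(token):
    """Conditional PMI(y_t; x | y_<t) = log p(y_t|y_<t,x) - log p(y_t|y_<t)."""
    return math.log(p_cond[token]) - math.log(p_prior[token])

# Tokens with PMI near zero were predicted just as well without the source,
# so the model may be generating them from the target history alone.
THRESHOLD = 0.1  # illustrative cutoff, not a calibrated value
flagged = [t for t in p_cond if token_pmi(t) < THRESHOLD]
print(flagged)  # "the" and "tripled" gain almost nothing from the source
```

In practice the two probability tables would come from the conditional model and from the same model with the source masked or dropped; the threshold and the masking scheme are design choices this project would study, not fixed here.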
[1] Voita, Elena, Rico Sennrich, and Ivan Titov. "Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation." arXiv preprint arXiv:2010.10907 (2020).
[2] Zhou, Chunting, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, and Marjan Ghazvininejad. "Detecting Hallucinated Content in Conditional Neural Sequence Generation." arXiv preprint arXiv:2011.02593 (2020).
[3] Tu, Zhaopeng, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. "Modeling Coverage for Neural Machine Translation." arXiv preprint arXiv:1601.04811 (2016).