Overview

In recent years, large-scale pre-trained language models (e.g., BERT, BART, GPT-3) have been widely adopted in text generation applications such as machine translation, document summarization, and question answering. However, as previous work [1] has shown, a powerful language model tends to dominate the prediction in conditional generation, so the model may hallucinate based only on the target-side history. In summarization, for example, a conditional generation model may largely ignore the source text and generate summary content that does not appear in the source.
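As a rough illustration of this failure mode, one can probe how much a seq2seq model's next-token prediction actually depends on the source by comparing its distribution given the real source against the distribution given an empty source. The sketch below uses Hugging Face transformers; the model name, the example texts, and the empty-source probe are illustrative assumptions, not part of the method described here.

```python
# Minimal sketch (illustrative, not the authors' method): measure how much a
# seq2seq model's next-token distribution changes when the source is removed.
# A small divergence suggests the language-model prior over the target history
# is dominating the prediction, i.e. higher hallucination risk.
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

source = "The city council approved the new budget on Tuesday."  # assumed example
prefix = "The council"  # target history decoded so far

def next_token_logprobs(src_text: str, tgt_prefix: str) -> torch.Tensor:
    """Log-probabilities over the next target token given source and prefix."""
    enc = tokenizer(src_text, return_tensors="pt")
    # Drop the trailing </s> so the decoder sees an open prefix.
    dec = tokenizer(tgt_prefix, return_tensors="pt").input_ids[:, :-1]
    with torch.no_grad():
        logits = model(input_ids=enc.input_ids,
                       attention_mask=enc.attention_mask,
                       decoder_input_ids=dec).logits
    return torch.log_softmax(logits[0, -1], dim=-1)

# Next-token distribution with the real source vs. a (near-)empty source.
log_p_cond = next_token_logprobs(source, prefix)
log_p_lm = next_token_logprobs("", prefix)

# KL(conditional || source-free): near zero means the source barely matters.
kl = torch.sum(log_p_cond.exp() * (log_p_cond - log_p_lm))
print(f"KL(conditional || source-free) = {kl.item():.4f}")
```

This probe only contrasts two forward passes per step; it is a diagnostic for source dependence, not a decoding algorithm.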