Coherent and Consistent Long Story Generation

This is a continuation of our previous Year 3 collaboration, Learning-Driven Exploration For Search.

Participants

Berkeley Advisor: Dan Klein, klein@berkeley.edu

Industry Member: Yuandong Tian, yuandong@fb.com

BAIR Researchers: Kevin Yang, yangk@berkeley.edu

Proposed Research 

Related Work + Motivation:

Several recent works have trained models to generate stories from scratch [1, 5]. However, these works focus primarily on generating very short stories, typically only a few sentences long, even when they use large pretrained language models [4]. The focus on short stories is perhaps due in part to the existence of well-curated datasets for training [2].

The goal of our project is to leverage large pretrained language models such as GPT3 to generate much longer high-quality stories, on the order of multiple pages of text, which is orders of magnitude longer than the stories generated in previous work. To our knowledge, no previous work has generated very long high-quality stories. The few works, such as [3], that do show examples of longer generations produce highly disfluent text. In our own experiments, we observed that GPT3, when directly prompted to generate stories, can indeed generate coherent short stories; however, it struggles heavily to generate stories longer than two or three hundred words.

Novelty and Innovation

Our work focuses on two major problems with generating high-quality long stories which are largely unexplored in prior work. 

Major Problem 1: The story must be coherent, in the sense that it must have a clear overarching structure (beginning, middle, end), despite spanning multiple pages. This is not the case with a simple “rolling window” GPT3 baseline (i.e., simply prompting GPT3 to write a story, and using the previously generated text as the prompt to continue generation once the 2048-token context window is exceeded). Rather, we find it is critical to employ a hierarchical structure in the generation, whereby we first generate a setup and outline for the story, and then prompt GPT3 to expand the individual components. Additionally, we find it helpful to employ search procedures such as beam search at multiple levels of generation (e.g., on paragraphs rather than on tokens, as is usually the case), as illustrated in the sketch below.
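The following is a minimal sketch of this hierarchical idea, not our actual implementation. It assumes two hypothetical helpers: generate(prompt, n), a wrapper around GPT3 that returns n candidate continuations, and score(text), a paragraph-level quality scorer; both names are stand-ins for illustration only.

def hierarchical_story(premise, generate, score, beam_width=3, n_sections=5):
    # Step 1: produce a setup and an outline before writing any story text.
    setup = generate(f"Write a setup (characters, setting) for a story about: {premise}", 1)[0]
    outline = generate(f"{setup}\n\nWrite a {n_sections}-point outline for this story.", 1)[0]
    points = [line for line in outline.splitlines() if line.strip()][:n_sections]

    # Step 2: expand each outline point, keeping a beam over whole candidate
    # paragraphs rather than over individual tokens.
    beams = [("", 0.0)]  # (story so far, cumulative score)
    for point in points:
        candidates = []
        for story, total in beams:
            prompt = (f"{setup}\n\nCurrent outline point: {point}\n\n"
                      f"Story so far:\n{story}\n\nContinue the story:")
            for paragraph in generate(prompt, beam_width):
                candidates.append((story + "\n\n" + paragraph, total + score(paragraph)))
        # Keep only the top-scoring partial stories.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]

    return max(beams, key=lambda b: b[1])[0]

The key design point is that the outline fixes the overarching structure up front, while the paragraph-level beam lets the system discard locally plausible continuations that score poorly for the story as a whole.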

Major Problem 2: The story must be internally consistent. That is, the details generated in one paragraph should not contradict anything generated previously. A character introduced with blonde hair in the first paragraph should not suddenly become a brunette two paragraphs later, and if Mary lives in Berkeley she should not later turn out to live in Boston. The internal consistency problem is highly challenging; detecting such contradictions is akin to finding a needle in a haystack due to the overwhelming number of potential false positives. As a result, it is critical to devise a system which detects contradictions while maintaining extraordinarily high precision. Thus far, we have engineered a highly structured system centered on detecting only entity attributes, and we aim to use this system to improve the internal consistency of generated stories (a simplified sketch follows below). Overall, the internal consistency problem is likely the harder of the two major problems we aim to tackle, and we believe there will remain a great deal of room for future exploration even after our initial attempt in this work.
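The following is a simplified sketch of an attribute-centric consistency check, included only to make the high-precision design concrete; it is not our engineered system. It assumes hypothetical helpers extract_attributes(paragraph), returning (entity, attribute, value) triples, and contradicts(v1, v2), returning True only for clear-cut conflicts.

def find_contradictions(paragraphs, extract_attributes, contradicts):
    # Track the first value seen for each (entity, attribute) pair,
    # e.g. ("Mary", "home city") -> (paragraph index, "Berkeley").
    seen = {}
    flagged = []
    for i, paragraph in enumerate(paragraphs):
        for entity, attribute, value in extract_attributes(paragraph):
            key = (entity, attribute)
            if key in seen and contradicts(seen[key][1], value):
                # Flag only direct conflicts on the same attribute of the same
                # entity, trading recall for precision to limit false positives.
                flagged.append((key, seen[key], (i, value)))
            else:
                seen.setdefault(key, (i, value))
    return flagged

Restricting checks to explicit entity attributes is what keeps precision high: the system never compares arbitrary pairs of sentences, only values asserted for the same attribute of the same entity.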

Technical Objective

Since automatic evaluation of stories is difficult, our final metric will be human evaluation: we will ask humans to rate the coherence and internal consistency of stories generated by our system and by baselines, and we aim to improve substantially and statistically significantly over the baselines. We also plan to introduce an evaluation set for internal consistency detection systems, which we hope will demonstrate the effectiveness or ineffectiveness of different methods for checking the internal consistency of stories.

Collaboration/Sharing of Resources

If additional computation is needed, we would like to use the Meta AI cluster.

References

[1] Fan, Angela, Mike Lewis, and Yann Dauphin. "Hierarchical neural story generation." ACL 2018.

[2] Mostafazadeh, Nasrin, et al. "A corpus and cloze evaluation for deeper understanding of commonsense stories." NAACL 2016.

[3] Wang, Rose E., et al. "Language modeling via stochastic processes." ICLR 2022.

[4] Xu, Peng, et al. "MEGATRON-CNTRL: Controllable story generation with external knowledge using large-scale language models." EMNLP 2020.

[5] Yao, Lili, et al. "Plan-and-write: Towards better automatic storytelling." AAAI 2019.