Long-Range Understanding and Consistency in Story Generation

This is a continuation of our previous Year 4 collaboration, Coherent and Consistent Long Story Generation.

Participants

Berkeley Advisor: Dan Klein, klein@berkeley.edu

Industry Member: Yuandong Tian, yuandong@fb.com

BAIR Researcher: Kevin Yang, yangk@berkeley.edu

Motivation

While several previous works have aimed to automatically generate stories ranging in length from a few sentences to a couple of paragraphs, few have automatically generated stories of substantially greater length, for example 2000 words or more. To our knowledge, our previous Year 4 works Re3 and DOC are the first to attempt this task.

Generating stories of this length poses qualitatively different challenges from generating shorter stories. For example, maintaining long-range, high-level plot coherence becomes a complex problem requiring strong planning and controlled text generation, and we made substantial gains in this direction in our previous works. Other thorny problems remain, however, such as maintaining long-range consistency of details and improving the interestingness and creativity of the generated stories.

Novelty and Innovation

We initially attempted to improve long-range consistency by investigating long-context QA, but were unable to achieve strong results, so we shifted our focus to improving the interestingness of generated stories. We have since released an initial work in this direction, a general-purpose LM alignment method that improves RLHF pipelines relying on simulated data: "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment." RLCD trains a preference model on simulated preference pairs, each containing a high-quality and a low-quality example generated from contrasting positive and negative prompts. The preference model is then used to improve a base, unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks (harmlessness, helpfulness, and story outline generation) and at both 7B and 30B model scales for preference data simulation. We intend to submit RLCD to an upcoming conference. A rough sketch of the preference-pair simulation step is shown below.
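To illustrate the data-simulation step, the following Python sketch generates one continuation from a positively-framed prompt and one from a negatively-framed prompt, then pairs them as a simulated (chosen, rejected) preference example for downstream preference-model training. The prompt wordings, the model choice (gpt2 as a stand-in), and the helper names here are illustrative assumptions, not the exact setup used in RLCD.

```python
# Minimal sketch of RLCD-style preference pair simulation, assuming a
# Hugging Face causal LM as the data-generating model. Prompts, model,
# and function names are placeholders, not the paper's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; RLCD uses 7B/30B models for simulation
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Sample a continuation of the given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs, do_sample=True, top_p=0.9, max_new_tokens=max_new_tokens
    )
    # Strip the prompt tokens, keep only the newly generated continuation.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

def simulate_preference_pair(user_query: str) -> dict:
    """Build one simulated preference pair from contrasting prompts.

    The positively-prompted generation is labeled as preferred and the
    negatively-prompted one as dispreferred, with no extra scoring step.
    """
    positive_prompt = f"(helpful, honest response) Human: {user_query}\nAssistant:"
    negative_prompt = f"(unhelpful, evasive response) Human: {user_query}\nAssistant:"
    return {
        "prompt": user_query,
        "chosen": generate(positive_prompt),
        "rejected": generate(negative_prompt),
    }

if __name__ == "__main__":
    pair = simulate_preference_pair("How do I start learning to paint?")
    print(pair)
```

Pairs produced this way would then train a standard preference (reward) model, which in turn scores rollouts during reinforcement learning against the base model; those later stages follow the usual RLHF recipe and are omitted from this sketch.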

Technical Objective

We intend to continue exploring improvements to both the long-range consistency and the interestingness and creativity of generated stories.