This is a continuation of our previous Year 4 collaboration, Coherent and Consistent Long Story Generation.
Berkeley Advisor: Dan Klein, firstname.lastname@example.org
Industry Member: Yuandong Tian, email@example.com
BAIR Researchers: Kevin Yang, firstname.lastname@example.org
While several previous works have aimed to automatically generate stories ranging in length from a few sentences to a couple of paragraphs, few have automatically generated substantially longer stories, for example 2000 words or more. To our knowledge, our previous Year 4 works Re3 and DOC are the first to attempt this task.
Generating stories of such length involves qualitatively different challenges compared to shorter stories. For example, maintaining long-range, high-level plot coherence becomes a complex problem requiring strong planning and controlled text generation, and we have made substantial gains in this direction in our previous works. Other thorny problems remain, however, such as maintaining long-range consistency of details and improving the interestingness / creativity of the generated stories.
Novelty and Innovation
We initially attempted to improve long-range consistency by investigating long-context QA, but weren't able to achieve strong results, so we switched to trying to improve the interestingness of generated stories. We've since released an initial work in this direction, a general-purpose LM alignment method based on improving RLHF pipelines that use simulated data: "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment." RLCD trains a preference model on simulated preference pairs, each containing one high-quality and one low-quality example, generated using contrasting positive and negative prompts. The preference model is then used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks (harmlessness, helpfulness, and story outline generation) and at both the 7B and 30B model scales used for preference data simulation. We intend to submit RLCD to an upcoming conference.
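The pair-simulation step described above can be illustrated with a minimal sketch. It is not the RLCD implementation: the names `PreferencePair`, `build_pair`, and `toy_generate`, and the way each hint is appended to the prompt, are all assumptions made for illustration. `toy_generate` is a stand-in for a real language model call. The sketch also assumes that the output produced under the positive prompt is simply labeled as the preferred one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    """One simulated training example for a preference model."""
    context: str   # the shared task prompt
    chosen: str    # output generated under the positive prompt
    rejected: str  # output generated under the negative prompt


def build_pair(
    context: str,
    generate: Callable[[str], str],
    positive_hint: str,
    negative_hint: str,
) -> PreferencePair:
    """Generate two outputs from contrasting prompts and label them by prompt.

    Assumption: the hint is appended to the context in parentheses; a real
    pipeline may format its prompts differently.
    """
    positive_prompt = f"{context} ({positive_hint})"
    negative_prompt = f"{context} ({negative_hint})"
    return PreferencePair(
        context=context,
        chosen=generate(positive_prompt),
        rejected=generate(negative_prompt),
    )


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a language model: echoes the prompt it saw."""
    return f"<completion for: {prompt}>"


pair = build_pair(
    context="Write a story outline.",
    generate=toy_generate,
    positive_hint="interesting outline",
    negative_hint="boring outline",
)
print(pair.chosen)
print(pair.rejected)
```

In this sketch the label comes from which prompt produced each output, so no separate scoring pass is needed. The resulting pairs would then serve as training data for the preference model.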
We intend to continue our exploration of improving both long-range consistency and interestingness / creativity of generated stories.