Writing with Speech — Using LLMs for Gist-Level Manipulation of Spoken Text

Dictation is an efficient text input method for mobile devices. However, writing by speech tends to produce verbose, incoherent, and inconsistent text that requires substantial editing. This project presents Rambler, an LLM-integrated user interface for gist-level editing of dictated content through two main sets of functions: gist extraction and macro revision. Gist extraction generates summaries and keywords that help users review and interact with their spoken text. LLM-powered macro revisions let users split, merge, respeak, and transform dictated text without specifying precise editing locations. Together, these features support an interactive approach to dictation and editing, bridging the gap between spontaneous speech and polished writing. In a comparative study in which 12 participants performed long-form verbal composition tasks, Rambler outperformed a baseline of a standard speech-to-text editor combined with ChatGPT: it better supported iterative revision, gave users more control over their content, and accommodated a wide range of user strategies.

Researchers

  • Susan Lin, UC Berkeley
  • Jeremy Warner, UC Berkeley
  • J.D. Zamfirescu, UC Berkeley
  • Matthew Lee, UC Berkeley
  • Sauhard Jain, UC Berkeley
  • Bjoern Hartmann, UC Berkeley
  • Can Liu, UC Berkeley & City University of Hong Kong
  • Michael Xuelin Huang, Google
  • Piyawat Lertvittayakumjorn, Google
  • Shanqing Cai, Google
  • Shumin Zhai, Google