Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language

Peter A. Jansen, Jordan Boyd-Graber

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Tamarian, a fictional language introduced in the Star Trek episode “Darmok,” communicates meaning through utterances built from metaphorical references, such as “Darmok and Jalad at Tanagra” instead of “We should work together.” This work assembles a Tamarian-English dictionary of utterances from the original episode and several follow-on novels, and uses it to construct a parallel corpus of 456 English-Tamarian utterances. A machine translation system based on a large language model (T5) is trained on this parallel corpus and is shown to reach 76% accuracy when translating from English to Tamarian on known utterances.
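The abstract reports accuracy on known utterances but does not say here how a translation is scored as correct. As a minimal sketch, assuming accuracy means a normalized exact match between the model output and the reference Tamarian utterance, the computation could look like the following. The function names and the normalization rule are hypothetical and are not taken from the paper; the only utterance used is the example quoted in the abstract.

```python
from typing import Sequence


def normalize(utterance: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing sentence punctuation.

    This normalization is an assumption made for illustration only.
    """
    return " ".join(utterance.lower().split()).rstrip(".!?")


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Reference translations for two occurrences of "We should work together."
references = ["Darmok and Jalad at Tanagra", "Darmok and Jalad at Tanagra"]
# Hypothetical model outputs: the first differs only in case and punctuation,
# the second drops part of the metaphor and so counts as an error.
predictions = ["darmok and jalad at tanagra.", "Darmok at Tanagra"]

accuracy = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is a strict criterion; a metric that gives partial credit for overlapping words would score the second prediction differently.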

Original language: English (US)
Title of host publication: FLP 2022 - 3rd Workshop on Figurative Language Processing, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 34-38
Number of pages: 5
ISBN (Electronic): 9781959429111
State: Published - 2022
Event: 3rd Workshop on Figurative Language Processing, FigLang 2022, as part of EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: Dec 8 2022 → …

Publication series

Name: FLP 2022 - 3rd Workshop on Figurative Language Processing, Proceedings of the Workshop

Conference

Conference: 3rd Workshop on Figurative Language Processing, FigLang 2022, as part of EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 12/8/22 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Linguistics and Language
