Inferring missing metadata from environmental policy texts

Steven Bethard, Egoitz Laparra, Sophia Wang, Yiyun Zhao, Ragheb Al-Ghezi, Aaron Lien, Laura López-Hoffman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information, and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmental policy by extracting and organizing metadata from the text of NEPA documents. Our contributions include collecting more than 40,000 NEPA-related documents and evaluating rule-based baselines that establish the difficulty of three important tasks: identifying lead agencies, aligning document versions, and detecting reused text.

Original language: English (US)
Title of host publication: LaTeCH@NAACL-HLT 2019 - 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 46-51
Number of pages: 6
ISBN (Electronic): 9781950737000
State: Published - 2019
Event: 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH@NAACL-HLT 2019 - Minneapolis, United States
Duration: Jun 7 2019 → …

Publication series

Name: LaTeCH@NAACL-HLT 2019 - 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings

Conference

Conference: 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH@NAACL-HLT 2019
Country/Territory: United States
City: Minneapolis
Period: 6/7/19 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
