A Dataset and Evaluation Framework for Complex Geographical Description Parsing

Egoitz Laparra, Steven Bethard

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Scopus citations

Abstract

Much previous work on geoparsing has focused on identifying and resolving individual toponyms in text like Adrano, S.Maria di Licodia or Catania. However, geographical locations occur not only as individual toponyms, but also as compositions of reference geolocations joined and modified by connectives, e.g., “. . . between the towns of Adrano and S.Maria di Licodia, 32 kilometres northwest of Catania”. Ideally, a geoparser should be able to take such text, and the geographical shapes of the toponyms referenced within it, and parse these into a geographical shape, formed by a set of coordinates, that represents the location described. But creating a dataset for this complex geoparsing task is difficult and, if done manually, would require a huge amount of effort to annotate the geographical shapes of not only the geolocation described but also the reference toponyms. We present an approach that automates most of the process by combining Wikipedia and OpenStreetMap. As a result, we have gathered a collection of 360,187 uncurated complex geolocation descriptions, from which we have manually curated 1,000 examples intended to be used as a test set. To accompany the data, we define a new geoparsing evaluation framework along with a scoring methodology and a set of baselines.

Original language: English (US)
Title of host publication: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: Association for Computational Linguistics (ACL)
Pages: 936-948
Number of pages: 13
ISBN (Electronic): 9781952148279
State: Published - 2020
Event: 28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain
Duration: Dec 8 2020 to Dec 13 2020

Publication series

Name: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

Conference

Conference: 28th International Conference on Computational Linguistics, COLING 2020
Country/Territory: Spain
City: Virtual, Online
Period: 12/8/20 to 12/13/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Theoretical Computer Science
