Extractive Question Answering for Spanish and Arabic Political Text

Sultan Alsarra, Parker Whitehead, Naif Alatrush, Luay Abdeljaber, Latifur Khan, Javier Osorio, Patrick T. Brandt, Vito D’Orazio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study advances the integration of domain-specific large language models (LLMs) for low-resource languages with applications to question answering (QA). Building on recent LLMs trained to extract events of political violence and conflict, we introduce ConfliBERT-Arabic and ConfliBERT-Spanish, fine-tuned for extractive QA. Contributions include tailored QA fine-tuning techniques for Arabic and Spanish, curation of five datasets, and a comprehensive performance analysis. These new models provide language- and domain-specific enhancements over extant models trained on general corpora. Substantively, these tools allow implementation of high-quality QA about conflict and violence in multiple world regions in their native languages.

Original language: English (US)
Title of host publication: Social, Cultural, and Behavioral Modeling - 17th International Conference, SBP-BRiMS 2024, Proceedings
Editors: Robert Thomson, Aryn Pyke, Aravind Hariharan, Scott Renshaw, Patrick Park, Samer Al-khateeb, Annetta Burger
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 144-153
Number of pages: 10
ISBN (Print): 9783031722400
State: Published - 2024
Event: 17th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, SBP-BRiMS 2024 - Pittsburgh, United States
Duration: Sep 18 2024 to Sep 20 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14972 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, SBP-BRiMS 2024
Country/Territory: United States
City: Pittsburgh
Period: 9/18/24 to 9/20/24

Keywords

  • Arabic
  • Large language models
  • Natural language processing
  • Question answering
  • Spanish

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
