ConfliBERT-Spanish: A Pre-trained Spanish Language Model for Political Conflict and Violence

Wooseong Yang, Sultan Alsarra, Luay Abdeljaber, Niamat Zawad, Zeinab Delaram, Javier Osorio, Latifur Khan, Patrick T. Brandt, Vito D'Orazio

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This article introduces ConfliBERT-Spanish, a pre-trained language model specialized in political conflict and violence for text written in Spanish. Our methodology relies on a large corpus specialized in politics and violence to extend the capacity of pre-trained models that process Spanish text. We assess the performance of ConfliBERT-Spanish against Multilingual BERT and BETO baselines on binary classification, multi-label classification, and named entity recognition. Results show that ConfliBERT-Spanish consistently outperforms the baseline models across all tasks. These findings indicate that our domain-specific, language-specific cyberinfrastructure can greatly enhance the performance of NLP models for Latin American conflict analysis. This methodological advancement opens vast opportunities to help researchers and practitioners in the security sector analyze large amounts of information with high accuracy, thus better equipping them to meet the dynamic and complex security challenges affecting the region.
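As an illustration of how predictions from two models might be compared on a multi-label classification task like the one mentioned in the abstract, the following minimal sketch computes micro-averaged F1 in plain Python. The abstract does not state which metric the authors report, so the use of F1 here is an assumption; the function name `micro_f1`, the model names, and the labels are hypothetical toy data, not results from the paper.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions (one label set per document)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up gold labels and predictions for four documents (illustrative only).
gold = [{"protest"}, {"attack", "kidnapping"}, set(), {"attack"}]
model_a_pred = [{"protest"}, {"attack"}, {"attack"}, set()]
model_b_pred = [{"protest"}, {"attack", "kidnapping"}, set(), {"attack"}]

for name, pred in [("model_a", model_a_pred), ("model_b", model_b_pred)]:
    print(f"{name}: micro-F1 = {micro_f1(gold, pred):.3f}")
```

Micro-averaging pools true positives, false positives, and false negatives over all documents before computing F1, so frequent labels weigh more than rare ones; macro-averaging would be the alternative if every label should count equally.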

Original language: English (US)
Title of host publication: CiSt 2023 - 7th IEEE International Congress on Information Science and Technology
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 287-292
Number of pages: 6
ISBN (Electronic): 9781665461337
State: Published - 2023
Event: 7th IEEE International Congress on Information Science and Technology, CiSt 2023 - Agadir - Essaouira, Morocco
Duration: Dec 16 2023 - Dec 22 2023

Publication series

Name: Colloquium in Information Science and Technology, CIST
ISSN (Print): 2327-185X
ISSN (Electronic): 2327-1884

Conference

Conference: 7th IEEE International Congress on Information Science and Technology, CiSt 2023
Country/Territory: Morocco
City: Agadir - Essaouira
Period: 12/16/23 - 12/22/23

Keywords

  • BERT
  • Conflict
  • Deep Learning
  • Machine Learning
  • NLP
  • Politics
  • Spanish
  • Violence

ASJC Scopus subject areas

  • Computer Science Applications
  • Signal Processing
  • Information Systems and Management
  • Management Science and Operations Research
