TY - GEN
T1 - 3M-Transformers for Event Coding on Organized Crime Domain
AU - Parolin, Erick Skorupa
AU - Khan, Latifur
AU - Osorio, Javier
AU - Brandt, Patrick T.
AU - D’Orazio, Vito
AU - Holmes, Jennifer
N1 - Funding Information:
The research reported herein was supported in part by NSF awards OAC-1931541, OAC-1828467, DMS-1737978, DGE-2039542, and DGE-1906630, ONR awards N00014-17-1-2995 and N00014-20-1-2738, Army Research Office Contract No. W911NF2110032, and an IBM faculty award (Research).
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Political scientists and security agencies increasingly rely on computerized event data generation to track conflict processes and violence around the world. However, most of these approaches rely on pattern-matching techniques constrained by large dictionaries that are too costly to develop, update, or expand to emerging domains or additional languages. In this paper, we provide an effective solution to those challenges. We develop the 3M-Transformers (Multilingual, Multi-label, Multitask) approach for event coding from domain-specific multilingual corpora, dispensing with large external repositories for this task and expanding the substantive focus of analysis to organized crime, an emerging concern for security research. Our results indicate that our 3M-Transformers configurations outperform state-of-the-art Transformer models (BERT and XLM-RoBERTa) for coding events involving actors, actions, and locations in English, Spanish, and Portuguese.
AB - Political scientists and security agencies increasingly rely on computerized event data generation to track conflict processes and violence around the world. However, most of these approaches rely on pattern-matching techniques constrained by large dictionaries that are too costly to develop, update, or expand to emerging domains or additional languages. In this paper, we provide an effective solution to those challenges. We develop the 3M-Transformers (Multilingual, Multi-label, Multitask) approach for event coding from domain-specific multilingual corpora, dispensing with large external repositories for this task and expanding the substantive focus of analysis to organized crime, an emerging concern for security research. Our results indicate that our 3M-Transformers configurations outperform state-of-the-art Transformer models (BERT and XLM-RoBERTa) for coding events involving actors, actions, and locations in English, Spanish, and Portuguese.
KW - Deep neural networks
KW - event coding
KW - Multi-task learning
KW - Natural language processing
KW - Organized crime
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85125359937&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125359937&partnerID=8YFLogxK
U2 - 10.1109/DSAA53316.2021.9564232
DO - 10.1109/DSAA53316.2021.9564232
M3 - Conference contribution
AN - SCOPUS:85125359937
T3 - 2021 IEEE 8th International Conference on Data Science and Advanced Analytics, DSAA 2021
BT - 2021 IEEE 8th International Conference on Data Science and Advanced Analytics, DSAA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2021
Y2 - 6 October 2021 through 9 October 2021
ER -