TY - GEN
T1 - Automatic Event Coding Framework for Spanish Political News Articles
AU - Salam, Sayeed
AU - Khan, Lamisah
AU - El-Ghamry, Amir
AU - Brandt, Patrick
AU - Holmes, Jennifer
AU - D'Orazio, Vito
AU - Osorio, Javier
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - Today, Spanish-speaking countries face widespread political crisis. These political conflicts are published in a large volume of Spanish news articles from Spanish agencies. Our goal is to create a fully functioning system that parses real-time Spanish texts and generates scalable event code. Rather than translating Spanish text into English text and using English event coders, we aim to create a tool that uses raw Spanish text and Spanish event coders for better flexibility, coverage, and cost. To accommodate the processing of a large number of Spanish articles, we adapt a distributed framework based on Apache Spark. We highlight how to extend the existing ontology to provide support for the automated coding process for Spanish texts. We also present experimental data to provide insight into the data collection process with filtering unrelated articles, scaling the framework, and gathering basic statistics on the dataset.
AB - Today, Spanish-speaking countries face widespread political crisis. These political conflicts are published in a large volume of Spanish news articles from Spanish agencies. Our goal is to create a fully functioning system that parses real-time Spanish texts and generates scalable event code. Rather than translating Spanish text into English text and using English event coders, we aim to create a tool that uses raw Spanish text and Spanish event coders for better flexibility, coverage, and cost. To accommodate the processing of a large number of Spanish articles, we adapt a distributed framework based on Apache Spark. We highlight how to extend the existing ontology to provide support for the automated coding process for Spanish texts. We also present experimental data to provide insight into the data collection process with filtering unrelated articles, scaling the framework, and gathering basic statistics on the dataset.
KW - Apache Spark
KW - Automated Event Coder
KW - BERT
KW - Multilingual
KW - NLP
KW - Universal Dependency
UR - http://www.scopus.com/inward/record.url?scp=85087910066&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087910066&partnerID=8YFLogxK
U2 - 10.1109/BigDataSecurity-HPSC-IDS49724.2020.00052
DO - 10.1109/BigDataSecurity-HPSC-IDS49724.2020.00052
M3 - Conference contribution
AN - SCOPUS:85087910066
T3 - Proceedings - 2020 IEEE 6th Intl Conference on Big Data Security on Cloud, BigDataSecurity 2020, 2020 IEEE Intl Conference on High Performance and Smart Computing, HPSC 2020 and 2020 IEEE Intl Conference on Intelligent Data and Security, IDS 2020
SP - 246
EP - 253
BT - Proceedings - 2020 IEEE 6th Intl Conference on Big Data Security on Cloud, BigDataSecurity 2020, 2020 IEEE Intl Conference on High Performance and Smart Computing, HPSC 2020 and 2020 IEEE Intl Conference on Intelligent Data and Security, IDS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE International Conference on Big Data Security on Cloud, BigDataSecurity 2020, 6th IEEE International Conference on High Performance and Smart Computing, HPSC 2020 and 5th IEEE International Conference on Intelligent Data and Security, IDS 2020
Y2 - 25 May 2020 through 27 May 2020
ER -