TY - GEN
T1 - Odinson
T2 - 12th International Conference on Language Resources and Evaluation, LREC 2020
AU - Valenzuela-Escárcega, Marco A.
AU - Hahn-Powell, Gus
AU - Bell, Dane
N1 - Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC
PY - 2020
Y1 - 2020
N2 - We present Odinson, a rule-based information extraction framework, which couples a simple yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time. In the Odinson query language, a single pattern may combine regular expressions over surface tokens with regular expressions over graphs such as syntactic dependencies. To guarantee the rapid matching of these patterns, our framework indexes most of the necessary information for matching patterns, including directed graphs such as syntactic dependencies, into a custom Lucene index. Indexing minimizes the amount of expensive pattern matching that must take place at runtime. As a result, the runtime system matches a syntax-based graph traversal in 2.8 seconds in a corpus of over 134 million sentences, nearly 150,000 times faster than its predecessor.
AB - We present Odinson, a rule-based information extraction framework, which couples a simple yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time. In the Odinson query language, a single pattern may combine regular expressions over surface tokens with regular expressions over graphs such as syntactic dependencies. To guarantee the rapid matching of these patterns, our framework indexes most of the necessary information for matching patterns, including directed graphs such as syntactic dependencies, into a custom Lucene index. Indexing minimizes the amount of expensive pattern matching that must take place at runtime. As a result, the runtime system matches a syntax-based graph traversal in 2.8 seconds in a corpus of over 134 million sentences, nearly 150,000 times faster than its predecessor.
KW - Information extraction
KW - Information retrieval
KW - Rule-based methods
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85094763662&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094763662&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85094763662
T3 - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
SP - 2183
EP - 2191
BT - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association (ELRA)
Y2 - 11 May 2020 through 16 May 2020
ER -