TY - GEN
T1 - A comparison of statistical and rule-induction learners for automatic tagging of time expressions in English
AU - Poveda, Jordi
AU - Surdeanu, Mihai
AU - Turmo, Jordi
PY - 2007
Y1 - 2007
N2 - Proper recognition and handling of temporal information contained in a text is key to understanding the flow of events depicted in the text and their accompanying circumstances. Consequently, time expression recognition and representation of the time information they convey in a suitable normalized form is an important task relevant to several problems in Natural Language Processing. In particular, such an analysis is largely significant for Information Extraction (IE), Question Answering (QA) and Automatic Summarization (AS). The most common approach to time expression recognition in the past has been the use of handmade extraction rules (grammars), which also served as the basis for normalization. Our aim is to explore the possibilities afforded by applying machine learning techniques to the recognition of time expressions. We focus on recognizing the appearances of time expressions in text (not normalization) and transform the problem into one of chunking, where the aim is to correctly assign Begin, Inside or Outside (BIO) tags to tokens. In this paper, we explain the knowledge representation used and compare the results obtained in our experiments with two different methods, one statistical (support vector machines) and one of rule induction (FOIL). Our empirical analysis shows that SVMs are superior.
AB - Proper recognition and handling of temporal information contained in a text is key to understanding the flow of events depicted in the text and their accompanying circumstances. Consequently, time expression recognition and representation of the time information they convey in a suitable normalized form is an important task relevant to several problems in Natural Language Processing. In particular, such an analysis is largely significant for Information Extraction (IE), Question Answering (QA) and Automatic Summarization (AS). The most common approach to time expression recognition in the past has been the use of handmade extraction rules (grammars), which also served as the basis for normalization. Our aim is to explore the possibilities afforded by applying machine learning techniques to the recognition of time expressions. We focus on recognizing the appearances of time expressions in text (not normalization) and transform the problem into one of chunking, where the aim is to correctly assign Begin, Inside or Outside (BIO) tags to tokens. In this paper, we explain the knowledge representation used and compare the results obtained in our experiments with two different methods, one statistical (support vector machines) and one of rule induction (FOIL). Our empirical analysis shows that SVMs are superior.
UR - http://www.scopus.com/inward/record.url?scp=47349120576&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47349120576&partnerID=8YFLogxK
U2 - 10.1109/TIME.2007.38
DO - 10.1109/TIME.2007.38
M3 - Conference contribution
AN - SCOPUS:47349120576
SN - 0769528368
SN - 9780769528366
T3 - Proceedings of the International Workshop on Temporal Representation and Reasoning
SP - 141
EP - 149
BT - Proceedings - 14th International Symposium on Temporal Representation and Reasoning, TIME 2007
T2 - 14th International Symposium on Temporal Representation and Reasoning, TIME 2007
Y2 - 28 June 2007 through 30 June 2007
ER -