TY - GEN
T1 - Scientific discovery as link prediction in influence and citation graphs
AU - Luo, Fan
AU - Valenzuela-Escárcega, Marco
AU - Hahn-Powell, Gus
AU - Surdeanu, Mihai
N1 - Funding Information:
Marco Valenzuela-Escárcega, Gus Hahn-Powell, and Mihai Surdeanu declare a financial interest in lum.ai. This interest has been properly disclosed to the University of Arizona Institutional Review Committee and is managed in accordance with its conflict of interest policies. This work was funded by the Bill and Melinda Gates Foundation HBGDki Initiative.
Publisher Copyright:
© 2018 Association for Computational Linguistics.
PY - 2018
Y1 - 2018
AB - We introduce a machine learning approach for the identification of "white spaces" in scientific knowledge. Our approach addresses this task as link prediction over a graph that contains over 2M influence statements such as "CTCF activates FOXA1", which were automatically extracted using open-domain machine reading. We model this prediction task using graph-based features extracted from the above influence graph, as well as from a citation graph that captures scientific communities. We evaluated the proposed approach through backtesting. Although the data is heavily unbalanced (50 times more negative examples than positives), our approach predicts which influence links will be discovered in the "near future" with an F1 score of 27 points, and a mean average precision of 68%.
UR - http://www.scopus.com/inward/record.url?scp=85083660261&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083660261&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85083660261
T3 - NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Student Research Workshop
SP - 1
EP - 6
BT - NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - Student Research Workshop, SRW 2018
Y2 - 2 June 2018 through 4 June 2018
ER -