TY - JOUR
T1 - Heuristic shortest hyperpaths in cell signaling hypergraphs
AU - Krieger, Spencer
AU - Kececioglu, John
N1 - Funding Information:
We especially wish to thank T.M. Murali for introducing us to the problem of shortest hyperpaths in cell-signaling hypergraphs, for orienting us to the biology literature, and for discussing the JUP/DSP biological example. In addition, we thank Anna Ritz for discussing the NCI-PID and Reactome datasets, and for providing the BioPax parser. We also thank the anonymous reviewers for their helpful comments. This paper is an extended journal version of a prior conference paper by the coauthors [39].
Funding Information:
This research was supported by the US National Science Foundation through grants CCF-1617192 and IIS-2041613 to JK.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The current state-of-the-art for shortest hyperpaths in cell signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees. Results: We present, for the first time, a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. We show the heuristic finds provably optimal hyperpaths for the class of singleton-tail hypergraphs, and also give a practical algorithm for tractably generating all source-sink hyperpaths. The accuracy of the heuristic is demonstrated through comprehensive experiments on all source-sink instances from the standard NCI-PID and Reactome pathway databases, which show it finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all source-sink hyperpaths shows the solution found by the heuristic was in fact optimal. Conclusions: The new shortest hyperpath heuristic is both fast and accurate. This makes finding source-sink hyperpaths, which in general may contain cycles, now practical for real cell signaling networks. Availability: Source code for the hyperpath heuristic in a new tool we call Hhugin (as well as for hyperpath enumeration, and all dataset instances) is available free for non-commercial use at http://hhugin.cs.arizona.edu.
AB - Background: Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The current state-of-the-art for shortest hyperpaths in cell signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees. Results: We present, for the first time, a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. We show the heuristic finds provably optimal hyperpaths for the class of singleton-tail hypergraphs, and also give a practical algorithm for tractably generating all source-sink hyperpaths. The accuracy of the heuristic is demonstrated through comprehensive experiments on all source-sink instances from the standard NCI-PID and Reactome pathway databases, which show it finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all source-sink hyperpaths shows the solution found by the heuristic was in fact optimal. Conclusions: The new shortest hyperpath heuristic is both fast and accurate. This makes finding source-sink hyperpaths, which in general may contain cycles, now practical for real cell signaling networks. Availability: Source code for the hyperpath heuristic in a new tool we call Hhugin (as well as for hyperpath enumeration, and all dataset instances) is available free for non-commercial use at http://hhugin.cs.arizona.edu.
KW - Systems biology
KW - cell signaling networks
KW - directed hypergraphs
KW - efficient heuristics
KW - hyperpath enumeration
KW - reaction pathways
KW - shortest hyperpaths
UR - http://www.scopus.com/inward/record.url?scp=85130698933&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130698933&partnerID=8YFLogxK
U2 - 10.1186/s13015-022-00217-9
DO - 10.1186/s13015-022-00217-9
M3 - Article
AN - SCOPUS:85130698933
SN - 1748-7188
VL - 17
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 12
ER -