TY - JOUR
T1 - Using content-based and link-based analysis in building vertical search engines
AU - Chau, Michael
AU - Chen, Hsinchun
PY - 2004
Y1 - 2004
N2 - This paper reports our research in the Web page filtering process in specialized search engine development. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study was conducted and showed that the proposed approaches performed better than the benchmark approaches.
UR - http://www.scopus.com/inward/record.url?scp=85088188309&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088188309&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-30544-6_58
DO - 10.1007/978-3-540-30544-6_58
M3 - Article
AN - SCOPUS:85088188309
SN - 0302-9743
VL - 3334
SP - 515
EP - 518
JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -