Abstract
This paper reports our research on Web page filtering in the development of specialized search engines. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as input to various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study showed that the proposed approaches performed better than the benchmark approaches.
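To make the feature representation concrete, here is a minimal Python sketch of how a page might be turned into a short vector of content-based and link-based features instead of a bag of words. The abstract does not list the features used in the paper, so the `Page` record, the `term_hits` and `page_features` helpers, the five features, the lexicon, and the example URLs are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical minimal page record (not the paper's data model)."""
    url: str
    title: str
    body: str
    inlinks: list = field(default_factory=list)   # URLs of pages linking here
    outlinks: list = field(default_factory=list)  # URLs this page links to


def term_hits(text, lexicon):
    """Count whitespace-separated tokens of `text` found in the lexicon."""
    return sum(1 for tok in text.lower().split() if tok in lexicon)


def page_features(page, lexicon, relevant_urls):
    """Build a fixed-length vector of assumed content and link features."""
    return [
        float(term_hits(page.title, lexicon)),                  # content: title terms
        float(term_hits(page.body, lexicon)),                   # content: body terms
        float(len(page.inlinks)),                               # link: in-degree
        float(sum(u in relevant_urls for u in page.inlinks)),   # link: relevant in-neighbors
        float(sum(u in relevant_urls for u in page.outlinks)),  # link: relevant out-neighbors
    ]


# Toy data, invented for illustration only.
page = Page(
    url="http://example.org/a",
    title="Cancer treatment options",
    body="New drug therapy for cancer patients",
    inlinks=["http://example.org/b", "http://example.org/c"],
    outlinks=["http://example.org/b"],
)
lexicon = {"cancer", "therapy", "drug"}
relevant = {"http://example.org/b"}

features = page_features(page, lexicon, relevant)
print(features)
```

A vector of this kind has the same length for every page, so it could be passed to a neural network or a support vector machine in place of a sparse word-count vector.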
Original language | English (US) |
---|---|
Pages (from-to) | 515-518 |
Number of pages | 4 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 3334 |
State | Published - 2004 |
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)