TY - JOUR
T1 - An intelligent personal spider (agent) for dynamic Internet/Intranet searching
AU - Chen, Hsinchun
AU - Chung, Yi-Ming
AU - Ramsey, Marshall
AU - Yang, Christopher C.
N1 - Funding Information:
We would like to thank University of Arizona Artificial Intelligence Group members for their participation in our experiment. We also thank Prof. Jerome Yen of Hong Kong University and Prof. Pai-chun Ma of Hong Kong University of Science and Technology for their comments and involvement during system development and testing. This project was supported mainly by the following grants: NSF/ARPA/NASA Digital Library Initiative, IRI-9411318, 1994–1998 (B. Schatz, H. Chen et al., Building the Interspace: Digital Library Infrastructure for a University Engineering Community); NSF CISE, IRI-9525790, 1995–1998 (H. Chen, Concept-based Categorization and Search on Internet: a Machine Learning, Parallel Computing Approach); AT&T Foundation Special Purpose Grants in Science and Engineering, 1994–1995 (H. Chen); and National Center for Supercomputing Applications (NCSA), High-performance Computing Resources Grants, 1994–1996 (H. Chen).
Hsinchun Chen is a Professor of Management Information Systems at the University of Arizona and head of the UA/MIS Artificial Intelligence Group. He is also a Visiting Senior Research Scientist at the National Center for Supercomputing Applications (NCSA). He received an NSF Research Initiation Award in 1992, the Hawaii International Conference on System Sciences (HICSS) Best Paper Award and an AT&T Foundation Award in Science and Engineering in 1994 and 1995. He received the PhD degree in Information Systems from New York University in 1989. Chen has published more than 30 articles covering semantic retrieval, search algorithms, knowledge discovery and collaborative computing. He is a PI of the Illinois Digital Library Initiative project, funded by NSF/ARPA/NASA, 1994–1998 and has received several grants from NSF, DARPA, NASA, NIH and NCSA. He is the guest editor of the IEEE Computer special issue on 'Building Large-Scale Digital Libraries' and the Journal of the American Society for Information Science special issue on 'Artificial Intelligence Techniques for Emerging Information Systems Applications.' His recent work was featured in Science (Computation Cracks 'Semantic Barriers' Between Databases, June 7, 1996), NCSA Access Magazine, HPCWire and Business Week.
PY - 1998/5
Y1 - 1998/5
N2 - As Internet services based on the World-Wide Web become more popular, information overload has become a pressing research problem. Difficulties with search on Internet will worsen as the amount of on-line information increases. A scalable approach to Internet search is critical to the success of Internet services and other current and future National Information Infrastructure (NII) applications. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent personal spider (agent) approach to Internet searching. The approach, which is grounded on automatic textual analysis and general-purpose search algorithms, is expected to be an improvement over the current static and inefficient Internet searches. In this experiment, we implemented Internet personal spiders based on best first search and genetic algorithm techniques. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the web, based on the links and keyword indexing. A plain, static CGI/HTML-based interface was developed earlier, followed by a recent enhancement of a graphical, dynamic Java-based interface. Preliminary evaluation results and two working prototypes (available for Web access) are presented. Although the examples and evaluations presented are mainly based on Internet applications, the applicability of the proposed techniques to the potentially more rewarding Intranet applications should be obvious. In particular, we believe the proposed agent design can be used to locate organization-wide information, to gather new, time-critical organizational information, and to support team-building and communication in Intranets.
AB - As Internet services based on the World-Wide Web become more popular, information overload has become a pressing research problem. Difficulties with search on Internet will worsen as the amount of on-line information increases. A scalable approach to Internet search is critical to the success of Internet services and other current and future National Information Infrastructure (NII) applications. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent personal spider (agent) approach to Internet searching. The approach, which is grounded on automatic textual analysis and general-purpose search algorithms, is expected to be an improvement over the current static and inefficient Internet searches. In this experiment, we implemented Internet personal spiders based on best first search and genetic algorithm techniques. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the web, based on the links and keyword indexing. A plain, static CGI/HTML-based interface was developed earlier, followed by a recent enhancement of a graphical, dynamic Java-based interface. Preliminary evaluation results and two working prototypes (available for Web access) are presented. Although the examples and evaluations presented are mainly based on Internet applications, the applicability of the proposed techniques to the potentially more rewarding Intranet applications should be obvious. In particular, we believe the proposed agent design can be used to locate organization-wide information, to gather new, time-critical organizational information, and to support team-building and communication in Intranets.
KW - Agents
KW - Evolutionary programming
KW - Information retrieval
KW - Internet
KW - Intranet
KW - Java
KW - Machine learning
KW - Semantic retrieval
KW - Spider
KW - World-Wide Web
UR - http://www.scopus.com/inward/record.url?scp=0032068674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0032068674&partnerID=8YFLogxK
U2 - 10.1016/S0167-9236(98)00035-9
DO - 10.1016/S0167-9236(98)00035-9
M3 - Article
AN - SCOPUS:0032068674
SN - 0167-9236
VL - 23
SP - 41
EP - 58
JO - Decision Support Systems
JF - Decision Support Systems
IS - 1
ER -