TY - JOUR
T1 - An automatic indexing and neural network approach to concept retrieval and classification of multilingual (Chinese-English) documents
AU - Lin, Chung Hsin
AU - Chen, Hsinchun
N1 - Funding Information:
Manuscript received March 5, 1993; revised April 24, 1994, and October 7, 1994. This project was supported in part by a grant awarded by the International Program Development Fund, University of Arizona, 1992-1993, and a Research Initiation Award grant awarded by the Division of Information, Robotics, and Intelligent Systems, National Science Foundation (IRI-9211418), 1992-1994. The authors are with the Department of Management Information Systems, Karl Eller Graduate School of Management, University of Arizona, Tucson, AZ 85721. Publisher Item Identifier S 1083-4419(96)00409-8. English references are listed with Arabic numbers; for Chinese references, the reference number is prefixed with 'C', e.g., [C1].
PY - 1996
Y1 - 1996
AB - An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduce a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors is then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to suggest additional semantically relevant search terms. For concept classification and clustering, a variant of the Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups that represent (summarize) the subject matter of the database. The authors have applied this concept space approach to information classification and retrieval in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This paper reports our experiment with a multilingual database. The system was initially developed in the MS-DOS environment, running the ETEN Chinese operating system; for performance reasons, it was then tested on a UNIX-based system. Because Chinese is an ideographic language, we developed a Chinese term-phrase indexing paradigm that accounts for its ideographic characteristics as part of a multilingual information classification model. By applying the neural-network-based concept classification technique, the model offers a novel way of organizing unstructured multilingual information.
UR - http://www.scopus.com/inward/record.url?scp=0030085144&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030085144&partnerID=8YFLogxK
U2 - 10.1109/3477.484439
DO - 10.1109/3477.484439
M3 - Short survey
C2 - 18263007
AN - SCOPUS:0030085144
SN - 1083-4419
VL - 26
SP - 75
EP - 88
JO - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
JF - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
IS - 1
ER -