Abstract
Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating those data sources. This article proposes a cluster-analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural networks, to identify similar schema elements from heterogeneous data sources, based on a combination of features such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. An SOM prototype the authors have developed provides users with a visualization tool for displaying clustering results as well as for incrementally evaluating candidate similar elements.
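The idea of grouping similar schema elements can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single feature (naming similarity, computed here with Python's `difflib`), a simple single-linkage hierarchical clustering, and a hypothetical similarity threshold of 0.7. The function names `name_similarity` and `cluster_elements` and the sample attribute names are illustrative only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two element names.

    Assumption: case and underscores are ignored before comparison.
    """
    def normalize(s: str) -> str:
        return s.lower().replace("_", "")

    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_elements(names, threshold=0.7):
    """Single-linkage agglomerative clustering of schema element names.

    Two clusters are merged while any cross-cluster pair of names has a
    similarity of at least `threshold` (a hypothetical cut-off).
    """
    # Start with one singleton cluster per distinct name.
    clusters = [{n} for n in dict.fromkeys(names)]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(name_similarity(a, b) >= threshold for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


if __name__ == "__main__":
    # Hypothetical attribute names drawn from two different schemas.
    elements = ["cust_name", "CustomerName", "order_date", "OrderDate", "zip"]
    for cluster in cluster_elements(elements):
        print(sorted(cluster))
```

Each resulting cluster is a set of candidate similar elements that an analyst would still need to review. A fuller treatment along the lines of the abstract would combine several features (document similarity, schema specification, data patterns, usage patterns) into one feature vector per element before clustering.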
Original language | English (US) |
---|---|
Pages (from-to) | 88-106 |
Number of pages | 19 |
Journal | Journal of Database Management |
Volume | 15 |
Issue number | 4 |
State | Published - 2004 |
Keywords
- Attribute correspondence
- Cluster analysis
- Heterogeneous database integration
- Interschema relationship identification
- Schema correspondence
- Self-organizing map
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture