TY - CHAP
T1 - Clustering similar schema elements across heterogeneous databases
T2 - A first step in database integration
AU - Zhao, Huimin
AU - Ram, Sudha
PY - 2006
Y1 - 2006
AB - Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating those data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple clustering techniques, including K-means, hierarchical clustering, and a self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on multiple types of features, such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. We describe an SOM prototype we have developed that provides users with a visualization tool for displaying clustering results and for incrementally evaluating potentially similar elements. We also report empirical results demonstrating the utility of the proposed approach.
UR - http://www.scopus.com/inward/record.url?scp=33947183147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947183147&partnerID=8YFLogxK
U2 - 10.4018/978-1-59140-935-9.ch013
DO - 10.4018/978-1-59140-935-9.ch013
M3 - Chapter
AN - SCOPUS:33947183147
SN - 9781591409359
VL - 5
SP - 227
EP - 248
BT - Advanced Topics in Database Research
PB - IGI Global
ER -