Clustering Relational Data: A Transactional Approach

Abstract

A methodology for clustering multi-relational data is proposed. First, the tuple linkages in the database schema are leveraged to virtually organize the available relational data into transactions, i.e., sets of feature-value pairs, one for each multi-relational entity. The resulting transactions are then partitioned into homogeneous groups. Each discovered cluster is equipped with a representative that explains the corresponding group of transactions in terms of the feature-value pairs most likely to appear in a transaction belonging to that group. Outlier data are placed into a trash cluster, which is finally partitioned to mitigate the dissimilarity between the trash cluster and the previously generated clusters.
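
The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two toy tables, the helper names (`build_transactions`, `representative`, `jaccard`, `assign`), the use of Jaccard similarity, and the `min_share` and `threshold` parameters. It shows only how linked tuples might be flattened into transactions, how a representative could be derived from frequent feature-value pairs, and how a poorly matching transaction could be routed to a trash cluster.

```python
from collections import Counter

def build_transactions(targets, linked, key, foreign_key):
    """Flatten each target tuple and its linked tuples into a set of (feature, value) pairs."""
    transactions = {}
    for row in targets:
        items = {(f"target.{f}", v) for f, v in row.items() if f != key}
        for child in linked:
            if child[foreign_key] == row[key]:
                items |= {(f"linked.{f}", v) for f, v in child.items() if f != foreign_key}
        transactions[row[key]] = frozenset(items)
    return transactions

def representative(cluster, min_share=0.5):
    """Keep the pairs that occur in at least min_share of the cluster's transactions (assumed rule)."""
    counts = Counter(item for transaction in cluster for item in transaction)
    return frozenset(item for item, n in counts.items() if n / len(cluster) >= min_share)

def jaccard(a, b):
    """Jaccard similarity between two sets of pairs (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def assign(transaction, representatives, threshold=0.3):
    """Return the index of the most similar representative, or None for the trash cluster."""
    best, best_sim = None, threshold
    for i, rep in enumerate(representatives):
        sim = jaccard(transaction, rep)
        if sim >= best_sim:
            best, best_sim = i, sim
    return best

# Hypothetical toy schema: customers linked to orders through customer_id.
customers = [{"id": 1, "city": "Rome"}, {"id": 2, "city": "Rome"}, {"id": 3, "city": "Oslo"}]
orders = [
    {"customer_id": 1, "product": "book"},
    {"customer_id": 2, "product": "book"},
    {"customer_id": 3, "product": "skis"},
]

transactions = build_transactions(customers, orders, key="id", foreign_key="customer_id")
rep = representative([transactions[1], transactions[2]])
```

In this toy setting, customers 1 and 2 yield identical transactions and share a representative, while customer 3 has no pair in common with it, so `assign` returns `None` and the transaction would go to the trash cluster. How the trash cluster is subsequently partitioned is not covered by this sketch.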

Publication
ICTAI 2009, 21st IEEE International Conference on Tools with Artificial Intelligence, Newark, New Jersey, USA, 2-4 November 2009