A hierarchical model-based approach to co-clustering high-dimensional data

Gianni Costa, Giuseppe Manco, Riccardo Ortale

2008

Abstract

We propose a hierarchical, model-based co-clustering framework for handling high-dimensional datasets. The technique views the dataset as a joint probability distribution over row and column variables. Our approach starts by clustering tuples in a dataset, where each cluster is characterized by a different probability distribution. Subsequently, the conditional distribution of attributes over tuples is exploited to discover natural co-clusters in the data. An intensive empirical evaluation highlights the effectiveness of our approach.

Type

Conference paper

Publication

Proceedings of the 2008 ACM Symposium on Applied Computing (SAC), Fortaleza, Ceara, Brazil, March 16-20, 2008