Clustering

Dealing with trajectory streams by clustering and mathematical transforms

Nowadays, almost all kind of electronic devices leave traces of their movements (e.g. smartphone, GPS devices and so on). Thus, the huge number of this “tiny” data sources leads to the generation of massive data streams of geo-referenced data. As a …

Hierarchical clustering of XML documents focused on structural components

Clustering XML documents by structure is the task of grouping them by common structural components. Hitherto, this has been accomplished by looking at the occurrence of one preestablished type of structural components in the structures of the XML …

An incremental clustering scheme for data de-duplication

We propose an incremental technique for discovering duplicates in large databases of textual sequences, i.e., syntactically different tuples, that refer to the same real-world entity. The problem is approached from a clustering perspective: given a …

Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data

A parameter-free, fully-automatic approach to clustering high-dimensional categorical data is proposed. The technique is based on a two-phase iterative procedure, which attempts to improve the overall quality of the whole partition. In the first …

Mining categories for emails via clustering and pattern discovery

The continuous exchange of information by means of the popular email service has raised the problem of managing the huge amounts of messages received from users in an effective and efficient way. We deal with the problem of email classification by …