Clustering

Correlation Clustering

Correlation Clustering is a model for clustering data when a binary relationship between data points is known. More precisely, for each pair of points we have two scores measuring the similarity and dissimilarity respectively, of the two points, and we would like to compute an optimal partition where the value of a partition is obtained by summing up scores of pairs involving points from a same cluster and scores of pairs involving points from different clusters. The main difference between Correlation Clustering and, say, k-means is that for the former the number of sets of the solution is not known a priori.

A closely related problem is Consensus Clustering, where we are given a set of partitions and we would like to obtain a partition that best summarizes the input partitions. We have investigate the computational complexity of both problems and we have designed some efficient approximation algorithms.