Analysis of Removing Weak Associations During Consensus Clustering
Abstract
Given multiple base clusterings of a dataset, e.g., as created by multiple clustering algorithms on the same data, consensus clustering aims to generate a single robust aggregated clustering. Consensus methods measure the strength of an association between two data objects based on how often the objects are grouped together by the base clusterings. However, incorporating weak associations in the consensus process can have a negative effect on the quality of the aggregated clustering. This thesis presents our research on an automatic approach for removing weak associations during the consensus process. In particular, we propose an efficient approach, called the WAT approach, for removing weak associations, and we test two methods that use it, namely WAT(K) and WAT(GMM). We compare our methods to the brute-force method used in an existing consensus function, NegMM, which tends to be inefficient in terms of runtime. Our empirical analysis on multiple datasets shows that the proposed approach produces consensus clusterings that are comparable in quality to those produced by the original NegMM method, yet at a much lower runtime. Moreover, this thesis presents an empirical analysis of the effect of removing weak associations with our approach on CSPA and MCLA, two well-known consensus functions from the literature. Our WAT approach improved the consensus built by CSPA significantly in many cases, but the original MCLA tends to outperform the combination of MCLA with the WAT methods.
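The association strength described above can be illustrated with a small co-association matrix. The following is a minimal sketch, not the WAT approach itself: the function names `co_association` and `drop_weak`, the toy labels, and the fixed threshold of 0.5 are all assumptions made for illustration, whereas the thesis determines which associations count as weak automatically.

```python
import numpy as np

def co_association(base_clusterings):
    """Fraction of base clusterings that place each pair of objects together."""
    labels = np.asarray(base_clusterings)  # shape: (n_clusterings, n_objects)
    # together[m, i, j] is True when clustering m puts objects i and j
    # in the same cluster
    together = labels[:, :, None] == labels[:, None, :]
    return together.mean(axis=0)

def drop_weak(assoc, threshold):
    """Zero out associations below a fixed threshold.

    Hypothetical rule for illustration only; it is not the WAT approach.
    """
    pruned = assoc.copy()
    pruned[pruned < threshold] = 0.0
    return pruned

# Three toy base clusterings of four objects (made-up labels).
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
assoc = co_association(base)
pruned = drop_weak(assoc, threshold=0.5)
```

In this toy case, objects 0 and 1 are grouped together by every base clustering, while objects 0 and 2 are grouped together by only one of the three, so their association falls below the assumed threshold and is removed before any consensus function would be applied.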