Analysis of Removing Weak Associations During Consensus Clustering

Date
2020-08
Authors
Naran Chirakkal, Ruckiya Sinorina
Publisher
Faculty of Graduate Studies and Research, University of Regina
Abstract

Given multiple base clusterings of a dataset, e.g., as created by multiple clustering algorithms on the same data, consensus clustering aims to generate a single robust aggregated clustering. Consensus methods measure the strength of the association between two data objects by how often the base clusterings group the objects together. However, incorporating weak associations in the consensus process can have a negative effect on the quality of the aggregated clustering. This thesis presents our research on an automatic approach for removing weak associations during the consensus process. In particular, we propose an efficient approach, called the WAT approach, for removing weak associations, and we test two methods based on it, namely WAT(K) and WAT(GMM). We compare our methods to a brute-force method used in an existing consensus function, NegMM, which tends to be rather inefficient in terms of runtime. Our empirical analysis on multiple datasets shows that the proposed approach produces consensus clusterings comparable in quality to those produced by the original NegMM method, at a much lower runtime. Moreover, this thesis presents an empirical analysis of the effect of removing weak associations with our approach on CSPA and MCLA, two well-known consensus functions from the literature. Our WAT approach significantly improved the consensus built by CSPA in many cases, but the original MCLA tends to outperform MCLA combined with the WAT methods.
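The association strength described in the abstract is commonly represented as a co-association matrix, whose entry for a pair of objects is the fraction of base clusterings that place both objects in the same cluster. The following is a minimal Python sketch, assuming the base clusterings are given as integer label vectors over the same objects. The function names `co_association` and `drop_weak` and the fixed threshold of 0.5 are hypothetical illustrations. This is not the WAT approach: the abstract states that WAT removes weak associations automatically, and it does not specify how WAT(K) or WAT(GMM) choose what to remove.

```python
import numpy as np

def co_association(base_labels):
    """Fraction of base clusterings that put each pair of objects in the same cluster."""
    labels = np.asarray(base_labels)  # shape: (m base clusterings, n objects)
    m, n = labels.shape
    A = np.zeros((n, n))
    for row in labels:
        # Boolean (n, n) matrix: True where objects i and j share a cluster label
        A += row[:, None] == row[None, :]
    return A / m

def drop_weak(A, threshold):
    """Zero out associations below a fixed threshold (hypothetical rule, not WAT)."""
    B = A.copy()
    B[B < threshold] = 0.0
    return B

# Three toy base clusterings of five objects (hypothetical data)
base = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 2, 2],
    [1, 1, 0, 0, 0],
]
A = co_association(base)
B = drop_weak(A, threshold=0.5)
```

In this toy example, objects 2 and 4 are grouped together by only one of the three base clusterings, so their association of 1/3 falls below the threshold and is removed, while the pair (2, 3), grouped together twice, keeps its association of 2/3. A consensus function would then operate on the filtered matrix `B` instead of `A`.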

Description
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. xiii, 95 p.