Outlier removal can also be integrated directly into the clustering by modifying the objective function.

After the pre-processing steps, the main challenge is to optimize the clustering so that the objective function is minimized. We use the centroid index (CI) as our primary measure of success. It counts how many real clusters are missing a prototype and how many have too many prototypes; the CI-value is the higher of these two numbers. An example is shown in Fig. 1, where four real clusters are missing a prototype. This value provides a clear intuition about the result. Specifically, if CI = 0, the result is a correct clustering. Sometimes we normalize CI by the number of clusters and report the relative CI-value (CI/k). If the ground truth is not available, the result can be compared with the global minimum (if available), or with the best available solution used as a gold standard.

On the other hand, the correct locations of the prototypes can be found by a sequence of prototype swaps, leaving the fine-tuning of their exact locations to k-means. In Fig. 2, only one swap is needed to fix the solution. An important observation is that it is not even necessary to swap one of the redundant prototypes: removing any prototype in their immediate neighborhood is enough, since k-means can fine-tune the exact locations locally. Likewise, the exact location to which the prototype is relocated is not important, as long as it is in the immediate neighborhood where a prototype is needed.
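
To make the centroid index and the single-swap fix concrete, here is a minimal Python sketch. It rests on assumptions that go beyond the text. The text does not spell out how missing and surplus prototypes are counted, so the sketch assumes the usual nearest-prototype mapping between the solution and the ground-truth centroids. The toy data, the starting positions, and the function names (`nearest`, `orphan_count`, `centroid_index`, `kmeans`) are hypothetical, and `kmeans` is a plain Lloyd iteration rather than any particular library's implementation.

```python
import math

def nearest(p, points):
    """Index of the point in `points` closest to p (Euclidean distance)."""
    return min(range(len(points)), key=lambda j: math.dist(p, points[j]))

def orphan_count(source, target):
    """Number of points in `target` that no point in `source` maps to."""
    hit = {nearest(p, target) for p in source}
    return len(target) - len(hit)

def centroid_index(prototypes, truth):
    """CI: the higher of the two counts (missing vs. surplus prototypes)."""
    return max(orphan_count(prototypes, truth), orphan_count(truth, prototypes))

def kmeans(data, prototypes, iterations=10):
    """Plain Lloyd iterations; an empty cluster keeps its old prototype."""
    protos = [tuple(p) for p in prototypes]
    for _ in range(iterations):
        groups = [[] for _ in protos]
        for x in data:
            groups[nearest(x, protos)].append(x)
        protos = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, protos)
        ]
    return protos

# Hypothetical toy data: four well-separated clusters of four points each.
truth = [(0, 0), (10, 0), (20, 0), (30, 0)]
data = [(cx + dx, cy + dy) for cx, cy in truth for dx in (-1, 1) for dy in (-1, 1)]

# Bad start: two prototypes in the first cluster, none in the last one.
stuck = kmeans(data, [(-1, 0), (1, 0), (10, 0), (20, 0)])
ci_stuck = centroid_index(stuck, truth)
print(ci_stuck, ci_stuck / len(truth))   # 1 0.25  (CI and relative CI)

# One swap: drop a redundant prototype and re-create it at some data point
# in the neighborhood where a prototype is missing; k-means does the rest.
swapped = list(stuck)
swapped[1] = (31, 1)
fixed = kmeans(data, swapped)
print(centroid_index(fixed, truth))      # 0
```

In this toy case k-means alone cannot leave the bad start, because the two prototypes sharing the first cluster never move toward the cluster that lacks one. The swap only has to land somewhere in the right neighborhood, and the following k-means iterations settle the exact positions.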