The concept of partitional clustering started with the K-means algorithm, which was published in 1957. Since then, many classical partitional clustering algorithms based on the gradient descent approach have been reported. The 1990s kick-started a new era in cluster analysis with the application of nature-inspired metaheuristics. Nearly two decades have passed since this initial formulation, and researchers have developed numerous new algorithms in this field. This paper presents an up-to-date review of all major nature-inspired metaheuristic algorithms employed to date for partitional clustering. Further, key issues involved in formulating various metaheuristics as a clustering problem, and major application areas, are discussed.
Data clustering is one of the most popular techniques in data mining. It is the process of partitioning an unlabeled dataset into groups, where each group contains objects that are similar to each other with respect to a certain similarity measure and different from those of other groups. Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as medicine, bioinformatics, biology, recommendation systems and the clustering of text documents. Many algorithms for large data sets have been proposed in the literature using different techniques. However, conventional algorithms have shortcomings such as slow convergence and sensitivity to initialization values. Particle Swarm Optimization (PSO) is a population-based global search algorithm that uses the principles of the social behavior of swarms. PSO produces better results on complicated, multi-peak problems. This paper presents a literature survey on the PSO algorithm and its variants applied to clustering high-dimensional data. An attempt is made to provide a guide for researchers working in the area of PSO and high-dimensional data clustering.
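To make the idea of PSO-based clustering concrete, the following is a minimal sketch and not the method of any surveyed paper. It assumes the common global-best PSO update, encodes each particle as k flattened centroids, and uses the sum of squared distances to the nearest centroid as fitness. The function names (`sse`, `pso_cluster`), the parameter values (`w`, `c1`, `c2`) and the toy data are illustrative assumptions.

```python
import random


def sse(flat_centers, points, k):
    """Sum of squared distances from each point to its nearest centroid."""
    dim = len(points[0])
    centers = [flat_centers[j * dim:(j + 1) * dim] for j in range(k)]
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
        for p in points
    )


def pso_cluster(points, k, n_particles=20, iters=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over centroid positions (assumed parameter values)."""
    rng = random.Random(seed)
    dim = len(points[0])
    size = k * dim
    # Each particle holds k centroids, initialised from random data points.
    pos = [[coord for p in rng.sample(points, k) for coord in p]
           for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [sse(p, points, k) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best fitness seen so far, per iteration
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = sse(pos[i], points, k)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


# Toy data (assumed): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
best, best_fit, history = pso_cluster(data, k=2)
print(best, best_fit)
```

Because the swarm only replaces its best position when the fitness improves, `history` never increases. How close the final value gets to the best possible partition depends on the parameters and the seed.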
Applying k-Means to minimize the sum of the intra-cluster variances is the most popular clustering approach. However, after a bad initialization, poor local optima can be easily obtained. To tackle the initialization problem of k-Means, we propose the MinMax k-Means algorithm, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective. Weights are learned together with the cluster assignments, through an iterative procedure. The proposed weighting scheme limits the emergence of large variance clusters and allows high quality solutions to be systematically uncovered, irrespective of the initialization. Experiments verify the effectiveness of our approach and its robustness over bad initializations, as it compares favorably to both k-Means and other methods from the literature that consider the k-Means initialization problem.
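The initialization problem described above can be illustrated with a small sketch. It runs plain k-Means (Lloyd's iterations), not the proposed MinMax k-Means, on an assumed toy dataset of four points and compares the per-cluster variances reached from two different initializations. The function names and the data are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, init_centers, max_iters=100):
    """Plain k-Means: alternate assignment and mean update until stable."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def cluster_variances(points, centers, labels):
    """Per-cluster sum of squared distances to the cluster center."""
    variances = [0.0] * len(centers)
    for p, l in zip(points, labels):
        variances[l] += sq_dist(p, centers[l])
    return variances


# Assumed toy data: the corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Both initial centers on the left edge: converges to a top/bottom split.
bad_centers, bad_labels = lloyd_kmeans(points, [(0.0, 0.0), (0.0, 1.0)])
bad_var = cluster_variances(points, bad_centers, bad_labels)

# One initial center per side: converges to the left/right split.
good_centers, good_labels = lloyd_kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
good_var = cluster_variances(points, good_centers, good_labels)

print("bad init: ", bad_var, sum(bad_var))    # [8.0, 8.0] 16.0
print("good init:", good_var, sum(good_var))  # [0.5, 0.5] 1.0
```

Both runs stop at a stable assignment, yet the first leaves two large-variance clusters. A scheme that penalizes the largest cluster variance, as the abstract describes, is aimed at avoiding exactly this kind of solution.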
This paper is the second part of a two-part paper, which is a survey of multiobjective evolutionary algorithms for data mining problems. In Part I, multiobjective evolutionary algorithms used for feature selection and classification were reviewed. In this part, different multiobjective evolutionary algorithms used for clustering, association rule mining, and other data mining tasks are surveyed. Moreover, a general discussion is provided, along with the scope for future research in the domain of multiobjective evolutionary algorithms for data mining.
Optimization-based pattern discovery has emerged as an important field in knowledge discovery and data mining (KDD), and has been used to enhance the efficiency and accuracy of clustering, classification, association rules and outlier detection. Cluster analysis, which identifies groups of similar data items in large datasets, is one of its recent beneficiaries. The increasing complexity and volume of data in datasets have made data clustering a popular focus for the application of optimization-based techniques. Different optimization techniques have been applied to search for the optimal solution to clustering problems. Swarm intelligence (SI) is one such optimization technique whose algorithms have been successfully demonstrated as solutions for different data clustering domains. In this paper we investigate the growth of the literature on SI and its algorithms, particularly Particle Swarm Optimization (PSO). This paper makes two major contributions. Firstly, it provides a thorough literature overview focusing on some of the most cited techniques that have been used for PSO-based data clustering. Secondly, we analyze the reported results and highlight the performance of different techniques against contemporary clustering techniques. We also provide a brief overview of our PSO-based hierarchical clustering approach (HPSO-clustering) and compare the results with traditional hierarchical agglomerative clustering (HAC), K-means, and PSO clustering.