Python k-means algorithm

Update: (Eleven years after this original answer, it’s probably time for an update.) First off, are you sure you want k-means? This page gives an excellent graphical summary of some different clustering algorithms. I’d suggest that beyond the graphic, look especially at the parameters that each method requires and decide whether you can provide the … Read more

Scikit Learn – K-Means – Elbow – criterion

If the true labels are not known in advance (as in your case), then K-Means clustering can be evaluated using either the Elbow Criterion or the Silhouette Coefficient. Elbow Criterion Method: The idea behind the elbow method is to run k-means clustering on a given dataset for a range of values of k (num_clusters, e.g. k=1 to 10), and … Read more
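
The elbow idea described in this excerpt can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from the linked answer: the helper names (`sq_dist`, `init_centers`, `kmeans_inertia`), the synthetic three-blob data, and the farthest-point initialisation are all assumptions made for the example. In practice one would more likely fit scikit-learn's `KMeans` for each k and read its `inertia_` attribute.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_inertia(points, k, n_iter=50):
    """Run Lloyd's k-means and return the within-cluster sum of squared distances."""
    centers = init_centers(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep the old
        # center if a cluster ended up empty.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical data: three well-separated 2-D blobs of 30 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Inertia for k = 1..6; the "elbow" is where the curve stops dropping sharply.
inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

For data like this, the inertia should fall steeply up to k=3 and flatten afterwards, which is the bend the elbow criterion looks for. Plotting `inertias` against k makes the bend easier to see than the printed numbers.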

plotting results of hierarchical clustering on top of a matrix of data

The question does not define matrix very well: “matrix of values”, “matrix of data”. I assume that you mean a distance matrix. In other words, element D_ij in the symmetric nonnegative N-by-N distance matrix D denotes the distance between two feature vectors, x_i and x_j. Is that correct? If so, then try this (edited June … Read more
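
To make the distance-matrix assumption in this excerpt concrete, here is a small sketch that builds D from a list of feature vectors using Euclidean distance. The function name `distance_matrix` and the sample vectors are made up for illustration; with SciPy available, `scipy.spatial.distance.pdist` combined with `squareform` would be the usual route.

```python
import math


def distance_matrix(vectors):
    """Return the symmetric N-by-N matrix D with D[i][j] = ||x_i - x_j||."""
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both halves at once so D stays symmetric.
            D[i][j] = D[j][i] = math.dist(vectors[i], vectors[j])
    return D


# Hypothetical feature vectors x_0, x_1, x_2.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(X)
for row in D:
    print(row)
```

The diagonal is zero and every entry is nonnegative, matching the properties the excerpt assumes for D.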

Unsupervised clustering with unknown number of clusters

You can use hierarchical clustering. It is a rather basic approach, so there are lots of implementations available. It is for example included in Python’s scipy. See for example the following script: import matplotlib.pyplot as plt import numpy import scipy.cluster.hierarchy as hcluster # generate 3 clusters of each around 100 points and one orphan point … Read more
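
Since the script in this excerpt is cut off, the following is a separate, hedged sketch of the same idea rather than a reconstruction of it. It relies on the fact that cutting a single-linkage hierarchy at a distance threshold gives the connected components of the graph that links every pair of points no farther apart than that threshold. The function name `threshold_clusters`, the threshold value, and the synthetic data (three blobs plus one orphan point, smaller than in the excerpt) are assumptions for the example.

```python
import math
import random


def threshold_clusters(points, thresh):
    """Label points by single-linkage clusters cut at distance `thresh`."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points that lie within the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= thresh:
                parent[find(i)] = find(j)

    # Relabel the roots as consecutive cluster ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Hypothetical data: three blobs of 30 points each and one orphan point.
rng = random.Random(0)
points = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    for _ in range(30)
]
points.append((20.0, 20.0))  # the orphan

labels = threshold_clusters(points, thresh=2.0)
print("clusters found:", len(set(labels)))
```

The number of clusters is never passed in; it falls out of the threshold, which is why this approach suits the case where k is unknown. The cost is that a sensible threshold has to be chosen for the scale of the data.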

What is an intuitive explanation of the Expectation Maximization technique? [closed]

Note: the code behind this answer can be found here. Suppose we have some data sampled from two different groups, red and blue: Here, we can see which data point belongs to the red or blue group. This makes it easy to find the parameters that characterise each group. For example, the mean of the … Read more

Difference between classification and clustering in data mining? [closed]

In general, in classification you have a set of predefined classes and want to know which class a new object belongs to. Clustering tries to group a set of objects and find whether there is some relationship between the objects. In the context of machine learning, classification is supervised learning and clustering is unsupervised learning. … Read more

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

Here’s a small kmeans that uses any of the 20-odd distances in scipy.spatial.distance, or a user function. Comments would be welcome (this has had only one user so far, not enough); in particular, what are your N, dim, k, metric? #!/usr/bin/env python # kmeans.py using any of the 20-odd metrics in scipy.spatial.distance # kmeanssample … Read more
