How to get the samples in each cluster?

I had a similar requirement, and I am using pandas to create a new DataFrame with the index of the dataset and the labels as columns:

data = pd.read_csv('filename')
km = KMeans(n_clusters=5).fit(data)
cluster_map = pd.DataFrame()
cluster_map['data_index'] = data.index.values
cluster_map['cluster'] = km.labels_

Once the DataFrame is available, it is quite easy to filter. For example, to filter …
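A minimal runnable sketch of the same idea follows. It assumes scikit-learn and pandas are installed, and it replaces the CSV file with a small synthetic DataFrame (three tight groups of 10 points each) so it runs on its own. The filtering step shown is one possible way to finish the idea, not necessarily the original answer's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for pd.read_csv('filename'): 30 samples, 2 features,
# drawn around three well-separated centres (0,0), (5,5) and (10,10).
rng = np.random.default_rng(0)
data = pd.DataFrame(
    np.vstack([rng.normal(c, 0.3, size=(10, 2)) for c in (0, 5, 10)]),
    columns=["x", "y"],
)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

# Map each row index of the dataset to its cluster label.
cluster_map = pd.DataFrame()
cluster_map["data_index"] = data.index.values
cluster_map["cluster"] = km.labels_

# Keep only the rows of one cluster (here: whichever cluster row 0 landed in).
target = cluster_map.loc[0, "cluster"]
members = cluster_map[cluster_map["cluster"] == target]

# Use the stored indices to pull the original samples for that cluster.
samples = data.loc[members["data_index"]]
print(samples)

# Alternative: list the sample indices of every cluster at once.
print(cluster_map.groupby("cluster")["data_index"].apply(list))
```

Because cluster label numbers are arbitrary, the sketch picks the target cluster from a known row rather than hard-coding a label.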

Python k-means algorithm

Update: (Eleven years after this original answer, it's probably time for an update.) First off, are you sure you want k-means? This page gives an excellent graphical summary of some different clustering algorithms. I'd suggest that, beyond the graphic, you look especially at the parameters that each method requires and decide whether you can provide the …
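To make the "parameters each method requires" point concrete, here is a small sketch assuming scikit-learn. It contrasts KMeans, which needs the number of clusters up front, with DBSCAN (chosen here only as one example of an algorithm with different inputs), which instead needs a neighbourhood radius and a minimum neighbour count. The data and the `eps`/`min_samples` values are made up for this toy example.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs of 100 points each.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [10, 0]],
    cluster_std=0.5,
    random_state=0,
)

# KMeans: you must supply the number of clusters k yourself.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: no k; instead a radius (eps) and a minimum number of neighbours
# (min_samples). These values are guesses tuned to this toy data only.
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("KMeans clusters:", len(set(km_labels)))
# DBSCAN labels noise points as -1, so exclude that label when counting.
print("DBSCAN clusters:", len(set(db_labels) - {-1}))
```

If you can't say in advance how many clusters you expect, an algorithm that asks for a density scale instead of k may be easier to parameterise.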

Scikit-learn – K-Means – Elbow criterion

If the true labels are not known in advance (as in your case), K-Means clustering can be evaluated using either the Elbow Criterion or the Silhouette Coefficient. Elbow Criterion Method: the idea behind the elbow method is to run k-means clustering on a given dataset for a range of values of k (num_clusters, e.g. k=1 to 10), and …
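A minimal sketch of both criteria, assuming scikit-learn: it fits KMeans for k = 1 to 10 on synthetic data with four known blobs, records `inertia_` (the within-cluster sum of squared distances, which is what an elbow plot shows) and computes `silhouette_score` for k ≥ 2, since the silhouette is undefined for a single cluster.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data with four clearly separated groups.
X, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.8,
    random_state=42,
)

inertias = {}
silhouettes = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Within-cluster sum of squares: the y-axis of the elbow curve.
    inertias[k] = km.inertia_
    # Silhouette needs at least two clusters.
    if k >= 2:
        silhouettes[k] = silhouette_score(X, km.labels_)

for k in range(1, 11):
    sil = f"{silhouettes[k]:.3f}" if k in silhouettes else "n/a"
    print(f"k={k:2d}  inertia={inertias[k]:10.1f}  silhouette={sil}")

best_k = max(silhouettes, key=silhouettes.get)
print("k with highest silhouette:", best_k)
```

For the elbow, look for the k after which inertia stops dropping sharply; for the silhouette, prefer the k with the highest score. On real data the two criteria may not agree.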

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

Here's a small kmeans that uses any of the 20-odd distances in scipy.spatial.distance, or a user function. Comments would be welcome (this has had only one user so far, not enough); in particular, what are your N, dim, k, metric?

#!/usr/bin/env python
# kmeans.py using any of the 20-odd metrics in scipy.spatial.distance
# kmeanssample …
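The following is not the kmeans.py from the answer. It is a separate, minimal sketch of the same idea, assuming NumPy and SciPy. scikit-learn's KMeans itself only works with Euclidean distance, so this Lloyd-style loop does the assignment step with `scipy.spatial.distance.cdist`, whose `metric` argument accepts either a metric name or a Python callable. The function name `kmeans_any_metric` is hypothetical. Note that the update step still uses the cluster mean, which is only the exact minimiser for squared Euclidean distance; for other metrics it is a heuristic.

```python
import numpy as np
from scipy.spatial.distance import cdist


def kmeans_any_metric(X, k, metric="euclidean", max_iter=100, seed=0):
    """Hypothetical helper: k-means with a pluggable assignment metric."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise centres with k distinct random samples.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Distance from every sample to every centre, shape (n_samples, k).
        labels = cdist(X, centres, metric=metric).argmin(axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment so labels always match the returned centres.
    labels = cdist(X, centres, metric=metric).argmin(axis=1)
    return centres, labels


# Usage: two well-separated groups of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(10, 0.3, (20, 2))])

# A named scipy metric...
_, labels_cityblock = kmeans_any_metric(X, 2, metric="cityblock")
# ...or a user-supplied function of two 1-D vectors (Chebyshev distance here).
_, labels_custom = kmeans_any_metric(
    X, 2, metric=lambda u, v: np.abs(u - v).max()
)
print(labels_cityblock)
print(labels_custom)
```

Passing a Python callable to `cdist` is much slower than a built-in metric name, so prefer the names where one fits.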
