Spectral clustering a graph in Python

Without much experience with spectral clustering, and just going by the docs (skip to the end for the results!):

Code:

    import numpy as np
    import networkx as nx
    from sklearn.cluster import SpectralClustering
    from sklearn import metrics

    np.random.seed(1)

    # Get your mentioned graph
    G = nx.karate_club_graph()

    # Get ground-truth: club-labels -> transform to 0/1 np-array
    # (possible …
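As a rough sketch of where an approach like this might lead, the following clusters the karate-club graph into two groups and compares the result with the club labels. Several choices here are assumptions rather than the original answer's method: using the unweighted adjacency matrix as a precomputed affinity, encoding "Mr. Hi" as 0, and scoring with the adjusted Rand index.

```python
import networkx as nx
import numpy as np
from sklearn import metrics
from sklearn.cluster import SpectralClustering

G = nx.karate_club_graph()

# Ground truth: every node carries a "club" attribute; encode it as 0/1.
# (Assumption: "Mr. Hi" -> 0, anything else -> 1.)
truth = np.array([0 if G.nodes[n]["club"] == "Mr. Hi" else 1 for n in G.nodes])

# Unweighted adjacency matrix, used directly as a precomputed affinity.
adjacency = nx.to_numpy_array(G, weight=None)

model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=1)
labels = model.fit_predict(adjacency)

# Agreement with the ground truth, independent of how the labels are numbered.
print("adjusted Rand index:", metrics.adjusted_rand_score(truth, labels))
```

The adjusted Rand index is used because cluster numbering is arbitrary: a perfect split with swapped 0/1 labels still scores 1.0.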

Extracting clusters from seaborn clustermap

While using result.dendrogram_col.linkage or result.dendrogram_row.linkage will currently work, it seems to be an implementation detail. The safest route is to compute the linkages explicitly first and pass them to the clustermap function, which has row_linkage and col_linkage parameters for exactly that purpose. Replacing the last line in your example (result = …) with the following code …
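A minimal sketch of the "compute the linkages yourself" idea, using SciPy only. The random data, the average/Euclidean settings, and the cut into three clusters are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: 20 observations (rows) by 6 features (columns).
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 6))

# One hierarchical clustering for the rows, one for the columns.
row_linkage = linkage(data, method="average", metric="euclidean")
col_linkage = linkage(data.T, method="average", metric="euclidean")

# Cut the row tree into at most 3 flat clusters; labels start at 1.
row_clusters = fcluster(row_linkage, t=3, criterion="maxclust")
print(row_clusters)
```

Passing these same arrays to clustermap through its row_linkage and col_linkage parameters should make the heatmap use the trees the cluster labels were derived from, so nothing has to be read back out of the plot object.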

How to group latitude/longitude points that are ‘close’ to each other?

There are a number of ways to determine the distance between two points, but for plotting points on a 2-D graph you probably want the Euclidean distance. If (x1, y1) represents your first point and (x2, y2) represents your second, the distance is

    d = sqrt( (x2-x1)^2 + (y2-y1)^2 )

Regarding grouping, you may want …
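The distance formula can be sketched in a few lines of Python. The grouping step is not taken from the answer, which is cut off at this point. It is one simple assumption: two points share a group when a chain of hops, each no longer than a hypothetical threshold, connects them.

```python
import math


def euclidean(p, q):
    """Planar distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def group_close_points(points, threshold):
    """Group points linked by hops of at most `threshold` (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if euclidean(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return list(groups.values())


# Hypothetical sample coordinates, treated as plain x/y values.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 10.4), (50.0, 50.0)]
groups = group_close_points(points, threshold=1.0)
print(groups)
```

The pairwise loop is quadratic in the number of points, which is fine for small sets but would need a spatial index for large ones.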

What makes the distance measure in k-medoid “better” than k-means?

1. K-medoid is more flexible

First of all, you can use k-medoids with any similarity measure. K-means, however, may fail to converge; it really must only be used with distances that are consistent with the mean. So, for example, absolute Pearson correlation must not be used with k-means, but it works well with k-medoids.

2. …
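To illustrate the flexibility point, here is a minimal sketch of k-medoids running on a precomputed dissimilarity matrix built from absolute Pearson correlation. The simple alternating update (not full PAM), the explicit initial medoids, and the toy series are all assumptions made for this example.

```python
import numpy as np


def k_medoids(dist, init_medoids, n_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    medoids = np.array(init_medoids)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: pick the member with the smallest total
            # dissimilarity to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Toy series: three built from a sine wave, three from a cosine wave.
# Sign flips, scaling and offsets leave |correlation| at 1 within a family.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
a, b = np.sin(t), np.cos(t)
series = np.vstack([a, -a, 2 * a + 1, b, -b, 3 * b - 2])

# Dissimilarity = 1 - |Pearson correlation|; no mean of points is ever needed.
dist = 1.0 - np.abs(np.corrcoef(series))

medoids, labels = k_medoids(dist, init_medoids=[0, 3])
print(medoids, labels)
```

Only pairwise dissimilarities are used, and every cluster centre is an actual data point, so the measure does not have to be consistent with an arithmetic mean.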

sklearn agglomerative clustering linkage matrix

It’s possible, but it isn’t pretty. It requires (at a minimum) a small rewrite of AgglomerativeClustering.fit (source). The difficulty is that the method requires a number of imports, so it ends up looking a bit nasty. To add this feature:

Insert the following line after line 748:

    kwargs['return_distance'] = True

Replace line 752 …
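For comparison, a different route that avoids patching the library: recent scikit-learn releases expose children_ and distances_ on the fitted model when distance_threshold is set, and a SciPy-style linkage matrix can be assembled from them. This is a sketch of that alternative, not the source edit described above. It assumes a scikit-learn version that supports distance_threshold, and the helper name linkage_matrix is made up for this example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def linkage_matrix(model):
    """Assemble a SciPy-style linkage matrix from a fitted model."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        count = 0
        for child in merge:
            if child < n_samples:
                count += 1  # an original sample (leaf)
            else:
                count += counts[child - n_samples]  # an earlier merge
        counts[i] = count
    # Columns: child a, child b, merge distance, samples in the new cluster.
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


# Hypothetical data: 12 samples with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))

# distance_threshold=0 with n_clusters=None makes the model build the full
# tree and record the merge distances.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=0).fit(X)
Z = linkage_matrix(model)
print(Z.shape)
```

The resulting array has the column layout that scipy.cluster.hierarchy.dendrogram expects.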

Will pandas dataframe object work with sklearn kmeans clustering?

Assuming all the values in the dataframe are numeric:

    import pandas as pd
    from sklearn.cluster import KMeans

    # Convert the DataFrame (here called dataset) to a NumPy array
    mat = dataset.to_numpy()

    # Using sklearn
    km = KMeans(n_clusters=5)
    km.fit(mat)

    # Get cluster assignment labels
    labels = km.labels_

    # Format results as a DataFrame
    results = pd.DataFrame([dataset.index, labels]).T

Alternatively, you could try KMeans++ for Pandas.

scikit-learn DBSCAN memory usage

The problem apparently is a non-standard DBSCAN implementation in scikit-learn. DBSCAN does not need a distance matrix. The algorithm was designed around a database that can accelerate a regionQuery function and return the neighbors within the query radius efficiently (a spatial index should support such queries in O(log n)). The implementation in scikit-learn, however, …
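The regionQuery idea can be sketched with a k-d tree from SciPy, which answers radius queries without ever building the full n-by-n distance matrix. The random points, the radius eps, and the helper name region_query are assumptions for illustration. This shows only the neighbor lookup, not a complete DBSCAN.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical data: 1000 random 2-D points in a 100 x 100 square.
rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(1000, 2))
eps = 2.0

# Build the spatial index once; memory grows linearly with the point count.
tree = cKDTree(points)


def region_query(index):
    """Indices of all points within eps of points[index], itself included."""
    return tree.query_ball_point(points[index], r=eps)


neighbors = region_query(0)

# Brute-force check for this one point only, for comparison.
brute = np.flatnonzero(np.linalg.norm(points - points[0], axis=1) <= eps)
print(sorted(neighbors), brute.tolist())
```

A DBSCAN built on such queries touches only the neighborhoods it visits, which is where the memory saving over a precomputed distance matrix comes from.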
