cluster-analysis – Page 2

Spectral Clustering a graph in python

December 4, 2023 by Tarik

Without much experience with Spectral-clustering and just going by the docs (skip to the end for the results!):

Code:

    import numpy as np
    import networkx as nx
    from sklearn.cluster import SpectralClustering
    from sklearn import metrics
    np.random.seed(1)

    # Get your mentioned graph
    G = nx.karate_club_graph()

    # Get ground-truth: club-labels -> transform to 0/1 np-array
    # (possible … Read more
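As a rough illustration of the same idea, here is a minimal sketch that runs scikit-learn's SpectralClustering on a precomputed adjacency matrix. The toy graph (two 5-node cliques joined by one bridge edge) is a made-up stand-in for the karate club graph, so it is not the code from the post. With networkx, an adjacency matrix from `nx.to_numpy_array(G)` could presumably be passed in the same way.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical toy graph: two 5-node cliques joined by a single bridge edge.
n = 5
adjacency = np.zeros((2 * n, 2 * n))
adjacency[:n, :n] = 1.0          # first clique
adjacency[n:, n:] = 1.0          # second clique
np.fill_diagonal(adjacency, 0.0)  # no self-loops
adjacency[n - 1, n] = adjacency[n, n - 1] = 1.0  # the bridge edge

# affinity="precomputed" tells scikit-learn to treat the input as a
# similarity (adjacency) matrix rather than as feature vectors.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(adjacency)
print(labels)
```

The cluster ids themselves are arbitrary; only the grouping of nodes is meaningful.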

Clustering Algorithm for Paper Boys

December 1, 2023 by Tarik

I’ve written an inefficient but simple algorithm in Java to see how close I could get to doing some basic clustering on a set of points, more or less as described in the question. The algorithm works on a list of (x,y) coords ps that are specified as ints. It takes three other parameters as … Read more

Extracting clusters from seaborn clustermap

September 23, 2023 by Tarik

While using result.linkage.dendrogram_col or result.linkage.dendrogram_row will currently work, it seems to be an implementation detail. The safest route is to first compute the linkages explicitly and pass them to the clustermap function, which has row_linkage and col_linkage parameters just for that. Replacing the last line in your example (result = …) with the following code … Read more
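A minimal sketch of computing the linkages explicitly, assuming SciPy's hierarchical clustering routines and a made-up data matrix. The linkage method, metric, and cluster count here are illustrative choices, not the ones from the original answer. The seaborn call is only shown as a comment so the snippet runs without seaborn installed.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical data: 6 rows x 4 columns, with two well-separated groups of rows.
data = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 4)),
    rng.normal(5.0, 0.1, size=(3, 4)),
])

# Compute the row and column linkages explicitly.
row_linkage = linkage(data, method="average", metric="euclidean")
col_linkage = linkage(data.T, method="average", metric="euclidean")

# These can then be handed to the clustermap function, e.g.:
#   seaborn.clustermap(data, row_linkage=row_linkage, col_linkage=col_linkage)

# Because the linkage is in hand, flat clusters can be cut from it directly.
row_clusters = fcluster(row_linkage, t=2, criterion="maxclust")
print(row_clusters)
```

Keeping the linkage matrices in your own variables avoids relying on attributes of the returned clustermap object.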

How to group latitude/longitude points that are ‘close’ to each other?

September 23, 2023 by Tarik

There are a number of ways of determining the distance between two points, but for plotting points on a 2-D graph you probably want the Euclidean distance. If (x1, y1) represents your first point and (x2, y2) represents your second, the distance is

    d = sqrt( (x2-x1)^2 + (y2-y1)^2 )

Regarding grouping, you may want … Read more
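The distance formula can be sketched in a few lines of Python. The function name `euclidean_distance` is just an illustrative choice, and the sketch treats the coordinates as points on a flat plane, which is only an approximation for latitude/longitude pairs that are close together.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two 2-D points given as (x, y) tuples."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# A 3-4-5 right triangle makes the result easy to check by hand.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
print(d)  # 5.0
```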

What makes the distance measure in k-medoid “better” than k-means?

September 11, 2023 by Tarik

1. K-medoid is more flexible

First of all, you can use k-medoids with any similarity measure. K-means, however, may fail to converge; it really must only be used with distances that are consistent with the mean. So, for example, absolute Pearson correlation must not be used with k-means, but it works well with k-medoids.

2. … Read more
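To illustrate why k-medoids works with any dissimilarity, here is a minimal sketch of a simple alternating k-medoids loop (not the full PAM swap procedure) that only ever looks at a precomputed dissimilarity matrix. The function name `k_medoids`, its parameters, and the toy points are assumptions made for this example. The Manhattan distance used below could be replaced by any other matrix, for instance one derived from absolute Pearson correlation.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n, n) dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: pick the member with the smallest total
            # dissimilarity to the other members. No mean is ever computed.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Hypothetical toy data with a Manhattan (L1) dissimilarity matrix.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
dist = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=2)

medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Since the medoid is always an actual data point, the loop never needs the data to live in a vector space where averaging makes sense.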

How does clustering (especially String clustering) work?

September 11, 2023 by Tarik

To understand what clustering is, imagine a geographical map. You can see many distinct objects (such as houses). Some of them are close to each other, and others are far apart. Based on this, you can split all objects into groups (such as cities). Clustering algorithms do exactly this: they allow you to split … Read more

sklearn agglomerative clustering linkage matrix

September 2, 2023 by Tarik

It’s possible, but it isn’t pretty. It requires (at a minimum) a small rewrite of AgglomerativeClustering.fit (source). The difficulty is that the method requires a number of imports, so it ends up getting a bit nasty looking. To add in this feature:

Insert the following line after line 748:

    kwargs['return_distance'] = True

Replace line 752 … Read more
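As an alternative to patching the source, more recent scikit-learn releases expose merge distances on the fitted model when `distance_threshold` is set, which is a different route from the rewrite described above. The sketch below assumes such a version is installed. The helper name `linkage_matrix_from` and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def linkage_matrix_from(model):
    """Build a SciPy-style linkage matrix from a fitted AgglomerativeClustering."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        # Indices below n_samples are leaves; larger ones refer to earlier merges.
        counts[i] = sum(
            1 if child < n_samples else counts[child - n_samples]
            for child in merge
        )
    # Columns: child a, child b, merge distance, number of samples in the merge.
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

# Hypothetical 1-D toy data.
X = np.array([[0.0], [0.1], [5.0], [5.2], [10.0]])

# distance_threshold=0 with n_clusters=None builds the full tree and
# makes the fitted model store the merge distances in distances_.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=0).fit(X)
Z = linkage_matrix_from(model)
print(Z)
```

The resulting array should be usable with `scipy.cluster.hierarchy.dendrogram`, which expects exactly this four-column layout.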

Text clustering with Levenshtein distances

September 1, 2023 by Tarik

Will pandas dataframe object work with sklearn kmeans clustering?

August 31, 2023 by Tarik

Assuming all the values in the dataframe are numeric:

    import pandas
    import sklearn.cluster

    # Convert DataFrame to matrix
    mat = dataset.values

    # Using sklearn
    km = sklearn.cluster.KMeans(n_clusters=5)
    km.fit(mat)

    # Get cluster assignment labels
    labels = km.labels_

    # Format results as a DataFrame
    results = pandas.DataFrame([dataset.index, labels]).T

Alternatively, you could try KMeans++ for Pandas.
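For a version that runs end to end, here is a small self-contained sketch with a made-up DataFrame. The column names, index labels, and the choice of two clusters are assumptions for the example only.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical numeric DataFrame with two obvious groups of rows.
dataset = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
        "y": [0.0, 0.2, 0.1, 10.0, 10.2, 10.1],
    },
    index=list("abcdef"),
)

# Convert to a NumPy array and fit; random_state makes the run repeatable.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(dataset.to_numpy())

# Attach the labels back to the original index.
results = pd.DataFrame({"label": km.labels_}, index=dataset.index)
print(results)
```

Keeping the DataFrame's index on the results makes it straightforward to join the labels back onto the original data.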

scikit-learn DBSCAN memory usage

August 23, 2023 by Tarik

The problem apparently is a non-standard DBSCAN implementation in scikit-learn. DBSCAN does not need a distance matrix. The algorithm was designed around using a database that can accelerate a regionQuery function, and return the neighbors within the query radius efficiently (a spatial index should support such queries in O(log n)). The implementation in scikit, however, … Read more
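To make the regionQuery idea concrete, here is a minimal sketch that answers radius queries with a k-d tree from SciPy instead of a full pairwise distance matrix. It illustrates the concept only and is not how any particular DBSCAN implementation is written. The function name `region_query` and the sample points are assumptions for this example.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical 2-D points: one group near the origin, one near (5, 5).
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0],
])

# The tree is the spatial index; it is built once and reused for every query.
tree = cKDTree(points)

def region_query(index, eps):
    """Indices of all points within eps of points[index], the point itself included."""
    return sorted(tree.query_ball_point(points[index], r=eps))

print(region_query(0, 0.2))  # [0, 1, 2]
print(region_query(3, 0.2))  # [3, 4]
```

The index stores the points themselves rather than all n × n pairwise distances, which is where the memory saving comes from.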