DBSCAN for clustering of geographic location data

You can cluster spatial latitude-longitude data with scikit-learn’s DBSCAN without precomputing a distance matrix: db = DBSCAN(eps=2/6371., min_samples=5, algorithm='ball_tree', metric='haversine').fit(np.radians(coordinates)). This comes from a tutorial on clustering spatial data with scikit-learn DBSCAN. In particular, notice that the eps value is still 2 km, but it’s divided by 6371 (the Earth’s mean radius in km) to convert it to radians. Also, notice that …
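
As a minimal sketch of the call quoted above, the following runs it on made-up data. The coordinates, the random seed, and the names EARTH_RADIUS_KM and eps_km are assumptions for illustration and do not come from the tutorial. The haversine metric in scikit-learn expects (latitude, longitude) pairs in radians, which is why the degrees are converted before fitting.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the same constant as in the excerpt

# Hypothetical sample data: (latitude, longitude) in degrees, scattered
# a few hundred metres around two city centres.
rng = np.random.default_rng(0)
centers = np.array([[48.8566, 2.3522], [40.7128, -74.0060]])
coordinates = np.vstack(
    [center + rng.normal(scale=0.002, size=(20, 2)) for center in centers]
)

eps_km = 2.0  # neighbourhood radius in kilometres
db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,  # kilometres converted to radians
    min_samples=5,
    algorithm="ball_tree",
    metric="haversine",
).fit(np.radians(coordinates))  # haversine needs radians, not degrees

labels = db.labels_  # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```

To use a different radius, change eps_km only; the division by the Earth's radius stays the same.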

scikit-learn DBSCAN memory usage

The problem apparently is a non-standard DBSCAN implementation in scikit-learn. DBSCAN does not need a distance matrix. The algorithm was designed around a database that can accelerate a regionQuery function and return the neighbors within the query radius efficiently (a spatial index should support such queries in O(log n)). The implementation in scikit-learn, however, …
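
To make the regionQuery idea concrete, here is a rough sketch of such a function backed by a spatial index. It uses scikit-learn's BallTree purely as an example index; the data, the function name region_query, and the eps value are assumptions, and the sketch says nothing about how scikit-learn's own DBSCAN is implemented internally. The point is only that neighbors are looked up one query at a time, so no n-by-n distance matrix is ever stored.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical 2-D data set.
rng = np.random.default_rng(0)
points = rng.random((1000, 2))

# Build the spatial index once; every later query reuses it.
tree = BallTree(points)


def region_query(index, eps):
    """Return the indices of all points within eps of points[index].

    The query point itself is included, since its distance is zero.
    """
    # query_radius expects a 2-D array of query points and returns
    # one index array per query point.
    return tree.query_radius(points[index:index + 1], r=eps)[0]


neighbors = region_query(0, eps=0.05)
print(len(neighbors))
```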

scikit-learn: Predicting new points with DBSCAN

While Anony-Mousse has some good points (clustering is indeed not classifying), I think the ability to assign new points has its uses. Based on the original paper on DBSCAN and robertlayton’s ideas on github.com/scikit-learn, I suggest running through the core points and assigning each new point to the cluster of the first core point that is within eps …
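
One possible reading of that suggestion is sketched below. The function name dbscan_predict, the sample data, and the choice of Euclidean distance are assumptions; scikit-learn's DBSCAN has no predict method of its own, so this is a helper built on the fitted model's components_, core_sample_indices_, and labels_ attributes. New points with no core point within eps are labeled -1, the same value DBSCAN uses for noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_predict(model, X_new):
    """Label each new point with the cluster of the first core point
    within eps of it, or -1 if no core point is that close.

    Assumes the model was fitted with the default Euclidean metric.
    """
    core_points = model.components_
    core_labels = model.labels_[model.core_sample_indices_]
    y_new = np.full(len(X_new), -1, dtype=int)
    for i, x in enumerate(X_new):
        distances = np.linalg.norm(core_points - x, axis=1)
        within = np.flatnonzero(distances <= model.eps)
        if within.size:
            y_new[i] = core_labels[within[0]]  # first matching core point
    return y_new


# Hypothetical training data: two tight blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
model = DBSCAN(eps=0.5, min_samples=5).fit(X)

# Two points near the blobs and one far from both.
X_new = np.array([[0.05, 0.0], [5.0, 5.05], [2.5, 2.5]])
predicted = dbscan_predict(model, X_new)
print(predicted)
```

Because the loop stops at the first core point in index order, a new point lying within eps of core points from two different clusters gets whichever cluster comes first; picking the nearest core point instead would be an equally reasonable variant.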
