DBSCAN for clustering of geographic location data

You can cluster spatial latitude-longitude data with scikit-learn’s DBSCAN without precomputing a distance matrix.

db = DBSCAN(eps=2/6371., min_samples=5, algorithm='ball_tree', metric="haversine").fit(np.radians(coordinates))

This comes from this tutorial on clustering spatial data with scikit-learn DBSCAN. In particular, notice that the eps value is still 2km, but it’s divided by 6371 to convert it to radians. Also, notice that .fit() takes the coordinates in radian units for the haversine metric.

Leave a Comment