How can I find the center of a cluster of data points?

The following solution works even if the points are scattered all over the Earth, because it converts latitude and longitude to Cartesian coordinates. It performs a kind of kernel density estimation (KDE), but in a first pass the sum of kernels is evaluated only at the data points. The kernel should be chosen to fit the …
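The first pass described above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the original answer: a Gaussian kernel on the chord distance between unit vectors, a hypothetical `bandwidth` parameter, and a naive O(n²) loop. It returns the data point at which the sum of kernels is largest. Any later refinement steps of the original answer are not reproduced here.

```python
import math

def to_cartesian(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def densest_point(points, bandwidth=0.05):
    """Return the (lat, lon) input point with the largest kernel-density sum.

    bandwidth is a hypothetical Gaussian kernel width, measured as chord
    length on the unit sphere (0.05 is roughly 320 km on Earth).
    """
    xyz = [to_cartesian(lat, lon) for lat, lon in points]
    best_index, best_score = 0, -1.0
    for i, p in enumerate(xyz):
        # Sum of Gaussian kernels evaluated only at data point p
        score = sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
                    for q in xyz)
        if score > best_score:
            best_index, best_score = i, score
    return points[best_index]
```

Working in 3D Cartesian coordinates avoids the wrap-around at ±180° longitude and the distortion near the poles that plain latitude/longitude arithmetic would suffer from.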

Python Implementation of OPTICS (Clustering) Algorithm

I’m not aware of a complete and exact Python implementation of OPTICS. The links posted here seem to be just rough approximations of the OPTICS idea. They also do not use an index for acceleration, so they will run in O(n^2) or, more likely, even O(n^3). OPTICS has a number of tricky things besides the obvious idea. …
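To make the core idea concrete, here is a minimal Python sketch of the OPTICS ordering. It is deliberately naive: every neighborhood query is a linear scan with no index, so it needs O(n²) distance computations, which is exactly the weakness described above. It assumes Euclidean distance via `math.dist`, counts a point in its own neighborhood when computing the core distance, and omits cluster extraction from the reachability plot (for example the ξ method), which is one of the tricky parts. Function and parameter names are hypothetical.

```python
import heapq
import math

def optics(points, min_pts, eps=math.inf):
    """Return (ordering, reachability) for a naive, unindexed OPTICS run."""
    n = len(points)
    reachability = [math.inf] * n
    processed = [False] * n
    ordering = []

    def neighbors(i):
        # Linear scan: O(n) per query because there is no spatial index
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, neigh):
        # Distance to the min_pts-th nearest neighbor (the point itself included)
        if len(neigh) < min_pts:
            return None
        dists = sorted(math.dist(points[i], points[j]) for j in neigh)
        return dists[min_pts - 1]

    def update(i, neigh, core_dist, seeds):
        # Lower the reachability of unprocessed neighbors and queue them
        for j in neigh:
            if processed[j]:
                continue
            new_reach = max(core_dist, math.dist(points[i], points[j]))
            if new_reach < reachability[j]:
                reachability[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        ordering.append(start)
        neigh = neighbors(start)
        core = core_distance(start, neigh)
        if core is None:
            continue
        seeds = []
        update(start, neigh, core, seeds)
        while seeds:
            reach, j = heapq.heappop(seeds)
            if processed[j] or reach > reachability[j]:
                continue  # stale heap entry (lazy decrease-key)
            processed[j] = True
            ordering.append(j)
            neigh_j = neighbors(j)
            core_j = core_distance(j, neigh_j)
            if core_j is not None:
                update(j, neigh_j, core_j, seeds)
    return ordering, reachability
```

The heap with lazy deletion stands in for the priority queue with decrease-key that OPTICS needs. A large jump in `reachability` along `ordering` marks the start of a new dense region.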

What makes the distance measure in k-medoid “better” than k-means?

1. K-medoid is more flexible
First of all, you can use k-medoids with any similarity measure. K-means, however, may fail to converge: it really must only be used with distances that are consistent with the mean. So, e.g., absolute Pearson correlation must not be used with k-means, but it works well with k-medoids.
2. …
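A minimal Python sketch of k-medoids driven by an arbitrary dissimilarity, here 1 − |Pearson r|. It is a simple alternating (Voronoi-iteration) variant with a greedy initialization in the spirit of PAM's BUILD phase, not the full PAM swap search; all function names are hypothetical and the Pearson helper assumes non-constant series.

```python
import statistics

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def abs_pearson_distance(a, b):
    # Perfectly correlated or anti-correlated series are at distance 0
    return 1.0 - abs(pearson(a, b))

def k_medoids(data, k, dist, max_iter=100):
    """Return a dict mapping each medoid index to its member indices."""
    n = len(data)

    def cost(medoids):
        return sum(min(dist(data[i], data[m]) for m in medoids) for i in range(n))

    # Greedy initialization: repeatedly add the object that lowers total cost most
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    for _ in range(max_iter):
        # Assignment step: each object joins its nearest medoid
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist(data[i], data[m]))
            clusters[nearest].append(i)
        # Update step: choose the member with the smallest total distance
        # to the rest of its cluster (keep the old medoid if the cluster is empty)
        new_medoids = [
            min(members, key=lambda c: sum(dist(data[c], data[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters
```

Because medoids are always actual data objects, the update step only ever evaluates `dist` between objects and never averages them, which is why any dissimilarity can be plugged in.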

How do I extract keywords used in text? [closed]

This is an open question in NLP, so there is no simple answer. My recommendation for quick-and-dirty “works-for-me” is topia.termextract. Yahoo has a keyword extraction service (http://developer.yahoo.com/search/content/V1/termExtraction.html) that has low recall but high precision. In other words, it gives you a small number of high-quality terms, but misses many of the terms in your …
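To make "low recall but high precision" concrete: precision is the fraction of extracted terms that are relevant, and recall is the fraction of relevant terms that were extracted. A minimal Python sketch, assuming you have a hypothetical hand-made reference set of relevant keywords to compare an extractor's output against:

```python
def precision_recall(extracted, relevant):
    """Compare a set of extracted keywords against a reference set."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# A cautious extractor: few terms, all of them correct
print(precision_recall({"kernel", "density"},
                       {"kernel", "density", "estimation", "bandwidth"}))
# (1.0, 0.5)
```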

scikit-learn DBSCAN memory usage

The problem is apparently a non-standard DBSCAN implementation in scikit-learn. DBSCAN does not need a distance matrix. The algorithm was designed around a database that can accelerate a regionQuery function and return the neighbors within the query radius efficiently (a spatial index should support such queries in O(log n)). The implementation in scikit-learn, however, …
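The regionQuery-based design can be sketched in Python without ever building a distance matrix. This minimal sketch assumes 2D points and Euclidean distance, and uses a uniform grid with cell size `eps` as a simple stand-in for the database index the answer mentions. It is not an R*-tree and gives no O(log n) worst-case guarantee (a single crowded cell degrades to a linear scan), but memory stays O(n). All names are hypothetical.

```python
import math
from collections import defaultdict

def build_grid(points, eps):
    # Bucket each point into a square cell of side eps
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(i)
    return grid

def region_query(points, grid, eps, i):
    # Any neighbor within eps must lie in the same or an adjacent cell
    x, y = points[i]
    cx, cy = math.floor(x / eps), math.floor(y / eps)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if math.dist(points[i], points[j]) <= eps:
                    result.append(j)
    return result

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    grid = build_grid(points, eps)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, grid, eps, i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: claim it, but do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, grid, eps, j)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                seeds.extend(j_neighbors)
        cluster += 1
    return labels
```

Only the neighbors of the point currently being expanded are ever materialized, which is what keeps memory linear in the number of points.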

Can anyone give a real life example of supervised learning and unsupervised learning? [closed]

Supervised learning: you get a bunch of photos with information about what is on them, and then you train a model to recognize new photos. Or you have a bunch of molecules and information about which of them are drugs, and you train a model to answer whether a new molecule is also a drug.
Unsupervised learning: you …
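The supervised setting in these examples (labeled training data, then predictions for new inputs) can be sketched with one of the simplest classifiers, a nearest-centroid rule. This is an illustrative choice, not something the answer prescribes; the numeric feature vectors and labels are hypothetical stand-ins for, e.g., molecule descriptors.

```python
import math
from collections import defaultdict

def fit_nearest_centroid(examples):
    """examples: list of (feature_vector, label) pairs; returns label -> centroid."""
    sums, counts = {}, defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    # Assign the label whose class centroid is closest to the new input
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

training = [((0, 0), "drug"), ((0, 1), "drug"),
            ((5, 5), "not drug"), ((6, 5), "not drug")]
model = fit_nearest_centroid(training)
print(predict(model, (0.2, 0.3)))  # drug
```

The labels are what make this supervised: the model is fitted to known answers and then applied to inputs it has not seen.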
