levenshtein-distance
Improving search result using Levenshtein distance in Java
Without understanding the meaning of the words, as @DrYap suggests, the next logical unit for comparing two words (if you are not looking for misspellings) is the syllable. It is very easy to modify Levenshtein to compare syllables instead of characters. The hard part is breaking the words into syllables. There is a Java implementation TeXHyphenator-J …
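To illustrate the "compare syllables instead of characters" idea, here is a minimal sketch, assuming the words have already been split into syllables. The hand-split syllable lists are hypothetical; a real hyphenator would produce them. The distance function works on any sequence, so the same code handles strings and lists of syllables.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of syllables)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical hand-split syllables, for illustration only.
print(levenshtein(["ham", "bur", "ger"], ["cheese", "bur", "ger"]))  # 1
```

At the syllable level the two words differ by a single substitution, whereas the character-level distance between the joined strings is larger.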
Most efficient way to calculate Levenshtein distance
The Wikipedia entry on Levenshtein distance has useful suggestions for optimizing the computation. The most applicable one in your case: if you can put a bound k on the maximum distance of interest (anything beyond that might as well be infinity!), you can reduce the computation to O(n times k) instead of …
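The bounded computation can be sketched roughly as follows. This is one possible banded variant, not necessarily the exact formulation the Wikipedia entry gives; the function name `bounded_levenshtein` and the convention of returning `None` beyond the bound are assumptions made for this example. Only cells within k of the diagonal are evaluated, and anything larger than k is treated as "infinity".

```python
def bounded_levenshtein(a, b, k):
    """Return the edit distance between a and b if it is <= k, else None."""
    n, m = len(a), len(b)
    if abs(n - m) > k:          # the length gap alone already exceeds the bound
        return None
    inf = k + 1                 # every value above k is treated as infinity
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        lo = max(1, i - k)      # first column inside the band
        hi = min(m, i + k)      # last column inside the band
        curr = [inf] * (m + 1)  # full row kept for clarity, band cells filled
        if i <= k:
            curr[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost,  # substitution or match
                          inf)
        if min(curr[lo - 1:hi + 1]) > k:       # whole band is over the bound
            return None
        prev = curr
    return prev[m] if prev[m] <= k else None


print(bounded_levenshtein("kitten", "sitting", 3))  # 3
print(bounded_levenshtein("kitten", "sitting", 2))  # None
```

For readability this sketch still allocates a full row per iteration, so only the number of evaluated cells is proportional to n times k; a tighter version would reuse two band-sized buffers.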
How can I optimize this Python code to generate all words with word-distance 1?
If your wordlist is very long, might it be more efficient to generate all possible 1-letter-differences from ‘word’, then check which ones are in the list? I don’t know any Python but there should be a suitable data structure for the wordlist allowing for log-time lookups. I suggest this because if your words are reasonable …
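A minimal sketch of that suggestion in Python, assuming "1-letter difference" means substituting exactly one letter (same-length words) and assuming a lowercase ASCII alphabet. The function name `neighbours` and the sample word list are made up for the example. A built-in `set` serves as the lookup structure; it is hash-based, so membership tests take constant time on average rather than logarithmic time.

```python
import string


def neighbours(word, wordlist, alphabet=string.ascii_lowercase):
    """Words in wordlist that differ from word in exactly one position."""
    found = set()
    for i, original in enumerate(word):
        for letter in alphabet:
            if letter != original:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in wordlist:  # average O(1) when wordlist is a set
                    found.add(candidate)
    return found


# Hypothetical word list, for illustration only.
words = {"cat", "cot", "cut", "car", "dog", "cart"}
print(sorted(neighbours("cat", words)))  # ['car', 'cot', 'cut']
```

This generates 25 candidates per letter position, so the cost depends on the word length and alphabet size rather than on the size of the word list.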
How to add levenshtein function in mysql?
I connected to my MySQL server and executed this statement in MySQL Workbench, and it simply worked: I now have a new function, levenshtein(). For example, this works as expected: SELECT levenshtein('abcde', 'abced') returns 2.
Best machine learning technique for matching product strings
My first thought is to try to parse the names into a description of features (company LG, size 42 Inch, resolution 1080p, type LCD HDTV). Then you can match these descriptions against each other for compatibility; it’s okay to omit a product number but bad to have different sizes. Simple are-the-common-attributes-compatible might be enough, or …
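The "are the common attributes compatible" check could look something like the sketch below. It assumes the product names have already been parsed into feature dictionaries, which is the hard part and is not shown; the key names and the function name `compatible` are hypothetical.

```python
def compatible(a, b):
    """True if every attribute present in both descriptions has the same value.

    Attributes missing from one side are ignored, so omitting a value is
    allowed while a conflicting value is not.
    """
    shared = a.keys() & b.keys()
    return all(a[key] == b[key] for key in shared)


# Hypothetical parsed descriptions, for illustration only.
full = {"company": "LG", "size": "42 Inch", "resolution": "1080p", "type": "LCD HDTV"}
partial = {"company": "LG", "size": "42 Inch"}
other_size = {"company": "LG", "size": "47 Inch", "type": "LCD HDTV"}

print(compatible(full, partial))     # True
print(compatible(full, other_size))  # False
```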
Implementing a simple Trie for efficient Levenshtein Distance calculation – Java
From what I can tell, you don’t need to improve the efficiency of Levenshtein distance itself; you need to store your strings in a structure that saves you from running distance computations so many times, i.e. by pruning the search space. Since Levenshtein distance is a metric, you can use any of the metric spaces …
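As one illustration of pruning with a metric, here is a small BK-tree sketch. The BK-tree is chosen for this example only; the excerpt does not say which structure the answer goes on to recommend. It relies on the triangle inequality: if the query is at distance d from a node, only children stored at edge distances between d - max_dist and d + max_dist can contain matches. The class name, method names, and sample words are assumptions.

```python
def levenshtein(a, b):
    """Plain character-level edit distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


class BKTree:
    """Metric tree; each node is a (word, {edge_distance: child_node}) pair."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return                      # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            for edge, child in children.items():
                # Triangle inequality: other subtrees cannot hold a match.
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(levenshtein)
for w in ["book", "books", "cake", "boo", "cape", "boon", "cook", "cart"]:
    tree.add(w)
print(tree.search("bock", 1))  # [(1, 'book')]
```

Each query computes the distance only for the nodes it actually visits, which is where the saving over a linear scan of the word list comes from.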