String similarity score/hash

I believe what you’re looking for is called a locality-sensitive hash (LSH). Whereas most hash functions are designed so that small variations in input cause large changes in output, these hashes aim for the opposite: small changes in input produce proportionally small changes in output.
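As a rough illustration, here is a minimal sketch of SimHash, one well-known LSH family (the answer above does not name a specific one). It assumes lowercase character trigrams as the features, BLAKE2b as the per-feature hash, and a 64-bit fingerprint; all of these are choices made for the example. Each trigram votes on every bit of the fingerprint, so strings that share most trigrams end up with fingerprints that differ in only a few bits.

```python
import hashlib

BITS = 64  # fingerprint width; matches the 8-byte digest used below


def _hash64(token: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() for str is randomized per process).
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shingles(text: str, k: int = 3) -> list[str]:
    # Overlapping character k-grams: the assumed "feature" that defines similarity.
    text = text.lower()
    if len(text) <= k:
        return [text]
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def simhash(text: str, k: int = 3) -> int:
    # Every feature votes +1 or -1 on each bit position, according to its own hash bits.
    votes = [0] * BITS
    for token in shingles(text, k):
        h = _hash64(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the positive votes win for that position.
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of differing bits; a small distance suggests similar inputs.
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    base = simhash("the quick brown fox jumps over the lazy dog")
    near = simhash("the quick brown fox jumped over the lazy dog")
    far = simhash("lorem ipsum dolor sit amet consectetur adipiscing elit")
    print("near:", hamming(base, near), "far:", hamming(base, far))
```

Comparing fingerprints by Hamming distance, rather than by equality as with an ordinary hash, is what makes the “small change in, small change out” property usable in practice.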

As others have mentioned, there are inherent issues with forcing a multi-dimensional mapping into a lower-dimensional one. It’s analogous to creating a flat map of the Earth: you can never accurately represent a sphere on a flat surface. The best you can do is find an LSH that is optimized for whatever feature you’re using to decide whether strings are “alike”.
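To show how much the choice of feature matters, here is a sketch using MinHash, another LSH family picked purely for illustration. The two feature extractors (`char_ngrams` and `word_tokens`), the seeded BLAKE2b hash family, and the 128-hash signature length are all assumptions of this example. MinHash estimates the Jaccard similarity of the feature sets, so the same pair of strings can look identical or only half alike depending on which features you feed it.

```python
import hashlib
from typing import Callable

NUM_HASHES = 128  # signature length: more hashes give a tighter estimate but cost more


def char_ngrams(text: str, n: int = 3) -> set[str]:
    # Hypothetical feature extractor: set of lowercase character n-grams.
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def word_tokens(text: str) -> set[str]:
    # Hypothetical feature extractor: set of lowercase whitespace-separated words.
    return set(text.lower().split())


def _seeded_hash(seed: int, token: str) -> int:
    # Simulates a family of independent hash functions by prefixing a seed.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(text: str, features: Callable[[str], set[str]],
            num_hashes: int = NUM_HASHES) -> list[int]:
    # For each hash function, keep the minimum hash value over all features.
    feats = features(text)
    if not feats:
        raise ValueError("no features extracted from input")
    return [min(_seeded_hash(seed, f) for f in feats) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # Two MinHash values agree with probability equal to the Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    s, t = "quick brown fox", "fox brown quick"
    print("words:", estimated_jaccard(minhash(s, word_tokens), minhash(t, word_tokens)))
    print("trigrams:", estimated_jaccard(minhash(s, char_ngrams), minhash(t, char_ngrams)))
```

With word tokens, “quick brown fox” and “fox brown quick” have identical feature sets, so their signatures match exactly; with character trigrams the exact Jaccard similarity is only 9/17 (about 0.53). Neither answer is wrong: which one you want depends on what “alike” means for your data.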
