Vector Space Model: Cosine Similarity vs Euclidean Distance

One informal but rather intuitive way to think about this is to consider the 2 components of a vector: direction and magnitude.

Direction is the “preference”https://stackoverflow.com/”style”https://stackoverflow.com/”sentiment”https://stackoverflow.com/”latent variable” of the vector, while the magnitude is how strong it is towards that direction.

When classifying documents we’d like to categorize them by their overall sentiment, so we use the angular distance.

Euclidean distance is susceptible to documents being clustered by their L2-norm (magnitude, in the 2 dimensional case) instead of direction. I.e. vectors with quite different directions would be clustered because their distances from origin are similar.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)