cosine-similarity – Tarik Billa

Cosine similarity and tf-idf

July 23, 2023 by Tarik

Calculate cosine similarity given 2 sentence strings

January 30, 2023 by Tarik

A simple pure-Python implementation would be:

    import math
    import re
    from collections import Counter

    WORD = re.compile(r"\w+")

    def get_cosine(vec1, vec2):
        intersection = set(vec1.keys()) & set(vec2.keys())
        numerator = sum([vec1[x] * vec2[x] for x in intersection])

        sum1 = sum([vec1[x] ** 2 for x in list(vec1.keys())])
        sum2 = sum([vec2[x] ** 2 for x in list(vec2.keys())])
        denominator = math.sqrt(sum1) …
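The excerpt is cut off partway through the function, so here is a minimal, self-contained sketch of the same idea rather than the post's full code. Each sentence becomes a `Counter` of word counts, and the cosine is the dot product over shared words divided by the product of the two vector norms. The helper name `text_to_vector`, the sample sentences, and the zero-denominator guard are assumptions added for illustration.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def text_to_vector(text):
    # Hypothetical helper: map a sentence to a bag-of-words count vector.
    return Counter(WORD.findall(text))


def get_cosine(vec1, vec2):
    # Only words present in both vectors contribute to the dot product.
    intersection = set(vec1) & set(vec2)
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # Assumption: treat an empty vector as having similarity 0.0.
    if not denominator:
        return 0.0
    return numerator / denominator


# Hypothetical sample sentences.
v1 = text_to_vector("This is a foo bar sentence .")
v2 = text_to_vector("This sentence is similar to a foo bar sentence .")
print(get_cosine(v1, v2))
```

Matching here is case-sensitive and uses raw word counts; lowercasing or tf-idf weighting would be separate choices.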

What’s the fastest way in Python to calculate cosine similarity given sparse matrix data?

January 29, 2023 by Tarik

You can compute pairwise cosine similarity on the rows of a sparse matrix directly using sklearn. As of version 0.17 it also supports sparse output:

    import numpy as np
    from scipy import sparse
    from sklearn.metrics.pairwise import cosine_similarity

    A = np.array([[0, 1, 0, 0, 1], [0, 0, 1, 1, 1], [1, 1, 0, 1, 0]])
    A_sparse = sparse.csr_matrix(A)

    similarities = cosine_similarity(A_sparse)
    …
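The excerpt stops before showing the sparse-output variant. The sketch below assumes that "sparse output" refers to the `dense_output=False` argument of `cosine_similarity`; that reading is an assumption, since the truncated text does not show it. It reuses the same small matrix and requires NumPy, SciPy, and scikit-learn to be installed.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

A = np.array([[0, 1, 0, 0, 1], [0, 0, 1, 1, 1], [1, 1, 0, 1, 0]])
A_sparse = sparse.csr_matrix(A)

# Default call: the result is a dense NumPy array of shape (n_rows, n_rows).
similarities = cosine_similarity(A_sparse)

# Assumed sparse-output variant: dense_output=False keeps the result sparse
# when the input is sparse.
similarities_sparse = cosine_similarity(A_sparse, dense_output=False)

print(similarities)
print(sparse.issparse(similarities_sparse))
```

Keeping the result sparse mainly helps when most row pairs share no non-zero columns; otherwise the similarity matrix is close to dense anyway.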

Cosine Similarity between 2 Number Lists

October 16, 2022 by Tarik

You should try SciPy. It has a bunch of useful scientific routines, for example “routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices.” It uses the fast, optimized NumPy for its number crunching. See the SciPy installation instructions to set it up. Note that spatial.distance.cosine computes the distance, not the similarity. So, you must subtract the …
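The distance-versus-similarity point can be illustrated with a short sketch. It relies on the fact that cosine distance is defined as one minus cosine similarity; the two number lists are hypothetical example data, and SciPy must be installed.

```python
from scipy import spatial

# Hypothetical example data; any two equal-length number lists work.
list_a = [3, 45, 7, 2]
list_b = [2, 54, 13, 15]

# spatial.distance.cosine returns the cosine *distance* (1 - similarity).
distance = spatial.distance.cosine(list_a, list_b)

# Convert the distance back into a similarity.
similarity = 1 - distance
print(similarity)
```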

Can someone give an example of cosine similarity, in a very simple, graphical way? [closed]

October 14, 2022 by Tarik

Here are two very short texts to compare:

    Julie loves me more than Linda loves me
    Jane likes me more than Julie loves me

We want to know how similar these texts are, purely in terms of word counts (and ignoring word order). We begin by making a list of the words from both texts: …
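The word-count comparison described here can be sketched in a few lines of Python. This is an illustrative sketch, not the post's own code; splitting on whitespace and all variable names are assumptions.

```python
import math
from collections import Counter

text1 = "Julie loves me more than Linda loves me"
text2 = "Jane likes me more than Julie loves me"

# Count words in each text; word order is ignored.
counts1 = Counter(text1.split())
counts2 = Counter(text2.split())

# The combined vocabulary gives one dimension per distinct word.
vocabulary = sorted(set(counts1) | set(counts2))
vec1 = [counts1[w] for w in vocabulary]
vec2 = [counts2[w] for w in vocabulary]

# Cosine similarity: dot product divided by the product of the norms.
dot = sum(a * b for a, b in zip(vec1, vec2))
norm1 = math.sqrt(sum(a * a for a in vec1))
norm2 = math.sqrt(sum(b * b for b in vec2))

similarity = dot / (norm1 * norm2)
print(round(similarity, 3))  # 0.822
```

For these two texts the dot product is 9 and the squared norms are 12 and 10, so the similarity is 9 / √120, roughly 0.822.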