Doc2Vec Get most similar documents

You need to use infer_vector to get a document vector for the new text; this does not alter the underlying model. Here is how you do it:

    tokens = "a new sentence to match".split()
    new_vector = model.infer_vector(tokens)
    # gives you the top 10 document tags and their cosine similarities
    # (in gensim 4.0+, model.docvecs was renamed to model.dv)
    sims = model.docvecs.most_similar([new_vector])

Edit: Here is an …
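For intuition about what most_similar([new_vector]) returns, here is a minimal pure-Python sketch of cosine-similarity top-N ranking. It is not gensim's implementation (gensim does this with vectorized numpy operations on normalized vectors); the doc_vectors dict and the function names are hypothetical stand-ins for the trained document vectors, and all vectors are assumed to be non-zero.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity: dot product divided by the product of the norms.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    doc_vectors: Dict[str, Sequence[float]],
    topn: int = 10,
) -> List[Tuple[str, float]]:
    # Score every stored document against the query vector...
    scored = [(tag, cosine(query, vec)) for tag, vec in doc_vectors.items()]
    # ...then keep the topn highest-scoring (tag, similarity) pairs.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
    for tag, sim in most_similar([1.0, 0.1], docs, topn=2):
        print(tag, round(sim, 3))
```

With the toy vectors above, doc_a ranks first and doc_c second, since the query points almost entirely along the first axis.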

How to use Gensim doc2vec with pre-trained word vectors?

Note that the “DBOW” (dm=0) training mode doesn’t require or even create word-vectors as part of the training. It merely learns document vectors that are good at predicting each word in turn (much like the word2vec skip-gram training mode). (Before gensim 0.12.0, there was the parameter train_words mentioned in another comment, which some documentation suggested …
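To make the comparison with skip-gram concrete, here is a minimal sketch of the (input, target) training pairs each mode generates for one document. The function names and the fixed window size are assumptions for illustration (word2vec actually samples a reduced window per word); the point is that DBOW pairs use only the document tag as input, so no input word vectors take part in training.

```python
from typing import List, Tuple


def dbow_pairs(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    # PV-DBOW: the document tag is the only input; every word is a target.
    return [(tag, word) for word in words]


def skipgram_pairs(words: List[str], window: int = 2) -> List[Tuple[str, str]]:
    # word2vec skip-gram: each word is an input that predicts its neighbours
    # within a fixed window (simplified; real word2vec shrinks it randomly).
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


if __name__ == "__main__":
    doc = ["the", "cat", "sat"]
    print(dbow_pairs("doc_0", doc))
    print(skipgram_pairs(doc, window=1))
```

Only the skip-gram pairs have a word on the input side, which is why pure DBOW leaves word vectors untrained. In gensim, the dbow_words=1 option interleaves skip-gram word training with DBOW document training, at extra training cost.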