gensim – Page 2 – Tarik Billa

How to use Gensim doc2vec with pre-trained word vectors?

May 28, 2023 by Tarik

Note that the “DBOW” (dm=0) training mode doesn’t require or even create word-vectors as part of the training. It merely learns document vectors that are good at predicting each word in turn (much like the word2vec skip-gram training mode). (Before gensim 0.12.0, there was the parameter train_words mentioned in another comment, which some documentation suggested … Read more

How to create a word cloud from a corpus in Python?

May 23, 2023 by Tarik

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

stopwords = set(STOPWORDS)

def show_wordcloud(data, title=None):
    wordcloud = WordCloud(
        background_color='white',
        stopwords=stopwords,
        max_words=200,
        max_font_size=40,
        scale=3,
        random_state=1  # chosen at random by flipping a coin; it was heads
    ).generate(str(data))

    fig = plt.figure(1, figsize=(12, 12))
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=2.3)

    plt.imshow(wordcloud)
    plt.show()

show_wordcloud(Samsung_Reviews_Negative['Reviews'])
show_wordcloud(Samsung_Reviews_positive['Reviews'])

gensim error: ImportError: No module named ‘gensim’

May 16, 2023 by Tarik

Install gensim using:

    pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, then run:

    python setup.py test
    python setup.py install

PyTorch / Gensim – How do I load pre-trained word embeddings?

May 7, 2023 by Tarik

I just wanted to report my findings about loading a gensim embedding with PyTorch. Solution for PyTorch 0.4.0 and newer: From v0.4.0 there is a new function from_pretrained() which makes loading an embedding very comfortable. Here is an example from the documentation. import torch import torch.nn as nn # FloatTensor containing pretrained weights weight = … Read more

gensim word2vec: Find number of words in vocabulary

April 19, 2023 by Tarik

In recent versions, the model.wv property holds the words-and-vectors, and can itself report a length – the number of words it contains. So if w2v_model is your Word2Vec (or Doc2Vec or FastText) model, it’s enough to just do: vocab_len = len(w2v_model.wv) If your model is just a raw set of word-vectors, like a KeyedVectors … Read more

gensim Doc2Vec vs tensorflow Doc2Vec

April 17, 2023 by Tarik

Old question, but an answer would be useful for future visitors. So here are some of my thoughts. There are some problems in the tensorflow implementation: window is the one-sided size, so window=5 would be 5*2+1 = 11 words. Note that with the PV-DM version of doc2vec, the batch_size would be the number of documents. So train_word_dataset … Read more
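The window arithmetic can be checked with a short sketch. It assumes a symmetric window that takes up to `window` tokens on each side of the target; the function name `context_span` and the token list are hypothetical and not taken from either implementation.

```python
def context_span(tokens, center, window):
    """Return the target token plus up to `window` tokens on each side."""
    start = max(0, center - window)
    end = min(len(tokens), center + window + 1)
    return tokens[start:end]


# Hypothetical token list, long enough that the window is not clipped.
tokens = [f"w{i}" for i in range(20)]

span = context_span(tokens, center=10, window=5)
print(len(span))  # 5 on each side plus the target itself: 5 * 2 + 1
```

Near the start or end of a text the span is clipped, so the full 11 words are only reached when there are at least 5 tokens on both sides of the target.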

Doc2vec: How to get document vectors

April 4, 2023 by Tarik

If you want to train a Doc2Vec model, your data set needs to contain lists of words (similar to the Word2Vec format) and tags (ids of documents). It can also contain some additional info (see https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb for more information). # Import libraries from gensim.models import doc2vec from collections import namedtuple # Load data doc1 = [“This is … Read more
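As a rough illustration of the data shape described here (a list of words plus a list of tags per document), the following sketch builds a tiny corpus with only the standard library. The name `TaggedDoc` and the two sentences are placeholders invented for this example; they are not the data from the post.

```python
from collections import namedtuple

# Hypothetical container mirroring the (words, tags) shape described above.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])

# Placeholder sentences, for illustration only.
raw_docs = [
    "the first example document",
    "a second short document",
]

# Each document becomes a list of lowercase words plus a list holding its id.
corpus = [
    TaggedDoc(words=text.lower().split(), tags=[i])
    for i, text in enumerate(raw_docs)
]

for doc in corpus:
    print(doc.tags, doc.words)
```

gensim ships its own `TaggedDocument` in `gensim.models.doc2vec` with the same two fields, `words` and `tags`, so it can stand in for the hypothetical `TaggedDoc` when the corpus is passed to a Doc2Vec model.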

Convert word2vec bin file to text

March 2, 2023 by Tarik

I use this code to load the binary model and then save it as a text file:

    from gensim.models.keyedvectors import KeyedVectors

    model = KeyedVectors.load_word2vec_format('path/to/GoogleNews-vectors-negative300.bin', binary=True)
    model.save_word2vec_format('path/to/GoogleNews-vectors-negative300.txt', binary=False)

References: API and nullege.

Note: The code above is for newer versions of gensim. For previous versions, I used this code:

    from gensim.models import word2vec

    model = word2vec.Word2Vec.load_word2vec_format('path/to/GoogleNews-vectors-negative300.bin', binary=True)
    model.save_word2vec_format('path/to/GoogleNews-vectors-negative300.txt', binary=False)

How to calculate the sentence similarity using word2vec model of gensim with python

November 29, 2022 by Tarik

This is actually a pretty challenging problem. Computing sentence similarity requires building a grammatical model of the sentence, understanding equivalent structures (e.g. “he walked to the store yesterday” and “yesterday, he walked to the store”), finding similarity not just in the pronouns and verbs but also in the proper nouns, finding … Read more
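To make the difficulty concrete, here is a minimal sketch of a much simpler baseline: average the word vectors of each sentence and compare the averages with cosine similarity. This is a commonly used shortcut, not the approach the answer goes on to describe, and the 2-dimensional vectors in `TOY_VECTORS` are invented for illustration; a real word2vec model would supply learned vectors instead.

```python
import math
import string

# Toy 2-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "he": (0.1, 0.9),
    "walked": (0.8, 0.3),
    "to": (0.2, 0.2),
    "the": (0.1, 0.1),
    "store": (0.7, 0.6),
    "yesterday": (0.4, 0.8),
}


def tokenize(sentence):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return sentence.lower().translate(table).split()


def sentence_vector(sentence):
    """Average the vectors of all known words in the sentence."""
    vectors = [TOY_VECTORS[t] for t in tokenize(sentence) if t in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in sentence")
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


s1 = "he walked to the store yesterday"
s2 = "yesterday, he walked to the store"
similarity = cosine(sentence_vector(s1), sentence_vector(s2))
print(similarity)
```

Because averaging ignores word order, the two reordered sentences get essentially the same vector. That happens to be the desired result for this pair, but the same blindness to structure means the baseline cannot tell apart sentences that use the same words with different meanings.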