wordnet
Using NLTK and WordNet, how do I convert a simple tense verb into its present, past, or past participle form?
NLTK can help here too. It gives you the base form of a verb rather than a specific tense, but that can still be useful. Try the following code:

    from nltk.stem.wordnet import WordNetLemmatizer

    words = ['gave', 'went', 'going', 'dating']
    for word in words:
        print(word + "-->" + WordNetLemmatizer().lemmatize(word, 'v'))

The output is:

    gave-->give
    went-->go
    going-->go
    dating-->date

Have … Read more
Python: Semantic similarity score for Strings [duplicate]
The best package I’ve seen for this is Gensim, found at the Gensim homepage. I’ve used it many times and have been very happy with its ease of use. It is written in Python and has an easy-to-follow tutorial to get you started, which compares 9 strings. It can be installed via pip, … Read more
wordnet lemmatization and pos tagging in python
First of all, you can use nltk.pos_tag() directly without training it. The function loads a pretrained tagger from a file. You can see the file name with nltk.tag._POS_TAGGER:

    >>> nltk.tag._POS_TAGGER
    'taggers/maxent_treebank_pos_tagger/english.pickle'

Because it was trained on the Treebank corpus, it also uses the Treebank tag set. The following function would map the treebank tags … Read more
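The excerpt is cut off before the mapping function appears, so what follows is only a minimal sketch of what such a mapping might look like, not the original answer's code. The function name `treebank_to_wordnet_pos` and the noun fallback are assumptions made for illustration. The single-letter return values are the ones held by NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN`, and `wordnet.ADV` constants, written here as plain strings so the sketch runs without NLTK installed.

```python
# Hypothetical helper: maps Penn Treebank tags to WordNet part-of-speech codes.
# The codes mirror the string values of NLTK's wordnet.ADJ/VERB/NOUN/ADV.
WORDNET_ADJ, WORDNET_VERB, WORDNET_NOUN, WORDNET_ADV = "a", "v", "n", "r"


def treebank_to_wordnet_pos(treebank_tag, default=WORDNET_NOUN):
    """Return a WordNet POS code for a Treebank tag, or `default` if none fits."""
    if treebank_tag.startswith("J"):   # JJ, JJR, JJS
        return WORDNET_ADJ
    if treebank_tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return WORDNET_VERB
    if treebank_tag.startswith("N"):   # NN, NNS, NNP, NNPS
        return WORDNET_NOUN
    if treebank_tag.startswith("R"):   # RB, RBR, RBS (also matches RP)
        return WORDNET_ADV
    return default                     # assumption: fall back to noun


for tag in ["VBD", "NNS", "JJ", "RB", "DT"]:
    print(tag, "->", treebank_to_wordnet_pos(tag))
```

The returned code could then be passed as the second argument to `WordNetLemmatizer().lemmatize(word, pos)`, in the same position where the first excerpt on this page passes 'v'.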
Stemmers vs Lemmatizers
Q1: “[..] are English stemmers any useful at all today? Since we have a plethora of lemmatization tools for English” Yes. Stemmers are much simpler, smaller, and usually faster than lemmatizers, and for many applications, their results are good enough. Using a lemmatizer for that is a waste of resources. Consider, for example, dimensionality reduction … Read more
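To make the size difference concrete, here is a toy sketch. `toy_stem` is a deliberately crude suffix stripper (it is not the Porter algorithm or any published stemmer), and `TOY_LEMMAS` is a hypothetical four-entry stand-in for the dictionary a real lemmatizer consults. All names are invented for illustration.

```python
# Toy illustration only: neither function is a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical stand-in for the lexicon a lemmatizer needs.
TOY_LEMMAS = {"went": "go", "gave": "give", "dating": "date", "walked": "walk"}


def toy_lemmatize(word):
    """Look the word up; fall back to the word itself when it is unknown."""
    return TOY_LEMMAS.get(word, word)


for word in ["walked", "dating", "went", "cats"]:
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word))
```

The stemmer needs no data at all and turns "dating" into "dat", which is not a word but is often adequate when tokens only need to be grouped together. The lookup returns "date", but only for words that happen to be in its table.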
How to check if a word is an English word with Python?
For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There’s a tutorial, or you could just dive straight in:

    >>> import enchant
    >>> d = enchant.Dict("en_US")
    >>> d.check("Hello")
    True
    >>> d.check("Helo")
    False
    >>> d.suggest("Helo")
    ['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]

PyEnchant comes with a … Read more