What is the best stemming method in Python?

The results you are getting are (generally) expected for a stemmer in English. You say you tried “all the nltk methods” but when I try your examples, that doesn’t seem to be the case.

Here are some examples using the PorterStemmer:

import nltk
ps = nltk.stem.PorterStemmer()
ps.stem('grows')
'grow'
ps.stem('leaves')
'leav'
ps.stem('fairly')
'fairli'

The results are ‘grow’, ‘leav’ and ‘fairli’ which, even if they are not what you wanted, are stemmed versions of the original words.

If we switch to the Snowball stemmer, we have to provide the language as a parameter.

import nltk
sno = nltk.stem.SnowballStemmer('english')
sno.stem('grows')
'grow'
sno.stem('leaves')
'leav'
sno.stem('fairly')
'fair'

The results are as before for ‘grows’ and ‘leaves’, but ‘fairly’ is now stemmed to ‘fair’ rather than ‘fairli’.

So in both cases (and there are more than two stemmers available in nltk), words that you say are not stemmed, in fact, are. The LancasterStemmer will return ‘easy’ when provided with ‘easily’ or ‘easy’ as input.
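If it helps to see the stemmers side by side, here is a minimal sketch that runs the same words through all three. The `stemmers`, `words` and `results` names are just for illustration, and I have left out expected output for the Lancaster stemmer since it is the most aggressive of the three and is best checked on your own data.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# These stemmers are rule-based, so no corpus download is needed.
stemmers = {
    'porter': PorterStemmer(),
    'snowball': SnowballStemmer('english'),
    'lancaster': LancasterStemmer(),
}

# Illustrative word list taken from the examples in this answer.
words = ['grows', 'leaves', 'fairly', 'easily']

# Map each word to the stem produced by each stemmer.
results = {
    word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
    for word in words
}

for word, stems in results.items():
    print(word, stems)
```

Printing the results this way makes it easy to pick whichever stemmer gives the stems you find most acceptable for your vocabulary.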

Maybe you really wanted a lemmatizer? That would return ‘article’ and ‘poodle’ unchanged.

import nltk
# requires the WordNet data: nltk.download('wordnet')
lemma = nltk.stem.WordNetLemmatizer()
lemma.lemmatize('article')
'article'
lemma.lemmatize('leaves')
'leaf'
