Java or Python for Natural Language Processing [closed]

Whether to use Java or Python for NLP is largely a matter of preference or necessity. Depending on the company or project, you’ll need to use one or the other, and often there isn’t much of a choice unless you’re heading the project.

Besides NLTK (www.nltk.org), there are other libraries for text processing in Python:

  • TextBlob: http://textblob.readthedocs.org/en/dev/
  • Gensim: http://radimrehurek.com/gensim/
  • Pattern: http://www.clips.ua.ac.be/pattern
  • spaCy: http://spacy.io
  • Orange: http://orange.biolab.si/features/
  • PyNLPl (pronounced “pineapple”): https://github.com/proycon/pynlpl

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)

For Java, there are plenty of libraries too; here’s a list of some of them:

  • Freeling: http://nlp.lsi.upc.edu/freeling/
  • OpenNLP: http://opennlp.apache.org/
  • LingPipe: http://alias-i.com/lingpipe/
  • Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/ (comes with wrappers for other languages, Python included)
  • CogComp NLP: https://github.com/CogComp/cogcomp-nlp

For a nice comparison of basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html
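
To give a flavour of what “basic string processing” tends to mean in NLP, here’s a minimal sketch in Python using only the standard library. The sample sentence and the variable names are made up for illustration and are not taken from the linked page; real tokenization usually needs more care than a single regular expression.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration
text = "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Naive tokenization: lowercase the text, then extract runs of letters
tokens = re.findall(r"[a-z]+", text.lower())

# Count how often each token occurs
freq = Counter(tokens)

# The two most frequent tokens, as (token, count) pairs
top = freq.most_common(2)
print(top)  # [('the', 3), ('dog', 2)]
```

The equivalent in Java takes a few more lines (a `Pattern`/`Matcher` loop and a `HashMap`), but it’s just as doable, which is part of why the choice mostly comes down to preference.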

For a useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4

If you’re unsure which language to choose for NLP, I’d personally say: “any language that will give you the desired analysis/output”; see Which language or tools to learn for natural language processing?

Here’s a fairly recent (2017) list of NLP tools: https://github.com/alvations/awesome-community-curated-nlp

An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp


Besides language processing tools, you will most likely need machine learning tools to incorporate into your NLP pipelines.

There’s a whole range in both Python and Java, and once again the choice comes down to preference and whether the libraries are user-friendly enough:

Machine Learning libraries in Python:

  • Sklearn (Scikit-learn): http://scikit-learn.org/stable/
  • Milk: http://luispedro.org/software/milk
  • Scipy: http://www.scipy.org/
  • Theano: http://deeplearning.net/software/theano/
  • PyML: http://pyml.sourceforge.net/
  • pyBrain: http://pybrain.org/
  • Graphlab Create (commercial tool, but with a free 1-year academic license): https://dato.com/products/create/

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)

Machine Learning libraries in Java:

  • Weka: http://www.cs.waikato.ac.nz/ml/weka/index.html
  • Mallet: http://mallet.cs.umass.edu/
  • Mahout: https://mahout.apache.org/
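
To illustrate how a machine learning step slots into an NLP pipeline (tokenize, count features, classify), here’s a rough sketch of a multinomial naive Bayes text classifier in plain Python. It’s only a toy under stated assumptions: the `NaiveBayes` class, the `tokenize` helper and the four training sentences are all made up for this example, and in practice you’d reach for one of the libraries listed in this answer rather than rolling your own.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Number of training documents per label (for the prior)
        self.label_counts = Counter(labels)
        # Per-label word counts (for the likelihood)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in training
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Ignore words never seen in training
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data, for illustration only
train_texts = [
    "what a great and fun movie",
    "great acting and a great plot",
    "a boring and dull movie",
    "dull plot and terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("a fun movie with great acting")
print(prediction)  # pos
```

The language processing part (here just `tokenize`) and the machine learning part (`fit`/`predict`) are separate pieces, so either one can be swapped for a proper library without touching the other.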

With the recent (2015) deep learning tsunami in NLP, you could also consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

I’ll avoid listing specific deep learning tools, to stay neutral and avoid favoritism.


Other Stack Overflow questions that also ask for NLP/ML tools:

  • Machine Learning and Natural Language Processing
  • What are good starting points for someone interested in natural language processing?
  • Natural language processing
  • Natural Language Processing in Java (NLP)
  • Is there a good natural language processing library
  • Simple Natural Language Processing Startup for Java
  • What libraries offer basic or advanced NLP methods?
  • Latest good languages and books for Natural Language Processing, the basics
  • (For NER) Entity Extraction/Recognition with free tools while feeding Lucene Index
  • (With PHP) NLP programming tools using PHP?
  • (With Ruby) https://stackoverflow.com/questions/3776361/ruby-nlp-libraries
