How does WordPiece tokenization help deal effectively with the rare-words problem in NLP?

WordPiece and BPE are two similar and commonly used techniques for segmenting words into subword units in NLP tasks. In both cases, the vocabulary is initialized with all the individual characters in the language, and the most frequent/likely combinations of symbols in the vocabulary are then iteratively added to the vocabulary. Consider the WordPiece algorithm …
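As a rough illustration of the merge loop described above, here is a minimal BPE-style sketch: it counts adjacent symbol pairs over a toy corpus and fuses the most frequent pair into a new symbol on each iteration. The corpus and merge count are made up for the example; real WordPiece differs in that it scores candidate merges by likelihood gain on a language model rather than by raw frequency.

```python
import re
from collections import Counter

def merge_step(corpus):
    """One BPE-style merge: count adjacent symbol pairs across the
    corpus and fuse the most frequent pair into a single symbol."""
    pairs = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return corpus, None
    best = max(pairs, key=pairs.get)
    # Merge only whole symbols: the pair must be bounded by
    # whitespace or the string edges, not sit inside a larger symbol.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
    new_corpus = {pattern.sub("".join(best), w): f for w, f in corpus.items()}
    return new_corpus, best

# Toy corpus: words pre-split into characters, with word frequencies.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(4):
    corpus, merged_pair = merge_step(corpus)
    print(merged_pair, corpus)
```

Because rare words end up segmented into these learned subword units, the model never sees an out-of-vocabulary token: an unseen word is simply decomposed into pieces that were frequent enough to be learned.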

CBOW vs. skip-gram: why invert context and target words?

Here is my oversimplified and rather naive understanding of the difference: as we know, CBOW learns to predict a word from its context, i.e., to maximize the probability of the target word given the context. This happens to be a problem for rare words. For example, given the context yesterday was a …
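To make the inversion concrete, here is a small sketch (names and the window size are my own, not from the answer) of how the two objectives generate training pairs from the same token stream. Note that in CBOW each position yields one example whose target is the center word, while in skip-gram the center word becomes the input and each surrounding word becomes its own target, so a rare word gets several training examples of its own instead of being averaged into a context bag.

```python
def training_pairs(tokens, window=2, mode="cbow"):
    """Generate (input, target) training pairs from a token stream.

    CBOW:      (context words) -> center word   (one example per position)
    Skip-gram: (center word)   -> context word  (one example per context word)
    """
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if mode == "cbow":
            yield context, center      # the whole context predicts the center
        else:
            for c in context:
                yield center, c        # the center predicts each context word

sentence = "yesterday was a delightful day".split()
print(list(training_pairs(sentence, mode="cbow")))
print(list(training_pairs(sentence, mode="skipgram")))
```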

Text Summarization Evaluation – BLEU vs ROUGE

In general: BLEU measures precision: how many of the words (and/or n-grams) in the machine-generated summaries appear in the human reference summaries. ROUGE measures recall: how many of the words (and/or n-grams) in the human reference summaries appear in the machine-generated summaries. Naturally, these results are complementary, as is often the case with precision …
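The precision/recall asymmetry is easy to see in code. The sketch below computes the clipped unigram overlap between a candidate and a reference, then divides it by the candidate length (precision, the core of BLEU-1) and by the reference length (recall, ROUGE-1). It deliberately omits BLEU's brevity penalty and geometric mean over n-gram orders, and ROUGE's other variants such as ROUGE-L; the example strings are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def precision_recall(candidate, reference, n=1):
    """Clipped n-gram precision (BLEU-style) and recall (ROUGE-style)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clip each candidate n-gram count at its count in the reference,
    # so repeating a reference word cannot inflate the overlap.
    shared = sum(min(count, ref[gram]) for gram, count in cand.items())
    precision = shared / max(sum(cand.values()), 1)  # divide by candidate size
    recall = shared / max(sum(ref.values()), 1)      # divide by reference size
    return precision, recall

reference = "the cat sat on the mat".split()
candidate = "the cat the cat on the mat".split()
print(precision_recall(candidate, reference))  # (5/7, 5/6): decent recall, lower precision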
