NLP
Definition of downstream tasks in NLP
“Downstream tasks” is what the field calls the supervised-learning tasks that utilize a pre-trained model or component. From this blog post: http://jalammar.github.io/illustrated-bert/
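To make the definition concrete, here is a minimal sketch of one downstream task: sentence classification on top of a pre-trained BERT encoder. It assumes the Hugging Face transformers library; the model name, label count, and example sentence are illustrative, not from the quoted post.

    # Downstream-task sketch: sentiment classification with pre-trained BERT.
    # Assumes the Hugging Face `transformers` library; names are illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # Loads pre-trained BERT weights and attaches a fresh, randomly
    # initialized classification head; fine-tuning that head (and optionally
    # the encoder) on labeled data is the supervised "downstream" task.
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    inputs = tokenizer("This movie was great!", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2): one score per class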
How is WordPiece tokenization helpful in dealing with the rare-words problem in NLP?
WordPiece and BPE are two similar and commonly used techniques for segmenting words into subword units in NLP tasks. In both cases, the vocabulary is initialized with all the individual characters in the language, and the most frequent/likely combinations of symbols in the vocabulary are then iteratively added to the vocabulary. Consider the WordPiece algorithm … Read more
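The excerpt describes the merge loop at a high level; the toy sketch below implements the BPE flavor of it (WordPiece differs mainly in scoring candidate merges by how much they increase training-data likelihood rather than by raw pair frequency). The corpus and merge count are made up for illustration.

    from collections import Counter

    def learn_bpe_merges(word_counts, num_merges):
        """Toy BPE: start from individual characters and repeatedly merge
        the most frequent adjacent symbol pair across the corpus."""
        vocab = {tuple(w): c for w, c in word_counts.items()}
        merges = []
        for _ in range(num_merges):
            pairs = Counter()
            for symbols, count in vocab.items():
                for pair in zip(symbols, symbols[1:]):
                    pairs[pair] += count
            if not pairs:
                break
            best = max(pairs, key=pairs.get)  # most frequent pair wins
            merges.append(best)
            new_vocab = {}
            for symbols, count in vocab.items():
                out, i = [], 0
                while i < len(symbols):
                    if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                        out.append(symbols[i] + symbols[i + 1])  # apply merge
                        i += 2
                    else:
                        out.append(symbols[i])
                        i += 1
                new_vocab[tuple(out)] = count
            vocab = new_vocab
        return merges

    # A rare word like "lowest" can later be segmented as "low" + "est"
    # using these merges, instead of mapping to an <unk> token.
    print(learn_bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 5))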
What Is the Difference Between POS Tagging and Shallow Parsing?
POS tagging would assign a POS tag to each and every word in the input sentence. Parsing the sentence (using the Stanford PCFG parser, for example) would convert the sentence into a tree whose leaves hold POS tags (which correspond to words in the sentence), but the rest of the tree would tell you how … Read more
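As a concrete contrast, the sketch below tags a sentence and then runs a shallow parser (chunker) over the tags using NLTK; the chunk grammar is a toy illustration, not the Stanford parser mentioned in the answer.

    import nltk

    # One-time resource downloads (names per classic NLTK releases; newer
    # versions may ask for 'punkt_tab' / 'averaged_perceptron_tagger_eng').
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    sentence = "The quick brown fox jumps over the lazy dog"

    # POS tagging: exactly one tag per word.
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    print(tagged)  # [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ...]

    # Shallow parsing (chunking): group tagged words into flat phrases.
    # Toy grammar: an NP is an optional determiner, any adjectives, a noun.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>}")
    print(chunker.parse(tagged))  # (S (NP The/DT quick/JJ brown/JJ fox/NN) ...)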
CBOW vs. skip-gram: why invert context and target words?
Here is my oversimplified and rather naive understanding of the difference: as we know, CBOW learns to predict a word from its context, i.e., it maximizes the probability of the target word given the surrounding words. This happens to be a problem for rare words. For example, given the context yesterday was a … Read more
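In practice the two architectures are just a flag on the same trainer. A minimal sketch with gensim follows; the corpus and hyperparameters are placeholders, and vector_size is the gensim 4.x name for the embedding dimension.

    from gensim.models import Word2Vec

    sentences = [
        ["yesterday", "was", "a", "really", "delightful", "day"],
        ["yesterday", "was", "a", "really", "beautiful", "day"],
    ]

    # sg=0: CBOW -- the averaged context predicts the target word, so rare
    # words like "delightful" contribute little to the training signal.
    cbow = Word2Vec(sentences, sg=0, vector_size=50, window=2, min_count=1)

    # sg=1: skip-gram -- each target word predicts its own context, so even
    # a rare word gets its own prediction task and a better-trained vector.
    skipgram = Word2Vec(sentences, sg=1, vector_size=50, window=2, min_count=1)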
Text Summarization Evaluation – BLEU vs ROUGE
In general: BLEU measures precision: how many of the words (and/or n-grams) in the machine-generated summary appear in the human reference summaries. ROUGE measures recall: how many of the words (and/or n-grams) in the human reference summaries appear in the machine-generated summary. Naturally, these measures are complementary, as is often the case in precision … Read more
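A stripped-down illustration of that precision/recall split on unigrams follows (real BLEU adds higher-order n-grams and a brevity penalty, and real ROUGE comes in several variants; the sentences are made up):

    from collections import Counter

    def unigram_overlap(candidate, reference):
        """Clipped unigram overlap plus the two lengths it is divided by."""
        cand, ref = Counter(candidate.split()), Counter(reference.split())
        overlap = sum(min(n, ref[w]) for w, n in cand.items())
        return overlap, sum(cand.values()), sum(ref.values())

    candidate = "the cat sat on the mat"         # machine-generated summary
    reference = "the cat is sitting on the mat"  # human reference

    overlap, cand_len, ref_len = unigram_overlap(candidate, reference)
    print(f"BLEU-1-style precision: {overlap / cand_len:.2f}")  # 5/6 = 0.83
    print(f"ROUGE-1-style recall:   {overlap / ref_len:.2f}")   # 5/7 = 0.71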
Is there a human readable programming language? [closed]
How about LOLCODE?

    HAI
    CAN HAS STDIO?
    VISIBLE "HAI WORLD!"
    KTHXBYE

Simplicity itself!