Modern NLP
for
Pre-Modern Practitioners
Joel Grus @joelgrus
#QConAI #2019
"True self-control is waiting until the movie starts to eat your popcorn."

(the joke turns on an attachment ambiguity: "waiting [until the movie starts] to eat your popcorn" vs. "waiting until [the movie starts to eat your popcorn]"; the slide shuffles the words to bring out the second reading)
Natural Language Understanding is Hard
But We're Getting Better At It*
* as measured by performance
As Measured by Performance on Tasks We're Getting Better at*
* tasks that would be easy if we were good at natural language understanding and that we therefore use to measure our progress toward natural language understanding
About Me
Obligatory Plug for AllenNLP
A Handful of Tasks That Would Be Easy if We Were Good at Natural Language Understanding
Coreference Resolution
Machine Translation
Summarization
Machine Comprehension
Machine Comprehension?
Textual Entailment
Winograd Schemas
Language Modeling
And many others!
If you were good at natural language understanding, you'd also be pretty good at these tasks
(I Am Being Unfair)
Each of these tasks is valuable on its own merits.
Likely they are getting us closer to actual natural language understanding.
Lots of Linguistics
Grammars
S -> NP VP
NP -> JJ NN
VP -> VBZ ADJP
ADJP -> JJ
JJ -> "Artificial"
NN -> "intelligence"
VBZ -> "is"
JJ -> "dangerous"

"Artificial intelligence is dangerous"
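A minimal sketch of this toy grammar in NLTK (my example, not the talk's code); a chart parser recovers the parse tree:

import nltk

# The slide's toy grammar, written as an NLTK context-free grammar.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> JJ NN
    VP -> VBZ ADJP
    ADJP -> JJ
    JJ -> 'Artificial' | 'dangerous'
    NN -> 'intelligence'
    VBZ -> 'is'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("Artificial intelligence is dangerous".split()):
    print(tree)  # (S (NP (JJ Artificial) (NN intelligence)) (VP (VBZ is) (ADJP (JJ dangerous))))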
Hand-Crafted Features
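For instance, a classic feature-based tagger might hand-craft features like these (a hypothetical sketch of the old-school style, not the talk's code):

def token_features(tokens, i):
    """Hand-crafted features for tagging tokens[i], old-school NLP style."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.suffix3": word[-3:],  # e.g. "-ing", "-ion"
        "prev.word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next.word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

print(token_features("The department heads all quit .".split(), 2))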
Rule-Based Systems
Theme 5: Transfer Learning
Word Vectors
(illustration: each word represented as a dense vector of reals, e.g. [.01, .9, .05, .3, .6, .1, .2, 2.3])
Joel is attending an artificial intelligence conference.
(diagram: "artificial", "intelligence" → embedding → prediction of nearby words)
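A sketch of training word vectors with gensim's word2vec implementation (my example; parameter names are gensim 4's):

from gensim.models import Word2Vec

sentences = [
    "joel is attending an artificial intelligence conference".split(),
    "artificial intelligence is dangerous".split(),
    # ... in practice, many millions of sentences
]

# Each word gets a dense vector trained to predict its neighbors.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv["intelligence"][:5])               # a slice of the learned vector
print(model.wv.most_similar("artificial", topn=2))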
Using Word Vectors

(diagrams: feed a single word's vector to a classifier that predicts its part-of-speech tag, e.g. V vs. N or N vs. J)
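One way to do that (a hypothetical sketch): treat the word's vector as the input features for a simple classifier that predicts its part of speech.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in vectors; in practice you'd look these up in pretrained embeddings.
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["quit", "eat", "movie", "popcorn"]}
tags = {"quit": "V", "eat": "V", "movie": "N", "popcorn": "N"}

X = np.stack([vectors[w] for w in tags])
y = list(tags.values())
clf = LogisticRegression().fit(X, y)    # vector in, POS tag out
print(clf.predict([vectors["movie"]]))  # ['N'] (on training data)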
The department heads all quit.
Using Context for Sequence Labeling
(diagram: surrounding context determines whether "heads" gets tagged V or N)
Using Context for Sequence Classification
Recurrent Neural Networks
LSTMs and GRUs
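A minimal PyTorch sketch of an LSTM sequence tagger (my own names and sizes, not the talk's code):

import torch
import torch.nn as nn

class LstmTagger(nn.Module):
    """Embed tokens, run an LSTM over the sequence, project each state to tag scores."""
    def __init__(self, vocab_size, num_tags, embed_dim=50, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):               # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return self.proj(states)                # (batch, seq_len, num_tags)

tagger = LstmTagger(vocab_size=10_000, num_tags=17)
scores = tagger(torch.randint(0, 10_000, (1, 6)))  # e.g. "The department heads all quit ."
print(scores.argmax(-1))                            # one predicted tag id per token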
Generative Character-Level Modeling
Convolutional Networks
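And a sketch of the convolutional alternative: each filter looks at a fixed-width window of words, with max-over-time pooling producing a sentence vector.

import torch
import torch.nn as nn

embed = nn.Embedding(10_000, 50)
conv = nn.Conv1d(in_channels=50, out_channels=64, kernel_size=3, padding=1)

tokens = torch.randint(0, 10_000, (1, 6))      # (batch, seq_len)
x = embed(tokens).transpose(1, 2)              # Conv1d wants (batch, channels, seq_len)
features = torch.relu(conv(x))                 # each position sees a 3-word window
sentence_vector = features.max(dim=2).values   # max-over-time pooling → (batch, 64)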
Sequence-to-Sequence Models
Attention
Large "Unsupervised" Language Models
Contextual Embeddings

(example: "The Seahawks ___ football today"; each word's embedding depends on the surrounding words)
ELMo
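A sketch of getting ELMo's contextual embeddings from AllenNLP (the options/weights paths below are placeholders for AllenNLP's published pretrained files):

from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholders: point these at one of AllenNLP's pretrained ELMo configurations.
options_file = "path/to/elmo_options.json"
weight_file = "path/to/elmo_weights.hdf5"

elmo = Elmo(options_file, weight_file, num_output_representations=1)

sentences = [["Joel", "is", "giving", "a", "talk"]]  # hypothetical example
character_ids = batch_to_ids(sentences)              # ELMo works on characters under the hood
embeddings = elmo(character_ids)["elmo_representations"][0]
# embeddings[0, i] now depends on the whole sentence, not just word i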
Self-Attention
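The core computation, sketched as single-head scaled dot-product attention in PyTorch (no masking or multi-head machinery):

import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])  # every position scores every other position
    weights = torch.softmax(scores, dim=-1)    # attention weights sum to 1 per position
    return weights @ v                         # each output is a weighted mix of values

x = torch.randn(6, 32)                         # 6 tokens, 32-dim representations
w = [torch.randn(32, 32) for _ in range(3)]
print(self_attention(x, *w).shape)             # torch.Size([6, 32])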
RNN vs CNN vs Self-Attention
The Transformer ("Attention Is All You Need")
One Model to Rule Them All?
The GLUE Benchmark
BERT
Task 1: Masked Language Modeling
Joel is giving a [MASK] talk at a [MASK] in San Francisco

first [MASK]: interesting, exciting, derivative, pedestrian, snooze-worthy, ...
second [MASK]: conference, meetup, rave, coffeehouse, WeWork, ...
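You can poke at this objective directly with Hugging Face's fill-mask pipeline (a sketch; one [MASK] at a time keeps the output simple):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("Joel is giving a talk at a [MASK] in San Francisco."):
    print(f'{pred["token_str"]:12} {pred["score"]:.3f}')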
Task 2: Next Sentence Prediction
[CLS] Joel is giving a talk. [SEP] The audience is enthralled. [SEP] → 99% is_next_sentence, 1% is_not_next_sentence
[CLS] Joel is giving a talk. [SEP] The audience is falling asleep. [SEP] → 1% is_next_sentence, 99% is_not_next_sentence
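A sketch of probing this objective with transformers' BertForNextSentencePrediction (in Hugging Face's convention, index 0 means "sentence B really follows A"):

import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

inputs = tokenizer("Joel is giving a talk.", "The audience is enthralled.",
                   return_tensors="pt")               # adds [CLS] ... [SEP] ... [SEP]
probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # index 0 = is_next_sentence, index 1 = is_not_next_sentence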
BERT for downstream tasks
Is GPT-2 Dangerous?
PRETRAINED LANGUAGE MODEL
Use Pretrained Word Vectors
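For example, with gensim's downloader (a sketch; "glove-wiki-gigaword-100" is one of its published pretrained sets):

import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")    # downloads pretrained 100-d GloVe vectors
print(glove.most_similar("conference", topn=3))
print(glove["popcorn"][:5])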
Better Still, Use Pretrained Contextual Embeddings
Use Pretrained BERT to Build Great Classifiers
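A sketch with Hugging Face transformers (hypothetical sentences and labels): bolt a classification head onto pretrained BERT and fine-tune the whole thing.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["I loved this talk.", "The popcorn ran out early."],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])               # hypothetical sentiment labels
loss = model(**batch, labels=labels).loss   # cross-entropy over the [CLS] representation
loss.backward()                             # then optimizer.step(), batch after batch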
PRETRAINED LANGUAGE MODEL
In Conclusion
People with lots of data and lots of compute power have trained models that you can just download and use on your own problems.
I'm fine-tuning a transformer model!
Thanks!
○ read the speaker notes
○ they have lots of links
References
http://ruder.io/a-review-of-the-recent-history-of-nlp/
https://ankit-ai.blogspot.com/2019/03/future-of-natural-language-processing.html
https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html#openai-gpt