Low-Resource Natural Language Processing
Behnam Sabeti Sharif University of Technology October 2019
Who-am-i?
Behnam Sabeti
Ph.D. Candidate at Sharif University of Technology
Project Manager and NLP Expert at Miras Technologies International
Does all kinds of NLP work, especially on Persian
Sharif Data Talks: Low-Resourced NLP 2
behnamsabeti behnamsabeti
NLP @ Miras
services for Persian:
Dataset                Size (documents)
IMDB                   50 K
SST                    10 K
Sentiment140           1.6 M
Amazon Product Data    142.8 M
Problem?
Solutions
Self Supervision
choice.
Case Study: Emotion Analysis
Emoji Prediction ⟹ Emotion Analysis
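A minimal sketch of how emoji prediction can act as a self-supervised pretext task: the emoji an author typed becomes the training label, so no manual annotation is needed. The emoji inventory `TARGET_EMOJIS` and the helper names `make_example` and `build_dataset` are hypothetical and only illustrate the idea; they are not taken from the talk or from DeepMoji.

```python
# Hypothetical target inventory; a real system would use a far larger set.
TARGET_EMOJIS = {"😂", "😢", "😡", "😍"}


def make_example(text):
    """Turn raw text into a (text_without_emoji, emoji_label) pair.

    Returns None when the text has no target emoji, has more than one
    distinct target emoji (ambiguous label), or is empty after cleaning.
    """
    found = {ch for ch in text if ch in TARGET_EMOJIS}
    if len(found) != 1:
        return None
    label = found.pop()
    # Remove the label from the input so the model cannot simply copy it.
    cleaned = "".join(ch for ch in text if ch not in TARGET_EMOJIS)
    cleaned = " ".join(cleaned.split())  # normalize whitespace
    if not cleaned:
        return None
    return cleaned, label


def build_dataset(texts):
    """Keep only the texts that yield an unambiguous training pair."""
    examples = (make_example(t) for t in texts)
    return [ex for ex in examples if ex is not None]


if __name__ == "__main__":
    corpus = ["I love autumn 😍", "plain text", "mixed 😂 😢"]
    print(build_dataset(corpus))
```

A model trained on such pairs learns emotional cues from unlabeled text and can then be fine-tuned on a small labeled emotion dataset.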
Image: medium.com/@bjarkefelbo/what-can-we-learn-from-emojis
DeepMoji Model
Image: medium.com/huggingface/understanding-emotions-from-keras-to-pytorch
Example input (translated from Persian): "I really love autumn! I mean, a season this nice! Come sooner..."
Weak Supervision
Case Study: Document Classification
labeling a handful of topics
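One possible reading of this case study, sketched under stated assumptions: a topic model has already produced a topic distribution per document, an annotator names only a few of the topics, and each document inherits the name of its dominant topic. The function `weak_labels`, the `min_weight` threshold, and the sample data are hypothetical and are not taken from the talk.

```python
def weak_labels(doc_topics, topic_labels, min_weight=0.5):
    """Assign a weak class label to each document from its dominant topic.

    doc_topics   : list of per-document topic distributions (lists of floats)
    topic_labels : dict mapping a topic index to a human-assigned class name;
                   only a handful of topics need to be labeled
    min_weight   : assumed threshold; documents whose dominant topic is weaker
                   than this stay unlabeled (None)
    """
    labels = []
    for dist in doc_topics:
        # Index of the topic with the largest weight in this document.
        topic = max(range(len(dist)), key=dist.__getitem__)
        if dist[topic] >= min_weight and topic in topic_labels:
            labels.append(topic_labels[topic])
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    # Made-up distributions over three topics for four documents.
    doc_topics = [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.4, 0.3, 0.3],
        [0.1, 0.1, 0.8],
    ]
    # The annotator labeled only topics 0 and 1.
    topic_labels = {0: "economy", 1: "sports"}
    print(weak_labels(doc_topics, topic_labels))
```

Documents that receive a label form a noisy training set for a regular classifier; the rest stay unlabeled.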
Image: m-cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models
Example document (translated from Persian): "Under current conditions, according to a report by the Economist Intelligence Unit, the greatest risks threatening Iran's economy are banking-sector risk and political risk. Asr-e Bank: The Economic Studies Department of the Tehran Chamber of Commerce has examined the country risk model in a report. According to this report, the country risk model is a model designed by the Economist Intelligence Unit to measure and compare the credit risk of different countries. This interactive tool makes it possible to quantify the risk of financial transactions, including bank loans, trade finance, and investment in securities…"
Transfer Learning
Image: machinelearningmastery.com/transfer-learning-for-deep-learning
Pre-Trained Models
HUGE datasets.
Image: jalammar.github.io/illustrated-bert
Image: medium.com/huggingface/introducing-fastbert-a-simple-deep-learning-library-for-bert-models
Case Study: Named Entity Recognition
Multi-Task Learning
Image: medium.com/manash-en-blog/multi-task-learning-in-keras-implementation-of-multi-task-classification-loss
Case Study: Satire Detection
Satire Model    Performance (F1)
Single task     55 %
Multi task      68 %
Active Learning
Image: www.datacamp.com/community/tutorials/active-learning
Active Learning
Current model predictions for documents D1, D2, D3:
      Positive    Neutral    Negative
D1    0.5         0.45       0.05
D2    0.4         0.3        0.3
D3    0.95        0.05
Least Confident: query the document whose most probable class has the lowest probability. Here that is D2 (top probability 0.4).
Margin: query the document with the smallest gap between its two most probable classes. Here that is D1 (0.5 - 0.45 = 0.05).
$\mathrm{Entropy}(P) = -\sum_i p_i \log p_i$
Entropy (log base 2): D1 = 1.23, D2 = 1.57, D3 = 0.29
Entropy: query the document with the highest prediction entropy. Here that is D2 (1.57).
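The three uncertainty measures from these slides (least confident, margin, entropy) can be sketched in a few lines of Python using the example predictions. Two assumptions are made here: the third probability for D3 is missing in the slide text and is taken to be 0.0 so the row sums to 1, and entropy uses log base 2, which matches the slide values 1.23, 1.57, and 0.29. All function names are illustrative.

```python
import math

# Example predictions from the slides (Positive, Neutral, Negative).
PREDICTIONS = {
    "D1": [0.5, 0.45, 0.05],
    "D2": [0.4, 0.3, 0.3],
    "D3": [0.95, 0.05, 0.0],  # ASSUMPTION: missing third value taken as 0.0
}


def least_confidence(probs):
    """Uncertainty as 1 minus the top class probability (higher = less sure)."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most probable classes (smaller = less sure)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select(predictions, score, pick_highest):
    """Return the id of the document the given strategy would send for labeling."""
    chooser = max if pick_highest else min
    return chooser(predictions, key=lambda doc: score(predictions[doc]))


if __name__ == "__main__":
    print("least confident:", select(PREDICTIONS, least_confidence, True))
    print("margin:", select(PREDICTIONS, margin, False))
    print("entropy:", select(PREDICTIONS, entropy, True))
    for doc, probs in PREDICTIONS.items():
        print(doc, round(entropy(probs), 2))
```

On this example the strategies disagree: least confident and entropy both pick D2, while margin picks D1, so the choice of measure changes which document gets labeled next.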
Case Study: Sentiment Analysis
Summary
Natural Language Processing Services for Persian www.getzaal.com