

SLIDE 1

Low-Resource Natural Language Processing

Behnam Sabeti Sharif University of Technology October 2019

SLIDE 2

Who am I?

Behnam Sabeti

Ph.D. Candidate at Sharif University of Technology. Project Manager and NLP Expert at Miras Technologies International. Does all kinds of NLP work, especially on Persian.

behnamsabeti

SLIDE 3

NLP @ Miras

  • Our focus in the Miras NLP team is on developing text-processing services for Persian:

  • Document classification
  • Named entity recognition
  • Sentiment analysis
  • Emotion analysis
  • Challenge:
  • Data!

SLIDE 4

Dataset               Size (documents)
IMDB                  50 K
SST                   10 K
Sentiment140          160 K
Amazon Product Data   142.8 M

SLIDE 5

Problem?

  • Deep learning models are data-hungry
  • The Persian NLP community is not large
  • We do not have enough public resources
  • Funding is also limited, so we cannot afford to build huge resources either

SLIDE 6

  • Get More Data
  • Get Better Data
  • Use Related Data
  • Problem Modeling

SLIDE 7

  • Get Better Data
  • Use Related Data
  • Problem Modeling

SLIDE 8

Solutions

  • Self Supervision
    • Case study: Emotion Analysis
  • Weak Supervision
    • Case study: Document Classification
  • Transfer Learning
    • Case study: Named Entity Recognition
  • Multi-Task Learning
    • Case study: Satire Detection
  • Active Learning
    • Case study: Sentiment Analysis

SLIDE 9

Self Supervision

  • Straightforward (document, label) modeling is not always your best choice.
  • Model your problem in a setting where labels are easy to acquire:
  • Self-supervision
  • Labels are already in your data (see the word-embedding sketch below):
  • Language modeling
  • Word embedding
  • Emotion analysis
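Word embeddings illustrate the idea: the supervision signal (which words appear near which) is already present in raw text. Below is a minimal sketch using gensim's Word2Vec; the toy corpus and hyperparameters are illustrative, not from the talk.

```python
from gensim.models import Word2Vec

# Toy corpus; the real setting is millions of unlabeled Persian sentences.
corpus = [
    ["low", "resource", "nlp", "needs", "creative", "labeling"],
    ["emoji", "are", "free", "emotion", "labels"],
    ["language", "models", "learn", "from", "raw", "text"],
]

# The context window IS the supervision: no human labels anywhere.
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, epochs=20)
print(model.wv.most_similar("emoji", topn=3))
```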

SLIDE 10

Case Study: Emotion Analysis

  • An emoji is a good indicator of emotion
  • Instead of manually labeling your data, use emojis
  • Your dataset needs no hand-labeling effort!

Emoji Prediction ⟹ Emotion Analysis
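A sketch of how such a dataset can be assembled: extract the emoji from each tweet as a free label, then map it to an emotion. The emoji-to-emotion map and helper below are simplified assumptions, not the talk's actual mapping.

```python
import re

# Hypothetical mapping; a real system would cover far more emojis.
EMOJI_TO_EMOTION = {"😂": "joy", "😍": "love", "😢": "sadness", "😡": "anger"}
EMOJI_RE = re.compile("|".join(map(re.escape, EMOJI_TO_EMOTION)))

def make_pair(tweet: str):
    """Return (text-without-emoji, emotion) or None if no known emoji."""
    found = EMOJI_RE.findall(tweet)
    if not found:
        return None
    text = EMOJI_RE.sub("", tweet).strip()
    return text, EMOJI_TO_EMOTION[found[0]]

tweets = ["I really love autumn! 😍 come sooner...", "traffic again 😡"]
dataset = [p for p in map(make_pair, tweets) if p]
print(dataset)  # [('I really love autumn!  come sooner...', 'love'), ...]
```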

SLIDE 11

Image: medium.com/@bjarkefelbo/what-can-we-learn-from-emojis

SLIDE 12

DeepMoji Model

  • Predict Emoji
  • Map Emoji to Emotion

Image: medium.com/huggingface/understanding-emotions-from-keras-to-pytorch

SLIDE 13


Example tweet (translated from Persian): "I really love autumn! Such a wonderful season! Come sooner..."

SLIDE 14

Weak Supervision

  • Provide noisy labels using a set of heuristics or domain knowledge (see the sketch after this list)
  • Use other weak classifiers
  • Constraints
  • Data transformation
  • Think of a transformation on your data:
  • Reduce the effort in the annotation process
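A minimal sketch of heuristic labeling functions combined by majority vote, in the spirit of label-model frameworks such as Snorkel (this is plain Python, not that library's API). The keyword rules are illustrative assumptions.

```python
from collections import Counter

ABSTAIN = None

def lf_keyword_sport(doc):    # domain-knowledge heuristic
    return "sport" if any(w in doc for w in ("goal", "match", "league")) else ABSTAIN

def lf_keyword_economy(doc):  # another noisy heuristic
    return "economy" if any(w in doc for w in ("bank", "risk", "loan")) else ABSTAIN

def weak_label(doc, lfs):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [v for v in (lf(doc) for lf in lfs) if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

lfs = [lf_keyword_sport, lf_keyword_economy]
print(weak_label("the bank loan risk model", lfs))  # economy
```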

SLIDE 15

Case Study: Document Classification

  • Latent Dirichlet Allocation (LDA) is a generative model for topic modeling:
  • It computes a set of topics: each topic is a distribution over words
  • It computes the distribution of each document over topics
  • Instead of manually labeling documents, annotate topics!
  • With this transformation you can get a pretty good result by labeling just a handful of topics (see the sketch below)
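A sketch of this transformation with gensim's LDA: the annotator labels a few topics, and each document inherits the label of its most probable topic. The toy corpus, topic count, and topic-to-label map are illustrative assumptions.

```python
from gensim import corpora
from gensim.models import LdaModel

raw_documents = [
    "the central bank reported rising loan risk",
    "the league match ended with a stunning goal",
]  # stand-in corpus; in practice a large unlabeled collection

texts = [doc.lower().split() for doc in raw_documents]
dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(t) for t in texts]

# In a real setting num_topics would be a few dozen; the annotator
# labels only these topics, never the individual documents.
lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id in range(lda.num_topics):
    print(topic_id, lda.show_topic(topic_id, topn=5))

topic_to_label = {0: "economy", 1: "sports"}  # hand-assigned, hypothetical

def label_document(bow):
    """Propagate the label of the document's most probable topic."""
    topic_id, _ = max(lda.get_document_topics(bow), key=lambda t: t[1])
    return topic_to_label.get(topic_id)

print([label_document(b) for b in bow_corpus])
```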

SLIDE 16

Image: m-cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models

SLIDE 17


Sample document (translated from Persian): "Under the current conditions, according to a report by the Economist Intelligence Unit, the biggest risks threatening Iran's economy are banking-sector risk and political risk. Asre Bank: the economic studies department of the Tehran Chamber of Commerce has examined the country risk model in a report. According to this report, the country risk model is a model designed by the Economist Intelligence Unit to measure and compare the credit risk of different countries. This interactive tool makes it possible to quantify the risk of financial transactions, including bank loans, trade finance, and investment in securities…"

SLIDE 18

Transfer Learning

  • Train on a task for which you have enough data
  • Fine-tune the trained model on a new task (for which limited data is available)
  • The source and target tasks need to have common characteristics:
  • Source: Language Modeling, Target: Document Classification
  • Source: Emotion Detection, Target: Satire Detection
  • Source: Document Classification, Target: Sentiment Analysis
  • Document Classification: a word-based task
  • Sentiment Analysis: a phrase-level, semantics-based task

SLIDE 19

Image: machinelearningmastery.com/transfer-learning-for-deep-learning

SLIDE 20

Pre-Trained Models

  • Train your own model on a source task, or use a pre-trained model
  • Pre-trained models are a good choice because they are trained on HUGE datasets (see the loading sketch after the list below)

  • Language modeling pre-trained models:
  • BERT
  • GPT
  • XLNet
  • XLM
  • CTRL
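A minimal loading sketch with the Hugging Face transformers library. bert-base-multilingual-cased is one public checkpoint whose pre-training data covers Persian; any comparable pre-trained model would work.

```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

# Contextual features come for free from someone else's HUGE dataset.
inputs = tokenizer("a sample sentence", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
print(out.last_hidden_state.shape)  # (1, num_tokens, 768)
```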

SLIDE 21

Image: jalammar.github.io/illustrated-bert

SLIDE 22


Image: medium.com/huggingface/introducing-fastbert-a-simple-deep-learning-library-for-bert-models

SLIDE 23

Case Study: Named Entity Recognition

  • Target task: Named Entity Recognition
  • Extract locations, persons, organizations, events, and times from text
  • Source: multilingual BERT model (see the fine-tuning sketch below)
  • Data: 50K hand-labeled sentences with NER tags
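A sketch of the transfer setup, assuming the Hugging Face transformers library: the pre-trained multilingual BERT body gets a fresh token-classification head sized to the tag set. The tag list and the freezing step are illustrative assumptions, not the talk's exact recipe.

```python
from transformers import AutoModelForTokenClassification

NER_TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]  # assumed tag set

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(NER_TAGS),  # a fresh, randomly initialized head
)

# A common low-resource trick: freeze the pre-trained body at first
# and train only the new head, then optionally unfreeze and fine-tune.
for param in model.bert.parameters():
    param.requires_grad = False
```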

SLIDE 24

SLIDE 25

Multi-Task Learning

  • Train multiple tasks together:
  • More data
  • Synergistic effects in training
  • Tasks: tweet reconstruction + emoji prediction + satire detection
  • General features
  • Emotion features
  • Satire features
  • Entails a multi-objective loss function (see the sketch after this list)
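A sketch of such a model in PyTorch: a shared LSTM encoder with one head per task and a weighted multi-objective loss. All sizes, task weights, and layer choices are illustrative assumptions, not the talk's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder, one head per task."""
    def __init__(self, vocab=30000, emb=128, hidden=256, n_emojis=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)  # shared features
        self.reconstruct = nn.Linear(hidden, vocab)  # per-token reconstruction
        self.emoji = nn.Linear(hidden, n_emojis)     # emotion signal
        self.satire = nn.Linear(hidden, 2)           # the low-resource target

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        last = h[:, -1]  # last hidden state as the tweet representation
        return self.reconstruct(h), self.emoji(last), self.satire(last)

ce = nn.CrossEntropyLoss()

def multi_task_loss(outs, token_y, emoji_y, satire_y, w=(0.2, 0.3, 0.5)):
    """Weighted sum of the three objectives; weights are illustrative."""
    rec, emo, sat = outs
    l_rec = ce(rec.reshape(-1, rec.size(-1)), token_y.reshape(-1))
    return w[0] * l_rec + w[1] * ce(emo, emoji_y) + w[2] * ce(sat, satire_y)
```

The design intent: the abundant reconstruction and emoji data shape the shared encoder, so the scarce satire labels only have to train a small head.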

SLIDE 26

Image: medium.com/manash-en-blog/multi-task-learning-in-keras-implementation-of-multi-task-classification-loss

SLIDE 27

Case Study: Satire Detection

  • Satire dataset: 2K tweets
  • Emotion dataset: 300K tweets
  • Reconstruction tweets: as many as you have! (200M)

SLIDE 28

SLIDE 29

Satire Model Performance (F1)

Single task   55 %
Multi task    68 %

SLIDE 30

Active Learning

  • How to select samples for annotation?
  • Random
  • Annotate as much as you can
  • Smart
  • Annotate “Better” samples

SLIDE 31

SLIDE 32

Image: www.datacamp.com/community/tutorials/active-learning

SLIDE 33

Active Learning

  • How to select samples for annotation?
  • Random
  • Smart:
  • Select samples the current model is most uncertain about (Least Confident, LC)
  • Select samples with a low margin between the top two category labels (Margin)
  • Select samples with the highest entropy (Entropy)
  • Goal: better performance with fewer samples (see the sketch after this list)
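The three strategies in code, applied to the D1-D3 example on the following slides. Plain numpy; base-2 logarithms are assumed, since they reproduce the slides' entropy values, and the missing D3 probability is taken as 0.0 because each row must sum to 1.

```python
import numpy as np

# Class probabilities for three unlabeled documents (the D1-D3 example).
P = np.array([
    [0.50, 0.45, 0.05],   # D1
    [0.40, 0.30, 0.30],   # D2
    [0.95, 0.05, 0.00],   # D3
])

least_confident = 1 - P.max(axis=1)         # highest wins -> D2 (0.60)
top2 = np.sort(P, axis=1)[:, -2:]
margin = top2[:, 1] - top2[:, 0]            # lowest wins -> D1 (0.05)
safe = np.clip(P, 1e-12, 1.0)               # avoid log(0)
entropy = -(P * np.log2(safe)).sum(axis=1)  # highest wins -> D2

print(entropy.round(2))  # [1.23 1.57 0.29], matching the slide
```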

SLIDE 34

Current model predictions:

     Positive  Neutral  Negative
D1   0.50      0.45     0.05
D2   0.40      0.30     0.30
D3   0.95      0.05     0.00


SLIDE 36

     Positive  Neutral  Negative
D1   0.50      0.45     0.05
D2   0.40      0.30     0.30   ← Least Confident (lowest top probability)
D3   0.95      0.05     0.00


SLIDE 38

     Positive  Neutral  Negative
D1   0.50      0.45     0.05   ← Margin (smallest gap between top two: 0.05)
D2   0.40      0.30     0.30
D3   0.95      0.05     0.00

SLIDE 39

     Positive  Neutral  Negative
D1   0.50      0.45     0.05
D2   0.40      0.30     0.30
D3   0.95      0.05     0.00

$\mathrm{Entropy}(P) = -\sum_i p_i \log p_i$

SLIDE 40

     Positive  Neutral  Negative  Entropy
D1   0.50      0.45     0.05      1.23
D2   0.40      0.30     0.30      1.57
D3   0.95      0.05     0.00      0.29

$\mathrm{Entropy}(P) = -\sum_i p_i \log p_i$

SLIDE 41

     Positive  Neutral  Negative  Entropy
D1   0.50      0.45     0.05      1.23
D2   0.40      0.30     0.30      1.57   ← Entropy (highest)
D3   0.95      0.05     0.00      0.29

$\mathrm{Entropy}(P) = -\sum_i p_i \log p_i$

SLIDE 42

Case Study: Sentiment Analysis

  • Model: LSTM
  • Embedding: embedding layer
  • Data: 100K hand-labeled Digikala comments (positive, neutral, negative)
  • Test scenarios (see the experiment sketch after this list):
  • Train on all data
  • Active learning:
  • Entropy
  • Margin
  • Least Confident
  • Random
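A sketch of how such a comparison can be run: grow the labeled pool in batches chosen by each strategy and record the score after each round. `train` and `f1_on_test` are hypothetical stand-ins for the talk's LSTM training and evaluation code, not actual APIs.

```python
import numpy as np

def informativeness(probs, strategy, rng):
    """Score pool samples; higher = selected first."""
    if strategy == "entropy":
        return -(probs * np.log2(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    if strategy == "margin":
        s = np.sort(probs, axis=1)
        return -(s[:, -1] - s[:, -2])  # small margin = informative
    if strategy == "lc":
        return 1.0 - probs.max(axis=1)
    return rng.random(len(probs))      # random baseline

def active_learning_curve(X, y, strategy, seed=1000, batch=500, rounds=10):
    rng = np.random.default_rng(0)
    labeled = rng.choice(len(X), size=seed, replace=False)
    curve = []
    for _ in range(rounds):
        model = train(X[labeled], y[labeled])  # hypothetical helper
        curve.append(f1_on_test(model))        # hypothetical helper
        pool = np.setdiff1d(np.arange(len(X)), labeled)
        scores = informativeness(model.predict_proba(X[pool]), strategy, rng)
        labeled = np.concatenate([labeled, pool[np.argsort(scores)[-batch:]]])
    return curve
```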

SLIDE 43

SLIDE 44

Summary

  • Problem Modeling
    • Self Supervision
      • Model your problem in a way that labels are easy to get (usually available alongside your data)
    • Weak Supervision
      • Transform data into a new space for less annotation effort
  • Use Related Data
    • Transfer Learning
      • Transfer model knowledge between tasks
    • Multi-Task Learning
      • Use related tasks for more data and synergistic effects
  • Get Better Data
    • Active Learning
      • Smart selection of samples for annotation

SLIDE 45

ZAAL

Natural Language Processing Services for Persian www.getzaal.com

SLIDE 46

Thank You
