Few-Shot Learning For Text Classification - Master's Thesis by Shaour - PowerPoint PPT Presentation



SLIDE 1

Few-Shot Learning For Text Classification

Master's Thesis by Shaour Haider
First Referee: Prof. Dr. Benno Stein
Second Referee: Prof. Dr. Volker Rodehorst

SLIDE 2

Overview

  • Introduction
  • Approaches And Results
  • Related Work
  • Future Work

SLIDE 3

What is text classification?

  • Input: a paragraph A and a fixed set of classes C = {c1, c2, …, cn}
  • Output: a predicted class c ∈ C
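As a toy illustration of this setup (not from the thesis), a minimal keyword-matching classifier over a hypothetical class set C:

```python
# Toy illustration of text classification: map a paragraph to one class
# from a fixed set C. The keyword lists are hypothetical, for illustration.
KEYWORDS = {
    "sports": {"football", "goal", "team", "baseball"},
    "spam": {"winner", "prize", "click"},
}

def classify(paragraph: str) -> str:
    """Return the class whose keyword set overlaps the paragraph most."""
    tokens = set(paragraph.lower().split())
    return max(KEYWORDS, key=lambda c: len(tokens & KEYWORDS[c]))

print(classify("The team kicked the ball to score a goal"))  # sports
```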

Introduction

SLIDE 4

Why Text Classification?

  • Sentiment Analysis
  • Spam Detection
  • Topic Classification

[Images: Sentiment Analysis, Spam Detection, Topic Classification]

SLIDE 5

Few-Shot Learning

Situation:

  • Limited data

Few-shot learning aims to learn a classifier from a limited number of labeled examples (< 10).

[Figure: a 4-way 1-shot task, shown as a Train Set and a Test Set of (Class, Paragraph) pairs]
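The N-way K-shot setting can be made concrete by sampling episodes. A sketch (the pool data and helper name are hypothetical) that builds a few-shot train/test split from a labeled pool:

```python
import random

def make_episode(pool, n_way=4, k_shot=1, n_query=1, seed=0):
    """Sample an N-way K-shot episode from a labeled pool.

    pool: dict mapping class name -> list of paragraphs.
    Returns (train_set, test_set) as lists of (paragraph, class) pairs.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(pool), n_way)  # pick N classes
    train, test = [], []
    for c in classes:
        # K support examples per class, plus held-out query examples
        examples = rng.sample(pool[c], k_shot + n_query)
        train += [(x, c) for x in examples[:k_shot]]
        test += [(x, c) for x in examples[k_shot:]]
    return train, test

pool = {f"class_{i}": [f"paragraph {i}.{j}" for j in range(5)] for i in range(6)}
train, test = make_episode(pool, n_way=4, k_shot=1)
print(len(train), len(test))  # 4 4
```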

SLIDE 6

Datasets

Terminologies:

  • Train Set (few-shot training set)
  • Test Set (testing set)
  • Base Dataset: an additional dataset that is disjoint from the train and test sets of the target dataset
  • Target Dataset: the dataset from which the few-shot train and test sets are drawn

SLIDE 7

Let's Implement: Baseline with Bag of Words

[Diagram: Train: Target Training Data (Few) → Feature Extraction (bag of words over the Target Dataset) → Classifier → Target Loss & Update. Test: Target Testing Data → Feature Extraction (fixed weights) → Classifier → Target Accuracy Assessment.]

Approaches And Results
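A minimal sketch of this baseline pipeline, assuming a nearest-centroid classifier over bag-of-words counts (the slide does not specify the classifier; the centroid head is an illustrative stand-in):

```python
from collections import Counter

def bow(text, vocab):
    """Bag-of-words count vector for one text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def fit(train):
    """'Train' step: build the vocabulary and one mean BoW vector per class."""
    vocab = sorted({w for x, _ in train for w in x.lower().split()})
    groups = {}
    for x, c in train:
        groups.setdefault(c, []).append(bow(x, vocab))
    centroids = {c: [sum(col) / len(vecs) for col in zip(*vecs)]
                 for c, vecs in groups.items()}
    return vocab, centroids

def predict(x, vocab, centroids):
    """'Test' step: score a paragraph against each class centroid."""
    v = bow(x, vocab)
    score = lambda c: sum(p * q for p, q in zip(v, centroids[c]))
    return max(centroids, key=score)

vocab, centroids = fit([("football goal team", "sports"),
                        ("prize winner click", "spam")])
print(predict("the team scored a goal", vocab, centroids))  # sports
```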

SLIDE 8

Baseline: Bag of Words

            K=1    K=3    K=9
  BOW       0.48   0.58   0.74

[Chart: baseline BOW accuracy for K=1, 3, 9]

Approaches And Results

SLIDE 9

Problems with the bag of words

  • Overfitting
  • Vocabulary mismatch

Example paragraphs (class: Sports) and their bag-of-words vectors:

Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal.

[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 1 0 0 2 0 1 0 0 0]

While football continued to be played in various forms throughout Britain, its public schools (equivalent to private schools in other countries) are widely credited with four key achievements in the creation of modern football codes.

[0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 2 1 1 0 0 0 3 0 0 1 1 0 0 1 1 0 1 1 1 1 2 0 0 0 0 1 1 2 1 0 1 1 1]

Baseball evolved from older bat-and-ball games already being played in England by the mid-18th century.

[1 0 1 1 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
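The vocabulary-mismatch problem can be seen directly: two paragraphs on the same topic may share no surface tokens, so their bag-of-words vectors have zero overlap:

```python
# Two sports paragraphs (shortened from the slide) that share no tokens,
# so their BoW vectors do not overlap at all.
def tokens(s):
    return set(s.lower().replace(",", "").replace(".", "").split())

a = tokens("Football is a family of team sports")
b = tokens("Baseball evolved from older bat-and-ball games")
print(a & b)  # empty set: zero BoW overlap despite the shared topic
```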

SLIDE 10

Better representations: Pre-Trained FastText or BERT Model

[Diagram: Train: Target Training Data (Few) → Feature Extraction (pre-trained FastText or BERT model) → Classifier → Target Loss & Update. Test: Target Testing Data → Feature Extraction (fixed weights) → Classifier → Target Accuracy Assessment.]
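A sketch of feature extraction with pre-trained word vectors, where a paragraph is represented by the average of its word embeddings. The random vectors below are stand-ins for real FastText or BERT weights:

```python
import random

random.seed(0)
DIM = 8
# Hypothetical pre-trained word vectors; in the thesis these come from
# FastText or BERT. Random vectors stand in for them here.
EMB = {w: [random.gauss(0, 1) for _ in range(DIM)]
       for w in "football goal team baseball bat".split()}

def embed(text):
    """Feature extraction: average the pre-trained vectors of known words."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

features = embed("Football team scores a goal")
print(len(features))  # 8
```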

SLIDE 11

Baseline: FastText & BERT

            K=1            K=3            K=9
  BOW       0.48           0.58           0.74
  FastText  0.66 (+0.18)   0.78 (+0.20)   0.84 (+0.10)
  Bert      0.73 (+0.25)   0.84 (+0.26)   0.89 (+0.15)

[Chart: baseline accuracy of BOW, FastText, and BERT for K=1, 3, 9]

SLIDE 12

Can we improve any further?

[Image: Transfer Learning]

SLIDE 13

Approach: Transfer Learning (Bag of Words)

[Diagram: Pre-training: Pre-Training Data (Many, from the Base Dataset) → Feature Extraction → Model Pre-Training Loss & Update. The pre-trained weights are then fixed. Train: Target Training Data (Few) → Feature Extraction (fixed weights) → Classifier → Target Loss & Update. Test: Target Testing Data → Feature Extraction (fixed weights) → Classifier → Target Accuracy Assessment.]
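The control flow of this transfer-learning pipeline can be sketched as follows. The pre-training step is a stand-in (a random per-word projection rather than weights actually learned on the base dataset), and only the classifier head is fit on the few target examples:

```python
import random

random.seed(0)
DIM = 4

def pretrain_extractor(base_texts):
    """Pre-training stand-in. In the thesis this step learns feature
    weights on the base dataset (Wikipedia section-heading classification);
    here random per-word vectors play the role of the learned weights."""
    vocab = {w for t in base_texts for w in t.lower().split()}
    return {w: [random.gauss(0, 1) for _ in range(DIM)] for w in vocab}

def extract(text, weights):
    """Frozen feature extractor, reused unchanged on the target task."""
    feats = [0.0] * DIM
    for w in text.lower().split():
        for i, v in enumerate(weights.get(w, [0.0] * DIM)):
            feats[i] += v
    return feats

def fit_head(few_shot_train, weights):
    """Only the classifier head is trained on the few target examples:
    here, one mean feature vector per class."""
    groups = {}
    for x, c in few_shot_train:
        groups.setdefault(c, []).append(extract(x, weights))
    return {c: [sum(col) / len(vs) for col in zip(*vs)]
            for c, vs in groups.items()}

def predict(x, weights, head):
    f = extract(x, weights)
    dist = lambda c: sum((p - q) ** 2 for p, q in zip(f, head[c]))
    return min(head, key=dist)

weights = pretrain_extractor(["football goal", "prize winner"])   # base data
head = fit_head([("football goal", "sports"), ("prize winner", "spam")],
                weights)                                          # target data
print(predict("football goal", weights, head))  # sports
```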

SLIDE 14

Model: Standard

SLIDE 15

Results: Transfer Learning - Standard Model

            K=1            K=3            K=9
  BOW       0.49 (+0.01)   0.52 (-0.06)   0.71 (-0.03)

[Chart: BOW accuracy with standard transfer learning for K=1, 3, 9]

SLIDE 16

Transfer Learning with a Pre-Trained FastText & BERT Model

[Diagram: Pre-training: Pre-Training Data (Many, from the Base Dataset) → Feature Extraction (pre-trained FastText or BERT model) → Model Pre-Training Loss & Update. The pre-trained weights are then fixed. Train: Target Training Data (Few) → Feature Extraction (fixed weights) → Classifier → Target Loss & Update. Test: Target Testing Data → Feature Extraction (fixed weights) → Classifier → Target Accuracy Assessment.]

SLIDE 17

Results: Transfer Learning - Standard Model

            K=1            K=3            K=9
  BOW       0.49 (+0.01)   0.52 (-0.06)   0.71 (-0.03)
  FastText  0.62 (-0.04)   0.75 (-0.03)   0.81 (-0.03)
  Bert      0.73 (0.00)    0.84 (0.00)    0.88 (-0.01)

[Charts: baseline BOW accuracy (0.48, 0.58, 0.74) and FastText/BERT accuracy with standard transfer learning, for K=1, 3, 9]

SLIDE 18

Model: Modified

SLIDE 19

Results: Transfer Learning - Modified Model

            K=1            K=3            K=9
  BOW       0.68 (+0.20)   0.75 (+0.17)   0.84 (+0.10)
  FastText  0.69 (+0.03)   0.78 (0.00)    0.83 (-0.01)
  Bert      0.73 (0.00)    0.81 (-0.03)   0.87 (-0.02)

[Chart: accuracy of BOW, FastText, and BERT with modified transfer learning for K=1, 3, 9]

SLIDE 20

Complete Results

                                          K=1    K=3    K=9
  BOW - Baseline                          0.48   0.58   0.74
  FastText - Baseline                     0.66   0.78   0.84
  Bert - Baseline                         0.73   0.84   0.89
  BOW - Standard Transfer Learning        0.49   0.52   0.71
  FastText - Standard Transfer Learning   0.62   0.75   0.81
  Bert - Standard Transfer Learning       0.73   0.84   0.88
  BOW - Modified Transfer Learning        0.68   0.75   0.84
  FastText - Modified Transfer Learning   0.69   0.78   0.83
  Bert - Modified Transfer Learning       0.73   0.81   0.87

SLIDE 21

Results Summary

  • An average improvement of 10-20% with modified transfer learning using BOW representations, compared to the baseline scores of the BOW model.
  • A general increase in accuracy with the size of the training task.
  • No real improvement when fine-tuning the representations from the advanced pre-trained models FastText and BERT.
  • BOW representations can be improved by pre-training on a Wikipedia section-heading classification task.

SLIDE 22

Few-shot learning approaches:

  • Metric Learning
  • Meta Learning

Related Work

SLIDE 23

Metric Learning

[Images: Siamese network and Relation Network, from "Advances in few-shot learning"]

SLIDE 24

Meta Learning

  • MAML
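MAML learns an initialization that adapts to a new task in a few gradient steps. A first-order sketch on a toy one-parameter regression problem (an illustrative setup, not the thesis experiments):

```python
def loss_grad(theta, a):
    """Gradient of the task loss L(theta) = (theta - a)^2 for task 'a'."""
    return 2.0 * (theta - a)

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.01):
    """One outer update: adapt to each task with a single inner gradient
    step, then move theta along the post-adaptation gradient
    (first-order MAML approximation)."""
    outer_grad = 0.0
    for a in tasks:
        adapted = theta - inner_lr * loss_grad(theta, a)  # inner adaptation
        outer_grad += loss_grad(adapted, a)               # post-adapt gradient
    return theta - outer_lr * outer_grad / len(tasks)

theta = 0.0
for _ in range(500):
    theta = maml_step(theta, tasks=[-1.0, 1.0, 2.0])
# theta converges toward the task mean (2/3): a shared initialization
# from which each task is reachable in one inner step.
```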

SLIDE 25

Future Work

  • Using other few-shot learning approaches such as meta learning and metric learning.
  • Increasing the dataset by not limiting it to level-2 section headings; this would require more computational resources.
  • Using the bert-large model instead of bert-base.
  • Finding the peak accuracy score for the BERT model.
  • Testing the trained classifier on topic classification data other than Wikipedia.

SLIDE 26

Thank you

SLIDE 27

Additional Slides

SLIDE 28

Additional Slides

SLIDE 29

Related Work: Metric Learning

  • Siamese Networks

[Diagram: Input 1 and Input 2 each pass through the same Neural Network; a Distance Metric compares the two outputs]

  • Matching Networks

[Diagram: Support Set Instances compared against a Query Set Instance]
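A Siamese network applies one shared encoder to both inputs and compares the encodings with a distance metric. A toy sketch where character-bigram counts stand in for the learned encoder:

```python
from collections import Counter

def encode(text):
    """Shared 'encoder' applied to both inputs (toy: character bigrams;
    a Siamese network would use one shared neural network here)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def distance(x1, x2):
    """Distance metric between the two encodings; small for similar inputs."""
    e1, e2 = encode(x1), encode(x2)
    return sum(abs(e1[k] - e2[k]) for k in set(e1) | set(e2))

print(distance("football", "football"))  # 0 -- identical inputs
```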

SLIDE 30

Related Work: Metric Learning

  • Prototypical Networks & Relation Networks
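Prototypical networks average the support embeddings of each class into a prototype and classify a query by its nearest prototype. A minimal sketch with toy 2-D embeddings (real embeddings would come from a learned encoder):

```python
def prototypes(support):
    """Mean embedding per class from the support set."""
    groups = {}
    for emb, label in support:
        groups.setdefault(label, []).append(emb)
    return {c: [sum(col) / len(vs) for col in zip(*vs)]
            for c, vs in groups.items()}

def classify(query_emb, protos):
    """Assign the query to the nearest prototype (squared Euclidean)."""
    dist = lambda c: sum((p - q) ** 2 for p, q in zip(query_emb, protos[c]))
    return min(protos, key=dist)

support = [([0.0, 0.0], "A"), ([0.2, 0.0], "A"),
           ([1.0, 1.0], "B"), ([1.2, 1.0], "B")]
protos = prototypes(support)
print(classify([0.1, 0.1], protos))  # A
```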

SLIDE 31

Related Work: Meta Learning

[Diagram: model weights W and meta-parameters Θ]

SLIDE 32

Related Work: Transfer Learning

  • Baseline