GPT3 - Atishya Jain The content of this presentation has been sourced - - PowerPoint PPT Presentation

SLIDE 1

GPT3

  • Atishya Jain

The content of this presentation has been sourced from various YouTube videos and blogs, apart from the original paper.

SLIDE 2

Let’s Dissect It

SLIDE 3

Let’s Dissect It

SLIDE 4

Let’s Dissect It

SLIDE 5
SLIDE 6

Merge aa

SLIDE 7

Merge aa
Merge ab

SLIDE 8

Merge aa
Merge ab
Merge ZY
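Slides 6 to 8 appear to step through byte pair encoding (BPE), the subword tokenization used by GPT models (GPT-2 and GPT-3 use a byte-level variant). The sketch below makes several illustrative assumptions that the slides do not state: the toy string `aaabdaaabac`, Z as the symbol for the merged pair aa, Y for ab, and X for ZY. It applies a fixed merge list in order rather than learning one from a corpus.

```python
from collections import Counter

def pair_counts(symbols):
    """Count every adjacent pair of symbols, e.g. ('a', 'a')."""
    return Counter(zip(symbols, symbols[1:]))

def merge_pair(symbols, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair` with `new_symbol`, scanning left to right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

# Assumed toy input; "aa" is its most frequent adjacent pair, so it is merged first.
symbols = list("aaabdaaabac")
print(pair_counts(symbols).most_common(1))  # [(('a', 'a'), 4)]

# Merge order from the slides (aa, ab, ZY); the new symbols Z, Y, X are assumptions.
merges = [(("a", "a"), "Z"), (("a", "b"), "Y"), (("Z", "Y"), "X")]
for pair, new_symbol in merges:
    symbols = merge_pair(symbols, pair, new_symbol)
    print(f"Merge {''.join(pair)} -> {new_symbol}: {''.join(symbols)}")
# Merge aa -> Z: ZabdZabac
# Merge ab -> Y: ZYdZYac
# Merge ZY -> X: XdXac
```

Each merge shortens the sequence. After three merges, 11 characters become 5 symbols.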

SLIDE 9

Let’s Dissect It

SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16

GPT uses only the decoder part

BERT uses only the encoder part
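In practice, the decoder/encoder split largely comes down to the attention mask. A GPT-style decoder lets position i attend only to positions up to i, which is what allows next-token prediction. A BERT-style encoder attends in both directions. The pure-Python sketch below builds both masks and applies a masked softmax. It assumes the raw attention scores are already computed; here they are all zeros, purely for illustration.

```python
import math

def attention_mask(n, causal):
    """Return an n x n mask: 1 = query i may attend to key j, 0 = blocked.
    causal=True  -> GPT-style decoder: position i sees only positions j <= i.
    causal=False -> BERT-style encoder: every position sees every other."""
    return [[1 if (not causal or j <= i) else 0 for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions; blocked positions get exactly 0 weight."""
    top = max(s for s, m in zip(scores, mask_row) if m)  # subtract max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

n = 4
gpt_mask = attention_mask(n, causal=True)
bert_mask = attention_mask(n, causal=False)
for row in gpt_mask:
    print(row)  # lower-triangular: [1, 0, 0, 0], [1, 1, 0, 0], ...

# With equal raw scores, token 1 in the decoder splits attention over tokens 0 and 1 only.
print(masked_softmax([0.0] * n, gpt_mask[1]))   # [0.5, 0.5, 0.0, 0.0]
print(masked_softmax([0.0] * n, bert_mask[1]))  # [0.25, 0.25, 0.25, 0.25]
```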
SLIDE 17

Architecture

SLIDE 18

Architecture

SLIDE 19

Architecture
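For a sense of scale, the largest GPT-3 model in the paper has 96 layers, d_model = 12288, 96 attention heads and a 2048-token context. The rough parameter count below rests on two assumptions. First, each layer holds about 12 * d_model^2 weights: 4 * d_model^2 for the Q, K, V and output projections, plus 8 * d_model^2 for an MLP with a 4x hidden width. Second, the vocabulary is the 50257-token GPT-2 BPE vocabulary. Biases and layer norms are ignored, so this is an estimate rather than the official figure.

```python
# GPT-3 175B shape as reported in the paper; VOCAB is an assumption (GPT-2 BPE vocabulary).
N_LAYERS = 96
D_MODEL = 12288
N_HEADS = 96
N_CTX = 2048
VOCAB = 50257

def approx_params(n_layers, d_model, vocab, n_ctx):
    # Per layer: Q, K, V, O projections (4 * d^2) + MLP d -> 4d -> d (8 * d^2) = 12 * d^2
    blocks = 12 * n_layers * d_model ** 2
    # Token embeddings plus learned position embeddings
    embeddings = vocab * d_model + n_ctx * d_model
    return blocks + embeddings  # biases and layer-norm parameters are ignored

total = approx_params(N_LAYERS, D_MODEL, VOCAB, N_CTX)
print(f"d_head = {D_MODEL // N_HEADS}")   # d_head = 128
print(f"~{total / 1e9:.1f}B parameters")  # ~174.6B parameters
```

The transformer blocks account for almost all of the weights. The embeddings add well under 1B parameters.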

SLIDE 20

Let’s Dissect It

SLIDE 21

355 years on the fastest V100

$4,600,000 on the lowest-priced GPU cloud provider
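Both figures on this slide follow from simple arithmetic. The paper reports about 3.14e23 FLOPs of training compute for the 175B model. The other two inputs below are assumptions chosen to illustrate the estimate, not values stated on the slide: a sustained V100 throughput of 28 TFLOPS and a price of $1.50 per GPU-hour.

```python
# Back-of-the-envelope check of the slide's time and cost figures.
TRAIN_FLOPS = 3.14e23      # total training compute for GPT-3 175B, as reported in the paper
V100_FLOPS = 28e12         # assumption: ~28 TFLOPS sustained on a single V100
USD_PER_GPU_HOUR = 1.50    # assumption: hypothetical cheapest cloud price per V100-hour

SECONDS_PER_YEAR = 365 * 24 * 3600
gpu_seconds = TRAIN_FLOPS / V100_FLOPS
gpu_years = gpu_seconds / SECONDS_PER_YEAR
cost_usd = gpu_seconds / 3600 * USD_PER_GPU_HOUR

print(f"{int(gpu_years)} GPU-years")  # 355 GPU-years
print(f"about ${cost_usd / 1e6:.1f} million")
```

Under these assumptions the cost comes out at roughly $4.7 million. That is the same ballpark as the slide's $4,600,000, which presumably used a slightly different price or rounding.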

SLIDE 22

Let’s Understand Few-Shot Learning

SLIDE 23

There is a dairy cow.

Zero-Shot Learning

SLIDE 24

There is a horse.

Zero-Shot Learning

SLIDE 25

A zebra is a horse with a dairy cow’s color.

Zero-Shot Learning

SLIDE 26

You are better than a CNN!!

Zero-Shot Learning

Dad, it’s a zebra!

SLIDE 27

There is a monkey.

One-Shot Learning

SLIDE 28

You are better than a CNN!!

One-Shot Learning

Dad, it’s a monkey!

SLIDE 29

There is a dog.

Few-Shot Learning

SLIDE 30

There is another dog.

Few-Shot Learning

SLIDE 31

You are better than a CNN!!

Few-Shot Learning

Dad, it’s a dog!

SLIDE 32

Few-Shot Learning

SLIDE 33

Few-Shot Learning

SLIDE 34

Few-Shot Learning
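For GPT-3, zero-, one- and few-shot refer to how many solved examples are placed in the prompt; the model's weights are never updated. The minimal sketch below builds such prompts, modelled on the English-to-French example in the paper. The helper name `build_prompt` and the exact layout are illustrative choices.

```python
def build_prompt(task_description, examples, query, k):
    """Assemble an in-context prompt with k solved examples.
    k = 0 -> zero-shot, k = 1 -> one-shot, k > 1 -> few-shot.
    No gradient update is involved: the 'learning' happens only through the prompt text."""
    lines = [task_description]
    for source, target in examples[:k]:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue with the answer
    return "\n".join(lines)

demos = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush girafe", "girafe peluche"),
]
for k in (0, 1, 3):
    print(f"--- k = {k} ---")
    print(build_prompt("Translate English to French:", demos, "cheese", k))
```

The three settings differ only in how much context precedes the query. This is why the kid in the analogy above needs a description (zero-shot), one picture (one-shot) or a few pictures (few-shot).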

SLIDE 35

Compute Power

SLIDE 36

Transformer Variants

SLIDE 37

Training Dataset

SLIDE 38

Training Dataset

  • Filtering
SLIDE 39

Training Dataset

  • Filtering
  • Fuzzy Deduplication
SLIDE 40

Training Dataset

  • Filtering
  • Fuzzy Deduplication
  • Adding high-quality datasets
SLIDE 41

Training Dataset

  • Filtering
  • Fuzzy Deduplication
  • Adding high-quality datasets
  • Overlapping Test Set
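Fuzzy deduplication removes documents that are near-copies of each other, not just exact matches. The paper reports using a MinHash-based locality-sensitive hashing setup for this. The sketch below is a simplified MinHash version with several assumptions:

- 3-word shingles as features
- 10 hash functions derived by salting MD5 with a seed
- a 0.8 similarity threshold
- a brute-force comparison against every kept document instead of LSH buckets

```python
import hashlib

def shingles(text, n=3):
    """Word n-grams ('shingles') that serve as the document's feature set."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash_signature(features, num_hashes=10):
    """For each of num_hashes seeded hash functions, keep the minimum hash over all features."""
    return [
        min(int(hashlib.md5(f"{seed}:{feat}".encode()).hexdigest(), 16) for feat in features)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature slots estimates the Jaccard similarity of the feature sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def fuzzy_dedup(docs, threshold=0.8, num_hashes=10):
    """Keep a document only if it is not estimated to be a near-duplicate of one already kept.
    This compares against every kept signature (O(n^2)); large pipelines bucket signatures with LSH."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(shingles(doc), num_hashes)
        if all(estimated_jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept

docs = [
    "the cat sat on the mat today",
    "the cat sat on the mat today",  # duplicate: identical signature, so it is dropped
    "gpt three was trained on filtered common crawl",
]
print(fuzzy_dedup(docs))
```

With only 10 hashes the similarity estimate is coarse. Near-duplicates close to the threshold may land on either side of it.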
SLIDE 42

Evaluations

SLIDE 43

Language Modelling

  • SOTA on PTB (Penn Treebank)
  • Omits the 4 Wikipedia-related tasks and the One Billion Word benchmark, because their content largely overlaps with the training data
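Language-modelling benchmarks such as PTB are scored by perplexity. Perplexity is the exponential of the average negative log-likelihood the model assigns to each true next token, so lower is better. The minimal sketch below assumes the per-token probabilities have already been read off the model; the numbers are toy values.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood of the tokens that actually occurred.
    token_probs[i] is the probability the model assigned to the true i-th token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# A model that is always uniform over 4 candidates has perplexity 4.
print(round(perplexity([0.25] * 8), 6))  # 4.0
# A model that is confident in the correct tokens scores lower (better).
print(perplexity([0.9, 0.8, 0.95]) < perplexity([0.25] * 3))  # True
```

Perplexity can be read as the effective number of equally likely choices the model is hedging between at each step.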

SLIDE 44

LAMBADA

SLIDE 45

TriviaQA

SLIDE 46

Translation

SLIDE 47

Synthetic and Qualitative Tasks

  • Arithmetic
  • Word Scrambling and Manipulation
  • SAT Analogies
  • News Article Generation
  • Learning and Using Novel Words
  • Correcting English Grammar
SLIDE 48

Arithmetic
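The arithmetic tasks pose simple problems in natural language and count a completion as correct only if it matches the answer exactly. The paper's suite covers addition and subtraction with 2 to 5 digits, 2-digit multiplication and one-digit composite expressions. The sketch below generates and scores such items. The `Q: ... A:` template and the function names are illustrative assumptions.

```python
import random

def make_problem(rng, digits, op):
    """Hypothetical generator for an 'nD+' or 'nD-' item with two random n-digit operands."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    if op == "plus":
        return f"Q: What is {a} plus {b}? A:", str(a + b)
    return f"Q: What is {a} minus {b}? A:", str(a - b)

def exact_match_accuracy(predictions, answers):
    """A completion counts only if it equals the answer after stripping whitespace."""
    hits = sum(p.strip() == a for p, a in zip(predictions, answers))
    return hits / len(answers)

rng = random.Random(0)
items = [make_problem(rng, 2, "plus") for _ in range(3)]
for prompt, answer in items:
    print(prompt, answer)

# Scoring hypothetical model completions against the reference answers
print(exact_match_accuracy([" 124", "57"], ["124", "58"]))  # 0.5
```

Seeding a `random.Random` instance keeps the generated test set reproducible across runs.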

SLIDE 49

Word Scramble and Manipulation
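The word-manipulation tasks give the model a corrupted word and ask it to recover the original. The paper uses five corruptions:

- cycled letters (CL)
- anagrams keeping the first and last letter (A1)
- anagrams keeping the first and last two letters (A2)
- random insertion of punctuation or spaces between letters (RI)
- reversed words (RW)

The sketch below generates those inputs. The filler characters and the seeded `random.Random` are assumptions.

```python
import random

def cycle_letters(word, rng):
    """CL: rotate the letters by a random non-zero offset ('lyinevitab' is one cycle of 'inevitably')."""
    if len(word) < 2:
        return word
    k = rng.randrange(1, len(word))
    return word[k:] + word[:k]

def anagram_inner(word, rng, keep):
    """A1 (keep=1) and A2 (keep=2): shuffle every letter except `keep` letters at each end."""
    if len(word) <= 2 * keep + 1:
        return word
    middle = list(word[keep:-keep])
    rng.shuffle(middle)
    return word[:keep] + "".join(middle) + word[-keep:]

def random_insertion(word, rng, fillers=".,!/ "):
    """RI: insert a random punctuation or space character between each pair of letters."""
    return word[0] + "".join(rng.choice(fillers) + ch for ch in word[1:])

def reverse_word(word):
    """RW: spell the word backwards."""
    return word[::-1]

rng = random.Random(0)
word = "inevitably"
print(cycle_letters(word, rng))
print(anagram_inner(word, rng, keep=1))
print(anagram_inner(word, rng, keep=2))
print(random_insertion(word, rng))
print(reverse_word("objects"))  # stcejbo
```

Every corruption keeps the original letters, so the task tests whether the model can undo the manipulation rather than guess unseen characters.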

SLIDE 50

News Generation

SLIDE 51

Limitations

  • Low performance on some NLP tasks
  • Starts to lose coherence over sufficiently long passages
  • Special difficulty with “common sense physics”, e.g. “If I put cheese in the fridge, will it melt?”
  • Architectural drawback: it doesn’t use bidirectional information or denoising objectives

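SLIDE 52

Limitations

  • Poor sample efficiency during pre-training
  • Ambiguity in few-shot learning: does the model learn the task from scratch at inference time, or recognize a task it saw during training?
  • Difficult inference, since the model is huge
  • Lack of structured knowledge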
SLIDE 53

Fairness and Bias

SLIDE 54

Fairness and Bias

Race

SLIDE 55

Fairness and Bias

Race, Religion

SLIDE 56

Demos

SLIDE 57

GPT3: Demos

SLIDE 58

GPT3: Interaction with Your Own AR Bot

https://twitter.com/i/status/1294380308209508359

SLIDE 59

GPT3: Animate Your Maths From English

https://twitter.com/i/status/1294652394739912704

SLIDE 60

GPT3: Building a Website

https://youtu.be/LOhIS7kiKvM

SLIDE 61

GPT3: Context-Based Dictionary

https://twitter.com/i/status/1294631853224206339

SLIDE 62

GPT3: Describe Your Design

SLIDE 63

Weaknesses

  • Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul)
  • No saturation yet (Vipul)
SLIDE 64

Weaknesses

  • Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul)
  • No saturation yet (Vipul)
  • In zero-shot or one-shot settings, the wording of the task description used for in-context learning can introduce variance (Shantanu)
  • Limited context window of 2048 tokens (Shantanu)
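The 2048-token window also caps how many demonstrations fit in a few-shot prompt; the paper typically uses 10 to 100. The rough sketch below works out that budget. It assumes a hypothetical whitespace tokenizer as a stand-in for real BPE and ignores the tokens the model needs for its answer.

```python
def whitespace_tokens(text):
    """Hypothetical stand-in for a BPE tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def max_demonstrations(task_description, demos, query, n_ctx=2048, count_tokens=whitespace_tokens):
    """Greedily add demonstrations, in order, until the next one would overflow n_ctx tokens.
    Real BPE usually yields more tokens than words, so this tends to overestimate what fits."""
    used = count_tokens(task_description) + count_tokens(query)
    k = 0
    for demo in demos:
        cost = count_tokens(demo)
        if used + cost > n_ctx:
            break
        used += cost
        k += 1
    return k

# Hypothetical task: 100-word demonstrations, a 10-word description and a 38-word query
demos = ["word " * 100] * 50
print(max_demonstrations("x " * 10, demos, "y " * 38))  # 20
```

Doubling `n_ctx` roughly doubles the number of demonstrations that fit. This is one reason a larger context window appears among the extensions.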
SLIDE 65

Extensions

  • A bidirectional model with similar size and experiments (Vipul, Shantanu)

SLIDE 66

Extensions

  • A bidirectional model with similar size and experiments (Vipul, Shantanu)
  • Explainable few-shot learning, and analysis to see whether GPT-3 is actually learning (Vipul, Shantanu)

SLIDE 67

Extensions

  • A bidirectional model with similar size and experiments (Vipul, Shantanu)
  • Explainable few-shot learning, and analysis to see whether GPT-3 is actually learning (Vipul, Shantanu)
  • A distilled version of GPT-3 (Shantanu)
SLIDE 68

Extensions

  • A bidirectional model with similar size and experiments (Vipul, Shantanu)
  • Explainable few-shot learning, and analysis to see whether GPT-3 is actually learning (Vipul, Shantanu)
  • A distilled version of GPT-3 (Shantanu)
  • Limited context window of 2048 tokens (Shantanu)
SLIDE 69

Extensions

  • A bidirectional model with similar size and experiments (Vipul, Shantanu)
  • Explainable few-shot learning, and analysis to see whether GPT-3 is actually learning (Vipul, Shantanu)
  • A distilled version of GPT-3 (Shantanu)
  • Limited context window of 2048 tokens (Shantanu)
  • Adversarial experiments that carefully tweak training samples and present the adversarial examples to the model at test time for inference (Vipul)

SLIDE 70

Thank you

SLIDE 71

References

  • https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a
  • https://www.youtube.com/watch?v=SY5PvZrJhLE
  • https://jalammar.github.io/how-gpt3-works-visualizations-animations/
  • https://www.youtube.com/watch?v=8psgEDhT1MM&vl=en
  • https://www.youtube.com/watch?v=7qPDwsCLbZc&t=3959s
  • Language Models are Few-Shot Learners (Brown et al., 2020)
  • https://www.youtube.com/watch?v=Mq97CF02sRY