GPT3 - Atishya Jain

The content of this presentation has been sourced from various YouTube videos and blogs apart from the original paper.

Let's Dissect It

Merge aa, Merge ab, Merge ZY
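The merge steps on this slide look like the usual byte pair encoding walk-through, so here is a minimal sketch of applying them in order. The input string "aaabdaaabac" and the third symbol "X" are assumptions for illustration (only "aa", "ab" and "ZY" appear on the slide), and a real tokenizer would learn its merges from pair frequencies rather than take them as a fixed list.

```python
def apply_merge(tokens, pair, new_symbol):
    """Replace each left-to-right, non-overlapping occurrence of `pair` with `new_symbol`."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2  # skip both symbols of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def run_merges(text, merges):
    """Apply the merges in order and record the string after each step."""
    tokens = list(text)
    history = ["".join(tokens)]
    for pair, new_symbol in merges:
        tokens = apply_merge(tokens, pair, new_symbol)
        history.append("".join(tokens))
    return history


# Merge order taken from the slide; "X" is an assumed name for the last new symbol.
MERGES = [(("a", "a"), "Z"), (("a", "b"), "Y"), (("Z", "Y"), "X")]
history = run_merges("aaabdaaabac", MERGES)

for step in history:
    print(step)
```

Each merge shortens the sequence by replacing a pair of symbols with one new symbol, which is how the vocabulary grows from characters to sub-word units.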

Let's Dissect It

BERT uses the encoder part only. GPT uses the decoder part only.
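One practical consequence of the encoder/decoder split is the attention mask. The sketch below is an illustration rather than code from either model: it builds a full (encoder-style) mask and a causal (decoder-style) mask for a short sequence. The function name `attention_mask` is hypothetical.

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len matrix; 1 means position i may attend to position j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


encoder_mask = attention_mask(4, causal=False)  # every token sees every token
decoder_mask = attention_mask(4, causal=True)   # each token sees itself and earlier tokens

for row in decoder_mask:
    print(row)
```

The lower-triangular mask is what lets a decoder-only model be trained to predict the next token without seeing it.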

Architecture

Let's Dissect It

355 years on the fastest V100. $4,600,000 on the lowest-priced GPU cloud provider.
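As a rough sanity check, the two figures above imply an hourly GPU price. The sketch assumes a 365-day year and that the whole cost is GPU rental; both are simplifications, not statements from the slide.

```python
GPU_YEARS = 355             # from the slide
TOTAL_COST_USD = 4_600_000  # from the slide
HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year

gpu_hours = GPU_YEARS * HOURS_PER_YEAR
implied_rate = TOTAL_COST_USD / gpu_hours

print(f"{gpu_hours:,} GPU-hours, about ${implied_rate:.2f} per GPU-hour")
```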

Let's Understand Few-Shot Learning

Zero Shot Learning: There is a dairy cow.

Zero Shot Learning: There is a horse.

Zero Shot Learning: A zebra is a horse with a dairy cow's color.

Zero Shot Learning: "Dad, it's a zebra!" "You are better than a CNN!!"

One Shot Learning: There is a monkey.

One Shot Learning: "Dad, it's a monkey!" "You are better than a CNN!!"

Few Shot Learning: There is a dog.

Few Shot Learning: There is another dog.

Few Shot Learning: "Dad, it's a dog!" "You are better than a CNN!!"

Few Shot Learning
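For a language model, the same zero-, one- and few-shot settings come down to how many worked examples are placed in the prompt, with no weight updates. The sketch below shows one way to assemble such prompts; `build_prompt`, the `=>` separator and the translation pairs are illustrative assumptions, not an API of GPT-3.

```python
def build_prompt(task_description, demonstrations, query):
    """Build an in-context prompt: task description, K demonstrations, then the query."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)


TASK = "Translate English to French:"
DEMOS = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]

zero_shot = build_prompt(TASK, [], "cheese")         # K = 0
one_shot = build_prompt(TASK, DEMOS[:1], "cheese")   # K = 1
few_shot = build_prompt(TASK, DEMOS, "cheese")       # K = 3

print(few_shot)
```

The only difference between the three settings is the length of the demonstration list.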

Compute Power

Transformer Variants

Training Dataset
- Filtering
- Fuzzy Deduplication
- Adding high quality dataset
- Overlapping Test Set
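To give a feel for the fuzzy deduplication step, here is a minimal sketch that drops documents whose word-trigram sets are too similar to one already kept. It uses exact Jaccard similarity as a simple stand-in; at web scale an approximate method such as MinHash would be used instead. The threshold of 0.5, the trigram size and the sample documents are all assumptions.

```python
def shingles(text, n=3):
    """Return the set of word n-grams in `text` (lowercased)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fuzzy_deduplicate(documents, threshold=0.5, n=3):
    """Keep a document only if it is below `threshold` similarity to every kept one."""
    kept, kept_shingles = [], []
    for doc in documents:
        doc_shingles = shingles(doc, n)
        if all(jaccard(doc_shingles, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(doc_shingles)
    return kept


DOCS = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog today",
    "language models are few shot learners",
]
unique_docs = fuzzy_deduplicate(DOCS)
print(unique_docs)
```

The pairwise comparison here is quadratic in the number of documents, which is why it is only suitable as an illustration.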

Evaluations

Language Modelling
- SOTA on PTB
- Omits the 4 Wikipedia-related tasks and the One Billion Word benchmark, since these overlap with the training data

LAMBADA

TriviaQA

Translation

Synthetic and Qualitative Tasks
- Arithmetic
- Word Scrambling and Manipulation
- SAT Analogies
- News Article Generation
- Learning and Using Novel Words
- Correcting English Grammar

Arithmetic

Word Scramble and Manipulation
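Word scrambling tasks of this kind are easy to generate synthetically. The sketch below shows two possible manipulations, cycling letters and reversing a word, and formats them as "scrambled = original" pairs. The function names, the pair format and the example words are assumptions for illustration.

```python
def cycle_letters(word, shift):
    """Rotate the letters of `word` left by `shift` positions."""
    if not word:
        return word
    shift %= len(word)
    return word[shift:] + word[:shift]


def reverse_word(word):
    """Return `word` spelled backwards."""
    return word[::-1]


def make_example(scrambled, original):
    """Format one task instance: the model sees the left side and must produce the right."""
    return f"{scrambled} = {original}"


cycled = make_example(cycle_letters("inevitably", 8), "inevitably")
reversed_example = make_example(reverse_word("objects"), "objects")

print(cycled)
print(reversed_example)
```

Several such pairs can be placed in the prompt as demonstrations, followed by a scrambled word for the model to unscramble.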

News Generation

Limitations
- Low performance in some NLP tasks
- Starts to lose coherence over sufficiently large passages
- Special difficulty with "common sense physics" like "If I put cheese in the fridge, will it melt?"
- Architectural drawback: it doesn't have bidirectional information or denoising objectives

Limitations
- Poor sample efficiency
- Ambiguity about whether few-shot learning learns a task from scratch
- Difficult inferencing because of the huge model
- Lack of structured knowledge

Fairness and Bias
- Race
- Religion

GPT3: Demos

GPT3: Interaction with your own AR bot https://twitter.com/i/status/1294380308209508359

GPT3: Animate Your Maths From English https://twitter.com/i/status/1294652394739912704

GPT3: Building a Website https://youtu.be/LOhIS7kiKvM

GPT3: Context-Based Dictionary https://twitter.com/i/status/1294631853224206339

GPT3: Describe Your Design

Weaknesses
- Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul)
- No saturation yet (Vipul)
- In zero-shot or one-shot, the choice of words for the task description in in-context learning can introduce variance (Shantanu)
- Limited context window of 2048 (Shantanu)

Extensions
- A bidirectional model with similar size and experiments (Vipul, Shantanu)
- Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu)
- A distilled version of GPT-3 (Shantanu)
- Limited context window of 2048 (Shantanu)
- Adversarial experiments to tweak the training samples articulately and present the adversarial examples to it at test time for inference (Vipul)

Thank you

References
- https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a
- https://www.youtube.com/watch?v=SY5PvZrJhLE
- https://jalammar.github.io/how-gpt3-works-visualizations-animations/
- https://www.youtube.com/watch?v=8psgEDhT1MM&vl=en
- https://www.youtube.com/watch?v=7qPDwsCLbZc&t=3959s
- Language Models are Few-Shot Learners (Brown et al.)
- https://www.youtube.com/watch?v=Mq97CF02sRY
