GPT-3



  1. GPT-3 - Atishya Jain. The content of this presentation has been sourced from various YouTube videos and blogs, apart from the original paper.

  2. Let’s Dissect it

  3. Let’s Dissect it

  4. Let’s Dissect it

  5. Merge aa

  6. Merge aa, Merge ab

  7. Merge aa, Merge ab, Merge ZY
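
Slides 5-7 appear to step through byte-pair-encoding (BPE) merges, the tokenization scheme used by GPT models: the most frequent adjacent symbol pair ("aa", then "ab", and so on) is repeatedly merged into a new token. Below is a minimal sketch of that merge loop; the toy corpus, merge count, and function names are illustrative, not taken from the slides.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Rewrite every word, replacing each occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words split into characters, mapped to their counts.
words = {tuple("aaaab"): 4, tuple("abzy"): 2}
for _ in range(3):
    pair = most_frequent_pair(words)
    if pair is None:
        break
    words = merge_pair(words, pair)
    print("merged", pair, "->", words)
```

Each merge grows the vocabulary by one symbol; GPT-2/GPT-3 run this procedure on bytes until the vocabulary reaches roughly 50,000 tokens.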

  8. Let’s Dissect it

  9. BERT uses the Encoder part only; GPT uses the Decoder part only
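
The encoder-only vs. decoder-only distinction on slide 9 comes down to the attention mask: BERT's encoder lets every token attend to the whole sequence, while GPT's decoder applies a causal mask so each position sees only earlier positions. A small illustrative sketch (NumPy, names mine):

```python
import numpy as np

n = 5  # toy sequence length

# Encoder-style (BERT): every position may attend to every other position.
encoder_mask = np.ones((n, n), dtype=bool)

# Decoder-style (GPT): causal mask, position i attends only to positions j <= i.
decoder_mask = np.tril(np.ones((n, n), dtype=bool))

print(decoder_mask.astype(int))  # lower-triangular matrix of ones
```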

  10. Architecture

  11. Architecture

  12. Architecture
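
The architecture slides (10-12) are images in the original deck. As a hedged sanity check on the model size reported in the paper (96 layers, d_model = 12288, 175B parameters), the usual 12·L·d² rule of thumb for Transformer blocks (4d² for attention, 8d² for the MLP, embeddings ignored) lands close to the quoted figure:

```python
# Rough parameter-count check for GPT-3 175B using the paper's config.
n_layers, d_model = 96, 12288
approx_params = 12 * n_layers * d_model ** 2   # attention (4*d^2) + MLP (8*d^2) per layer
print(f"{approx_params / 1e9:.0f}B parameters")  # ~174B, close to the quoted 175B
```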

  13. Let’s Dissect it

  14. 355 years on the fastest V100; $4,600,000 on the lowest-cost GPU cloud provider
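
The figures on slide 14 come from a back-of-the-envelope estimate: the paper reports about 3.14e23 FLOPs of training compute, and at an assumed ~28 TFLOPS sustained on a single V100 that works out to roughly 355 GPU-years. The throughput figure and the cloud pricing are assumptions from public estimates, not from the slides.

```python
total_flops = 3.14e23        # ~3,640 petaflop/s-days of training compute reported for GPT-3
v100_flops_per_sec = 28e12   # assumed sustained mixed-precision throughput of one V100
seconds = total_flops / v100_flops_per_sec
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.0f} GPU-years on a single V100")  # ~355
```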

  15. Let’s understand Few-Shot Learning

  16. Zero Shot Learning: There is a Dairy Cow

  17. Zero Shot Learning: There is a Horse

  18. Zero Shot Learning: Zebra is a horse with Dairy Cow’s color

  19. Zero Shot Learning: You are Dad, it’s better than a Zebra CNN!!

  20. One Shot Learning: There is a Monkey

  21. One Shot Learning: You are Dad, it’s better than a Monkey CNN!!

  22. Few Shot Learning: There is a Dog

  23. Few Shot Learning: There is another Dog

  24. Few Shot Learning: You are Dad, it’s better than a Dog CNN!!

  25. Few Shot Learning

  26. Few Shot Learning

  27. Few Shot Learning
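
Slides 15-27 contrast zero-, one-, and few-shot learning. In GPT-3 terms, "few-shot" just means packing K worked examples into the prompt and letting the frozen model complete the next one; no gradient updates happen. A minimal sketch of such a prompt follows; the translation demonstrations echo the paper's running example, but the exact wording here is illustrative.

```python
# Few-shot prompting: condition the model on K demonstrations, then a query.
demonstrations = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe en peluche"),
]
query = "cheese"

prompt = "Translate English to French:\n"
for english, french in demonstrations:
    prompt += f"{english} => {french}\n"
prompt += f"{query} =>"

print(prompt)  # this string is fed to the model as-is; the completion is the answer
```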

  28. Compute Power

  29. Transformer Variants

  30. Training Dataset

  31. Training Dataset - Filtering

  32. Training Dataset - Filtering - Fuzzy Deduplication

  33. Training Dataset - Filtering - Fuzzy Deduplication - Adding high quality dataset

  34. Training Dataset - Filtering - Fuzzy Deduplication - Adding high quality dataset - Overlapping Test Set
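
Slides 30-34 cover the training data pipeline: filtering Common Crawl with a quality classifier, fuzzy deduplication, mixing in high-quality corpora, and checking for overlap with test sets. The paper performs deduplication with MinHash/LSH at Common Crawl scale; the sketch below is a much-simplified character-shingle Jaccard check meant only to show the idea (the document strings and the 0.8 threshold are illustrative).

```python
def shingles(text, k=5):
    """Character k-shingles of a document (a toy stand-in for MinHash signatures)."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b)

doc_a = "GPT-3 is trained on a filtered version of Common Crawl."
doc_b = "GPT-3 is trained on a filtered version of Common Crawl!"
doc_c = "BERT uses the encoder part of the Transformer only."

# Treat near-identical documents (similarity above a threshold) as fuzzy duplicates.
print(jaccard(shingles(doc_a), shingles(doc_b)) > 0.8)  # True  -> drop one copy
print(jaccard(shingles(doc_a), shingles(doc_c)) > 0.8)  # False -> keep both
```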

  35. Evaluations

  36. Language Modelling - SOTA on PTB - Omits the 4 Wikipedia-related tasks and the One Billion Word benchmark

  37. LAMBADA

  38. TriviaQA

  39. Translation

  40. Synthetic and Qualitative Tasks - Arithmetic - Word Scrambling and Manipulation - SAT Analogies - News Article Generation - Learning and Using Novel Words - Correcting English Grammar

  41. Arithmetic

  42. Word Scramble and Manipulation
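
The word scrambling and manipulation tasks give the model a corrupted word and ask it to recover the original; the paper uses variants such as cycling letters, anagrams that keep the edge characters fixed, random insertions, and reversal. A sketch of two such corruptions, with parameters chosen purely for illustration:

```python
import random

def cycle_letters(word, shift=2):
    """Rotate the letters of a word; the model must recover the original spelling."""
    return word[shift:] + word[:shift]

def anagram_keep_edges(word, keep=1):
    """Shuffle the interior letters while keeping the first/last `keep` characters fixed."""
    middle = list(word[keep:-keep])
    random.shuffle(middle)
    return word[:keep] + "".join(middle) + word[-keep:]

print(cycle_letters("inevitably"))        # "evitablyin"        -> target "inevitably"
print(anagram_keep_edges("inevitably"))   # e.g. "ivnetialby"   -> target "inevitably"
```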

  43. News Generation

  44. Limitations - Low performance on some NLP tasks - Starts to lose coherence over sufficiently large passages - Special difficulty with “common sense physics” like “If I put cheese in the fridge, will it melt?” - Architectural drawback: no bidirectional information and no denoising objectives

  45. Limitations - Poor sample efficiency - Ambiguity in few-shot learning: does it learn the task from scratch? - Difficult inference due to the huge model - Lack of structured knowledge

  46. Fairness and Bias

  47. Fairness and Bias Race

  48. Fairness and Bias Race Religion

  49. Demos

  50. GPT-3: Demos

  51. GPT-3: Interaction with your own AR bot https://twitter.com/i/status/1294380308209508359

  52. GPT-3: Animate Your Maths From English https://twitter.com/i/status/1294652394739912704

  53. GPT-3: Building a Website https://youtu.be/LOhIS7kiKvM

  54. GPT-3: Context-Based Dictionary https://twitter.com/i/status/1294631853224206339

  55. GPT-3: Describe Your Design

  56. Weaknesses - Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul) - No saturation yet (Vipul)

  57. Weaknesses - Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul) - No saturation yet (Vipul) - In zero-shot or one-shot, the choice of words for the in-context task description can introduce variance (Shantanu) - Limited context window of 2048 tokens (Shantanu)

  58. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu)

  59. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu)

  60. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu) - A distilled version of GPT-3 (Shantanu)

  61. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu) - A distilled version of GPT-3 (Shantanu) - Limited context window of 2048 tokens (Shantanu)

  62. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu) - A distilled version of GPT-3 (Shantanu) - Limited context window of 2048 tokens (Shantanu) - Adversarial experiments that carefully tweak training samples and present the adversarial examples at test time for inference (Vipul)

  63. Thank you

  64. References - https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a - https://www.youtube.com/watch?v=SY5PvZrJhLE - https://jalammar.github.io/how-gpt3-works-visualizations-animations/ - https://www.youtube.com/watch?v=8psgEDhT1MM&vl=en - https://www.youtube.com/watch?v=7qPDwsCLbZc&t=3959s - Language Models are Few-Shot Learners (Brown et al.) - https://www.youtube.com/watch?v=Mq97CF02sRY
