CSE 5194.01: OpenAI and ONNX
John Herwig
What is OpenAI? According to their website:
What does a Google Search of OpenAI return?
CSE 5194.01
OpenAI
OpenAI: A Quick Glance
Co-founders include Ilya Sutskever and others
$1 billion investment from Microsoft in 2019
OpenAI Projects
What is GPT?
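Pre-train a language model on a large corpus of unlabeled text data and then fine-tune it on specific tasks
Predicts each word from the words preceding the given word, outputting one word at a time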
Gif from http://jalammar.github.io/illustrated-gpt2/
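"Outputting one word at a time" corresponds to factorizing the probability of a text into next-token probabilities. A worked statement of that factorization, followed by the pre-training objective as written in the original GPT paper, where $k$ is the context window size and $\Theta$ the model parameters (GPT conditions on at most the $k$ previous tokens rather than the full history):

```latex
p(u_1, \dots, u_n) = \prod_{i=1}^{n} p(u_i \mid u_1, \dots, u_{i-1})

L_1(\mathcal{U}) = \sum_{i} \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)
```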
What is a Decoder Block?
Image from http://jalammar.github.io/illustrated-gpt2/
Decoder Block: Masked Self-Attention
Image from http://jalammar.github.io/illustrated-gpt2/
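A minimal sketch of masked (causal) self-attention for a single head, written in plain Python so the masking step is explicit. It assumes the query, key and value vectors have already been computed; the learned projection matrices, multiple heads and batching of a real GPT block are omitted.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(q, k, v):
    """Single-head causal (masked) scaled dot-product attention.

    q, k, v: lists of T vectors (lists of floats), one per token position.
    Position t only attends to positions 0..t, never to later tokens.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores for the current and earlier positions only; skipping s > t is
        # equivalent to setting those scores to -inf before the softmax.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out
```

Because of the mask, changing a later token never changes the outputs at earlier positions, which is what lets the model be trained to predict every next token in parallel.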
Stack only Transformer decoder blocks and remove the encoder-decoder attention layer
Image from http://jalammar.github.io/illustrated-gpt2/
Simplest way to allow GPT to operate: let it “ramble”
Image from http://jalammar.github.io/illustrated-gpt2/
Add 1st output to our input and predict the 2nd token:
Image from http://jalammar.github.io/illustrated-gpt2/
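The "ramble" loop can be sketched as greedy decoding: predict a token, append it to the input, and predict again. The score table here is a hypothetical stand-in for a trained model (a real GPT conditions on the whole prefix, not only the last token), and greedy selection is just the simplest choice; sampling strategies such as top-k are common alternatives.

```python
# Hypothetical toy "model": next-token scores keyed by the last token only.
TOY_SCORES = {
    "<s>": {"a": 0.1, "robot": 0.9},
    "robot": {"must": 0.8, "a": 0.2},
    "must": {"obey": 0.7, "robot": 0.3},
    "obey": {"orders": 0.6, "</s>": 0.4},
    "orders": {"</s>": 1.0},
}


def next_token_scores(tokens):
    # Unknown contexts fall back to ending the sequence.
    return TOY_SCORES.get(tokens[-1], {"</s>": 1.0})


def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Greedy decoding: take the highest-scoring next token.
        nxt = max(scores, key=scores.get)
        if nxt == "</s>":
            break
        # Add the output to the input and predict the following token.
        tokens.append(nxt)
    return tokens
```

For example, `generate(["<s>"])` returns `["<s>", "robot", "must", "obey", "orders"]`; capping `max_new_tokens` stops the ramble early.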
Slight Differences: GPT-2 vs. GPT
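Layer normalization was moved to the input of each sub-block (similar to a pre-activation residual network)
An additional layer normalization was added after the final self-attention block
A modified initialization that accounts for the accumulation on the residual path with model depth is used.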
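A minimal sketch contrasting where layer normalization sits in the original GPT block (after each residual addition) and in GPT-2 (at the input of each sub-block, plus one extra norm after the last block). The `attn` and `mlp` arguments are hypothetical placeholders for any vector-to-vector function; learned gain/bias and real attention/MLP layers are omitted. The GPT-2 paper describes the modified initialization as scaling residual-layer weights by $1/\sqrt{N}$, with $N$ the number of residual layers.

```python
import math


def layer_norm(x, eps=1e-5):
    # Normalize to zero mean and unit variance (learned gain and bias omitted).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def post_ln_block(x, attn, mlp):
    # Original GPT: layer norm applied AFTER each residual addition.
    x = layer_norm(add(x, attn(x)))
    return layer_norm(add(x, mlp(x)))


def pre_ln_block(x, attn, mlp):
    # GPT-2: layer norm moved to the INPUT of each sub-block, so the residual
    # path itself is a plain sum that is never normalized inside the block.
    x = add(x, attn(layer_norm(x)))
    return add(x, mlp(layer_norm(x)))


def gpt2_stack(x, blocks):
    for attn, mlp in blocks:
        x = pre_ln_block(x, attn, mlp)
    # Extra layer norm after the last block.
    return layer_norm(x)


def residual_init_scale(num_residual_layers):
    # GPT-2 scales residual-layer weights at initialization by 1/sqrt(N).
    return 1.0 / math.sqrt(num_residual_layers)
```

With sub-blocks that output zeros, `pre_ln_block` returns its input unchanged, showing that the residual path is an identity; `post_ln_block` would still renormalize it.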
Original GPT:
Image from Improving Language Understanding by Generative Pre-Training
4 different sizes of GPT-2:
Image from Language Models are Unsupervised Multitask Learners
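The four GPT-2 sizes differ in depth and width: 12, 24, 36 and 48 layers with model widths of 768, 1024, 1280 and 1600. A rough parameter count follows from the transformer shapes: about 12·d² weights per block (4·d² in the attention projections, 8·d² in the 4x-wide MLP) plus token and position embeddings. This sketch ignores biases and layer-norm parameters and assumes GPT-2's 50,257-token vocabulary and 1,024-token context, so its totals only approximate the published sizes.

```python
def approx_gpt2_params(n_layer, d_model, vocab=50257, n_ctx=1024):
    """Rough GPT-2 parameter count (biases and layer-norm parameters ignored)."""
    # Per block: Q, K, V and output projections (4 * d^2) plus a
    # d -> 4d -> d MLP (8 * d^2).
    per_block = 12 * d_model ** 2
    # Token embeddings (reused as the output layer) plus learned position embeddings.
    embeddings = (vocab + n_ctx) * d_model
    return n_layer * per_block + embeddings


# Layer counts and widths of the four GPT-2 sizes.
for n_layer, d_model in [(12, 768), (24, 1024), (36, 1280), (48, 1600)]:
    print(n_layer, d_model, approx_gpt2_params(n_layer, d_model))
```

The estimates come out at roughly 124M, 355M, 773M and 1.56B parameters, close to the 117M, 345M, 762M and 1542M reported in the GPT-2 paper; the gap comes from the terms the formula leaves out and from how parameters are counted.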
Differences between GPT-2 and GPT-3:
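Uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer
Training GPT-3 on a single GPU was estimated to cost $4.6 million and take 355 years.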
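GPT-3 alternates dense and locally banded sparse attention across its layers. A minimal sketch of the two patterns as boolean masks, where `mask[i][j]` is True when position `i` may attend to position `j`. The band width and the choice to start with a dense layer are illustrative assumptions, not GPT-3's actual settings.

```python
def causal_mask(T):
    # Dense causal attention: position i sees every position j <= i.
    return [[j <= i for j in range(T)] for i in range(T)]


def banded_causal_mask(T, band):
    # Locally banded: position i sees only the `band` most recent positions,
    # including itself, so cost grows with T * band instead of T * T.
    return [[i - band < j <= i for j in range(T)] for i in range(T)]


def layer_masks(n_layers, T, band):
    # Alternate dense and banded patterns from one layer to the next.
    return [causal_mask(T) if layer % 2 == 0 else banded_causal_mask(T, band)
            for layer in range(n_layers)]
```

Stacking the two kinds of layers keeps long-range information flowing through the dense layers while the banded layers stay cheap.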
Zero-shot vs. One-shot vs. Few-shot
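Few-shot: as many demonstrations as will fit into the model's context window are provided (typically 10 to 100 in GPT-3)
One-shot: a single demonstration is provided in addition to natural language instructions
Zero-shot: no demonstrations are given; only instructions in natural language are provided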
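The three settings differ only in the text placed in the prompt; the model's weights are not updated. A minimal sketch of prompt construction using the English-to-French task format from the GPT-3 paper. The `=>` layout and the word-count budget in `fit_demonstrations` are illustrative assumptions (a real system would count tokens with the model's tokenizer).

```python
def build_prompt(instruction, demonstrations, query):
    """Build an in-context prompt.

    Zero-shot: demonstrations is empty. One-shot: one (source, target) pair.
    Few-shot: several pairs. Only the text changes; no weights are updated.
    """
    lines = [instruction]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    # The model is expected to continue the text after the final "=>".
    lines.append(f"{query} =>")
    return "\n".join(lines)


def fit_demonstrations(instruction, demonstrations, query, max_words):
    """Few-shot: add demonstrations while the prompt still fits the context window.

    Whitespace-separated word count is a crude stand-in for a real tokenizer.
    """
    chosen = []
    for demo in demonstrations:
        if len(build_prompt(instruction, chosen + [demo], query).split()) > max_words:
            break
        chosen.append(demo)
    return chosen
```

For example, `build_prompt("Translate English to French:", [], "cheese")` is the zero-shot prompt, and passing `[("sea otter", "loutre de mer")]` turns it into a one-shot prompt.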
Results of GPT-3 on LAMBADA
Image from Language Models are Few-Shot Learners
Quick Intro to Image GPT
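Can the approach behind GPT also be used to generate images?
Image GPT uses the same transformer architecture as GPT-2, but trains it to predict pixels instead of language tokens
Fine-tuning: the pre-trained model is then used to optimize a classification objective, adapting all of its weights.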
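To predict pixels the way GPT predicts words, Image GPT turns an image into a 1-D sequence. The paper maps each pixel to an index into a 9-bit (512-color) palette obtained by k-means clustering of RGB values, then reads pixels in raster order. This sketch assumes the palette is already given (the test uses a hypothetical two-color palette) and shows only the quantize-and-flatten step.

```python
def nearest_palette_index(pixel, palette):
    # Index of the palette color closest to `pixel` (squared RGB distance).
    return min(range(len(palette)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, palette[i])))


def image_to_sequence(image, palette):
    """Turn an H x W grid of (R, G, B) pixels into a 1-D token sequence.

    Pixels are read in raster order (row by row, left to right), so a
    GPT-style model can predict them one at a time, like words.
    """
    return [nearest_palette_index(px, palette) for row in image for px in row]
```

An H x W image becomes a sequence of H·W tokens, which is why Image GPT works at low resolutions: sequence length grows with the square of the image side.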
Image GPT Approach Overview
Quick Intro to Jukebox
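Compresses raw audio into a discrete space with a loss function designed to retain the maximum amount of musical information, while doing so at increasing levels of compression
Uses an autoencoder (VQ-VAE) that maps raw audio to a shorter-length discrete latent encoding using vector quantization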
Generating Diverse High Fidelity Images
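Vector quantization replaces each continuous encoder output with its nearest entry in a codebook, so audio becomes a shorter sequence of discrete codes that a transformer prior can model. A minimal sketch: the `downsample` function (averaging fixed-length windows) is a hypothetical stand-in for Jukebox's learned convolutional encoder, and the codebook is hand-picked rather than learned.

```python
def downsample(signal, hop):
    # Crude stand-in for a strided encoder: average non-overlapping windows,
    # so the latent sequence is `hop` times shorter than the raw audio.
    return [[sum(signal[i:i + hop]) / hop]
            for i in range(0, len(signal) - hop + 1, hop)]


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry.

    Returns the discrete codes and the quantized vectors.
    """
    codes = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        codes.append(dists.index(min(dists)))
    return codes, [codebook[i] for i in codes]
```

Using a larger `hop` gives a shorter, more compressed code sequence, which mirrors the idea of modeling music at increasing levels of compression.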
Quick Intro to Jukebox (continued)
The upsamplers, which add back detail lost at compression, have one billion parameters and are trained on 128 V100s for 2 weeks
The top-level prior, which generates the most compressed level of samples, has 5 billion parameters
Jukebox Approach Overview
What is ONNX? According to their website:
We believe there is a need for greater interoperability in the AI tools community. … ONNX is the first step in enabling more of these tools to work together by allowing them to share models.
CSE 5194.01
ONNX
Background on ML frameworks
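Deep learning with neural networks is accomplished through computation over dataflow graphs.
The graph serves as an Intermediate Representation (IR) that captures the specific intent of the developer's source code and is conducive to optimization and translation to run on specific devices (CPU, GPU, FPGA, etc.).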
Why do we need ONNX?
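Each framework has its own proprietary representation of these dataflow graphs
Frameworks are optimized for different capabilities:
While one framework may be best suited to one stage of a project’s development, another stage may require a different framework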
How does ONNX do this?
ONNX provides a definition of an extensible computation graph model, as well as definitions of built-in operators and standard data types.
Each computation dataflow graph is structured as a list of nodes that form an acyclic graph.
How does ONNX do this? (continued)
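Nodes have one or more inputs and one or more outputs; each node is a call to an operator.
The graph also has metadata to help document its purpose, author, etc.
Operators are implemented externally to the graph, but the set of built-in operators is portable across frameworks.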
How does ONNX do this? (continued)
Every framework supporting ONNX provides implementations of these operators on the applicable data types
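A toy interpreter helps make the split between graph and operators concrete: the graph is a list of nodes (operator type, input names, output names), and the runtime supplies the operator implementations. This is a hypothetical plain-Python sketch, not the `onnx` package (which builds real graphs with functions such as `onnx.helper.make_node` and `onnx.helper.make_graph`). It relies on the ONNX requirement that nodes be listed in topological order, so they can be evaluated front to back; the operator names mirror ONNX built-ins, but this `Add` does not broadcast.

```python
def _matmul(a, b):
    # Matrix product of two lists-of-rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical operator registry: the "runtime" provides one implementation per op type.
OPS = {
    "MatMul": _matmul,
    "Add": lambda a, b: [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)],
    "Relu": lambda a: [[max(0.0, x) for x in row] for row in a],
}


def run_graph(nodes, inputs, outputs):
    """Evaluate a graph given as a topologically sorted list of nodes.

    Each node is (op_type, input_names, output_names). Values flow between
    nodes by name, the way edges connect nodes in an ONNX graph.
    """
    env = dict(inputs)
    for op_type, in_names, out_names in nodes:
        result = OPS[op_type](*(env[n] for n in in_names))
        # Every toy op here has exactly one output.
        env[out_names[0]] = result
    return {name: env[name] for name in outputs}
```

The same node list could be run by any runtime that registers its own `MatMul`, `Add` and `Relu`, which is the portability argument in miniature.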
Example: converting a Keras model to ONNX:
OpenAI Links
OpenAI API request
GPT-3 wrote this short film
GPT-3 writes Guardian article
GPT-3 Reddit account
Write with Transformer (Hugging Face)
AllenNLP (generate sentences using GPT-2)
Text Generation API (generate more text)
OpenAI SoundCloud
https://jukebox.openai.com/
OpenAI GitHub
ONNX Links
ONNX GitHub
ONNX website
ONNX tutorials