
Full Stack Deep Learning (March 2019): Troubleshooting Deep Neural Networks
Josh Tobin, Sergey Karayev, Pieter Abbeel
Lecture 6: Troubleshooting

Lifecycle of an ML project: cross-project infrastructure plus per-project activities (planning & project setup, team & hiring, data …)


  1. Strategy for DL troubleshooting: start simple → implement & debug → evaluate → (improve model/data, tune hyper-parameters) → repeat until the model meets requirements.

  2. Start simple: steps. (a) Choose a simple architecture. (b) Use sensible defaults. (c) Normalize inputs. (d) Simplify the problem.

  3. Demystifying architecture selection.
  • Images: start with a LeNet-like architecture; consider a ResNet later.
  • Sequences: start with an LSTM with one hidden layer (or temporal convs); consider an attention model or a WaveNet-like model later.
  • Other: start with a fully connected neural net with one hidden layer; the later choice is problem-dependent.

  4. Dealing with multiple input modalities, e.g., Input 1, Input 2, and Input 3 (the text “This is a cat”).

  5. Multiple input modalities, step 1: map each input into a lower-dimensional feature space.

  6. For example: Input 1 → ConvNet → flatten (64-dim); Input 2 → flatten (72-dim); Input 3 (“This is a cat”) → LSTM (48-dim).

  7. Step 2: con“cat”enate the features: 64 + 72 + 48 = 184-dim.

  8. Step 3: pass the concatenated features through fully connected layers to the output: 184-dim → FC (256-dim) → FC (128-dim) → output (T/F).
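Taken together, the three steps can be sketched in numpy. This is a minimal sketch under stated assumptions: the encoders here are random projections standing in for the real ConvNet/LSTM, and only the shapes (64 + 72 + 48 = 184 → 256 → 128 → 1) follow the slides; the input sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, out_dim):
    """Stand-in encoder: a random linear map into a lower-dimensional space.
    In the real model this would be a ConvNet, a flatten, or an LSTM."""
    W = rng.standard_normal((x.shape[-1], out_dim)) / np.sqrt(x.shape[-1])
    return x @ W

image = rng.standard_normal((1, 1024))  # Input 1
tab   = rng.standard_normal((1, 72))    # Input 2 (already low-dimensional)
text  = rng.standard_normal((1, 300))   # Input 3 ("This is a cat")

# 1. Map each modality into a lower-dimensional feature space.
f1 = project(image, 64)   # ConvNet + flatten -> 64-dim
f2 = tab                  # flatten -> 72-dim
f3 = project(text, 48)    # LSTM -> 48-dim

# 2. Concatenate: 64 + 72 + 48 = 184 dims.
fused = np.concatenate([f1, f2, f3], axis=-1)

# 3. Fully connected layers to the output: 184 -> 256 -> 128 -> 1 (T/F).
h = np.maximum(0.0, project(fused, 256))
h = np.maximum(0.0, project(h, 128))
prob = 1.0 / (1.0 + np.exp(-project(h, 1)))  # sigmoid over the final logit
```

The point of the sketch is the shape bookkeeping: after concatenation every downstream layer sees a single fused feature vector.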

  9. Start simple: steps (recap); next, (b) use sensible defaults.

  10. Recommended network / optimizer defaults:
  • Optimizer: Adam with learning rate 3e-4.
  • Activations: ReLU (FC and conv models), tanh (LSTMs).
  • Initialization: He et al. normal (ReLU), Glorot normal (tanh).
  • Regularization: none.
  • Data normalization: none.
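To make the optimizer default concrete, here is a minimal numpy sketch of the standard Adam update rule with the recommended learning rate of 3e-4; the quadratic toy objective is my own illustration, not from the slides.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=3e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), bias-corrected, then a scaled step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2, so grad = 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
# Because m_hat / sqrt(v_hat) is roughly unit-scale, each step moves theta
# by about lr, so after 2000 steps theta has shrunk noticeably toward 0.
```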

  11. Definitions of the recommended initializers (n = number of inputs, m = number of outputs):
  • He et al. normal (used for ReLU): W ~ N(0, √(2/n))
  • Glorot normal (used for tanh): W ~ N(0, √(2/(n + m)))
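A quick numpy sketch of both initializers, reading N(0, s) as a normal distribution with standard deviation s (fan-in n, fan-out m as defined above); the layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_normal(n, m):
    """He et al. normal, for ReLU layers: std = sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / n), size=(n, m))

def glorot_normal(n, m):
    """Glorot normal, for tanh layers: std = sqrt(2 / (fan_in + fan_out))."""
    return rng.normal(0.0, np.sqrt(2.0 / (n + m)), size=(n, m))

W_relu = he_normal(512, 256)      # std ~ sqrt(2/512) ~ 0.0625
W_tanh = glorot_normal(512, 256)  # std ~ sqrt(2/768) ~ 0.051
```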

  12. Start simple: steps (recap); next, (c) normalize inputs.

  13. It is important to normalize the scale of input data:
  • Subtract the mean and divide by the standard deviation.
  • For images, it is fine to scale values to [0, 1] or [-0.5, 0.5] (e.g., by dividing by 255).
  • Careful: make sure your library doesn’t already do it for you!
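Both options in numpy, assuming a batch of uint8 images (the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

# Option 1: subtract the mean and divide by the standard deviation.
x = images.astype(np.float32)
x_std = (x - x.mean()) / x.std()

# Option 2 (fine for images): just scale to [0, 1] by dividing by 255.
x_01 = images.astype(np.float32) / 255.0

# Careful: if the input pipeline (e.g., a framework's image loader) already
# rescales, normalizing again here would silently shrink the inputs.
```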

  14. Start simple: steps (recap); next, (d) simplify the problem.

  15. Consider simplifying the problem as well:
  • Start with a small training set (~10,000 examples).
  • Use a fixed number of objects, classes, image size, etc.
  • Create a simpler synthetic training set.

  16. Running example: simplest model for pedestrian detection (0 = no pedestrian, 1 = pedestrian; goal: 99% classification accuracy).
  • Start with a subset of 10,000 images for training, 1,000 for validation, and 500 for test.
  • Use a LeNet architecture with sigmoid cross-entropy loss.
  • Adam optimizer with LR 3e-4.
  • No regularization.

  17. Starting simple: summary.
  • (a) Choose a simple architecture: LeNet, LSTM, or fully connected.
  • (b) Use sensible defaults: Adam optimizer & no regularization.
  • (c) Normalize inputs: subtract mean and divide by std, or just divide by 255 (images).
  • (d) Simplify the problem: start with a simpler version of your problem (e.g., smaller dataset).

  18. Strategy for DL troubleshooting (recap): start simple → implement & debug → evaluate → (improve model/data, tune hyper-parameters) → repeat until the model meets requirements.

  19. Implementing bug-free DL models: steps. (a) Get your model to run. (b) Overfit a single batch. (c) Compare to a known result.

  20. Preview: the five most common DL bugs.
  • Incorrect shapes for your tensors. Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x + y).shape = (None, None).
  • Pre-processing inputs incorrectly. E.g., forgetting to normalize, or too much pre-processing.
  • Incorrect input to your loss function. E.g., softmaxed outputs fed to a loss that expects logits.
  • Forgetting to set up train mode for the net correctly. E.g., toggling train/eval, controlling batch norm dependencies.
  • Numerical instability (inf/NaN). Often stems from an exp, log, or div operation.
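The silent-broadcasting bug from the first bullet is easy to reproduce in numpy, with a concrete batch size standing in for None:

```python
import numpy as np

x = np.zeros(4)        # shape (4,), analogous to (None,)
y = np.zeros((4, 1))   # shape (4, 1), analogous to (None, 1)

# Intended: an elementwise sum with shape (4,).
# Actual: broadcasting expands (4,) against (4, 1) into a (4, 4) matrix,
# and no error is raised, which is why this bug fails silently.
z_bug = x + y

# Fix: make the shapes agree explicitly before adding.
z_ok = x + y.squeeze(-1)
```

In a loss computation, the (4, 4) result then gets silently averaged over, producing a plausible-looking but wrong scalar.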

  21. General advice for implementing your model: keep the implementation lightweight. Aim for the minimum possible new lines of code for v1; rule of thumb: <200 lines (tested infrastructure components are fine).

  22. General advice (cont.): use off-the-shelf components, e.g., Keras; tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x)); tf.losses.cross_entropy(…) instead of writing out the exp yourself.

  23. General advice (cont.): build complicated data pipelines later. Start with a dataset you can load into memory.

  24. Implementing bug-free DL models: steps (recap); first, (a) get your model to run.

  25. Get your model to run: common issues and recommended resolutions.
  • Shape mismatch / casting issue: step through model creation and inference in a debugger.
  • OOM: scale back memory-intensive operations one by one.
  • Other: standard debugging toolkit (Stack Overflow + interactive debugger).


  27. Debuggers for DL code: PyTorch is easy, use ipdb; TensorFlow is trickier. Option 1: step through graph creation.

  28. TensorFlow debugging, option 2: step into the training loop and evaluate tensors using sess.run(…).

  29. TensorFlow debugging, option 3: use tfdbg (python -m tensorflow.python.debug.examples.debug_mnist --debug), which stops execution at each sess.run(…) and lets you inspect tensors.


  31. Shape mismatch: most common causes.
  • Undefined shapes: confusing tensor.shape, tf.shape(tensor), and tensor.get_shape(); reshaping to a shape of type Tensor (e.g., when loading data from a file).
  • Incorrect shapes: flipped dimensions when using tf.reshape(…); took a sum, average, or softmax over the wrong dimension; forgot to flatten after conv layers; forgot to get rid of extra “1” dimensions (e.g., if shape is (None, 1, 1, 4)); data stored on disk in a different dtype than it is loaded with (e.g., stored a float64 numpy array, loaded it as float32).

  32. Casting issue: most common causes. Data not in float32:
  • Forgot to cast images from uint8 to float32.
  • Generated data with numpy in float64 and forgot to cast to float32.
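Both causes in a short numpy illustration (values chosen to expose the uint8 wraparound):

```python
import numpy as np

# Cause 1: images arrive as uint8; integer arithmetic silently wraps.
img = np.array([250, 10], dtype=np.uint8)
bumped = img + 10               # 250 + 10 wraps around to 4 in uint8!
img_f = img.astype(np.float32)  # cast before doing any math

# Cause 2: numpy produces float64 by default; cast down to float32.
data = np.random.default_rng(0).standard_normal(100)  # dtype float64
data32 = data.astype(np.float32)
```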

  33. OOM: most common causes.
  • Too big a tensor: too large a batch size for your model (e.g., during evaluation); too large fully connected layers.
  • Too much data: loading too large a dataset into memory rather than using an input queue; allocating too large a buffer for dataset creation.
  • Duplicating operations: memory leak from creating multiple models in the same session; repeatedly creating an operation (e.g., in a function that gets called over and over again).
  • Other processes: other processes running on your GPU.

  34. Other common errors:
  • Forgot to initialize variables.
  • Forgot to turn off bias when using batch norm.
  • “Fetch argument has invalid type”: usually you overwrote one of your ops with an output during training.

  35. Implementing bug-free DL models: steps (recap); next, (b) overfit a single batch.

  36. Overfit a single batch: the failure modes to diagnose are error goes up, error explodes, error oscillates, and error plateaus.

  37. Error goes up, most common causes: flipped the sign of the loss function / gradient; learning rate too high; softmax taken over the wrong dimension.

  38. Error explodes, most common causes: numerical issue (check all exp, log, and div operations); learning rate too high.

  39. Error oscillates, most common causes: data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly); learning rate too high.

  40. Error plateaus, most common causes: learning rate too low; gradients not flowing through the whole model; too much regularization; incorrect input to the loss function (e.g., softmax instead of logits, accidentally adding a ReLU on the output); data or labels corrupted.

  41. Overfit a single batch: summary of causes.
  • Error goes up: flipped loss sign / gradient; learning rate too high; softmax over the wrong dimension.
  • Error explodes: numerical issue (check exp, log, div); learning rate too high.
  • Error oscillates: data or labels corrupted (e.g., zeroed or incorrectly shuffled); learning rate too high.
  • Error plateaus: learning rate too low; gradients not flowing through the whole model; too much regularization; incorrect input to the loss function (e.g., softmax instead of logits); data or labels corrupted.
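On a bug-free pipeline, the overfit-a-single-batch check should drive training loss on one fixed batch toward zero. A minimal numpy sketch, with logistic regression standing in for the real model (data, sizes, and learning rate are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed batch: 16 examples, 8 features, separable labels.
X = rng.standard_normal((16, 8))
y = (X @ rng.standard_normal(8) > 0).astype(np.float64)

w, b, lr = np.zeros(8), 0.0, 0.5

def batch_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

first = batch_loss(w, b)               # log(2) ~ 0.693 at initialization
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (p - y) / len(y)   # gradient of mean cross-entropy
    b -= lr * np.mean(p - y)
final = batch_loss(w, b)               # should end up close to zero
```

If the loss instead goes up, explodes, oscillates, or plateaus, the causes listed on the slides above are the first things to check.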

  42. Implementing bug-free DL models: steps (recap); next, (c) compare to a known result.

  43. Hierarchy of known results (most useful first). Most useful: an official model implementation evaluated on a dataset similar to yours. You can walk through the code line by line to ensure you get the same output, and check that your performance is up to par with expectations.

  44. Next: an official model implementation evaluated on a benchmark (e.g., MNIST). You can walk through the code line by line and ensure you have the same output.

  45. Next: an unofficial model implementation. Same as before, but with lower confidence.

  46. Next: results from a paper (with no code). You can ensure your performance is up to par with expectations.

  47. Next: results from your model on a benchmark dataset (e.g., MNIST). You can make sure your model performs well in a simpler setting.

  48. Next: results from a similar model on a similar dataset. You can get a general sense of what kind of performance to expect.

  49. Least useful: super simple baselines (e.g., average of outputs, or linear regression). You can make sure your model is learning anything at all.

  50. The full hierarchy, most to least useful: official implementation on a similar dataset; official implementation on a benchmark (e.g., MNIST); unofficial implementation; results from the paper (with no code); your model’s results on a benchmark dataset; results from a similar model on a similar dataset; super simple baselines (e.g., average of outputs or linear regression).

  51. Summary: how to implement & debug.
  • (a) Get your model to run: step through in a debugger and watch out for shape, casting, and OOM errors.
  • (b) Overfit a single batch: look for corrupted data, over-regularization, broadcasting errors.
  • (c) Compare to a known result: keep iterating until the model performs up to expectations.

  52. Strategy for DL troubleshooting (recap): start simple → implement & debug → evaluate → (improve model/data, tune hyper-parameters) → repeat until the model meets requirements.

  53.-55. Bias-variance decomposition (figure build-up showing how test error splits into components).

  56. Breakdown of test error by source (bar chart): test error = irreducible error + avoidable bias (i.e., under-fitting) + variance (i.e., over-fitting) + validation set overfitting.

  57. Test error = irreducible error + bias + variance + val overfitting. This assumes train, val, and test all come from the same distribution. What if not?

  58. Handling distribution shift between train data and test data: use two val sets, one sampled from the training distribution and one from the test distribution.

  59. The bias-variance tradeoff (figure).

  60. Bias-variance with distribution shift (figure).

  61. Breakdown of test error by source with distribution shift (bar chart): test error = irreducible error + avoidable bias + variance + train-val distribution shift + validation set overfitting.

  62. Running example: train, val, and test error for pedestrian detection (goal: 99% classification accuracy, i.e., 1% error).
  • Goal performance: 1%; train error: 20%; validation error: 27%; test error: 28%.
  • Train − goal = 19% (under-fitting).

  63. Val − train = 7% (over-fitting).

  64. Test − val = 1% (looks good!).
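The three gaps above are plain subtractions; a quick sketch with the running example's numbers (treating the 1% goal as a stand-in for irreducible error, as the lecture does):

```python
goal_error = 0.01   # goal: 99% accuracy, i.e., 1% error
train_error = 0.20
val_error = 0.27
test_error = 0.28

under_fitting = train_error - goal_error   # avoidable bias
over_fitting = val_error - train_error     # variance
val_overfitting = test_error - val_error   # overfitting to the val set

print(f"{under_fitting:.0%} {over_fitting:.0%} {val_overfitting:.0%}")
# prints "19% 7% 1%"
```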

  65. Summary, evaluating model performance: test error = irreducible error + bias + variance + distribution shift + val overfitting.

  66. Strategy for DL troubleshooting (recap): start simple → implement & debug → evaluate → (improve model/data, tune hyper-parameters) → repeat until the model meets requirements.
