

slide-1
SLIDE 1

AMLD – Deep Learning in PyTorch

1. Introduction

François Fleuret
http://fleuret.org/amld/
February 10, 2018

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

slide-2
SLIDE 2

Why learning


slide-3
SLIDE 3

Many applications require the automatic extraction of “refined” information from a raw signal, e.g. image recognition, automatic speech processing, natural language processing, robotic control, geometry reconstruction. [Illustration: sample images from ImageNet]



slide-5
SLIDE 5

Our brain is so good at interpreting visual information that the “semantic gap” is hard to assess intuitively. This: [photograph of a horse] is a horse.



slide-8
SLIDE 8

>>> from torchvision import datasets
>>> cifar = datasets.CIFAR10('./data/cifar10/', train=True, download=True)
Files already downloaded and verified
>>> x = torch.from_numpy(cifar.train_data)[43].transpose(2, 0).transpose(1, 2)
>>> x.size()
torch.Size([3, 32, 32])
>>> x.narrow(1, 0, 4).narrow(2, 0, 12)
(0 ,.,.) =
   99   98  100  103  105  107  108  110  114  115  117  118
  100  100  102  105  107  109  110  112  115  117  119  120
  104  104  106  109  111  112  114  116  119  121  123  124
  109  109  111  113  116  117  118  120  123  124  127  128

(1 ,.,.) =
  166  165  167  169  171  172  173  175  176  178  179  181
  166  164  167  169  169  171  172  174  176  177  179  180
  169  167  170  171  171  173  174  176  178  179  182  183
  170  169  172  173  175  176  177  178  179  181  183  184

(2 ,.,.) =
  198  196  199  200  200  202  203  204  205  206  208  209
  195  194  197  197  197  199  200  201  202  203  206  207
  197  195  198  198  198  199  201  202  203  204  206  207
  197  196  199  198  198  199  200  201  203  204  207  208
[torch.ByteTensor of size 3x4x12]



slide-10
SLIDE 10

Extracting semantics automatically requires models of extreme complexity, which cannot be designed by hand. Techniques used in practice consist of

  1. defining a parametric model, and
  2. optimizing its parameters by “making it work” on training data.

This is similar to biological systems for which the model (e.g. brain structure) is DNA-encoded, and parameters (e.g. synaptic weights) are tuned through experiences.



slide-13
SLIDE 13

A simple example is linear regression.

[Scatter plot: systolic blood pressure (mmHg) vs. age (years), with the data points and a fitted linear model]

Deep learning encompasses software technologies to scale up to billions of model parameters and as many training examples.
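For instance, the fit above can be computed in closed form with least squares. A minimal sketch of ours with synthetic data (the names age and bp are illustrative, not from the lecture):

import numpy as np

# Synthetic (age, blood pressure) pairs, loosely mimicking the plot above
rng = np.random.RandomState(0)
age = rng.uniform(20, 80, size=100)
bp = 100 + 0.8 * age + rng.normal(0, 10, size=100)

# Fit the linear model bp ~ a * age + b by least squares
a, b = np.polyfit(age, bp, deg=1)
print(a, b)  # slope and intercept of the fitted line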


slide-14
SLIDE 14

From artificial neural networks to “Deep Learning”


slide-15
SLIDE 15

[Figure 1 from “A Logical Calculus of the Ideas Immanent in Nervous Activity”: diagrams of small nerve nets]

Networks of “Threshold Logic Unit” (McCulloch and Pitts, 1943)



slide-19
SLIDE 19

1949 – Donald Hebb proposes the Hebbian Learning principle.
1951 – Marvin Minsky creates the first ANN (Hebbian learning, 40 neurons).
1958 – Frank Rosenblatt creates a perceptron to classify 20 × 20 images.
1959 – David H. Hubel and Torsten Wiesel demonstrate orientation selectivity and columnar organization in the cat’s visual cortex.
1982 – Paul Werbos proposes back-propagation for ANNs.


slide-20
SLIDE 20

Neocognitron

[Figures from the paper: the correspondence between Hubel and Wiesel’s hierarchy model (retina → LGB → simple → complex → lower-order hypercomplex → higher-order hypercomplex cells) and the neocognitron (U0 → Us1 → Uc1 → Us2 → Uc2 → Us3 → Uc3), and schematic diagrams of the interconnections between S-layers and C-layers]

Follows Hubel and Wiesel’s results. (Fukushima, 1980)


slide-21
SLIDE 21

Network for the T-C problem, trained with back-prop. (Rumelhart et al., 1988)


slide-22
SLIDE 22

LeNet-5

[LeNet-5 architecture: INPUT 32x32 → C1: feature maps 6@28x28 (convolutions) → S2: f. maps 6@14x14 (subsampling) → C3: f. maps 16@10x10 (convolutions) → S4: f. maps 16@5x5 (subsampling) → C5: layer 120 → F6: layer 84 (full connections) → OUTPUT 10 (Gaussian connections)]

(leCun et al., 1998)


slide-23
SLIDE 23

AlexNet (Krizhevsky et al., 2012)


slide-24
SLIDE 24

GoogLeNet

[GoogLeNet architecture diagram: a 7x7, stride-2 convolutional stem with max-pooling and local response normalization, followed by stacked “Inception” modules (parallel 1x1, 3x3, and 5x5 convolutions and 3x3 max-pooling, depth-concatenated), two auxiliary softmax classifiers, and a final average-pooling / FC / softmax head]

(Szegedy et al., 2015)


slide-25
SLIDE 25

ResNet

[ResNet-34 architecture diagram: a 7x7, stride-2 convolutional stem with pooling, followed by stacked 3x3 convolutional residual blocks of width 64, 128, 256, and 512 (stride 2 at each width change), average pooling, and a 1000-way fully connected output]

(He et al., 2015)
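The key ingredient is the residual connection, which adds a block’s input to its output. A minimal sketch of one such block in PyTorch (our own illustrative code, not the paper’s reference implementation):

import torch
from torch import nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    # A basic residual block: the output is x + F(x)
    def __init__(self, channels):
        super(ResBlock, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)  # the skip connection

block = ResBlock(64)
print(block(torch.randn(1, 64, 32, 32)).size())  # torch.Size([1, 64, 32, 32])

Because the block computes x + F(x), gradients can flow unimpeded through the identity path, which is what makes very deep stacks trainable.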


slide-26
SLIDE 26

Deep learning is built on a natural generalization of a neural network: a graph of tensor operators, taking advantage of

  • the chain rule (aka “back-propagation”),
  • stochastic gradient descent,
  • convolutions,
  • parallel operations on GPUs.

This does not differ much from the networks of the 90s.
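A minimal sketch of the first two ingredients with PyTorch’s autograd (our own example, written against the modern tensor API rather than the 2018-era Variable wrapper):

import torch

w = torch.randn(3, requires_grad=True)    # model parameters
x, y = torch.randn(3), torch.tensor(1.0)  # one training sample

for step in range(100):
    loss = (w.dot(x) - y) ** 2  # forward pass through the graph of tensor operators
    loss.backward()             # chain rule: computes dloss/dw
    with torch.no_grad():
        w -= 1e-2 * w.grad      # one gradient descent step
        w.grad.zero_()          # gradients accumulate in PyTorch; reset them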


slide-27
SLIDE 27

This generalization makes it possible to design complex networks of operators dealing with images, sound, text, sequences, etc., and to train them end-to-end. (Yeung et al., 2015)


slide-28
SLIDE 28

CIFAR10: 32 × 32 color images, 50k train samples, 10k test samples. (Krizhevsky, 2009, chap. 3)


slide-29
SLIDE 29

Performance on CIFAR10

[Plot: accuracy (%) on CIFAR10, from 75% to 100%, as a function of the year (2010 to 2018), marking Krizhevsky et al. (2012), Graham (2015), Real et al. (2018), and human performance]


slide-30
SLIDE 30

Error rates (%) of single-model results on the ImageNet validation set (except † reported on the test set):

  method                        top-1 err.   top-5 err.
  VGG [41] (ILSVRC’14)          –            8.43†
  GoogLeNet [44] (ILSVRC’14)    –            7.89
  VGG [41] (v5)                 24.4         7.1
  PReLU-net [13]                21.59        5.71
  BN-inception [16]             21.99        5.81
  ResNet-34 B                   21.84        5.71
  ResNet-34 C                   21.53        5.60
  ResNet-50                     20.74        5.25
  ResNet-101                    19.87        4.60
  ResNet-152                    19.38        4.49

Error rates (%) of ensembles; the top-5 error is on the test set of ImageNet and reported by the test server:

  method                        top-5 err. (test)
  VGG [41] (ILSVRC’14)          7.32
  GoogLeNet [44] (ILSVRC’14)    6.66
  VGG [41] (v5)                 6.8
  PReLU-net [13]                4.94
  BN-inception [16]             4.82
  ResNet (ILSVRC’15)            3.57

(He et al., 2015)


slide-31
SLIDE 31

Current application domains


slide-32
SLIDE 32

Object detection and segmentation (Pinheiro et al., 2016)


slide-33
SLIDE 33

Human pose estimation (Wei et al., 2016)


slide-34
SLIDE 34

Image generation (Radford et al., 2015)


slide-35
SLIDE 35

Reinforcement learning: self-trained, plays 49 games at human level. (Mnih et al., 2015)


slide-36
SLIDE 36

Strategy games: in March 2016, won 4-1 against a 9-dan professional without handicap. (Silver et al., 2016)


slide-37
SLIDE 37

Translation

“The reason Boeing are doing this is to cram more seats in to make their plane more competitive with our products,” said Kevin Keniston, head of passenger comfort at Europe’s Airbus.

“La raison pour laquelle Boeing fait cela est de créer plus de sièges pour rendre son avion plus compétitif avec nos produits”, a déclaré Kevin Keniston, chef du confort des passagers chez Airbus.

When asked about this, an official of the American administration replied: “The United States is not conducting electronic surveillance aimed at offices of the World Bank and IMF in Washington.”

Interrogé à ce sujet, un fonctionnaire de l’administration américaine a répondu: “Les États-Unis n’effectuent pas de surveillance électronique à l’intention des bureaux de la Banque mondiale et du FMI à Washington”

(Wu et al., 2016)


slide-38
SLIDE 38

Auto-captioning (Vinyals et al., 2015)


slide-39
SLIDE 39

Question answering

I: Jane went to the hallway.
I: Mary walked to the bathroom.
I: Sandra went to the garden.
I: Daniel went back to the garden.
I: Sandra took the milk there.
Q: Where is the milk?
A: garden

I: It started boring, but then it got interesting.
Q: What’s the sentiment?
A: positive

(Kumar et al., 2015)


slide-40
SLIDE 40

Why does it work now?


slide-41
SLIDE 41

The success of deep learning is multi-factorial:

  • Five decades of research in machine learning,
  • CPUs/GPUs/storage developed for other purposes,
  • lots of data from “the internet”,
  • tools and culture of collaborative and reproducible science,
  • resources and efforts from large corporations.


slide-42
SLIDE 42

Five decades of research in ML provided

  • a taxonomy of ML concepts (classification, generative models, clustering, kernels, linear embeddings, etc.),
  • a sound statistical formalization (Bayesian estimation, PAC),
  • a clear picture of fundamental issues (bias/variance dilemma, VC dimension, generalization bounds, etc.),
  • a good understanding of optimization issues,
  • efficient large-scale algorithms.


slide-43
SLIDE 43

From a practical perspective, deep learning

  • lessens the need for a deep mathematical grasp,
  • makes the design of large learning architectures a system/software development task,
  • allows us to leverage modern hardware (clusters of GPUs),
  • does not plateau when using more data,
  • makes large trained networks a commodity.


slide-44
SLIDE 44

[Plot: FLOPs per USD, from 10⁻³ to 10¹², 1960 to 2020 (Wikipedia “FLOPS”)]

                       TFlops (10¹²)   Price   GFlops per $
  Intel i7-6700K       0.2             $344    0.6
  AMD Radeon R-7 240   0.5             $55     9.1
  NVIDIA GTX 750 Ti    1.3             $105    12.3
  AMD RX 480           5.2             $239    21.6
  NVIDIA GTX 1080      8.9             $699    12.7


slide-45
SLIDE 45

[Plot: bytes of storage per USD, from 10³ to 10¹², 1980 to 2020 (John C. McCallum)]

The typical cost of a 4TB hard disk is $120 (Dec 2016).


slide-46
SLIDE 46

[Plots: ImageNet top-1 accuracy (%) vs. operations (G-Ops) and parameter count (5M to 155M) for AlexNet, BN-AlexNet, BN-NIN, LeNet, GoogLeNet, Inception-v3, VGG-16, VGG-19, and ResNet-18/34/50/101; and forward time per image (ms) vs. batch size (1 to 64) for the same networks]

(Canziani et al., 2016)


slide-47
SLIDE 47

Implementing a deep network, PyTorch


slide-48
SLIDE 48

Deep-learning development is usually done in a framework:

               Language(s)             License         Main backer
  PyTorch      Python                  BSD             Facebook
  Caffe2       C++, Python             Apache          Facebook
  TensorFlow   Python, C++             Apache          Google
  MXNet        Python, C++, R, Scala   Apache          Amazon
  CNTK         Python, C++             MIT             Microsoft
  Torch        Lua                     BSD             Facebook
  Theano       Python                  BSD             U. of Montreal
  Caffe        C++                     BSD 2 clauses   U. of CA, Berkeley

A fast, low-level, compiled backend to access computation devices, combined with a slow, high-level, interpreted language.


slide-49
SLIDE 49

We will use PyTorch for our examples. http://pytorch.org

“PyTorch is a python package that provides two high-level features:

  • Tensor computation (like numpy) with strong GPU acceleration
  • Deep Neural Networks built on a tape-based autograd system

You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.”
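A minimal sketch of the first feature (our own example):

import torch

a = torch.randn(1000, 1000)    # a tensor, much like a numpy array
b = torch.randn(1000, 1000)

if torch.cuda.is_available():  # move the operands to a GPU if there is one
    a, b = a.cuda(), b.cuda()

c = a.mm(b)                    # matrix product, on CPU or GPU transparently
print(c.size())                # torch.Size([1000, 1000])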


slide-50
SLIDE 50

MNIST data-set: 28 × 28 grayscale images, 60k train samples, 10k test samples. (leCun et al., 1998)


slide-51
SLIDE 51

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.fc1 = nn.Linear(256, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), kernel_size=3))
        x = F.relu(F.max_pool2d(self.conv2(x), kernel_size=2))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()

mu, std = train_input.data.mean(), train_input.data.std()
train_input.data.sub_(mu).div_(std)

optimizer = optim.SGD(model.parameters(), lr = 1e-1)
criterion, bs = nn.CrossEntropyLoss(), 100

model.cuda()
criterion.cuda()
train_input, train_target = train_input.cuda(), train_target.cuda()

for e in range(10):
    for b in range(0, nb_train_samples, bs):
        output = model(train_input.narrow(0, b, bs))
        loss = criterion(output, train_target.narrow(0, b, bs))
        model.zero_grad()
        loss.backward()
        optimizer.step()

≃7s on a GTX1080, ≃1% test error
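The test error quoted above can be estimated with a loop such as the following. A sketch of ours, assuming a recent PyTorch and test_input / test_target tensors analogous to the training ones:

test_input = test_input.sub(mu).div(std).cuda()  # normalize with the train statistics
test_target = test_target.cuda()

nb_errors = 0
for b in range(0, test_input.size(0), bs):
    output = model(test_input.narrow(0, b, bs))
    pred = output.max(1)[1]  # index of the highest score, i.e. the predicted class
    nb_errors += (pred != test_target.narrow(0, b, bs)).long().sum().item()

print('test error {:.2f}%'.format(100.0 * nb_errors / test_input.size(0)))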



slide-57
SLIDE 57

Learning from data



slide-59
SLIDE 59

The general objective of machine learning is to capture regularity in data to make predictions. In our regression example, we modeled age and blood pressure as being linearly related, to predict the latter from the former.

There are multiple types of inference that we can roughly split into three categories:

  • classification (e.g. object recognition, cancer detection, speech processing),
  • regression (e.g. customer satisfaction, stock prediction, epidemiology), and
  • density estimation (e.g. outlier detection, data visualization, sampling/synthesis).



slide-62
SLIDE 62

Learning consists of finding, in a set F of functionals, a “good” f* (or its parameters’ values), usually defined through a loss l : F × Z → R such that l(f, z) increases with how wrong f is on z. E.g.

  • for classification: l(f, (x, y)) = 1{f(x) ≠ y},
  • for regression: l(f, (x, y)) = (f(x) − y)²,
  • for density estimation: l(q, z) = −log q(z).

We are looking for an f with a small empirical loss

  L(f) = (1/N) ∑_{n=1}^{N} l(f, Z_n).

However, it may poorly reflect the “true” loss on test data.
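These losses are straightforward to compute. A small sketch of ours with toy numpy arrays:

import numpy as np

# Classification: average 0-1 loss
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
print(np.mean(y_pred != y_true))  # 0.25

# Regression: average squared loss
f_x = np.array([1.1, 2.3, 2.9])
y = np.array([1.0, 2.0, 3.0])
print(np.mean((f_x - y) ** 2))

# Density estimation: average negative log-likelihood
q_z = np.array([0.2, 0.5, 0.9])  # q(z) evaluated at the samples
print(np.mean(-np.log(q_z)))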


slide-63
SLIDE 63

Capacity



slide-68
SLIDE 68

Consider a polynomial model

  ∀x, α_0, …, α_D ∈ R,  f(x; α) = ∑_{d=0}^{D} α_d x^d,

and training points (x_n, y_n) ∈ R², n = 1, …, N. Minimize the quadratic loss

  L(α) = ∑_n (f(x_n; α) − y_n)²
       = ∑_n (∑_{d=0}^{D} α_d x_n^d − y_n)²
       = ‖X α − y‖²,

where X is the N × (D+1) matrix with entries X_{n,d} = x_n^d, α = (α_0, …, α_D)ᵀ, and y = (y_1, …, y_N)ᵀ.

This is a standard quadratic problem, for which we have efficient algorithms.


slide-69
SLIDE 69

def fit_polynomial(D, x, y):
    N = x.size(0)
    X = Tensor(N, D + 1)
    Y = Tensor(N, 1)
    # Exercise: avoid the n loop
    for n in range(N):
        for d in range(D + 1):
            X[n, d] = x[n]**d
        Y[n, 0] = y[n]
    # LAPACK's GEneralized Least-Square
    alpha, _ = torch.gels(Y, X)
    return alpha
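A usage sketch of ours; torch.gels was the PyTorch API of the time (recent releases expose torch.linalg.lstsq instead), and the D+1 coefficients are returned in the first rows of alpha:

x = torch.linspace(0, 1, 20)
y = 0.5 + 2 * x - 3 * x**2 + 0.01 * torch.randn(20)  # noisy degree-2 polynomial

alpha = fit_polynomial(2, x, y)
print(alpha[:3].view(-1))  # should be close to (0.5, 2, -3)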


slide-70…80
SLIDES 70–80

[Plots: the training data alone, then its least-squares polynomial fit f* for degrees D = 0 through 9]

slide-81
SLIDE 81

[Plot: train and test error (MSE, log scale from 10⁻³ to 10⁻¹) as a function of the polynomial degree, 1 to 9]



slide-84
SLIDE 84

We define the capacity of a set of predictors as its ability to model an arbitrary functional. Although it is difficult to define precisely, it is quite clear in practice how to increase or decrease it for a given class of models.

  • If the capacity is too low, the predictor does not fit the data. The training error is high, and reflects the test error. ⇒ Under-fitting.
  • If the capacity is too high, the predictor fits the data well, including its noise. The training error is low, and does not reflect the test error. ⇒ Over-fitting (see the sketch below).
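A sketch of this effect on the polynomial example, with our own synthetic data:

import numpy as np

rng = np.random.RandomState(0)
x_train = rng.uniform(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 15)
x_test = rng.uniform(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.1, 100)

for D in range(1, 10):
    coeffs = np.polyfit(x_train, y_train, deg=D)  # least-squares fit of degree D
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # train error keeps shrinking; test error eventually grows
    print(D, train_mse, test_mse)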


slide-85
SLIDE 85

Proper evaluation protocols



slide-89
SLIDE 89

Learning algorithms, in particular deep-learning ones, require the tuning of many meta-parameters. These parameters have a strong impact on the performance, resulting in a “meta” over-fitting through experiments. We must be extra careful with performance estimation. Running our experiment on MNIST 100 times, with randomized weights, we get:

  Worst   Median   Best
  1.3%    1.0%     0.82%
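A table like this comes from a protocol such as the following sketch, where train_and_test is a hypothetical helper wrapping the training code above and returning the test error in percent:

import numpy as np
import torch

errors = []
for seed in range(100):
    torch.manual_seed(seed)          # randomizes the initial weights
    errors.append(train_and_test())  # hypothetical: trains a fresh Net, returns test error

errors = np.sort(np.array(errors))
print('worst {:.2f}% median {:.2f}% best {:.2f}%'.format(
    errors[-1], errors[len(errors) // 2], errors[0]))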



slide-94
SLIDE 94

The ideal development cycle is

  Write code → Train → Test → Results

or, in practice, something like

  Write code ⇄ Train → Test → Results

with iterations between writing code and training. There may be over-fitting, but it does not bias the final performance evaluation.



slide-100
SLIDE 100

Unfortunately, it often looks like

  Write code → Train → Test → Results

with the test results feeding back into the code and the meta-parameters. This should be avoided at all costs. The standard strategy is to have a separate validation set for the tuning:

  Write code → Train → Validation → Test → Results
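A sketch of such a split, with illustrative proportions:

import torch

nb = train_input.size(0)
shuffle = torch.randperm(nb)  # random permutation of the sample indices

nb_val = nb // 5              # e.g. keep 20% of the training data for validation
val_idx, train_idx = shuffle[:nb_val], shuffle[nb_val:]

val_input, val_target = train_input[val_idx], train_target[val_idx]
train_input, train_target = train_input[train_idx], train_target[train_idx]

# Tune meta-parameters against (val_input, val_target);
# touch the test set only once, at the very end.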



slide-102
SLIDE 102

Some data-sets (MNIST!) have been used by thousands of researchers, over millions of experiments, in hundreds of papers. The global overall process then looks more like

  Write code → Train → Test → Results

repeated at the scale of the whole community.



slide-104
SLIDE 104

“Cheating” in machine learning, from bad to “are you kidding?”:

  • “Early evaluation stopping”,
  • meta-parameter (over-)tuning,
  • data-set selection,
  • algorithm data-set specific clauses,
  • seed selection.

The community pushes toward accessible implementations, reference data-sets, leader boards, and constant upgrades of benchmarks.


slide-105
SLIDE 105

The end

slide-106
SLIDE 106

References

  • A. Canziani, A. Paszke, and E. Culurciello. An analysis of deep neural network models for practical applications. CoRR, abs/1605.07678, 2016.
  • K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, April 1980.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
  • A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
  • A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems (NIPS), 2012.
  • A. Kumar, O. Irsoy, J. Su, J. Bradbury, R. English, B. Pierce, P. Ondruska, I. Gulrajani, and R. Socher. Ask me anything: Dynamic memory networks for natural language processing. CoRR, abs/1506.07285, 2015.
  • Y. leCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.

slide-107
SLIDE 107
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, Feb. 2015.
  • P. O. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollár. Learning to refine object segments. In European Conference on Computer Vision (ECCV), pages 75–91, 2016.
  • A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Neurocomputing: Foundations of Research, chapter Learning Representations by Back-propagating Errors, pages 696–699. MIT Press, 1988.
  • D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529:484–503, 2016.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. Convolutional pose machines. CoRR, abs/1602.00134, 2016.

slide-108
SLIDE 108
  • Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.
  • S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei. End-to-end learning of action detection from frame glimpses in videos. CoRR, abs/1511.06984, 2015.