SLIDE 1

Large Scale Visual Recognition Challenge (ILSVRC)

http://image-net.org/challenges/LSVRC/

Olga Russakovsky (Stanford U.), Jia Deng (U. of Michigan), Alexander Berg (UNC Chapel Hill), Fei-Fei Li (Stanford U.), Sean Ma (Stanford U.), Jonathan Krause (Stanford U.), Hao Su (Stanford U.), Sanjeev Satheesh (Stanford U.), Zhiheng Huang (Stanford U.), Andrej Karpathy (Stanford U.), Aditya Khosla (MIT), Michael Bernstein (Stanford U.)

SLIDE 2

Backpack

SLIDE 3

Backpack, Flute, Strawberry, Traffic light, Bathing cap, Matchstick, Racket, Sea lion

SLIDE 4

Large-scale recognition

SLIDE 5

Large-scale recognition

Need benchmark datasets

SLIDE 6

PASCAL VOC 2005-2012

Classification: person, motorcycle
Detection
Segmentation

[Example images labeled Person, Motorcycle]

Action: riding bicycle

Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.

20 object classes, 22,591 images

SLIDE 7

Large Scale Visual Recognition Challenge (ILSVRC) 2010-2014

PASCAL VOC: 20 object classes, 22,591 images
ILSVRC DET: 200 object classes, 517,840 images
ILSVRC CLS-LOC: 1000 object classes, 1,431,167 images

[Example images labeled Person, Dog]

http://image-net.org/challenges/LSVRC/

SLIDE 8

Variety of object classes in ILSVRC

Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei. Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013. http://image-net.org/challenges/LSVRC/2012/analysis

SLIDE 9

Variety of object classes in ILSVRC

ILSVRC detection

ILSVRC classification and localization

SLIDE 10

Challenge procedure every year

1. Training and validation data released: images and annotations
2. Test data released: images only (annotations hidden)
3. Participants train their models on the training and validation data
4. Participants submit a text file with predictions on the test images
5. We evaluate and release results, and run a workshop

http://image-net.org/challenges/LSVRC/2014/eccv2014

SLIDE 11

Participation in ILSVRC over the years

[Bar chart: number of entries per year. 2010-2012 (3 years): ILSVRC 2010, ILSVRC 2011, ILSVRC 2012. Last year: ILSVRC 2013, 81 entries.]

SLIDE 12

Participation in ILSVRC over the years

[Bar chart: number of entries per year. 2010-2012 (3 years): ILSVRC 2010, ILSVRC 2011, ILSVRC 2012. Last year: ILSVRC 2013, 81 entries. This year: ILSVRC 2014, 123 entries.]

SLIDE 13

Experiment this year: open vs. closed submissions

Offered all teams an option:
– Open = promise to reveal their method
– Closed = participate without revealing the method

Almost all teams chose to be “open” (31/36)
– And 2 of the “closed” teams still presented spotlights and posters at the workshop

SLIDE 14

ILSVRC in detail: history and current state of the art

ImageNet Large Scale Visual Recognition Challenge. Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander Berg, Li Fei-Fei. http://arxiv.org/abs/1409.0575

Describes the construction of the ILSVRC datasets
Highlights the most successful algorithms
Provides statistical analysis of the results through ILSVRC2014
Compares computer vision accuracy with human-level accuracy

SLIDE 15

Some questions for today

1. Are computers good at large-scale recognition?

2. Are all objects equally easy for computers?

3. Are we better than computers at recognition?

SLIDE 17

ILSVRC image classification task

Steel drum

SLIDE 18

ILSVRC image classification task

Ground truth: Steel drum

Output: Scale, T-shirt, Steel drum, Drumstick, Mud turtle ✔
Output: Scale, T-shirt, Giant panda, Drumstick, Mud turtle ✗

SLIDE 19

ILSVRC image classification task

Ground truth: Steel drum

Output: Scale, T-shirt, Steel drum, Drumstick, Mud turtle ✔
Output: Scale, T-shirt, Giant panda, Drumstick, Mud turtle ✗

$$\text{Error} = \frac{1}{100{,}000} \sum_{i=1}^{100{,}000} \mathbb{1}[\text{incorrect on image } i]$$

where the sum runs over the 100,000 test images.
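A minimal Python sketch of this error, assuming each submission is a list of up to five class names per image and each image has a single ground-truth label. The function and variable names are illustrative only, not part of any official ILSVRC toolkit:

```python
from typing import Sequence


def top5_error(predictions: Sequence[Sequence[str]], labels: Sequence[str]) -> float:
    """Fraction of images whose ground-truth label is missing from the first five guesses."""
    if len(predictions) != len(labels):
        raise ValueError("need exactly one list of guesses per image")
    if not labels:
        raise ValueError("need at least one image")
    incorrect = 0
    for guesses, truth in zip(predictions, labels):
        # 1[incorrect on image i]: only the first five guesses count, in any order.
        if truth not in list(guesses)[:5]:
            incorrect += 1
    return incorrect / len(labels)


outputs = [
    ["scale", "T-shirt", "steel drum", "drumstick", "mud turtle"],   # correct
    ["scale", "T-shirt", "giant panda", "drumstick", "mud turtle"],  # incorrect
]
print(top5_error(outputs, ["steel drum", "steel drum"]))  # 0.5
```

Because the order of the five guesses does not matter, a model is not penalized for listing other plausible objects in the image before the labeled one.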

SLIDE 20

ILSVRC2014 classification results

| Team name | Error (%) |
|---|---|
| GoogLeNet | 6.7 |
| VGG | 7.3 |
| MSRA Visual computing | 8.1 |
| Andrew Howard | 8.1 |
| DeeperVision | 9.5 |
| NUS-BST | 9.8 |
| TTIC_ECP – Epitomic Vision | 10.2 |
| XYZ | 11.2 |

Other participating teams: Adobe-UIUC, BDC-I2R-UPMC, BREIL_KAIST, Brno University, CASIA_CRIPAC_Weak_Supervision, Cldi-KAIST, DeepCNet, Fengjun Lv, libccv, MIL, Orange-BUPT, PassBy, SCUT_GLH, SYSU_Vision, Trimps-Soushen, UI, UvA-Euvision

VGG: Karen Simonyan, Andrew Zisserman (University of Oxford)

GoogLeNet: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Drago Anguelov, Dumitru Erhan, Andrew Rabinovich (Google)

http://image-net.org/challenges/LSVRC/2014/eccv2014

SLIDE 21

ILSVRC over the years

[Chart: classification error of the winning entry per year. 2010: 0.28, 2011: 0.26, 2012: 0.16, 2013: 0.12, 2014: 0.07]

1.7x reduction in classification error since last year
4.2x reduction in classification error since 2010

http://arxiv.org/abs/1409.0575
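A quick arithmetic check of the two ratios. We assume they were computed from the winning entries' errors as listed in the confidence-interval table later in this deck (Clarifai 11.20% in 2013, GoogLeNet 6.66% in 2014) and the rounded 0.28 shown here for 2010:

```python
# Top-5 classification error of the winning entry, as fractions.
error_2010 = 0.28    # rounded value from the chart
error_2013 = 0.1120  # Clarifai, ILSVRC2013
error_2014 = 0.0666  # GoogLeNet, ILSVRC2014

since_last_year = error_2013 / error_2014
since_2010 = error_2010 / error_2014
print(f"{since_last_year:.1f}x since 2013, {since_2010:.1f}x since 2010")
# 1.7x since 2013, 4.2x since 2010
```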

SLIDE 22

What changed in ILSVRC classification?

Year 2010: NEC-UIUC [Lin CVPR 2011]
Dense grid descriptor: HOG, LBP
Coding: local coordinate, super-vector
Pooling, SPM
Linear SVM

Year 2012: SuperVision [Krizhevsky NIPS 2012]

Year 2014: GoogLeNet [Szegedy arxiv 2014], VGG [Simonyan arxiv 2014], MSRA [He arxiv 2014]

[Network diagrams; legend: Convolution, Pooling, Softmax, Other]

SLIDE 23

What changed in ILSVRC2014 classification?

1. Networks became deeper

Inception module [Going deeper with convolutions. Szegedy et al. 2014]

More layers but with smaller kernels (3x3 convolution, 2x2 pooling) [Very Deep Convolutional Networks. Simonyan and Zisserman. 2014]

http://image-net.org/challenges/LSVRC/2014/eccv2014
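One way to see why stacking small kernels pays off: n stride-1 convolutions of size k x k see a window 1 + n(k - 1) pixels wide, so three 3x3 layers cover the same 7x7 region as a single 7x7 layer while using fewer weights. The sketch below assumes every layer maps C channels to C channels and ignores biases; the channel width is an illustrative choice, not a value from the slides:

```python
def stacked_receptive_field(num_layers: int, kernel: int) -> int:
    """Width in pixels seen by num_layers stride-1 convolutions of size kernel x kernel."""
    return 1 + num_layers * (kernel - 1)


def stacked_weights(num_layers: int, kernel: int, channels: int) -> int:
    """Weights in num_layers convolutions mapping `channels` to `channels` maps (no biases)."""
    return num_layers * kernel * kernel * channels * channels


channels = 256  # illustrative width, not taken from the slides
for layers, kernel in [(3, 3), (1, 7)]:
    print(f"{layers} layer(s) of {kernel}x{kernel}: "
          f"receptive field {stacked_receptive_field(layers, kernel)}, "
          f"weights {stacked_weights(layers, kernel, channels):,}")
# 3 layer(s) of 3x3: receptive field 7, weights 1,769,472
# 1 layer(s) of 7x7: receptive field 7, weights 3,211,264
```

The stacked version also places two extra nonlinearities between input and output, which is part of why "more layers, smaller kernels" helps.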

SLIDE 24

What changed in ILSVRC2014 classification?

2. Fully connected layers were (sometimes) removed

Network in Network. Min Lin, Qiang Chen and Shuicheng Yan. ICLR 2014. Also used in GoogLeNet and others.

http://image-net.org/challenges/LSVRC/2014/eccv2014
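In Network in Network, the last convolutional layer produces one feature map per class, and global average pooling collapses each map to a single score that goes straight into the softmax, with no fully connected layer in between. A minimal NumPy sketch of that classifier head, assuming a (classes, height, width) feature map; the sizes and random inputs are placeholders:

```python
import numpy as np


def global_average_pool(features: np.ndarray) -> np.ndarray:
    """Average each channel of a (channels, height, width) map over all spatial positions."""
    return features.mean(axis=(1, 2))


def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()


rng = np.random.default_rng(0)
num_classes = 10  # placeholder; ILSVRC classification uses 1000 classes
feature_map = rng.standard_normal((num_classes, 7, 7))
class_scores = global_average_pool(feature_map)  # shape (num_classes,), no learned weights
probabilities = softmax(class_scores)
print(int(probabilities.argmax()), class_scores.shape)
```

Because the pooling step has no parameters, this head avoids the large weight matrices that fully connected layers would otherwise contribute.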

SLIDE 25

What changed in ILSVRC2014 classification?

3. Almost all successful systems used:
Extensive data augmentation
Multiscale training across more scales
Network fusion

And, most importantly, …

http://image-net.org/challenges/LSVRC/2014/eccv2014

SLIDE 26

What changed in ILSVRC2014 classification?

3. Almost all successful systems used:
Extensive data augmentation
Multiscale training across more scales
Network fusion

And, most importantly, …

Caffe!

http://image-net.org/challenges/LSVRC/2014/eccv2014
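A minimal NumPy sketch of two common augmentations: scale jittering (rescale so the shorter side is a randomly drawn length), followed by a random crop with a random horizontal flip. The 256-512 range and the 224-pixel crop are illustrative assumptions, not settings reported on these slides, and the nearest-neighbour resize stands in for a proper image library:

```python
import numpy as np


def rescale_shorter_side(image: np.ndarray, target: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image so that min(H, W) == target."""
    h, w = image.shape[:2]
    scale = float(target) / min(h, w)
    new_h, new_w = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows][:, cols]


def random_crop_and_flip(image: np.ndarray, crop: int, rng: np.random.Generator) -> np.ndarray:
    """Cut a random crop x crop patch and mirror it left-right with probability 0.5."""
    h, w = image.shape[:2]
    if h < crop or w < crop:
        raise ValueError("image is smaller than the crop")
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # stand-in for a photo
shorter_side = int(rng.integers(256, 513))  # scale jitter: shorter side drawn from [256, 512]
patch = random_crop_and_flip(rescale_shorter_side(image, shorter_side), 224, rng)
print(patch.shape)  # (224, 224, 3)
```

Each call yields a different view of the same labeled image, so the network sees many scales and positions of every training object.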

SLIDE 27

Are the winning classification systems really significantly more accurate?

| Year | Team name | Error (%) | 99.9% confidence interval |
|---|---|---|---|
| 2014 | GoogLeNet | 6.66 | 6.40 - 6.92 |
| 2014 | VGG | 7.32 | 7.05 - 7.60 |
| 2014 | MSRA | 8.06 | 7.78 - 8.34 |
| 2014 | AHoward | 8.11 | 7.83 - 8.39 |
| … | | | |
| 2013 | Clarifai | 11.20 | 10.87 - 11.53 |
| … | | | |
| 2012 | SuperVision | 15.32 | 14.94 - 15.69 |

http://arxiv.org/abs/1409.0575
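Intervals like these can be obtained by resampling the test set. A minimal sketch, assuming a percentile bootstrap over test images; the number of rounds, the seed, and the synthetic 0/1 error flags are our assumptions, not details from the slide. Resampling n images with replacement and counting the errors is distributed as Binomial(n, p), where p is the observed error rate, so the sketch draws those counts directly instead of building each resample:

```python
import numpy as np


def bootstrap_error_ci(incorrect, confidence=0.999, rounds=20_000, seed=0):
    """Point estimate and percentile-bootstrap interval for an error rate.

    `incorrect` holds one flag per test image (True = the submission got it wrong).
    """
    flags = np.asarray(incorrect, dtype=bool)
    n = flags.size
    p_hat = flags.mean()
    rng = np.random.default_rng(seed)
    # Error counts of resampled test sets follow Binomial(n, p_hat); sample them directly.
    resampled = rng.binomial(n, p_hat, size=rounds) / n
    alpha = 1.0 - confidence
    low, high = np.quantile(resampled, [alpha / 2, 1 - alpha / 2])
    return float(p_hat), float(low), float(high)


flags = np.zeros(100_000, dtype=bool)
flags[:6_660] = True  # synthetic: 6.66% of 100,000 test images wrong
error, low, high = bootstrap_error_ci(flags)
print(f"{100 * error:.2f}% ({100 * low:.2f} - {100 * high:.2f})")
```

With 100,000 test images the interval for a 6.66% error should land close to the 6.40 - 6.92 shown for GoogLeNet; non-overlapping intervals, as between GoogLeNet and VGG, suggest the gap is not a sampling accident.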

SLIDE 28

ILSVRC classification + localization task

Steel drum

SLIDE 29

ILSVRC classification + localization task

Ground truth: Steel drum

Output (each label with a bounding box): Folding chair, Persian cat, Loudspeaker, Steel drum, Picket fence ✔

SLIDE 30

ILSVRC classification + localization task

Ground truth: Steel drum

Output: Folding chair, Persian cat, Loudspeaker, Steel drum, Picket fence ✔
Output (bad localization): Folding chair, Persian cat, Loudspeaker, Steel drum, Picket fence ✗
Output (bad classification): Folding chair, Persian cat, Loudspeaker, Picket fence, King penguin ✗

SLIDE 31

ILSVRC classification + localization task

Ground truth: Steel drum

Output: Folding chair, Persian cat, Loudspeaker, Steel drum, Picket fence ✔

$$\text{Error} = \frac{1}{100{,}000} \sum_{i=1}^{100{,}000} \mathbb{1}[\text{incorrect on image } i]$$
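A minimal sketch of the per-image check behind this error. We assume boxes are (x_min, y_min, x_max, y_max) and that a guess counts only if it names the right class and its box has intersection over union (IoU) of at least 0.5 with some ground-truth box of that class; the threshold and box format are assumptions, not values stated on the slide:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def localization_correct(guesses: Sequence[Tuple[str, Box]], truth_label: str,
                         truth_boxes: List[Box], threshold: float = 0.5) -> bool:
    """True if one of the first five (label, box) guesses names the right class
    and its box overlaps a ground-truth box of that class by at least `threshold`."""
    for label, box in list(guesses)[:5]:
        if label == truth_label and any(iou(box, gt) >= threshold for gt in truth_boxes):
            return True
    return False


truth = [(10, 10, 110, 110)]  # hypothetical steel-drum box
good = [("folding chair", (0, 0, 30, 30)), ("steel drum", (12, 8, 108, 112))]
bad_box = [("steel drum", (150, 150, 250, 250))]
bad_label = [("king penguin", (10, 10, 110, 110))]
print([localization_correct(g, "steel drum", truth) for g in (good, bad_box, bad_label)])
# [True, False, False]
```

Averaging `not localization_correct(...)` over the test images gives the localization error in the same form as the classification error.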

SLIDE 32

ILSVRC2014 localization results

| Team name | Error (%) |
|---|---|
| VGG | 25.3 |
| GoogLeNet | 26.4 |
| Adobe-UIUC | 30.1 |
| SYSU_Vision | 31.9 |
| MIL | 33.7 |
| MSRA Visual computing | 35.5 |
| Trimps-Soushen | 42.2 |
| ORANGE-BUPT | 42.7 |

Other participating teams: Andrew Howard, BDC-I2R-UPMC, BREIL_KAIST, Brno University of Technology, Cldi-KAIST, DeepCNet, DeeperVision, Fengjun Lv, libccv, NUS-BST, PassBy, SCUT_GLH, TTIC_ECP – Epitomic Vision, UI, UvA-Euvision, XYZ

VGG: Karen Simonyan, Andrew Zisserman (University of Oxford)

GoogLeNet: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Drago Anguelov, Dumitru Erhan, Andrew Rabinovich (Google)

http://image-net.org/challenges/LSVRC/2014/eccv2014

SLIDE 33

ILSVRC object detection task

Fully annotated 200 object classes across 120,000 images
Allows evaluation of generic object detection in cluttered scenes at scale

[Example image labeled Person, Car, Motorcycle, Helmet]

Modeled after PASCAL VOC

SLIDE 34

ILSVRC object detection data

SLIDE 35

ILSVRC object detection task

[Example image labeled Person, Car, Motorcycle, Helmet]

All instances of all target object classes are expected to be localized on all test images.

Evaluation modeled after PASCAL VOC:
Algorithm outputs a list of bounding-box detections with confidences
A detection is considered correct if its IOU with the ground truth > threshold
Evaluated by average precision per object class
The winner of the challenge is the team that wins the most object categories

Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.
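A minimal sketch of per-class average precision under these rules. Assumptions made for illustration: boxes are (x_min, y_min, x_max, y_max), each detection is matched to the ground-truth box it overlaps most, a second detection of an already-matched object counts as a false positive, and AP is the area under the precision-recall curve after making precision monotone, as in later PASCAL VOC editions. ILSVRC's exact matching and interpolation details may differ, and the default threshold of 0.5 is an assumption:

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Tuple[str, float, Box]],
                      ground_truth: Dict[str, List[Box]],
                      threshold: float = 0.5) -> float:
    """AP for one class. detections: (image_id, confidence, box); ground_truth: image_id -> boxes."""
    num_gt = sum(len(boxes) for boxes in ground_truth.values())
    if num_gt == 0:
        return 0.0
    claimed = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits = []  # 1 = true positive, 0 = false positive, in order of decreasing confidence
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        overlaps = [iou(box, gt) for gt in ground_truth.get(img, [])]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=-1)
        if best >= 0 and overlaps[best] > threshold and not claimed[img][best]:
            claimed[img][best] = True
            hits.append(1)
        else:
            hits.append(0)  # no object, too little overlap, or a duplicate detection
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Precision envelope: each point takes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


truth = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
dets = [
    ("img1", 0.9, (0, 0, 10, 10)),  # true positive
    ("img1", 0.8, (1, 1, 10, 10)),  # same object again: false positive
    ("img2", 0.7, (0, 0, 10, 9)),   # true positive (IoU 0.9)
]
print(average_precision(dets, truth))
```

Ranking by confidence is what makes AP reward detectors that are sure about their correct boxes and unsure about their mistakes.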

SLIDE 36

ILSVRC2014 object detection approach #1

R-CNN: Regions with CNN features

Rich feature hierarchies for accurate object detection. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. http://arxiv.org/abs/1311.2524

http://image-net.org/challenges/LSVRC/2014/eccv2014

SLIDE 37

ILSVRC2014 object detection approach #2

SPP-net: Spatial Pyramid Pooling

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. http://arxiv.org/abs/1406.4729

http://image-net.org/challenges/LSVRC/2014/eccv2014
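R-CNN runs the CNN separately on every warped region proposal; SPP-net instead computes convolutional features once per image and max-pools each region's window of the feature map into a fixed grid of bins at several pyramid levels, giving a fixed-length vector whatever the window size. A minimal NumPy sketch of that pooling step; the levels (1, 2, 4) and the floor/ceil bin boundaries are our assumptions and may not match the paper's exact settings:

```python
import math

import numpy as np


def spatial_pyramid_pool(features: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Max-pool a (channels, H, W) map into n x n bins for each level n and concatenate."""
    _, h, w = features.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil boundaries keep every bin non-empty even when H or W < n.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(features[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
small_window = rng.standard_normal((256, 13, 13))
large_window = rng.standard_normal((256, 20, 31))
print(spatial_pyramid_pool(small_window).shape, spatial_pyramid_pool(large_window).shape)
# (5376,) (5376,)
```

Both windows map to 256 x (1 + 4 + 16) = 5376 values, so the same fully connected layers can follow regardless of region size.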

SLIDE 38

ILSVRC detection since 2013

…n image classification

[Chart: best detection average precision, 23% (2013) and 44% (2014)]

1.9x increase in object detection average precision in one year

http://arxiv.org/abs/1409.0575

SLIDE 39

ILSVRC detection since 2013

…n image classification

[Chart: best detection average precision, 23% (2013) and 44% (2014)]

1.9x increase in object detection average precision in one year

http://arxiv.org/abs/1409.0575

~3% due to more data, ~18% due to better methods
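A quick arithmetic check of this slide, assuming the two values are the best average precision in ILSVRC2013 (23%) and ILSVRC2014 (44%):

```python
ap_2013 = 23.0            # percent
ap_2014 = 44.0
gain_from_data = 3.0      # "~3% due to more data"
gain_from_methods = 18.0  # "~18% due to better methods"

print(f"{ap_2014 / ap_2013:.1f}x increase")  # 1.9x increase
# The two approximate contributions account for the whole 21-point gain.
print(ap_2013 + gain_from_data + gain_from_methods == ap_2014)  # True
```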

SLIDE 40

Some questions for today

1. Are computers good at large-scale recognition?

2. Are all objects equally easy for computers?

3. Are we better than computers at recognition?

SLIDE 41

Some questions for today

1. Are computers good at large-scale recognition?
Yes! CNNs are getting deeper, accuracy is getting better.

2. Are all objects equally easy for computers?

3. Are we better than computers at recognition?

SLIDE 43

Easiest classification categories

(Highest accuracy in percent achieved by any method in ILSVRC2012-ILSVRC2014)

… and 111 more categories with 100% accuracy!

http://arxiv.org/abs/1409.0575

SLIDE 44

Hardest classification categories

(Highest accuracy in percent achieved by any method in ILSVRC2012-ILSVRC2014)

http://arxiv.org/abs/1409.0575

SLIDE 45

Easiest localization categories

(Highest accuracy in percent achieved by any method in ILSVRC2012-ILSVRC2014)

http://arxiv.org/abs/1409.0575

SLIDE 46

Hardest localization categories

(Highest accuracy in percent achieved by any method in ILSVRC2012-ILSVRC2014)

http://arxiv.org/abs/1409.0575

SLIDE 47

Easiest object detection categories

(Highest average precision in percent achieved by any method in ILSVRC2013 and ILSVRC2014)

http://arxiv.org/abs/1409.0575

SLIDE 48

Hardest object detection categories

(Highest average precision in percent achieved by any method in ILSVRC2013 and ILSVRC2014)

http://arxiv.org/abs/1409.0575

SLIDE 49

Smaller objects are not necessarily harder

Each dot is an object class
X-axis: average fraction of image area occupied by an instance of that class on the validation set
Y-axis: highest accuracy achieved by any method in ILSVRC2012-ILSVRC2014

[Scatter plot; labeled classes: letter opener, restaurant, hook, basketball, website, Dalmatian]

http://arxiv.org/abs/1409.0575
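A minimal sketch of the x-axis quantity, assuming validation annotations are available as (class name, bounding box, image width, image height) tuples and that an instance's area is its bounding-box area. The input format and sample values are hypothetical, and the paper's exact definition of object scale may differ:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (class name, (x_min, y_min, x_max, y_max), image width, image height)
Annotation = Tuple[str, Tuple[float, float, float, float], float, float]


def average_object_scale(annotations: Iterable[Annotation]) -> Dict[str, float]:
    """Mean fraction of image area covered by one instance, per class."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for name, (x0, y0, x1, y1), width, height in annotations:
        total[name] += (x1 - x0) * (y1 - y0) / (width * height)
        count[name] += 1
    return {name: total[name] / count[name] for name in total}


sample = [
    ("dalmatian", (0, 0, 50, 40), 100, 100),
    ("dalmatian", (0, 0, 100, 100), 100, 100),
    ("letter opener", (0, 0, 10, 5), 100, 100),
]
print(average_object_scale(sample))
```

Plotting this per-class value against per-class accuracy gives the scatter plot on this slide.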

SLIDE 50

Smaller objects are not necessarily harder

Each dot is an object class
X-axis: average fraction of image area occupied by an instance of that class on the validation set
Y-axis: highest accuracy achieved by any method in ILSVRC2012-ILSVRC2014

[Scatter plots; labeled classes: letter opener, restaurant, hook, basketball, website, Dalmatian, space bar, ladle, spider web, bearskin. A further plot of average scale of object vs. average precision labels sofa, lion, basketball, volleyball, rubber eraser.]

http://arxiv.org/abs/1409.0575

SLIDE 51

Manually annotated object class properties

Olga Russakovsky, Jia Deng, Zhiheng Huang, Alex Berg, Li Fei-Fei. Detecting avocados to zucchinis: what have we done, and where are we going? ICCV 2013.

SLIDE 52

Textured objects are easier

http://arxiv.org/abs/1409.0575

SLIDE 54

Some questions for today

1. Are computers good at large-scale recognition?
Yes! CNNs are getting deeper, accuracy is getting better.

2. Are all objects equally easy for computers?

3. Are we better than computers at recognition?

SLIDE 55

Some questions for today

1. Are computers good at large-scale recognition?
Yes! CNNs are getting deeper, accuracy is getting better.

2. Are all objects equally easy for computers?
No. Thin and untextured objects are still hard for computers.

3. Are we better than computers at recognition?

SLIDE 57

What is human accuracy on ILSVRC2014 classification?

Andrej Karpathy

But the data is manually annotated, so isn’t human accuracy 100%?

Current crowdsourcing annotation interface: “Is this a badger? Yes or No”

Very different from: “Which one of 1000 classes is this?”

http://arxiv.org/abs/1409.0575

SLIDE 58

New web-based annotation interface with 1000 object classes

http://bit.ly/ilsvrclabel

http://arxiv.org/abs/1409.0575

SLIDE 59

Human vs. computer accuracy on ILSVRC2014 classification

Compared expert human annotators with the winning GoogLeNet entry
Annotator 1 achieved better accuracy than GoogLeNet by 1.7 percentage points (p = 0.022)
The task required a significant amount of training for humans

| | Annotator 1 | Annotator 2 |
|---|---|---|
| Total number of images | 1500 | 258 |
| GoogLeNet classification error | 6.8% | 5.8% |
| Human classification error | 5.1% | 12.0% |

http://arxiv.org/abs/1409.0575

SLIDE 61

Human vs. computer accuracy on ILSVRC2014 classification

| | GoogLeNet correct | GoogLeNet wrong |
|---|---|---|
| Human correct | 1352/1500 | 72/1500 |
| Human wrong | 46/1500 | 30/1500 |

http://arxiv.org/abs/1409.0575
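The four cells recover both error rates for Annotator 1: GoogLeNet is wrong on 72 + 30 images and the human on 46 + 30. The sketch below checks this and, for illustration, runs an exact McNemar (sign) test on the 118 images where exactly one of the two was wrong. We do not know which test produced the reported p = 0.022, so treat this p-value as a comparison, not a reproduction:

```python
from math import comb

# Annotator 1 vs. GoogLeNet on the same 1500 images (counts from the slide).
both_correct = 1352
googlenet_wrong_human_correct = 72
googlenet_correct_human_wrong = 46
both_wrong = 30
total = both_correct + googlenet_wrong_human_correct + googlenet_correct_human_wrong + both_wrong

googlenet_error = (googlenet_wrong_human_correct + both_wrong) / total  # 102 / 1500
human_error = (googlenet_correct_human_wrong + both_wrong) / total      # 76 / 1500


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test: under the null, each disagreement is a fair coin flip."""
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = mcnemar_exact(googlenet_wrong_human_correct, googlenet_correct_human_wrong)
print(f"total {total}, GoogLeNet {100 * googlenet_error:.1f}%, human {100 * human_error:.1f}%")
# total 1500, GoogLeNet 6.8%, human 5.1%
print(f"exact McNemar p = {p_value:.3f}")
```

Only the disagreement cells (72 and 46) enter a paired test; images both got right or both got wrong carry no information about which is better.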

SLIDE 62

Human vs. computer accuracy on ILSVRC2014 classification

| | GoogLeNet correct | GoogLeNet wrong |
|---|---|---|
| Human correct | 1352/1500 | 72/1500 |
| Human wrong | 46/1500 | 30/1500 |

GoogLeNet wrong, human correct (72/1500):
Objects very small or thin
Abstract representations
Image filters

http://arxiv.org/abs/1409.0575

SLIDE 64

Human vs. computer accuracy on ILSVRC2014 classification

| | GoogLeNet correct | GoogLeNet wrong |
|---|---|---|
| Human correct | 1352/1500 | 72/1500 |
| Human wrong | 46/1500 | 30/1500 |

GoogLeNet wrong, human correct (72/1500):
Objects very small or thin
Abstract representations
Image filters

GoogLeNet correct, human wrong (46/1500):
Fine-grained recognition
Class unawareness
Insufficient training data

http://arxiv.org/abs/1409.0575

SLIDE 66

Human vs. computer accuracy on ILSVRC2014 classification

| | GoogLeNet correct | GoogLeNet wrong |
|---|---|---|
| Human correct | 1352/1500 | 72/1500 |
| Human wrong | 46/1500 | 30/1500 |

GoogLeNet wrong, human correct (72/1500):
Objects very small or thin
Abstract representations
Image filters

GoogLeNet correct, human wrong (46/1500):
Fine-grained recognition
Class unawareness
Insufficient training data

Both wrong (30/1500):
Multiple objects
Incorrect annotations

http://arxiv.org/abs/1409.0575

SLIDE 68

Some questions for today

1. Are computers good at large-scale recognition?
Yes! CNNs are getting deeper, accuracy is getting better.

2. Are all objects equally easy for computers?
No. Thin and untextured objects are still hard for computers.

3. Are we better than computers at recognition?

SLIDE 69

Some questions for today

1. Are computers good at large-scale recognition?
Yes! CNNs are getting deeper, accuracy is getting better.

2. Are all objects equally easy for computers?
No. Thin and untextured objects are still hard for computers.

3. Are we better than computers at recognition?
Not always… We are worse than computers at large-scale fine-grained classification.

SLIDE 70

So have we solved computer vision?

SLIDE 71

So have we solved computer vision?

RCNN output: Person, Person, Table, Table, TV, Backpack

SLIDE 72

So have we solved computer vision?

[Same scene annotated with: Male, brown hair; Tall male, wearing pants; Wooden table; Wooden table; TV facing away from camera; Black backpack]

SLIDE 73

So have we solved computer vision?

SLIDE 74

So have we solved computer vision?

Not quite yet

SLIDE 75

So have we solved computer vision?

Not quite yet

But you should still read our paper:

ImageNet Large Scale Visual Recognition Challenge
O. Russakovsky*, J. Deng*, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, L. Fei-Fei
http://arxiv.org/abs/1409.0575