Where have we been? Where are we going?
LI FEI-FEI
The Beginning: CVPR 2009
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.
Citations
Why Deep Learning is Suddenly Changing Your Life
By Roger Parloff
The Great Artificial Intelligence Awakening
By Gideon Lewis-Kraus
The data that transformed AI research—and possibly the world
By Dave Gershgorn
SpaceNet
DigitalGlobe, CosmiQ Works, NVIDIA
ShapeNet
A. Chang et al, 2015
MusicNet
EventNet
Medical ImageNet
Stanford Radiology, 2017
ActivityNet
Hosted Datasets
Commercial Competitions
Data Scientists
ML Models Submitted
Student Competitions
ALEXANDER WISSNER-GROSS, Edge.org, 2016
Lotus Hill (2007)
Yao et al, 2007
ESP (2006)
Ahn et al, 2006
LabelMe (2005)
Russell et al, 2005
MSRC (2006)
Shotton et al. 2006
CalTech 101/256 (2005)
Fei-Fei et al, 2004; Griffin et al, 2007
TinyImage (2008)
Torralba et al. 2008
PASCAL (2007)
Everingham et al, 2009
CAVIAR Tracking (2005)
Middlebury Stereo (2002)
UIUC Cars (2004)
FERET Faces (1998)
Huang, P. Raus
CMU/VASC Faces (1998)
MNIST digits (1998-10)
Y LeCun & C. Cortes
COIL Objects (1996)
3D Textures (2005)
CUReT Textures (1999)
KTH human action (2004)
Sign Language (2008)
Zisserman
Segmentation (2001)
Figure: error versus model capacity, showing training error, generalization error, and the generalization gap, with an underfitting zone and an overfitting zone on either side of the optimal capacity.
Fei-Fei et al, 2003, 2004
Figure: Global Data Traffic (PB/month), vertical axis up to 15,000. Source: Cisco
Original paper by [George Miller, et al 1990] cited over 5,000 times
Organizes over 150,000 words into 117,000 categories called synsets.
Establishes lexical relationships in NLP and related tasks.
Senior Research Scholar, Computer Science Department, Princeton
President, Global WordNet Consortium
German shepherd: breed of large shepherd dogs used in police work and as a guide for the blind.
microwave: kitchen appliance that cooks food by passing an electromagnetic wave through it.
mountain: a land mass that projects well above its surroundings; higher than a hill.
jacket: a short coat
A massive ontology of images to transform computer vision
Individually Illustrated WordNet Nodes
Princeton Jia Deng 1st Ph.D. student Princeton
Entity → Mammal → Dog → German Shepherd
40,000 × 10,000 × 3 / 2 = 600,000,000 sec ≈ 19 years
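The estimate above can be checked with a few lines of Python. This is a minimal sketch: the slide gives only the numbers, so reading the factors as 40,000 categories, 10,000 candidate images per category, 3 verification passes per image, and 2 images checked per second is an assumption, and the function and parameter names are hypothetical.

```python
# Back-of-the-envelope labeling time, mirroring the arithmetic on the slide.
# The meaning attached to each factor is an assumption; the slide shows only numbers.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def labeling_time_seconds(categories, images_per_category, passes, images_per_second):
    """Total seconds for one annotator to work through every candidate image."""
    return categories * images_per_category * passes / images_per_second


total_seconds = labeling_time_seconds(40_000, 10_000, 3, 2)
total_years = total_seconds / SECONDS_PER_YEAR
print(f"{total_seconds:,.0f} sec is about {total_years:.1f} years")
```

The result is for a single person working without breaks, which is why the figure reads as an argument for spreading the work across many annotators.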
Human-generated datasets transcend algorithmic limitations, leading to better machine perception. Machine-generated datasets can only match the best algorithms of the time.
Per-Object Regions and Labels Russell et al, 2005
Hand-Traced Parse Trees Yao et al, 2007
[Deng et al. ’09]
SUN, 131K
[Xiao et al. ’10]
LabelMe, 37K
[Russell et al. ’07]
PASCAL VOC, 30K
[Everingham et al. ’06-’12]
Caltech101, 9K
[Fei-Fei, Fergus, Perona, ’03]
To better replicate human visual acuity
To ensure immediate application and a sense of community
To create a benchmarking dataset and advance the state of machine perception, not merely reflect it
Carnivore
Olga Russakovsky, Stanford
Fei-Fei Li, Stanford
Alex Berg, UNC Chapel Hill
Wei Liu, UNC Chapel Hill
Eunbyung Park, UNC Chapel Hill
Sean Ma, Stanford
Jonathan Krause, Stanford
Sanjeev Satheesh, Stanford
Hao Su, Stanford
Aditya Khosla, Stanford
Zhiheng Huang, Stanford
Jia Deng
1973-2012
Alex Berg, Jia Deng, Fei-Fei Li, Wei Liu, Olga Russakovsky
Figure: ILSVRC, 2010-2016. Number of entries per year (bar labels: 35, 29, 81, 123, 157, 172). Classification errors (top-5): 0.28 to 0.03. Average precision for object detection: 0.23 to 0.66.
Statistics        PASCAL VOC 2012   ILSVRC 2013   Ratio
Object classes    20                200           10x
Training images   5.7K              395K          70x
Objects           13.6K             345K          25x
Table Chair Horse Dog Cat Bird
# classes: 200 # annotations = 80M!
“Cardigan Welsh Corgi” “Pembroke Welsh Corgi”
[Gebru, Krause, Deng, Fei-Fei, CHI 2017]
ImageNet becomes a benchmark
Machine learning advances and changes dramatically
Breakthroughs in
Krizhevsky, Sutskever & Hinton, NIPS 2012
Citations
[Krizhevsky et al. NIPS 2012]
“AlexNet”
[Szegedy et al. CVPR 2015]
“GoogLeNet”
[Simonyan & Zisserman, ICLR 2015]
“VGG Net”
[He et al. CVPR 2016]
“ResNet”
Neural Nets GPUs
Figure: taxonomy tree linked by "is a" edges, with Wombat as the example leaf. Nodes: Thing, Animalia, Chordate, Arthropoda, Mammal, Insect, Carnivora, Diptera, Felidae, Muscidae, Felis, Musca, Housefly, Domestica, Domestica, Leo, Lion, House Cat, Primate, Pongidae, Pan, Troglodytes, Chimpanzee, Hominidae, Homo, Sapiens, Human, Marsupial, Wombat.
Deng, Krause, Berg & Fei-Fei, CVPR 2012
Thing → Animal → Mammal → Marsupial → Wombat
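The chain from Thing down to Wombat is a path in an is-a hierarchy. A minimal sketch of that data structure, assuming a hypothetical child-to-parent map built only from the five nodes named in the chain; it illustrates the hierarchy itself and is not the model from the cited paper.

```python
# Hypothetical child -> parent map covering only the chain shown on the slide.
PARENT = {
    "Wombat": "Marsupial",
    "Marsupial": "Mammal",
    "Mammal": "Animal",
    "Animal": "Thing",
}


def path_to_root(node, parent=PARENT):
    """Return the is-a chain ordered from the root down to `node`."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain[::-1]


def is_a(node, ancestor, parent=PARENT):
    """True if `ancestor` lies on the path from the root to `node` (a node counts as itself)."""
    return ancestor in path_to_root(node, parent)


print(" -> ".join(path_to_root("Wombat")))
print(is_a("Wombat", "Mammal"), is_a("Mammal", "Wombat"))
```

Any node on the path is a correct but less specific label for the leaf, which is what makes a hierarchy useful when the most specific label is uncertain.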
Our Model
Kuettel, Guillaumin, Ferrari. Segmentation Propagation in ImageNet. ECCV 2012 (ECCV 2012 Best Paper Award)
“First, we find that the performance on vision tasks still increases linearly with orders of magnitude of training data size.”
Andrej Karpathy. http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
GoogLeNet
Top-5 error rate
Susceptible to:
Human
Top-5 error rate
Susceptible to:
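For reference, top-5 error counts a prediction as correct when the true label appears among the five highest-scoring guesses. The sketch below is illustrative only: the function name, the dictionary-of-scores input format, and the two toy images are assumptions, not data from the talk.

```python
def top5_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring labels."""
    misses = 0
    for scores, truth in zip(scores_per_image, true_labels):
        # Sort labels by score, highest first, and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if truth not in top_k:
            misses += 1
    return misses / len(true_labels)


# Toy scores, made up for illustration: six candidate labels per image.
toy_scores = {"cat": 0.50, "dog": 0.20, "fox": 0.10, "owl": 0.08, "bee": 0.07, "ant": 0.05}
error = top5_error([toy_scores, toy_scores], ["bee", "ant"])
print(error)
```

In the toy data the first image counts as correct because "bee" is the fifth guess, while "ant" is ranked sixth and counts as a miss.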
person person person person person scale room
person Standing on person Stepping on person Watching and laughing room scale Wants to weigh himself Wants to play a prank
Stepping on a scale adds weight and ups the reading.
Image credit: https://www.youtube.com/watch?v=ip-KIzQmcBo (Oliver Villar)
ImageNet: Deng et al. 2009; COCO: Lin et al. 2014
tree ski
jacket boots
snow
sunglasses
vest
pole coat
glove head
building
leaves equipment
bag hat sky
COCO: Lin et al. 2014
Q: What is the man in the center doing? A: Standing on a ski.
Q: What is the color of the sky? A: Blue
Q: Where are the pine trees? A: Behind the hill.
<woman, wear, coat>
<trees, be, green>
<trees, behind, group (of people)>
<man, has, jacket>
<boots, be, yellow>
<lady, hold, skis>
“A man standing.”
“A clear blue sky at a ski resort.”
“A snowy hill is in front of pine trees.”
“There are several pine trees.”
“A group of people getting ready to ski.”
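The angle-bracket entries above are subject, predicate, object triples. A minimal sketch of how such triples might be stored and queried, assuming a plain list of tuples indexed by subject; the triples are copied from the slide, while the storage layout and function name are hypothetical and not the format of any particular dataset.

```python
from collections import defaultdict

# Triples copied from the slide; the storage layout is illustrative only.
TRIPLES = [
    ("woman", "wear", "coat"),
    ("trees", "be", "green"),
    ("trees", "behind", "group (of people)"),
    ("man", "has", "jacket"),
    ("boots", "be", "yellow"),
    ("lady", "hold", "skis"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under each subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = index_by_subject(TRIPLES)
print(graph["trees"])
```

Indexing by subject makes questions such as "what do we know about the trees?" a single lookup, which is the kind of structured query a flat list of object labels cannot answer.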
entire universe of images
[Johnson et al., CVPR 2015]
– Objects, verbs, attributes
– Relationships and contexts
Krishna et al. IJCV 2016
A dataset, a knowledge base, an ongoing effort to connect structural image concepts to language.
Krishna et al. IJCV 2016
Q: What is the person sitting on the right of the elephant wearing?
A: A blue shirt.
DenseCap & Paragraph Generation: Karpathy et al. CVPR’16; Krause et al. CVPR’17
Relationship Prediction: Krishna et al. ECCV’16
Image Retrieval w/ Scene Graphs: Johnson et al. CVPR’15; Xu et al. CVPR’17
Visual Q&A: Zhu et al. CVPR’16
26 July 2017 | Honolulu, Hawaii, in conjunction with CVPR 2017
http://www.vision.ee.ethz.ch/webvision/workshop.html
Agency: The integration
understanding and action
Vision Language Understanding Action
reduction of image classification error
improvement of detection precision
We’re passing the baton to Kaggle: a community of more than 1M data scientists.
Why? Democratizing data is vital to democratizing AI.
image-net.org remains live at Stanford.
ImageNet Object Localization Challenge
ImageNet Object Detection Challenge
ImageNet Object Detection from Video Challenge
Alex Berg, Michael Bernstein, Edward Chang, Brendan Collins, Jia Deng, Minh Do, Wei Dong, Alexei Efros, Mark Everingham, Christiane Fellbaum, Adam Finkelstein, Thomas Funkhouser, Timnit Gebru, Derek Hoiem, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Jonathan Krause, Fei-Fei Li, Kai Li, Li-Jia Li, Wei Liu, Sean Ma, Xiaojuan Ma, Jitendra Malik, Dan Osherson, Eunbyung Park, Chuck Rosenberg, Olga Russakovsky, Sanjeev Satheesh, Richard Socher, Hao Su, Zhe Wang, Andrew Zisserman
49k Amazon Mechanical Turk Workers
WINSTON CHURCHILL