Lecture 1: Introduction
CS231n, 5-Jan-15, Fei-Fei Li & Andrej Karpathy


SLIDE 1

Lecture 1: Introduction

Fei-Fei Li & Andrej Karpathy

5-Jan-15

SLIDE 2

Welcome to CS231n

SLIDE 3

Computer Vision (diagram of related fields)

  • Disciplines: Mathematics, Computer Science, Biology, Engineering, Physics, Psychology
  • Related areas: Neuroscience, Machine learning, Speech, NLP, Information retrieval, Robotics, Cognitive sciences, Image processing, Optics
  • Other labels: graphics, algorithms, theory, …; systems, architecture, …

SLIDE 5

Computer Vision courses @ Stanford

  • CS131 (fall, 2014, Prof. Fei-Fei Li):

– Undergraduate introductory class

  • CS231a (this term, Prof. Silvio Savarese)

– Core computer vision class for seniors, masters, and PhDs
– Topics include image processing, cameras, 3D reconstruction, segmentation, object recognition, scene understanding

  • CS231n (this term, Prof. Fei-Fei Li & Andrej Karpathy)

– Neural network (aka “deep learning”) class on image classification

  • CS231b (spring, 2015, Prof. Fei-Fei Li): Cutting Edge Computer Vision

– Project-based advanced vision class to prepare students for CV research

  • CS231m (spring, 2015, Prof. Silvio Savarese): Mobile Vision

– Computer vision and computational photography for mobile platforms (e.g. Android)

  • And an assortment of CS331 and CS431 for advanced topics in computer vision

SLIDE 6

Today’s agenda

  • A brief history of computer vision
  • CS231n overview

SLIDE 7

543 million years, B.C.

SLIDE 8

Camera Obscura

Leonardo da Vinci, 16th Century, A.D.

SLIDE 9

Hubel & Wiesel, 1959

SLIDE 10

Block world

Larry Roberts, 1963

SLIDE 12

David Marr, 1970s

SLIDE 13

Stages of Visual Representation, David Marr, 1970s

SLIDE 14

  • Generalized Cylinder (Brooks & Binford, 1979)
  • Pictorial Structure (Fischler and Elschlager, 1973)

SLIDE 15

David Lowe, 1987

SLIDE 16

Normalized Cut

(Shi & Malik, 1997)

SLIDE 17

Face Detection, Viola & Jones, 2001

SLIDE 18

“SIFT” & Object Recognition, David Lowe, 1999

SLIDE 19

Spatial Pyramid Matching, Lazebnik, Schmid & Ponce, 2006

SLIDE 20

  • Histogram of Gradients (HoG): Dalal & Triggs, 2005
  • Deformable Part Model: Felzenszwalb, McAllester, Ramanan, 2009

SLIDE 21

PASCAL Visual Object Challenge (20 object categories)

[Everingham et al. 2006-2012]

[Figure: average precision (0.4 to 1) by challenge year (2009 to 2012), plotted for all categories together and for each category: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor]

SLIDE 22

22K categories and 14M images

www.image-net.org

Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009

  • Animals
  • Bird
  • Fish
  • Mammal
  • Invertebrate
  • Plants
  • Tree
  • Flower
  • Food
  • Materials
  • Structures
  • Artifact
  • Tools
  • Appliances
  • Structures
  • Person
  • Scenes
  • Indoor
  • Geological Formations
  • Sport Activities

SLIDE 23

The Image Classification Challenge: 1,000 object classes, 1,431,167 images

Steel drum

  • Output: Scale, T-shirt, Steel drum, Drumstick, Mud turtle ✔
  • Output: Scale, T-shirt, Giant panda, Drumstick, Mud turtle ✗

Russakovsky et al. arXiv, 2014
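
The two example outputs on this slide suggest how a prediction is scored: a list of five labels counts as correct when the image's labelled class ("Steel drum") appears among them. The sketch below illustrates that rule under this assumption. The function names `is_top5_correct` and `top5_error` are hypothetical and are not taken from the challenge's evaluation code.

```python
def is_top5_correct(true_label, predicted_labels):
    """Return True if the true label is among the first five predicted labels."""
    return true_label in predicted_labels[:5]


def top5_error(true_labels, predictions):
    """Fraction of images whose true label is missing from the five predictions."""
    wrong = sum(
        not is_top5_correct(label, preds)
        for label, preds in zip(true_labels, predictions)
    )
    return wrong / len(true_labels)


# The two outputs shown on the slide, both for an image labelled "Steel drum".
first_output = ["Scale", "T-shirt", "Steel drum", "Drumstick", "Mud turtle"]
second_output = ["Scale", "T-shirt", "Giant panda", "Drumstick", "Mud turtle"]

print(is_top5_correct("Steel drum", first_output))
print(is_top5_correct("Steel drum", second_output))
print(top5_error(["Steel drum", "Steel drum"], [first_output, second_output]))
```

With these two outputs, the first counts as correct and the second does not, so half of the predictions are in error.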

SLIDE 24

The Image Classification Challenge: 1,000 object classes, 1,431,167 images

Steel drum

[Figure values: 0.28, 0.26, 0.16, 0.12, 0.07]

Russakovsky et al. arXiv, 2014

SLIDE 25

Today’s agenda

  • A brief history of computer vision
  • CS231n overview

SLIDE 26

CS231n focuses on one of the most important problems of visual recognition – image classification

SLIDE 28

There are a number of visual recognition problems related to image classification, such as object detection and image captioning

SLIDE 29

  • Object detection
  • Action classification
  • Image captioning

SLIDE 30

Convolutional Neural Networks (CNNs) have become an important tool for object recognition

SLIDE 31

  • Year 2010: NEC-UIUC [Lin CVPR 2011]. Dense grid descriptor: HOG, LBP; Coding: local coordinate, super-vector; Pooling, SPM; Linear SVM
  • Year 2012: SuperVision [Krizhevsky NIPS 2012]
  • Year 2014: GoogLeNet [Szegedy arxiv 2014], VGG [Simonyan arxiv 2014], MSRA [He arxiv 2014]

Layer legend: Convolution, Pooling, Softmax, Other
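
The layer types named on this slide (convolution, pooling, softmax) can be illustrated with a few lines of numpy. This is a toy sketch under simplifying assumptions (a single-channel image, one filter, no padding, stride 1, 2x2 max pooling). It is not the architecture of any of the networks listed above, and the function names are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2x2(x):
    """Take the maximum over non-overlapping 2x2 blocks (odd edges are cropped)."""
    h = x.shape[0] // 2 * 2
    w = x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


def softmax(scores):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging filter
features = conv2d_valid(image, kernel)             # shape (4, 4)
pooled = max_pool2x2(features)                     # shape (2, 2)
probs = softmax(pooled.ravel())                    # 4 pseudo class scores
print(features.shape, pooled.shape, probs.sum())
```

In a real network the filter weights are learned and a fully connected layer usually sits between pooling and softmax. Here the pooled values are fed to softmax directly only to keep the example short.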

SLIDE 32

The Convolutional Neural Network (CNN) was not invented overnight

SLIDE 33

  • 1998, LeCun et al.: # of transistors 10^6; # of pixels used in training 10^7
  • 2012, Krizhevsky et al.: # of transistors 10^9 (GPUs); # of pixels used in training 10^14

SLIDE 34

The quest for visual intelligence goes far beyond object recognition…

SLIDE 37

Some kind of game or fight. Two groups of two men? The foregound pair looked like one was getting a fist in the face. Outdoors seemed like because i have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because they pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background. (Subject: SM)

PT = 500ms

Fei-Fei, Iyer, Koch, Perona, JoV, 2007

SLIDE 39

Computer Vision Technology Can Better Our Lives

SLIDE 40

Who we are

  • Instructors: Prof. Fei-Fei Li & Andrej Karpathy
  • Teaching Assistants

– Justin Johnson, Ph.D. candidate, CS
– Yuke Zhu, Master's candidate, CS
– TBA

  • Keeping in touch:

– cs231n-winter1415-staff@lists.stanford.edu
– Piazza
– Twitter

SLIDE 41

Grading policy

  • 3 Problem Sets: 15% x 3 = 45%
  • Midterm Exam: 15%
  • Final Course Project: 40%

– Milestone: 5%
– Final write-up: 35%
– Bonus points for exceptional poster presentation

  • Late policy

– 7 free late days – use them in your ways
– Afterwards, 25% off per day late
– Not accepted after 3 late days per PS
– Does not apply to Final Course Project

  • Collaboration policy

– Read the student code book, understand what is ‘collaboration’ and what is ‘academic infraction’
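
As a worked example of the arithmetic in the grading policy, the sketch below combines component scores using the stated weights and applies one possible reading of the late policy. The slide does not spell out how free late days and the per-day penalty interact, so `late_penalty_factor` encodes an assumption (free days are used first, then 25% is deducted per remaining late day, and nothing is accepted beyond 3 late days). All names here are hypothetical.

```python
# Weights in percent, as listed on the slide (they sum to 100).
WEIGHTS = {
    "ps1": 15,
    "ps2": 15,
    "ps3": 15,
    "midterm": 15,
    "milestone": 5,
    "final_writeup": 35,
}


def course_grade(scores):
    """Weighted course grade, given each component scored out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def late_penalty_factor(days_late, free_days_left):
    """Fraction of credit kept for a late problem set (assumed interpretation)."""
    if days_late > 3:
        return 0.0  # assumed: not accepted after 3 late days
    penalized_days = max(0, days_late - free_days_left)
    return max(0.0, 1.0 - 0.25 * penalized_days)


scores = {
    "ps1": 90, "ps2": 80, "ps3": 100,
    "midterm": 70, "milestone": 100, "final_writeup": 90,
}
print(course_grade(scores))
print(late_penalty_factor(days_late=2, free_days_left=0))
```

Bonus points for the poster presentation are left out because the slide gives no number for them.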

SLIDE 42

Pre-requisite

  • Proficiency in Python, familiarity with C/C++

– All class assignments will be in Python (and use numpy), but some of the deep learning libraries we may look at later in the class are written in C++.
– A Python tutorial is available on the course website

  • College Calculus, Linear Algebra
  • Equivalent knowledge of CS229 (Machine Learning)

– We will be formulating cost functions, taking derivatives and performing optimization with gradient descent.
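
The prerequisite above mentions formulating cost functions, taking derivatives and optimizing with gradient descent. As a rough illustration of what that involves (not taken from the course materials), the sketch below fits a line by least squares with numpy. The choice of cost function, the learning rate `lr`, the step count and the names `cost`, `gradient` and `gradient_descent` are all assumptions made for this example.

```python
import numpy as np


def cost(w, X, y):
    """Mean squared error cost: 0.5 * mean((Xw - y)^2)."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)


def gradient(w, X, y):
    """Derivative of the cost with respect to w: X^T (Xw - y) / n."""
    return X.T @ (X @ w - y) / len(y)


def gradient_descent(X, y, lr=0.5, steps=2000):
    """Start from zeros and repeatedly step against the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * gradient(w, X, y)
    return w


# Toy data on the line y = 1 + 2x; the first column of X is the bias term.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w = gradient_descent(X, y)
print(w, cost(w, X, y))
```

On this noise-free toy data the learned weights should approach the intercept 1 and slope 2 used to generate it.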

SLIDE 43

Syllabus

  • Go to website…

http://vision.stanford.edu/teaching/cs231n/index.html

SLIDE 44

References

  • Hubel, David H., and Torsten N. Wiesel. "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex." The Journal of physiology 160.1 (1962): 106.
  • Roberts, Lawrence Gilman. "Machine Perception of Three-dimensional Solids." Diss. Massachusetts Institute of Technology, 1963.
  • Marr, David. "Vision." The MIT Press, 1982.
  • Brooks, Rodney A., Russell Greiner, and Thomas O. Binford. "The ACRONYM model-based vision system." In Proceedings of the 6th International Joint Conference on Artificial Intelligence (1979): 105-113.
  • Fischler, Martin A., and Robert A. Elschlager. "The representation and matching of pictorial structures." IEEE Transactions on Computers 22.1 (1973): 67-92.
  • Lowe, David G. "Three-dimensional object recognition from single two-dimensional images." Artificial Intelligence 31.3 (1987): 355-395.
  • Shi, Jianbo, and Jitendra Malik. "Normalized cuts and image segmentation." Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.8 (2000): 888-905.
  • Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features." Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001.
  • Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.
  • Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.

SLIDE 45

  • Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005.
  • Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. "A discriminatively trained, multiscale, deformable part model." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
  • Everingham, Mark, et al. "The pascal visual object classes (VOC) challenge." International Journal of Computer Vision 88.2 (2010): 303-338.
  • Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
  • Russakovsky, Olga, et al. "Imagenet Large Scale Visual Recognition Challenge." arXiv:1409.0575.
  • Lin, Yuanqing, et al. "Large-scale image classification: fast feature extraction and SVM training." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
  • Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
  • Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
  • Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
  • He, Kaiming, et al. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition." arXiv preprint arXiv:1406.4729 (2014).
  • LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
  • Fei-Fei, Li, et al. "What do we perceive in a glance of a real-world scene?" Journal of vision 7.1 (2007): 10.