
Lecture 2 AdaBoost and Cascade Structure (with a case on face detection)
Lin ZHANG, PhD
School of Software Engineering
Tongji University
Fall 2020
Lin ZHANG, SSE, 2020

Are there any faces in the image? Who are they?

Overview
• Face recognition problem
  – Given a still image or video of a scene, identify or verify one or more persons in the scene using a stored database of facial images

Overview
• Face identification: determine who a person is by comparing the face against every identity in the database (one-to-many matching)

Overview
• Face verification: decide whether a face belongs to a claimed identity (one-to-one matching)

Overview
• Applications of face detection & recognition: intelligent surveillance

Overview
• Applications of face detection & recognition: border control e-Channel, Hong Kong-Luohu

Overview
• Applications of face detection & recognition: National Stadium, Beijing Olympic Games, 2008

Overview
• Applications of face detection & recognition: work attendance checking

Overview
• Applications of face detection & recognition: smile detection, embedded in most modern cameras

Overview
• Why is face recognition so difficult?
  – Intra-class variance and inter-class similarity
[Figure: images of the same person]

Overview
• Why is face recognition so difficult?
  – Intra-class variance and inter-class similarity
[Figure: images of twins]

Overview
Who are they?

Overview: General Architecture

Introduction
• Face detection: identify and locate human faces in an image regardless of their
  – position
  – scale
  – orientation (in-plane rotation)
  – pose (out-of-plane rotation)
  – illumination

Introduction
Where are the faces, if any?

Introduction
• Why is face detection so difficult?

Introduction
• Appearance-based methods
  – Train a classifier using positive (and usually negative) examples of faces
  – Representation: different appearance-based methods may use different representation schemes
  – Most of the state-of-the-art methods belong to this category
The most successful one: the Viola-Jones (VJ) method! VJ is based on the AdaBoost classifier

AdaBoost (Adaptive Boosting)
• AdaBoost is a machine learning algorithm [1]
• AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of the instances misclassified by previous classifiers
• The classifiers it uses can be weak, but as long as each performs slightly better than random guessing (weighted error below 0.5), they improve the final model
[1] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, 1997

AdaBoost (Adaptive Boosting)
• AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of simple "weak" classifiers:
$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$
• Terminology
  – $h_t(x)$ is a weak (or basis) classifier, and $\alpha_t$ is its weight
  – $H(x) = \operatorname{sgn}(f(x))$ is the final strong classifier
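A minimal sketch of the strong classifier as a weighted vote, assuming each weak classifier is a Python callable that returns -1 or +1. The names `strong_classify` and `hs` and the toy thresholds are illustrative, and sending a zero score to +1 is an arbitrary tie-breaking choice of this sketch.

```python
from typing import Callable, Sequence

WeakClassifier = Callable[[float], int]  # each h_t returns -1 or +1

def strong_classify(x: float, weak_classifiers: Sequence[WeakClassifier],
                    alphas: Sequence[float]) -> int:
    """H(x) = sgn(f(x)) with f(x) = sum_t alpha_t * h_t(x)."""
    f = sum(alpha * h(x) for h, alpha in zip(weak_classifiers, alphas))
    # sgn(0) is ambiguous; this sketch maps a zero score to +1 (an assumption)
    return 1 if f >= 0 else -1

# Three toy 1-D weak classifiers (hypothetical thresholds)
hs = [lambda x: 1 if x > 0 else -1,
      lambda x: 1 if x > 2 else -1,
      lambda x: 1 if x < 5 else -1]
print(strong_classify(1.0, hs, [0.5, 0.3, 0.4]))   # 0.5 - 0.3 + 0.4 > 0, prints 1
```

Each weak classifier only casts a signed vote; the weights $\alpha_t$ decide how much each vote counts.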

AdaBoost (Adaptive Boosting)
• AdaBoost is an iterative training algorithm; the stopping criterion depends on the concrete application
• For each iteration t
  – A new weak classifier $h_t(x)$ is added based on the current (weighted) training set
  – The weight of each training sample is modified: the weight of a sample correctly classified by $h_t(x)$ is reduced, while the weight of a sample misclassified by $h_t(x)$ is increased

AdaBoost (algorithm for binary classification)
• Given: a training set $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$, where $y_i \in \{-1, +1\}$
• Initialize the sample weights $D_1(i) = 1/m$
• For $t = 1, \ldots, T$
  – Train weak classifiers on the training set weighted by $D_t$ and find the best weak classifier $h_t$, with error $\varepsilon_t = \sum_{i=1}^{m} D_t(i)\,[h_t(x_i) \neq y_i]$ (the bracket is 1 if $x_i$ is misclassified, 0 otherwise)
  – If $\varepsilon_t \geq 0.5$, stop
  – Set $\alpha_t = \frac{1}{2}\ln\frac{1 - \varepsilon_t}{\varepsilon_t}$
  – Update the sample weights: $D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{\text{Denom}}$, where $\text{Denom} = \sum_{j=1}^{m} D_t(j)\exp(-\alpha_t y_j h_t(x_j))$ normalizes $D_{t+1}$ to sum to 1
• Output the final classifier $H(x) = \operatorname{sgn}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
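The algorithm above can be sketched in pure Python, assuming 2-D points and axis-aligned decision stumps (vertical or horizontal lines, as in the example that follows) as the weak classifiers. The stump search is an exhaustive scan chosen for readability rather than speed, the guard against $\varepsilon_t = 0$ is an addition of this sketch rather than part of the slide's algorithm, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stump = Tuple[int, float, int]  # (coordinate index d, threshold, sign s)

def stump_predict(stump: Stump, x: Point) -> int:
    d, thr, s = stump
    # vertical (d = 0) or horizontal (d = 1) line: one side gets s, the other -s
    return s if x[d] > thr else -s

def best_stump(X: Sequence[Point], y: Sequence[int], D: Sequence[float]) -> Tuple[float, Stump]:
    """Exhaustively search stumps; return (weighted error, stump) with the lowest error."""
    best_err, best = float("inf"), None
    for d in range(2):
        values = sorted({x[d] for x in X})
        # one threshold below all points (a constant classifier) plus midpoints
        thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for s in (1, -1):
                stump = (d, thr, s)
                err = sum(w for x, yi, w in zip(X, y, D) if stump_predict(stump, x) != yi)
                if err < best_err:
                    best_err, best = err, stump
    return best_err, best

def adaboost(X: Sequence[Point], y: Sequence[int], T: int) -> List[Tuple[float, Stump]]:
    m = len(X)
    D = [1.0 / m] * m                      # D_1(i) = 1/m
    model = []                             # [(alpha_t, h_t), ...]
    for _ in range(T):
        eps, h = best_stump(X, y, D)
        if eps >= 0.5:                     # not better than random: stop
            break
        # guard added in this sketch: eps == 0 would make alpha infinite
        alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-10))
        model.append((alpha, h))
        if eps == 0:                       # h already separates the training set
            break
        # D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Denom
        D = [w * math.exp(-alpha * yi * stump_predict(h, x)) for x, yi, w in zip(X, y, D)]
        denom = sum(D)
        D = [w / denom for w in D]
    return model

def predict(model: Sequence[Tuple[float, Stump]], x: Point) -> int:
    f = sum(alpha * stump_predict(h, x) for alpha, h in model)
    return 1 if f >= 0 else -1

# Toy data (hypothetical): only the top-right point is negative
X = [(1, 1), (1, 3), (3, 1), (3, 3)]
y = [1, 1, 1, -1]
model = adaboost(X, y, T=3)
print([predict(model, x) for x in X])      # [1, 1, 1, -1]
```

No single stump separates this toy set, but three boosted stumps do, which is the point of combining weak classifiers.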

AdaBoost: An Example
[Figure: 10 training samples, each labeled with its initial weight 0.1]
• 10 training samples
• Weak classifiers: vertical or horizontal lines
• Initial weights for samples: $D_1(i) = 0.1,\ i = 1, \ldots, 10$
• Three iterations

AdaBoost: An Example
After iteration 1
• Get the weak classifier $h_1(x)$: $\varepsilon_1 = 0.3$
• $\alpha_1 = \frac{1}{2}\ln\frac{1-\varepsilon_1}{\varepsilon_1} = 0.4236$
• Update weights to obtain $D_2$: the three samples misclassified by $h_1(x)$ now have weight 0.1667, and the other seven have weight 0.0714
[Figure: samples weighted by $D_1$, split by $h_1(x)$, with their updated weights $D_2$]
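The weights on this slide can be checked by hand. Three samples of weight 0.1 are misclassified, so $\varepsilon_1 = 0.3$ and $e^{\alpha_1} = \sqrt{(1-\varepsilon_1)/\varepsilon_1} = \sqrt{7/3}$. The normalizer sums the unnormalized weights of the 3 misclassified and 7 correctly classified samples:

$$\text{Denom} = 3(0.1)\sqrt{7/3} + 7(0.1)\sqrt{3/7} = 0.1\sqrt{21} + 0.1\sqrt{21} = 0.2\sqrt{21}$$

$$D_2(\text{misclassified}) = \frac{0.1\sqrt{7/3}}{0.2\sqrt{21}} = \frac{1}{6} \approx 0.1667, \qquad D_2(\text{correct}) = \frac{0.1\sqrt{3/7}}{0.2\sqrt{21}} = \frac{1}{14} \approx 0.0714$$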

AdaBoost: An Example
After iteration 2
• Get the weak classifier $h_2(x)$ on the samples weighted by $D_2$: $\varepsilon_2 = 0.2142$
• $\alpha_2 = \frac{1}{2}\ln\frac{1-\varepsilon_2}{\varepsilon_2} = 0.6499$
• Update weights to obtain $D_3$: three samples have weight 0.1060, three have weight 0.1667, and four have weight 0.0454
[Figure: samples weighted by $D_2$, split by $h_2(x)$, with their updated weights $D_3$]
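A general shortcut explains these numbers. Since $D_t$ sums to 1 and $e^{\alpha_t} = \sqrt{(1-\varepsilon_t)/\varepsilon_t}$, the normalizer is $\text{Denom} = \varepsilon_t e^{\alpha_t} + (1-\varepsilon_t) e^{-\alpha_t} = 2\sqrt{\varepsilon_t(1-\varepsilon_t)}$, so each update simply rescales the two groups of samples:

$$D_{t+1}(i) = \begin{cases} \dfrac{D_t(i)}{2\varepsilon_t}, & h_t(x_i) \neq y_i \\[2mm] \dfrac{D_t(i)}{2(1-\varepsilon_t)}, & h_t(x_i) = y_i \end{cases}$$

The weights 0.0714 and 0.1667 in $D_2$ are $1/14$ and $1/6$. Here $h_2$ misclassifies three samples of weight $1/14$, so $\varepsilon_2 = 3/14 \approx 0.2143$ and $\alpha_2 = \frac{1}{2}\ln\frac{11}{3} \approx 0.6496$ (the slide's 0.6499 comes from the rounded $\varepsilon_2 = 0.2142$). The new weights are

$$\frac{1/14}{2 \cdot 3/14} = \frac{1}{6} \approx 0.1667, \qquad \frac{1/6}{2 \cdot 11/14} = \frac{7}{66} \approx 0.1061, \qquad \frac{1/14}{2 \cdot 11/14} = \frac{1}{22} \approx 0.0455$$

which match the slide's 0.1667, 0.1060 and 0.0454 up to the last digit.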

AdaBoost: An Example
After iteration 3
• Get the weak classifier $h_3(x)$ on the samples weighted by $D_3$: $\varepsilon_3 = 0.1362$
• $\alpha_3 = \frac{1}{2}\ln\frac{1-\varepsilon_3}{\varepsilon_3} = 0.9236$
• Final classifier: $H(x) = \operatorname{sgn}\left(0.4236\,h_1(x) + 0.6499\,h_2(x) + 0.9236\,h_3(x)\right)$
Now try to classify the 10 samples using $H(x)$.
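To check the whole example, the following sketch replays the three weight updates. The slide's figure is not available, so the sample indices are hypothetical; the weights themselves imply the pattern used here, namely that each weak classifier misclassifies three samples that no earlier classifier got wrong. Computed with unrounded weights, it gives $\alpha_2 \approx 0.6496$ and $\alpha_3 \approx 0.9229$ rather than the slide's 0.6499 and 0.9236, which come from the rounded $\varepsilon$ values; either way all ten samples end up correctly classified. The name `adaboost_round` is illustrative.

```python
import math
from typing import List, Set, Tuple

def adaboost_round(D: List[float], wrong: Set[int]) -> Tuple[float, float, List[float]]:
    """One AdaBoost update, given weights D_t and the indices h_t misclassifies."""
    eps = sum(D[i] for i in wrong)                      # weighted error epsilon_t
    alpha = 0.5 * math.log((1 - eps) / eps)             # alpha_t
    # exp(-alpha * y_i * h_t(x_i)), where y_i * h_t(x_i) is -1 if i is wrong, +1 otherwise
    new = [w * math.exp(alpha if i in wrong else -alpha) for i, w in enumerate(D)]
    denom = sum(new)
    return eps, alpha, [w / denom for w in new]

# Hypothetical indices: h_1, h_2 and h_3 each misclassify three different samples
rounds = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
D = [0.1] * 10
alphas = []
for t, wrong in enumerate(rounds, start=1):
    eps, alpha, D = adaboost_round(D, wrong)
    alphas.append(alpha)
    print(f"t={t}: eps={eps:.4f} alpha={alpha:.4f}")

# y_i * f(x_i): +alpha_t where h_t is right on sample i, -alpha_t where it is wrong
margins = [sum(-a if i in wrong else a for a, wrong in zip(alphas, rounds))
           for i in range(10)]
print("all 10 samples correctly classified by H(x):", all(m > 0 for m in margins))
```

The smallest margin belongs to the samples that only $h_3$ gets wrong, $\alpha_1 + \alpha_2 - \alpha_3 > 0$, so the two earlier votes outweigh the one mistake.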

Viola-Jones face detection
• VJ face detector [1]
  – Haar-like features are proposed and computed based on the integral image; they act as “weak” classifiers
  – Strong classifiers are composed of “weak” classifiers using AdaBoost
  – Many strong classifiers are combined in a cascade structure, which dramatically increases the detection speed
[1] P. Viola and M. J. Jones, "Robust Real-Time Face Detection," IJCV, 2004

Haar features
• Compute the difference between the sums of pixels within two (or more) rectangular regions
[Figure: example Haar features shown relative to the enclosing face detection window]
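As a baseline, a Haar feature can be evaluated directly by summing pixels. In this minimal sketch a feature is assumed to be a list of rectangles, each carrying a +1 or -1 sign; that encoding and the function names are illustrative. The cost grows with the rectangle area, which is the cost the integral image removes.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int, int]  # (left, top, width, height, sign)

def rect_sum_naive(img: List[List[int]], left: int, top: int, w: int, h: int) -> int:
    """Sum of a w x h rectangle whose top-left pixel is (left, top); pixels are img[y][x]."""
    return sum(img[y][x] for y in range(top, top + h) for x in range(left, left + w))

def haar_feature(img: List[List[int]], rects: Sequence[Rect]) -> int:
    """Signed sum of rectangle sums, e.g. (+1) * sum(A) + (-1) * sum(B)."""
    return sum(sign * rect_sum_naive(img, l, t, w, h) for l, t, w, h, sign in rects)

img = [[1, 2, 10, 20],
       [3, 4, 30, 40]]
# Two-rectangle feature: left 2x2 block minus right 2x2 block
print(haar_feature(img, [(0, 0, 2, 2, +1), (2, 0, 2, 2, -1)]))   # 10 - 100 = -90
```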

Haar features
• Integral image
  – The integral image at location $(x, y)$ contains the sum of all the pixels above and to the left of $(x, y)$, inclusive:
$$ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$$
where $i(x, y)$ is the original image
  – Using the following pair of recurrences, the integral image can be computed in one pass over the original image:
$$s(x, y) = s(x, y-1) + i(x, y)$$
$$ii(x, y) = ii(x-1, y) + s(x, y)$$
where $s(x, y)$ is the cumulative row sum, $s(x, -1) = 0$, and $ii(-1, y) = 0$
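The one-pass recurrence translates directly into code. In this sketch the image is a list of rows, so $i(x, y)$ is `img[y][x]`; that indexing convention and the function name are assumptions.

```python
from typing import List

def integral_image(img: List[List[int]]) -> List[List[int]]:
    """One pass over img using the recurrences for s(x, y) and ii(x, y); i(x, y) = img[y][x]."""
    h, w = len(img), len(img[0])
    s = [[0] * w for _ in range(h)]    # s(x, y) = sum of i(x, y') for y' <= y
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # s(x, y) = s(x, y - 1) + i(x, y), with s(x, -1) = 0
            s[y][x] = (s[y - 1][x] if y > 0 else 0) + img[y][x]
            # ii(x, y) = ii(x - 1, y) + s(x, y), with ii(-1, y) = 0
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s[y][x]
    return ii

img = [[1, 2, 3],
       [4, 5, 6]]
print(integral_image(img))   # [[1, 3, 6], [5, 12, 21]]
```

Each output entry depends only on values computed earlier in the same raster scan, which is why a single pass suffices.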

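Haar features
• A Haar feature can be computed efficiently using the integral image
[Figure: four adjacent rectangles A, B, C, D in the original image i(x, y), with D at the bottom right; x1, x2, x3, x4 mark the bottom-right corners of A, B, C, D in the integral image ii(x, y)]
Writing A, B, C, D for the pixel sums of the regions:
$$ii(x_1) = A, \quad ii(x_2) = A + B, \quad ii(x_3) = A + C, \quad ii(x_4) = A + B + C + D$$
so
$$D = ii(x_4) + ii(x_1) - ii(x_2) - ii(x_3)$$
i.e. the sum over any rectangle takes only four references to the integral image.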
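Building on the four-corner identity, here is a minimal sketch that pads the integral image with a zero row and column, so rectangles touching the image border need no special cases. The padding convention (`P[y + 1][x + 1]` $= ii(x, y)$) and the function names are assumptions of this sketch.

```python
from itertools import accumulate
from typing import List

def padded_integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return P with P[y + 1][x + 1] = ii(x, y); row 0 and column 0 are all zeros."""
    w = len(img[0])
    P = [[0] * (w + 1)]
    for row in img:
        row_cum = list(accumulate(row))              # running sum along this row
        above = P[-1]
        P.append([0] + [above[x + 1] + row_cum[x] for x in range(w)])
    return P

def rect_sum(P: List[List[int]], left: int, top: int, right: int, bottom: int) -> int:
    """Sum of img[y][x] for left <= x <= right and top <= y <= bottom (inclusive)."""
    ii_x4 = P[bottom + 1][right + 1]   # bottom-right corner of the rectangle itself
    ii_x1 = P[top][left]               # corner diagonally above-left of the rectangle
    ii_x2 = P[top][right + 1]          # corner above its right column
    ii_x3 = P[bottom + 1][left]        # corner left of its bottom row
    return ii_x4 + ii_x1 - ii_x2 - ii_x3   # D = ii(x4) + ii(x1) - ii(x2) - ii(x3)

P = padded_integral_image([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])
print(rect_sum(P, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
```

Once `P` is built, every rectangle sum costs four lookups regardless of its size.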

Haar features
• A Haar feature can be computed efficiently using the integral image
[Figure: two adjacent rectangles A and B with corner points x1 to x6, shown in the original image i(x, y) and the integral image ii(x, y)]
How can A − B be calculated from the integral image?
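One possible answer, sketched under the assumption that A and B are equal-sized rectangles with A immediately to the left of B (the slide's exact layout is not recoverable here): because A and B share their two middle corners, A − B needs only six distinct integral-image lookups instead of eight. The padded-table convention `P[y + 1][x + 1]` $= ii(x, y)$ and the function names are illustrative.

```python
from itertools import accumulate
from typing import List

def padded_integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return P with P[y + 1][x + 1] = ii(x, y); row 0 and column 0 are all zeros."""
    w = len(img[0])
    P = [[0] * (w + 1)]
    for row in img:
        row_cum = list(accumulate(row))
        above = P[-1]
        P.append([0] + [above[x + 1] + row_cum[x] for x in range(w)])
    return P

def two_rect_feature(P: List[List[int]], left: int, top: int, w: int, h: int) -> int:
    """A - B, where A is the w x h block at (left, top) and B the same-size block to its right."""
    mid, right, bottom = left + w, left + 2 * w, top + h
    # Six distinct corner lookups; the two middle ones are shared by A and B
    t_l, t_m, t_r = P[top][left], P[top][mid], P[top][right]
    b_l, b_m, b_r = P[bottom][left], P[bottom][mid], P[bottom][right]
    a = b_m + t_l - t_m - b_l
    b = b_r + t_m - t_r - b_m
    return a - b                   # equals 2*b_m - 2*t_m + t_l - b_l + t_r - b_r

P = padded_integral_image([[1, 2, 10, 20],
                           [3, 4, 30, 40]])
print(two_rect_feature(P, 0, 0, 2, 2))   # 10 - 100 = -90
```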

Haar features
• Given a detection window, tens of thousands of Haar features can be computed
• One Haar feature defines a weak classifier that decides whether the underlying detection window contains a face:
$$h(x, f, p, \theta) = \begin{cases} 1, & p f(x) < p\theta \\ -1, & \text{otherwise} \end{cases}$$
where $x$ is the detection window, $f$ defines how the Haar feature is computed on window $x$, $p \in \{1, -1\}$ is a parity that makes the inequality point in a unified direction, and $\theta$ is a threshold
• $f$ can be determined in advance; by contrast, $p$ and $\theta$ are determined by training, so that the number of misclassified examples (in AdaBoost, their total weight) is minimized
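Training $p$ and $\theta$ for one fixed feature $f$ can be sketched as a search over candidate thresholds that minimizes the weighted error under the current AdaBoost weights; with uniform weights this is the same as minimizing the number of misclassified examples. This exhaustive version is written for clarity; Viola and Jones describe a faster approach that scans the examples once in sorted order. Function and parameter names are illustrative.

```python
from typing import Sequence, Tuple

def weak_classify(f_value: float, p: int, theta: float) -> int:
    """h(x, f, p, theta) = 1 if p * f(x) < p * theta, else -1."""
    return 1 if p * f_value < p * theta else -1

def train_stump(values: Sequence[float], labels: Sequence[int],
                weights: Sequence[float]) -> Tuple[float, int, float]:
    """Return (weighted error, p, theta) with the lowest weighted error for one feature f.

    values[i] = f(x_i) for training window x_i; labels are +1 (face) or -1 (non-face).
    """
    vs = sorted(set(values))
    # candidate thresholds: below all values, midpoints, and above all values
    thetas = [vs[0] - 1] + [(a + b) / 2 for a, b in zip(vs, vs[1:])] + [vs[-1] + 1]
    best = (float("inf"), 1, thetas[0])
    for theta in thetas:
        for p in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if weak_classify(v, p, theta) != y)
            if err < best[0]:
                best = (err, p, theta)
    return best

# Toy numbers: faces give small feature values here, non-faces large ones
print(train_stump([1, 2, 3, 4], [1, 1, -1, -1], [0.25] * 4))   # (0, 1, 2.5)
```

Flipping the labels makes the search pick $p = -1$ at the same threshold, which is exactly the role of the parity.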

Haar features
[Figure: the first and second best Haar features]
The first and second best Haar features. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks; it capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.

From weak learner to strong learner
• Any single Haar feature (a thresholded single feature) is quite weak at deciding whether the underlying detection window contains a face
• Many Haar features (weak learners) can be combined into a strong learner using AdaBoost
• However, the most straightforward technique for improving detection performance, adding more features to the classifier, directly increases computation cost
→ Construct a cascade classifier
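The cascade idea can be sketched as a sequence of stages, each running a small AdaBoost classifier, where a window is rejected as soon as one stage says no; most non-face windows are then discarded after only a few features have been evaluated. The stage format (weak classifiers, their $\alpha$ weights, and a per-stage threshold), the toy numeric "windows" and all names are assumptions of this sketch; training the cascade is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Window = float  # toy stand-in for a detection window (assumption)
Stage = Tuple[Sequence[Callable[[Window], int]], Sequence[float], float]

def cascade_classify(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (is_face, number of stages evaluated)."""
    for k, (weak_clfs, alphas, threshold) in enumerate(stages, start=1):
        score = sum(a * h(window) for h, a in zip(weak_clfs, alphas))
        if score < threshold:
            return False, k        # rejected: the remaining stages are never evaluated
    return True, len(stages)       # passed every stage

stages: List[Stage] = [
    ([lambda w: 1 if w > 0 else -1], [1.0], 0.0),          # cheap first stage
    ([lambda w: 1 if w > 5 else -1,
      lambda w: 1 if w < 20 else -1], [0.7, 0.4], 0.5),    # stricter second stage
]
for w in (-3.0, 3.0, 10.0):
    print(w, cascade_classify(w, stages))
```

The window at -3.0 is rejected after one stage, while only windows that survive every stage pay the full cost.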
