Computer vision and machine learning at Adelaide


  1. Computer vision and machine learning at Adelaide Chunhua Shen Australian Centre for Robotic Vision; and School of Computer Science, The University of Adelaide

  2. Australian Centre for Visual Technologies • Largest computer vision centre in Australia, with ~70 staff and PhD students, including: • 4 full professors • 7 tenure-track/tenured staff • Main hub of two major government projects: • ARC Centre of Excellence for Robotic Vision ($20M, 7 yrs) • Data to Decisions CRC Centre ($25M, 5 yrs)

  3. My team at Adelaide: 20+ PhD students and postdoc researchers (4 more joining in 2015). www.cs.adelaide.edu.au/~chhshen

  4. http://tinyurl.com/pjhx8dc PhD scholarships available too!

  5. Glenelg beach: 9km from UofA

  6. Henley beach: 9.7km from UofA

  7. Brighton beach: 15km from UofA

  8. UofA is right in the CBD. Top 10 most liveable cities 2014: 1. Melbourne, Australia; 2. Vienna, Austria; 3. Vancouver, Canada; 4. Toronto, Canada; 5. Adelaide, Australia.

  9. Acknowledgements: most of the hard work was done by my (ex-) students and postdocs. Credit goes to them. Among many others, in particular I’d mention: • Guosheng Lin (2011~present, now postdoc) • Fayo Liu (2011~present, PhD student) • Yao Li (2013~present, PhD student) • Lingqiao Liu (2010~present, now postdoc) • Sakrapee Paul Paisitkriangkrai (2006~2015; departed) • Peng Wang (2008~present, now postdoc)

  10. Agenda 1. What we did: boosting, SDP, etc. 2. What we are doing: • deep learning • structured output learning • deep structured output learning 3. Future work

  11. Boosting. Boosting builds a very accurate classifier by combining rough and only moderately accurate classifiers. Boosting procedure: given a set of labeled training examples, on each round (1) the booster devises a distribution (importance weights) over the example set; (2) the booster requests a weak hypothesis/classifier/learner with low weighted error. Upon convergence, the booster combines the weak hypotheses into a single prediction rule.
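The loop above can be made concrete with a short sketch. The following is a minimal AdaBoost-style implementation (not the speaker's code): it assumes binary labels in {-1, +1}, a feature matrix X, and uses scikit-learn decision stumps as the weak learners.

```python
# Minimal AdaBoost-style boosting loop (a sketch, not the authors' implementation).
# Assumes y is a numpy array of labels in {-1, +1} and X a feature matrix;
# decision stumps from scikit-learn serve as the weak learners.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    m = len(y)
    u = np.full(m, 1.0 / m)                       # distribution over examples
    learners, weights = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1)   # weak learner: a stump
        h.fit(X, y, sample_weight=u)
        pred = h.predict(X)
        err = np.sum(u[pred != y])
        if err >= 0.5:                            # no better than chance: stop
            break
        w = 0.5 * np.log((1 - err) / max(err, 1e-12))
        u *= np.exp(-w * y * pred)                # re-weight: up-weight mistakes
        u /= u.sum()
        learners.append(h)
        weights.append(w)

    # Final rule: sign of the weighted combination F(x) = sum_j w_j h_j(x).
    def predict(X_new):
        F = sum(w * h.predict(X_new) for w, h in zip(weights, learners))
        return np.sign(F)
    return predict
```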

  12. Why boosting works. Let $\mathcal{H} = \{ h_j(\cdot) : \mathcal{X} \to \mathbb{R},\; j = 1,\dots,N \}$ be a class of base classifiers; a boosting algorithm seeks a convex combination $F(x; w) = \sum_{j=1}^{N} w_j h_j(x)$. Statistical view [Friedman et al. 2000]; maximum margin [Schapire et al. 1998]; still there are open questions [Mease & Wyner 2008]. The Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems [Shen & Li 2010 TPAMI].

  13. A duality view of boosting. Explicitly find a meaningful Lagrange dual for some boosting algorithms. Dual of AdaBoost: the Lagrange dual of AdaBoost is a Shannon entropy maximization problem:
$$\max_{r,\,u}\;\; -r \underbrace{{}-\frac{1}{T}\sum_{i=1}^{M} u_i \log u_i}_{\text{reg. in dual}},\qquad \text{s.t.}\;\; \sum_{i=1}^{M} y_i u_i H_i \le r\,\mathbf{1}^{\!\top},\;\; u \ge 0,\;\; \mathbf{1}^{\!\top} u = 1.$$
Here $H_i = [H_{i1}\,\dots\,H_{iN}]$ denotes the $i$-th row of $H$, which is the output of all weak classifiers on $x_i$.

  14. A duality view of boosting. Primal of AdaBoost (note the auxiliary variables $z_i$, $i = 1,\dots,M$):
$$\min_{w,\,z}\;\; \log\Big(\sum_{i=1}^{M}\exp z_i\Big),\qquad \text{s.t.}\;\; z_i = -y_i H_i w\;\;(\forall i = 1,\dots,M),\;\; w \ge 0,\;\; \mathbf{1}^{\!\top} w = T.$$
The duals of these boosting algorithms are entropy-regularized LPBoost:
• AdaBoost: exponential loss in the primal, Shannon entropy regularization in the dual
• LogitBoost: logistic loss in the primal, binary relative entropy in the dual
• soft-margin $\ell_p$ ($p > 1$) LPBoost: generalized hinge loss in the primal, Tsallis entropy in the dual
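As a small illustration of the primal above, the sketch below evaluates the log-sum-exp objective for a given weight vector; H (the M×N matrix of weak-learner outputs on the training set), y (±1 labels) and w are assumed inputs.

```python
# Sketch: the AdaBoost primal objective above, i.e. the log-sum-exp of the
# negative margins z_i = -y_i * (H_i w). H (M x N), y (M,), w (N,) are
# assumed numpy inputs; scipy's logsumexp gives numerical stability.
import numpy as np
from scipy.special import logsumexp

def adaboost_primal_objective(H, y, w):
    z = -y * (H @ w)          # z_i = -y_i H_i w
    return logsumexp(z)       # log sum_i exp(z_i)
```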

  15. Average margin vs. margin variance. Why does AdaBoost just work? Theorem: AdaBoost approximately maximizes the average margin and at the same time minimizes the variance of the margin distribution, under the assumption that the margin follows a Gaussian distribution. Proof: see [Shen & Li 2010 TPAMI]. Main tools used: (1) the central limit theorem; (2) Monte Carlo integration.

  16. Average margin vs. margin variance. What this theorem tells us: (1) We should focus on optimizing the overall margin distribution; almost all previous work on boosting has focused on a large minimum margin. (2) It answers an open question in [Reyzin & Schapire 2006], [Mease & Wyner 2008]. (3) We can design new boosting algorithms that directly maximize the average margin and minimize the margin variance [Shen & Li 2010 TNN].
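For concreteness, a short sketch of the quantity the theorem is about: the normalized margins of a boosted ensemble and their mean and variance. H, y and w are assumed inputs as in the primal above.

```python
# Sketch: margin distribution statistics of a boosted ensemble.
# margin_i = y_i * (H_i w) / ||w||_1, so margins lie in [-1, 1] when the
# weak learners output values in {-1, +1}. H, y, w are assumed inputs.
import numpy as np

def margin_statistics(H, y, w):
    margins = y * (H @ w) / np.sum(np.abs(w))
    return margins.mean(), margins.var(), margins.min()
```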

  17. Margin distribution boosting.
$$\max_{w}\;\; \bar\rho - \tfrac{1}{2}\sigma^2,\qquad \text{s.t.}\;\; w \ge 0,\;\; \mathbf{1}^{\!\top} w = T.$$
It is equivalent to
$$\min_{w,\,\rho}\;\; \tfrac{1}{2}\rho^{\!\top} A\,\rho - \mathbf{1}^{\!\top}\rho,\qquad \text{s.t.}\;\; w \ge 0,\;\; \mathbf{1}^{\!\top} w = T,\;\; \rho_i = y_i H_i w,\;\; \forall i = 1,\dots,M.$$
Its dual is
$$\min_{r,\,u}\;\; r + \frac{1}{2T}(u - \mathbf{1})^{\!\top} A^{-1}(u - \mathbf{1}),\qquad \text{s.t.}\;\; \sum_{i=1}^{M} y_i u_i H_i \le r\,\mathbf{1}^{\!\top}.$$

  18. Fully corrective boosting for regularised risk minimisation. (1) A general framework that can be used to design new boosting algorithms. (2) The proposed boosting framework, termed CGBoost, can accommodate various loss functions and different regularizers in a totally corrective optimization fashion.

  19. Boosting via column generation. (1) Samples' margins $\gamma$ and weak classifiers' clipped edges $d_+$ are dual to each other. (2) $\ell_p$ regularization in the primal corresponds to $\ell_q$ regularization in the dual, with $1/p + 1/q = 1$:
• $\ell_1$: primal $\min \sum_{i=1}^m \phi(\gamma_i) + \nu\|w\|_1$; dual $\min \sum_{i=1}^m \phi^*(-u_i) + r\|d_+\|_\infty$
• $\ell_2$: primal $\min \sum_{i=1}^m \phi(\gamma_i) + \nu\|w\|_2^2$; dual $\min \sum_{i=1}^m \phi^*(-u_i) + r\|d_+\|_2^2$
• $\ell_\infty$: primal $\min \sum_{i=1}^m \phi(\gamma_i) + \nu\|w\|_\infty$; dual $\min \sum_{i=1}^m \phi^*(-u_i) + r\|d_+\|_1$
Here $\phi(\gamma)$ is the loss in the primal, $\|w\|_p$ the regularization in the primal, $\phi^*(u)$ the regularization in the dual, and $\|d_+\|_q$ the loss in the dual.

  20. Boosting via column generation. [Diagram: the primal optimization over the weak-learner weights $w$ and the dual variables $u$ are linked through the KKT conditions; the working set of weak learners is grown by selecting the most violated dual constraint, $h^\star(\cdot) = \operatorname{argmax}_{h}\, \sum_{i=1}^{M} u_i\, y_i\, h(x_i)$.]

  21. Boosting via column generation. • We now have a general framework for designing fully corrective boosting methods that minimise an arbitrary convex loss + convex regularisation. • It converges faster, with on-par test accuracy, compared with conventional stage-wise boosting (such as AdaBoost and logistic boosting). Refs: TPAMI2010, TNN2010, NN2013
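A minimal sketch of such a column-generation loop, under stated assumptions: a weighted decision stump stands in for the most-violated-constraint oracle, and the restricted master problem is solved only for the exponential loss (the framework in the references handles other convex losses and regularisers).

```python
# Sketch of fully corrective boosting via column generation (assumptions:
# stumps as weak learners, exponential loss in the restricted master problem).
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from sklearn.tree import DecisionTreeClassifier

def cg_boost(X, y, n_columns=30):
    m = len(y)
    u = np.full(m, 1.0 / m)                  # dual variables / example weights
    columns, w = [], np.zeros(0)
    for _ in range(n_columns):
        # 1. Most violated dual constraint: a weak learner with a large edge
        #    sum_i u_i y_i h(x_i); a weighted stump fit is used as a surrogate.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
        columns.append(h.predict(X).astype(float))
        H = np.column_stack(columns)         # m x (number of columns so far)

        # 2. Fully corrective step: re-solve the restricted master problem
        #    over ALL current weak learners, with w >= 0.
        def primal(v):
            return logsumexp(-y * (H @ v))
        w = minimize(primal, np.append(w, 0.0),
                     bounds=[(0.0, None)] * H.shape[1]).x

        # 3. KKT link: for the exponential loss, the dual variables are
        #    proportional to exp(-margin).
        u = np.exp(-y * (H @ w))
        u /= u.sum()
    return columns, w
```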

  22. Applications of this general boosting framework. #1: Cascade classifiers: (1) standard cascade; (2) multi-exit cascade. Only those classified as true detections by all nodes will be true targets. [Diagrams: the input passes through nodes $1, 2, \dots, N$; each node outputs T (pass on) or F (reject), and only windows passing every node reach the target output. The nodes are built from the weak learners $h_1, h_2, \dots, h_n$; in the multi-exit cascade, later nodes reuse the weak learners of earlier nodes.]
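A tiny sketch of how such a cascade is evaluated at test time; `nodes` is a hypothetical list of (score function, threshold) pairs, one per node.

```python
# Sketch: evaluating a detection cascade. `nodes` is an assumed list of
# (predict_score, threshold) pairs, one per node; a window is a detection
# only if every node's boosted score clears its threshold.
def cascade_predict(x, nodes):
    for predict_score, threshold in nodes:
        if predict_score(x) < threshold:
            return False      # rejected early: most negatives exit here
    return True               # accepted by all nodes: a true detection
```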

  23. Boosting for node classifier learning. Biased minimax probability machines:
$$\max_{w,\,b,\,\gamma}\;\; \gamma\qquad \text{s.t.}\;\; \inf_{x_1 \sim (\mu_1, \Sigma_1)} \Pr\{ w^{\!\top} x_1 \ge b \} \ge \gamma,\qquad \inf_{x_2 \sim (\mu_2, \Sigma_2)} \Pr\{ w^{\!\top} x_2 \le b \} \ge \gamma_0.$$
Let's consider a special case, $\gamma_0 = 0.5$: the second class will have a classification accuracy of around 50%. Refs: ECCV2010, IJCV2013

  24. #2: Direct approach to multi-class boosting; sharing features in multi-class boosting. We generalize this idea to the entire training set and introduce slack variables $\xi$ to enable a soft margin. The primal problem that we want to optimize can then be written as
$$\min_{W,\,\xi}\;\; \sum_{i=1}^{m} \xi_i + \nu\,\|W\|_1\qquad \text{s.t.}\;\; \delta_{r,y_i} + H_{i:}\, w_{y_i} \ge 1 + H_{i:}\, w_r - \xi_i,\;\; \forall i, r;\quad W \ge 0,$$
where the $\ell_1$ regularizer may be replaced by the mixed norm $\|W\|_{1,2}$ to share features across classes. Here $\nu > 0$ is the regularization parameter. Refs: CVPR2011, CVPR2013
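To make the constraint concrete, the sketch below computes the per-example slack it implies for a given W; the shapes of H, W and y are assumptions, and the helper name is hypothetical.

```python
# Sketch: slack implied by the multi-class margin constraints above.
# H: (m, N) weak-learner outputs, W: (N, K) nonnegative class weights,
# y: (m,) integer labels in {0, ..., K-1}. All shapes are assumptions.
import numpy as np

def multiclass_slack(H, W, y):
    scores = H @ W                                   # (m, K): H_i w_r for every class r
    m, K = scores.shape
    true = scores[np.arange(m), y][:, None]          # H_i w_{y_i}
    delta = (np.arange(K)[None, :] == y[:, None])    # delta_{r, y_i}, 0/1
    # Constraint: delta + true >= 1 + scores - xi  =>  xi >= 1 + scores - true - delta
    xi = np.maximum(1.0 + scores - true - delta, 0.0).max(axis=1)
    return xi                                        # per-example slack xi_i
```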

  25. #3: Structured output boosting. Natural language parsing: given a sequence of words x, predict the parse tree y. Dependencies arise from structural constraints, since y has to be a tree. [Diagram: the sentence x = "The dog chased the cat" with its parse tree y, built from the nodes S, NP, VP, Det, N, V.]

  26. Structured SVM. Original SVM problem: • exponentially many constraints; • most are dominated by a small set of "important" constraints. Structural SVM approach: • repeatedly find the next most violated constraint... • ...until the set of constraints is a good approximation. This is the so-called "cutting plane" method.
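A minimal sketch of the cutting-plane loop (n-slack style, with the per-example slack check simplified away); the two oracles passed in are hypothetical problem-specific helpers, not an existing library API.

```python
# Sketch of the cutting-plane method for structured SVMs (simplified:
# per-example slack terms are omitted from the violation check).
# loss_aug_inference(x, y, w) -> argmax_y' [Delta(y, y') + w . Psi(x, y')]
# and solve_qp(working_set) -> w are hypothetical problem-specific oracles.
import numpy as np

def cutting_plane(examples, psi, delta, loss_aug_inference, solve_qp,
                  w_init, eps=1e-3, max_iter=100):
    w = np.asarray(w_init, dtype=float)
    working_set = []                           # violated constraints found so far
    for _ in range(max_iter):
        n_new = 0
        for x, y_true in examples:
            y_hat = loss_aug_inference(x, y_true, w)     # most violated output
            margin = w @ (psi(x, y_true) - psi(x, y_hat))
            if delta(y_true, y_hat) - margin > eps:      # constraint violated
                working_set.append((x, y_true, y_hat))
                n_new += 1
        if n_new == 0:
            break              # working set now approximates all constraints
        w = solve_qp(working_set)              # re-solve QP over the working set
    return w
```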

  27. Structured Boosting. • The discriminant function we want to learn is $F : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, built from structured weak learners on input-output pairs: $F(x, y; w) = w^{\!\top}\Psi(x, y) = \sum_j w_j\,\psi_j(x, y)$, with $w \ge 0$. As in other structured learning models, predicting a structured output (inference) means finding an output y that maximizes the joint compatibility function: $y^\star = \operatorname{argmax}_y F(x, y; w) = \operatorname{argmax}_y w^{\!\top}\Psi(x, y)$.
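A tiny sketch of this inference step, assuming the output space can be enumerated; in practice the brute-force loop is replaced by dynamic programming or search. `candidates` and `Psi` are hypothetical helpers.

```python
# Sketch: brute-force structured inference y* = argmax_y w . Psi(x, y).
# `candidates(x)` yields possible outputs and `Psi(x, y)` returns the joint
# feature vector; both are hypothetical problem-specific helpers.
import numpy as np

def infer(x, w, candidates, Psi):
    return max(candidates(x), key=lambda y: float(np.dot(w, Psi(x, y))))
```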

  28. Structured Boosting. Primal:
$$\min_{w \ge 0,\; \xi \ge 0}\;\; \mathbf{1}^{\!\top} w + \frac{C}{m}\,\mathbf{1}^{\!\top}\xi\qquad \text{s.t.}\;\; w^{\!\top}\big[\Psi(x_i, y_i) - \Psi(x_i, y)\big] \ge \Delta(y_i, y) - \xi_i,\;\; \forall i = 1,\dots,m;\;\; \forall y \in \mathcal{Y}.$$
• Exponentially many variables and constraints. • More challenging than structured SVM and boosting.

  29. Structured Boosting. • Let's put aside the difficulty of the many constraints in the primal and use the CG framework to design boosting. Dual:
$$\max_{\mu \ge 0}\;\; \sum_{i,\,y} \mu(i, y)\,\Delta(y_i, y)\qquad \text{s.t.}\;\; \sum_{i,\,y} \mu(i, y)\,\delta\Psi_i(y) \le \mathbf{1},\quad 0 \le \sum_{y} \mu(i, y) \le \frac{C}{m},\;\; \forall i = 1,\dots,m.$$
