

SLIDE 1

Connecting the dots with common sense and linear models

Léon Bottou

NEC Labs America

COS 424 – 2/4/2010

slide-2
SLIDE 2

Introduction

Useful things:
– understanding probabilities,
– understanding statistical learning theory,
– knowing countless statistical procedures,
– knowing countless machine learning algorithms.

Essential things:
– applying common sense,
– paying attention to details,
– being able to set up experiments,
– and to measure the outcome of experiments,
– and to measure plenty of other things.

SLIDE 3

Connecting the dots

Question: Find y given x.

    x      y
  0.31   1.87
  0.25   1.84
  3.78   2.23
  3.30   3.04
  3.83   2.68
 −3.29   0.01
 −0.90   0.37
 −3.61   0.37
  0.64   2.05
 −0.34   0.96
   …      …

SLIDE 4

Connecting the dots

Question: Find y given x.

    x      y
  0.31   1.87
  0.25   1.84
  3.78   2.23
  3.30   3.04
  3.83   2.68
 −3.29   0.01
 −0.90   0.37
 −3.61   0.37
  0.64   2.05
 −0.34   0.96
 −3.53  −0.35
  1.63   3.18
   …      …

Answer: Connect the dots. Read the curve.

[Figure: scatter plot of the data points with the interpolating curve]

SLIDE 5

Connecting the dots – take two

Question: Find y given x.

[Table: training examples with 13,125 input components [x]1 … [x]13,125 and one output y per row]

Idea: (1) understand how we handle the 2D case; (2) generalize!

SLIDE 6

A Simple Linear Model

Polynomial: f(x) = w0 + w1 x + w2 x² + ⋯ + wn xⁿ

Slight generalization:

x ⟶ Φ(x) = ( φ0(x), φ1(x), …, φn(x) )⊤ ⟶ f(x) = [ w0, w1, …, wn ] Φ(x)

Equivalently: f(x) = w⊤Φ(x). Let's choose a basis Φ and use the data to determine w.
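As a concrete illustration, here is a minimal sketch in Python/NumPy (the weights and the evaluation point are made up; this is not course code):

    import numpy as np

    def phi(x, n):
        """Polynomial feature map: Phi(x) = (1, x, x^2, ..., x^n)."""
        return np.array([x ** k for k in range(n + 1)])

    def f(x, w):
        """Linear model f(x) = w^T Phi(x)."""
        return w @ phi(x, n=len(w) - 1)

    w = np.array([0.5, -1.0, 2.0, 0.3])  # illustrative weights w0..w3
    print(f(1.5, w))                     # w0 + w1*1.5 + w2*1.5**2 + w3*1.5**3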

SLIDE 7

Linear Least Squares

Input: xi
Output: w⊤Φ(xi)
Desired output: yi
Difference: yi − w⊤Φ(xi)

Minimize: C(w) = ∑_{i=1}^n ( yi − w⊤Φ(xi) )²

C(w) is a quadratic convex function of w. Its minimum value exists and is unique, but it can be attained at multiple values of w.

SLIDE 8

A little bit of Linear Algebra

At the optimum,

dC/dw = −2 ∑_{i=1}^n ( yi − w⊤Φ(xi) ) Φ(xi)⊤ = 0

Therefore we must solve the system of equations:

( ∑_{i=1}^n Φ(xi) Φ(xi)⊤ ) w = ∑_{i=1}^n yi Φ(xi)

Shorthand form:

( X⊤X ) w = ( X⊤Y )
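As a sketch (random data standing in for the slides' examples; X has rows Φ(xi)⊤), the normal equations can be solved directly:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))   # design matrix: row i is Phi(x_i)^T
    Y = rng.normal(size=30)        # targets y_i

    # Normal equations: (X^T X) w = X^T Y
    w = np.linalg.solve(X.T @ X, X.T @ Y)
    print("training MSE:", np.mean((Y - X @ w) ** 2))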

SLIDE 9

Singularities

Almost the same as

w = ( X⊤X )⁻¹ ( X⊤Y ).

You should never solve a system by inverting a matrix. And who said X⊤X is invertible?

Consider the case where φ1(x) = φ8(x):
– the matrix X⊤X is singular,
– but the minimum is unchanged,
– and it is reached by many w, as long as w1 + w8 remains constant.

Among the w that minimize C(w), compute the one with the smallest norm.
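In code, the minimum-norm minimizer comes out of an SVD-based solver; a sketch with a deliberately duplicated column, mirroring the φ1(x) = φ8(x) situation (data illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 3))
    X = np.hstack([X, X[:, :1]])   # duplicate a column: X^T X is now singular
    Y = rng.normal(size=20)

    # np.linalg.lstsq returns the minimum-norm w among all minimizers
    w, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(w)   # the duplicated columns end up sharing their weight equally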

SLIDE 10

Numerical Procedures

Diagonalization of X⊤X

Q⊤ D Q w = X⊤Y ⟸ w = Q⊤ D⁺ Q X⊤Y

Traditional methods: SVD or QR decomposition of X

SVD (X = U D V⊤):  V D U⊤ U D V⊤ w = V D U⊤ Y ⟸ w = V D⁺ U⊤ Y
QR (X = Q R):      R⊤Q⊤Q R w = R⊤Q⊤Y ⟸ R w = Q⊤Y

and solve using back-substitution.

Simple and fast: regularization + Cholesky

min C(w) + ε‖w‖² ⟺ ( X⊤X + εI ) w = ( X⊤Y ) ⟺ U U⊤ w = ( X⊤Y )

and solve using two rounds of back-substitution (U U⊤ being the Cholesky factorization of X⊤X + εI).
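A sketch of these procedures side by side (NumPy/SciPy; the data and ε are illustrative):

    import numpy as np
    from scipy.linalg import qr, solve_triangular, cho_factor, cho_solve

    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 6))
    Y = rng.normal(size=50)

    # SVD: X = U D V^T, so w = V D^+ U^T Y (here all singular values are nonzero)
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    w_svd = Vt.T @ ((U.T @ Y) / d)

    # QR: X = Q R, then solve R w = Q^T Y by back-substitution
    Q, R = qr(X, mode="economic")
    w_qr = solve_triangular(R, Q.T @ Y)

    # Regularization + Cholesky: (X^T X + eps I) w = X^T Y
    eps = 1e-5
    c, low = cho_factor(X.T @ X + eps * np.eye(6))
    w_chol = cho_solve((c, low), X.T @ Y)

    print(np.allclose(w_svd, w_qr), np.allclose(w_svd, w_chol, atol=1e-4))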

SLIDE 11

Polynomial degree 1

Φ(x) = (1, x)

[Figure: polynomial fit, d = 1]
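The following sketch reproduces this kind of degree sweep on synthetic data (the true curve and noise level are stand-ins for the slides' dataset):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(-2, 5, size=30)
    y = np.sin(x) + 2 + rng.normal(scale=0.5, size=30)  # stand-in data

    def fit_poly(x, y, d):
        """Least-squares polynomial fit of degree d (SVD-based, so it
        stays usable even when high degrees make X ill-conditioned)."""
        X = np.vander(x, d + 1, increasing=True)        # columns 1, x, ..., x^d
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    for d in (1, 2, 3, 6, 9, 12, 20):
        w = fit_poly(x, y, d)
        X = np.vander(x, d + 1, increasing=True)
        print(d, np.mean((y - X @ w) ** 2))             # training MSE shrinks with d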

SLIDE 12

Polynomial degree 2

Φ(x) = (1, x, x²)

[Figure: polynomial fit, d = 2]

SLIDE 13

Polynomial degree 3

Φ(x) = (1, x, x², x³)

[Figure: polynomial fit, d = 3]

SLIDE 14

Polynomial degree 6

Φ(x) = (1, x, x², x³, x⁴, x⁵, x⁶)

[Figure: polynomial fit, d = 6]

SLIDE 15

Polynomial degree 9

Φ(x) = (1, x, x², …, x⁹)

[Figure: polynomial fit, d = 9]

SLIDE 16

Polynomial degree 12

Φ(x) = (1, x, x², …, x¹²)

[Figure: polynomial fit, d = 12]

SLIDE 17

Polynomial degree 20

Φ(x) = (1, x, x², …, x²⁰)

[Figure: polynomial fit, d = 20]

SLIDE 18

Polynomial Basis

[Figure: the monomial basis functions xᵏ]

Polynomials of the form xᵏ quickly become very steep. There are much better polynomial bases: e.g. Chebyshev, Hermite, …

SLIDE 19

Mean squared error for polynomial models

Training set MSE:   (1/n) ∑_{i=1}^n ( yi − f̂(xi) )²

True MSE:   (1/8) ∫_{−4}^{+4} [ σ²_true + ( f_true(x) − f̂(x) )² ] dx

[Plot: training MSE and true MSE against polynomial degree (1–20), MSE on a log scale from 0.01 to 100000]

Is MSE a good measure of the error? Why integrate over [−4, +4]?
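A sketch of both measurements (f_true, σ_true, and the sample are stand-ins; the integral is approximated on a uniform grid, which is exactly an average over [−4, +4]):

    import numpy as np

    rng = np.random.default_rng(4)
    f_true = lambda x: np.sin(x) + 2   # stand-in for the unknown target
    sigma_true = 0.5                   # stand-in noise level
    x = rng.uniform(-4, 4, size=30)
    y = f_true(x) + rng.normal(scale=sigma_true, size=30)

    def train_and_true_mse(d):
        X = np.vander(x, d + 1, increasing=True)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        train_mse = np.mean((y - X @ w) ** 2)
        # (1/8) * integral over [-4,4] of sigma^2 + (f_true - f_hat)^2,
        # i.e. the average of the integrand over the interval
        grid = np.linspace(-4, 4, 2001)
        f_hat = np.vander(grid, d + 1, increasing=True) @ w
        true_mse = np.mean(sigma_true ** 2 + (f_true(grid) - f_hat) ** 2)
        return train_mse, true_mse

    for d in range(1, 21):
        print(d, *train_and_true_mse(d))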

SLIDE 20

About Error Measures

Domain
– should be related to the input data distribution.

Metric
– uniform metric: L∞,
– averaged with an Lp norm, e.g. MSE.

Derivatives
– very close functions can have very different derivatives,
– Sobolev metrics.

Integrals
– conversely, very close functions always have very close integrals.

SLIDE 21

Piecewise Linear Basis

Choose knots r1 … rK.

φ0(x) = 1
φ1(x) = x
φ2(x) = max(0, x − r1)
…
φj(x) = max(0, x − r_{j−1})

[Figure: piecewise linear (hinge) basis functions]
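A sketch of this hinge feature map (Python/NumPy; knot placement illustrative):

    import numpy as np

    def phi_hinge(x, knots):
        """Phi(x) = (1, x, max(0, x - r_1), ..., max(0, x - r_K))."""
        x = np.asarray(x, dtype=float)
        feats = [np.ones_like(x), x] + [np.maximum(0.0, x - r) for r in knots]
        return np.stack(feats, axis=-1)   # shape (..., K + 2)

    knots = np.linspace(-1.0, 4.0, 4)     # illustrative knots r_1 .. r_4
    print(phi_hinge(np.array([0.3, 2.1]), knots))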

SLIDE 22

Piecewise Linear Models

[Figure: piecewise linear fit with 2 knots]
[Figure: piecewise linear fit with 3 knots]
[Figure: piecewise linear fit with 4 knots]

SLIDE 23

Piecewise Linear Models

[Figure: piecewise linear fit with 5 knots]
[Figure: piecewise linear fit with 9 knots]
[Figure: piecewise linear fit with 18 knots]

SLIDE 24

MSE for Piecewise Linear Models

Training set MSE:   (1/n) ∑_{i=1}^n ( yi − f̂(xi) )²

True MSE:   (1/8) ∫_{−4}^{+4} [ σ²_true + ( f_true(x) − f̂(x) )² ] dx

[Plot: training MSE and true MSE against the number of knots (1–20), MSE on a log scale from 0.01 to 1000]

SLIDE 25

Piecewise Linear Variants

Counting the dimensions

  • Linear functions on K + 1 segments: 2K + 2 parameters.
  • Continuity constraints: K constraints.
  • Other constraints: 0 (hinges), 1 (ramps), 2 (triangles).

[Figure: piecewise linear (ramps)]
[Figure: piecewise linear (triangles)]

Ramps: dim(Φ) = K + 1.   Triangles: dim(Φ) = K.

SLIDE 26

Piecewise Linear Variants

[Figure: piecewise ramps with 6 knots]
[Figure: piecewise triangles with 7 knots]

SLIDE 27

Piecewise Polynomial (Splines)

[Figure: piecewise quadratic fit]

– Quadratic splines: Φ(x) = (1, x, x², …, max(0, x − rk)², …)
– Cubic splines: Φ(x) = (1, x, x², x³, …, max(0, x − rk)³, …)

SLIDE 28

Quadratic Splines

[Figure: piecewise quadratic with 1 knot]
[Figure: piecewise quadratic with 6 knots]
[Figure: piecewise quadratic with 12 knots]

SLIDE 29

MSE for Quadratic Splines

Training set MSE:   (1/n) ∑_{i=1}^n ( yi − f̂(xi) )²

True MSE:   (1/8) ∫_{−4}^{+4} [ σ²_true + ( f_true(x) − f̂(x) )² ] dx

[Plot: training MSE and true MSE against the number of knots (1–20), MSE on a log scale from 0.05 to 10]

SLIDE 30

Changing the training data: more examples

[Figure: polynomial d = 12, 30 examples]
[Figure: polynomial d = 12, 300 examples]

SLIDE 31

Changing the training data: less noise

[Figure: polynomial d = 12, noise std. dev. = 0.5]
[Figure: polynomial d = 12, noise std. dev. = 0.1]

SLIDE 32

First Conclusions

The fancier the model, the higher the price.
– We can pay with more data.
– We can pay with better data.

In practice we do the converse.
– Changing the data is usually more costly than changing the model.
– Adapt the model "capacity" to the data.
– There is no shortage of methods.

The validation questions.
– We have too many options. How to choose one?
– How to estimate the quality of our work?

SLIDE 33

Estimate the quality of our work

Performance on the training data is not convincing:
– it cannot distinguish learning by rote from understanding,
– understanding leads to more useful predictions than learning by rote,
– therefore we need fresh data to evaluate our work.

• Testing examples set aside before starting the work.
– Statistics work for randomly picked testing examples.
– Real life suggests selected testing examples (e.g. time series).

• Testing data of a different nature.
– A new perspective on the same phenomenon.
– Often more instructive and convincing.

What about the "elegance" of a model?
– Einstein: "Make everything as simple as possible, but not simpler."
– How do you define "simple"?

SLIDE 34

The “training set/testing set” paradigm

One should only use the testing set once!

Of course…
– The more we look at the testing set, the less convincing we are.
– Public benchmarks and their problems.

SLIDE 35

The “validation set”

How to select the right model without looking at the testing set?

SLIDE 36

Potential problems

All this consumes valuable examples!
– This is a serious problem when examples are rare.

What is the optimal size of the testing set?
– Large enough to measure the performance with sufficient accuracy.

What is the optimal size of the validation set?
– Large enough to justify our model selection, but not larger!
– Depends on the number of models to compare.
– Depends on the data needs of the models we compare.
– Depends on the total size of the data set.
– Trial and error…

SLIDE 37

K-fold cross validation
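A minimal sketch of the procedure (plain NumPy, illustrative data): partition the examples into k folds; each fold serves once as the validation set while the model is trained on the remaining k−1 folds, and the k scores are averaged.

    import numpy as np

    def kfold_mse(X, Y, k=5, seed=0):
        """k-fold cross-validation of a least-squares fit on (X, Y)."""
        idx = np.random.default_rng(seed).permutation(len(Y))
        folds = np.array_split(idx, k)
        scores = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            w, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
            scores.append(np.mean((Y[val] - X[val] @ w) ** 2))
        # report the per-fold spread too: it hints at the estimate's reliability
        return np.mean(scores), np.std(scores)

    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 4))
    Y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=100)
    print(kfold_mse(X, Y))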

SLIDE 38

Potential problems

All this consumes valuable computing time!
– This is a serious problem when examples are abundant.

How accurate is k-fold cross-validation?
– More accurate than using a single partition as the validation set.
– Less accurate than using a validation set as large as the training set.
– The statistical properties of the procedure are unclear.

Suggestions
– Avoid k-fold cross-validation for very large datasets.
– Observe the variation of the measured performance across the folds.

Subtleties
– Evaluating the performance of a trained model.
– Evaluating the performance of a training procedure.

SLIDE 39

Beyond Curve Fitting

x ⟶ Φ(x) = ( φ0(x), φ1(x), …, φn(x) )⊤ ⟶ f(x) = [ w0, w1, …, wn ] Φ(x)

Given suitable basis functions Φ, the inputs x could be anything:
– numerical variables, e.g. 3.1415,
– categorical variables, e.g. blue, green, yellow, …,
– ordered variables, e.g. small, medium, large,
– complex data structures, such as trees, graphs, etc.,
– any combination of the above.

This does not mean that constructing the features φi(x) will be easy.

SLIDE 40

The “adult” dataset

Predict whether income exceeds $50K/year (y = +1) or not (y = −1).
http://archive.ics.uci.edu/ml/datasets/Adult

Input variables
– 6 continuous variables: age, years of education, hours-per-week, capital-gains, capital-losses, fnlwgt(?).
– 8 categorical variables: workclass, education, marital status, sex, occupation, race, relationship, native country.

Training and testing sets
– Training set: 32561 examples.
– Testing set: 16281 examples.

SLIDE 41

Creating Φ(x) for the adult dataset

Coding on 1+123 binary features φi(x)
– The first feature is always φ1(x) = 1.
– One feature for each possible value of each categorical variable.
– Five features for each continuous variable (quantized into 5 quantile bins).

Coding copied from (Platt, 1998).
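A sketch of this coding (plain NumPy; the column values below are made up for illustration): one-hot indicators for each categorical value, and per-quantile-bin indicators for each continuous variable.

    import numpy as np

    def one_hot(values):
        """One binary feature per observed value of a categorical variable."""
        cats = sorted(set(values))
        return np.array([[float(v == c) for c in cats] for v in values])

    def quantile_bins(values, q=5):
        """q binary features per continuous variable, one per quantile bin."""
        values = np.asarray(values, dtype=float)
        edges = np.quantile(values, np.linspace(0, 1, q + 1)[1:-1])
        return np.eye(q)[np.digitize(values, edges)]   # bin index in 0..q-1

    age = [39.0, 50.0, 38.0, 53.0, 28.0, 37.0, 49.0, 52.0]   # made-up values
    workclass = ["Private", "Self-emp", "Private", "Private",
                 "Private", "Gov", "Private", "Self-emp"]
    Phi = np.hstack([np.ones((8, 1)), quantile_bins(age), one_hot(workclass)])
    print(Phi.shape)   # (8, 1 + 5 + 3)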

Split
– 28000 training + 4562 validation examples.
– 16281 testing examples.

Results

Experiment                                 Misclassification
Validation set (after training on 28K)    15.98 %
Testing set (after training on 32K)       15.47 %

SLIDE 42

A quadratic basis for the adult dataset

Coding on 1+123+7503 features
– Additional features for the quadratic model:

∀i ∈ 1…123, ∀j ∈ 1…i−1:  φij(x) = φi(x) φj(x)
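A dense sketch of this expansion (Python/NumPy; for the 123 non-constant features this yields 123·122/2 = 7503 product features, matching the slide; a sparse implementation would be preferable in practice):

    import numpy as np

    def quadratic_expand(Phi):
        """Append the products phi_i * phi_j (i > j) to the given features."""
        n, d = Phi.shape
        pairs = [Phi[:, i] * Phi[:, j] for i in range(d) for j in range(i)]
        return np.hstack([Phi, np.stack(pairs, axis=1)])

    Phi = np.random.default_rng(6).integers(0, 2, size=(4, 123)).astype(float)
    print(quadratic_expand(Phi).shape)   # (4, 123 + 123*122/2) = (4, 7626)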

Remarks
– The feature count grows quickly.
– This is slow (X is sparse, but X⊤X is not).

Results

Experiment                                 Misclassification
Validation set (after training on 28K)    16.40 %
Testing set (after training on 32K)       — %

SLIDE 43

Weighting the quadratic terms

Idea
Remember the regularization + Cholesky trick?

min C(w) + ε‖w‖² ⟺ ( X⊤X + εI ) w = ( X⊤Y )

Let's penalize the coefficients of the quadratic terms more heavily.

min C(w) + w⊤Λw ⟺ ( X⊤X + Λ ) w = ( X⊤Y )

Details
– ε = 10⁻⁵ for the constant and linear terms.
– ε ∈ [10⁻⁵, 10⁵] for the quadratic terms.
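A sketch of the weighted penalty (dimensions and ε values illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    n_lin, n_quad = 10, 45                     # illustrative feature counts
    X = rng.normal(size=(200, n_lin + n_quad))
    Y = rng.normal(size=200)

    # Lambda penalizes the quadratic block more heavily than the linear one
    lam = np.concatenate([np.full(n_lin, 1e-5), np.full(n_quad, 100.0)])
    w = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ Y)
    print(np.abs(w[:n_lin]).mean(), np.abs(w[n_lin:]).mean())  # quadratic weights shrink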

SLIDE 44

Weighting the quadratic terms

[Plot: training and validation error (%) against ε for the quadratic terms; error from 12 % to 18 % over ε ∈ [0.01, 1000]]

We get the linear result when ε → ∞ and the quadratic result when ε → 0.
After retraining with ε = 100 on all 32K examples: testing set error 14.93 %.

SLIDE 45

Coming next

Homework 1
– Due on Tue Feb 23rd.
– Something about splines.

Next lectures
– Tuesday Feb 9th: R tutorial (Sean Gerrish).
– Thursday Feb 11th: review of probabilities.
