SLIDE 1
Today
Experts and Zero-Sum Game Equilibria. Boosting and Experts. Routing and Experts.
SLIDE 2 Two-person zero-sum games.
m×n payoff matrix A. Row mixed strategy: x = (x_1,...,x_m). Column mixed strategy: y = (y_1,...,y_n).
Payoff for strategy pair (x,y): p(x,y) = x^t A y. That is,
p(x,y) = ∑_i x_i ∑_j a_{i,j} y_j = ∑_j y_j ∑_i x_i a_{i,j}.
Recall: row minimizes, column maximizes.
Equilibrium pair (x∗,y∗):
(x∗)^t A y∗ = max_y (x∗)^t A y = min_x x^t A y∗.
(No better column strategy, no better row strategy.)
SLIDE 3 Equilibrium.
Equilibrium pair (x∗,y∗):
p(x∗,y∗) = (x∗)^t A y∗ = max_y (x∗)^t A y = min_x x^t A y∗.
(No better column strategy, no better row strategy.)
No row is better: min_i A(i)·y∗ = (x∗)^t A y∗, where A(i) is the ith row.
No column is better: max_j (A^t)(j)·x∗ = (x∗)^t A y∗.
SLIDE 4
Best Response
Column goes first: find y where the best row response is not too low.
R = max_y min_x (x^t A y).
Note: the minimizing x can be pure: (0,0,...,1,...,0).
Example: Roshambo. Value of R?
Row goes first: find x where the best column response is not too high.
C = min_x max_y (x^t A y).
Again: the maximizing y can be of the form (0,0,...,1,...,0).
Example: Roshambo. Value of C?
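These values can be checked numerically. Below is a minimal sketch in Python (numpy assumed); the ±1/0 payoff encoding for Roshambo is my assumption, not from the slide.

```python
import numpy as np

# Roshambo (rock, paper, scissors): row minimizes, column maximizes.
# Assumed encoding: A[i, j] is what row pays when row plays i, column plays j.
A = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]])

def best_row_response(A, y):
    """min over pure rows of Ay: the row's best response to column mix y."""
    return (A @ y).min()

def best_col_response(A, x):
    """max over pure columns of x^t A: the column's best response to row mix x."""
    return (x @ A).max()

u = np.ones(3) / 3
print(best_row_response(A, u))   # 0.0: uniform y guarantees the column 0
print(best_col_response(A, u))   # 0.0: uniform x caps the column at 0
print(best_row_response(A, np.array([0.5, 0.3, 0.2])))  # -0.3: skewed y is exploited
```

Uniform play shows R ≥ 0 and C ≤ 0; with R ≤ C (next slide), R = C = 0 for Roshambo.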
SLIDE 5
Duality.
R = max_y min_x (x^t A y).
C = min_x max_y (x^t A y).
Weak Duality: R ≤ C. Proof: it is better to go second.
At an equilibrium (x∗,y∗) with payoff v:
row payoffs (Ay∗) are all ≥ v ⇒ R ≥ v.
column payoffs ((x∗)^t A) are all ≤ v ⇒ v ≥ C.
⇒ R ≥ C. With weak duality: equilibrium ⇒ R = C!
Strong Duality: there is an equilibrium point, and R = C! It doesn't matter who plays first!
SLIDE 6 Proof of Equilibrium.
Approximate equilibrium...
C(x) = max_y x^t A y.   R(y) = min_x x^t A y.
Always: R(y) ≤ C(x) for any strategy pair (x,y).
Equilibrium (x,y): R(y) = C(x) → C(x) − R(y) = 0.
Approximate equilibrium: C(x) − R(y) ≤ ε. With R(y) ≤ C(x):
→ “Response y to x is within ε of best response.”
→ “Response x to y is within ε of best response.”
SLIDE 7
Proof of approximate equilibrium.
How? (A) Using geometry. (B) Using a fixed point theorem. (C) Using multiplicative weights. (D) By the skin of my teeth. (C) ...and (D). Not hard. Even easy. Still, head scratching happens.
SLIDE 8
Games and experts
Again: find (x∗,y∗) such that (max_y x∗Ay) − (min_x xAy∗) ≤ ε, i.e., C(x∗) − R(y∗) ≤ ε.
Experts framework: n experts, T days, L∗ = total loss of the best expert.
The Multiplicative Weights method yields loss L where L ≤ (1+ε)L∗ + (log n)/ε.
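A minimal sketch of the Multiplicative Weights method behind this bound (Python/numpy; the standard (1−ε)^loss update, with an illustrative function name):

```python
import numpy as np

def mw_expected_loss(losses, eps):
    """Run MW on a T x n array of per-day expert losses in [0, 1].

    Returns the algorithm's expected total loss L, which satisfies the
    bound above: L <= (1 + eps) * L_star + ln(n) / eps, where
    L_star = min_i sum_t losses[t, i] is the best expert's total loss.
    """
    T, n = losses.shape
    w = np.ones(n)                       # all experts start with equal weight
    L = 0.0
    for t in range(T):
        p = w / w.sum()                  # follow experts proportionally to weight
        L += p @ losses[t]               # expected loss on day t
        w *= (1.0 - eps) ** losses[t]    # penalize each expert by its loss
    return L
```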
SLIDE 9 Games and Experts.
Assume: A has payoffs in [0,1]. For T = (log n)/ε² days:
1) The m pure row strategies are the experts. Use multiplicative weights to produce a row distribution; let x_t be the distribution (row strategy) on day t.
2) Each day, the adversary plays the best column response to x_t: the column of A that maximizes the row's expected loss. Let y_t be the indicator vector for this column.
Let y∗ = (1/T)∑_t y_t and x∗ = argmin_{x_t} x_t A y_t.
Claim: (x∗,y∗) is 2ε-optimal for matrix A.
Proof idea: x∗ is chosen as the x_t whose best column response is smallest; clearly good for the row. On each day, the column's best response against x_t pays at least C(x∗), so the total loss L is at least T·C(x∗). The best expert's loss L∗ is roughly less than L by the MW analysis, and is at most T·R(y∗). Combine the bounds. Done!
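Steps 1) and 2) as code (a sketch under the slide's assumptions — payoffs in [0,1], row minimizes; the helper names are mine):

```python
import numpy as np

def approx_equilibrium(A, eps):
    """MW row player vs. best-response column. Returns (x_star, y_star),
    a ~2*eps-optimal pair for an m x n matrix A with entries in [0, 1]."""
    m, n = A.shape
    T = int(np.ceil(np.log(m) / eps**2))      # log(#experts) / eps^2 days
    w = np.ones(m)                            # one expert per pure row strategy
    xs, cols = [], []
    for _ in range(T):
        x = w / w.sum()                       # day-t row distribution x_t
        j = int(np.argmax(x @ A))             # adversary: best column against x_t
        xs.append(x)
        cols.append(j)
        w *= (1.0 - eps) ** A[:, j]           # MW update: row i loses A[i, j]
    y_star = np.bincount(cols, minlength=n) / T       # y* = (1/T) sum_t y_t
    x_star = min(xs, key=lambda x: (x @ A).max())     # x_t minimizing C(x_t)
    return x_star, y_star
```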
SLIDE 10 Approximate Equilibrium!
Experts: x_t is the strategy on day t, y_t is the best column against x_t.
Let y∗ = (1/T)∑_t y_t and x∗ = argmin_{x_t} x_t A y_t.
Claim: (x∗,y∗) is 2ε-optimal for matrix A.
Column payoff: C(x∗) = max_y x∗Ay.
Loss on day t: x_t A y_t = C(x_t) ≥ C(x∗), by the choice of x∗. Thus the algorithm's loss L is ≥ T·C(x∗).
Best expert: L∗ is the best row against all the columns played, i.e., the best row against ∑_t Ay_t; and Ty∗ = ∑_t y_t → the best row against T·Ay∗. → L∗ ≤ T·R(y∗).
Multiplicative weights: L ≤ (1+ε)L∗ + (ln n)/ε.
T·C(x∗) ≤ (1+ε)T·R(y∗) + (ln n)/ε
→ C(x∗) ≤ (1+ε)R(y∗) + (ln n)/(εT)
→ C(x∗) − R(y∗) ≤ εR(y∗) + (ln n)/(εT).
With T = (ln n)/ε² and R(y∗) ≤ 1:
→ C(x∗) − R(y∗) ≤ 2ε.
SLIDE 11 Approximate Equilibrium: notes!
Experts: x_t is the strategy on day t, y_t is the best column against x_t.
Let x∗ = (1/T)∑_t x_t and y∗ = (1/T)∑_t y_t.
Claim: (x∗,y∗) is 2ε-optimal for matrix A.
Column payoff: C(x∗) = max_y x∗Ay. Let y_r be the best response to x∗.
On day t, y_t is the best response to x_t → x_t A y_t ≥ x_t A y_r.
Algorithm loss: L = ∑_t x_t A y_t ≥ ∑_t x_t A y_r = T·x∗Ay_r = T·C(x∗).
Best expert: L∗ is the best row against all the columns played, i.e., the best row against ∑_t Ay_t; and Ty∗ = ∑_t y_t → the best row against T·Ay∗. → L∗ ≤ T·R(y∗).
Multiplicative weights: L ≤ (1+ε)L∗ + (ln n)/ε.
T·C(x∗) ≤ (1+ε)T·R(y∗) + (ln n)/ε
→ C(x∗) ≤ (1+ε)R(y∗) + (ln n)/(εT)
→ C(x∗) − R(y∗) ≤ εR(y∗) + (ln n)/(εT).
With T = (ln n)/ε² and R(y∗) ≤ 1 → C(x∗) − R(y∗) ≤ 2ε.
SLIDE 12
Comments
For any ε, there exists an ε-approximate equilibrium. Does an exact equilibrium exist? Yes. Something about math here? Fixed point theorem. Later: will use geometry, linear programming.
Complexity? T = (ln n)/ε² → O(nm(log n)/ε²). Basically linear!
Versus linear programming: O(n³m). Basically quadratic. (Faster linear programming: O(√(n+m)) linear system solves.) Still much slower... and more complicated.
Dynamics: best response, update weights, best response. Also works with both players using multiplicative weights. “In practice.”
SLIDE 13 Learning.
Learning, just a bit. Example: given a set of labelled points, find a hyperplane that separates them.
[Figure: + and − labelled points in the plane.]
Looks hard. Get 1/2 on the correct side? Easy: an arbitrary line, and scan.
- Useless. A bit more than 1/2?
Weak learner: classifies ≥ 1/2 + ε of the points correctly.
Not really important, but...
SLIDE 14
Weak Learner/Strong Learner
Input: n labelled points.
Weak learner: produce a hypothesis that correctly classifies a 1/2 + ε fraction.
Strong learner: produce a hypothesis that correctly classifies a 1 + µ fraction. That's a really strong learner! Make that: correctly classifies a 1 − µ fraction.
Same thing? Can one use weak learning to produce a strong learner?
Boosting: use a weak learner to produce a strong learner.
SLIDE 15
Poll.
Given a weak learning method (produces OK hypotheses), produce a great hypothesis. Can we do this? (A) Yes (B) No.
If yes, how? Multiplicative weights! The endpoint of a line of research.
SLIDE 16
Experts Picture
SLIDE 17 Boosting/MW Framework
Experts are the points. The “adversary” is the weak learner. Points want to be misclassified. The learner wants to maximize the probability of
- f classifying a random point correctly.
The strong learner algorithm will come from the adversary's plays. Do T = (2/γ²) log(1/µ) rounds:
- 1. Row player: multiplicative weights (factor 1−γ) on the points.
- 2. Column: run the weak learner on the row distribution.
- 3. Hypothesis h(x): majority of h_1(x), h_2(x), ..., h_T(x).
Claim: h(x) is correct on 1 − µ of the points!!! Cool! Really? Proof?
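The three steps as a code sketch (the weak_learner interface — returning a callable hypothesis with weighted accuracy ≥ 1/2 + γ under the given distribution — is an assumption):

```python
import numpy as np

def boost(points, labels, weak_learner, gamma, mu):
    """Boosting loop from this slide: MW on points vs. a weak learner."""
    n = len(points)
    T = int(np.ceil((2 / gamma**2) * np.log(1 / mu)))
    w = np.ones(n)                                  # one expert per point
    hs = []
    for _ in range(T):
        dist = w / w.sum()                          # 1. row distribution on points
        h = weak_learner(points, labels, dist)      # 2. weak learner's hypothesis
        hs.append(h)
        for i, (p, l) in enumerate(zip(points, labels)):
            if h(p) == l:                           # a correctly classified point
                w[i] *= 1 - gamma                   # ... "loses", so downweight it

    def H(x):                                       # 3. majority of h_1, ..., h_T
        votes = [h(x) for h in hs]
        return max(set(votes), key=votes.count)
    return H
```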
SLIDE 18
Some intuition
Intuition 1: each point is classified correctly, independently in each round, with probability 1/2 + ε.
After enough rounds, majority rule is correct for almost all points.
Intuition 2: say some point is classified correctly ≤ 1/2 of the time. Such a point has high probability under the distribution; in the limit, the whole distribution concentrates on such points. But this subset will then be classified correctly with probability 1/2 + ε.
SLIDE 19 Adaboost proof.
Claim: h(x) is correct on 1 − µ of the points!!
Let S_bad be the set of points where h(x) is incorrect: the majority of the h_t(x) are wrong for x ∈ S_bad.
So each x ∈ S_bad is a good expert: it loses (is classified correctly) less than 1/2 of the time.
→ W(T) ≥ (1−ε)^{T/2}|S_bad|.
Each day, the weak learner gets payoff ≥ 1/2 + γ → the day-t loss L_t ≥ 1/2 + γ.
→ W(T) ≤ n(1−ε)^L ≤ n·e^{−εL} ≤ n·e^{−ε(1/2+γ)T}.
Combining: |S_bad|(1−ε)^{T/2} ≤ W(T) ≤ n·e^{−ε(1/2+γ)T}.
SLIDE 20 Calculation..
|S_bad|(1−ε)^{T/2} ≤ n·e^{−ε(1/2+γ)T}.
Set ε = γ, take logs:
ln|S_bad| + (T/2)·ln(1−γ) ≤ ln n − γT(1/2+γ).
Again, −γ − γ² ≤ ln(1−γ), so
ln|S_bad| − (T/2)(γ + γ²) ≤ ln n − γT(1/2+γ)
→ ln|S_bad| ≤ ln n − (γ²/2)T.
And T = (2/γ²) log(1/µ),
→ ln|S_bad| ≤ ln n − ln(1/µ) → |S_bad|/n ≤ µ.
The misclassified set is at most a µ fraction of all the points. The hypothesis correctly classifies 1 − µ of the points!!!
Claim (via multiplicative weights): h(x) is correct on 1 − µ of the points!
Claim: weak learning → strong learning! Not so weak after all.
SLIDE 21
Some details...
The weak learner learns over distributions of points, not just sets of points. Make copies of points to simulate a distribution. Used often in machine learning.
SLIDE 22
Example.
Set of points on the unit ball in d-space. Learner: learns hyperplanes through the origin. Can learn if there is a hyperplane H that separates all the points,
and can find a (1/2 + ε)-weighted separating plane.
The experts' output is an average of hyperplanes... a hyperplane!
A (1/2 + ε) separating hyperplane?
Assumption: margin γ. A random hyperplane? Not likely to be exactly normal to H, but should get 1/2 + γ/√d.
→ O((d log n)/γ²) rounds to find a separating hyperplane.
Weak learner: random. Wow. That's weak.
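A sketch of this random weak learner (illustrative interface). With margin γ, a random unit normal is expected to have advantage on the order of γ/√d; flipping its sign guarantees weighted accuracy ≥ 1/2:

```python
import numpy as np

def random_hyperplane(X, labels, dist, rng=np.random.default_rng()):
    """X: n x d points on the unit ball, labels in {-1, +1}, dist: point
    weights. Returns a unit normal w; classify a point x by sign(w . x)."""
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)                      # random unit normal direction
    acc = dist @ (np.sign(X @ w) == labels)     # weighted accuracy of sign(w . x)
    return w if acc >= 0.5 else -w              # flip so accuracy is >= 1/2
```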
SLIDE 23
Better weak learner?
A hyperplane that separates the weighted averages of the + and − points? Change the loss a bit, and get better results.
SLIDE 24
Toll/Congestion
Given: G = (V,E) and pairs (s_1,t_1),...,(s_k,t_k).
Row: choose a routing of all the pairs. Column: choose an edge. Row pays if the column's edge is on any of its paths.
Matrix: a row for each routing r, a column for each edge e; A[r,e] is the congestion on edge e under routing r.
Offense (best response): Router: route along shortest paths. Toll: charge the most loaded edge.
Defense: Toll: maximize the shortest path length under the tolls. Router: minimize the max congestion on any edge.
SLIDE 25
Two person game.
A row for every routing (A[r,e]): an exponential number of rows! A two-person game with experts won't be so easy to implement.
The version with rows and columns flipped may work: A[e,r] is the congestion of edge e under routing r. Now m rows (edges), and an exponential number of columns.
Multiplicative weights only maintains m weights. The adversary only needs to provide the best column each day. The runtime depends only on m and T (the number of days).
SLIDE 26 Congestion minimization and Experts.
Will use the gain version of experts with gains in [0,ρ]: G ≥ (1−ε)G∗ − ρ(log n)/ε; here ρ = k. Let T = k(log n)/ε².
- 1. Row player runs multiplicative weights: w_i = w_i(1+ε)^{g_i/k}.
- 2. Column: route all pairs along shortest paths under the current edge weights.
- 3. Output the average of all the routings: (1/T)∑_t f(t).
(A code sketch follows the proof below.)
Claim: the congestion c_max of the output is at most (1+ε)C∗ + ε/(1−ε).
Proof: G ≥ (1−ε)G∗ − k(log n)/ε.
G∗ = c_max·T — the best row (edge) payoff against the average routing.
G ≤ C∗·T — each day the gain is the toll-weighted congestion, which is ≤ C∗: the shortest-path routing pays at most the toll cost of the optimal routing, which is at most C∗.
C∗·T ≥ c_max·T(1−ε) − k(log n)/ε.
For T = k(log n)/ε², the last term is εT:
→ c_max ≤ C∗/(1−ε) + ε/(1−ε). With 1/(1−ε) ≤ 1+ε (to first order) →
c_max − C∗ ≤ εC∗ + ε/(1−ε).
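Steps 1–3 as code (a sketch assuming an undirected networkx graph for the shortest-path step; the function name and interface are illustrative):

```python
import numpy as np
import networkx as nx

def mw_routing(G, pairs, eps):
    """MW over edges vs. shortest-path routing. Returns the per-edge
    congestion of the average (fractional) routing."""
    edges = list(G.edges())
    idx = {}
    for i, (u, v) in enumerate(edges):
        idx[(u, v)] = idx[(v, u)] = i            # look up edges in either orientation
    m, k = len(edges), len(pairs)
    T = int(np.ceil(k * np.log(m) / eps**2))
    w = np.ones(m)                               # one expert (weight) per edge
    total = np.zeros(m)
    for _ in range(T):
        nx.set_edge_attributes(G, dict(zip(edges, w / w.sum())), "toll")
        gain = np.zeros(m)
        for s, t in pairs:                       # route each pair on a cheapest path
            path = nx.shortest_path(G, s, t, weight="toll")
            for u, v in zip(path, path[1:]):
                gain[idx[(u, v)]] += 1.0         # one unit of congestion on this edge
        total += gain
        w *= (1 + eps) ** (gain / k)             # MW gain update; daily gains in [0, k]
    return total / T                             # congestion of the average routing
```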
SLIDE 27
Better setup.
Runtime: O(km) to route in each step, and O(k(log n)/ε²) steps.
→ O(k²m log n) to get a constant approximation (constant ε).
Homework: an O(km log n) algorithm.
SLIDE 28
Fractional versus Integer.
Did we solve path routing? Yes? No? No! We output an average of T routings: we approximately solved the fractional routing problem. This is not a (1+ε)-optimal solution to the (integral) path routing problem! Homework 2, Problem 1. A decent solution to the path routing problem?
SLIDE 29 Randomized Rounding
For each (s_i,t_i), choose path p_i with probability f(p_i). The fractional congestion c(e) of an edge rounds to c̃(e).
Fix an edge e, used by paths p_1,...,p_m. Let X_i = 1 if path p_i is chosen.
The rounded congestion c̃(e) is ∑_i X_i. Expected congestion: ∑_i E(X_i).
E(X_i) = 1·Pr[X_i = 1] + 0·Pr[X_i = 0] = f(p_i) → ∑_i E(X_i) = ∑_i f(p_i) = c(e) → E(c̃(e)) = c(e).
Concentration (law of large numbers): if c(e) is relatively large (Ω(log n)) → c̃(e) ≈ c(e).
Concentration results? Later.
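The rounding step as code (a sketch; the path_flows representation of the fractional solution f is an assumed interface):

```python
import numpy as np

def randomized_rounding(path_flows, rng=np.random.default_rng()):
    """path_flows: one list per pair i of (path, f(path)) entries summing
    to 1, where a path is a list of edges. Picks one path per pair with
    probability f and returns the integral congestion per edge."""
    congestion = {}
    for options in path_flows:
        paths = [p for p, _ in options]
        probs = [f for _, f in options]
        chosen = paths[rng.choice(len(paths), p=probs)]   # pick p_i w.p. f(p_i)
        for e in chosen:                                  # X_i = 1: p_i adds one
            congestion[e] = congestion.get(e, 0) + 1      # unit to each of its edges
    return congestion
```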
SLIDE 30
See you on Tuesday.