  1. CS286r Presentation
     James Burns, March 7, 2006
     • Calibrated Learning and Correlated Equilibrium
       – by Dean Foster and Rakesh Vohra
     • Regret in the On-Line Decision Problem
       – by Dean Foster and Rakesh Vohra

  2. Outline
     • Correlated Equilibria
     • Forecasts and Calibration
     • Calibration and Correlated Equilibria
     • Loss and Regret
     • Existence of a no-regret forecasting scheme
     • Further results and discussion

  3. Correlated Equilibria
     • Motivation
       – It is difficult to find learning rules that guarantee convergence to Nash equilibrium (NE)
       – CE are easy to compute: the incentive constraints are linear in the distribution (see the sketch below)
       – Consistent with the Bayesian perspective (Aumann '87)
       – CE can Pareto dominate NE – is this relevant?
     • Drawback
       – The problem of multiplicity of equilibria is even worse!
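The claim that CE are easy to compute can be made concrete: the incentive constraints are linear in the joint distribution, so verifying a candidate CE is just checking a finite set of inequalities. A minimal sketch in Python; the function and variable names are ours, and the payoff matrices are the standard game-of-chicken numbers often used to illustrate CE (our choice, not from the slides):

```python
import numpy as np

def is_correlated_eq(D, U1, U2, tol=1e-9):
    """Check the CE incentive constraints for a two-player game.

    D  : joint distribution over action profiles, shape (n, m)
    U1 : row player's payoffs, shape (n, m)
    U2 : column player's payoffs, shape (n, m)

    For every recommended action with positive probability, no
    deviation may raise the player's conditional expected payoff.
    """
    n, m = D.shape
    # Row player: sum_y D(x, y) * (U1[x, y] - U1[x', y]) >= 0 for all x, x'
    for x in range(n):
        for xp in range(n):
            if np.dot(D[x], U1[x] - U1[xp]) < -tol:
                return False
    # Column player: sum_x D(x, y) * (U2[x, y] - U2[x, y']) >= 0 for all y, y'
    for y in range(m):
        for yp in range(m):
            if np.dot(D[:, y], U2[:, y] - U2[:, yp]) < -tol:
                return False
    return True

# Game of chicken; the CE putting 1/3 each on (Dare, Chicken),
# (Chicken, Dare) and (Chicken, Chicken) passes the check.
U1 = np.array([[0, 7], [2, 6]])
U2 = np.array([[0, 2], [7, 6]])
D = np.array([[0.0, 1/3], [1/3, 1/3]])
print(is_correlated_eq(D, U1, U2))  # True
```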

  4. Forecasts
     • $f(t) = \{p_1(t), \dots, p_n(t)\}$
     • $p_j(t)$ is the forecasted probability that event $j$ occurs at time $t$
     • Let $N(p, t)$ be the number of times that $f$ generates the forecast $p$ up to time $t$

  5. Calibration
     • Let $\chi(j, t) = 1$ if event $j$ occurs at time $t$
     • We now define $\rho(p, j, t)$ as the empirical frequency of event $j$ given the forecast $p$:
       $$\rho(p, j, t) = \begin{cases} 0 & \text{if } N(p, t) = 0 \\ \dfrac{\sum_{s=1}^{t} I_{f(s)=p}\, \chi(j, s)}{N(p, t)} & \text{otherwise} \end{cases}$$
     • For the forecasting scheme to be calibrated we require:
       $$\lim_{t \to \infty} \sum_{p} |\rho(p, j, t) - p_j| \, \frac{N(p, t)}{t} = 0$$

  6. Example: Forecasting the Weather
     • Pick a forecasting scheme to predict whether or not it will rain
     • $f(t) = p(t)$ is the forecasted probability that it will rain at time $t$
     • Let $N(p, t)$ be the number of times that $f(s) = p$ up to time $t$
     • $\rho(p, t)$ is the frequency with which it rained, given that rain was forecast with probability $p$
     • For the forecasting scheme to be calibrated we require (a numerical sketch follows below):
       $$\lim_{t \to \infty} \sum_{p} |\rho(p, t) - p| \, \frac{N(p, t)}{t} = 0$$
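To make the calibration score concrete, here is a minimal sketch (the function and variable names are ours, not from the paper) that computes $\sum_p |\rho(p,t) - p|\, N(p,t)/t$ for a binary-outcome forecaster:

```python
from collections import defaultdict
import random

def calibration_score(forecasts, outcomes):
    """Calibration score for binary forecasts.

    forecasts : list of forecast probabilities p(t) of rain
    outcomes  : list of 0/1 indicators; 1 means it rained at time t

    Returns sum_p |rho(p, t) - p| * N(p, t) / t, which a calibrated
    scheme drives to zero as t grows.
    """
    t = len(forecasts)
    count = defaultdict(int)   # N(p, t): times forecast p was issued
    rained = defaultdict(int)  # times it rained when forecast was p
    for p, x in zip(forecasts, outcomes):
        count[p] += 1
        rained[p] += x
    return sum(abs(rained[p] / count[p] - p) * count[p] / t for p in count)

# A forecaster that always says 0.5 is calibrated against a fair coin...
random.seed(0)
outcomes = [random.randint(0, 1) for _ in range(10000)]
print(calibration_score([0.5] * 10000, outcomes))    # near 0
# ...but badly miscalibrated against a sequence that never rains.
print(calibration_score([0.5] * 10000, [0] * 10000))  # 0.5
```

Note that calibration alone is a fairly weak property: the constant forecaster passes against any sequence whose overall rain frequency happens to be 0.5.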

  7. How does fictitious play fit in?
     • Fictitious play is a particular forecast scheme that requires the forecast to equal the agent's prior updated by the unconditioned empirical frequency of events
     • This means that if the forecast converges, we have
       $$p_j(t) \to \frac{1}{t} \sum_{s=1}^{t} \chi(j, s)$$
       where $\chi(j, s) = 1$ if event $j$ occurs at time $s$
     • In fictitious play, forecasts converge to empirical frequencies, whereas calibration requires that forecasts converge to empirical frequencies conditioned on the forecasts (see the sketch below)
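To instantiate the contrast, here is a sketch (names ours; the prior weighting is an assumption) of a fictitious-play style forecaster whose forecasts track the unconditional empirical frequency. To score such a forecaster with calibration_score above, one would first round forecasts to a finite grid, since the calibration sum conditions on each distinct forecast value:

```python
def fictitious_play_forecasts(outcomes, prior=0.5, prior_weight=1.0):
    """Fictitious-play style forecast: a prior updated toward the
    unconditional empirical frequency of the event."""
    forecasts = []
    hits, total = prior * prior_weight, prior_weight
    for x in outcomes:
        forecasts.append(hits / total)  # forecast issued before seeing x
        hits, total = hits + x, total + 1
    return forecasts

# The forecasts converge to the raw empirical frequency of rain.
outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 50
f = fictitious_play_forecasts(outcomes)
print(f[-1], sum(outcomes) / len(outcomes))  # both near 0.6
```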

  8. Calibrated Forecasts and Correlated Equilibrium
     • Consider a two-player game $G$. We can characterize a CE in the set of all CE of the game, $\pi(G)$, by the induced joint distribution over the agents' strategy sets $S(1) \times S(2)$.
     • We denote this joint distribution by $D(x, y)$. Further, let $D_t(x, y)$ be the empirical frequency with which $(x, y)$ is played up to time $t$ (computed as in the sketch below).
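Computing $D_t$ is simple bookkeeping; a minimal sketch (names ours):

```python
from collections import Counter

def empirical_joint(history):
    """Empirical frequency D_t(x, y) of each action profile (x, y)
    in a history of joint plays."""
    t = len(history)
    return {xy: c / t for xy, c in Counter(history).items()}

history = [("a", "b"), ("a", "b"), ("c", "b"), ("a", "d")]
print(empirical_joint(history))
# {('a', 'b'): 0.5, ('c', 'b'): 0.25, ('a', 'd'): 0.25}
```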

  9. • Theorem 1 (VF, 97): If each player uses a forecast that is calibrated against the other's sequence of plays, and then makes a best response to this forecast, then
       $$\min_{D \in \pi(G)} \; \max_{x \in S(1),\, y \in S(2)} |D_t(x, y) - D(x, y)| \to 0$$
     • Important assumption: players use a deterministic tie-breaking rule in making best responses.
     • What does this actually claim?

 10. Outline of Proof
     • $D_t(x, y)$ lies in the $(nm - 1)$-dimensional unit simplex, hence is closed and bounded
     • The Bolzano–Weierstrass theorem implies that $D_t(x, y)$ has a convergent subsequence $D_{t_i}(x, y)$
     • Let $D^*$ be the limit of $D_{t_i}(x, y)$; we show $D^*$ is a correlated equilibrium
     • Basic argument: show that the vector whose $y$-th component is
       $$D^*(x, y) \Big/ \sum_{c \in S(2)} D^*(x, c)$$
       is in the set of mixtures over $S(2)$ for which $x$ is a best response. This holds because the forecasting rule is calibrated.

 11. • Missing! If the theorem does not hold, there must be a subsequence $D_{t_j}(x, y)$ such that
       $$|D_{t_j}(x, y) - D(x, y)| > \epsilon$$
       for some $\epsilon > 0$ and all $t_j$. However, this subsequence must itself have a convergent subsequence that, from above, must converge to a CE, contradicting our assumption.

 12. Calibration and CE, continued
     • Theorem 2 (VF, 97): For almost every game, the set of distributions to which calibrated learning rules can converge is identical to the set of correlated equilibria.
       – The proof is constructive
       – Is this theorem useful? What can it really tell us?

 13. • Theorem 3 (VF, 97): There exists a randomized forecast that player 1 can use such that, no matter what learning rule player 2 uses, player 1 will be calibrated.
       – The proof gives an algorithm for constructing a randomized forecast rule that is calibrated, but it is not intuitive.
       – It is based on a regret measure.
       – Each step in the procedure requires computing an invariant vector of increasing size

 14. • We consider an on-line decision problem (ODP) in which an agent incurs a loss in every period as a function of the decision made and the state of the world in that period. The objective of the agent is to minimize the total loss incurred, e.g. when guessing a sequence of 0s and 1s.

 15. Loss
     • Notation
       – Let $D = \{d_1, d_2, \dots, d_n\}$ be the set of possible decisions at time $t$
       – $L_t^j \le 1$ is the loss incurred at time $t$ from taking action $d_j$
       – We represent a decision-making scheme $S$ by the probability vectors $w_t$, where $w_t^j$ is the probability that decision $j$ is chosen at time $t$
     • Define $L(S)$, the expected loss from using scheme $S$ over $T$ periods:
       $$L(S) = \sum_{t=1}^{T} \sum_{d_j \in D} w_t^j L_t^j$$
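Read as code, $L(S)$ is a single weighted sum; a one-line sketch (array names ours, mirroring the notation above):

```python
import numpy as np

def expected_loss(W, L):
    """L(S) = sum over t <= T and d_j in D of w_t^j * L_t^j.

    W : (T, n) array of play probabilities w_t^j
    L : (T, n) array of per-period losses L_t^j (each <= 1)
    """
    return float(np.sum(W * L))
```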

 16. Regret
     • We now compare the loss under the scheme $S$ with the loss that would have been incurred had a different scheme been used.
     • In particular, we consider the change in loss from replacing an action $d_j$ with another action $d_i$.
     • Given a scheme $S$ that uses decision $d_j$ in period $t$ with probability $w_t^j$, define the pairwise regret of switching from decision $d_j$ to $d_i$ as
       $$R_T^{j \to i}(S) = \sum_{t=1}^{T} w_t^j \left( L_t^j - L_t^i \right)$$

 17. • Define the regret incurred by $S$ from using decision $d_j$ up to time $T$ as
       $$R_T^j(S) = \sum_{i \in D} \left[ R_T^{j \to i}(S) \right]^+ \quad \text{where} \quad \left[ R_T^{j \to i}(S) \right]^+ = \max\{0,\, R_T^{j \to i}(S)\}$$
     • Define the regret from using $S$ as
       $$R_T(S) = \sum_{j \in D} R_T^j(S)$$
     • We say that the scheme $S$ has the no-internal-regret property if its expected regret is small: $R_T(S) = o(T)$ (a computational sketch follows below)
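As a computational companion to these definitions, here is a sketch (all names ours; the 0/1 guessing loss is the example from the earlier slide) that evaluates the pairwise regrets and the internal regret of a given scheme:

```python
import numpy as np

def internal_regret(W, L):
    """Pairwise and internal regret of a randomized scheme.

    W : (T, n) array, W[t, j] = probability the scheme plays d_j at time t
    L : (T, n) array, L[t, j] = loss of decision d_j at time t (<= 1)

    R[j, i] = sum_t W[t, j] * (L[t, j] - L[t, i])  (pairwise regret j -> i)
    Internal regret is the sum of the positive parts of all R[j, i].
    """
    T, n = W.shape
    R = np.zeros((n, n))
    for j in range(n):
        for i in range(n):
            R[j, i] = np.sum(W[:, j] * (L[:, j] - L[:, i]))
    return R, float(np.maximum(R, 0.0).sum())

# Guessing a 0/1 sequence: loss 1 for a wrong guess, 0 for a right one.
rng = np.random.default_rng(0)
seq = rng.integers(0, 2, size=1000)
L = np.column_stack([(seq != 0).astype(float), (seq != 1).astype(float)])
W = np.full((1000, 2), 0.5)   # always guess uniformly at random
R, total = internal_regret(W, L)
print(R, total / 1000)        # time-averaged internal regret
```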

 18. Existence of a No-Regret Scheme
     • Proof for the case where $|D| = 2$
     • We have defined
       $$R_T(S) = \sum_{i \in D} \sum_{j \in D} \left[ R_T^{i \to j}(S) \right]^+$$
     • But $\left[ R_T^{0 \to 0}(S) \right]^+ = \left[ R_T^{1 \to 1}(S) \right]^+ = 0$
     • Goal: to show that the time averages of $\left[ R_T^{1 \to 0}(S) \right]^+$ and $\left[ R_T^{0 \to 1}(S) \right]^+$ go to zero.

 19. • Define the following game:
       – The agent chooses between strategy "0" and strategy "1" in each period
       – Payoffs are vectors: the payoff for using strategy "0" in period $t$ is $(L_t^0 - L_t^1,\, 0)$, and $(0,\, L_t^1 - L_t^0)$ for using strategy "1"
     • Suppose that the agent follows a scheme that chooses strategy "0" with probability $w_t$. Then the time-averaged payoffs at round $T$ are
       $$\left( \frac{\sum_{t=1}^{T} w_t (L_t^0 - L_t^1)}{T},\; \frac{\sum_{t=1}^{T} (1 - w_t)(L_t^1 - L_t^0)}{T} \right)$$
     • Note that we have defined the payoffs such that these time-averaged payoffs are equal to $\left( R_T^{0 \to 1}(S)/T,\; R_T^{1 \to 0}(S)/T \right)$ as defined above.
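This identity can be checked numerically; the following sketch reuses internal_regret from the earlier sketch and verifies that the averaged vector payoffs match $(R_T^{0\to1}(S)/T,\, R_T^{1\to0}(S)/T)$ for arbitrary losses and weights (names ours):

```python
import numpy as np

T = 1000
rng = np.random.default_rng(2)
L = rng.random((T, 2))   # arbitrary bounded losses for actions 0 and 1
w = rng.random(T)        # arbitrary probabilities of playing "0"

avg0 = np.mean(w * (L[:, 0] - L[:, 1]))        # first payoff coordinate
avg1 = np.mean((1 - w) * (L[:, 1] - L[:, 0]))  # second payoff coordinate

W = np.column_stack([w, 1 - w])
R, _ = internal_regret(W, L)  # defined in the sketch after slide 17
assert np.isclose(avg0, R[0, 1] / T) and np.isclose(avg1, R[1, 0] / T)
```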

 20. • Blackwell's Approachability Theorem: a convex set is approachable iff every tangent hyperplane of the set is approachable.
     • Our target set is the nonpositive orthant; that is, we want
       $$R_T^{1 \to 0}(S)/T \le 0 \quad \text{and} \quad R_T^{0 \to 1}(S)/T \le 0$$
     • If the payoff vector is not in the nonpositive orthant, then we consider the line separating the payoff vector from the target set. The line $\ell$ is given by
       $$\left[ R_T^{0 \to 1}(S) \right]^+ x + \left[ R_T^{1 \to 0}(S) \right]^+ y = 0$$

 21. • The agent must choose "0" with a probability $p$ such that the expected payoff vector
       $$\left( p \left( L_{T+1}^0 - L_{T+1}^1 \right),\; (1 - p) \left( L_{T+1}^1 - L_{T+1}^0 \right) \right)$$
       lies on the line $\ell$.
     • This requires:
       $$\left[ R_T^{0 \to 1}(S) \right]^+ p \left( L_{T+1}^0 - L_{T+1}^1 \right) + \left[ R_T^{1 \to 0}(S) \right]^+ (1 - p) \left( L_{T+1}^1 - L_{T+1}^0 \right) = 0$$
     • Which yields:
       $$p = \frac{\left[ R_T^{1 \to 0}(S) \right]^+}{\left[ R_T^{0 \to 1}(S) \right]^+ + \left[ R_T^{1 \to 0}(S) \right]^+}$$
     • Not what is in the paper!
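Assembled into an algorithm, the derivation gives a two-action scheme that plays each action with probability proportional to the positive part of the regret for not having played it, closely related to regret matching. A minimal simulation sketch, reflecting our reading of the construction (all names ours; zero regrets default to a uniform coin):

```python
import numpy as np

def no_regret_play(losses):
    """Run the two-action scheme: at each step choose "0" with
    probability p = [R^{1->0}]^+ / ([R^{0->1}]^+ + [R^{1->0}]^+).

    losses : (T, 2) array of per-period losses for actions 0 and 1.
    Returns the sequence of probabilities w_t of playing "0".
    """
    r01 = 0.0  # R^{0->1}: running sum of w_t * (L0_t - L1_t)
    r10 = 0.0  # R^{1->0}: running sum of (1 - w_t) * (L1_t - L0_t)
    probs = []
    for l0, l1 in losses:
        denom = max(r01, 0.0) + max(r10, 0.0)
        w = max(r10, 0.0) / denom if denom > 0 else 0.5
        probs.append(w)
        r01 += w * (l0 - l1)
        r10 += (1 - w) * (l1 - l0)
    return np.array(probs)

# Against a sequence that is "1" 80% of the time, the scheme learns
# to guess "1", driving the time-averaged regrets toward zero.
rng = np.random.default_rng(1)
seq = (rng.random(5000) < 0.8).astype(int)
losses = np.column_stack([(seq != 0).astype(float), (seq != 1).astype(float)])
w = no_regret_play(losses)
print(w[-10:])  # probabilities of guessing "0" drift toward 0
```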

 22. • We have solved for the $p$ that, in expectation, puts the payoff vector on the line separating it from the target set. By Blackwell's Theorem, the target set is therefore approachable: we have found a no-regret scheme.
     • This result can be generalized to $|D| > 2$, but doing so requires solving a system of equations.

 23. Further Results
     • The existence of a no-regret scheme implies the existence of an almost calibrated forecast scheme
     • If all agents in a game play a no-regret strategy, play will converge to correlated equilibrium.

 24. Further Reading
     • A Simple Adaptive Procedure Leading to Correlated Equilibrium – Hart and Mas-Colell, 2000
     • A General Class of Adaptive Strategies – Hart and Mas-Colell, 2001
     • A General Class of No-Regret Learning Algorithms and Game-Theoretic Equilibria – Greenwald, Jafari and Marks
