Game Theoretic Learning for Verification and Control - Sanjit A. Seshia - PowerPoint PPT Presentation



SLIDE 1

Game‐Theoretic Learning for Verification and Control

Sanjit A. Seshia

Professor EECS, UC Berkeley

Dagstuhl Seminar March 16, 2017

Joint work with Dorsa Sadigh, Jon Kotker, Daniel Bundala, Anca Dragan, Alexander Rakhlin, S. Shankar Sastry

SLIDE 2

Two Stories: 1 Control, 1 Verification

Control: Human cyber-physical systems (e.g., autonomous/semi-autonomous driving); learning (synthesizing) models of human behavior.

Verification: Timing analysis of embedded software; learning (synthesizing) a model of the platform (how the platform impacts a program's timing behavior).

S. A. Seshia

SLIDE 3

Challenge: Interactions with Humans and Human‐Controlled Systems outside the Vehicle


“One of the biggest challenges facing automated cars is blending them into a world in which humans don’t behave by the book.”

SLIDE 4

How can we make an autonomous vehicle behave/ communicate “naturally” with (possibly adversarial) humans in its environment?

SLIDE 5

Interaction‐Aware Control

• D. Sadigh, S. Sastry, S. A. Seshia, A. Dragan. Planning for Autonomous Cars that Leverage Effects on Human Actions. In RSS, 2016.
• D. Sadigh, S. Sastry, S. A. Seshia, A. Dragan. Information Gathering Actions over Internal Human State. In IROS, 2016.

SLIDE 6

SLIDE 7

SLIDE 8

Interaction as a Dynamical System

The robot has direct control over its own actions, but only indirect control over the human's actions.

Model the problem as a Stackelberg game: the robot moves first.
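Assuming a toy point-mass model (not the vehicle model from the paper), the interaction dynamics and the Stackelberg move order can be sketched as follows; `step`, `human_response`, and all constants are illustrative stand-ins.

```python
# Minimal sketch of interaction as a dynamical system (illustrative only).
# State x = ((pos, vel) of robot, (pos, vel) of human); the robot directly
# controls its own acceleration u_r and only indirectly influences the
# human's action u_h through the human's response.

def step(x, u_r, u_h, dt=0.1):
    """One step of the joint dynamics x_{t+1} = f(x_t, u_r, u_h)."""
    (pr, vr), (ph, vh) = x
    return ((pr + vr * dt, vr + u_r * dt),
            (ph + vh * dt, vh + u_h * dt))

def human_response(x, u_r):
    """Toy 'rational' human: eases off when the robot accelerates."""
    return -0.5 * u_r  # stands in for the argmax of the human's reward

x = ((0.0, 1.0), (1.0, 1.0))
for _ in range(3):  # robot moves first (Stackelberg), human responds
    u_r = 1.0
    x = step(x, u_r, human_response(x, u_r))
```

The key structural point is that the human's action is a function of the robot's action, so the robot's choice shapes the joint trajectory.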

SLIDE 9

Model Predictive (Receding Horizon) Control: plan for a short time horizon N, replan at every step t. Assume a deterministic "rational" human model: the human optimizes a reward function that is a linear combination of "features".

Assumptions/Simplifications:

• Human has full access to the robot's planned actions for the short time horizon.
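The receding-horizon loop can be sketched in a few lines. Everything here is a toy stand-in under stated assumptions: 1-D dynamics x' = x + u, a reward that prefers position 5, and exhaustive search over a small discrete action set in place of a real trajectory optimizer.

```python
# Sketch of model predictive (receding horizon) control on a toy 1-D model:
# plan N steps ahead, execute only the first action, then replan.

import itertools

def plan(x0, horizon, actions=(-1.0, 0.0, 1.0)):
    """Pick the action sequence maximizing a toy reward over the horizon."""
    def rollout_reward(seq):
        x, r = x0, 0.0
        for u in seq:
            x = x + u             # trivial dynamics stand-in
            r += -(x - 5.0) ** 2  # toy reward: stay close to position 5
        return r
    return max(itertools.product(actions, repeat=horizon), key=rollout_reward)

x, N = 0.0, 3
trajectory = [x]
for t in range(6):          # replan at every step t
    u = plan(x, N)[0]       # execute only the first planned action
    x = x + u
    trajectory.append(x)
```

Executing only the first action of each N-step plan before replanning is the defining feature of receding horizon control.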

SLIDE 10

Interaction as a Dynamical System

Find optimal actions for the autonomous vehicle while accounting for the human response u_H*:

u_R* = argmax_{u_R} R_R(x, u_R, u_H*(x, u_R))

u_H*(x, u_R) = argmax_{u_H} R_H(x, u_R, u_H)

Model the human response u_H* as optimizing the human reward function R_H.

SLIDE 11

Learning (Human) Driver Models

Learn the human's reward function using Inverse Reinforcement Learning [Ziebart et al., AAAI'08; Levine & Koltun, 2012].

Assume structure of the human reward function:

R_H(x, u_R, u_H) = w^T φ(x, u_R, u_H)

(a) Features for the boundaries of the road. (b) Feature for staying inside the lanes. (c) Features for avoiding other vehicles.

• B. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In AAAI, 2008.
• S. Levine, V. Koltun. Continuous inverse optimal control with locally optimal examples. arXiv, 2012.
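The assumed reward structure R_H = w^T φ(x, u_R, u_H) can be sketched directly. The feature definitions below are illustrative stand-ins for features like (b) and (c), not those from the paper, and the weights are made up; IRL's job is to recover w from demonstrations.

```python
# Sketch of a reward as a linear combination of hand-designed features.
# Feature definitions and weights are hypothetical.

def features(x, u_r, u_h):
    human_pos, lane_center, other_pos = x
    return (
        -(human_pos - lane_center) ** 2,             # stay inside the lane
        -1.0 / (abs(human_pos - other_pos) + 0.1),   # avoid other vehicles
        -(u_h ** 2),                                 # smooth control effort
    )

def reward(w, x, u_r, u_h):
    """R_H = w . phi(x, u_R, u_H)."""
    return sum(wi * fi for wi, fi in zip(w, features(x, u_r, u_h)))

w = (1.0, 0.5, 0.1)  # the weights w are what IRL learns from demonstrations
r = reward(w, x=(0.2, 0.0, 3.0), u_r=0.0, u_h=0.5)
```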

SLIDE 12

Solution of Nested Optimization

u_R* = argmax_{u_R} R_R(x, u_R, u_H*(x, u_R))

where u_H*(x, u_R) = argmax_{u_H} R_H(x, u_R, u_H).

• Gradient-based method (quasi-Newton): solve using the L-BFGS technique, computing the gradient of R_R with respect to u_R through the human's best response u_H*.
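A minimal sketch of the nested optimization, substituting plain gradient ascent for the L-BFGS solver named above. The quadratic rewards are illustrative assumptions chosen so the problem has a known closed-form optimum (u_R* = 0.8), which lets the sketch be checked.

```python
# Sketch of the nested optimization: the outer loop optimizes the robot's
# action, differentiating (numerically) through the inner best response.
# Gradient ascent stands in for a quasi-Newton (L-BFGS) solver.

def best_response(u_r, steps=200, lr=0.1):
    """Inner problem: u_H* = argmax_{u_H} R_H, with R_H = -(u_H - u_R/2)^2."""
    u_h = 0.0
    for _ in range(steps):
        grad = -2.0 * (u_h - 0.5 * u_r)
        u_h += lr * grad
    return u_h

def robot_objective(u_r):
    """Outer problem: R_R(u_R, u_H*(u_R)) = -(u_R - 1)^2 - u_H*^2."""
    u_h = best_response(u_r)
    return -(u_r - 1.0) ** 2 - u_h ** 2

def solve(steps=200, lr=0.05, eps=1e-4):
    u_r = 0.0
    for _ in range(steps):
        # numerical gradient through the human's best response
        g = (robot_objective(u_r + eps) - robot_objective(u_r - eps)) / (2 * eps)
        u_r += lr * g
    return u_r

u_r_star = solve()  # analytic optimum of -(u-1)^2 - (u/2)^2 is u = 0.8
```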

SLIDE 13

Implication: Efficiency

[Figure: trajectories of the robot and human vehicles.]

SLIDE 14

Implication: Efficiency

SLIDE 15

Implication: Efficiency

SLIDE 16

Implication: Coordination

SLIDE 17

Implication: Coordination

SLIDE 18

SLIDE 19

[Plot: y-position of the human vehicle vs. x-position of the autonomous vehicle, comparing the cases where the human crosses first and where the human crosses second.]

SLIDE 20

Summary

• Model the control problem as a Stackelberg game.
• Data-driven approach to learning a model of the human as a rational agent maximizing their reward function.
  – Next steps: more realistic human model ("bounded rational" model).
• Combine with the receding horizon control approach to obtain an interaction-aware controller.
  – Next steps: combine with previous work on correct-by-construction control with temporal logic specifications.
• Temporal logic compiled into constraints.
  – Need to improve constrained optimization methods!

SLIDE 21

Two Stories: 1 Verification, 1 Control

Control: Human cyber-physical systems (e.g., autonomous/semi-autonomous driving); learning (synthesizing) models of human behavior.

Verification: Timing analysis of embedded software; learning (synthesizing) a model of the platform (how the platform impacts a program's timing behavior).

SLIDE 22

Game‐Theoretic Timing Analysis


• S. A. Seshia and A. Rakhlin. Quantitative Analysis of Systems Using Game-Theoretic Learning. In ACM Transactions on Embedded Computing Systems (TECS), 2012.
• S. A. Seshia and A. Rakhlin. Game-Theoretic Timing Analysis. In ICCAD, 2008.

SLIDE 23

Challenge in Timing Analysis

Does the brake-by-wire software always actuate the brakes within 1 ms? NASA's Toyota UA report (2011) mentions: "In practice…there are significant limitations" (in the state of the art in timing analysis).

CHALLENGE: ENVIRONMENT MODELING

Need a good model of the platform (processor, memory hierarchy, network, I/O devices, etc.)

SLIDE 24

Complexity of a Timing Model: Path Space x Platform State Space

[Figure: program CFG unrolled to a DAG, with basic blocks such as `flag != 0`, `flag = 1; (*x)++;`, and `*x += 2;`]

On a processor with a data cache, the timing of an edge (basic block) depends on:

• The path it lies on
• The initial platform state

Challenges:

• Exponential number of paths and platform states!
• Lack of visibility into the platform state
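The exponential blow-up in the number of paths is easy to see by counting source-to-sink paths in a DAG. The chained-diamond CFG below is a made-up example, sized so its path count lands at the same ~10^16 scale mentioned on the next slide.

```python
# Sketch: counting source-to-sink paths in a DAG by dynamic programming.
# A chain of n if-then-else "diamonds" has 2^n paths.

def count_paths(adj, src, sink):
    """Number of distinct src->sink paths in a DAG (memoized DFS)."""
    memo = {sink: 1}
    def go(v):
        if v not in memo:
            memo[v] = sum(go(w) for w in adj.get(v, ()))
        return memo[v]
    return go(src)

def diamond_chain(n):
    adj = {}
    for i in range(n):
        adj[3 * i] = (3 * i + 1, 3 * i + 2)  # branch node
        adj[3 * i + 1] = (3 * (i + 1),)      # then-edge
        adj[3 * i + 2] = (3 * (i + 1),)      # else-edge
    return adj

n_paths = count_paths(diamond_chain(54), 0, 3 * 54)  # 2**54, about 1.8e16
```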

SLIDE 25

Example: Automotive Window Controller

• ~1000 lines of C code
• ~10^16 paths

SLIDE 26

Our Approach and Contributions

• Model the estimation problem as a game: Tool vs. Platform.
• Measurement-based, but minimal instrumentation: perform end-to-end measurements of selected (linearly many) paths on the platform.
• Learn an environment model: similar to the online shortest path problem in the 'bandit' setting.
• Online, randomized algorithm: GameTime.
  – Theoretical guarantee: can predict worst-case timing with arbitrarily high probability under model assumptions.
• Uses satisfiability modulo theories (SMT) solvers for test generation.

[S. A. Seshia & A. Rakhlin, ICCAD '08, ACM TECS]

SLIDE 27

The Game Formulation

Complexity = Path Space (controllable) x Platform State Space (uncontrollable)

• Model as a 2-player game: Tool vs. Platform.
  – Tool selects program paths.
  – Platform 'selects' its state (possibly adversarially).
• Questions:
  – What is a good platform model?
  – How can we select paths so that we can learn an accurate platform model from executing them?

SLIDE 28

Platform Model

Weight on each edge of the unrolled CFG: w + π

• Nominal weight w models path-independent timing.
• Path-specific perturbation π models path-dependent timing.

The Platform selects the weights for the edges of the CFG.

SLIDE 29

A Path is a Vector x ∈ {0,1}^m

Each coordinate indicates whether the corresponding edge lies on the path (m = #edges).

Insight: Only need to sample a basis of the space of paths.
SLIDE 30

Basis Paths

• #(basis paths) ≤ m
• Useful to compute certain special bases called "barycentric spanners".
• < 200 basis paths for the automotive window controller.
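The basis idea can be illustrated with ordinary Gaussian elimination over path vectors: since paths live in {0,1}^m, at most m of them are linearly independent. This sketch extracts *some* maximal independent subset; it does not compute the barycentric spanners GameTime actually uses, and the example paths are made up.

```python
# Sketch: extract a basis of the path space by row reduction over rationals.

from fractions import Fraction

def extract_basis(paths):
    """Return a maximal linearly independent subset of the path vectors."""
    basis, rows = [], []
    for p in paths:
        v = [Fraction(c) for c in p]
        for r in rows:  # reduce v against each stored row's pivot
            pivot = next(i for i, c in enumerate(r) if c != 0)
            if v[pivot] != 0:
                f = v[pivot] / r[pivot]
                v = [a - f * b for a, b in zip(v, r)]
        if any(c != 0 for c in v):  # v is independent of the rows so far
            rows.append(v)
            basis.append(p)
    return basis

paths = [
    (1, 1, 0, 0),  # each tuple marks which of the m = 4 edges the path uses
    (0, 0, 1, 1),
    (1, 1, 1, 1),  # dependent: sum of the first two
    (1, 0, 0, 1),
]
basis = extract_basis(paths)
```

Measuring only the basis paths suffices to predict the timing of every other path, since any path vector is a linear combination of the basis.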

SLIDE 31

Timing Analysis Game (Our Model)

Played over several rounds t = 1, 2, 3, …, τ.

At each round t:

• Tool picks a path x_t.
• Platform picks edge weights w_t and a path-dependent perturbation π_t(x_t).
• Tool observes only the end-to-end length ℓ_t = x_t · (w_t + π_t).

Example: with edge weights (1, 5, 7, 11) and perturbation (−1, −1, −1, −1) on a path using all four edges, the tool observes (5 + 7 + 1 + 11) − 4 = 20.

At round τ, the Tool makes a prediction (the longest path x*). The Tool wins iff its prediction is correct.
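The observation model of one round can be sketched directly; the numbers below reproduce the slide's example, and the variable names are just illustrative.

```python
# Sketch of one round of the timing game: the tool sees only the
# end-to-end path length, never the individual edge weights.

def observe(x, w, pi):
    """End-to-end measurement l_t = x_t . (w_t + pi_t)."""
    return sum(xi * (wi + pii) for xi, wi, pii in zip(x, w, pi))

x_t = (1, 1, 1, 1)             # path using all four edges
w_t = (1, 5, 7, 11)            # nominal edge weights chosen by the platform
pi_t = (-1, -1, -1, -1)        # adversarial perturbation on this path
l_t = observe(x_t, w_t, pi_t)  # (1 + 5 + 7 + 11) - 4 = 20
```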

SLIDE 32

Theorem about Estimating the Distribution (pictorial view)

[Plot: distribution of execution times; the estimation error ε is O(b · μ_max), where b is the number of basis paths.]

Mean Perturbation Assumption: ∀x ∈ Paths, |E[x · π_t]| ≤ μ_max.

SLIDE 33

Some Experimental Results

• GameTime is efficient.
  – E.g.: 7 × 10^16 total paths vs. < 200 basis paths.
• Accurately predicts WCET for complex platforms.
  – I & D caches, pipeline, branch prediction, …
• Basis paths effectively encode information about the timing of other paths.
  – Found paths 25% longer than the sampled basis.
• GameTime can accurately estimate the distribution of execution times with few measurements.
  – Measure basis paths, predict other paths.

(details in the ICCAD'08, ACM TECS, and FMCAD'11 papers)

SLIDE 34

Discussion: Qualitative Characterization of the Problems Described

The two problems occupy different points along the axes Adversarial vs. Cooperative and Full Information vs. No Information.

Control/Synthesis:

• Know only the structure of the human reward function beforehand; observe the entire system state.
• The human can behave arbitrarily, albeit only as a rational agent, not actively violating the robot's objective.

Verification/Analysis:

• Almost black-box (w + π) platform model.
• The platform is only constrained by the assumptions on w and π.