Planner Metrics Should Satisfy Independence of Irrelevant Alternatives



1. Planner Metrics Should Satisfy Independence of Irrelevant Alternatives
   Jendrik Seipp, July 12, 2019, University of Basel, Switzerland

2. Independence of irrelevant alternatives (IIA)
   • one of four criteria from Arrow's impossibility theorem
   • whether A > B or B > A should not depend on a third, irrelevant alternative C
   • important for planner metrics, but some of them violate it

3. IPC satisficing track
   • sat(P, π) = Cost*(π) / Cost(P, π) if P solves π, and 0 otherwise
   • Cost*(π) is the cost of a reference plan
   • total score: sum of task scores
   • if the reference plans are optimal, sat satisfies IIA
   • if the reference plans can come from competitors, sat does not satisfy IIA
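
The formula above translates directly into code. Below is a minimal Python sketch of the per-task and total sat scores (not the official IPC scoring script; function and parameter names are mine, and None marks an unsolved task):

    def sat_score(reference_cost, plan_cost):
        """Per-task score: Cost*(pi) / Cost(P, pi) if P solves pi, else 0."""
        if plan_cost is None:  # planner did not solve the task
            return 0.0
        return reference_cost / plan_cost

    def total_sat(reference_costs, plan_costs):
        """Total score of a planner: the sum of its per-task scores."""
        return sum(sat_score(r, c) for r, c in zip(reference_costs, plan_costs))

For instance, total_sat([4, 2], [4, 5]) evaluates to 1.0 + 0.4 = 1.4, the value planner A receives in the example on the next slide.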

4. IPC satisficing track – example

         Cost                 sat with {A, B}      sat with {A, B, C}
   Task  R   A   B   C        A      B             A      B      C
   π1    6   4   5   1        4/4    4/5           1/4    1/5    1/1
   π2    2   5   4   5        2/5    2/4           2/5    2/4    2/5
   ∑                          1.4    1.3           0.65   0.7    1.4
                              → A > B              → B > A

   (R is the cost of the precomputed reference plan; the reference used for a task is the best plan known, i.e., the minimum of R and the competitors' plan costs.)

   → use optimal planners or domain-specific solvers to find good reference plans
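
The ranking flip can be reproduced with a short Python sketch. It assumes, per the reading of the table above, that the reference cost used for a task is the minimum of the precomputed reference cost R and all participating planners' plan costs:

    costs = {"A": {"pi1": 4, "pi2": 5},
             "B": {"pi1": 5, "pi2": 4},
             "C": {"pi1": 1, "pi2": 5}}
    precomputed_reference = {"pi1": 6, "pi2": 2}  # column R in the table above

    def sat_totals(planners):
        # Reference cost per task = best plan known, including competitor plans.
        reference = {task: min([precomputed_reference[task]] +
                               [costs[p][task] for p in planners])
                     for task in precomputed_reference}
        return {p: sum(reference[task] / costs[p][task] for task in reference)
                for p in planners}

    print(sat_totals(["A", "B"]))       # A: 1.4, B: 1.3          -> A > B
    print(sat_totals(["A", "B", "C"]))  # A: 0.65, B: 0.7, C: 1.4 -> B > A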

5. IPC agile track
   • T*(π): minimum runtime of all participating planners
   • agl_2014(P, π) = 1 / (1 + log10(T(P, π) / T*(π))) if T(P, π) ≤ 300, and 0 otherwise
   • agl_2018(P, π) =
       1                              if T(P, π) < 1
       1 − log(T(P, π)) / log(300)    if 1 ≤ T(P, π) ≤ 300
       0                              if T(P, π) > 300
   • agl_2014 depends on the other planners' runtimes via T*(π), so it violates IIA; agl_2018 does not
   → use agl_2018 in future agile tracks
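
Both agile scores are easy to write down in code. The sketch below uses Python (parameter names are mine; the 300-second limit comes from the formulas above). The key difference is that agl_2014 needs T*(π), the minimum runtime over all participating planners, whereas agl_2018 only looks at the planner's own runtime:

    import math

    TIME_LIMIT = 300.0  # seconds

    def agl_2014(runtime, best_runtime):
        """1 / (1 + log10(T(P, pi) / T*(pi))) if T(P, pi) <= 300, else 0."""
        if runtime is None or runtime > TIME_LIMIT:
            return 0.0
        return 1.0 / (1.0 + math.log10(runtime / best_runtime))

    def agl_2018(runtime):
        """1 if T < 1; 1 - log(T) / log(300) if 1 <= T <= 300; 0 otherwise."""
        if runtime is None or runtime > TIME_LIMIT:
            return 0.0
        if runtime < 1.0:
            return 1.0
        return 1.0 - math.log(runtime) / math.log(TIME_LIMIT)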

6. Sparkle planning challenge
   • new planning competition in 2019
   • goal: "analyse the contribution of each planner to the real state of the art"
   • measures the marginal contribution of each planner P to a portfolio selector over the set of planners S
   • focuses on coverage; uses runtime to break ties
   • removing which planner decreases coverage the most?
   • sparkle(P, π) = log10(par10(S \ {P}) / par10(S)) if par10(S \ {P}) > par10(S), and 0 otherwise
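
As a hedged sketch of the metric in Python: the PAR10 cutoff (1800 seconds here) and the aggregation of runtimes into a PAR10 value are assumptions, since the slide only gives the log-ratio over the portfolio's PAR10 with and without planner P:

    import math

    CUTOFF = 1800.0  # assumed time limit; PAR10 counts a failure as 10 * cutoff

    def par10(runtimes):
        """Penalized average runtime: unsolved tasks (None) count as 10 * cutoff."""
        return sum(t if t is not None else 10 * CUTOFF for t in runtimes) / len(runtimes)

    def sparkle(par10_without_p, par10_with_p):
        """log10 of the PAR10 ratio if removing P hurts the portfolio, else 0."""
        if par10_without_p > par10_with_p:
            return math.log10(par10_without_p / par10_with_p)
        return 0.0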

7. Sparkle planning challenge – example
   • 100 tasks
   • planner A solves 1 task π
   • planners B and C solve the other 99 tasks but fail to solve π
   • {A, B} → B > A
   • {A, B, C} → A > B
   (with C present, removing B loses no coverage because C solves the same tasks, while removing A still loses π)
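
A toy Python sketch of the coverage-based marginal contributions in this example (ignoring the runtime tie-breaking and the PAR10/log details of the actual metric) is enough to show why the ranking of A and B flips once C is added:

    solved = {"A": {"pi"},                        # A solves only the hard task pi
              "B": {f"t{i}" for i in range(99)},  # B and C solve the other 99 tasks
              "C": {f"t{i}" for i in range(99)}}

    def coverage(planners):
        return len(set().union(*(solved[p] for p in planners)))

    def marginal_contribution(planner, planners):
        return coverage(planners) - coverage([p for p in planners if p != planner])

    for portfolio in (["A", "B"], ["A", "B", "C"]):
        print(portfolio, {p: marginal_contribution(p, portfolio) for p in portfolio})
    # ['A', 'B']      -> {'A': 1, 'B': 99}         => B > A
    # ['A', 'B', 'C'] -> {'A': 1, 'B': 0, 'C': 0}  => A > B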

8. Sparkle planning challenge – problems of the metric
   • penalizes similar planners
   • easily gameable: submit several "dummy" planners and one "real" planner (leaderboard, IPC planners available)
   • penalizes collaboration, favors closed-source planners
   • discourages submitting multiple planners

9. Sparkle planning challenge – suggestion
   • IIA: use a fixed portfolio of baseline planners

10. Summary
   • IIA is critical for evaluation metrics
   • several planner metrics do not satisfy IIA
   • there are alternatives that do satisfy IIA
