

SLIDE 1

Generating Explanations for Temporal Logic Planner Decisions

Daniel Kasenberg*, Ravenna Thielstrom, and Matthias Scheutz

*Daniel Kasenberg · dmk@cs.tufts.edu · @dkasenberg · dkasenberg.github.io

SLIDE 2

Our (long-term) goal

  • Agents which can
    – Learn interpretable objectives (through language and behavior) [1]
    – Behave competently with respect to these objectives, even when they conflict [2]
    – Explain their behaviors to human teammates in terms of these objectives (and correct objectives or world models if needed)
  • ... all while operating in the same environments (MDPs) in which reinforcement learning agents have been successfully deployed.

[1] Kasenberg, D., & Scheutz, M. (2017, December). Interpretable apprenticeship learning with temporal logic specifications. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC) (pp. 4914-4921). IEEE.
[2] Kasenberg, D., & Scheutz, M. (2018, April). Norm conflict resolution in stochastic domains. In Thirty-Second AAAI Conference on Artificial Intelligence.

SLIDE 3

Markov Decision Processes

A tuple (S, A, T, s₀, γ, L), where

  • S is a finite set of states
  • A is a finite set of actions
  • T is a transition function
  • s₀ is an initial state
  • γ is a discount factor
  • L is a labeling function
    – AP is a set of atomic propositions
    – L(s) is the set of propositions true at s
  • Our explanation approach assumes deterministic transition dynamics
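The tuple above can be sketched as a small Python structure. This is a minimal illustration; the ShopWorld-style states, actions, and labels here are assumptions for the example, not the paper's exact encoding.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """A labeled MDP (S, A, T, s0, gamma, L) with deterministic transitions,
    as assumed by the explanation approach."""
    states: set
    actions: set
    T: dict          # (state, action) -> next state (deterministic)
    s0: str          # initial state
    gamma: float     # discount factor
    L: dict          # state -> set of atomic propositions true there

    def run(self, action_seq):
        """Execute a sequence of actions, returning the induced trace of label sets."""
        s, trace = self.s0, [self.L[self.s0]]
        for a in action_seq:
            s = self.T[(s, a)]
            trace.append(self.L[s])
        return trace

# Toy ShopWorld-like MDP (hypothetical states/actions for illustration).
shop = MDP(
    states={"inStore", "holdingWatch", "outside"},
    actions={"pickUp", "leaveStore"},
    T={("inStore", "pickUp"): "holdingWatch",
       ("holdingWatch", "leaveStore"): "outside",
       ("inStore", "leaveStore"): "outside"},
    s0="inStore",
    gamma=0.99,
    L={"inStore": set(), "holdingWatch": {"holding"}, "outside": {"leftStore"}},
)
```

The labeling function is what connects the MDP to LTL: each state's label is the set of atomic propositions a temporal formula can mention.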
SLIDE 4

Example: ShopWorld

  • Agent is a robot sent to go shopping for its user in a store selling a watch
  • User wants the watch, but gives the robot insufficient money

SLIDE 5

Linear temporal logic (LTL)

  • A simple propositional logic encoding time

    φ ::= p | ¬φ | φ ∧ ψ | X φ | G φ | F φ | φ U ψ

    where φ, ψ are LTL statements and p is a proposition.

  • X φ: “in the next time step, φ”
  • G φ: “in all present and future time steps, φ”
  • F φ: “in some present/future time step, φ”
  • φ U ψ: “φ will be true until ψ becomes true”
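The operator semantics above can be made concrete with a small evaluator over finite traces. This is a hedged sketch: it evaluates formulas over the finite trace itself, a simplification of the good/bad-prefix machinery used later for (co-)safe LTL, and the tuple-based formula encoding is an assumption of this example.

```python
# A trace is a list of sets of atomic propositions; formulas are nested tuples,
# e.g. ("G", ("not", ("ap", "stole"))) for "never stole".

def holds(phi, trace, t=0):
    """Evaluate LTL formula `phi` on the finite `trace` starting at time t."""
    op = phi[0]
    if op == "ap":                      # atomic proposition
        return phi[1] in trace[t]
    if op == "not":
        return not holds(phi[1], trace, t)
    if op == "and":
        return holds(phi[1], trace, t) and holds(phi[2], trace, t)
    if op == "X":                       # next
        return t + 1 < len(trace) and holds(phi[1], trace, t + 1)
    if op == "G":                       # globally (all present/future steps)
        return all(holds(phi[1], trace, u) for u in range(t, len(trace)))
    if op == "F":                       # eventually (some present/future step)
        return any(holds(phi[1], trace, u) for u in range(t, len(trace)))
    if op == "U":                       # until
        return any(holds(phi[2], trace, u)
                   and all(holds(phi[1], trace, v) for v in range(t, u))
                   for u in range(t, len(trace)))
    raise ValueError(f"unknown operator {op!r}")
```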
SLIDE 6

LTL specifications in ShopWorld

  • “never leave the store while holding an object that has not been bought” (no shoplifting)
  • “leave the store while holding the watch”

SLIDE 7

Preferences over LTL objectives

  • We can give each objective a priority and a weight
  • Violations of objectives with the same priority can be traded off (using their weights as an “exchange rate”)
  • Violations of objectives with different priorities can’t be traded off: the agent prefers to satisfy the higher-priority objective and violate any number of lower-priority objectives
    – Lexicographic ordering
  • The weights and priorities induce a relation over violation-cost vectors
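The induced relation can be sketched as follows. This is an illustrative reading of the scheme, not the paper's exact construction: per-objective violation costs are collapsed into one weighted cost per priority level, and levels are compared lexicographically from highest priority down.

```python
# Comparing violation-cost vectors under priorities and weights.
# Same-priority objectives trade off via their weights; across priorities
# the comparison is lexicographic. All names here are illustrative.

def aggregate(costs, weights, priorities):
    """Collapse per-objective violation costs into one weighted cost per
    priority level, ordered from highest priority to lowest."""
    levels = sorted(set(priorities), reverse=True)
    return tuple(
        sum(w * c for c, w, p in zip(costs, weights, priorities) if p == lvl)
        for lvl in levels
    )

def prefer(costs_a, costs_b, weights, priorities):
    """True iff behavior A is strictly preferred to behavior B
    (lexicographically smaller aggregated violation cost)."""
    return aggregate(costs_a, weights, priorities) < aggregate(costs_b, weights, priorities)
```

Because tuple comparison in Python is lexicographic, a single high-priority violation outweighs any amount of lower-priority violation, matching the slide's "exchange rate within, lexicographic across" description.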
SLIDE 8

Multi-objective LTL planning problem

A tuple (M, Φ, w, p), where

  • M is a Markov Decision Process
  • Φ is a set of (safe/co-safe) LTL objectives
  • w, p are the weight and priority vectors respectively

SLIDE 9

From LTL to finite state machines

  • We use syntactically (co-)safe LTL objectives
  • For each such objective, we can construct a finite state machine (FSM) which accepts a finite trace iff that trace is a bad (good) prefix of the objective
    – e.g. for a co-safe objective F p, a good prefix is any finite trajectory where p holds at some time step
  • Use this to construct a product MDP whose state space pairs each MDP state with the states of the FSMs
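The product construction can be sketched like this. The monitor shown tracks only the simple co-safe objective F p; the state encodings and function names are assumptions of this sketch, not the paper's implementation.

```python
# A monitor FSM tracks, per objective, whether the trace so far is a good
# (or bad) prefix; a product state pairs the MDP state with all FSM states.

class Monitor:
    """Monitor for the co-safe objective F p: accepts once p has been seen."""
    initial = "wait"

    def __init__(self, p):
        self.p = p

    def step(self, q, label):
        # q is "acc" (a good prefix has been seen) or "wait"
        return "acc" if q == "acc" or self.p in label else "wait"

def product_step(T, L, monitors, prod_state, action):
    """One transition of the product MDP: move in the base MDP, then advance
    every monitor FSM on the new state's label set."""
    s, qs = prod_state
    s2 = T[(s, action)]
    qs2 = tuple(m.step(q, L[s2]) for m, q in zip(monitors, qs))
    return (s2, qs2)
```

Because each monitor advances deterministically on labels, the product MDP is an ordinary MDP whose states additionally remember how far each objective has progressed.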

SLIDE 10

Solving the LTL planning problem

Given costs for violating each objective (determined by its weight and priority), we can define a product-space reward function; the planning problem can thus be framed as a reward maximization problem on the product MDP (solvable with value iteration).
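Value iteration on a deterministic product MDP can be sketched as follows. This is a generic sketch under stated assumptions: the reward dictionary `R` stands in for the paper's product-space reward encoding violation costs, which is not reproduced here.

```python
def value_iteration(states, actions, T, R, gamma=0.99, eps=1e-8):
    """Return optimal values and a greedy policy for a deterministic MDP.
    T: (s, a) -> s'; R: (s, a, s') -> reward (e.g. penalizing violations)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            choices = [R[(s, a, T[(s, a)])] + gamma * V[T[(s, a)]]
                       for a in actions if (s, a) in T]
            if not choices:          # terminal state: no available actions
                continue
            best = max(choices)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {}
    for s in states:
        avail = [a for a in actions if (s, a) in T]
        if avail:
            policy[s] = max(avail,
                            key=lambda a: R[(s, a, T[(s, a)])] + gamma * V[T[(s, a)]])
    return V, policy
```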

SLIDE 11

LTL “why” queries

  • We consider queries of the form “why φ?”, where φ is an arbitrary (safe/co-safe) LTL statement
  • Interpretation: “why did the agent act in such a way as to make φ hold?”
  • Examples in ShopWorld:
    – “why didn’t the agent leave the store?”
    – “why did the agent never buy the watch?”
    – “why didn’t the agent leave the store while holding the watch?”

SLIDE 12

Minimal evidence for an unsatisfactory trajectory

  • We define the minimal evidence that a trajectory is unsatisfactory for an LTL statement φ in terms of:
    – the positive and negative literals of φ
    – the good prefixes of φ if φ is co-safe; the non-bad prefixes of φ if φ is safe
  • e.g. in ShopWorld:
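For the simplest safe objectives, the idea can be illustrated directly. The sketch below assumes a "never p" objective G(¬p): a single (timestep, literal) pair where p holds already rules the trace out, so it suffices as minimal evidence. This is an illustration of the concept, not the paper's general algorithm.

```python
def minimal_evidence_never(p, trace):
    """Return a smallest set of (timestep, literal) pairs witnessing that
    `trace` (a list of label sets) violates G(not p), or None if it doesn't."""
    for t, label in enumerate(trace):
        if p in label:
            return {(t, p)}      # one violation point is sufficient evidence
    return None
```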
SLIDE 13

Explanation structures

  • The agent responds to a “why φ” query with an explanation structure, where
  • τ is a trajectory (or none)
  • E contains one or more pairs (φᵢ, evidence), where
    – φᵢ is an LTL statement
    – the evidence is a set of (timestep, literal) pairs sufficient to show that τ is unsatisfactory for φᵢ
  • E′ is as E, but for the counterfactual trajectory
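The structure described above can be written down as a small data type. The field names here are assumptions based on the description, not the paper's own identifiers.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExplanationStructure:
    """Response to a "why phi?" query: the actual trajectory, evidence that it
    is unsatisfactory for certain objectives, and the same for a counterfactual."""
    trajectory: Optional[list]                        # actual trace (or None)
    evidence: list = field(default_factory=list)      # (LTL statement, {(t, literal)}) pairs
    alt_evidence: list = field(default_factory=list)  # same, for the counterfactual trajectory
```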
SLIDE 14

Answering “why φ?”

  • 1. Does φ hold on the agent’s trajectory? If not, return (“φ is not, in fact, true”)
  • 2. Is there some achievable trajectory on which φ is false? If not, return (“φ is true because it is impossible to make φ false”)

SLIDE 15

Answering “why φ?” (continued)

  • 3. Compute a trajectory that maximally satisfies the agent’s objectives such that φ is false on it
    – The solution to a new planning problem
  • Return the explanation structure (comparing the actual and counterfactual trajectories in terms of their satisfaction of the objectives)
  • Because the counterfactual trajectory maximally satisfies the objectives, this structure indicates how satisfying ¬φ would compromise the agent’s ability to satisfy its objectives
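The three steps above can be sketched as a single dispatch function. The planner and model-checking calls are stubbed out as plain callables; all names are hypothetical and stand in for the real machinery.

```python
def answer_why(phi_holds, actual, counterfactuals, plan_best):
    """Answer a "why phi?" query.
    phi_holds(traj) -> bool; actual: the agent's trajectory;
    counterfactuals: achievable trajectories on which phi is false;
    plan_best: selects the one maximally satisfying the objectives."""
    # 1. Does phi actually hold on the agent's behavior?
    if not phi_holds(actual):
        return ("not-true", actual)
    # 2. Is it even possible to make phi false?
    if not counterfactuals:
        return ("unavoidable", actual)
    # 3. Compare the actual trajectory with the best phi-falsifying one.
    alt = plan_best(counterfactuals)
    return ("explanation", (actual, alt))
```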

SLIDE 16

Answering “why φ?” in ShopWorld

query: “why didn’t you leave the store while holding the watch?”

  • 1. Does φ hold on the actual trajectory? Yes
  • 2. Is ¬φ achievable? Yes
  • 3. Return the explanation structure
  • Indicates that while the true trajectory fails to leave while holding the watch, the only way to satisfy ¬φ would have been to steal the watch, which would violate a higher-priority specification (counterfactual: pickUp, leaveStore)

SLIDE 17

From explanation structures to natural language

  • We integrated this functionality with the NL pipeline in DIARC, a robotic architecture [3, 4]
  • Specifications and queries in an object-oriented extension to LTL (violation enumeration language; VEL) allowing quantification over objects
  • Utterance → VEL query → explanation structure → natural language response

[3] Kasenberg, D., Roque, A., Thielstrom, R. and Scheutz, M., 2019. Engaging in Dialogue about an Agent’s Norms and Behaviors. In Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019) (pp. 26-28).
[4] Kasenberg, D., Roque, A., Thielstrom, R., Chita-Tegmark, M. and Scheutz, M., 2019. Generating justifications for norm-related agent decisions. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 484-493).

SLIDE 18

Natural language explanations

  • Example: ShopWorld with two objects (glasses and watch); agent can afford one
    – Buys the glasses, leaves the watch

SLIDE 19

Future work

  • Incorporating explicit causal models
  • Tailoring explanations to interactant knowledge
  • Adapting to stochastic environments
    – Need to represent multiple trajectories or a probability distribution
  • Improving the efficiency of the planner
    – Currently impractical for nontrivial domains
  • Dropping the assumption that the agent has perfect knowledge of transition dynamics

SLIDE 20

Thanks

  • Funding sources
    – NSF IIS grant 43520050
    – NASA grant C17-2D00-TU
  • Collaborators
    – Matthias Scheutz (advisor; co-author)
    – Ravenna Thielstrom (co-author)
    – Antonio Roque
    – Meia Chita-Tegmark
    – others