SLIDE 1

Learning Strategies in Game-Theoretic Data Interaction

Ben McCamish, Arash Termehchy, Behrouz Touri, Liang Huang
Information & Data Management and Analytics Laboratory (IDEA)

SLIDE 2
  • A user’s intent is the content they wish to find in the database
  • They use queries in an attempt to communicate their intent

Querying a database of student grades

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 3

Most users cannot precisely express their intents

  • Intent: the user is looking for the grade of student Kerry Smith
  • The user is not sufficiently familiar with the database’s content and structure

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 4

Most users cannot precisely express their intents

  • Query: Has last name “Smith”
  • The query does not precisely express the intent

Smith

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 5

Smith

  • The database has too many tuples matching the query, most of them non-relevant.

Most users cannot precisely express their intents

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 6
  • The database system returns only a subset of the matching tuples

Most users cannot precisely express their intents

Smith

Results
First_Name   Last_Name   Dept.   Grade
Sarah        Smith       CE      A
John         Smith       EE      B

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 7
  • The user doesn’t find the student she is looking for

Most users cannot precisely express their intents


Smith

Results
First_Name   Last_Name   Dept.   Grade
Sarah        Smith       CE      A
John         Smith       EE      B

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 8

Smith CS

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

Users learn by interacting with database systems

  • The user reformulates the query after learning about the database and its content.
  • Reformulated query: has last name “Smith” and is in department “CS”
  • The new query expresses the user’s intent much more accurately
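The effect of the reformulation can be sketched in Python on the toy Grades relation. This is a minimal illustration, assuming a naive keyword semantics (a tuple matches when every keyword equals one of its attribute values) and a hypothetical `max_results` cap that mimics a system showing only a subset of the matches; `GRADES`, `keyword_query`, and `max_results` are illustrative names, not taken from the talk.

```python
from typing import List, Tuple

# Toy copy of the Grades relation: (First_Name, Last_Name, Dept., Grade)
Row = Tuple[str, str, str, str]
GRADES: List[Row] = [
    ("Sarah", "Smith", "CE", "A"),
    ("John", "Smith", "EE", "B"),
    ("Kerry", "Smith", "CS", "D"),
]


def keyword_query(keywords: str, rows: List[Row], max_results: int = 2) -> List[Row]:
    """Return at most `max_results` rows that contain every keyword as an attribute value.

    Assumption: a keyword matches a row if it equals one of the row's values;
    the hypothetical `max_results` cap models a system that shows only some matches.
    """
    terms = keywords.split()
    matches = [row for row in rows if all(term in row for term in terms)]
    return matches[:max_results]


# Original query: three tuples match, but only the first two are shown.
print(keyword_query("Smith", GRADES))     # [('Sarah', 'Smith', 'CE', 'A'), ('John', 'Smith', 'EE', 'B')]
# Reformulated query: only Kerry Smith's tuple matches.
print(keyword_query("Smith CS", GRADES))  # [('Kerry', 'Smith', 'CS', 'D')]
```

Under this toy semantics, adding the department keyword shrinks the match set to the single desired tuple, so the cap no longer hides it.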

SLIDE 9

Smith CS

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

  • The database system finds the desired tuple

But they learn by interacting with database systems

SLIDE 10
  • The database system returns the desired tuple

Smith CS

Results
First_Name   Last_Name   Dept.   Grade
Kerry        Smith       CS      D

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

But they learn by interacting with database systems

SLIDE 11


  • The user selects the returned tuple
  • Learning and reformulating the query allowed the user to find the desired student

Smith CS

Results
First_Name   Last_Name   Dept.   Grade
Kerry        Smith       CS      D

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

But they learn by interacting with database systems

SLIDE 12

Database system can learn as well

  • Intent: the user is looking for the grade of student Kerry Smith
  • Query: has last name “Smith”
  • The query does not precisely express the intent

Smith

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

SLIDE 13

Smith

  • The database has too many tuples matching the query

Results
First_Name   Last_Name   Dept.   Grade

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

Database system can learn as well

SLIDE 14
  • The database system has learned to return Kerry Smith in the CS department

Smith

Results
First_Name   Last_Name   Dept.   Grade
Kerry        Smith       CS      D

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

Database system can learn as well

SLIDE 15
  • The user finds and selects the tuple


Smith

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …

Results
First_Name   Last_Name   Dept.   Grade
Kerry        Smith       CS      D

Database system can learn as well

SLIDE 16

Interaction is a game between two potentially rational agents

  • Two players: the user and the database system
  • They have common interests and work together
  • They want to reach a mutual understanding such that the user gets the desired information
  • The user’s strategy is how intents are expressed using queries
  • The database system’s strategy is how queries are decoded

SLIDE 17

User strategy

Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …


  • Row-stochastic mapping from intents to queries
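A row-stochastic matrix has non-negative entries and every row sums to 1, so row i of U is a probability distribution over the queries used for intent e_i. The sketch below encodes the running example; the placement of the 1s is our reading of the flattened slide (e2 expressed as q1, e1 and e3 as q2), and `is_row_stochastic` and `sample_query` are hypothetical helper names.

```python
import random

INTENTS = ["e1", "e2", "e3"]   # John Smith in EE, Sarah Smith in CE, Kerry Smith in CS
QUERIES = ["q1", "q2"]         # "Smith CE", "Smith"

# U[i][j]: probability that the user expresses intent i with query j.
U = [
    [0.0, 1.0],  # e1 -> q2
    [1.0, 0.0],  # e2 -> q1
    [0.0, 1.0],  # e3 -> q2
]


def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def sample_query(intent_index, rng):
    """Draw the query the user submits for the given intent, following row U[intent_index]."""
    return rng.choices(QUERIES, weights=U[intent_index], k=1)[0]


rng = random.Random(0)
print(is_row_stochastic(U))   # True
print(sample_query(1, rng))   # q1 (e2 is always expressed as "Smith CE")
```

A mixed strategy such as `[0.3, 0.7]` for a row would model a user who sometimes types the more specific query and sometimes the shorter one.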

SLIDE 18

User may use a single query for multiple intents

Query #   Query
q1        “Smith CE”
q2        “Smith”

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …


  • Due to lack of knowledge, to save time, …
  • This makes it hard to interpret the exact intent behind a query.

SLIDE 19

Database system strategy

Database Strategy (D)
      e1    e2    e3
q1    0     1     0
q2    0.5   0     0.5

Intent #   Intent
e1         ans(y) ← Grades(x, 'Smith', 'EE', y)
e2         ans(y) ← Grades(x, 'Smith', 'CE', y)
e3         ans(y) ← Grades(x, 'Smith', 'CS', y)

Query #   Query
q1        “Smith CE”
q2        “Smith”

Grades
First_Name   Last_Name   Dept.   Grade
…            …           …       …
Sarah        Smith       CE      A
John         Smith       EE      B
Kerry        Smith       CS      D
…            …           …       …



  • Row-stochastic mapping from queries to intents
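Decoding can be sketched the same way: the system samples an intent from the query's row of D and then evaluates that intent's conjunctive query over Grades. The matrix entries are our reading of the flattened slide (q1 decoded as e2, q2 split evenly between e1 and e3); storing each Datalog rule as a (last name, department) pair, and the names `INTENT_PATTERNS`, `answer`, and `decode`, are illustrative assumptions.

```python
import random

GRADES = [
    ("Sarah", "Smith", "CE", "A"),
    ("John", "Smith", "EE", "B"),
    ("Kerry", "Smith", "CS", "D"),
]

# Each intent is ans(y) <- Grades(x, last, dept, y); only the constants are stored.
INTENT_PATTERNS = {
    "e1": ("Smith", "EE"),
    "e2": ("Smith", "CE"),
    "e3": ("Smith", "CS"),
}
INTENTS = ["e1", "e2", "e3"]

# D[query][l]: probability that the system interprets the query as intent l.
D = {
    "q1": [0.0, 1.0, 0.0],   # "Smith CE" -> e2
    "q2": [0.5, 0.0, 0.5],   # "Smith"    -> e1 or e3, equally likely
}


def answer(intent):
    """Evaluate ans(y) <- Grades(x, last, dept, y) and return the grades y."""
    last, dept = INTENT_PATTERNS[intent]
    return [grade for (_first, last_name, dept_name, grade) in GRADES
            if last_name == last and dept_name == dept]


def decode(query, rng):
    """Sample an intent from row D[query] and return it with its answers."""
    intent = rng.choices(INTENTS, weights=D[query], k=1)[0]
    return intent, answer(intent)


rng = random.Random(0)
print(decode("q1", rng))   # ('e2', ['A'])
print(decode("q2", rng))   # either ('e1', ['B']) or ('e3', ['D'])
```

Because q2 is shared by two intents in the user's strategy, no deterministic choice for that row can serve both, which is why the slide's D splits it.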

SLIDE 20

Payoff: expected effectiveness of communicating every intent

$$
r(U, D) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{n} U_{ij} \sum_{\ell=1}^{m} D_{j\ell} \, \mathrm{prec}(e_i, e_\ell)
$$

Database Strategy (D)
      e1    e2    e3
q1    0     1     0
q2    0.5   0     0.5

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1


Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

  • π_i: prior probability of intent e_i
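The payoff formula translates directly into three nested sums. In this sketch prec is replaced by a hypothetical `exact_match` function (1 when the decoded intent is the intended one, 0 otherwise), which simplifies the precision measure; the matrices are our reading of the slide. Note that the value r(U, D) = 2 reported for this profile in the equilibria slide is reproduced only when every π_i is set to 1; with a uniform prior π_i = 1/3 the same profile scores 2/3.

```python
def payoff(pi, U, D, prec):
    """r(U, D) = sum_i pi_i * sum_j U_ij * sum_l D_jl * prec(e_i, e_l)."""
    m = len(U)        # number of intents
    n = len(U[0])     # number of queries
    return sum(
        pi[i] * sum(
            U[i][j] * sum(D[j][l] * prec(i, l) for l in range(m))
            for j in range(n)
        )
        for i in range(m)
    )


def exact_match(i, l):
    """Assumed effectiveness: 1 if the decoded intent is the intended one, else 0."""
    return 1.0 if i == l else 0.0


U = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]   # intents e1..e3 -> queries q1, q2
D = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]     # queries q1, q2 -> intents e1..e3

print(payoff([1.0, 1.0, 1.0], U, D, exact_match))   # 2.0
print(payoff([1/3, 1/3, 1/3], U, D, exact_match))   # ≈ 0.667
```

Passing `prec` as a function keeps the sketch open to a graded measure, for example precision computed from user feedback, without changing the summation.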
SLIDE 21

Payoff: expected effectiveness of communicating every intent

$$
r(U, D) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{n} U_{ij} \sum_{\ell=1}^{m} D_{j\ell} \, \mathrm{prec}(e_i, e_\ell)
$$


Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Database Strategy (D)
      e1    e2    e3
q1    0     1     0
q2    0.5   0     0.5

SLIDE 22

Payoff: expected effectiveness of communicating every intent

$$
r(U, D) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{n} U_{ij} \sum_{\ell=1}^{m} D_{j\ell} \, \mathrm{prec}(e_i, e_\ell)
$$


Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Database Strategy (D)
      e1    e2    e3
q1    0     1     0
q2    0.5   0     0.5

SLIDE 23

Payoff: expected effectiveness of communicating every intent

  • Precision is the fraction of the returned tuples that are desired
  • Computed using user feedback

$$
r(U, D) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{n} U_{ij} \sum_{\ell=1}^{m} D_{j\ell} \, \mathrm{prec}(e_i, e_\ell)
$$


Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Database Strategy (D)
      e1    e2    e3
q1    0     1     0
q2    0.5   0     0.5
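Precision as defined here can be computed from the set of returned tuples and the set the user marks as desired. A minimal sketch, assuming feedback arrives as the set of tuples the user selects; the function name `precision` and the convention of returning 0.0 for an empty result are our assumptions.

```python
from typing import Hashable, Set


def precision(returned: Set[Hashable], desired: Set[Hashable]) -> float:
    """Fraction of the returned tuples that the user marked as desired.

    Assumption: an empty result has precision 0.0.
    """
    if not returned:
        return 0.0
    return len(returned & desired) / len(returned)


sarah = ("Sarah", "Smith", "CE", "A")
john = ("John", "Smith", "EE", "B")
kerry = ("Kerry", "Smith", "CS", "D")

# "Smith" shows Sarah and John, but the user wants Kerry.
print(precision({sarah, john}, {kerry}))          # 0.0
# "Smith CS" shows only Kerry, who is selected.
print(precision({kerry}, {kerry}))                # 1.0
# A hypothetical result containing all three Smiths, with Kerry selected.
print(precision({sarah, john, kerry}, {kerry}))   # ≈ 0.333
```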

SLIDE 24

Interesting problems

  • 1. What are the stable states (equilibria) of the game? Are there any undesirable (sub-optimal) equilibria?
  • 2. What are the user’s learning mechanisms?
  • 3. What learning algorithms should the database system adopt so that the collaboration converges to desirable equilibria?

1. In some games, e.g., the Shapley game, learning may not converge at all, or may not converge to a desired equilibrium.

SLIDE 25

Equilibria of the game

  • Nash equilibrium: a strategy profile in which no player can increase its payoff by unilaterally deviating from its current strategy

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Database Strategy (D)
      e1    e2    e3
q1    0     1     0
q2    0.5   0     0.5

r(U,D) = 2


Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”
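For an example this small the Nash condition can be checked directly. Because r(U, D) is linear in each row of U and in each row of D, each player's best unilateral deviation can be computed row by row, and a profile is an equilibrium exactly when neither best response beats the current payoff. The sketch assumes prec is exact match and every π_i = 1 (which yields the r(U, D) = 2 on this slide); the matrices are our reading of the slide, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def payoff(pi: List[float], U: Matrix, D: Matrix) -> float:
    """r(U, D) with prec(e_i, e_l) = 1 if i == l else 0, so the inner sum reduces to D[j][i]."""
    return sum(pi[i] * U[i][j] * D[j][i]
               for i in range(len(U)) for j in range(len(D)))


def best_user_payoff(pi: List[float], D: Matrix) -> float:
    """Payoff if the user best-responds to D: each intent moves to the query
    that D decodes correctly with the highest probability."""
    return sum(pi[i] * max(D[j][i] for j in range(len(D))) for i in range(len(pi)))


def best_db_payoff(pi: List[float], U: Matrix) -> float:
    """Payoff if the database best-responds to U: each query is decoded as
    the intent l with the largest pi_l * U[l][j]."""
    return sum(max(pi[l] * U[l][j] for l in range(len(U))) for j in range(len(U[0])))


def is_nash(pi: List[float], U: Matrix, D: Matrix, tol: float = 1e-9) -> bool:
    """No player gains by deviating alone: both best responses equal the current payoff."""
    r = payoff(pi, U, D)
    return best_user_payoff(pi, D) <= r + tol and best_db_payoff(pi, U) <= r + tol


pi = [1.0, 1.0, 1.0]                          # assumption: reproduces r(U, D) = 2
U = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]      # e1..e3 -> q1, q2
D = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]        # q1, q2 -> e1..e3

print(payoff(pi, U, D), is_nash(pi, U, D))    # 2.0 True

# Against this U, a database that always answers e2 could gain by decoding
# q2 as e1 or e3 instead, so that profile is not an equilibrium.
D_always_e2 = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print(payoff(pi, U, D_always_e2), is_nash(pi, U, D_always_e2))   # 1.0 False
```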

SLIDE 26

The game has Nash equilibria with sub-optimal payoff

r(U,D) = 1


Database Strategy (D)
      e1   e2   e3
q1    0    1    0
q2    0    1    0

User Strategy (U)
      q1   q2
e1    0    1
e2    0    1
e3    0    1

Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

  • The user expresses all intents using q2: “Smith”
  • The database system always returns e2: Sarah Smith in CE

  • Detailed analyses are at http://tinyurl.com/charmarxiv

SLIDE 27

The game has Nash equilibria with sub-optimal payoff

  • If the user learns to use query q1 to represent e2, the payoff does not increase

Current profile:

User Strategy (U)
      q1   q2
e1    0    1
e2    0    1
e3    0    1

Database Strategy (D)
      e1   e2   e3
q1    0    1    0
q2    0    1    0

r(U,D) = 1

After the user learns to use q1 for e2:

User Strategy (U)
      q1   q2
e1    0    1
e2    1    0
e3    0    1

Database Strategy (D)
      e1   e2   e3
q1    0    1    0
q2    0    1    0

r(U,D) = 1

Intent #   Intent
e1         John Smith in EE
e2         Sarah Smith in CE
e3         Kerry Smith in CS

Query #   Query
q1        “Smith CE”
q2        “Smith”

  • Details at http://tinyurl.com/charmarxiv
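Because the example is tiny, its equilibria can be found by brute force. The sketch restricts both players to deterministic (pure) strategies, takes every π_i = 1 and prec to be exact match (assumptions chosen so the payoffs match the r(U,D) values on these slides), and tests every single-row deviation. Since the payoff is a sum of independent per-row terms for each player, single-row checks are enough. The encoding (`u[i]` is the query index for intent i, `d[j]` the intent index for query j) is our own.

```python
from itertools import product

M, N = 3, 2  # intents e1..e3 (indices 0..2), queries q1..q2 (indices 0..1)


def payoff(u, d):
    """Pure profile payoff: the number of intents decoded correctly."""
    return float(sum(1 for i in range(M) if d[u[i]] == i))


def is_nash(u, d):
    """True if no single change to one player's strategy raises the payoff."""
    r = payoff(u, d)
    # User deviations: change the query used for one intent.
    for i, j in product(range(M), range(N)):
        if payoff(u[:i] + (j,) + u[i + 1:], d) > r:
            return False
    # Database deviations: change the intent returned for one query.
    for j, l in product(range(N), range(M)):
        if payoff(u, d[:j] + (l,) + d[j + 1:]) > r:
            return False
    return True


equilibria = [
    (u, d)
    for u in product(range(N), repeat=M)
    for d in product(range(M), repeat=N)
    if is_nash(u, d)
]
print(sorted({payoff(u, d) for u, d in equilibria}))   # [1.0, 2.0]

# Every intent uses q2; every query returns e2.
u_sub, d_sub = (1, 1, 1), (1, 1)
print(is_nash(u_sub, d_sub), payoff(u_sub, d_sub))     # True 1.0
# e2 switches to q1: the payoff stays at 1.
print(payoff((1, 0, 1), d_sub))                        # 1.0
```

With only two queries for three intents, at most two intents can be decoded correctly, so 2 is the best achievable payoff and the equilibria with payoff 1 are the sub-optimal ones.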
SLIDE 28

How might users learn?

  • Research in psychology shows that humans exhibit reinforcement learning behavior
  • They select a query based on its past payoff, i.e., exploitation.
  • They explore and try new or less successful queries to gain new knowledge, i.e., exploration.
  • They sacrifice payoff in the short term in the hope of a higher payoff in the long run.

SLIDE 29

User learning mechanism

  • Win-Stay/Lose-Randomize: keeps using a query with non-zero payoff; otherwise, picks a query at random.
  • Latest-Reward: uses a query with probability proportional to its latest payoff

Short-term memory
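The two short-term-memory rules can be sketched as small learner classes. This is our reading of the one-line descriptions, not the authors' code: the uniform random restart, the optimistic `initial` value for Latest-Reward, its uniform fallback when every latest payoff is zero, and the `success_rate` harness (a user whose intent is e3, facing the fixed database strategy D of the running example) are all assumptions.

```python
import random
from typing import List


class WinStayLoseRandomize:
    """Keep the last query after a non-zero payoff; otherwise pick a query uniformly at random."""

    def __init__(self, n_queries: int, rng: random.Random) -> None:
        self.n = n_queries
        self.rng = rng
        self.current = rng.randrange(n_queries)

    def choose(self) -> int:
        return self.current

    def update(self, payoff: float) -> None:
        if payoff <= 0:                                   # lose: randomize
            self.current = self.rng.randrange(self.n)


class LatestReward:
    """Use each query with probability proportional to the payoff it received most recently."""

    def __init__(self, n_queries: int, rng: random.Random, initial: float = 1.0) -> None:
        self.latest: List[float] = [initial] * n_queries  # assumed optimistic start
        self.rng = rng
        self.last = 0

    def choose(self) -> int:
        # Assumed fallback: uniform choice when every latest payoff is zero.
        weights = self.latest if sum(self.latest) > 0 else [1.0] * len(self.latest)
        self.last = self.rng.choices(range(len(weights)), weights=weights, k=1)[0]
        return self.last

    def update(self, payoff: float) -> None:
        self.latest[self.last] = payoff


def success_rate(learner, rounds: int, rng: random.Random) -> float:
    """Hypothetical harness: a user with intent e3 against the fixed D of the example."""
    D = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]   # q1, q2 -> e1, e2, e3
    hits = 0.0
    for _ in range(rounds):
        query = learner.choose()
        returned = rng.choices(range(3), weights=D[query], k=1)[0]
        payoff = 1.0 if returned == 2 else 0.0
        learner.update(payoff)
        hits += payoff
    return hits / rounds


rng = random.Random(42)
print(success_rate(WinStayLoseRandomize(2, rng), 1000, rng))
print(success_rate(LatestReward(2, rng), 1000, rng))
```

Both rules remember only the most recent outcome, which is what separates them from the accumulated-payoff rules on the next slide.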

SLIDE 30

User learning mechanism

  • Bush and Mosteller’s: reinforces the probability of using a query with non-zero payoff by an amount independent of the payoff
  • Roth and Erev’s: reinforces the probability of using a query in proportion to its accumulated payoff
  • Roth and Erev’s Modified: adds the ability to forget to Roth and Erev’s method
  • Cross’s: reinforces the probability of using a query in proportion to a linear adjustment of its accumulated payoff

Long-term memory
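Two of the long-term-memory rules are sketched below in their common textbook forms, as we understand them; parameter names and defaults are assumptions, and Cross’s rule is not shown. In `RothErev`, each query keeps a propensity equal to its accumulated payoff, and `forget > 0` decays all propensities before the new payoff is added, a simple stand-in for the modified rule. In `BushMosteller`, a positive payoff moves the chosen query's probability toward 1 by a fixed fraction `alpha`, whatever the payoff's size.

```python
import random
from typing import List


class RothErev:
    """Choose a query with probability proportional to its accumulated payoff."""

    def __init__(self, n_queries: int, rng: random.Random,
                 initial: float = 1.0, forget: float = 0.0) -> None:
        self.propensity: List[float] = [initial] * n_queries  # assumed positive start
        self.forget = forget                                   # 0 gives the basic rule
        self.rng = rng
        self.last = 0

    def probabilities(self) -> List[float]:
        total = sum(self.propensity)
        return [p / total for p in self.propensity]

    def choose(self) -> int:
        self.last = self.rng.choices(range(len(self.propensity)),
                                     weights=self.propensity, k=1)[0]
        return self.last

    def update(self, payoff: float) -> None:
        # Decay (forgetting), then add the payoff to the query just used.
        self.propensity = [(1 - self.forget) * p for p in self.propensity]
        self.propensity[self.last] += payoff


class BushMosteller:
    """After a positive payoff, shift probability mass toward the chosen query by alpha."""

    def __init__(self, n_queries: int, rng: random.Random, alpha: float = 0.1) -> None:
        self.p: List[float] = [1.0 / n_queries] * n_queries
        self.alpha = alpha
        self.rng = rng
        self.last = 0

    def choose(self) -> int:
        self.last = self.rng.choices(range(len(self.p)), weights=self.p, k=1)[0]
        return self.last

    def update(self, payoff: float) -> None:
        if payoff <= 0:
            return
        # p_chosen <- p_chosen + alpha * (1 - p_chosen); others scaled by (1 - alpha).
        self.p = [(1 - self.alpha) * x for x in self.p]
        self.p[self.last] += self.alpha


rng = random.Random(0)
learner = RothErev(2, rng)
q = learner.choose()
learner.update(1.0)                  # propensities become 2 and 1
print(learner.probabilities()[q])    # ≈ 0.667

bm = BushMosteller(2, rng, alpha=0.5)
q = bm.choose()
bm.update(3.0)                       # same step for any positive payoff
print(bm.p[q])                       # 0.75
```

The contrast is visible in `update`: Roth and Erev's step grows with the payoff, while Bush and Mosteller's step depends only on whether the payoff was positive.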

SLIDE 31
  • Yahoo! query log with over 300,000 interactions
  • https://webscope.sandbox.yahoo.com

Empirical evaluation

Method                     Mean Squared Distance
Bush and Mosteller’s       0.0112
Cross’s                    0.01131
Roth and Erev              0.00993
Roth and Erev Modified     0.00994
Win-Stay/Lose-Randomize    0.01752
Latest-Reward              0.15167

SLIDE 32

What learning strategies should the database system use?

  • Current database systems assume that the user’s strategy is fixed.
  • They model the problem as a stochastic multi-armed bandit and use online learning algorithms, such as UCB-1.
  • We have used Roth and Erev’s reinforcement learning method for the database system’s learning.
  • It uses randomization to explore the available options.
  • Theorem: If players use the Roth and Erev method, the sequence of payoffs is a submartingale (statistically non-decreasing) and converges almost surely.
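On the database side, Roth and Erev’s method can be sketched with one propensity per (query, intent) pair: the system answers a query by sampling an intent in proportion to the propensities and adds the observed payoff to the pair it used. In this sketch the user's strategy is held fixed (e2 expressed as q1; e1 and e3 as q2), intents are drawn uniformly, and the payoff is 1 for an exact match; these choices, the equal starting propensities, and the names `RothErevDatabase` and `simulate` are assumptions for illustration, not the experimental setup.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


class RothErevDatabase:
    """Database strategy learned by Roth-Erev reinforcement over (query, intent) pairs."""

    def __init__(self, n_queries: int, n_intents: int, rng: random.Random,
                 initial: float = 1.0) -> None:
        # Assumption: all propensities start at the same positive value.
        self.R: Matrix = [[initial] * n_intents for _ in range(n_queries)]
        self.rng = rng

    def strategy(self) -> Matrix:
        """Current D: each row of propensities normalized to probabilities."""
        return [[r / sum(row) for r in row] for row in self.R]

    def answer(self, query: int) -> int:
        """Randomized decoding: this sampling is how the method keeps exploring."""
        row = self.R[query]
        return self.rng.choices(range(len(row)), weights=row, k=1)[0]

    def reinforce(self, query: int, intent: int, payoff: float) -> None:
        """Add the observed payoff (e.g. precision from feedback) to the propensity."""
        self.R[query][intent] += payoff


def simulate(rounds: int, seed: int = 0) -> Tuple[float, Matrix]:
    """Fixed user (e2 -> q1; e1, e3 -> q2), uniform intents, exact-match payoff."""
    rng = random.Random(seed)
    user_query = [1, 0, 1]                    # query index used for e1, e2, e3
    db = RothErevDatabase(n_queries=2, n_intents=3, rng=rng)
    total = 0.0
    for _ in range(rounds):
        intent = rng.randrange(3)
        query = user_query[intent]
        returned = db.answer(query)
        payoff = 1.0 if returned == intent else 0.0
        db.reinforce(query, returned, payoff)
        total += payoff
    return total / rounds, db.strategy()


avg, D = simulate(2000)
print(round(avg, 3), [round(p, 2) for p in D[0]])
```

In this toy run the row for q1 concentrates on e2, since only e2 is ever expressed with q1 and only correct answers are reinforced.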

SLIDE 33

Roth and Erev outperforms UCB-1 in the long run


  • Yahoo! interaction log
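For comparison, UCB-1 treats each query's candidate answers as bandit arms with fixed reward distributions, which amounts to assuming a fixed user strategy. The textbook sketch below plays every arm once and then picks the arm maximizing mean + sqrt(2 ln t / n_a), where t is the total number of plays and n_a the plays of arm a. The stationary toy user ("Smith CE" always meaning e2) is an assumption, and this is not a reproduction of the experiment on the Yahoo! log.

```python
import math
from typing import List


class UCB1:
    """Textbook UCB-1 for a single query; each arm is a candidate intent to return."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms
        self.means: List[float] = [0.0] * n_arms

    def select(self) -> int:
        for arm, count in enumerate(self.counts):
            if count == 0:                    # try every arm once first
                return arm
        t = sum(self.counts)
        scores = [mean + math.sqrt(2.0 * math.log(t) / count)
                  for mean, count in zip(self.means, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]   # running mean


# Stationary toy user: query "Smith CE" is always meant as e2 (arm 1).
bandit = UCB1(n_arms=3)
for _ in range(500):
    arm = bandit.select()
    bandit.update(arm, 1.0 if arm == 1 else 0.0)
print(bandit.counts)
```

When the user also adapts, the rewards are no longer stationary, which breaks the assumption behind UCB-1's confidence bonus.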
SLIDE 34

Conclusion

  • The interaction between users and database systems is better modeled as a collaborative game
  • The game has both desirable and undesirable equilibria
  • Users are, rather surprisingly, intelligent learners
  • Database systems should use randomized learning strategies.
  • More information is in our technical report: http://tinyurl.com/charmarxiv
