SLIDE 1

Discovering Context Effects from Raw Choice Data

ARJUN SESHADRI, STANFORD UNIVERSITY
ALEX PEYSAKHOVICH, FACEBOOK ARTIFICIAL INTELLIGENCE RESEARCH
JOHAN UGANDER, STANFORD UNIVERSITY

ICML 2019

SLIDE 2

Modelling in Discrete Choice

• Data of the form $(x, C)$, where “alternative $x$ is chosen from the set $C$” and $C$ is a subset of $\mathcal{X}$, the universe of alternatives

• Discrete choice settings are ubiquitous
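As a minimal sketch of this data format, each observation can be stored as a pair of the chosen alternative and the set it was offered in. The item names, the toy observations, and the helper `empirical_choice_probs` are all hypothetical and only illustrate the structure.

```python
from collections import Counter

# Hypothetical toy universe of alternatives (illustration only).
universe = {"bus", "car", "train"}

# Each observation is (chosen alternative, choice set it was chosen from).
data = [
    ("car", frozenset({"bus", "car"})),
    ("bus", frozenset({"bus", "car"})),
    ("car", frozenset({"bus", "car", "train"})),
    ("train", frozenset({"bus", "car", "train"})),
    ("car", frozenset({"bus", "car", "train"})),
]

# Sanity checks: the choice lies in its set, and the set lies in the universe.
for chosen, choice_set in data:
    assert chosen in choice_set and choice_set <= universe


def empirical_choice_probs(observations):
    """Return {choice_set: {item: fraction of times item was chosen}}."""
    counts = {}
    for chosen, choice_set in observations:
        counts.setdefault(choice_set, Counter())[chosen] += 1
    return {
        s: {item: c[item] / sum(c.values()) for item in sorted(s)}
        for s, c in counts.items()
    }


probs = empirical_choice_probs(data)
```

Grouping by choice set matters because context effects only show up when the same alternative is compared across different sets.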

SLIDE 4

Encompasses Many Fields

• Recommender Systems
• Inverse Reinforcement Learning
• Virtual Assistants
• Structural Modeling

SLIDE 5

Independence of Irrelevant Alternatives (IIA)

• Fully determines the workhorse Multinomial Logit (MNL) Model
• Main (strong) assumption: the relative odds of choosing one alternative over another do not depend on which other alternatives are in the choice set
• The Good:
  - inferentially tractable, powerful, and interpretable
• The Bad:
  - When IIA does not hold, out-of-sample predictions are wildly miscalibrated
  - Cannot account for the wide literature on context effects (e.g. the Compromise Effect)

[Figure: Compromise Effect; labels: Size, Savings]
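To make the IIA property concrete, here is a small sketch of MNL choice probabilities. The utilities and the item names (`small`, `medium`, `large`) are hypothetical values chosen for illustration. Under MNL, the odds between two alternatives are the same whether or not a third alternative is offered, so a compromise effect cannot be represented.

```python
import math


def mnl_probs(utilities, choice_set):
    """MNL: P(x | C) = exp(u_x) / sum over y in C of exp(u_y)."""
    weights = {x: math.exp(utilities[x]) for x in choice_set}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Hypothetical utilities, chosen only for illustration.
utilities = {"small": 0.2, "medium": 0.5, "large": 1.0}

pair = mnl_probs(utilities, {"small", "large"})
triple = mnl_probs(utilities, {"small", "medium", "large"})

# Odds of "large" versus "small" with and without "medium" in the set.
odds_pair = pair["large"] / pair["small"]
odds_triple = triple["large"] / triple["small"]
```

Both `odds_pair` and `odds_triple` equal `exp(u_large - u_small)` up to floating-point error, regardless of what else is in the set.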

SLIDE 6

Problems we address

• Modelling individual choice behavior
  - Behavioral economics “anomalies” are all over the place
    - Search Engine Ads (Ieong-Mishra-Sheffet ’12, Yin et al. ’14)
    - Google Web Browsing Choices (Benson-Kumar-Tomkins ’16)
  - Need to model these anomalies while retaining parametric and inferential efficiency
• Statistical tests for violations of IIA
  - General, global tests are intractable (Seshadri & Ugander ’19, Long & Freese ’05)
  - Model-based approaches are challenging due to identifiability issues (Cheng & Long, ’07)

[Figure label: “ad group quality”]

SLIDE 11

Context Dependent Utility Model (CDM)

Developing the CDM

• Universal logit model (McFadden et al., ’77)
• Decompose the model (Batsell & Polking, ’85)
• Truncate to 2nd order (effects are pairwise): Full Rank CDM
• Make a low rank approximation (parameters linear in items): Low Rank CDM
  - r-dimensional latent feature vector, with r << n items
  - Other items change how features are traded off
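A hedged sketch of the four steps above in formulas. The notation is an assumption and may differ from the authors' slides: $u(x \mid C)$ is a context-dependent utility, $v(x \mid B)$ are decomposition terms, $u_{xz}$ is the pairwise effect of context item $z$ on target item $x$, and $t_x, c_z \in \mathbb{R}^r$ are target and context vectors.

```latex
\begin{align*}
\text{Universal logit:} \quad
  & P(x \mid C) = \frac{\exp\big(u(x \mid C)\big)}{\sum_{y \in C} \exp\big(u(y \mid C)\big)} \\
\text{Decomposition:} \quad
  & u(x \mid C) = \sum_{B \subseteq C \setminus \{x\}} v(x \mid B) \\
\text{Full Rank CDM:} \quad
  & P(x \mid C) = \frac{\exp\big(\sum_{z \in C \setminus \{x\}} u_{xz}\big)}
                       {\sum_{y \in C} \exp\big(\sum_{z \in C \setminus \{y\}} u_{yz}\big)} \\
\text{Low Rank CDM:} \quad
  & u_{xz} = c_z^{\top} t_x, \qquad t_x, c_z \in \mathbb{R}^{r}, \quad r \ll n
\end{align*}
```

In this reading, the full-rank model has on the order of $n^2$ pairwise parameters, while the low-rank factorization has on the order of $nr$, which is what "parameters linear in items" refers to.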

SLIDE 15

A Theoretical Preview

• Identifiability: sufficient condition, necessary condition, and a more general statement
• Convergence Guarantees
• Hypothesis Testing

SLIDE 19

Unifying Existing Choice Models

Models unified by the Low Rank CDM:

• Tversky-Simonson Model (Tversky & Simonson, 1993)
• Batsell-Polking Model (Batsell & Polking, 1985)
• Blade-Chest Model (Chen & Joachims, 2016)

SLIDE 22

An Empirical Preview: Performance and Interpretability

Not Like the Other (Heikinheimo & Ukkonen, ’13)

• Individuals are shown triplets of nature photographs and asked to choose the photo most unlike the other two
• CDM illustrates an intuitive property of the dataset: similar items have a negative target-context inner product
  - Induces grouping by similarity in both target and context vectors
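The negative inner product property can be illustrated with a toy low-rank CDM. This is a sketch under stated assumptions, not the authors' fitted model: the photo names, the 2-dimensional embeddings, and the function `cdm_probs` are hypothetical, and the context vectors are simply set to the negated target vectors so that similar items have a negative target-context inner product.

```python
import math


def cdm_probs(target, context, choice_set):
    """Low-rank CDM sketch: score(x) = sum over z in C, z != x, of <context[z], target[x]>."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    scores = {
        x: sum(dot(context[z], target[x]) for z in choice_set if z != x)
        for x in choice_set
    }
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {x: math.exp(s - top) for x, s in scores.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Hypothetical embeddings: two similar forest photos and one beach photo.
target = {"forest_a": (1.0, 0.0), "forest_b": (1.0, 0.0), "beach": (0.0, 1.0)}
# Assumption: context = -target, so similar items have a negative inner product.
context = {name: (-vec[0], -vec[1]) for name, vec in target.items()}

probs = cdm_probs(target, context, ["forest_a", "forest_b", "beach"])
```

With these assumed vectors, each forest photo is penalized by the presence of the other, so `beach` receives the highest probability of being picked as the odd one out.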

Transportation Preferences (Koppelman & Bhat, ’06)

• Survey of transportation choices for residents in various San Francisco neighborhoods
• Low Rank CDMs significantly outperform MNL and MMNL

SLIDE 23

Conclusions

• CDM models context effects with efficiency guarantees and enables practical tests of IIA
• Can be easily applied to many pipelines by modifying “the final layer”
• Simultaneously brings both:
  - Machine Learning rigor to Econometrics models (identifiability, convergence)
  - Econometrics modeling (choice set effects) into Machine Learning research

Thanks!!

Discovering Context Effects from Raw Choice Data

Arjun Seshadri, Alex Peysakhovich, and Johan Ugander

Poster: Pacific Ballroom #234