Retrieval as Interaction – African Summer School on Machine Learning – PowerPoint PPT Presentation


SLIDE 1

Retrieval as Interaction

African Summer School on Machine Learning for Data Mining and Search

Maarten de Rijke January 14, 2019

University of Amsterdam derijke@uva.nl
SLIDE 2

Based on joint work with Abhinav Khaitan, Ana Lucic, Anne Schuth, Boris Sharchilev, Branislav Kveton, Chang Li, Csaba Szepesvári, Daan Odijk, Edgar Meij, Giorgio Stefanoni, Harrie Oosterhuis, Hinda Haned, Ilya Markov, Julia Kiseleva, Jun Ma, Kambadur Prabhanjan, Maartje ter Hoeve, Masrour Zoghi, Miles Osborne, Nikos Voskarides, Pavel Serdyukov, Pengjie Ren, Ridho Reinanda, Rolf Jagerman, Tor Lattimore, Yujie Lin, Yury Ustinovskiy, Zhaochun Ren, Ziming Li, and Zhumin Chen

SLIDE 3

Background

SLIDE 4

We need information to make decisions . . .

. . . to identify or structure a problem or opportunity
. . . to put the problem or opportunity in context
. . . to generate alternative solutions
. . . to choose the best alternative

SLIDE 5

SLIDE 6

SLIDE 7

Information retrieval

Getting the right information

to the right people

in the right way

SLIDE 8

Information retrieval – Two phases

SLIDE 9

Information retrieval – Two phases

Online development – Offline development
SLIDE 10

Information retrieval – Two phases

Online development – Offline development
SLIDE 11

Information retrieval – The online phase

[Diagram: the retrieval system as an agent in a user environment – for a state (query) the system takes an action (a document list), the user examines the document list and generates implicit feedback, and an evaluation measure turns that feedback into a reward.]
SLIDE 12

How does it all fit together?

A “spaghetti” picture for search

[Diagram: a “spaghetti” architecture for search – a front door and UX backed by online top-k retrievers, vertical rankers, and a blender over several indexes; an offline pipeline with crawling/ingestion, extraction, aggregation, enriching, and indexing of multiple sources; logs feeding an evaluation framework with interleaving, offline evaluation, query understanding, query improvement, and A/B learning.]

How does the AFIRM program fit?

SLIDE 13

What does the offline phase mean?

A learning process

for Man

and Machine

SLIDE 14

What does this mean for machines?

Sense – Plan – Act

SLIDE 15

What does this mean for machines?

Understand and track intent

SLIDE 16

What does this mean for machines?

Understand and track intent Update models and space of possible actions (answer, ranked list, SERP, . . . )

SLIDE 17

What does this mean for machines?

Understand and track intent Update models and space of possible actions (answer, ranked list, SERP, . . . ) Select the best action and sense its effect

SLIDE 18

What does this mean for machines?

Life is easier for systems than in an offline-trained query-response paradigm

  • Engage with user
  • Educate/train user
  • Ask for clarification from user
SLIDE 19

What does this mean for machines?

Life is easier for systems than in an offline-trained query-response paradigm

  • Engage with user
  • Educate/train user
  • Ask for clarification from user

Life is harder for systems than in an offline-trained query-response paradigm

  • Safety – Don’t hurt anyone
  • Explicability – Be transparent about model, about decisions
SLIDE 20

Unpacking

Safety &

Explicability

SLIDE 21

The plan for this morning

Background Safety Explicability Conclusion

SLIDE 22

Safety

SLIDE 23

Safety

  • Don’t perform worse than a reasonable baseline, e.g., the production system people are used to
  • Don’t take too long to learn to improve
  • Don’t leave anyone behind & give everyone a fair deal
  • Don’t fall into sinkholes – be diverse
  • . . .

SLIDE 24

When people change their mind

  • Off-policy evaluation uses historical interaction to estimate performance
  • Non-stationarity arises when user preferences change over time
  • Idea: use a decayed average to correct for bias in traditional IPS
  • The exponential decay IPS estimator closely follows the actual performance of the recommender policy on LastFM
  • The standard IPS estimator fails to approximate the policy’s actual performance

[Plot: reward over time – the true reward against the VαIPS, VIPS, and adaptive VαIPS estimates.]
  • R. Jagerman et al. When people change their mind. In WSDM 2019, to appear.
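The decayed-average idea can be sketched in a few lines: inside an inverse propensity scoring (IPS) estimate, weight recent interactions close to 1 and decay older ones geometrically, so the estimate tracks drifting preferences. This is an illustrative sketch of the general idea, not the paper’s implementation; all names are ours.

```python
def decayed_ips(rewards, target_probs, logging_probs, alpha=0.999):
    """Exponentially decayed IPS estimate of a target policy's reward.

    rewards[t]: observed reward at interaction t (oldest first).
    target_probs[t] / logging_probs[t]: probability that the target /
    logging policy takes the logged action at t.
    alpha: per-step decay; alpha=1.0 recovers plain (normalized) IPS.
    """
    num = 0.0
    den = 0.0
    n = len(rewards)
    for t, (r, p_target, p_log) in enumerate(zip(rewards, target_probs, logging_probs)):
        w = alpha ** (n - 1 - t)           # newest interaction gets weight 1
        num += w * (p_target / p_log) * r  # importance-weighted reward
        den += w
    return num / den if den > 0 else 0.0
```

With `alpha=1.0` this reduces to the standard estimator; with `alpha < 1.0` the estimate is dominated by recent behavior, which is the point under non-stationarity.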
SLIDE 25
Safe online learning to re-rank via implicit click feedback

[Plot: regret vs. step n under the position-based click model (PBM), log-log axes.]

  • Safely learn to re-rank in an online setting
  • Learn user preferences not from scratch, but by combining the strengths of the online and offline settings
  • Start with an initial ranked list (possibly learned offline) and improve it online by gradually swapping high-ranked less attractive items for low-ranked more attractive ones

  • C. Li et al. Safe online learning to re-rank via implicit click feedback. Under review.
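The gradual-swapping idea can be sketched as a single conservative pass over the baseline ranking: swap an adjacent pair only when the lower-ranked item is estimated to be strictly more attractive, and otherwise keep the trusted baseline order. A toy illustration assuming per-item attraction estimates are already available (in practice they would be estimated online from clicks under a click model); names are ours.

```python
def safe_rerank(ranking, attraction):
    """One conservative improvement pass over a baseline ranking.

    ranking: list of item ids, best first (the trusted starting list).
    attraction: dict mapping item id -> estimated attraction probability.
    Only adjacent swaps that the estimates clearly support are applied.
    """
    ranking = list(ranking)  # never mutate the caller's baseline
    for i in range(len(ranking) - 1):
        if attraction[ranking[i + 1]] > attraction[ranking[i]]:
            # lower-ranked item looks more attractive: promote it one slot
            ranking[i], ranking[i + 1] = ranking[i + 1], ranking[i]
    return ranking
```

Repeating such passes as estimates sharpen gradually improves the list while never straying far from the baseline, which is the safety property the slide describes.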
SLIDE 26

Deep learning with logged bandit feedback

  • Play it safe by obtaining a lot more training data
  • Train deep networks from data collected using a running system – orders of magnitude more data
  • How: a counterfactual risk minimization approach using an equivariant empirical risk estimator with variance regularization
  • The resulting objective can be decomposed in a way that allows stochastic gradient descent training

[Plot: test error rate vs. number of bandit-feedback examples – Bandit-ResNet approaches a full-information ResNet trained with cross-entropy.]
  • T. Joachims et al. Deep learning with logged bandit feedback. In ICLR 2018.
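The equivariant estimator mentioned above is, at its core, a self-normalized IPS estimate: dividing by the summed importance weights makes the estimate invariant to adding a constant to all losses, which counters propensity overfitting. A minimal sketch of that estimator (the paper then reformulates it via a Lagrangian so plain SGD applies; that step is omitted here):

```python
def snips_risk(losses, target_probs, logging_probs):
    """Self-normalized IPS estimate of a target policy's risk.

    losses[t]: observed loss for the logged action at interaction t.
    target_probs[t] / logging_probs[t]: action probabilities under the
    target / logging policy. Normalizing by the total importance weight
    makes the estimate equivariant: shifting all losses by a constant c
    shifts the estimate by exactly c.
    """
    weights = [pt / pl for pt, pl in zip(target_probs, logging_probs)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```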
SLIDE 27

Dialogue generation: From imitation learning to inverse reinforcement learning

  • Making sure system responses are informative and engaging
  • An adversarial dialogue generation model that provides a more accurate and precise reward signal for generator training
  • Improves the stability of adversarial training by employing causal entropy regularization
  • Z. Li et al. Dialogue generation: From imitation learning to inverse reinforcement learning. In AAAI 2019, to appear.
SLIDE 28

Differentiable unbiased online learning to rank

  • Dueling Bandit Gradient Descent – the first online learning to rank method

[Diagram: the online evaluation loop – rankings A and B are interleaved, the displayed results are shown to the user, and the observed interactions drive learning.]

  • Learns slowly; hits a ceiling; fails to optimize neural models
  • PDGD – unbiased, differentiable, and able to optimize neural ranking models

[Plot: NDCG vs. impressions – PDGD (linear and neural) outperforms DBGD and MGD variants and approaches an offline-trained LambdaMART.]
  • H. Oosterhuis and M. de Rijke. Differentiable unbiased online learning to rank. In CIKM 2018.
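The differentiability claim can be made concrete with the pairwise term at the heart of a PDGD-style update: for a preference d_i > d_j inferred from clicks, take the gradient of the pairwise softmax P(d_i before d_j) with respect to the document scores. A simplified sketch (the full method additionally weights each pair with a debiasing factor computed from the ranking with d_i and d_j swapped, omitted here; names are ours):

```python
import math

def pdgd_pair_gradient(scores, i, j):
    """Gradient contribution for an inferred preference doc i > doc j.

    scores: current model scores for the displayed documents.
    Returns a per-document gradient that raises the preferred document's
    score and lowers the other's, scaled by how unsure the model is.
    """
    p_i = math.exp(scores[i]) / (math.exp(scores[i]) + math.exp(scores[j]))
    grad = [0.0] * len(scores)
    grad[i] = 1.0 - p_i     # push the preferred document's score up
    grad[j] = -(1.0 - p_i)  # push the less preferred document's score down
    return grad
```

Because this is an ordinary gradient of the scoring function, it applies unchanged to neural rankers, which is exactly where DBGD's model-perturbation approach breaks down.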
SLIDE 29

The plan for this morning

Background Safety Explicability Conclusion

SLIDE 30

Explicability

SLIDE 31

Explicability

Are we “the patient” or “the doctor”? Are we the subject or the object of the interventions?

  • How does it work? → Generate an explanation
  • How did we arrive at this decision? → Especially when things go wrong

SLIDE 32

Faithfully explaining rankings in a news recommender system

  • Explain this ranked list – what were the main features responsible for it?
  • Find the importance of ranking features by perturbing their values and measuring to what degree the ranking changes
  • Design and train a neural network that learns the explanations generated by this method and is efficient enough to run in a production environment
  • Explanations are faithful, real-time, and do not negatively impact engagement

  • M. ter Hoeve et al. Faithfully explaining rankings in a news recommender system. Under review.
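The perturbation step can be sketched as: permute one feature’s values across the documents, re-rank, and measure how much the ranking moves; features whose permutation barely changes the ranking are unimportant. This is an illustrative stand-in, not the production system; `rank_fn` and all names are ours.

```python
import random

def feature_importance(rank_fn, docs, feature, trials=50, seed=0):
    """Importance of one ranking feature via value permutation.

    rank_fn: maps a list of feature dicts to an ordering of doc indices.
    Returns the average fraction of documents whose position changes
    when `feature`'s values are shuffled across the documents.
    """
    rng = random.Random(seed)
    base = rank_fn(docs)
    changed = 0
    for _ in range(trials):
        shuffled = [dict(d) for d in docs]          # copy, keep originals intact
        values = [d[feature] for d in shuffled]
        rng.shuffle(values)                         # permute this feature only
        for d, v in zip(shuffled, values):
            d[feature] = v
        perturbed = rank_fn(shuffled)
        changed += sum(a != b for a, b in zip(base, perturbed))
    return changed / (trials * len(docs))
```

The slide’s second step – training a network to imitate these scores so no perturbation loop runs at serving time – is what makes the approach fast enough for production.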
SLIDE 33

Weakly-supervised contextualization of knowledge graph facts

Generate training data automatically using distant supervision

  • Explain your outcome – not necessarily how you got to it
  • Better understand facts returned from a knowledge graph by offering additional contextual facts
  • First generate a set of candidate facts in the neighborhood of a given fact, then rank the candidates using supervised learning to rank
  • Combine features learned from data with a set of hand-crafted features

  • N. Voskarides et al. Weakly-supervised contextualization of knowledge graph facts. In SIGIR 2018.
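The two-step pipeline can be sketched as: gather candidate facts from the neighborhood of the query fact, then rank them with a scoring function standing in for the supervised learning-to-rank model. `graph` (entity → facts) and `score` are hypothetical stand-ins, not the paper’s data structures.

```python
def contextualize_fact(graph, fact, score, k=3):
    """Return the top-k contextual facts for a (subject, relation, object) fact.

    graph: dict mapping an entity to the facts it participates in.
    score: relevance scorer for a candidate fact (in the paper, a
    learning-to-rank model over learned plus hand-crafted features).
    """
    subj, _rel, obj = fact
    # Step 1: candidates are facts touching either entity of the query fact
    candidates = {f for e in (subj, obj) for f in graph.get(e, ()) if f != fact}
    # Step 2: rank candidates by relevance and keep the top k
    return sorted(candidates, key=score, reverse=True)[:k]
```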
SLIDE 34

Improving outfit recommendation with co-supervision of fashion generation

  • Explain your outcome – what were you thinking?
  • Fashion recommendation requires visual understanding and visual matching
  • A neural co-supervision learning framework
  • Incorporate supervision from a generation loss to better encode aesthetic information
  • Introduce a novel layer-to-layer matching mechanism to fuse aesthetic information more effectively

  • Y. Lin et al. Improving outfit recommendation with co-supervision of fashion generation. Under review.
SLIDE 35

Finding influential training samples for gradient boosted decision trees

  • Explain your errors – which training instances are responsible for them?
  • The influence functions framework finds the training points exerting the largest positive or negative influence on the model: how would the loss on xtest change if xtrain were upweighted or downweighted?
  • Can be solved for parametric and non-parametric models (GBDT ensembles)

  • B. Sharchilev et al. Finding influential training samples for gradient boosted decision trees. In ICML 2018.
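The quantity being approximated can be illustrated with brute-force leave-one-out retraining: how much does the test loss change when each training point is removed? The influence-function machinery in the paper approximates this efficiently for GBDT ensembles instead of retraining; the toy sketch below, with our own names, just makes the target quantity concrete.

```python
def influence_loo(train, test_point, fit, loss):
    """Leave-one-out influence of each training point on a test loss.

    fit: trains a model from a list of training points.
    loss: evaluates a model on `test_point`.
    influences[k] < 0 means removing point k lowers the test loss,
    i.e. that point was hurting the prediction on `test_point`.
    """
    base_loss = loss(fit(train), test_point)
    influences = []
    for k in range(len(train)):
        reduced = train[:k] + train[k + 1:]          # drop one training point
        influences.append(loss(fit(reduced), test_point) - base_loss)
    return influences
```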
SLIDE 36

Contrastive explanations for large errors in retail forecasting predictions

Picture this: there is interest in using more complex machine learning techniques for sales forecasting. It is difficult to convince analysts, as well as their superiors, to adopt these techniques, since the models are considered ‘black boxes’, even if they perform better than the models currently in use. How can we understand a complex model by generating contrastive explanations about large errors in its predictions?

  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.

SLIDE 37

Contrastive explanations for large errors in retail forecasting predictions

Focus on explaining large errors, because people tend to be more curious about unexpected outcomes than about ones that confirm their prior beliefs. However, when users are confronted with errors in algorithmic predictions, they are less likely to use the model:

  • Seeing an algorithm make mistakes significantly decreases confidence in the model, and users are more likely to choose a human forecaster instead, even after seeing the algorithm outperform the human
  • Prediction mistakes have a significant impact on users’ perception of the model
  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.

SLIDE 38

Contrastive explanations for large errors in retail forecasting predictions

Method

  • Generate contrastive explanations by identifying unusual properties of a particular observation
  • Assume that large errors occur due to unusual feature values in the test set that were not common in the training set
  • Given that an observation results in a large error, generate a set of bounds for each feature that would result in a prediction with a reasonable error as opposed to a large one
  • Also include the trend as part of the explanation, to help users understand the relationship between feature and target

  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.
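The bounds-generation step can be sketched with a simple Monte Carlo loop: resample one feature of the large-error observation and report the range of values for which the prediction error would have been reasonable. Function names, parameters, and the uniform sampling scheme are illustrative, not the authors’ implementation.

```python
import random

def feature_bounds(predict, x, y_true, feature, ok_error,
                   samples=200, scale=3.0, seed=0):
    """Contrastive bounds for one feature of a large-error observation.

    Samples alternative values around x[feature] and returns (lo, hi):
    the observed range of values for which |prediction - y_true| would
    have been at most ok_error, or None if no sampled value qualifies.
    """
    rng = random.Random(seed)
    ok_values = []
    for _ in range(samples):
        candidate = dict(x)                               # perturb a copy
        candidate[feature] = x[feature] + rng.uniform(-scale, scale)
        if abs(predict(candidate) - y_true) <= ok_error:  # reasonable error?
            ok_values.append(candidate[feature])
    return (min(ok_values), max(ok_values)) if ok_values else None
```

An explanation then reads: “the error would have been reasonable had this feature been between lo and hi”, which is the contrastive form the slides describe.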
SLIDE 39

Contrastive explanations for large errors in retail forecasting predictions

  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.

SLIDE 40

Contrastive explanations for large errors in retail forecasting predictions

  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.

SLIDE 41

Contrastive explanations for large errors in retail forecasting predictions

Evaluation of explanations is a big challenge

  • So far, explanations about errors are unable to resolve the conflict of users wanting to see explanations about unusual findings, yet being unsettled by seeing an algorithm make mistakes

  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.

SLIDE 42

Contrastive explanations for large errors in retail forecasting predictions

Evaluation of explanations is a big challenge

  • So far, explanations about errors are unable to resolve the conflict of users wanting to see explanations about unusual findings, yet being unsettled by seeing an algorithm make mistakes
  • Yet another conflict: users can support a model’s deployment without necessarily trusting it
  • Evaluating human trust in and understanding of black-box models based on explanations is not straightforward, especially when explanations focus on a model’s errors

  • A. Lucic et al. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review.

SLIDE 43

The plan for this morning

Background Safety Explicability Conclusion

SLIDE 44

Conclusion

SLIDE 45

What have we done?

IR systems as interactive systems
Systems that sense, plan, and act
Interactive systems are not alone – safety, explicability, . . .

  • Don’t hurt anyone
  • Be transparent
SLIDE 46

What should we do next?

  • Systems that are able to understand intent and track changes in intent
  • Create better and more realistic online experimentation environments
  • Mixed-initiative systems
  • Hybrid teams that solve problems together
  • Good simulation methodology
  • The human side of mixed teams
  • A massive update to our information retrieval teaching materials

SLIDE 47

References i

  • R. Jagerman, I. Markov, and M. de Rijke. Safe exploration for optimizing contextual bandits. Under review, 2019a.
  • R. Jagerman, I. Markov, and M. de Rijke. When people change their mind: Off-policy evaluation in non-stationary recommendation environments. In WSDM 2019: 12th International Conference on Web Search and Data Mining. ACM, February 2019b.
  • T. Joachims, A. Swaminathan, and M. de Rijke. Deep learning with logged bandit feedback. In ICLR 2018, April 2018.
  • C. Li, B. Kveton, T. Lattimore, I. Markov, M. de Rijke, C. Szepesvári, and M. Zoghi. Safe online learning to re-rank via implicit click feedback. Under review, 2019a.

SLIDE 48

References ii

  • Z. Li, J. Kiseleva, and M. de Rijke. Dialogue generation: From imitation learning to inverse reinforcement learning. In AAAI 2019: 33rd AAAI Conference on Artificial Intelligence. AAAI, January 2019b.
  • Y. Lin, P. Ren, Z. Chen, Z. Ren, J. Ma, and M. de Rijke. Explainable fashion recommendation with joint outfit matching and comment generation. IEEE Transactions on Knowledge and Data Engineering, to appear.
  • A. Lucic, H. Haned, and M. de Rijke. Contrastive explanations for large errors in retail forecasting predictions through Monte Carlo simulations. Under review, 2019.
  • I. Markov and M. de Rijke. What should we teach in information retrieval? SIGIR Forum, 52(2):19–39, December 2018.

SLIDE 49

References iii

  • H. Oosterhuis and M. de Rijke. Differentiable unbiased online learning to rank. In CIKM 2018: International Conference on Information and Knowledge Management, pages 1293–1302. ACM, October 2018.
  • B. Sharchilev, Y. Ustinovsky, P. Serdyukov, and M. de Rijke. Finding influential training samples for gradient boosted decision trees. In ICML 2018: International Conference on Machine Learning, pages 4584–4592, July 2018.
  • M. ter Hoeve, A. Schuth, D. Odijk, and M. de Rijke. Faithfully explaining rankings in a news recommender system. Under review, 2019.
  • N. Voskarides, E. Meij, R. Reinanda, A. Khaitan, M. Osborne, G. Stefanoni, K. Prabhanjan, and M. de Rijke. Weakly-supervised contextualization of knowledge graph facts. In SIGIR 2018: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 765–774. ACM, July 2018.

SLIDE 50

Acknowledgments

All content represents the opinion of the author(s), which is not necessarily shared or endorsed by their employers and/or sponsors.