Retrieval as Interaction
African Summer School on Machine Learning for Data Mining and Search
Maarten de Rijke January 14, 2019
University of Amsterdam derijke@uva.nl
Based on joint work with Abhinav Khaitan, Ana Lucic, Anne Schuth, Boris Sharchilev, Branislav Kveton, Chang Li, Csaba Szepesvári, Daan Odijk, Edgar Meij, Giorgio Stefanoni, Harrie Oosterhuis, Hinda Haned, Ilya Markov, Julia Kiseleva, Jun Ma, Kambadur Prabhanjan, Maartje ter Hoeve, Masrour Zoghi, Miles Osborne, Nikos Voskarides, Pavel Serdyukov, Pengjie Ren, Ridho Reinanda, Rolf Jagerman, Tor Lattimore, Yujie Lin, Yury Ustinovskiy, Zhaochun Ren, Ziming Li, and Zhumin Chen
Background

We need information to make decisions . . .
. . . to identify or structure a problem or opportunity
. . . to put the problem or opportunity in context
. . . to generate alternative solutions
. . . to choose the best alternative
Information retrieval
Getting the right information
Information retrieval – Two phases
Online development
Offline development

Information retrieval – The online phase
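The online phase can be sketched, very schematically, as an agent–environment loop: the system acts by presenting a document list, the user examines it and generates implicit feedback, and an evaluation measure turns that feedback into a reward. A minimal toy sketch (the simulated user, the scoring model, and all names here are mine, not from the talk):

```python
import random

def simulated_user(ranked_list, preferred_doc):
    """Toy user: clicks the preferred document if it appears in the top 3."""
    for rank, doc in enumerate(ranked_list[:3]):
        if doc == preferred_doc:
            return [rank]  # implicit feedback: the clicked position
    return []

def evaluation_measure(clicks):
    """Reward: reciprocal rank of the first click, 0 if there is no click."""
    return 1.0 / (clicks[0] + 1) if clicks else 0.0

def interaction_loop(docs, preferred_doc, n_rounds=100, seed=42):
    rng = random.Random(seed)
    scores = {d: 0.0 for d in docs}  # the system's (trivial) model
    total_reward = 0.0
    for _ in range(n_rounds):
        # action: present a document list ranked by current scores
        ranked = sorted(docs, key=lambda d: (scores[d], rng.random()), reverse=True)
        clicks = simulated_user(ranked, preferred_doc)  # user examines the list
        total_reward += evaluation_measure(clicks)      # reward
        for rank in clicks:                             # update the model
            scores[ranked[rank]] += 1.0
    return scores, total_reward

scores, reward = interaction_loop(["d1", "d2", "d3", "d4"], preferred_doc="d3")
```

The point of the sketch is only the shape of the loop: state (query and feedback so far), action (a ranked list), implicit feedback, reward.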
[Figure: the retrieval system as an agent interacting with a user environment – the agent observes a state (query, implicit feedback), takes an action (a document list), the user examines the list and generates implicit feedback, and an evaluation measure turns that feedback into a reward.]

How does it all fit together?
A “spaghetti” picture for search
[Figure: architecture sketch – a front door and UX layer over online top-k retrievers, vertical rankers, and a blender; an offline side with crawling/ingestion, extraction, aggregation, enriching, and indexing over multiple sources and indexes; logs feeding an evaluation framework with interleaving.]

How does the AFIRM program fit?
What does the offline phase mean?
A learning process
What does this mean for machines?

Sense – Plan – Act
Understand and track intent
Update models and space of possible actions (answer, ranked list, SERP, . . . )
Select the best action and sense its effect
Life is easier for systems than in an offline trained query-response paradigm
Life is harder for systems than in an offline trained query-response paradigm
Unpacking
The plan for this morning
Background · Safety · Explicability · Conclusion
Safety
Don’t perform worse than a reasonable baseline, e.g., the production system people are used to
Don’t take too long to learn to improve
Don’t leave anyone behind & give everyone a fair deal
Don’t fall into sinkholes – be diverse
. . .
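One way to read the first desideratum in code: only replace the production ranker when a confidence lower bound on the candidate's estimated reward clears the baseline. A minimal sketch using a Hoeffding bound, assuming rewards in [0, 1] (the function and its names are my illustration, not a method from the talk):

```python
import math

def safe_to_deploy(candidate_rewards, baseline_mean, delta=0.05):
    """Deploy the candidate only if a Hoeffding lower confidence bound on
    its mean reward (rewards assumed in [0, 1]) reaches the baseline mean,
    with failure probability at most delta."""
    n = len(candidate_rewards)
    mean = sum(candidate_rewards) / n
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * n))  # Hoeffding radius
    return mean - radius >= baseline_mean
```

With few observations the radius is wide and the system conservatively stays with the baseline; only sustained evidence of improvement triggers a switch.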
When people change their mind
Use logged interactions to estimate the performance of a recommender
User preferences change over time, which introduces bias in traditional IPS estimates
Adaptive estimators closely follow the actual performance of a recommender on LastFM data and approximate actual performance well
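The tension above can be illustrated with a toy estimator: vanilla inverse propensity scoring (IPS) averages over the whole log and lags behind when preferences drift, while discounting old interactions lets the estimate track the present. This is only an illustrative discounting sketch with names of my choosing, not the exact estimators from the work behind this slide:

```python
def ips_estimate(logs, target_probs):
    """Vanilla IPS over logged (action, propensity, reward) triples:
    unbiased when the user's preferences are stationary."""
    return sum(r * target_probs[a] / p for (a, p, r) in logs) / len(logs)

def discounted_ips_estimate(logs, target_probs, alpha=0.95):
    """Exponentially discount old interactions so the estimate tracks
    preference drift (at the price of some bias and variance)."""
    num = den = 0.0
    n = len(logs)
    for t, (a, p, r) in enumerate(logs):
        w = alpha ** (n - 1 - t)  # recent interactions weigh more
        num += w * r * target_probs[a] / p
        den += w
    return num / den

# Toy drift: action 0 is rewarding in the first half of the log, not after.
logs = [(0, 0.5, 1.0), (1, 0.5, 0.0)] * 50 + [(0, 0.5, 0.0), (1, 0.5, 1.0)] * 50
target = {0: 1.0, 1: 0.0}  # target policy always plays action 0
```

On this log the vanilla estimate averages the two regimes (0.5), while the discounted estimate is close to the target policy's current value of 0.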
[Figure: reward over time t – the true reward against the VIPS, VαIPS, and adaptive VαIPS estimates.]

Safe online learning to re-rank via implicit click feedback
[Figure: cumulative regret vs. step n on log-log axes, in the position-based model (PBM) setting.]
Don’t learn to rank from scratch, but by combining strengths: start from a baseline ranked list (not learned online) and improve it online by gradually swapping high-ranked less attractive items for low-ranked more attractive ones
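The swapping idea can be sketched as a toy simulation: start from a baseline ranking and promote a lower-ranked item only after accumulated click evidence shows it is more attractive than its neighbor. This is a simplified illustration under an artificial click model, not the actual algorithm from the slide:

```python
import random

def bubble_rerank(ranking, click_prob, n_rounds=5000, seed=0):
    """Toy safe re-ranking: repeatedly examine an adjacent pair and swap it
    only once click evidence shows the lower-ranked item is more attractive."""
    rng = random.Random(seed)
    ranking = list(ranking)
    evidence = {}  # (lower_item, upper_item) -> net click difference
    for _ in range(n_rounds):
        i = rng.randrange(len(ranking) - 1)   # pick an adjacent pair
        upper, lower = ranking[i], ranking[i + 1]
        # toy user: independent clicks on the two items
        c_upper = rng.random() < click_prob[upper]
        c_lower = rng.random() < click_prob[lower]
        key = (lower, upper)
        evidence[key] = evidence.get(key, 0) \
            + (1 if c_lower and not c_upper else 0) \
            - (1 if c_upper and not c_lower else 0)
        if evidence[key] > 10:  # enough evidence: promote the lower item
            ranking[i], ranking[i + 1] = lower, upper
            evidence[key] = 0
    return ranking
```

Because only well-evidenced adjacent swaps are made, the displayed list never moves far from the trusted baseline while it improves.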
Deep learning with logged bandit feedback
Labeled training data is scarce; logged bandit feedback from a running system offers orders of magnitude more data
Counterfactual learning approach using an equivariant empirical risk estimator with variance regularization, reformulated in a way that allows stochastic gradient descent training
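The "equivariant estimator, reformulated for SGD" idea can be illustrated with loss translation: the self-normalized IPS estimate is equivariant to shifting all losses, and subtracting a well-chosen baseline from the losses turns it into a plain average that mini-batch SGD can minimize. A toy sketch of these two estimators (my naming, not the authors' exact formulation):

```python
def snips_estimate(losses, propensities, target_probs):
    """Self-normalized IPS estimate of the target policy's risk."""
    weights = [t / p for t, p in zip(target_probs, propensities)]
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)

def translated_ips_loss(losses, propensities, target_probs, lam):
    """IPS risk with all losses translated by a baseline lam. Unlike the
    self-normalized ratio, this is a plain average over logged examples,
    so it decomposes over mini-batches for SGD."""
    weights = [t / p for t, p in zip(target_probs, propensities)]
    return sum((l - lam) * w for l, w in zip(losses, weights)) / len(losses)
```

Two properties make the connection: shifting every loss by a constant shifts the self-normalized estimate by exactly that constant (equivariance), and the translated objective evaluates to zero exactly when lam equals the self-normalized estimate.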
[Figure: test error rate vs. number of bandit-feedback examples – the bandit-trained ResNet approaches a fully supervised ResNet trained with cross-entropy.]

Dialogue generation: From imitation learning to inverse reinforcement learning
Goal: generate dialogue responses that are informative and engaging
Learn a reward model via inverse reinforcement learning that provides a more accurate and precise reward signal for generator training, combined with causal entropy regularization
Differentiable unbiased online learning to rank
Dueling Bandit Gradient Descent – the first online learning to rank method
Learns slowly; hits a ceiling; fails to effectively optimize non-linear rankers
PDGD – unbiased, differentiable, and able to optimize non-linear rankers such as neural networks
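For contrast, a single step of Dueling Bandit Gradient Descent (the baseline PDGD improves on) is easy to sketch: probe a random direction in weight space and move toward it only if the perturbed ranker wins an online comparison. A toy sketch with a synthetic "duel" standing in for an interleaving experiment (all names mine):

```python
import random

def dbgd_step(weights, duel, lr=0.1, delta=1.0, rng=random):
    """One Dueling Bandit Gradient Descent step: probe a random unit
    direction; if the perturbed ranker wins the duel (e.g., an interleaved
    comparison), take a small step toward it."""
    direction = [rng.gauss(0.0, 1.0) for _ in weights]
    norm = sum(d * d for d in direction) ** 0.5
    direction = [d / norm for d in direction]
    candidate = [w + delta * d for w, d in zip(weights, direction)]
    if duel(candidate, weights):  # the perturbed ranker won
        return [w + lr * d for w, d in zip(weights, direction)]
    return weights

def utility(w):
    """Hidden 'true' quality of a linear ranker with these weights."""
    return -((w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2)

def duel(candidate, current):
    """Stand-in for an online comparison between two rankers."""
    return utility(candidate) > utility(current)

rng = random.Random(1)
w = [0.0, 0.0]
for _ in range(500):
    w = dbgd_step(w, duel, rng=rng)
```

The sketch also shows DBGD's limitations: it explores one random direction at a time and extracts only a single bit per comparison, which is part of why it learns slowly; PDGD instead samples from a differentiable ranking distribution and estimates gradients directly from clicks.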
Explicability
Are we “the patient” or “the doctor”? Are we the subject or the object of the interventions?
→ Generate an explanation
→ Especially when things go wrong
Faithfully explaining rankings in a news recommender system
Identify the main features responsible for a ranked list by perturbing their values and measuring to what degree the ranking changes
A model learns the explanations generated by this method and is sufficiently efficient to run in a production environment
Explanations are faithful, real-time, and do not negatively impact engagement
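The perturb-and-measure idea can be sketched in a few lines: neutralize one feature across all documents, re-rank, and take the amount of ranking change as that feature's importance. This toy sketch uses a made-up two-feature news ranker and a simple position-disagreement measure, not the actual method or features from the slide:

```python
def rank_change_importance(score_fn, docs, feature, neutral_value=0.0):
    """Importance of one feature for a ranking: replace the feature with a
    neutral value for all documents and measure how much the ranking changes."""
    def ranking(ds):
        return sorted(range(len(ds)), key=lambda i: score_fn(ds[i]), reverse=True)
    base = ranking(docs)
    perturbed = ranking([dict(d, **{feature: neutral_value}) for d in docs])
    # fraction of rank positions whose document changed
    return sum(b != p for b, p in zip(base, perturbed)) / len(docs)

def score(d):
    """A toy news ranker over two hypothetical features."""
    return 2.0 * d["recency"] + 0.1 * d["length"]

docs = [{"recency": 1.0, "length": 0.0},
        {"recency": 0.0, "length": 1.0},
        {"recency": 0.5, "length": 0.5}]
```

Here perturbing "recency" reshuffles the list while perturbing "length" leaves it untouched, so recency is reported as the feature responsible for the ranking.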
Weakly-supervised contextualization of knowledge graph facts
Explain a knowledge graph fact by offering additional contextual facts from the knowledge graph, not necessarily how you got to it
Enumerate candidate facts in the neighborhood of a given fact and then rank candidates using supervised learning to rank with a set of hand-crafted features
Generate training data automatically using distant supervision
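The enumerate-then-rank pipeline can be sketched on a toy knowledge graph: collect facts that share an entity with the given fact, score each with a couple of hand-crafted features, and sort. The features, weights, and triples below are invented for illustration; the real system uses richer features with weights learned from distantly supervised data:

```python
def contextualize(kg, fact, weights):
    """Rank candidate context facts for `fact`: enumerate facts in its
    neighborhood (sharing an entity) and score them with two hand-crafted
    features combined by learning-to-rank style weights."""
    s, p, o = fact
    candidates = [f for f in kg
                  if f != fact and (s in (f[0], f[2]) or o in (f[0], f[2]))]
    def features(f):
        shares_subject = 1.0 if s in (f[0], f[2]) else 0.0
        same_predicate = 1.0 if f[1] == p else 0.0
        return [shares_subject, same_predicate]
    def score(f):
        return sum(w * x for w, x in zip(weights, features(f)))
    return sorted(candidates, key=score, reverse=True)

kg = [("Amsterdam", "capital_of", "Netherlands"),
      ("Amsterdam", "located_in", "Netherlands"),
      ("Netherlands", "part_of", "EU"),
      ("Paris", "capital_of", "France")]
fact = ("Amsterdam", "capital_of", "Netherlands")
```

Facts that share no entity with the input fact never become candidates, and among candidates the feature weights decide the presentation order.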
Improving outfit recommendation with co-supervision of fashion generation
Outfit recommendation requires both fashion understanding and visual matching
A co-supervision framework with fashion generation: a generation loss to better encode aesthetic information, and a matching mechanism to fuse aesthetic information more effectively
Finding influential training samples for gradient boosted decision trees
Given a model’s prediction, which training instances are responsible for it?
Concerned with finding training points exerting the largest positive or negative influence on the model: how would the loss on x_test change if x_train is upweighted/downweighted?
Extends influence analysis to non-parametric models (GBDT ensembles)
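The question being asked ("how would the test loss change if a training point is upweighted?") can be made concrete with a finite-difference toy, here for a weighted-mean predictor rather than a GBDT ensemble (the paper derives this analytically for trees; everything below is my simplification):

```python
def influence_on_test_loss(train_y, test_y, eps=1e-4):
    """Finite-difference influence: change in the squared test loss when
    each training point's weight is nudged up, for a weighted-mean model."""
    def prediction(weights):
        return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)
    base_w = [1.0] * len(train_y)
    base_loss = (prediction(base_w) - test_y) ** 2
    influences = []
    for i in range(len(train_y)):
        w = list(base_w)
        w[i] += eps  # upweight one training point
        loss = (prediction(w) - test_y) ** 2
        influences.append((loss - base_loss) / eps)
    return influences

# One training label far from the test label (harmful), one close (helpful).
influences = influence_on_test_loss(train_y=[0.0, 10.0], test_y=10.0)
```

A positive influence means upweighting that training point increases the test loss (a harmful point); a negative influence marks a helpful one.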
Contrastive explanations for large errors in retail forecasting predictions
Picture this: interest in using more complex machine learning techniques for sales forecasting
Difficult to convince analysts, as well as their superiors, to adopt these techniques since the models are considered to be ‘black boxes’, even if they perform better than the models currently in use
How can we understand a complex model by generating contrastive explanations about large errors in its predictions?
Focus on explaining large errors because people tend to be more curious about unexpected outcomes than about ones that confirm their prior beliefs
However, when users are confronted with errors in algorithmic predictions, they are less likely to use the model, and more likely to choose a human forecaster instead, even after seeing the algorithm outperform the human
Method
Identify feature values that were not common in the training set
Use Monte Carlo simulations to find, for each feature, the range of values that would result in a prediction with a reasonable error as opposed to a large one
Examine the relationship between feature and target
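The Monte Carlo step above can be sketched as follows: hold all other features fixed, sample values for one feature, and report the range of sampled values for which the prediction error would have been reasonable rather than large. The predictor, feature names, and thresholds below are invented for illustration, not the paper's setup:

```python
import random

def reasonable_range(predict, x, target, feature, bounds, threshold,
                     n_samples=2000, seed=0):
    """Monte Carlo search for a contrastive explanation: sample values for
    one feature (others fixed) and return the observed (min, max) of the
    values whose absolute prediction error stays below `threshold`."""
    rng = random.Random(seed)
    lo, hi = bounds
    acceptable = []
    for _ in range(n_samples):
        value = rng.uniform(lo, hi)
        x_mod = dict(x, **{feature: value})
        if abs(predict(x_mod) - target) < threshold:
            acceptable.append(value)
    return (min(acceptable), max(acceptable)) if acceptable else None

def predict(x):
    """Stand-in forecasting model (hypothetical)."""
    return 3.0 * x["price"]

good_range = reasonable_range(predict, {"price": 10.0}, target=15.0,
                              feature="price", bounds=(0.0, 10.0), threshold=3.0)
```

The returned interval supports a contrastive statement of the form "the error would have been reasonable if this feature had been between a and b, instead of its actual value".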
Evaluation of explanations is a big challenge
Conflict: users want to see explanations about unusual findings, yet are unsettled by seeing an algorithm make mistakes
Yet another conflict: users can support a model’s deployment without necessarily trusting it
Evaluating explanations is not straightforward, especially when explanations focus on a model’s errors.
Conclusion

What have we done?
IR systems as interactive systems
Systems that sense, plan, and act
Interactive systems are not alone – safety, explicability, . . .
What should we do next?
Systems that are able to understand intent and track changes in intent
Create better and more realistic online experimentation environments
Mixed-initiative systems
Hybrid teams that solve problems together
Good simulation methodology
The human side of mixed teams
A massive update to our information retrieval teaching materials
References i
evaluation in non-stationary recommendation environments. In WSDM 2019: 12th International Conference on Web Search and Data Mining. ACM, February 2019b.
ari, and M. Zoghi. Safe online learning to re-rank via implicit click feedback. In Under review, 2019a.
References ii
inverse reinforcement learning. In AAAI 2019: 33rd AAAI Conference on Artificial
recommendation with joint outfit matching and comment generation. IEEE Transactions on Knowledge and Data Engineering, To appear.
forecasting predictions through monte carlo simulations. In Under review, 2019.
Forum, 52(2):19–39, December 2018.
References iii
CIKM 2018: International Conference on Information and Knowledge Management, pages 1293–1302. ACM, October 2018.
training samples for gradient boosted decision trees. In ICML 2018: International Conference on Machine Learning, pages 4584–4592, July 2018.
news recommender system. In Under review, 2019.
graph facts. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 765–774. ACM, July 2018.
Acknowledgments
All content represents the opinion of the author(s), which is not necessarily shared or endorsed by their employers and/or sponsors.