Decision-making Bias in Instance Matching Model Selection
Mayank Kejriwal, Daniel P. Miranker
Acknowledgements: US National Science Foundation, Microsoft Research
Instance Matching
A 50+ year old Artificial Intelligence problem: when do two records refer to the same underlying entity?
“Record linkage: making maximum use of the discriminating power of identifying information.” Newcombe and Kennedy (1962)
Numerous surveys by Winkler (2006), Rahm et al. (2010), etc.
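A minimal sketch of the pairwise view of this problem: score a pair of records with a similarity measure and decide whether they refer to the same entity. The token-level Jaccard measure and the product records below are illustrative assumptions, not taken from the slides.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two record strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Two hypothetical product records that may describe the same entity:
score = jaccard("Canon EOS 5D camera", "Canon EOS 5D Mark II")
# ta = {canon, eos, 5d, camera}, tb = {canon, eos, 5d, mark, ii}
# -> 3 shared tokens out of 6 distinct -> 0.5
```

In practice such scores become features for a trained classifier rather than being thresholded directly.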
Classifier example: feedforward multilayer perceptron (MLP)
“Machine Learning: an artificial intelligence approach.” Michalski, Carbonell and Mitchell (2013)
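A minimal forward pass for such an MLP, in pure Python: one hidden layer of sigmoid nodes feeding a single sigmoid output that can be read as a match probability. The weights and the two similarity features are made up for illustration, not learned.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def mlp_forward(x, hidden_weights, output_weights):
    # One hidden layer: each node takes a weighted sum of the inputs,
    # passed through a sigmoid activation.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in hidden_weights]
    # Single sigmoid output node -> probability that the record pair matches.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical similarity features for one record pair:
features = [0.9, 0.1]
p_match = mlp_forward(features,
                      hidden_weights=[[2.0, -1.0], [-1.5, 0.5]],
                      output_weights=[1.2, -0.7])
```

Training would adjust the weights by backpropagation; the slides' concern is everything *around* that step (splits, metrics, search strategy).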
[Diagram: labeled data split into training and validation sets; classifier hyperparameters (layers, nodes, learning rate...)]
What percentage of labeled data should I use for training, and what percentage for validation?
The most common approach in the literature is a ten-fold split (and, less often, a two-fold split).
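One way to read these splits is as shuffled holdouts that reserve 1/k of the labeled data for validation. A sketch under that assumption (`holdout_split` is an illustrative helper, not from the paper, and a single holdout rather than full cross-validation):

```python
import random

def holdout_split(labeled_pairs, num_folds, seed=42):
    # A k-fold style holdout reserves 1/k of the labeled data for validation.
    rng = random.Random(seed)
    data = list(labeled_pairs)
    rng.shuffle(data)
    cut = len(data) // num_folds
    return data[cut:], data[:cut]  # (training, validation)

pairs = list(range(100))                   # stand-in for labeled record pairs
train10, val10 = holdout_split(pairs, 10)  # ten-fold: 90/10 split
train2, val2 = holdout_split(pairs, 2)     # two-fold: 50/50 split
```

The choice of `num_folds` is exactly the kind of decision the slides argue can be biased by other decisions.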
What if I care more about one performance metric (say recall, versus precision) within reasonable constraints?
What if I have sampled and labeled a lot of data (say 90% of the estimated ground-truth)?
Should answers to these questions (and others) bias my decision?
“Semi-supervised instance matching using boosted classifiers.” Kejriwal and Miranker (2015)
Results for the Amazon-GoogleProducts benchmark, using MLP:

Ten-fold split (labeled data as percentage of ground-truth):
  10%: Precision 54.13%, Recall 25.77%
  50%: Precision 61.51%, Recall 28.77%
  90%: Precision 73.27%, Recall 27.69%

Two-fold split:
  10%: Precision 45.47%, Recall 35.64%
  50%: Precision 55.50%, Recall 34.92%
  90%: Precision 66.67%, Recall 36.92%

Consistent results across two other benchmarks, and several experimental controls...
What if I care more about recall than precision? I should choose a two-fold split (unlike what the literature would suggest).
What if I have sampled and labeled a lot of data (say 90% of the estimated ground-truth)? An irrelevant concern, once the metric is specified.
Takeaway: Some model selection decisions can bias other model selection decisions, not always in an obvious way
Cognitive psychology has shown (empirically) that human beings are neither logical nor rational:
Wason Selection Task
Prospect Theory (awarded the 2002 Nobel Prize for Economics)
“Reasoning about a rule.” Wason (1968)
“The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task.” Cosmides (1989)
“Prospect theory: an analysis of decision under risk.” Kahneman and Tversky (1979)
Visualizing decision-making biases by capturing influences between decisions.
[Diagram: nodes for Labeling budget, Computational resources, Training/Validation split, Performance Metric]
The structure is a bipartite graph: one partition holds decisions, the other holds nodes of influence.
[Diagram: bipartite graph over Labeling budget, Computational resources, Training/Validation split, Performance Metric, partitioned into "Decision" and "Node of influence"]
“Bipartite graphs and their applications.” Asratian et al. (1998)
The interpretation of the nodes and edges is abstract (we don’t impose strict requirements).
The art in model selection: are there edges we should consider removing/adding?
In the paper, we form at least four hypotheses that directly translate to recommendations.
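A sketch of how such a bipartite influence graph could be encoded. The partitioning and the edge set below are illustrative assumptions for this example, not the paper's exact graph:

```python
# One partition holds model-selection decisions, the other holds nodes of
# influence; an edge means the influence should bias that decision.
decisions = {"training/validation split", "hyperparameter search strategy"}
influences = {"performance metric", "labeling budget", "computational resources"}

edges = {
    ("performance metric", "training/validation split"),
    ("labeling budget", "training/validation split"),
    ("computational resources", "hyperparameter search strategy"),
}

def is_bipartite(edges, left, right):
    # Every edge must join a node of influence to a decision.
    return all(u in left and v in right for u, v in edges)

# The "art" in model selection then amounts to deciding which edges
# to add to, or remove from, this graph.
valid = is_bipartite(edges, influences, decisions)
```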
Collected over 25 GB of data on the Microsoft Azure ML platform.
Used three publicly available benchmarks.
Validation is usually much faster than training, especially for expressive classifiers.
Run-time reductions of almost 70%, with proportionally less loss in effectiveness.
Recommendation: consider favoring more validation over training if speed is an important concern.
Grid search is no more effective than random search for default hyperparameter values.
Mean difference less than 0.99%, and not statistically significant.
Recommendation: favor random search in your hyperparameter optimization, as it is much faster (over 90% run-time decrease).
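A sketch of the two search strategies over a hypothetical MLP hyperparameter space (the parameter names and values are made up). Random search fixes its trial budget up front, which is where the run-time saving comes from:

```python
import itertools
import random

# Hypothetical hyperparameter space for an MLP:
space = {"hidden_nodes": [8, 16, 32, 64], "learning_rate": [0.001, 0.01, 0.1]}

def grid_search(space):
    # Exhaustively enumerate every combination: here 4 * 3 = 12 configurations.
    keys = sorted(space)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(space[k] for k in keys))]

def random_search(space, n_trials, seed=0):
    # Sample each hyperparameter independently, n_trials times.
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]

full_grid = grid_search(space)     # 12 trials
sampled = random_search(space, 3)  # far fewer trials for a comparable result
```

Each configuration would then be trained and scored on the validation set; only the enumeration strategy differs.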
Hard problems (e.g. instance matching) require an ingenious combination of heuristics, biases and models.
Understanding decision-making biases can help us do better model selection.
It can also help to identify experimental confounds!
There are many proposals to visualize decision-making, but not decision-making bias.
We proposed a bipartite graph as a good candidate.
The visualization is not just a pedantic exercise: about 25 GB of data shows that it can also be useful.
Many future directions!
kejriwalresearch.azurewebsites.net
https://sites.google.com/a/utexas.edu/mayank-kejriwal/projects/semantics-and-model-selection