Bo Peng • @bo_p
Iterative design for data science projects
for QCon San Francisco • Nov 7, 2016
http://heritagehealthprize.com
Goal: Create an algorithm that predicts how many days a patient will spend in a hospital in the next year.
case study: heritage health prize
[chart: score vs. time (in months); labels: constant value, all zeros, goal]
What can we learn from this?
Contests are fun. Solving business problems can rarely be reduced to minimizing a model’s RMSE.
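To make the metric concrete, here is a minimal sketch of RMSE for the two baselines labelled on the chart ("all zeros" and "constant value"). The patient values are made up for illustration, and using the mean as the constant is an assumption; the slides do not say which constant was used.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical days-in-hospital values for eight patients (not contest data).
days_in_hospital = [0, 0, 0, 1, 0, 3, 0, 12]

# Baseline 1: predict zero days for everyone.
all_zeros = [0] * len(days_in_hospital)

# Baseline 2: predict one constant for everyone (assumed here: the mean).
mean_days = sum(days_in_hospital) / len(days_in_hospital)
constant_value = [mean_days] * len(days_in_hospital)

rmse_zeros = rmse(all_zeros, days_in_hospital)
rmse_constant = rmse(constant_value, days_in_hospital)

print(f"all zeros:      {rmse_zeros:.3f}")
print(f"constant value: {rmse_constant:.3f}")
```

Both baselines take a few lines to build, which is why they make useful reference points before investing in anything more complex.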
How simple models + design go a long way
data-driven e-discovery
daegis
about patent / not about patent
turn over to plaintiff / don’t turn over to plaintiff
adverse inference / give away trade secrets
create a “document map”
algorithm design, patents, marketing, finances, fantasy football, lunch, coffee
review away shades of grey
reduce reviews by 90-99%
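The slides do not say how the document map was built. As a rough sketch of the simple-model idea only, the example below groups documents by word overlap (bag-of-words cosine similarity with a greedy threshold). The documents, the threshold, and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.3):
    """Greedy grouping: put each document in the first group whose seed
    document is similar enough, otherwise start a new group."""
    vectors = [vectorize(d) for d in docs]
    groups = []  # each group is a list of document indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical documents, loosely echoing the topics on the slide.
docs = [
    "patent claim for ranking algorithm design",
    "fantasy football draft picks this week",
    "algorithm design notes for patent filing",
    "fantasy football trade offer",
    "lunch order for friday",
]

groups = group_documents(docs)
for group in groups:
    print([docs[i] for i in group])
```

With groups like these, a reviewer could accept or dismiss a whole cluster (say, fantasy football) at once and spend attention on the ambiguous ones.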
simple, powerful interfaces relay analytics better.
iterative problem solving
generate ideas → build prototype → evaluate (rapid iterations)
plan, build, test, and iterate as quickly as possible
Procter & Gamble
data-driven expertise exploration
procter & gamble
High level goals:
let’s compare countries.
+ 1
10 5 5 20 8 25 2 5 12 3 30 10 1 20 25 50
Iterative design for data science projects
Bo Peng • @bo_p for QCon San Francisco • Thanks!