Iterative design for data science projects • Bo Peng • @bo_p for QCon



SLIDE 1

Bo Peng • @bo_p

Iterative design for data science projects

for QCon San Francisco • Nov 7, 2016

SLIDE 2

case study: heritage health prize

approach

Goal: Create an algorithm that predicts how many days a patient will spend in a hospital in the next year.

http://heritagehealthprize.com

SLIDE 3

2 years • 1,363 teams • 25,316 entries

SLIDE 4

[Chart: score vs. time (in months), with reference levels labelled “constant value”, “all zeros”, and “goal”]
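The chart measures progress against two trivial reference points, "constant value" and "all zeros". Here is a minimal sketch of how such baselines are scored. It assumes plain RMSE, the metric the slides name, and does not claim to reproduce the competition's official scoring formula. The list of days-in-hospital values is hypothetical.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not actuals or len(predictions) != len(actuals):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared / len(actuals))


def baseline_scores(actuals):
    """Score the two trivial baselines named on the chart."""
    n = len(actuals)
    mean = sum(actuals) / n
    return {
        "all zeros": rmse([0.0] * n, actuals),        # predict 0 days for everyone
        "constant value": rmse([mean] * n, actuals),  # predict the mean for everyone
    }


# Hypothetical days-in-hospital values, for illustration only.
days = [0, 0, 0, 1, 2, 0, 5, 0]
for name, score in baseline_scores(days).items():
    print(f"{name}: {score:.3f}")
```

Under squared error, the mean is the best any single constant prediction can do. The "constant value" baseline therefore never scores worse than "all zeros", and a model only becomes interesting once it beats both.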

SLIDE 5

SLIDE 6

SLIDE 7

SLIDE 8

SLIDE 9

SLIDE 10

What can we learn from this? Solving business problems can rarely be reduced to minimizing a model’s RMSE.

SLIDE 11

Contests are fun. Solving business problems can rarely be reduced to minimizing a model’s RMSE.

SLIDE 12

SLIDE 13

agenda

  • A common approach to data science
  • The design approach:
      • a simple model goes a long way (eDiscovery)
      • finding & recommending experts within P&G

SLIDE 14

How simple models + design go a long way

Data-driven e-discovery for Daegis

SLIDE 15

data-driven e-discovery

daegis

SLIDE 16

about patent • not about patent

SLIDE 17

about patent • not about patent

turn over to plaintiff • don’t turn over to plaintiff

adverse inference

SLIDE 18

about patent • not about patent

turn over to plaintiff • don’t turn over to plaintiff

adverse inference • give away trade secrets
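The transcript keeps only the labels. They read like a 2×2 grid of true relevance against the production decision, with a consequence attached to each kind of mistake. That reading is an interpretation; the slides do not state it. Under it, the hypothetical sketch below shows why a single error rate hides what matters: the two mistakes have very different consequences.

```python
from collections import Counter

# Hypothetical encoding of the slide labels as a 2x2 grid of
# (is the document about the patent?, is it turned over to the plaintiff?).
# Only the two error cells carry a consequence on the slides.
CONSEQUENCES = {
    ("about patent", "don't turn over"): "adverse inference",
    ("not about patent", "turn over"): "give away trade secrets",
}


def review_outcome(about_patent: bool, turn_over: bool) -> str:
    """Name the consequence of one production decision for one document."""
    truth = "about patent" if about_patent else "not about patent"
    decision = "turn over" if turn_over else "don't turn over"
    # "correct decision" is a hypothetical label for the two diagonal cells.
    return CONSEQUENCES.get((truth, decision), "correct decision")


def tally_outcomes(decisions):
    """Count consequences over (about_patent, turn_over) pairs."""
    return Counter(review_outcome(about, turn) for about, turn in decisions)


print(tally_outcomes([(True, True), (True, False), (False, False), (False, True)]))
```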

SLIDE 19

SLIDE 20

turn over to plaintiff • don’t turn over to plaintiff

SLIDE 21

SLIDE 22

create a “document map”

algorithm design • patents • marketing • finances • fantasy football • lunch • coffee

SLIDE 23

fantasy football • algorithm design • patents • lunch • marketing • finances • coffee

review away • shades of grey

reduce reviews by 90-99%
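The slides do not say how the document map was built. Here is a minimal sketch that assumes a fixed, hypothetical keyword list per topic; a real system would more likely learn topics from the corpus. It shows the core idea of dismissing whole irrelevant regions of the map in one step, so reviewers only read what is left.

```python
from collections import defaultdict

# Hypothetical topic keywords; a real document map would likely learn
# topics from the corpus (e.g. by clustering) instead of using a fixed list.
TOPIC_KEYWORDS = {
    "patents": {"patent", "claim", "infringement"},
    "algorithm design": {"algorithm", "complexity", "heuristic"},
    "fantasy football": {"quarterback", "draft", "league"},
    "lunch": {"lunch", "sandwich", "cafeteria"},
}


def assign_topic(text):
    """Return the topic whose keywords overlap the document most, or 'unassigned'."""
    words = set(text.lower().split())
    best_topic, best_overlap = "unassigned", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    return best_topic


def build_document_map(docs):
    """Group document ids by their assigned topic."""
    doc_map = defaultdict(list)
    for doc_id, text in docs.items():
        doc_map[assign_topic(text)].append(doc_id)
    return dict(doc_map)


def documents_left_to_review(doc_map, dismissed_topics):
    """Ids still needing human review after whole topics are dismissed at once."""
    return sorted(
        doc_id
        for topic, ids in doc_map.items()
        if topic not in dismissed_topics
        for doc_id in ids
    )


# Hypothetical documents, for illustration only.
docs = {
    "d1": "patent claim draft for the new widget",
    "d2": "who wants lunch at the cafeteria",
    "d3": "my league draft picks this week quarterback",
    "d4": "heuristic algorithm notes",
}
doc_map = build_document_map(docs)
print(doc_map)
print(documents_left_to_review(doc_map, {"lunch", "fantasy football"}))
```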

SLIDE 24

SLIDE 25

care about design.

simple, powerful interfaces relay analytics better.

SLIDE 26

SLIDE 27

iterative problem solving

generate ideas → build prototype → evaluate, in rapid iterations

plan, build, test, and iterate as quickly as possible

SLIDE 28

Procter & Gamble

Data-driven expertise exploration

SLIDE 29

data-driven expertise exploration

procter & gamble

SLIDE 30

SLIDE 31

High-level goals:

  • reveal areas of expertise
  • evaluate connectivity among experts
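
The two goals map onto two small computations. The first groups people by the topics they write about; the second finds which experts are connected to one another. Here is a minimal sketch using hypothetical (author, topic) records and collaboration pairs. The slides do not describe the actual data or method used at P&G.

```python
from collections import deque


def experts_by_topic(records):
    """Map each topic to the set of authors who wrote about it."""
    topics = {}
    for author, topic in records:
        topics.setdefault(topic, set()).add(author)
    return topics


def build_graph(people, collaborations):
    """Undirected collaboration graph; people with no links stay as isolated nodes."""
    graph = {person: set() for person in people}
    for a, b in collaborations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_groups(graph):
    """Breadth-first search for groups of experts reachable from one another."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            person = queue.popleft()
            group.append(person)
            for neighbor in graph[person]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups


# Hypothetical records, for illustration only.
records = [("alice", "detergents"), ("bob", "detergents"),
           ("carol", "packaging"), ("dan", "packaging"), ("erin", "fragrance")]
collaborations = [("alice", "bob"), ("carol", "dan")]

people = {author for author, _ in records}
print(experts_by_topic(records))
print(connected_groups(build_graph(people, collaborations)))
```

Disconnected groups, like the isolated expert in this toy data, are the kind of gap a connectivity view is meant to surface.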

SLIDE 32

SLIDE 33

SLIDE 34

Lorem Ipsum: a narrative about blankets.
Author: Charlie Brown
Date: 31 Jan 2012

Lorem Ipsum is a dummy text used when typesetting or marking up documents. It has a long history starting from the 1500s and is still used in the digital millennium for typesetting electronic documents, page designs, etc. In itself, the original text of Lorem Ipsum might have been taken from an ancient Latin book that was written about 50 BC. Nevertheless, Lorem Ipsum’s words have been changed so they don’t read as a proper text. Naturally, page designs that are made for text documents must contain some text rather than placeholder dots or something else. However, should they contain proper English words and sentences, almost every reader will deliberately try to interpret them eventually, missing the design itself. However, a placeholder text must have a natural distribution of letters and punctuation, or otherwise the markup will look strange and unnatural. That’s what Lorem Ipsum helps to achieve. I would like to thank Peppermint Patty for her support on studying Lorem Ipsum as well as the infinite wisdom of Linus van Pelt and his willingness to use his blanket in my experiments.

SLIDE 35

SLIDE 36

vs.

SLIDE 37

vs.

SLIDE 38

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

let’s compare countries.

SLIDE 45

SLIDE 46

+ 1

SLIDE 47

SLIDE 48

10 5 5 20 8 25 2 5 12 3 30 10 1 20 25 50

SLIDE 49

SLIDE 50

SLIDE 51
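The transcript preserves these 16 numbers but not what they measure or how each slide drew them. As a hedged illustration of the closing point that design influences data science, the sketch below renders the same values as a sorted text bar chart. In that form, the largest and smallest values are much easier to spot than in the raw row. The values are treated as unlabeled, and this rendering is an assumption, not a reconstruction of the slides.

```python
# The 16 values exactly as they appear on the slides; their meaning is not recoverable.
VALUES = [10, 5, 5, 20, 8, 25, 2, 5, 12, 3, 30, 10, 1, 20, 25, 50]


def text_bars(values, width=25):
    """Render values as text bars scaled to the largest value, sorted descending."""
    peak = max(values)
    lines = []
    for value in sorted(values, reverse=True):
        length = max(1, round(value / peak * width))  # keep every bar visible
        lines.append(f"{value:>3} {'#' * length}")
    return lines


print(" ".join(str(v) for v in VALUES))  # the raw row, as on the slides
print()
print("\n".join(text_bars(VALUES)))
```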

SLIDE 52

SLIDE 53

design influences data science.

SLIDE 54

care about design.

SLIDE 55

Iterative design for data science projects

Bo Peng • @bo_p for QCon San Francisco • Thanks!