data science @ The New York Times and how a 164-year old content company became data-driven

slide-1
SLIDE 1

data science @ The New York Times

and how a 164-year old content company became data-driven

chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins

references: bit.ly/icerm

slide-8
SLIDE 8

“data science” jobs, jobs, jobs

slide-11
SLIDE 11

data science: mindset & toolset (Drew Conway, 2010)

slide-12
SLIDE 12

modern history: 2009

slide-13
SLIDE 13

“data science” blogs, blogs, blogs

slide-14
SLIDE 14

“data science” blogs, blogs, blogs

The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.

slide-16
SLIDE 16

“data science” ancient history: 2001

slide-18
SLIDE 18

data science context

slide-19
SLIDE 19

home schooled

slide-20
SLIDE 20

PhD in topology

slide-21
SLIDE 21

“By the end of late 1945, I was a statistician rather than a topologist”

slide-22
SLIDE 22

invented: “bit”

slide-23
SLIDE 23

invented: “software”

slide-24
SLIDE 24

invented: “FFT”

slide-25
SLIDE 25

“the progenitor of data science.” - @mshron

slide-26
SLIDE 26

“The Future of Data Analysis,” 1962 John W. Tukey

slide-27
SLIDE 27

introduces: “Exploratory data analysis”

slide-28
SLIDE 28

Tukey 1965, via John Chambers

slide-29
SLIDE 29

TUKEY BEGAT S WHICH BEGAT R

slide-30
SLIDE 30

Tukey 1972

slide-31
SLIDE 31

? 1972

slide-32
SLIDE 32

Jerome H. Friedman

slide-33
SLIDE 33

Tukey 1975

In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.

slide-34
SLIDE 34

TUKEY BEGAT VDQI

slide-35
SLIDE 35

Tukey 1977

slide-36
SLIDE 36

TUKEY BEGAT EDA

slide-37
SLIDE 37

fast forward -> 2001

slide-38
SLIDE 38

“The primary agents for change should be university departments themselves.”

slide-39
SLIDE 39

data science @ The New York Times

and how a 164-year old content company became data-driven

histories

  • 1. in academia -> Bell: as heretical statistics (see also Breiman)

  • 2. in industry: as job description

historical rant: bit.ly/data-rant

slide-41
SLIDE 41

biology: 1892 vs. 1995

biology changed for good.

slide-42
SLIDE 42

genetics: 1837 vs. 2012

ML toolset; data science mindset

slide-44
SLIDE 44

genetics: 1837 vs. 2012

ML toolset; data science mindset

arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost

slide-45
SLIDE 45

data science: mindset & toolset

slide-46
SLIDE 46

1851

slide-47
SLIDE 47

news: 20th century church state

slide-48
SLIDE 48

church

slide-51
SLIDE 51

news: 20th century church state

slide-52
SLIDE 52

news: 21st century church state engineering

slide-53
SLIDE 53

newspapering: 1851 vs. 1996

slide-54
SLIDE 54

example: millions of views per hour

2015

slide-55
SLIDE 55

slide-56
SLIDE 56

data science: the web

slide-57
SLIDE 57

data science: the web is your “online presence”

slide-58
SLIDE 58

data science: the web is a microscope

slide-59
SLIDE 59

data science: the web is an experimental tool

slide-60
SLIDE 60

data science: the web is an optimization tool

slide-61
SLIDE 61

newspapering: 1851 vs. 1996 vs. 2008

slide-62
SLIDE 62

“a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank

slide-63
SLIDE 63

every publisher is now a startup

slide-64
SLIDE 64
slide-65
SLIDE 65

news: 21st century church state engineering

slide-67
SLIDE 67

learnings

slide-69
SLIDE 69

learnings

  • supervised learning
  • unsupervised learning
  • reinforcement learning
  • cf. modelingsocialdata.org

slide-70
SLIDE 70

stats.stackexchange.com

slide-71
SLIDE 71

from “are you a bayesian or a frequentist” —michael jordan

$$L = \sum_{i=1}^{N} \phi\big(y_i\, f(x_i; \beta)\big) + \lambda \lVert \beta \rVert$$
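
A minimal numerical sketch of the objective above. The slide leaves ϕ, f, and the norm unspecified, so everything concrete here is an assumption made for illustration: f is taken to be linear (a dot product), ϕ is either the logistic or the hinge loss on the margin, and the penalty uses the Euclidean norm. The function names are hypothetical.

```python
import math

def logistic(margin):
    # phi(m) = log(1 + exp(-m)), written to avoid overflow for large |m|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def hinge(margin):
    # phi(m) = max(0, 1 - m), the loss used by support vector machines
    return max(0.0, 1.0 - margin)

def regularized_loss(beta, xs, ys, lam, phi=logistic):
    # Assumption: f(x; beta) is linear, i.e. the dot product of x and beta.
    data_term = sum(
        phi(y * sum(b * xj for b, xj in zip(beta, x)))
        for x, y in zip(xs, ys)
    )
    # Assumption: ||beta|| is the Euclidean (L2) norm.
    penalty = lam * math.sqrt(sum(b * b for b in beta))
    return data_term + penalty

# Toy data: labels y are in {-1, +1}
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, -1.0, 1.0]

loss_logistic = regularized_loss([0.0, 0.0], xs, ys, lam=0.1)
loss_hinge = regularized_loss([1.0, -1.0], xs, ys, lam=0.1, phi=hinge)
```

Swapping `phi` changes which familiar method the same formula describes, while `lam` trades fit against the size of β.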

slide-73
SLIDE 73

supervised learning, e.g., “the funnel”

  • cf. modelingsocialdata.org
slide-75
SLIDE 75

interpretable supervised learning

super cool stuff

  • cf. modelingsocialdata.org

arxiv.org/abs/q-bio/0701021

slide-76
SLIDE 76
optimization & learning, e.g.,

“How The New York Times Works,” Popular Mechanics, 2015

slide-77
SLIDE 77

recommendation as supervised learning

slide-78
SLIDE 78

unsupervised learning, e.g.,

  • cf. daeilkim.com ; import bnpy
slide-79
SLIDE 79

modeling your audience

bit.ly/Hughes-Kim-Sudderth-AISTATS15

slide-80
SLIDE 80

modeling your audience (optimization, ultimately)

slide-81
SLIDE 81

modeling your audience

also allows recommendation as inference

slide-82
SLIDE 82

Reporting Learning Test

aka “A/B testing”; business as usual (esp. supervised)

Some of the most recognizable personalization in our service is the collection of “genre” rows. …Members connect with these rows so well that we measure an increase in member retention by placing the most tailored rows higher on the page instead of lower.

  • cf. modelingsocialdata.org

reinforcement learning: from A/B to…
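
For readers who have not run one, the A/B test mentioned on this slide can be sketched as a two-proportion z-test on conversion counts. This is one common way to analyze such a test, not necessarily the analysis used by the teams referenced in the talk. The function name and the counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A and variant B.

    Returns (z, p_value) for the two-sided test of equal rates.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 1000 visitors per arm, 100 vs. 150 conversions
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
z_same, p_same = two_proportion_z_test(100, 1000, 100, 1000)
```

A fixed-horizon test like this assigns traffic evenly for the whole experiment.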

slide-83
SLIDE 83

real-time A/B -> “bandits” GOOG blog:

  • cf. modelingsocialdata.org
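
To make the step from A/B testing to bandits concrete, here is a sketch of epsilon-greedy, one of the simplest bandit strategies. It is chosen for brevity and is not claimed to be the method described in the blog post the slide points to. The class name, the simulated click rates, and the parameter values are all hypothetical.

```python
import random

class EpsilonGreedyBandit:
    """Mostly serve the best-looking arm, but explore with probability epsilon."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            # Explore: pick an arm uniformly at random
            return self.rng.randrange(len(self.counts))
        # Exploit: pick the arm with the highest estimated reward
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the mean reward for this arm
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulate(true_rates, steps=5000, epsilon=0.1, seed=0):
    # Simulated environment: each arm pays 1 with its (made-up) true rate
    bandit = EpsilonGreedyBandit(len(true_rates), epsilon, seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit

bandit = simulate([0.05, 0.30])
```

Unlike a fixed A/B split, the allocation shifts toward the better arm while the experiment is still running, which is the sense in which the test becomes "real-time".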
slide-84
SLIDE 84

Reporting Learning Test Optimizing Explore unsupervised: supervised: reinforcement:

slide-87
SLIDE 87

common requirements in data science:

  • 1. people
  • 2. ideas
  • 3. things
  • cf. USAF
slide-89
SLIDE 89

things: what does DS team deliver?

  • build data prototypes
  • build APIs
  • impact roadmaps
slide-92
SLIDE 92
  • build data prototypes
  • cf. daeilkim.com
slide-93
SLIDE 93
  • build APIs
  • in puppet, w/python2.7
  • collaboration w/pers. team
slide-94
SLIDE 94
  • impact roadmaps

flickr/McJex

slide-95
SLIDE 95

data science: ideas

slide-96
SLIDE 96

data skills

  • data engineering
  • data science
  • data visualization
  • data product
  • data multiliteracies
  • data embeds
  • cf. “data scientists at work”, ch 1
slide-98
SLIDE 98

data science: people

  • new mindset > new toolset
slide-99
SLIDE 99

summary: pay attention to:

  • 1. people
  • 2. ideas
  • 3. things
  • cf. USAF
slide-100
SLIDE 100

thanks to the data science team!

slide-101
SLIDE 101

data science @ The New York Times

and how a 164-year old content company became data-driven

chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins