SLIDE 1 data science @ The New York Times
and how a 164-year old content company became data-driven
chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins
references: bit.ly/icerm
SLIDE 8
“data science” jobs, jobs, jobs
SLIDE 11
data science: mindset & toolset drew conway, 2010
SLIDE 12
modern history: 2009
SLIDE 13
“data science” blogs, blogs, blogs
SLIDE 14 “data science” blogs, blogs, blogs
The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.
SLIDE 16
“data science” ancient history: 2001
SLIDE 18
data science context
SLIDE 19
home schooled
SLIDE 20
PhD in topology
SLIDE 21
“By the end of late 1945, I was a statistician rather than a topologist”
SLIDE 22
invented: “bit”
SLIDE 23
invented: “software”
SLIDE 24
invented: “FFT”
SLIDE 25
“the progenitor of data science.” - @mshron
SLIDE 26
“The Future of Data Analysis,” 1962 John W. Tukey
SLIDE 27
introduces: “Exploratory data analysis”
SLIDE 28
Tukey 1965, via John Chambers
SLIDE 29
TUKEY BEGAT S WHICH BEGAT R
SLIDE 30
Tukey 1972
SLIDE 31
? 1972
SLIDE 32
Jerome H. Friedman
SLIDE 33 Tukey 1975
In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.
SLIDE 34
TUKEY BEGAT VDQI
SLIDE 35
Tukey 1977
SLIDE 36
TUKEY BEGAT EDA
SLIDE 37
fast forward -> 2001
SLIDE 38
“The primary agents for change should be university departments themselves.”
SLIDE 39 data science @ The New York Times
and how a 164-year old content company became data-driven
histories
- 1. in academia -> Bell: as heretical statistics (see also Breiman)
- 2. in industry: as job description
historical rant: bit.ly/data-rant
SLIDE 40 data science @ The New York Times
and how a 164-year old content company became data-driven
chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins
SLIDE 41
biology: 1892 vs. 1995 biology changed for good.
SLIDE 42
genetics: 1837 vs. 2012 ML toolset; data science mindset
SLIDE 43
genetics: 1837 vs. 2012
SLIDE 44
genetics: 1837 vs. 2012 ML toolset; data science mindset arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
SLIDE 45
data science: mindset & toolset
SLIDE 46 1851
SLIDE 47
news: 20th century church state
SLIDE 48
church
SLIDE 51
news: 20th century church state
SLIDE 52
news: 21st century church state engineering
SLIDE 53 1851 1996
newspapering: 1851 vs. 1996
SLIDE 54
example: millions of views per hour
2015
SLIDE 55
SLIDE 56
data science: the web
SLIDE 57
data science: the web is your “online presence”
SLIDE 58
data science: the web is a microscope
SLIDE 59
data science: the web is an experimental tool
SLIDE 60
data science: the web is an optimization tool
SLIDE 61 1851 1996
newspapering: 1851 vs. 1996 vs. 2008
2008
SLIDE 62 “a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank
SLIDE 63
every publisher is now a startup
SLIDE 64
SLIDE 65
news: 21st century church state engineering
SLIDE 67
learnings
SLIDE 68 learnings
- supervised learning
- unsupervised learning
- reinforcement learning
SLIDE 69 learnings
- supervised learning
- unsupervised learning
- reinforcement learning
- cf. modelingsocialdata.org
SLIDE 70
stats.stackexchange.com
SLIDE 71 from “are you a bayesian or a frequentist” —michael jordan
$$ L = \sum_{i=1}^{N} \phi\big(y_i \, f(x_i; \beta)\big) + \lambda \lVert \beta \rVert $$
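A minimal sketch of the regularized loss on this slide, to make the pieces concrete. The slide does not fix the choice of ϕ, f, or the norm, so everything here is an assumption for illustration: labels y in {-1, +1}, a linear f(x; β), an L2 norm for ||β||, and three common margin losses (hinge, logistic, exponential) as candidate ϕ.

```python
import math

def linear(x, beta):
    """Assumed model: f(x; beta) = beta . x"""
    return sum(b * xj for b, xj in zip(beta, x))

# Candidate choices of phi, each a function of the margin m = y * f(x; beta).
def hinge(m):
    return max(0.0, 1.0 - m)          # SVM-style loss

def logistic(m):
    return math.log1p(math.exp(-m))   # logistic-regression-style loss

def exponential(m):
    return math.exp(-m)               # boosting-style loss

def regularized_loss(phi, f, beta, xs, ys, lam):
    """L = sum_i phi(y_i * f(x_i; beta)) + lam * ||beta||, with an L2 norm assumed."""
    data_term = sum(phi(y * f(x, beta)) for x, y in zip(xs, ys))
    penalty = lam * math.sqrt(sum(b * b for b in beta))
    return data_term + penalty

# Toy data (hypothetical): two points, labels in {-1, +1}.
xs = [(1.0, 0.0), (0.0, 1.0)]
ys = [1, -1]
beta = [3.0, 4.0]
print(regularized_loss(hinge, linear, beta, xs, ys, lam=0.1))
```

Swapping `hinge` for `logistic` or `exponential` changes only ϕ; the sum-plus-penalty structure stays the same.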
SLIDE 72 supervised learning, e.g.,
- cf. modelingsocialdata.org
SLIDE 73 supervised learning, e.g., “the funnel”
- cf. modelingsocialdata.org
SLIDE 74 interpretable supervised learning super cool stuff
- cf. modelingsocialdata.org
SLIDE 75 interpretable supervised learning super cool stuff
- cf. modelingsocialdata.org
arxiv.org/abs/q-bio/0701021
SLIDE 76
optimization & learning, e.g.,
“How The New York Times Works,” Popular Mechanics, 2015
SLIDE 77
recommendation as supervised learning
SLIDE 78 unsupervised learning, e.g.,
- cf. daeilkim.com ; import bnpy
SLIDE 79
modeling your audience bit.ly/Hughes-Kim-Sudderth-AISTATS15
SLIDE 80
modeling your audience (optimization, ultimately)
SLIDE 81
modeling your audience
also allows recommendation as inference
SLIDE 82 Reporting Learning Test
aka “A/B testing”; business as usual (esp. supervised)
Some of the most recognizable personalization in our service is the collection of “genre” rows. …Members connect with these rows so well that we measure an increase in member retention by placing the most tailored rows higher on the page instead of lower.
- cf. modelingsocialdata.org
reinforcement learning: from A/B to….
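A minimal sketch of how a basic A/B test might be read out, assuming a two-proportion z-test on conversion counts. The function name, the counts, and the choice of test are illustrative assumptions, not the method used by any team mentioned in the slides.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B; return (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 100/1000 conversions for A, 150/1000 for B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

This is the fixed-horizon, "business as usual" readout: traffic is split evenly until the test ends, regardless of how the variants are performing.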
SLIDE 83 real-time A/B -> “bandits” GOOG blog:
- cf. modelingsocialdata.org
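A minimal sketch of the step from A/B testing to bandits, assuming Thompson sampling on a Bernoulli (click / no click) bandit. The algorithm choice, the function name, and the simulated click rates are assumptions for illustration; the slide does not say which bandit method is meant.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate a Bernoulli bandit; return the number of pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw a plausible click rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then show the arm with the highest draw.
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                 for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: draws[a])
        pulls[arm] += 1
        # Simulated reader response; a live system would observe a real click.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

# Hypothetical variants with 5% and 50% click rates.
print(thompson_sampling([0.05, 0.5], n_rounds=2000))
```

Unlike a fixed A/B split, traffic shifts toward the better-performing arm as evidence accumulates, which is the "real-time" part of the slide.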
SLIDE 84
Reporting Learning Test Optimizing Explore unsupervised: supervised: reinforcement:
SLIDE 86
common requirements in data science:
SLIDE 87 common requirements in data science:
- 1. people
- 2. ideas
- 3. things
- cf. USAF
SLIDE 88
things: what does DS team deliver?
SLIDE 89 things: what does DS team deliver?
- build data prototypes
- build APIs
- impact roadmaps
SLIDE 91
- build data prototypes
- cf. daeilkim.com
SLIDE 93
- in puppet, w/python2.7
- collaboration w/pers. team
- build APIs
SLIDE 95
data science: ideas
SLIDE 96 data skills
- data engineering
- data science
- data visualization
- data product
- data multiliteracies
- data embeds
- cf. “data scientists at work”, ch 1
SLIDE 98 data science: people
- new mindset > new toolset
SLIDE 99 summary: pay attention to:
- 1. people
- 2. ideas
- 3. things
- cf. USAF
SLIDE 100
thanks to the data science team!
SLIDE 101 data science @ The New York Times
and how a 164-year old content company became data-driven
chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins