NewsDiffs: Version Controlling the News


slide-1
SLIDE 1

NewsDiffs: Version Controlling the News

Eric Price Margaret Sullivan

MIT The New York Times

2013-03-11 http://newsdiffs.org/

Eric Price, Margaret Sullivan (MIT, NYT) NewsDiffs: Version Controlling the News 2013-03-11 1 / 30

slide-11
SLIDE 11

NewsDiffs

Online news is different from print.

◮ Print: hard to change, daily deadlines.
◮ Online: easy to change, deadline now.

Online news articles have a lifecycle:

◮ Reporter writes a rushed story.
◮ Editor makes a pass or two.
◮ (Another) reporter rewrites the story.
◮ Editor makes another pass or two.

Libraries archive the print version, not what people actually read.
NewsDiffs tracks stories as they evolve.

slide-14
SLIDE 14

Outline of Talk

1. Motivation and Creation
2. Case Studies
3. Future
4. Q & A

slide-16
SLIDE 16

Occupy Wall Street arrests

After allowing them onto the bridge, police cut off and arrested dozens of Occupy Wall Street demonstrators.

Lede rewritten to remove first bit.
Lucky someone must have kept the old tab open!
Reporter’s defense: body of article consistent.
Hard to judge without access to old version.

slide-20
SLIDE 20

N’kisi the telepathic parrot

Found via Language Log

N’kisi’s remarkable abilities, which are said to include telepathy, feature in the latest BBC Wildlife Magazine.

2004: BBC Science article appears
2006: “Telepathy” removed; no correction
2007 (May): Article completely replaced
2007 (August): “Correction” appears:

Note: This story about animal communication has replaced an earlier one on this page which contained factual inaccuracies we were unable to correct. As a result, the original story is no longer in our archive. It is still visible elsewhere, via [link to WayBack Machine].

slide-21
SLIDE 21

The public editor, a year before NewsDiffs

Right now, tracking changes is not a priority at The Times. As [the new executive editor Jill Abramson] told me, it’s unrealistic to preserve an “immutable, permanent record of everything we have done.”


slide-22
SLIDE 22

NewsDiffs team

Jennifer 8. Lee, Greg Price, Eric Price

slide-27
SLIDE 27

Knight-Mozilla Open News Hackathon

27 hours of furious coding

slide-29
SLIDE 29

A permanent record is feasible

Recall The Times’s statement:

[I]t’s unrealistic to preserve an “immutable, permanent record of everything we have done.”

Wikipedia does it.
Version control is a solved problem.
We did it in one* weekend, from the outside.

slide-30
SLIDE 30

Technical overview

Components (from the architecture diagram):

◮ www.nytimes.com
◮ Scraper: BeautifulSoup parser
◮ MySQL: database of article URLs (nytimes.com/2013/...ating.html, nytimes.com/2013/...-jail.html)
◮ Git repository of text of all articles
◮ Website: Django, Google diff-match-patch
◮ You
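The diff step in this overview can be sketched in a few lines. This is a minimal sketch and not the NewsDiffs code: it uses Python's standard-library `difflib` as a stand-in for Google diff-match-patch, and the function name `word_diff` is hypothetical. The two sentences are the pope example from the "That vs. Who" slide later in the talk.

```python
import difflib

def word_diff(old, new):
    """Return a list of (op, text) pairs describing a word-level diff.

    op is "equal", "delete", or "insert". This uses difflib, not
    diff-match-patch, so the output differs from what the real site shows.
    """
    a, b = old.split(), new.split()
    result = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", " ".join(a[i1:i2])))
        else:
            # A "replace" is reported as a deletion followed by an insertion.
            if i1 < i2:
                result.append(("delete", " ".join(a[i1:i2])))
            if j1 < j2:
                result.append(("insert", " ".join(b[j1:j2])))
    return result

old = "The council could choose a pope that all factions would recognize."
new = "The council could choose a pope whom all factions would recognize."
for op, text in word_diff(old, new):
    print(op, repr(text))
```

Diffing on words rather than characters keeps the output readable for prose; a character-level diff would report the same edit as several small fragments.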

slide-31
SLIDE 31

*Not quite one weekend

Another day of work after each of 3, 10, and 22 weeks.

Scaling issues

◮ Running on AFS, a networked file system.
◮ Moved version metadata from git to MySQL.
◮ Optimized queries to both backends.

UI improvements.

slide-33
SLIDE 33

Press

[A] more comprehensive archive that retains all significant versions of an article (and all corrections) would send readers a strong message that The Times is committed to full transparency and accountability. [...] As NewsDiffs demonstrates, if you don’t make yourself accountable nowadays, someone else will do it for you.


slide-37
SLIDE 37

Easy to extend

We’re tracking the New York Times, CNN, BBC, Politico.
To track another site, we need to write code to extract plain text from its webpages.
30-40 lines of code; takes maybe one hour.

◮ But resource constraints: running on free MIT servers out of my account.
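A per-site parser of the kind described here might look like the following. This is a hedged sketch: it uses the standard-library `html.parser` rather than BeautifulSoup (which the technical overview names) so that it runs without extra packages. The class name `ParagraphExtractor`, the helper `extract_text`, and the assumption that article text lives in `<p>` tags are all hypothetical; real sites need site-specific rules.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements (hypothetical parser, not NewsDiffs code)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._buffer = None   # pieces of the <p> being read, or None outside <p>

    def _flush(self):
        # Finish the current paragraph, normalizing runs of whitespace.
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a missing </p> before a new <p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

def extract_text(html):
    """Return the article's paragraphs as plain text, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.paragraphs)

# Made-up page used only to exercise the sketch.
sample = """
<html><body>
<h1>Headline</h1>
<p>First <b>paragraph</b> of the story.</p>
<script>var tracking = 1;</script>
<p>Second paragraph.</p>
</body></html>
"""
print(extract_text(sample))
```

Storing only the extracted plain text, rather than the raw HTML, means that changes to ads or page chrome do not show up as new versions of an article.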

slide-38
SLIDE 38

NewsDiffs is Free Software

Forks: http://redactado.com.ar/


slide-41
SLIDE 41

NewsDiffs is Free Software

Forks: http://newsdiffs.es/

Patches:

◮ Received (and merged) patch to parse tagesschau.de.

slide-42
SLIDE 42

Statistics

Tracking 28000 NYT articles (62000 over all sources).
44% of articles changed at least once.

◮ 20-30% in opinion, books, fashion sections
◮ 55-60% in sports, NY region, world sections

15% of articles changed at least three times.
9% have official corrections.
4% have byline changes.

◮ 11% in world section.
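Figures like these are simple counts over per-article version histories. A minimal sketch of the tally, assuming a hypothetical mapping from article URL to number of stored versions; the URLs and counts below are made up for illustration and are not NewsDiffs data, and `change_stats` is a hypothetical name.

```python
def change_stats(version_counts):
    """Return the share of articles changed at least once and at least three times.

    An article with n stored versions has changed n - 1 times.
    """
    total = len(version_counts)
    if total == 0:
        return {"changed_once": 0.0, "changed_three_times": 0.0}
    changes = [n - 1 for n in version_counts.values()]
    return {
        "changed_once": sum(c >= 1 for c in changes) / total,
        "changed_three_times": sum(c >= 3 for c in changes) / total,
    }

# Invented example data: URL -> number of stored versions.
example = {
    "example.com/a.html": 1,
    "example.com/b.html": 2,
    "example.com/c.html": 4,
    "example.com/d.html": 1,
}
stats = change_stats(example)
print(stats)
```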

slide-44
SLIDE 44

Examples: Nuclear Talks with Iran


slide-45
SLIDE 45

Examples: Sandy Hook Shooting


slide-46
SLIDE 46

Examples: Edward Koch Obituary


slide-48
SLIDE 48

Examples: Romney and Benghazi

In first version:

For a country looking to understand how Mr. Romney, a Republican candidate with no foreign policy experience, would respond to a major crisis, this was a first glimpse. And as an adviser to the campaign who worked in the George W. Bush administration said on Wednesday, Mr. Romney’s accusation [...] looked like “he had forgotten the first rule in a crisis: don’t start talking before you understand what’s happening.”

slide-50
SLIDE 50

Goals

Goals for NewsDiffs

1. Reference for known interesting changes.
2. Unearth interesting changes.
3. Study the process of journalistic editing.

Currently only satisfying (1) well. To satisfy the others, need

◮ Automated tools to sift through the changes for interesting ones.
◮ Someone to use our data for research.
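One crude way to sift changes automatically is to rank consecutive versions by how much of the text changed. The sketch below is an assumption about what such a tool could look like, not something NewsDiffs provides: `change_score` and `rank_changes` are hypothetical names, similarity comes from the standard-library `difflib`, and the sample sentences are invented.

```python
import difflib

def change_score(old, new):
    """Score a revision from 0.0 (identical) to 1.0 (no words in common)."""
    matcher = difflib.SequenceMatcher(a=old.split(), b=new.split(), autojunk=False)
    return 1.0 - matcher.ratio()

def rank_changes(revisions):
    """Sort (label, old_text, new_text) triples, largest change first."""
    scored = [(change_score(old, new), label) for label, old, new in revisions]
    return sorted(scored, reverse=True)

# Invented revisions used only to exercise the sketch.
revisions = [
    ("typo fix", "The mayor spoke on Tuesday.", "The mayor spoke on Wednesday."),
    ("rewrite", "The mayor spoke on Tuesday.", "Protesters gathered outside City Hall."),
    ("no change", "The mayor spoke on Tuesday.", "The mayor spoke on Tuesday."),
]
for score, label in rank_changes(revisions):
    print(f"{score:.2f}  {label}")
```

Size is only a proxy: a one-word edit can matter more than a full rewrite, so a score like this would at best narrow down what a person needs to read.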

slide-57
SLIDE 57

Simple example of study

That vs. Who

The council could choose a pope that all factions would recognize.
The council could choose a pope whom all factions would recognize.

Rule: “who” refers to people, “that” to non-people.

When do reporters make mistakes that editors catch?

Mitik, the baby walrus who was orphaned
The giant panda cub who died
The subatomic analogue of cats who are alive and dead at the same time

slide-58
SLIDE 58

Simple example of study

That vs. Who

The council could choose a pope that all factions would recognize.
The council could choose a pope whom all factions would recognize.

Rule: “who” refers to people, “that” to non-people.

When do reporters make mistakes that editors catch?

Mitik, the baby walrus that was orphaned
The giant panda cub that died
The subatomic analogue of cats that are alive and dead at the same time
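A study like this could start by scanning pairs of versions for one relative pronoun replacing another. The following is a minimal sketch under stated assumptions: `pronoun_swaps` is a hypothetical helper, the word matching uses the standard-library `difflib`, and it treats the "who" and "that" variants on these slides as the before and after versions of an edit.

```python
import difflib
import re

RELATIVE_PRONOUNS = {"that", "who", "whom"}

def pronoun_swaps(old, new):
    """Return (old_word, new_word) pairs where one relative pronoun replaced another."""
    a = re.findall(r"\w+", old.lower())
    b = re.findall(r"\w+", new.lower())
    swaps = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only consider one-word-for-one-word replacements.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            before, after = a[i1], b[j1]
            if before in RELATIVE_PRONOUNS and after in RELATIVE_PRONOUNS:
                swaps.append((before, after))
    return swaps

print(pronoun_swaps(
    "The council could choose a pope that all factions would recognize.",
    "The council could choose a pope whom all factions would recognize.",
))
print(pronoun_swaps(
    "Mitik, the baby walrus who was orphaned",
    "Mitik, the baby walrus that was orphaned",
))
```

Counting such swaps across the archive, grouped by the noun before the pronoun, would show where the rule is applied to animals and other borderline cases.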

slide-59
SLIDE 59

Conclusions

News websites should keep a public record of what they publish.
In the meantime, NewsDiffs fills the role.
We have lots of data, ready to be mined for useful information.
