How Elasticsearch powers the Guardian’s newsroom



SLIDE 1

How Elasticsearch powers the Guardian’s newsroom

phil wills ■ @philwills

senior software architect guardian news and media

shay banon ■ @kimchy

creator, co-founder and cto elasticsearch

SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5

“created in 1936 ... to secure the financial and editorial independence of the Guardian in perpetuity”

SLIDE 6
SLIDE 7
our in-house real-time traffic tool
SLIDE 8
SLIDE 9
SLIDE 10
SLIDE 11

desktop workstation

production apaches

something htmly

?

SLIDE 12

ssh $SERVER "nice tail -f /apache2/logs/guardian-access_log"

SLIDE 13

desktop workstation

2 x production apaches publisher ssh “tail” zeromq

x

SEO dashboard

SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17

desktop workstation

x

SLIDE 18

Javascript in browser

SNS SQS

hidden pixel

Dashboard Tracker

SLIDE 19
SLIDE 20

Elasticsearch

“you know, for search”

SLIDE 21
SLIDE 22

Javascript in browser

Tracker SNS SQS

image pixel

SQS Dashboard Serf elasticsearch Dashboard

SLIDE 23
SLIDE 24

https://github.com/guardian/status-app

6 * c3.4xlarge

in an autoscaling group (with manual scaling) instance store (SSD)

SLIDE 25
SLIDE 26
SLIDE 27

{
  "dt": "2014-06-13T20:01:48.026Z",
  "url": "http://www.theguardian.com/football/2014/jun/13/spain-v-holland-world-cup-2014-live-report",
  "queryString": "",
  "host": "www.theguardian.com",
  "path": "/football/2014/jun/13/spain-v-holland-world-cup-2014-live-report",
  "section": "football",
  "platform": "r2",
  "userAgent": {
    "type": "Browser",
    "family": "Safari 5.1.9",
    "os": "OS X 10.6.8",
    "device": "Personal computer"
  },
  "documentReferrer": "http://www.theguardian.com/football",
  "browser": {
    "id": "gA6RUFLhWNQvWdt0rW4r78Fg",
    "isNew": false
  },
  "referringHost": "theguardian.com",
  "referringPath": "/football",
  "isContent": true,
  "contentPublicationDate": "2014-03-03",
  "countryCode": "US",
  "countryName": "United States",
  "location": {
    "lonlat": [-73.4409, 41.2094]
  }
}

⇠filter ⇠filter ⇠count per minute
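
The annotations above suggest two operations on each event: filter on some fields, then count per minute using `dt`. A minimal in-memory sketch of that idea follows. The function name `count_per_minute` is hypothetical, and which fields the slide's arrows point at is not recoverable from the text, so the filter fields here are passed in by the caller.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(events, **filters):
    """Count events per minute, keeping only events whose top-level
    fields equal every keyword filter (hypothetical helper)."""
    counts = Counter()
    for event in events:
        if all(event.get(name) == value for name, value in filters.items()):
            # "dt" is ISO-8601 with milliseconds and a trailing Z, as on the slide.
            ts = datetime.strptime(event["dt"], "%Y-%m-%dT%H:%M:%S.%fZ")
            counts[ts.strftime("%Y-%m-%dT%H:%M")] += 1
    return dict(counts)


sample_events = [
    {"dt": "2014-06-13T20:01:48.026Z", "section": "football"},
    {"dt": "2014-06-13T20:01:59.000Z", "section": "football"},
    {"dt": "2014-06-13T20:02:01.500Z", "section": "football"},
    {"dt": "2014-06-13T20:01:10.000Z", "section": "politics"},
]
football_counts = count_per_minute(sample_events, section="football")
```

This only illustrates the shape of the computation; the queries on the following slides ask Elasticsearch to do the same filtering and bucketing server-side.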

SLIDE 28

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": {
          "path": "/football/2014/jun/13/spain-v-holland-world-cup-2014-live-report"
        }
      }
    }
  },
  …

SLIDE 29

  …
  "facets": {
    "Reddit": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": { "term": { "referringHost": "reddit.com" } }
    },
    "Facebook": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": { "term": { "referringHost": "facebook.com" } }
    },
    "Google": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": {
        "or": {
          "filters": [
            { "prefix": { "referringHost": "www.google." } },
            { "prefix": { "referringHost": "news.google." } }
          ]
        }
      }
    }
  }
}
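
The two slides above form one request: a filtered query on `path`, plus one per-minute `date_histogram` facet per referrer. As a rough sketch, the same body could be assembled from small helpers. All function names below are hypothetical; the field names and JSON shape are copied from the slides, which use the facets API of the Elasticsearch releases of that time.

```python
import json


def minute_histogram(facet_filter):
    """One facet: a per-minute histogram on "dt", restricted by facet_filter."""
    return {
        "date_histogram": {"field": "dt", "interval": "1m"},
        "facet_filter": facet_filter,
    }


def host_term(host):
    """Match an exact referring host."""
    return {"term": {"referringHost": host}}


def host_prefixes(prefixes):
    """Match any referring host that starts with one of the prefixes."""
    return {"or": {"filters": [{"prefix": {"referringHost": p}} for p in prefixes]}}


def referrer_query(path):
    """Build the request body shown on the slides for a single page path."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"path": path}},
            }
        },
        "facets": {
            "Reddit": minute_histogram(host_term("reddit.com")),
            "Facebook": minute_histogram(host_term("facebook.com")),
            "Google": minute_histogram(host_prefixes(["www.google.", "news.google."])),
        },
    }


body = referrer_query(
    "/football/2014/jun/13/spain-v-holland-world-cup-2014-live-report"
)
body_json = json.dumps(body, indent=2)
```

Building the body from helpers keeps each referrer on one line, so adding another referrer does not mean copying the histogram definition again.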

SLIDE 30
SLIDE 31
SLIDE 32

"aggregations": {
  "dns": {
    "date_histogram": { "field": "dt", "interval": "1m" },
    "aggregations": {
      "dns": {
        "percentiles": {
          "field": "dns",
          "percents": [ 50.0 ],
          "estimator": "tdigest",
          "compression": 10.0
        }
      }
    }
  }
}
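
The aggregation above asks for the 50th percentile (the median) of the `dns` field inside each one-minute bucket. The t-digest estimator returns an approximation; the sketch below computes exact medians over a small in-memory list purely to illustrate what is being requested. The function name `median_per_minute` and the sample values are assumptions, and the slides do not state the unit of `dns`.

```python
from collections import defaultdict
from statistics import median


def median_per_minute(events, field):
    """Group events by the minute of "dt" and return the exact median of
    `field` in each bucket (hypothetical helper, not the t-digest estimate)."""
    buckets = defaultdict(list)
    for event in events:
        if field in event:
            # The first 16 characters of the ISO timestamp are "YYYY-MM-DDTHH:MM".
            buckets[event["dt"][:16]].append(event[field])
    return {minute: median(values) for minute, values in sorted(buckets.items())}


timing_events = [
    {"dt": "2014-06-13T20:01:01.000Z", "dns": 10},
    {"dt": "2014-06-13T20:01:30.000Z", "dns": 30},
    {"dt": "2014-06-13T20:01:59.000Z", "dns": 20},
    {"dt": "2014-06-13T20:02:05.000Z", "dns": 7},
    {"dt": "2014-06-13T20:02:06.000Z"},  # no timing recorded, skipped
]
dns_medians = median_per_minute(timing_events, "dns")
```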

SLIDE 33

/graph/breakdown?section=commentisfree

SLIDE 34

?section=commentisfree

ophan.StandardFilters
ophan.StandardFiltersToElasticsearch
org.elasticsearch.index.query.FilterBuilder

SLIDE 35

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": {
          "path": "/football/2014/jun/13/spain-v-holland-world-cup-2014-live-report"
        }
      }
    }
  },
  …

SLIDE 36

"filter": {
  "and": {
    "filters": [
      {
        "range": {
          "dt": {
            "from": "2014-03-03T00:00:00.000Z",
            "to": "2014-03-03T22:30:59.999Z",
            "include_lower": true,
            "include_upper": false
          }
        }
      },
      { "not": { "filter": { "term": { "countryCode": "GNM" } } } },
      { "not": { "filter": { "term": { "userAgent.type": "Robot" } } } },
      { "filter": { "terms": { "section": [ "commentisfree" ] } } }
    ]
  }
}
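
Slides 33 to 36 trace a request such as `/graph/breakdown?section=commentisfree` through a standard set of filters into the Elasticsearch filter above. The deck does not show that translation code, so the following is only a sketch of the idea. The function `standard_filter` and the `ALLOWED_FIELDS` whitelist are hypothetical, and treating the time range and the two exclusions as always-on defaults is an assumption based on the slide's output.

```python
from urllib.parse import parse_qs

# Hypothetical whitelist of query-string parameters that map to event fields.
ALLOWED_FIELDS = ("section",)


def standard_filter(query_string, start, end):
    """Translate a URL query string into the "and" filter shown on the slide."""
    params = parse_qs(query_string)  # e.g. {"section": ["commentisfree"]}
    filters = [
        {"range": {"dt": {"from": start, "to": end,
                          "include_lower": True, "include_upper": False}}},
        # Exclusions copied from the slide; assumed to apply to every request.
        {"not": {"filter": {"term": {"countryCode": "GNM"}}}},
        {"not": {"filter": {"term": {"userAgent.type": "Robot"}}}},
    ]
    for field in ALLOWED_FIELDS:
        if field in params:
            # Mirrors the wrapper shape printed on the slide.
            filters.append({"filter": {"terms": {field: params[field]}}})
    return {"and": {"filters": filters}}


section_filter = standard_filter(
    "section=commentisfree",
    "2014-03-03T00:00:00.000Z",
    "2014-03-03T22:30:59.999Z",
)
```

Keeping the translation in one place means every dashboard endpoint accepts the same query-string parameters and applies the same exclusions.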

SLIDE 37
SLIDE 38
SLIDE 39
SLIDE 40

thank you

phil wills ■ @philwills

senior software architect guardian news and media

shay banon ■ @kimchy

creator, co-founder and cto elasticsearch