SLIDE 1

Wrangling Logs with Logstash and ElasticSearch

Nate Jones & David Castro, Media Temple

OSCON 2012

Thursday, July 19, 2012

SLIDE 2

Why are we here?

SLIDE 3

Size

  • Efficiency
  • Quantity

SLIDE 4

Access

  • Locality
  • Method
  • Filtering

SLIDE 5

Grokability

  • Noise
  • Structure
  • Metrics

SLIDE 6

Use Case: Mail Logs

SLIDE 7

Size

  • 30 mail servers
  • 2 GB of logs per day per server
  • 60 GB per day total
  • 1.8 TB per month
  • 21 TB per year
  • 1 billion log lines per week
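
The later figures follow from the first two. A quick check in Python, assuming decimal units (1 TB = 1,000 GB), 30-day months and 365-day years: on those assumptions the year comes to 21.9 TB (21.6 TB if you multiply the monthly figure by 12), so the slide's 21 TB is a round-down. The same arithmetic puts one billion lines a week at roughly 1,650 lines per second, sustained.

```python
# Back-of-the-envelope check of the mail-log volumes on the slide.
# Assumptions: decimal units (1 TB = 1000 GB), 30-day months, 365-day years.
SERVERS = 30
GB_PER_SERVER_PER_DAY = 2
LINES_PER_WEEK = 1_000_000_000

gb_per_day = SERVERS * GB_PER_SERVER_PER_DAY  # total across all servers
tb_per_month = gb_per_day * 30 / 1000
tb_per_year = gb_per_day * 365 / 1000
lines_per_second = LINES_PER_WEEK / (7 * 24 * 60 * 60)

print(f"{gb_per_day} GB/day, {tb_per_month:.1f} TB/month, {tb_per_year:.1f} TB/year")
# 60 GB/day, 1.8 TB/month, 21.9 TB/year
print(f"about {lines_per_second:,.0f} log lines per second")
# about 1,653 log lines per second
```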

SLIDE 8

Access

  • Front-line, easy access
  • No SSH
  • Shareable

SLIDE 9

Grokability

Operational

  • Did the email get delivered?
  • Why was the message marked as SPAM?
  • Are messages being rejected?

Metrics

  • What's the inbound/outbound message rate?
  • How often are we seeing particular errors?

SLIDE 10

The Solution

SLIDE 11

Overview

SLIDE 12

Overview

SLIDE 13

Logstash Overview

http://logsta.sh/

  • 1. Parse log line
  • 2. Transform/extract
  • 3. Structure and send JSON

SLIDE 14

Logstash Parsing

Log line input

2012-07-10T20:00:02.446220-04:00 mail01 spamd[2478]: spamd: clean message (-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes.

JSON output

{
  "@timestamp" : "2012-07-16T06:44:00.548000Z",
  "@tags" : [],
  "@fields" : {},
  "@source_path" : "/client/127.0.0.1:40010",
  "@source" : "tcp://0.0.0.0:6999/client/127.0.0.1:40010",
  "@source_host" : "0.0.0.0",
  "@message" : "2012-07-10T20:00:02.446220-04:00 mail01 spamd[2478]: spamd: clean message (-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes.",
  "@type" : "maillog"
}
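
Before any filter runs, the event is only an envelope around the raw line: the whole line sits in @message, @timestamp records when Logstash received it (July 16) rather than when the mail server logged it (July 10), and @fields is empty. A minimal Python sketch that rebuilds that envelope; `raw_event` is a hypothetical helper whose field names are copied from the output above, not Logstash's own code.

```python
import json
from datetime import datetime, timezone

def raw_event(line, event_type, source, source_host, source_path, received=None):
    """Wrap an unparsed log line in the event envelope shown in the JSON output.

    Hypothetical helper: field names are copied from the sample output.
    """
    if received is None:
        received = datetime.now(timezone.utc)
    return {
        # Time of receipt, formatted with a literal "Z": assumes `received` is UTC.
        "@timestamp": received.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "@tags": [],
        "@fields": {},            # empty until a filter such as grok fills it
        "@source_path": source_path,
        "@source": source,
        "@source_host": source_host,
        "@message": line,         # the complete, unparsed log line
        "@type": event_type,
    }

line = ("2012-07-10T20:00:02.446220-04:00 mail01 spamd[2478]: spamd: clean message "
        "(-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes.")
event = raw_event(line, "maillog",
                  source="tcp://0.0.0.0:6999/client/127.0.0.1:40010",
                  source_host="0.0.0.0",
                  source_path="/client/127.0.0.1:40010",
                  received=datetime(2012, 7, 16, 6, 44, 0, 548000, tzinfo=timezone.utc))
print(json.dumps(event, indent=2))
```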

SLIDE 15

Logstash Parsing

filter {
  grok {
    type => "maillog"
    # split the syslog prefix into timestamp, host and program[pid]; keep the rest as message
    pattern => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:host} %{SYSLOGPROG:service}: %{GREEDYDATA:message}"
  }
  mutate {
    type => "maillog"
    # use the timestamp from the log line instead of the time Logstash received it
    replace => ["@timestamp", "%{timestamp}"]
    # replace the message with the text left after the timestamp/host/service prefix
    replace => ["@message", "%{message}"]
  }
}
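
What the grok and mutate steps do can be approximated in Python. The regular expressions below are simplified stand-ins for the TIMESTAMP_ISO8601, WORD, SYSLOGPROG and GREEDYDATA grok patterns (assumptions, not copies of the grok pattern library), and `grok_and_mutate` is a hypothetical helper. It stores captured fields as one-element lists, mirroring the shape of @fields in the filtered output, and tags lines that don't match with `_grokparsefailure`, the tag Logstash's grok filter uses for failed matches.

```python
import re

# Simplified approximations of the grok patterns used in the config.
TIMESTAMP_ISO8601 = (r"\d{4}-\d{2}-\d{2}[T ]\d{2}:?\d{2}"
                     r"(?::?\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?")
WORD = r"\b\w+\b"
SYSLOGPROG = r"(?P<program>[\w._/%-]+)(?:\[(?P<pid>\d+)\])?"  # program[pid]

MAILLOG = re.compile(
    rf"(?P<timestamp>{TIMESTAMP_ISO8601}) (?P<host>{WORD}) "
    rf"(?P<service>{SYSLOGPROG}): (?P<message>.*)"
)

def grok_and_mutate(event):
    """Apply the grok + mutate steps to a raw event dict, in place."""
    match = MAILLOG.match(event["@message"])
    if match is None:
        event["@tags"].append("_grokparsefailure")
        return event
    # Keep each captured field as a list of values, like the sample output.
    for name in ("host", "service", "program", "pid"):
        if match.group(name) is not None:
            event["@fields"][name] = [match.group(name)]
    # mutate/replace: the log's own timestamp, and the message minus its prefix.
    event["@timestamp"] = match.group("timestamp")
    event["@message"] = match.group("message")
    return event

event = {"@timestamp": "2012-07-16T06:44:00.548000Z", "@tags": [], "@fields": {},
         "@type": "maillog",
         "@message": "2012-07-10T20:00:02.446220-04:00 mail01 spamd[2478]: spamd: "
                     "clean message (-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes."}
grok_and_mutate(event)
print(event["@timestamp"])  # 2012-07-10T20:00:02.446220-04:00
print(event["@fields"])
# {'host': ['mail01'], 'service': ['spamd[2478]'], 'program': ['spamd'], 'pid': ['2478']}
```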

SLIDE 16

Logstash Parsing

{
  "@timestamp" : "2012-07-10T20:00:02.446220-04:00",
  "@tags" : [],
  "@fields" : {
    "pid" : [ "2478" ],
    "service" : [ "spamd[2478]" ],
    "program" : [ "spamd" ],
    "host" : [ "mail01" ]
  },
  "@source_path" : "/client/127.0.0.1:39998",
  "@source" : "tcp://0.0.0.0:6999/client/127.0.0.1:39998",
  "@source_host" : "0.0.0.0",
  "@message" : "spamd: clean message (-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes.",
  "@type" : "maillog"
}

SLIDE 17

RabbitMQ Overview

http://www.rabbitmq.com/

  • Message queue
  • AMQP
  • Clustered

SLIDE 18

Elasticsearch Intro

http://www.elasticsearch.org/

  • Indexes split into Lucene shards
  • Clusterable
  • Fault tolerant
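
A sketch of the indexing side, assuming the one-index-per-day layout (`logstash-YYYY.MM.dd`) that Logstash's Elasticsearch output conventionally uses. `daily_index` and `bulk_body` are hypothetical helpers; the body follows Elasticsearch's `_bulk` format of an action line followed by the document, newline-delimited, with a trailing newline. Clusters of this era also expected a `_type` in the action line; it is left out here, so add one if your version requires it.

```python
import json
from datetime import datetime, timezone

def daily_index(timestamp, prefix="logstash"):
    """Name a per-day index from an ISO 8601 timestamp that carries an offset."""
    if timestamp.endswith("Z"):
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        timestamp = timestamp[:-1] + "+00:00"
    day = datetime.fromisoformat(timestamp).astimezone(timezone.utc)
    return day.strftime(f"{prefix}-%Y.%m.%d")

def bulk_body(events):
    """Serialize events as a newline-delimited Elasticsearch _bulk request body."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": daily_index(event["@timestamp"])}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # _bulk requires a final newline

event = {"@timestamp": "2012-07-10T20:00:02.446220-04:00", "@type": "maillog",
         "@message": "spamd: clean message (-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes."}
print(bulk_body([event]), end="")
```

The day is taken in UTC, so the 20:00 (-04:00) spamd line from the parsing slides lands in the following day's index, `logstash-2012.07.11`.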

SLIDE 19

Elasticsearch Head

SLIDE 20

Elasticsearch Browser

SLIDE 21

Kibana Intro

http://rashidkpc.github.com/Kibana/

  • User-friendly front end to Elasticsearch
  • Search log lines
  • Graph, score, trend
  • Streaming dashboard

SLIDE 22

Kibana Queries

Question

How many errors of a particular type are we seeing in the logs?

Query

@message:"Permission Denied"
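
Kibana passes a query like this to Elasticsearch in Lucene query-string syntax. A sketch of a roughly equivalent search request body, which could be POSTed to an index's `_search` endpoint; `search_body` is a hypothetical helper, and the `size` and `sort` values are illustrative choices, not necessarily what Kibana sends.

```python
import json

def search_body(lucene_query, size=50):
    """Build an Elasticsearch search body for a Lucene query string.

    `size` caps how many matching log lines come back in one response.
    """
    return {
        "query": {"query_string": {"query": lucene_query}},
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest lines first
    }

print(json.dumps(search_body('@message:"Permission Denied"'), indent=2))
```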

SLIDE 23

Kibana Queries

SLIDE 24

Kibana Queries

Question

Why did the mail for user X get marked as SPAM?

Query

@message:"domain.com" AND @message:"X-SPAM"
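
When the value comes from a person (a customer's domain pasted in by a support tech, say), it should be quoted as a phrase so that Lucene's special characters inside it are taken literally, with any embedded backslash or double quote escaped. A small sketch with hypothetical helpers `phrase` and `all_of` that build the query above.

```python
def phrase(field, text):
    """Quote `text` as a Lucene phrase on `field`, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return " AND ".join(clauses)

query = all_of(phrase("@message", "domain.com"), phrase("@message", "X-SPAM"))
print(query)  # @message:"domain.com" AND @message:"X-SPAM"
```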

SLIDE 25

Kibana Queries

SLIDE 26

Kibana Queries

Question

How many messages are being rejected due to the sending host being listed in an RBL?

Query

@message:"zen.spamhaus.org"
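
Kibana answers "how many" questions like this by graphing matching events over time. The same idea applied to events already pulled out of Elasticsearch, as a sketch: `hourly_counts` (hypothetical) counts events whose @message contains a string, bucketed by UTC hour. It uses a plain substring test, which is looser than Elasticsearch's phrase matching on analyzed text, and the sample events are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

def hourly_counts(events, needle):
    """Count events whose @message contains `needle`, bucketed by UTC hour."""
    buckets = Counter()
    for event in events:
        if needle in event["@message"]:
            ts = datetime.fromisoformat(event["@timestamp"]).astimezone(timezone.utc)
            buckets[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(buckets.items()))

# Made-up sample events, shaped like the filtered maillog events.
REJECT = "reject: RCPT from 192.0.2.10 blocked using zen.spamhaus.org"
events = [
    {"@timestamp": "2012-07-10T20:05:00-04:00", "@message": REJECT},
    {"@timestamp": "2012-07-10T20:40:00-04:00", "@message": REJECT},
    {"@timestamp": "2012-07-10T21:10:00-04:00", "@message": "status=sent (250 2.0.0 Ok)"},
]
for hour, count in hourly_counts(events, "zen.spamhaus.org").items():
    print(hour.isoformat(), count)  # 2012-07-11T00:00:00+00:00 2
```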

SLIDE 27

Kibana Queries

SLIDE 28

Kibana Queries

Question

How many log messages do we have for a specific mail host?

Query

@source_host:"n31"

SLIDE 29

Kibana Queries

SLIDE 30

Report Card

SLIDE 31

Size

  • Efficiency
  • Quantity

SLIDE 32

Access

  • Locality
  • Method
  • Filtering

SLIDE 33

Grokability

  • Noise
  • Structure
  • Metrics

SLIDE 34

Next Steps

  • Push more stats into Graphite
  • Further break down log messages
  • More stuff
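
As an example of breaking messages down further, the spamd summary from the parsing slides carries a verdict, score, threshold, user, scan time and size that could be pulled into fields of their own; the numeric ones are the kind of stats that could then be pushed to Graphite. A sketch in Python: the pattern is written against that one sample line only (an assumption about spamd's output in general), and `parse_spamd` is a hypothetical helper. Inside Logstash the equivalent would be a further grok pattern applied to @message.

```python
import re

# Hypothetical pattern, written against the single spamd sample in this deck.
SPAMD_RESULT = re.compile(
    r"spamd: (?P<verdict>[a-z ]+) "
    r"\((?P<score>-?\d+(?:\.\d+)?)/(?P<threshold>-?\d+(?:\.\d+)?)\) "
    r"for (?P<user>[^:\s]+):(?P<uid>\d+) "
    r"in (?P<seconds>\d+(?:\.\d+)?) seconds, (?P<bytes>\d+) bytes\."
)

def parse_spamd(message):
    """Split a spamd summary message into typed fields, or return None."""
    m = SPAMD_RESULT.search(message)
    if m is None:
        return None
    return {
        "verdict": m.group("verdict"),
        "score": float(m.group("score")),
        "threshold": float(m.group("threshold")),
        "user": m.group("user"),
        "uid": int(m.group("uid")),
        "seconds": float(m.group("seconds")),
        "bytes": int(m.group("bytes")),
    }

print(parse_spamd("spamd: clean message (-3.4/5.0) for nobody:93 in 0.0 seconds, 886 bytes."))
# {'verdict': 'clean message', 'score': -3.4, 'threshold': 5.0, 'user': 'nobody',
#  'uid': 93, 'seconds': 0.0, 'bytes': 886}
```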

SLIDE 35

Everything you need

  • Instructions and software: http://logwrangler.mtcode.com/
  • Puppet code and slides: http://github.com/mediatemple/logwrangler
  • Local Wi-Fi share: logwrangler (guest/guest)

SLIDE 36

Demo

  • Netcat port for Logstash
  • RabbitMQ
  • Elasticsearch
  • Kibana
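
The demo feeds log lines to Logstash over a plain TCP port with netcat (the sample events earlier show a tcp input on port 6999). A Python stand-in for `nc HOST 6999 < logfile`, as a sketch: `ship_lines` is a hypothetical helper, and its default host and port are assumptions for a local demo.

```python
import socket

def ship_lines(lines, host="127.0.0.1", port=6999):
    """Send each line, newline-terminated, over one TCP connection (like netcat)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        for line in lines:
            # Normalize endings so every event is exactly one "\n"-terminated line.
            sock.sendall(line.rstrip("\n").encode("utf-8") + b"\n")
```

For example, `with open("/var/log/maillog") as f: ship_lines(f)` would stream a local mail log one line per event; the path is illustrative.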

SLIDE 37

Contact Info

Nate Jones

@ndj nate@mediatemple.net

David Castro

@arimus dcastro@mediatemple.net

SLIDE 38

Questions?
