Analyzing large flow data sets using modern open-source data search and visualization tools
FloCon 2014
Max Putas

About me
● Operations Engineer - “DevOps”
● BS, MS, and CAS in Telecommunications
● Work/research interests
  ○ System automation
  ○ Efficiency improvement
  ○ System and network monitoring
  ○ Traffic/service analysis
  ○ Open-source software

Common tools for analysis
● Scripts: Bash, Perl, Python
  ○ Learning curve, time-intensive
  ○ GnuPlot for graphing/visualization
● Application-specific tools
  ○ SiLK, Apache Chainsaw, Wireshark
● Splunk - EXPEN$IVE
● Excel
  ○ :-(

General model
Raw Data → Transform → Store → Visualize - Search - Analyse

Components
Binary SiLK data → rwfilter → rwcut → CSV file → Logstash → Elasticsearch → Kibana
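The first stage of this pipeline, turning binary SiLK records into a CSV file, could be scripted roughly as follows. This is a minimal sketch, not the presenter's actual commands: the function names, the date, and the selection switches are illustrative assumptions, and `export_csv` requires a working SiLK installation.

```python
import subprocess


def build_export_commands(start_date):
    """Return argv lists for rwfilter and rwcut.

    The selection below (all types, all protocols) is an assumption made
    for illustration; the slides do not show the switches actually used.
    """
    rwfilter = [
        "rwfilter",
        "--start-date=" + start_date,  # e.g. "2014/01/13" (hypothetical date)
        "--type=all",
        "--proto=0-255",               # pass records of every protocol
        "--pass=stdout",               # write matching binary records to stdout
    ]
    rwcut = [
        "rwcut",
        "--delimited=,",               # comma-separated text without column padding
    ]
    return rwfilter, rwcut


def export_csv(start_date, csv_path):
    """Pipe rwfilter into rwcut and write the text output to csv_path."""
    rwfilter, rwcut = build_export_commands(start_date)
    with open(csv_path, "w") as out:
        producer = subprocess.Popen(rwfilter, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(rwcut, stdin=producer.stdout, stdout=out)
        producer.stdout.close()  # let rwfilter receive SIGPIPE if rwcut exits early
        consumer.wait()
        producer.wait()
    return consumer.returncode
```

Calling `export_csv("2014/01/13", "/tmp/silk-data.csv")` would produce the file that the Logstash file input reads. By default rwcut also emits a header row, which the Logstash side would have to skip or drop.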

Components Logstash : About
● Can act as an agent, server, or both
● Single jar file – only depends on Java
● Very young project
  ○ Started in late 2010
  ○ First official book released last year (2013)

Components Logstash : Plugins
● Filter (36): CSV, Date, GeoIP, …
● Input (40): File, …
● Output (46): Elasticsearch, …

Components Logstash : Configuration
    input {
      file {
        path => "/tmp/silk-data.csv"
        start_position => "beginning"
        type => "silkcsv"
      }
    }
    filter {
      …
      date {
        type => "silkcsv"
        match => [ "sTime", "yyyy/MM/dd'T'HH:mm:ss.SSS" ]
        add_tag => [ "dated" ]
      }
      …
    }
    output {
      elasticsearch {
        host => "localhost"
      }
    }
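To make the date filter's role concrete, here is a rough Python imitation of what it does to one event: parse `sTime` with the configured pattern, set the event timestamp, and add the `dated` tag only when parsing succeeds. The function names and the sample value are assumptions for illustration, and this is not how Logstash itself is implemented.

```python
from datetime import datetime

# strptime equivalent of the Joda-style pattern "yyyy/MM/dd'T'HH:mm:ss.SSS"
SILK_TIME_FORMAT = "%Y/%m/%dT%H:%M:%S.%f"


def parse_silk_time(value):
    """Parse a SiLK timestamp such as 2014/01/13T10:15:30.123."""
    return datetime.strptime(value.strip(), SILK_TIME_FORMAT)


def tag_event(event):
    """Return a copy of the event with @timestamp and the "dated" tag.

    If sTime is missing or malformed, the event is returned unchanged,
    mirroring the fact that add_tag only applies on a successful match.
    """
    tagged = dict(event)
    try:
        parsed = parse_silk_time(event["sTime"])
    except (KeyError, ValueError):
        return tagged
    tagged["@timestamp"] = parsed.isoformat()
    tagged["tags"] = list(event.get("tags", [])) + ["dated"]
    return tagged
```

Filtering on the `dated` tag afterwards is one way to spot rows whose timestamps failed to parse, such as a CSV header line.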

Components Elasticsearch : About
● Built on Apache Lucene (indexing/search library)
● Java
● RESTful API
● Distributed, scalable architecture
  ○ Nodes can find each other through discovery
● JSON-based
● "Big data" focus

Components Elasticsearch : Data storage
● Index - document “database”
  ○ Document types, fields, type mappings
● Shards - pieces of the index
  ○ More shards, better indexing performance across the cluster
● Replicas - how many copies of each shard
  ○ More replicas, better search performance and redundancy
[Diagram: Node 1 and Node 2 holding 1 index with 3 shards and 2 replicas]
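The shard and replica counts are index settings. A minimal sketch of the settings body for the example above, using the `number_of_shards` and `number_of_replicas` setting names; the helper functions are hypothetical and nothing here contacts a cluster.

```python
import json


def index_settings(shards, replicas):
    """Build the JSON body used when creating an index."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(shards, replicas):
    """Each primary shard plus its replicas counts as one copy."""
    return shards * (1 + replicas)


body = index_settings(shards=3, replicas=2)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 2))
```

In Elasticsearch's settings, `number_of_replicas` counts copies in addition to the primary, so 3 shards with 2 replicas means 9 shard copies spread across the cluster.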

Components Elasticsearch : Performance
● Lab setup: 6-core CPU : 16GB RAM : SATA HD
  ○ Indexing performance: 4000/s
● Double the number of shards and machines
  ○ ~2x index performance increase
● Double the number of replicas
  ○ ~2x search performance increase
● Can take full advantage of SSDs
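As a back-of-the-envelope illustration of these figures, the sketch below estimates how long a bulk load would take at the lab rate and at the roughly doubled rate. The record count of 100 million is a made-up example, not a number from the talk.

```python
def indexing_seconds(records, rate_per_second):
    """Time needed to index `records` at a steady rate."""
    return records / rate_per_second


RECORDS = 100_000_000  # hypothetical data set size

baseline = indexing_seconds(RECORDS, 4000)     # single lab machine
doubled = indexing_seconds(RECORDS, 2 * 4000)  # ~2x after doubling shards and machines

print(f"baseline: {baseline / 3600:.1f} h, doubled: {doubled / 3600:.1f} h")
```

At 4000/s the hypothetical load takes 25,000 seconds (about 6.9 hours); at the doubled rate it takes 12,500 seconds (about 3.5 hours).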

Components Elasticsearch : Type Mapping
    ...
    "dIP" : { "type" : "ip" },
    "dPort" : { "type" : "integer" },
    ...
    "duration" : { "type" : "float" },
    ...
    "eTime" : { "type" : "date", "format" : "yyyy/MM/dd'T'HH:mm:ss.SSS" },
    ...
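The effect of a type mapping can be illustrated by coercing the string values of one CSV row into the mapped types. This is only a sketch of the idea in plain Python: the `coerce` helper and the sample values are assumptions, and Elasticsearch performs its own conversion at index time.

```python
from datetime import datetime
from ipaddress import ip_address

# The four fields shown in the mapping excerpt; elided fields are not guessed.
MAPPING = {
    "dIP": {"type": "ip"},
    "dPort": {"type": "integer"},
    "duration": {"type": "float"},
    "eTime": {"type": "date", "format": "yyyy/MM/dd'T'HH:mm:ss.SSS"},
}


def coerce(field, raw):
    """Convert a raw CSV string to the type the mapping declares.

    Fields without a mapping entry are returned as stripped strings.
    Malformed values raise ValueError.
    """
    kind = MAPPING.get(field, {}).get("type")
    text = raw.strip()
    if kind == "ip":
        return str(ip_address(text))
    if kind == "integer":
        return int(text)
    if kind == "float":
        return float(text)
    if kind == "date":
        return datetime.strptime(text, "%Y/%m/%dT%H:%M:%S.%f")
    return text
```

With typed fields, a range query on `dPort` or `duration` compares numbers rather than strings, which is the practical reason to declare the mapping before indexing.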

Kibana

Components Kibana : Features
● Pure JavaScript: connects directly to Elasticsearch
  ○ A reverse proxy will be necessary to limit access
● Graphing/visualization: histograms, scatter plots, pie charts, ranked lists, maps, and line graphs
● Statistics: trends, min, mean, and max
● Real-time search: simultaneous queries, sortable results, filters, field drill-down, and derived (faceted) queries

Components Development
● The developers of Kibana and Logstash were recently hired by Elasticsearch
● There is a possibility of even tighter integration in the future

More possibilities
● Logs
  ○ Web, database, e-mail, and DNS servers
  ○ Firewalls, IDS/IPS, switches, and routers
  ○ Syslog and Windows events
● Monitoring alerts: SNMP
● Performance metrics
● Others?
  ○ If it’s textual and log-like it’ll probably work
  ○ Custom plugins are possible
● Gather related data to correlate events in Kibana or through the Elasticsearch API
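Going through the Elasticsearch API directly might look like the sketch below, which builds a small term query and posts it to the `_search` endpoint. The base URL, the index name, and the choice of `dPort` as the field are assumptions for illustration, and `search` needs a reachable Elasticsearch node.

```python
import json
import urllib.request


def flows_to_port_query(port, size=10):
    """Query body matching documents whose dPort equals `port`."""
    return {"query": {"term": {"dPort": port}}, "size": size}


def search(base_url, index, body):
    """POST a query body to <base_url>/<index>/_search and decode the reply."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = flows_to_port_query(53)
# Hypothetical call against a local node and a daily Logstash index:
# result = search("http://localhost:9200", "logstash-2014.01.13", body)
```

The same query body can be reused across indices holding different log sources, which is one way to line up related events by time or address.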

More possibilities Parsing
● Problem? Regex complexity
    [0-9]+-(?:0?[1-9]|1[0-2])-(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) (?:2[0123]|[01][0-9]):(?:[0-5][0-9]):(?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?), (?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))

More possibilities Parsing
● Logstash provides built-in parsing (“grok”) rules:
    HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
● Common Apache log format:
    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
● Complete rule:
    APACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
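To show what the grok rule extracts, here is an approximate hand-written equivalent as a Python regular expression with named groups, applied to the sample log line. The sub-patterns are simplified stand-ins for the grok building blocks (for example `\S+` instead of `IPORHOST`), so this is a sketch of the idea rather than an exact translation.

```python
import re

# Named groups mirror the field names used in the grok rule.
APACHE_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>[^"]*))" '
    r'(?P<response>\d+) (?:(?P<bytes>\d+)|-)'
)


def parse_apache_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = APACHE_LOG.match(line)
    return match.groupdict() if match else None


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
fields = parse_apache_line(sample)
print(fields["clientip"], fields["verb"], fields["request"], fields["response"])
```

Even this simplified version is hard to read at a glance, which is the argument for composing named grok rules instead of writing the raw expression.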

References and resources
● http://logstash.net
● http://www.elasticsearch.org
● http://www.elasticsearch.org/overview/kibana/
● Try Kibana yourself:
  ○ http://demo.kibana.org
● Debug grok parsing rules:
  ○ http://grokdebug.herokuapp.com
● SiLK Kibana 3 demo video:
  ○ https://vimeo.com/71393353
● Contact: max.putas@gmail.com
