Introducing InfluxDB, an open source distributed time series database

SLIDE 1

Introducing InfluxDB, an open source distributed time series database

Paul Dix @pauldix paul@errplane.com

SLIDE 2
About me

  • Co-founder, CEO of Errplane (YC W13)
  • Organizer of NYC Machine Learning
  • Author of “Service Oriented Design with Ruby & Rails”

SLIDE 3

Series editor for Addison-Wesley’s “Data & Analytics”

SLIDE 4

What is a time series?

SLIDE 5

Metrics

SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9
SLIDE 10

Events

  • Measurements
  • Exceptions
  • Page Views
  • User actions
  • Commits
  • Deploys
  • Things happening in time...
SLIDE 11

Analytics

Operations, developers, users, business
SLIDE 12

Things you want to ask questions about, visualize, or summarize over time.
SLIDE 13

Actually a summarization

SLIDE 14

Also a summarization

SLIDE 15

What about... “...order by some_time_col”

SLIDE 16

Why a database for time series?

SLIDE 17

Billions of data points. Scale horizontally.

SLIDE 18

HTTP native. API to build on.

SLIDE 19

Built-in tools for downsampling and summarizing

SLIDE 20

Automatically clear out old data if we want
SLIDE 21

Process or monitor data as it comes in, like Storm

SLIDE 22

Visualize and Summarize

  • Graphs & dashboards
  • Last 10 minutes
  • Last 4 hours
  • Last 24 hours
  • Past week
  • Past month
  • YTD
  • All Time
SLIDE 23

Data Collection

  • Statsd - https://github.com/etsy/statsd/
  • CollectD - http://collectd.org/
  • Heka - https://github.com/mozilla-services/heka
  • l2met - https://github.com/ryandotsmith/l2met
  • Libraries
  • Framework integrations
  • Cloud integrations (AWS, OpenStack)
  • Third-party integrations
SLIDE 24

Existing Tools

  • RRDTool (metrics)
  • Graphite (metrics)
  • OpenTSDB (metrics + events)
  • Kairos (metrics + events)
  • and others...
SLIDE 25

Something missing...

SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30

InfluxDB: harness lightning, get 1.21 gigawatts.

SLIDE 31

InfluxDB

  • Written in Go
  • Uses LevelDB for storage (may change)
  • Self-contained binary
  • No external dependencies
  • Distributed (in December)
SLIDE 32

HTTP Native

  • Read/write data via HTTP
  • Manage via HTTP
  • Security model that allows access directly from the browser

SLIDE 33

How data is organized

  • Databases (like in MySQL, Postgres, etc.)
  • Time series (kind of like tables)
  • Points or events (kind of like rows)
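The hierarchy above can be pictured as nested containers. The sketch below is a hypothetical in-memory model, not InfluxDB's storage: `Database`, `Series` and `add_point` are illustrative names. It assumes a point is a timestamp plus arbitrary column values, and that a series comes into existence the first time it is used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Series:
    """Kind of like a table: a named list of points (rows)."""
    name: str
    points: List[Dict[str, Any]] = field(default_factory=list)

    def add_point(self, time: int, **columns: Any) -> None:
        # A point is its timestamp plus whatever column values it carries.
        self.points.append({"time": time, **columns})


@dataclass
class Database:
    """Like a MySQL/Postgres database: a namespace holding many series."""
    name: str
    series: Dict[str, Series] = field(default_factory=dict)

    def get_series(self, name: str) -> Series:
        # Assumption: a series is created implicitly the first time it is used.
        if name not in self.series:
            self.series[name] = Series(name)
        return self.series[name]


db = Database("mydb")
db.get_series("user_events").add_point(1384295094, user="paul", state="login")
db.get_series("user_events").add_point(1384295100, user="john", state="logout")
print(len(db.series["user_events"].points))  # 2
```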
SLIDE 34

Security

  • Cluster admins
  • Database admins
  • Database users

    ○ read permissions
      ■ only certain series
      ■ only queries with a column having a specific value (e.g. customer_id=32)
    ○ write permissions
      ■ only certain series
      ■ only with columns having a specific value
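One way to read the per-user read rules is as a check applied before a query runs. This is a minimal sketch under stated assumptions, and InfluxDB's actual enforcement may differ. It models a read permission as a set of allowed series plus optional required column values (like `customer_id=32`). `ReadPermission` and `can_read` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class ReadPermission:
    # "only certain series"
    allowed_series: Set[str]
    # "only queries with a column having a specific value", e.g. {"customer_id": 32}
    required_values: Dict[str, Any] = field(default_factory=dict)


def can_read(perm: ReadPermission, series: str, where: Dict[str, Any]) -> bool:
    """Allow a query on `series` whose equality filters are `where`."""
    if series not in perm.allowed_series:
        return False
    # Every required column must be pinned to exactly the required value.
    return all(where.get(col) == val for col, val in perm.required_values.items())


perm = ReadPermission({"user_events"}, {"customer_id": 32})
print(can_read(perm, "user_events", {"customer_id": 32}))     # True
print(can_read(perm, "user_events", {"customer_id": 7}))      # False
print(can_read(perm, "response_times", {"customer_id": 32}))  # False
```

Write permissions could follow the same shape, checking the columns of incoming points instead of a query's filters.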

SLIDE 35

InfluxDB Setup

  • http://play.influxdb.org
  • OSX

○ brew update && brew install influxdb

  • http://influxdb.org/download
  • Ubuntu

○ sudo dpkg -i influxdb_latest_amd64.deb

  • RedHat

○ sudo rpm -ivh influxdb-latest-1.i686.rpm

SLIDE 36

Examples, but sadly no R :(

SLIDE 37

HTTP API docs at http://influxdb.org/docs/api/http

SLIDE 38

https://github.com/influxdb/influxdb-r

fork, write sweet code, submit PR, be loved and adored FOREVER

SLIDE 39

Create a database

curl -X POST \
  'http://localhost:8086/db?u=root&p=root' \
  -d '{"name":"mydb", "replicationFactor": 3}'
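The same create-database call can be made from any HTTP client. This minimal sketch uses only Python's standard library and builds the request without sending it. `create_database_request` is a hypothetical helper. The `root`/`root` credentials, `localhost:8086` and the JSON body come from the slide.

```python
import json
import urllib.request
from urllib.parse import urlencode


def create_database_request(host: str, name: str, replication_factor: int,
                            user: str = "root",
                            password: str = "root") -> urllib.request.Request:
    """Build (without sending) the POST /db request shown on the slide."""
    url = f"http://{host}/db?" + urlencode({"u": user, "p": password})
    body = json.dumps({"name": name, "replicationFactor": replication_factor})
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


req = create_database_request("localhost:8086", "mydb", 3)
print(req.get_method(), req.full_url)  # POST http://localhost:8086/db?u=root&p=root
# urllib.request.urlopen(req) would send it to a running server.
```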
SLIDE 40

Add a user

curl -X POST \
  'http://.../db/mydb/users?u=root&p=root' \
  -d '{"name":"paul", "password": "foo", "admin": true}'

SLIDE 41

Write points

curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
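The write body is a JSON array of series objects. Each object has a name, a list of columns, and rows of values in column order. The small sketch below builds that shape and checks that every row fits its columns, assuming that shape is all the server needs. The `series_payload` helper and the second `response_times` series are hypothetical.

```python
import json
from typing import Any, Dict, List, Sequence


def series_payload(name: str, columns: Sequence[str],
                   rows: Sequence[Sequence[Any]]) -> Dict[str, Any]:
    """One element of the write body: series name, columns, and rows of values."""
    for row in rows:
        # Each row must supply exactly one value per column, in column order.
        if len(row) != len(columns):
            raise ValueError(f"row {list(row)!r} does not match columns {list(columns)!r}")
    return {"name": name, "columns": list(columns), "points": [list(r) for r in rows]}


body: List[Dict[str, Any]] = [
    series_payload("foo", ["val"], [[3]]),                       # the slide's point
    series_payload("response_times", ["value"], [[120], [95]]),  # hypothetical
]
print(json.dumps(body))
```

Because the body is an array, it appears that several series can go out in a single POST.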
SLIDE 42

Querying

curl \
  'http://...:8086/db/mydb/series?u=paul&p=pass&q=...'
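The `q=...` parameter carries the query text itself. That text contains spaces, `*`, `>` and parentheses, so it has to be URL-encoded. The sketch below does this with the standard library, assuming the server decodes `q` like any ordinary query-string parameter. `query_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse


def query_url(host: str, db: str, user: str, password: str, query: str) -> str:
    """GET URL for /db/<db>/series with the query text URL-encoded in `q`."""
    # quote (rather than the default quote_plus) encodes spaces as %20.
    params = urlencode({"u": user, "p": password, "q": query}, quote_via=quote)
    return f"http://{host}/db/{db}/series?{params}"


url = query_url("localhost:8086", "mydb", "paul", "pass",
                "select * from user_events where time > now() - 4h")
print(url)
# Decoding the query string recovers the original query text.
print(parse_qs(urlparse(url).query)["q"][0])
```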

SLIDE 43

SQL(ish) Query Language

select * from user_events where time > now() - 4h

SLIDE 44

[
  {
    "name": "foo",
    "columns": ["time", "sequence_number", "val1", "val2"],
    "points": [
      [1384295094, 3, "paul", 23],
      [1384295094, 2, "john", 92],
      [1384295094, 1, "todd", 61]
    ]
  },
  {...}
]

JSON data returned
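The response is columnar: each series lists its column names once, then one array per point. Zipping the two gives one record per point. This sketch uses the slide's sample response. In that sample all three points share a timestamp and appear to be distinguished by `sequence_number`. `rows_by_series` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def rows_by_series(response_text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Zip each series' column names with each of its point arrays."""
    result: Dict[str, List[Dict[str, Any]]] = {}
    for series in json.loads(response_text):
        columns = series["columns"]
        result[series["name"]] = [dict(zip(columns, point)) for point in series["points"]]
    return result


sample = '''[{"name": "foo",
  "columns": ["time", "sequence_number", "val1", "val2"],
  "points": [[1384295094, 3, "paul", 23],
             [1384295094, 2, "john", 92],
             [1384295094, 1, "todd", 61]]}]'''
rows = rows_by_series(sample)
print(rows["foo"][0])  # {'time': 1384295094, 'sequence_number': 3, 'val1': 'paul', 'val2': 23}
```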

SLIDE 45

select count(state) from user_events group by time(5m), state where time > now() - 7d
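Grouping by `time(5m), state` yields one count per five-minute bucket per distinct state. The plain-Python sketch below computes the same thing, assuming timestamps in seconds. It also assumes buckets aligned to multiples of the interval since the epoch; InfluxDB's exact alignment is not stated here. `count_by_bucket_and_state` and the sample events are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_by_bucket_and_state(events: Iterable[Tuple[int, str]],
                              bucket_seconds: int = 300) -> Counter:
    """Count events per (bucket start, state), like group by time(5m), state."""
    counts: Counter = Counter()
    for timestamp, state in events:
        # Floor the timestamp to the start of its bucket.
        bucket = timestamp - timestamp % bucket_seconds
        counts[(bucket, state)] += 1
    return counts


events = [(1000, "login"), (1100, "login"), (1150, "logout"), (1300, "login")]
counts = count_by_bucket_and_state(events)
print(counts[(900, "login")], counts[(900, "logout")], counts[(1200, "login")])  # 2 1 1
```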

SLIDE 46

select percentile(value, 90) from response_times group by time(30s) where time > now() - 1h
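The slides do not say which percentile definition InfluxDB uses. This sketch swaps in the simple nearest-rank definition and applies it per 30-second bucket, assuming timestamps in seconds. Both helper names and the sample points are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def nearest_rank_percentile(values: List[float], p: float) -> float:
    """Smallest value such that at least p% of the values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_by_bucket(points: Iterable[Tuple[int, float]], p: float,
                         bucket_seconds: int = 30) -> Dict[int, float]:
    """Like percentile(value, p) ... group by time(30s): one result per bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return {start: nearest_rank_percentile(vals, p)
            for start, vals in sorted(buckets.items())}


points = [(t, float(t + 1)) for t in range(10)] + [(31, 250.0), (45, 80.0)]
print(percentile_by_bucket(points, 90))  # {0: 9.0, 30: 250.0}
```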

SLIDE 47

select percentile(value, 90) from response_times group by time(5m) into response_times.percentiles.90

Continuous Queries (downsampling)
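A continuous query can be thought of as an aggregation that runs as points arrive and writes one summary point per closed time bucket into a target series. The hypothetical sketch below assumes points arrive in time order. The class, the `max` aggregate and the `response_times.max` target name are illustrative, not InfluxDB's implementation.

```python
from typing import Callable, List, Optional, Tuple


class ContinuousDownsampler:
    """Aggregate points as they arrive and emit one summary point per
    closed time bucket into a target series."""

    def __init__(self, target_series: str, bucket_seconds: int,
                 aggregate: Callable[[List[float]], float]) -> None:
        self.target_series = target_series
        self.bucket_seconds = bucket_seconds
        self.aggregate = aggregate
        self.bucket_start: Optional[int] = None
        self.values: List[float] = []
        # Emitted points as (target series, bucket start, aggregated value).
        self.output: List[Tuple[str, int, float]] = []

    def add(self, timestamp: int, value: float) -> None:
        # Assumes time-ordered input; a point in a new bucket closes the old one.
        start = timestamp - timestamp % self.bucket_seconds
        if self.bucket_start is not None and start != self.bucket_start:
            self.flush()
        self.bucket_start = start
        self.values.append(value)

    def flush(self) -> None:
        # Emit the summary for the current bucket, if it has any values.
        if self.values and self.bucket_start is not None:
            self.output.append((self.target_series, self.bucket_start,
                                self.aggregate(self.values)))
        self.values = []


cq = ContinuousDownsampler("response_times.max", 300, max)
for ts, v in [(0, 10.0), (100, 30.0), (350, 20.0)]:
    cq.add(ts, v)
print(cq.output)  # [('response_times.max', 0, 30.0)]
```

The bucket starting at 300 stays open until a later point arrives or `flush()` is called.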

SLIDE 48

Continuous queries for real-time processing & monitoring

SLIDE 49

Regexes

select * from events where email =~ /.*gmail\.com/
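The `=~` operator tests a column against a regular expression. Python's `re.search` gives a feel for it, assuming InfluxDB also matches unanchored. In that case the leading `.*` is redundant, and an anchored pattern is stricter. The sample emails and the anchored pattern are hypothetical.

```python
import re

emails = ["paul@gmail.com", "todd@example.com", "john@gmail.com.example.org"]

# The slide's pattern, matched anywhere in the string as re.search does.
loose = re.compile(r".*gmail\.com")
# A stricter, anchored alternative (hypothetical, not from the slide).
strict = re.compile(r"@gmail\.com$")

print([e for e in emails if loose.search(e)])   # ['paul@gmail.com', 'john@gmail.com.example.org']
print([e for e in emails if strict.search(e)])  # ['paul@gmail.com']
```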

SLIDE 50

select percentile(value, 99) from /stats\.*/ into :series_name.percentiles.99
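With a regex in the `from` clause, one query covers every matching series. `:series_name` in the `into` clause then stands for each source series' name. The sketch below shows that expansion, assuming plain string substitution. `expand_into` and the sample series names are hypothetical.

```python
import re
from typing import List, Tuple


def expand_into(pattern: str, template: str,
                series_names: List[str]) -> List[Tuple[str, str]]:
    """Pair each series matching `pattern` with its target name, replacing
    :series_name in `template` with the source series name."""
    regex = re.compile(pattern)
    return [(name, template.replace(":series_name", name))
            for name in series_names if regex.search(name)]


pairs = expand_into(r"stats\.*", ":series_name.percentiles.99",
                    ["stats.cpu", "stats.mem", "user_events"])
print(pairs)
# [('stats.cpu', 'stats.cpu.percentiles.99'), ('stats.mem', 'stats.mem.percentiles.99')]
```

As written, `stats\.*` means "stats" followed by zero or more dots, so it also matches names such as `mystats`. `^stats\.` would restrict it to series under the `stats.` prefix.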

SLIDE 51

select count(value) from seriesA merge seriesB
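Suppose `merge` interleaves the points of both series into one time-ordered stream before the aggregate runs. Then `count(value)` counts the points of both. `heapq.merge` sketches that interleaving for two already-sorted series; the merge semantics here are an assumption, and the sample points are hypothetical.

```python
import heapq

# Hypothetical (time, value) points, each series already sorted by time.
series_a = [(1, 10), (5, 12), (9, 7)]
series_b = [(2, 3), (5, 4)]

# Interleave both series into a single stream ordered by timestamp.
merged = list(heapq.merge(series_a, series_b, key=lambda point: point[0]))
print(merged)
print(len(merged))  # 5, i.e. count(value) over the merged stream
```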

SLIDE 52

Querying

  • Functions

○ count, min, max, mean, distinct, median, mode, percentiles, derivative, stddev

  • Where clauses
  • Group by clauses (time and other columns)
  • Periodically delete old raw data
SLIDE 53

Built-in UI

SLIDE 54

CLI

SLIDE 55

Libraries

  • Ruby
  • Frontend JS
  • Node
  • Python
  • PHP
  • Go (soon)
  • Java (soon)
SLIDE 56

Ideas to come...

  • Custom functions

○ Embedded Lua, a YARN-like interface, or both?

  • Custom real-time queries

○ define custom logic and InfluxDB will feed it data

  • Queries triggering web hooks

○ pair with custom functions for monitoring/anomaly detection

SLIDE 57

Project Status

  • Based on work at https://errplane.com

○ 2 billion points per month

  • http://influxdb.org
  • Code available at https://github.com/influxdb
  • API finalized in the next month
  • Clustered version in December
  • Production ready by end of year
SLIDE 58

We’re available for consulting/help

SLIDE 59

We need your help

  • API, what else would you like to see?
  • Client libraries
  • Visualization tools
  • Data collection integrations
  • Comments/feedback on the mailing list
  • http://influxdb.org/overview/
SLIDE 60

Share the love

  • Star or watch the project on http://github.com/influxdb/influxdb

  • Tweet, blog, shout, whisper
  • Participate in discussions on mailing list
SLIDE 61

Come to the hackfest

  • Monday, December 2nd at Pivotal
  • http://meetup.com/nyc-influxdb-user-group
SLIDE 62

OSS lives and dies by adoption/popularity

SLIDE 63

MongoDB has 4,406 stars

SLIDE 64

MongoDB valued at $1.2B

SLIDE 65

Each star worth $272,355.00

SLIDE 66

Help InfluxDB get to 10k stars!

go forth and build!

SLIDE 67

Thanks!

@pauldix paul@errplane.com