Know more about your Ceph Cluster with ELK Stack - Cameron Seader


slide-1
SLIDE 1

Know more about your Ceph Cluster with ELK Stack

Cameron Seader Technology Strategist cs@suse.com

slide-2
SLIDE 2

2

slide-3
SLIDE 3

3

Ceph and Logging

  • rsyslog, syslog-ng to forward logs to (Logstash / Fluent Bit)
  • Filebeat
  • Ceph has Graylog (GELF) support
  • store logs for later use
  • analyze logs for alerting
  • analyze data with machine learning
      • X-Pack machine learning
      • R client
  • troubleshooting / post-mortem analysis
slide-4
SLIDE 4

4

Forwarding logs

  • you can format your logs before forwarding
  • there is an rsyslog tutorial on reformatting logs to GELF
  • Logstash has many pipeline input plugins
      • syslog { }
      • gelf { }
  • Filebeat
slide-5
SLIDE 5

5

Ceph GELF

  • to forward logs in GELF, update ceph.conf
      • log_to_graylog = true
      • log_graylog_host = 127.0.0.1
      • log_graylog_port = 12201
  • restart the Ceph services
  • or use DeepSea
      • has support for custom Ceph config options
      • salt-run state.orch ceph.stage.3
      • this also restarts the services correctly
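Before pointing Ceph at a GELF listener, it can help to check that the listener on port 12201 accepts messages at all. The following is a minimal sketch, assuming the listener takes uncompressed GELF 1.1 JSON over UDP; the host name `ceph-node-1`, the extra field `_source_app`, and both function names are hypothetical and not part of Ceph or Graylog.

```python
import json
import socket
import time


def build_gelf_message(short_message, host="ceph-node-1", level=6):
    """Return a GELF 1.1 payload as UTF-8 encoded JSON bytes.

    The host name and the "_source_app" field are made up for illustration.
    """
    payload = {
        "version": "1.1",                 # GELF spec version
        "host": host,                     # name of the sending host
        "short_message": short_message,   # the log line itself
        "timestamp": time.time(),         # seconds since the epoch
        "level": level,                   # syslog severity, 6 = informational
        "_source_app": "gelf-smoke-test", # additional fields start with "_"
    }
    return json.dumps(payload).encode("utf-8")


def send_gelf_udp(message, address=("127.0.0.1", 12201)):
    """Send one unchunked, uncompressed GELF datagram to the listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
```

Calling `send_gelf_udp(build_gelf_message("GELF listener smoke test"))` on the Ceph node should make a test entry appear in the log store if the address matches `log_graylog_host` and `log_graylog_port`.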
slide-6
SLIDE 6

6

Parse and Manage

  • Logstash provides methods to parse logs
  • simple alerting can be done with Logstash
  • use grok { } and other Logstash filters
      • to add fields
      • for better indexing and management of your data
  • Ceph Logstash example
  • Supportconfig Analyzer
slide-7
SLIDE 7

7

Logstash Pipeline filter example

filter {
  if [type] == "cephlog" {
    grok {
      # https://github.com/ceph/ceph/blob/master/src/log/Entry.h
      match => { "message" => "(?m)%{TIMESTAMP_ISO8601:stamp}\s%{NOTSPACE:thread}\s*%{INT:prio}\s(%{WORD:subsys}|):?\s%{GREEDYDATA:msg}" }
      # https://github.com/ceph/ceph/blob/master/src/common/LogEntry.h
      match => { "message" => "%{TIMESTAMP_ISO8601:stamp}\s%{NOTSPACE:name}\s%{NOTSPACE:who_type}\s%{NOTSPACE:who_addr}\s%{INT:seq}\s:\s%{PROG:channel}\s\[%{WORD:prio}\]\s%{GREEDYDATA:msg}" }
    }
    date {
      match => [ "stamp", "yyyy-MM-dd HH:mm:ss.SSSSSS", "ISO8601" ]
    }
  }
}
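To see what the first grok pattern extracts, it can be approximated with an ordinary regular expression. This is a rough sketch only: the character classes stand in for the grok patterns TIMESTAMP_ISO8601, NOTSPACE, INT, WORD and GREEDYDATA and are simpler than the real definitions, and the sample log line is made up for illustration rather than taken from a real cluster.

```python
import re
from datetime import datetime

# Approximation of the first grok pattern from the filter (daemon log entries).
CEPH_LOG_RE = re.compile(
    r"(?P<stamp>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s"  # timestamp
    r"(?P<thread>\S+)\s*"                                             # thread id
    r"(?P<prio>[+-]?\d+)\s"                                           # priority
    r"(?:(?P<subsys>\w+)|):?\s"                                       # optional subsystem
    r"(?P<msg>.*)",                                                   # rest of the line
    re.DOTALL,  # plays the role of (?m) in the grok pattern
)


def parse_ceph_log_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = CEPH_LOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["prio"] = int(fields["prio"])
    # Same layout as the first format in the date { } filter.
    fields["stamp"] = datetime.strptime(fields["stamp"], "%Y-%m-%d %H:%M:%S.%f")
    return fields


# Hypothetical sample line, for illustration only.
SAMPLE_LINE = "2018-06-01 12:34:56.789012 7f3c2a7fc700  1 mon: example message"
parsed = parse_ceph_log_line(SAMPLE_LINE)
```

Each named group corresponds to one field that grok would add to the event, which is what makes the data easier to index and search later.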

slide-8
SLIDE 8

8

The Tools

  • ELK
  • EFK
  • Graylog

slide-9
SLIDE 9

9

How to get it

  • ELK
      • docker-compose is simple for dev needs
  • EFK
      • Helm charts for each
      • install individually in a cluster
  • Graylog
      • separately installed cluster
  • Future
      • Ceph in containers (Rook) converged with these tools on K8s
slide-10
SLIDE 10

10

slide-11
SLIDE 11

11

slide-12
SLIDE 12

12

slide-13
SLIDE 13

13

slide-14
SLIDE 14

14

Kibana Search

slide-15
SLIDE 15

15

RGW

Object storage client to the Ceph cluster; exposes a RESTful S3 & Swift API

slide-16
SLIDE 16

16

RGW

  • RESTful API access to object storage, à la S3
  • implements user accounts, ACLs, buckets
  • a large ecosystem of S3/Swift client tooling can be leveraged against RGW

slide-17
SLIDE 17

17

RGW

  • Supports many S3-like features
      • multipart uploads
      • object versioning
      • torrents
      • lifecycle
      • encryption
      • compression
      • static websites
      • metadata search...
  • Since Jewel, multisite is supported, which allows geographical redundancy

slide-18
SLIDE 18

18

ElasticSearch

You know, for search

  • distributed
  • horizontally scalable
  • schemaless
  • speaks REST
  • Easy Configuration without setting your hair on fire
slide-19
SLIDE 19

19

RGW Metadata search with ES

Motivation

  • Objects have metadata associated with them that is often interesting to analyze
  • Since it is “object storage”, you don’t have any traditional filesystem tools at your disposal
  • No du, df & friends, and either way these are hard on a distributed storage system

slide-20
SLIDE 20

20

Motivation

  • Some support exists in the admin API, but it has problems:
      • it returns specific metadata, which is not ideal for aggregation
      • no notifications when new objects/buckets/accounts are created
      • granting users permission to access the admin API is tricky, since the admin API was meant for administering
  • As a storage administrator, you'd be interested in finding out, e.g., the top 10 users, the average object size, the number of objects held in a user account, etc.

slide-21
SLIDE 21

21

Design

  • Built on top of the multisite architecture, where data & metadata are forwarded to multiple zones
  • Since Kraken, there are sync plugins
  • These allow data & metadata to be forwarded to external tiers, which allows building:
      • interesting solutions analyzing bucket/object/user metadata (ES for starters)
      • backup solutions (S3/cloud sync plugin for Mimic)
slide-22
SLIDE 22

22

Elastic Sync Plugin

  • Forwards metadata from other zones to an ES instance
  • Requires a read-only zone that doesn't cater to user requests & only forwards to ES
  • No off-the-shelf authentication module that can work with RGW
  • Recommendation: do not expose the ES endpoint to the public
slide-23
SLIDE 23

23

Elastic Sync Plugin: User Requests

  • For normal user requests, RGW itself can authenticate the user; this ensures users don't see others' data
  • An attribute recording the owners of an object is used to service user requests
  • Also allows custom metadata fields to be set up per user

slide-24
SLIDE 24

24

[Diagram labels: metadata, RGW Primary, RGW secondary, ES, Ceph, S3]

slide-25
SLIDE 25

25

Example Metadata

{
  "_index" : "rgw-default-6cb1f916",
  "_type" : "object",
  "_id" : "86740559-297e-4487-b770-d3106b900a97.34125.1:american-gods:null",
  "_score" : 0.2876821,
  "_source" : {
    "bucket" : "s3bucket",
    "name" : "american-gods",
    "instance" : "null",
    "versioned_epoch" : 0,
    "owner" : {

slide-26
SLIDE 26

26

Example Query

curl -XPOST 'localhost:9200/rgw-gold/_search?size=0&pretty' -d '
{
  "aggs" : {
    "avg_size" : {
      "avg" : { "field" : "meta.size" }
    }
  }
}'
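The same aggregation can be issued from a script instead of curl. The sketch below uses only the Python standard library and reuses the index name `rgw-gold` and the field `meta.size` from the query above; the base URL default, the function names, and the explicit `Content-Type` header (which newer Elasticsearch versions require) are assumptions.

```python
import json
import urllib.request


def build_avg_size_request(base_url="http://localhost:9200", index="rgw-gold"):
    """Build a POST request asking ES for the average of meta.size."""
    body = {"aggs": {"avg_size": {"avg": {"field": "meta.size"}}}}
    # size=0 suppresses the individual hits; only the aggregation is returned.
    url = f"{base_url}/{index}/_search?size=0&pretty"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request):
    """Send the request and decode the JSON response (needs a reachable ES)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With an ES instance reachable at the given address, `run_query(build_avg_size_request())` returns the response as a Python dict.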

slide-27
SLIDE 27

27

Response

{
  "took" : 22,
  "timed_out" : false,
  "_shards" : { "total" : 10, "successful" : 10, "failed" : 0 },
  "hits" : { "total" : 22, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "avg_size" : { "value" : 177.72727272727272 }
  }
}
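A script consuming this response usually needs only a few of its fields. The following is a small sketch that pulls them out of the response shown on this slide; the function name and the keys of the returned dict are made up for illustration, and the handling of an object-valued `hits.total` is an allowance for newer Elasticsearch versions rather than something the slide shows.

```python
import json

# The response from the slide, embedded as a string.
SAMPLE_RESPONSE = """
{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 10, "successful": 10, "failed": 0},
  "hits": {"total": 22, "max_score": 0.0, "hits": []},
  "aggregations": {"avg_size": {"value": 177.72727272727272}}
}
"""


def summarize_avg_size(response):
    """Extract the object count and average size from an avg_size aggregation."""
    shards = response["_shards"]
    total = response["hits"]["total"]
    # Newer Elasticsearch releases wrap the count in an object; accept both forms.
    if isinstance(total, dict):
        total = total["value"]
    return {
        # The result is only trustworthy if no shard failed or timed out.
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "objects": total,
        "avg_size": response["aggregations"]["avg_size"]["value"],
    }


summary = summarize_avg_size(json.loads(SAMPLE_RESPONSE))
```

Here `summary["objects"]` is the number of indexed objects that matched and `summary["avg_size"]` is the average of `meta.size` across them.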

slide-28
SLIDE 28

28

Interesting queries possible

  • Object storage PUT requests within a specific time range
  • Stats on objects with specific metadata content
  • Metadata can be indexed into non-string fields on a per-bucket basis
slide-29
SLIDE 29

29

Future Work

  • Support for ES 6 for RGW
  • Custom metadata fields for object tagging
  • Elastic queries to analyze common system faults
  • Integration into Ceph dashboard
  • Analysis of meta and/or log data with Machine Learning
slide-30
SLIDE 30

30

Contribute

  • https://github.com/denisok/elk_supportconfig - GitHub repo for ongoing ELK work
  • https://ceph.com/IRC/ - Ceph upstream community mailing lists and IRC channels
  • http://lists.suse.com/mailman/listinfo/deepsea-users - DeepSea upstream mailing list
  • https://groups.google.com/forum/#!forum/openattic-users - openATTIC upstream mailing list
  • https://github.com/ceph/ceph - upstream Ceph sources
slide-31
SLIDE 31

31

Questions?

slide-32
SLIDE 32