
ES service ideas

  • Feb. 2 2016

A centralised ElasticSearch service: Design ideas and status

  • Motivation for a centralised ES service
  • Project

– Mandate
– Organisation
– Team

  • Strategy and design

– Status and resources
– Issues
– PaaS

  • Conclusions/summary

Motivation

  • Strategy until 12/2015:

– Go and do it yourself
– Basic setup description provided
– No support for user-owned instances

  • Reasoning:

– Often conflicting user requirements
– Risk of labor-intensive installations that cannot be covered with the available manpower

Motivation

Current status:

– O(20) clusters on the radar from checking Foreman
– Monitoring has 3 clusters for meter, timber and development, hosted on O(50) machines
– Many clusters use physical hardware (because of I/O)

Significant amount of resources with unknown security setup

Project mandate

Set up a centralised ES service and consolidate existing installations.

  • Offer something that can be used out of the box
  • Attractive service for newcomers
  • Cover as many use cases as possible
  • Offer plugins like Kibana “as is”

Project organisation

  • Twiki as entry point for service managers:

https://twiki.cern.ch/twiki/bin/view/IT/ElasticSearchWeb

  • Agile project management

– JIRA ITES project
– 4-6 week sprints (first just ended)
– Daily scrums (except Mondays) at 9:30 in 31-S-27

– End-user documentation on GitBook:

http://esdocs.web.cern.ch/esdocs/

Team

  • Team spans across sections and groups
  • Part-time participation from

– Compute and monitoring (4 people)
– Databases (2 people)

Strategy and design

  • Want to profit from existing experience as much as possible

  • Get user feedback from the beginning

Strategy and design

  • Lots of interest from the outside

– Rumors spread quickly
– Official announcement went out to experiment lists asking for their requirements
– Announcement in ITUM

  • Already received quite detailed responses, e.g. from ATLAS
  • User meetings have already taken place with ATLAS, DB, CDA

Strategy and design

  • 2 options for the implementation

– Fully Puppet-managed and monitored
– PaaS-like approach

  • Going for the first option for now

– PaaS in parallel
– Existing experience of the monitoring team with Heat
– Playing with the upcoming OpenShift service
– Involve DB on Demand staff

Status and resources

  • Plan for an as-big-as-possible shared instance

– Big in terms of applications/number of users
– Public data
– Different QoS offerings:

      • Local (SSD-based)
      • Network-attached with tunable IOPS

  • Dedicated instances only for specific requirements, e.g. specific security constraints or redundancy

Status and resources

  • Default setup

– Support ES 2.x only
– Kibana 4 provided “as is”
– 3 different host types:

      • Search nodes
      • Master nodes (redundant setup)
      • Data nodes (2 different types, depending on QoS)

  • Puppet setup

– Currently being developed
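
The three host types map naturally onto the node.master and node.data flags of ES 2.x. The following is a minimal sketch only, assuming one settings fragment per host type. The host-type keys and the node.storage attribute (with its ssd/network values) are hypothetical names chosen for illustration; they are not taken from the service's Puppet setup, which is still being developed.

```python
# Sketch only: role flags for the three host types, using the ES 2.x
# node.master / node.data settings. The "node.storage" custom attribute and
# its values are hypothetical labels for the two data-node QoS types.
HOST_TYPES = {
    "search": {"node.master": False, "node.data": False},
    "master": {"node.master": True, "node.data": False},
    "data_local_ssd": {"node.master": False, "node.data": True, "node.storage": "ssd"},
    "data_network": {"node.master": False, "node.data": True, "node.storage": "network"},
}


def render_settings(host_type):
    """Return elasticsearch.yml-style lines for one host type."""
    settings = HOST_TYPES[host_type]
    lines = []
    for key, value in sorted(settings.items()):
        # YAML booleans are lowercase; strings are emitted as-is.
        text = str(value).lower() if isinstance(value, bool) else value
        lines.append(f"{key}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    for name in HOST_TYPES:
        print(f"# {name}")
        print(render_settings(name))
```

A search node holds no data and is not master-eligible, so it only routes requests; tagging the two data-node types with a custom attribute is one possible way to steer indices to a given QoS class.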

Status and resources

  • Resources:

– Virtual machines only

      • No containers
      • No physical hardware

– Limited amount of resources to play with for now
– Pending request for more resources (RQF0536509)

Issues to address

  • Main issue: ACLs

– Used by SDC, required by others as well (e.g. DB)
– Security module available from ES
– Commercial, license fees depend on the number of nodes
– Free security module in alpha version

      • Started to test this

  • Need some resources available for serious testing

– Current quota of 25 VMs and 50 cores is not enough
– Need for m1.xlarge flavor (8 cores)
– Core quota already used up
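
To illustrate why the core quota is the bottleneck, the quota figures above can be turned into a short calculation. This is a sketch only: the function names are made up for illustration, and the numbers are the ones quoted on the slide (25 VMs, 50 cores, 8 cores per m1.xlarge).

```python
# Figures from the slide: quota of 25 VMs and 50 cores; m1.xlarge has 8 cores.
VM_QUOTA = 25
CORE_QUOTA = 50
CORES_PER_XLARGE = 8


def max_instances(vm_quota, core_quota, cores_per_vm):
    """Largest number of VMs of one flavor that fits both quotas."""
    return min(vm_quota, core_quota // cores_per_vm)


def limiting_quota(vm_quota, core_quota, cores_per_vm):
    """Name the quota that is exhausted first: 'cores' or 'vms'."""
    return "cores" if core_quota // cores_per_vm < vm_quota else "vms"


if __name__ == "__main__":
    n = max_instances(VM_QUOTA, CORE_QUOTA, CORES_PER_XLARGE)
    leftover = CORE_QUOTA - n * CORES_PER_XLARGE
    print(n, "m1.xlarge VMs fit;", leftover, "cores left over")
    print("limiting quota:", limiting_quota(VM_QUOTA, CORE_QUOTA, CORES_PER_XLARGE))
```

With these figures, at most six m1.xlarge VMs fit within the 50-core quota, far below the 25-VM limit.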

Issues to address

  • ES and Kibana versions

– Many customers rely on Kibana 3. We may have to support this somehow

  • Flume does not yet work with ES 2.x

– Patch available on GitHub
– Will start testing this once the test instance is on ES 2.x

  • Implementation of QoS on shared instance

– To be done
– Waiting for SSD-based resources

Timelines

  • Aiming at initial test instance by March
  • Several customers are keen to start testing

– ATLAS needs to move out of their ES resources within 2 months
– Agreed to give them access to a non-prod testing instance by March
– SDC is keen to move over asap
– DB and CDA are ready to test
– Batch monitoring is a good candidate as well

PaaS

  • Checked out how Amazon does it
  • Users would own their instances but IT owns the resources

– Create instances by giving only a few required parameters
– Shared instance could be run as PaaS by us

  • Several candidates on the market

– Started to play with OpenShift

      • Upcoming service which will be used for GitLab, Jenkins, ...
      • Very few steps needed to get an ES instance up

– Other options on the market

      • Heat, Cloudify, ...

Collaboration with external people

  • ES meetup at CERN

– Took place on 8/2/2016 at CERN
– Close collaboration with ATLAS in organising this event

  • Next: ElasticON

– 17-19 Feb. 2016, San Francisco
– Pablo will represent us there

Conclusions

  • A centralised ElasticSearch service is being set up

– Consolidation of existing instances is overdue
– Lots of interest from IT and the experiments

  • Aiming at first test instances in Q2 2016
  • Aiming at a production-ready service by Q4 2016
  • Also looking into PaaS options