
Pronto Elasticsearch Extension Practice in eBay, by Donggeng Yu (PowerPoint presentation)



  1. Pronto Elasticsearch Extension Practice in eBay. Donggeng Yu, 12/07/2019, Pronto, eBay

  2. Agenda
     1 Overview of Elasticsearch in eBay
     2 Use Cases & Challenges
     3 Tool Extensions for Cluster Management
     4 Service Extensions for Cluster Capabilities

  3. Elastic Stack
     • ELKB
       ‒ Elasticsearch: search & aggregation
       ‒ Logstash: ETL
       ‒ Kibana: visualization
       ‒ Beats: data shipper
     • X-Pack
       ‒ Security, alerting, monitoring, reporting, machine learning, etc.
     • Use cases & OOTB solutions
       ‒ Logs / Metrics
       ‒ APM / Uptime
       ‒ SIEM / Endpoint Security
       ‒ Site Search / App Search / Enterprise Search
       ‒ Maps

  4. Pronto Ecosystem in eBay (diagram)

  5. 100+ clusters and 6k+ nodes, running on VMs (OpenStack) and containers (K8s)

  6. Agenda: 2 Use Cases & Challenges

  7. Use Cases in eBay
     • Use cases:
       ‒ Near-real-time search / aggregation: Virtual Shop / Tire Installation / Terapeak / SEO On-Site Traffic
       ‒ Metrics & logs: UFES / Ceilometer / SRE / UMP
       ‒ More than 20 TB/day for a single cluster

  8. Vertical Shop & Tire Installation

  9. Terapeak: eCommerce Data Insights
     • Terapeak
       ‒ SaaS-based tool that provides e-commerce data insights to online sellers
       ‒ Acquired by eBay
     • Tech stack
       ‒ Migrated from RDBMS + Solr to ELK
       ‒ S3 and Hadoop for data staging
       ‒ Spark for data ETL
       ‒ Kafka for the data queue
       ‒ Postgres for the data warehouse
       ‒ Elasticsearch for indexing and search
       ‒ ReactJS for the front-end application

  10. UFES: Anomaly Detection for SLB
     • Goal
       ‒ Unified Front-End Services (UFES): move eBay closer to users so that the world shops first on eBay. The UFES team built 8 new Internet Points of Presence (PoPs) across the globe.
       ‒ Traffic must be routed via the UFES PoPs by replacing the NetScaler hardware SEO load balancers with Envoy Proxy-based software load balancers (SLB).
     • Elastic Stack
       ‒ Filebeat + Kafka + Elasticsearch clusters
       ‒ Dashboards for monitoring and comparison
       ‒ Anomaly detection for SLB

  11. Ceilometer: IT Operations Analytics

  12. Challenges of Managing Cluster Fleets at Scale
     • Integrate with eBay's platform and follow its standards
       ‒ Configuration management & change management
       ‒ Full lifecycle management
     • Easy onboarding and integration
       ‒ Elasticsearch as a Service
       ‒ Free customers to focus on their domain business
     • Performance & high availability
       ‒ Search: site-facing application response time should be less than 100 ms
       ‒ Ingestion: 20 TB per day for a single cluster
       ‒ Different deployments, such as cross-region deployment
     • Cost control
       ‒ Hardware cost
       ‒ License fee (for features such as security, alerting and ML)
       ‒ Human resources
       ‒ Support (24/7 on-call support, on-site support, etc.)
     (Diagram: Performance, HA, Cost, Onboarding, Integration)

  13. Solutions for Challenges: Cluster Provision & Management
     • From VM to container
       ‒ VM (OpenStack): fixed flavor; Puppet/Foreman infrastructure; Puppet module for Elasticsearch
       ‒ Container (K8s): flexible flavor (request/limit); Operator pattern; Deployment + StatefulSet + Service
     • Best practices & different deployments
       ‒ Important system configuration & best practices
       ‒ Anti-affinity (high availability)
       ‒ Cross-region deployment (high availability)
       ‒ Flavor chosen by traffic (cost saving)
       ‒ Hot-warm architecture (cost saving)
       ‒ Separate LBs for write / read
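On Kubernetes, the anti-affinity practice comes down to a pod `affinity` stanza in the Elasticsearch StatefulSet. The following is a minimal sketch, not Pronto's operator code. The pod labels `app` and `cluster` are hypothetical. The hard rule uses the standard `kubernetes.io/hostname` topology key. The optional soft rule uses the `topology.kubernetes.io/zone` node label, which older clusters expose as `failure-domain.beta.kubernetes.io/zone` instead.

```python
import json


def es_pod_affinity(cluster_name: str, zone_spread: bool = False) -> dict:
    """Build a Kubernetes `affinity` stanza for the pods of one Elasticsearch cluster.

    The label keys "app" and "cluster" are hypothetical; use whatever labels
    the StatefulSet actually puts on its pods.
    """
    selector = {"matchLabels": {"app": "elasticsearch", "cluster": cluster_name}}
    anti = {
        # Hard rule: never schedule two pods of this cluster on the same host,
        # so one host failure costs at most one Elasticsearch node.
        "requiredDuringSchedulingIgnoredDuringExecution": [
            {"labelSelector": selector, "topologyKey": "kubernetes.io/hostname"}
        ]
    }
    if zone_spread:
        # Soft rule: prefer spreading pods across zones when capacity allows.
        anti["preferredDuringSchedulingIgnoredDuringExecution"] = [
            {
                "weight": 100,
                "podAffinityTerm": {
                    "labelSelector": selector,
                    "topologyKey": "topology.kubernetes.io/zone",
                },
            }
        ]
    return {"podAntiAffinity": anti}


if __name__ == "__main__":
    print(json.dumps(es_pod_affinity("pronto-demo", zone_spread=True), indent=2))
```

A "required" host rule combined with a "preferred" zone rule keeps pods schedulable when a zone runs short of capacity. Elasticsearch never places a primary and its replica on the same node, so with this rule in place a single host failure cannot take out both copies of a shard.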

  14. Solutions for Challenges: Tooling and Service Extension (diagram: Performance, HA, Cost, Onboarding, Integration)

  15. Agenda: 3 Tool Extensions for Cluster Management

  16. Use Case Onboarding
     • Capacity planning
       ‒ Use case and usage scenarios: data retention / active period
       ‒ Performance: index rate / search rate; document & bulk size
       ‒ Deployment & cost: How many nodes? What hardware configuration? What kind of deployment should be used?
       ‒ Best practices: software configuration; deployment in different regions; keep a margin so that traffic can grow without performance issues
     • Resource profile by node role

       Node role          Storage   Memory    CPU       Network
       Master             Low       Low       Low       Low
       Data               Extreme   High      High      Medium
       Ingest             Low       Medium    High      Medium
       Coordinator        Low       Medium    Medium    Medium
       Machine Learning   Low       Extreme   Extreme   Medium
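Once daily volume, retention and replica count are known, the capacity-planning questions become simple arithmetic. The following is a hypothetical estimate, not the formula used by the Pronto sizing tool. The 10% on-disk overhead, the 2 TB disk per data node and the 75% disk-usage ceiling (the "margin" for traffic growth) are all assumed values and should be replaced with measured ones.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInput:
    daily_ingest_gb: float          # raw data indexed per day
    retention_days: int             # how long the data must stay searchable
    replicas: int = 1               # replica copies per primary shard
    index_overhead: float = 1.1     # assumed on-disk size relative to raw size
    disk_per_node_gb: float = 2000  # assumed disk per data node
    max_disk_usage: float = 0.75    # assumed ceiling, leaving margin for growth


def total_storage_gb(s: SizingInput) -> float:
    """Disk needed for all primaries and replicas over the retention window."""
    return s.daily_ingest_gb * s.retention_days * (1 + s.replicas) * s.index_overhead


def data_nodes_needed(s: SizingInput) -> int:
    """Smallest data-node count whose usable disk holds total_storage_gb."""
    usable_per_node = s.disk_per_node_gb * s.max_disk_usage
    return math.ceil(total_storage_gb(s) / usable_per_node)


def ingest_mb_per_s(daily_ingest_gb: float) -> float:
    """Average ingest rate implied by a daily volume (decimal units)."""
    return daily_ingest_gb * 1000 / 86400


if __name__ == "__main__":
    plan = SizingInput(daily_ingest_gb=500, retention_days=7)
    print(round(total_storage_gb(plan)), "GB ->", data_nodes_needed(plan), "data nodes")
    print(f"20 TB/day is about {ingest_mb_per_s(20000):.0f} MB/s on average")
```

Under these assumed defaults, 500 GB/day kept for 7 days with one replica needs about 7,700 GB, which means 6 data nodes. The 20 TB/day mentioned earlier averages roughly 231 MB/s, and real peaks will be higher. Disk is only one axis: the node-role table above shows that memory and CPU can bind first, especially for ML or ingest nodes.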

  17. Onboarding Self-Service and Sizing Tool

  18. Customer Support
     • Support model
       ‒ Different SLAs for different use cases: search response time should be less than 100 ms; the cluster should NOT be RED
       ‒ 24/7 support for site-facing or Tier 2 and above use cases: SEC call / PagerDuty
     • Support cases
       ‒ Cluster in RED: a node is missing while the replica count is 0; dangling index
       ‒ Response time: full GC because of machine check error (MCE); too many shards and fields

  19. Data Ingestion Pipeline
     • Added value for customers
       ‒ Self-service, no coding or testing
       ‒ No onboarding required
     • Shared cluster
       ‒ 30+ use cases / 3 TB per day
     • Shared data assets
       ‒ Partitioned by application name
     • Shared dashboards
       ‒ 30+ dashboards
       ‒ 300+ charts/visualizations
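One way to partition a shared cluster by application name is to derive the target index from each event and send events through the Elasticsearch `_bulk` API. The following is a minimal sketch. The `logs-<app>-YYYY.MM.DD` naming scheme and the event fields `app` and `@timestamp` are hypothetical. Only the NDJSON layout (an action line followed by a source line, newline-terminated) is the real `_bulk` format.

```python
import json
from datetime import datetime
from typing import Dict, List


def index_for(app: str, timestamp: str) -> str:
    """Hypothetical naming scheme: one daily index per application.

    Elasticsearch index names must be lowercase, hence .lower(). The day is
    taken from the timestamp as given; normalize to UTC first if events
    carry mixed offsets.
    """
    day = datetime.fromisoformat(timestamp)
    return f"logs-{app.lower()}-{day:%Y.%m.%d}"


def bulk_body(events: List[Dict]) -> str:
    """Render events as an NDJSON body for the `_bulk` API."""
    lines = []
    for event in events:
        action = {"index": {"_index": index_for(event["app"], event["@timestamp"])}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(event))   # source document line
    # The bulk API requires every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body([{"app": "Checkout", "@timestamp": "2019-12-07T10:00:00+00:00", "msg": "ok"}]))
```

Putting the application name in the index name lets each use case get its own retention, templates and access control, even though all use cases share one cluster.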

  20. Simple Steps: Onboarding a New Use Case to the Service (figures: pom.xml, web.xml)

  21. Data Management & Optimization
     • Backup & restore
       ‒ Snapshot lifecycle management (Swift as the repository)
     • Time-series data
       ‒ Benefits of time-based indices: deleting an index is faster than delete-by-query
       ‒ Use a hot-warm architecture
       ‒ Close indices or force-merge read-only indices
       ‒ Time-series data: Terapeak vs. UFES (different needs)
     • Lifecycle management
       ‒ Central policy management / Web UI / OOTB policies
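Time-based indices make retention a matter of picking whole indices to drop. The following is a minimal sketch that assumes a hypothetical `<prefix>-YYYY.MM.DD` naming convention and a list of names such as the output of `GET _cat/indices`. Each returned name would then be removed with `DELETE /<index>`, which drops the index's files outright. By contrast, `_delete_by_query` marks documents deleted one by one and leaves the space to be reclaimed by later merges.

```python
import re
from datetime import date, datetime, timedelta
from typing import Iterable, List

# Hypothetical naming convention: "<prefix>-YYYY.MM.DD", one index per day.
DAILY_INDEX = re.compile(r"^.+-(?P<day>\d{4}\.\d{2}\.\d{2})$")


def expired_indices(names: Iterable[str], today: date, retention_days: int) -> List[str]:
    """Return daily indices whose day is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        match = DAILY_INDEX.match(name)
        if match is None:
            continue  # not a time-based index: never touch it
        try:
            day = datetime.strptime(match.group("day"), "%Y.%m.%d").date()
        except ValueError:
            continue  # looks like a date but is not a valid one
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logs-app-2019.11.30", "logs-app-2019.12.01", "logs-app-2019.12.07", ".kibana"]
    print(expired_indices(names, date(2019, 12, 7), retention_days=6))
```

Because retention is applied per index name, different retention periods for different use cases (for example Terapeak vs. UFES) only need a different `retention_days` per prefix.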

  22. Index Management Tool vs. Curator vs. ILM

       Function                Curator   Pronto Index Mgmt. Tool   Elastic ILM
       High availability       N/A       YES                       YES
       Web UI                  N/A       YES                       YES
       Version compatibility   N/A       2.x/5.x/6.x/7.x           6.8+
       Multi-cluster           N/A       YES                       N/A
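As a point of comparison, the Elastic ILM column corresponds to policies defined through `PUT _ilm/policy/<name>`. The following is a minimal sketch of a hot-warm-delete policy body, not a Pronto OOTB policy. The `box_type` node attribute and all ages and sizes are assumptions. With rollover, each later phase's `min_age` is measured from the time the index rolled over.

```python
import json


def hot_warm_delete_policy(max_size="40gb", max_age="1d", warm_after="7d", delete_after="30d"):
    """Body for `PUT _ilm/policy/<name>`; all defaults are assumed values."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new index when either limit is reached;
                        # max_size counts the total size of the index's primaries.
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        # "box_type" is an assumed custom node attribute
                        # (node.attr.box_type: warm) set on the warm tier.
                        "allocate": {"require": {"box_type": "warm"}},
                        # Force-merge the now read-only index to one segment per shard.
                        "forcemerge": {"max_num_segments": 1},
                    },
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(hot_warm_delete_policy(), indent=2))
```

ILM policies live inside one cluster, which is consistent with the "Multi-cluster: N/A" entry in the table. A central tool has to push a body like this to every cluster it manages.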

  23. Diagnostic Tool
     • Features
       ‒ Find improper settings or usage
       ‒ Job scheduler & diagnostic report for potential issues
     • Rules
       ‒ Too many indices / too many shards / index has too many fields
       ‒ Shard size check (20 GB to 40 GB)
       ‒ Imbalanced shards
       ‒ Replica number should be greater than 0
       ‒ Node missing / rack ID attribute missing / minimum master nodes
       ‒ Machine check error / server disk full
       ‒ Alias & index template checking
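Several of these rules can be checked directly against `_cat` API output. The following is a hypothetical sketch, not the Pronto Diagnostic Tool. It reads rows shaped like `GET _cat/indices?format=json` (fields `index`, `rep`) and `GET _cat/shards?format=json&bytes=b` (fields `index`, `shard`, `prirep`, `store`, `node`). The 1.5x imbalance threshold is an assumed value.

```python
from collections import Counter

GIB = 1024 ** 3


def diagnose(indices, shards, min_gb=20.0, max_gb=40.0, imbalance_ratio=1.5):
    """Apply a few diagnostic rules to `_cat` API rows and return findings."""
    findings = []

    # Rule: replica number should be greater than 0.
    for row in indices:
        if int(row["rep"]) == 0:
            findings.append(f"{row['index']}: 0 replicas")

    # Rule: primary shard size should stay within [min_gb, max_gb].
    for row in shards:
        if row["prirep"] != "p" or not row.get("store"):
            continue  # skip replicas and unassigned shards (store is null)
        size_gb = int(row["store"]) / GIB
        if not min_gb <= size_gb <= max_gb:
            findings.append(f"{row['index']}[{row['shard']}]: primary is {size_gb:.1f} GB")

    # Rule: shards should be balanced across nodes.
    per_node = Counter(row["node"] for row in shards if row.get("node"))
    if per_node:
        mean = sum(per_node.values()) / len(per_node)
        for node, count in per_node.items():
            if count > imbalance_ratio * mean:
                findings.append(f"{node}: {count} shards (cluster mean {mean:.1f})")
    return findings
```

Each rule is independent and returns human-readable findings, so a scheduler can run the same function on every cluster and collect the results into a report.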

  24. Performance & User Scenarios
     • Many factors:
       ‒ Index / shard
       ‒ Query / scripting
       ‒ Mapping / settings

       Behavior        Use cases
       Index heavy     Logging / Metrics / Security / APM
       Search heavy    App Search / Site Search / Analytics
       Update heavy    Caching / Systems of Record

  25. Performance Issues & Optimization
     • Wildcard search
       ‒ Customers used patterns beginning with * and ?
       ‒ Avoid using * or ? at the beginning of patterns
     • Stopwords & shard size
       ‒ Reindex with the stop words
       ‒ Use more shards to improve throughput
     • Too many indices / shards / fields
       ‒ Close or delete unused indices
       ‒ Improve the document modeling
       ‒ Disable dynamic mapping
     • Performance optimization
       ‒ Disable swapping & give memory to the file system cache
       ‒ Unset or increase the refresh interval
       ‒ Disable refresh and replicas for initial loads
       ‒ Use auto-generated IDs
       ‒ Disable the features you do not need
       ‒ Don't use the default dynamic string mapping
       ‒ Watch your shard size / shrink indices
       ‒ Force merge
       ‒ Pre-index data
       ‒ Avoid scripts
       ‒ Force-merge read-only indices
       ‒ Warm up global ordinals
       ‒ Replicas might help with throughput, but not always
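Two of these items translate into small, checkable pieces: the index settings around an initial bulk load, and a guard against leading wildcards. The following is a minimal sketch. The settings names `index.refresh_interval` and `index.number_of_replicas` are real Elasticsearch settings, applied with `PUT /<index>/_settings`. The helper functions and the restore values (1 replica, default refresh) are assumptions.

```python
import json
from typing import Optional


def initial_load_settings() -> dict:
    """Settings for `PUT /<index>/_settings` before a large initial bulk load."""
    # "-1" disables periodic refresh; 0 replicas means each document is
    # indexed once, and replicas are rebuilt afterwards by copying segments.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def post_load_settings(replicas: int = 1, refresh_interval: Optional[str] = None) -> dict:
    """Settings to restore once the load finishes.

    A refresh_interval of None serializes to JSON null, which resets the
    setting to its default ("unset") instead of pinning a value.
    """
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


def has_leading_wildcard(pattern: str) -> bool:
    """Flag patterns starting with * or ?, which force a scan of the field's terms."""
    return pattern[:1] in ("*", "?")


if __name__ == "__main__":
    print(json.dumps(initial_load_settings()))
    print(json.dumps(post_load_settings()))
```

Until `post_load_settings` is applied, the index has no redundancy. A failed node during the load therefore means restarting the load, which is usually an acceptable trade for a one-off initial import.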

  26. Performance Testing Tool
     • Performance testing
       ‒ Testing data
       ‒ Testing scripts
       ‒ Test report for analysis
     • Web-based tool
       ‒ Built on Gatling
       ‒ Web UI to select the testing scripts and testing data
       ‒ Test report for analysis
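A test report becomes actionable once its latency samples are compared against the SLA from the support model (search response time under 100 ms). Gatling computes percentiles in its own reports. The following is an independent sketch that uses the nearest-rank method on raw latencies in milliseconds. The choice of p99 as the SLA percentile is an assumption.

```python
import math
from typing import Dict, List


def percentile(samples_ms: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def sla_report(samples_ms: List[float], sla_ms: float = 100.0, p: float = 99) -> Dict:
    """Summarize whether the p-th percentile latency stays under the SLA."""
    value = percentile(samples_ms, p)
    return {"p": p, "latency_ms": value, "meets_sla": value < sla_ms}


if __name__ == "__main__":
    print(sla_report([12.0, 35.5, 48.2, 51.0, 97.3]))
```

Nearest-rank always returns an observed sample rather than an interpolated value. This keeps the report easy to cross-check against the raw request logs.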

  27. Agenda: 4 Service Extensions for Cluster Capabilities

  28. Solution and Security Plugin for Elasticsearch
     • Pronto Security Plugin
       ‒ TLS for encrypted communications
       ‒ Cluster / index-level RBAC
       ‒ Follows eBay's standards: API keys for applications, 2FA for user login
       ‒ Audit logs
     • Security considerations
       ‒ Authentication / RBAC
       ‒ Certificate retention
       ‒ Firewall / IP whitelist
       ‒ Vulnerability management
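Cluster- and index-level RBAC typically reduces to matching the requested action (and index) against role grants. The following is a hypothetical model, not the Pronto Security Plugin's design. The `Role` class, the privilege names and the use of shell-style index patterns are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Role:
    """Hypothetical role: cluster privileges plus per-index-pattern privileges."""
    cluster: FrozenSet[str] = frozenset()
    indices: Dict[str, FrozenSet[str]] = field(default_factory=dict)


def is_allowed(roles: List[Role], action: str, index: Optional[str] = None) -> bool:
    """Deny by default; allow if any role grants `action` (on `index`, if given)."""
    for role in roles:
        if index is None:
            # Cluster-level request: only cluster privileges apply.
            if action in role.cluster:
                return True
            continue
        # Index-level request: any matching pattern with the privilege grants it.
        for pattern, privileges in role.indices.items():
            if fnmatchcase(index, pattern) and action in privileges:
                return True
    return False


if __name__ == "__main__":
    reader = Role(indices={"logs-checkout-*": frozenset({"read"})})
    print(is_allowed([reader], "read", "logs-checkout-2019.12.07"))
```

Deny-by-default with additive grants means adding a role can only widen access. This makes audit-log review simpler: every allowed request can be traced to a specific grant.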

  29. X-Pack Subscription
     • License cost
       ‒ The license fee is based on the node count
     • How to extend
       ‒ Develop the Kibana application
       ‒ Integrate with the alerting and anomaly detection service
