Distributed Logging Architecture in Container Era




  1. Distributed Logging Architecture in Container Era
      LinuxCon Japan 2016, June 13, 2016
      Satoshi "Moris" Tagomori (@tagomoris)

  2. Satoshi "Moris" Tagomori (@tagomoris)
      • Fluentd, MessagePack-Ruby, Norikra, ...
      • Treasure Data, Inc.

  3. Topics
      • Microservices and logging in various industries
      • Difficulties of logging with containers
      • Distributed logging architecture
      • Patterns of distributed logging architecture
      • Case Study: Docker and Fluentd
      • Why OSS is important for logging

  4. Logging

  5. Logging in Various Industries
      • Web access logs
      • Views/visitors on media
      • Views/clicks on ads
      • Commercial transactions (EC, games, ...)
      • Data from devices
      • Operation logs from phone apps
      • Various sensor data

  6. Microservices and Logging
      (Figure: users → service (application) → logs, for a monolithic service and for microservices)
      • Microservices
        • many services produce data about a single user's access
        • logs about that access must be collected from many services to know what is happening
      • Monolithic service
        • one service produces all data about a user's access

  7. Logging and Containers

  8. Containers: "a must" for microservices
      • Dividing a service into services
        • each service requires fewer computing resources (VMs → containers)
      • Making services independent of each other
        • but it is very difficult :(
        • some dependencies must be resolved even in the development environment (containers on desktop)

  9. Redesign Logging: Why?
      • No permanent storage
      • No fixed physical/network address
      • No fixed mapping between servers and roles

  10. Containers: immutable & disposable
      • No permanent storage
      • Where to write logs?
        • files in the container
          → gone with the container instance 😟
        • a directory shared from the host
          → hosts are shared by many services ☹
      • TODO: ship logs from containers to somewhere else ASAP

  11. Containers: unfixed addresses
      • No fixed physical / network address
      • Where should we go to fetch logs?
        • Service discovery (e.g., Consul)
          → one more component 😟
        • rsync? ssh+tail? or ...? Is it installed in the container?
          → one more tool to depend on ☹
      • TODO: push logs from containers to somewhere

  12. Containers: instances per roles
      • No fixed mapping between servers and roles
      • How can we parse / store these logs?
        • Central repository of log syntax
          → very hard to maintain 😟
        • Label logs by source address
          → many containers/roles in a host ☹
      • TODO: label & parse logs at their source

  13. Distributed Logging Architecture

  14. Core Architecture
      • Collector nodes (Docker containers + agent)
      • Aggregator nodes
      • Destination (Storage, Database, ...)

  15. Collecting and Storing Data
      • Parse (collector)
        • Raw logs are not good for processing
        • Convert logs to structured data (key-value pairs)
      • Sort/Shuffle (aggregator)
        • Mixed logs are not good for scanning
        • Split the whole data stream into separate streams
      • Store (destination)
        • Format logs (records) as the destination expects
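The three roles above can be sketched in a few lines of Python. This is a minimal, in-process illustration, not Fluentd's implementation: the simplified log format, the tags `web.access` and `app.error`, and the functions `collect`, `shuffle`, and `store` are all hypothetical, and JSON lines stand in for whatever format a real destination expects.

```python
import json
import re
from collections import defaultdict

# Hypothetical simplified access-log format: "<client> <method> <path> <status>"
ACCESS_LOG = re.compile(r"(?P<client>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$")


def collect(tag: str, raw_line: str) -> tuple[str, dict]:
    """Collector: parse a raw line into key-value pairs and label it with a tag at the source."""
    match = ACCESS_LOG.match(raw_line)
    # Lines that do not match the expected format are kept as an unparsed message.
    record = match.groupdict() if match else {"message": raw_line}
    return tag, record


def shuffle(events: list[tuple[str, dict]]) -> dict[str, list[dict]]:
    """Aggregator: split one mixed stream into one stream per tag."""
    streams: dict[str, list[dict]] = defaultdict(list)
    for tag, record in events:
        streams[tag].append(record)
    return dict(streams)


def store(records: list[dict]) -> str:
    """Destination: format records as the destination expects (JSON lines in this sketch)."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


events = [
    collect("web.access", "10.0.0.1 GET /index.html 200"),
    collect("app.error", "database timeout"),
    collect("web.access", "10.0.0.2 POST /login 302"),
]
streams = shuffle(events)
print(store(streams["web.access"]))
```

Parsing happens once, where the log is produced, so the aggregator only routes already-structured records by tag and the destination step only formats them.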

  16. Scaling Logging
      • Network traffic
      • CPU load to parse / format
        • Parse logs on each collector (distributed)
        • Format logs on aggregators (to be distributed)
      • Capability
        • Make aggregators redundant
      • Controlling delay

  17. Patterns

  18. Aggregation Patterns
      (Figure: 2 × 2 grid, source aggregation NO / YES × destination aggregation NO / YES)

  19. Source Side Aggregation Patterns
      (Figure: w/o source aggregation, collector → aggregator; w/ source aggregation, collector → aggregate container → aggregator)

  20. Without Source Aggregation
      • Pros:
        • Simple configuration
      • Cons:
        • Fixed aggregator (endpoint) address
        • Many network connections
        • High load on the aggregator

  21. With Source Aggregation
      • Pros:
        • Fewer connections
        • Lower load on the aggregator
        • Less configuration in containers (just specify localhost)
        • Highly flexible configuration (changes are deployed only to the aggregate containers)
      • Cons:
        • Somewhat more resource usage (+1 container per host)
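The connection savings and the +1-container cost can be put into numbers. The sketch below assumes a simplified model in which every log-emitting process keeps one persistent connection to its next hop; `aggregator_connections` and `extra_containers` are illustrative names, not part of any tool.

```python
def aggregator_connections(hosts: int, containers_per_host: int,
                           source_aggregation: bool) -> int:
    """Connections arriving at the aggregator tier under a simplified model.

    Assumption: each log-emitting process holds exactly one persistent
    connection to its next hop.
    """
    if source_aggregation:
        # Containers send to the aggregate container on localhost; only that
        # one container per host connects out to the aggregators.
        return hosts
    # Every container connects to the aggregator endpoint directly.
    return hosts * containers_per_host


def extra_containers(hosts: int, source_aggregation: bool) -> int:
    """The cost side of the trade-off: one aggregate container per host."""
    return hosts if source_aggregation else 0


print(aggregator_connections(100, 20, source_aggregation=False))  # → 2000
print(aggregator_connections(100, 20, source_aggregation=True))   # → 100
print(extra_containers(100, source_aggregation=True))             # → 100
```

With the made-up figures of 100 hosts running 20 containers each, the aggregators see 100 connections instead of 2,000, paid for with 100 extra containers.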

  22. Destination Side Aggregation Patterns
      (Figure: w/o destination aggregation, collector → destination; w/ destination aggregation, collector → aggregator → destination)

  23. Without Destination Aggregation
      • Pros:
        • Fewer nodes
        • Simpler configuration
      • Cons:
        • Storage-side changes affect the collector side
        • Worse performance: many small write requests on storage

  24. With Destination Aggregation
      • Pros:
        • Collector-side configuration is free from storage-side changes
        • Better performance with fine tuning on the destination-side aggregators
      • Cons:
        • More nodes
        • Somewhat more complex configuration
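Part of the fine tuning on the destination side is buffering: many small records become fewer, larger write requests, while a time bound keeps the delay under control. Fluentd does this with its buffer plugins; the Python class below only sketches the idea, and `BufferedWriter`, `max_records`, and `max_delay` are hypothetical names, not Fluentd parameters.

```python
import time
from typing import Callable, List, Optional


class BufferedWriter:
    """Batch small records into fewer, larger writes (hypothetical sketch).

    A batch is flushed when it holds `max_records` records or when its oldest
    record has waited `max_delay` seconds, whichever comes first.
    """

    def __init__(self, write_batch: Callable[[List[dict]], None],
                 max_records: int = 1000, max_delay: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.write_batch = write_batch  # one call = one write request to storage
        self.max_records = max_records
        self.max_delay = max_delay
        self.clock = clock
        self.buffer: List[dict] = []
        self.first_at: Optional[float] = None

    def emit(self, record: dict) -> None:
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(record)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Called from emit() and meant to be called periodically, so quiet streams still flush."""
        if not self.buffer:
            return
        full = len(self.buffer) >= self.max_records
        stale = self.clock() - self.first_at >= self.max_delay
        if full or stale:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []
            self.first_at = None


# Usage with a fake clock so the example is deterministic.
writes: List[List[dict]] = []
now = [0.0]
writer = BufferedWriter(writes.append, max_records=3, max_delay=10.0,
                        clock=lambda: now[0])
for i in range(7):
    writer.emit({"n": i})   # seven records, but only two writes so far
now[0] = 11.0               # the last record has now waited longer than max_delay
writer.flush_if_due()
print([len(batch) for batch in writes])  # → [3, 3, 1]
```

A larger `max_records` means fewer write requests on the storage; a smaller `max_delay` means logs arrive sooner. That pair of knobs is where the trade-off between throughput and delay is tuned.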

  25. Scaling Patterns
      • Scaling Up Endpoints: HTTP/TCP load balancer, or huge queue + workers
        (Figure: collector nodes → load balancer → backend nodes)
      • Scaling Out Endpoints: round-robin clients
        (Figure: collector nodes → aggregator nodes)

  26. Scaling Up Endpoints
      • Pros:
        • Simple configuration in collector nodes
      • Cons:
        • Scaling up limit

  27. Scaling Out Endpoints
      • Pros:
        • Unlimited scaling by adding aggregator nodes
      • Cons:
        • Complex configuration
        • Requires client features for round-robin
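The client features for round-robin amount to something like the following: each batch goes to the next aggregator in turn, and an unreachable aggregator is skipped. This is a sketch assuming a hypothetical `send(endpoint, batch)` transport that raises `OSError` on failure; the host names are made up, and 24224 appears only because it is Fluentd's default forward port.

```python
from typing import Callable, List, Sequence


class RoundRobinForwarder:
    """Send each batch to the next aggregator in turn, skipping endpoints that fail."""

    def __init__(self, endpoints: Sequence[str], send: Callable[[str, list], None]):
        if not endpoints:
            raise ValueError("at least one aggregator endpoint is required")
        self.endpoints = list(endpoints)
        self.send = send  # hypothetical transport; raises OSError when an endpoint is down
        self.next_index = 0

    def forward(self, batch: list) -> str:
        # Try each endpoint at most once, starting at the current round-robin position.
        for attempt in range(len(self.endpoints)):
            index = (self.next_index + attempt) % len(self.endpoints)
            endpoint = self.endpoints[index]
            try:
                self.send(endpoint, batch)
            except OSError:
                continue  # endpoint unreachable: fall through to the next one
            self.next_index = (index + 1) % len(self.endpoints)
            return endpoint
        raise ConnectionError("all aggregator endpoints failed")


# Usage with a fake transport in which one aggregator is down.
down = {"agg2.example:24224"}
delivered: List[str] = []


def fake_send(endpoint: str, batch: list) -> None:
    if endpoint in down:
        raise OSError(f"{endpoint} unreachable")
    delivered.append(endpoint)


fwd = RoundRobinForwarder(["agg1.example:24224", "agg2.example:24224", "agg3.example:24224"], fake_send)
used = [fwd.forward([{"n": i}]) for i in range(4)]
print(used)
```

Adding an aggregator node only means adding an entry to `endpoints`, which is the scaling-out benefit; keeping that list current on every collector is the configuration cost listed above. Real clients also need health checks and retry backoff, which this sketch omits.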

  28. Destination aggregation vs. endpoint scaling
      • Without destination aggregation
        • Scaling up endpoints: systems in early stages, or using queues
        • Scaling out endpoints: impossible :( (collector nodes must know all endpoints → uncontrollable)
      • With destination aggregation
        • Scaling up endpoints: collecting logs over the Internet
        • Scaling out endpoints: collecting logs in a datacenter

  29. Case Studies

  30. Case Study: Docker+Fluentd
      • Destination aggregation + scaling up
        • Fluent logger + Fluentd
      • Source aggregation + scaling up
        • Docker json logger + Fluentd + Elasticsearch
        • Docker fluentd logger + Fluentd + Kafka
      • Source/Destination aggregation + scaling out
        • Docker fluentd logger + Fluentd

  31. Why Fluentd?
      • Docker Fluentd logging driver
        • Docker containers can send logs to Fluentd directly, with less overhead
      • Pluggable architecture
        • Various destination systems
      • Small memory footprint
        • Source aggregation requires +1 container per host
        • Little additional resource usage (< 100MB)

  32. Destination aggregation + scaling up
      • Sending logs directly over TCP with a Fluentd logger in application code
      • Same as the pattern used by New Relic

  33. Source aggregation + scaling up
      • Kubernetes: JSON logger + Fluentd + Elasticsearch
      • Applications write logs to STDOUT
      • Docker writes logs as JSON in files
      • Fluentd
        • reads logs from the files
        • parses JSON objects
        • writes logs to Elasticsearch
      • http://kubernetes.io/docs/getting-started-guides/logging-elasticsearch/
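With Docker's json-file logging driver, each line of a container's log file is a JSON object with `log` (the raw output line, usually including its trailing newline), `stream` (`stdout` or `stderr`), and `time`. The Python function below sketches the parsing step Fluentd performs in this pattern; it is not the Kubernetes or Fluentd configuration, and the `tag` argument, the example timestamps, and the rule of merging JSON printed by the application are assumptions for illustration.

```python
import json


def parse_docker_json_line(line: str, tag: str) -> dict:
    """Turn one json-file log line into a structured record (sketch)."""
    entry = json.loads(line)
    message = entry.get("log", "").rstrip("\n")
    record = {"tag": tag, "stream": entry.get("stream"), "time": entry.get("time")}
    try:
        payload = json.loads(message)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict):
        # Assumption: an application that prints JSON wants its fields kept as keys.
        record.update(payload)
    else:
        record["message"] = message
    return record


plain = '{"log":"GET /healthz 200\\n","stream":"stdout","time":"2016-06-13T00:00:00.000000000Z"}'
structured = ('{"log":"{\\"level\\":\\"error\\",\\"msg\\":\\"db timeout\\"}\\n",'
              '"stream":"stderr","time":"2016-06-13T00:00:01.000000000Z"}')
print(parse_docker_json_line(plain, "kube.web"))
print(parse_docker_json_line(structured, "kube.web"))
```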

  34. Source aggregation + scaling up
      • Docker fluentd logging driver + Fluentd + Kafka
      • Applications write logs to STDOUT
      • Docker sends logs to Fluentd on localhost
      • Fluentd
        • gets logs over TCP
        • pushes logs into Kafka

  35. Source/Destination aggregation + scaling out
      • Docker fluentd logging driver + Fluentd
      • Applications write logs to STDOUT
      • Docker sends logs to Fluentd on localhost
      • Fluentd
        • gets logs over TCP
        • sends logs to aggregator Fluentd nodes with round-robin load balancing

  36. What's the Best?
      • Writing logs from containers: there are several ways to do it
        • Docker logging driver
        • Writing logs to files + reading/parsing them
        • Sending logs from apps directly
      • Keep it scalable!
        • Source aggregation: Fluentd on localhost
        • Scalable storage (Kafka, external services, ...)
          • No destination aggregation + scaling up
        • Non-scalable storage (filesystems, RDBMSs, ...)
          • Destination aggregation + scaling out
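The recommendation on this slide can be restated as a small decision function. It is a sketch that encodes only the rule of thumb above; the `LoggingArchitecture` type and the `"up"`/`"out"` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggingArchitecture:
    source_aggregation: bool
    destination_aggregation: bool
    endpoint_scaling: str  # "up" or "out" (hypothetical labels)


def recommend(storage_is_scalable: bool) -> LoggingArchitecture:
    """Restate the slide's rule of thumb; source aggregation (Fluentd on localhost) is always used."""
    if storage_is_scalable:
        # e.g. Kafka or external services: collectors write to the scalable storage directly.
        return LoggingArchitecture(source_aggregation=True,
                                   destination_aggregation=False,
                                   endpoint_scaling="up")
    # e.g. filesystems or RDBMSs: aggregators batch writes and are added as load grows.
    return LoggingArchitecture(source_aggregation=True,
                               destination_aggregation=True,
                               endpoint_scaling="out")


print(recommend(storage_is_scalable=True))
print(recommend(storage_is_scalable=False))
```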

  37. Why Is OSS Important for Logging?

  38. Why OSS?
      • The logging layer is an interface
        • transparency
        • interoperability
      • Keep it scalable
        • number of nodes
        • number of types of sources/destinations

  39. Use OSS, Make Logging Scalable
