Joy Zheng
MONITORING COMPLEX SYSTEMS: LESSONS FROM MONITORING 10K BANK INTEGRATIONS
#GHC18
WHY MONITORING
Run reliable services at scale
Understand service
PAGE 2 | GRACE HOPPER CELEBRATION 2018 PRESENTED BY ANITAB.ORG AND THE ASSOCIATION FOR COMPUTING MACHINERY
PERSONAL FINANCES
LENDING
CONSUMER PAYMENTS
BANKING AND BROKERAGE
BUSINESS FINANCES
INTEGRATION PARTNERS
          First Platypus Bank                       Tattersall Federal Credit Union
Hour 1    10 billion tries / 100 million failures   5 tries / 1 failure
Hour 2    10 billion tries / 200 million failures   5 tries / 5 failures
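The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration using the counts from the slide; the function and variable names are assumptions, not part of the talk. It shows that the large bank moves from a 1% to a 2% failure rate while adding 100 million failures, whereas the credit union moves from 20% to 100% while adding only 4, so neither a raw failure count nor a raw ratio is a comfortable alert signal for both institutions at once.

```python
# Counts copied from the slide: (tries, failures) for hour 1 and hour 2.
observations = {
    "First Platypus Bank": [(10_000_000_000, 100_000_000),
                            (10_000_000_000, 200_000_000)],
    "Tattersall Federal Credit Union": [(5, 1), (5, 5)],
}


def failure_rate(tries: int, failures: int) -> float:
    """Fraction of attempts that failed; 0.0 when nothing was attempted."""
    return failures / tries if tries else 0.0


def summarize(obs):
    """Per-institution failure rates plus the hour-over-hour change in failures."""
    summary = {}
    for name, hours in obs.items():
        rates = [failure_rate(tries, failures) for tries, failures in hours]
        extra_failures = hours[1][1] - hours[0][1]
        summary[name] = {"rates": rates, "extra_failures": extra_failures}
    return summary


summary = summarize(observations)
for name, row in summary.items():
    hour1, hour2 = row["rates"]
    print(f"{name}: {hour1:.0%} -> {hour2:.0%} "
          f"(+{row['extra_failures']:,} failures)")
```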
Standard Pipeline
Monitoring, alerting, and dashboards at scale for easy-to-generate metrics
Custom Pipeline
Metrics generation involving custom aggregation
server:
  tasks_processed_count{
    server="i-abcdef123",
    status="success",
    institution="FirstPlatypusBank"
  }
query:
  sum(rate(tasks_processed_count[5m]))
alert:
  sum(rate(tasks_processed_count{status!="success"}[5m])) by (institution) > 100
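To make the alert expression concrete, here is a rough Python sketch of what it computes: a per-second rate for each non-success series, summed by institution and compared against 100. The sample values, the second server name, and the `TattersallFCU` label are invented for illustration, and `simple_rate` is a deliberate simplification that ignores the counter-reset handling and extrapolation done by Prometheus's own `rate()`.

```python
from collections import defaultdict

# Hypothetical samples: (labels, [(unix_time, counter_value), ...]).
# Label names follow the slide; the numbers are made up.
SERIES = [
    ({"server": "i-abcdef123", "status": "success",
      "institution": "FirstPlatypusBank"}, [(0, 0), (300, 900_000)]),
    ({"server": "i-abcdef123", "status": "error",
      "institution": "FirstPlatypusBank"}, [(0, 0), (300, 45_000)]),
    ({"server": "i-abcdef456", "status": "error",
      "institution": "FirstPlatypusBank"}, [(0, 0), (300, 15_000)]),
    ({"server": "i-abcdef123", "status": "error",
      "institution": "TattersallFCU"}, [(0, 0), (300, 5)]),
]


def simple_rate(samples):
    """Per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0


def failing_rate_by_institution(series):
    """Roughly: sum(rate(...{status!="success"}[5m])) by (institution)."""
    totals = defaultdict(float)
    for labels, samples in series:
        if labels["status"] != "success":
            totals[labels["institution"]] += simple_rate(samples)
    return dict(totals)


def firing(series, threshold=100):
    """Institutions whose summed failure rate exceeds the threshold."""
    rates = failing_rate_by_institution(series)
    return sorted(name for name, rate in rates.items() if rate > threshold)


rates = failing_rate_by_institution(SERIES)
print(rates)
print(firing(SERIES))
```

With these invented numbers only the large institution crosses the fixed threshold, even though every recorded attempt at the small one failed, which is the same scale problem the two-bank comparison illustrates.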
route:
  receiver: team-monitoring-email
  group_by:
    - alertname
    - environment
  routes:
    - match_re:
        alert_type: ^(?:monitoring_uptime|prometheus)$
      routes:
        - receiver: team-monitoring-email
          match_re:
            environment: ^(?:testing|preprod)$
        - receiver: team-monitoring-email
          match_re:
            plaid_env: ^(?:testing|preprod)$
        - receiver: team-monitoring-pager
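The effect of this routing tree can be traced with a small Python sketch. It is an approximation written for illustration, not Alertmanager's implementation: it assumes a depth-first walk where the first matching child wins, child routes inherit the parent's receiver, a missing label is treated as an empty string, and settings such as `group_by` and `continue` are ignored. The function names are hypothetical.

```python
import re

# The routing tree from the slide, written as plain Python dicts.
ROUTE = {
    "receiver": "team-monitoring-email",
    "routes": [
        {
            "match_re": {"alert_type": r"^(?:monitoring_uptime|prometheus)$"},
            "routes": [
                {"receiver": "team-monitoring-email",
                 "match_re": {"environment": r"^(?:testing|preprod)$"}},
                {"receiver": "team-monitoring-email",
                 "match_re": {"plaid_env": r"^(?:testing|preprod)$"}},
                {"receiver": "team-monitoring-pager"},
            ],
        }
    ],
}


def matches(route, labels):
    """A route matches when every match_re pattern matches its label value."""
    return all(
        re.fullmatch(pattern, labels.get(name, "")) is not None
        for name, pattern in route.get("match_re", {}).items()
    )


def pick_receiver(route, labels, inherited=None):
    """Depth-first, first-match walk; children inherit the parent's receiver."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        if matches(child, labels):
            return pick_receiver(child, labels, receiver)
    return receiver


print(pick_receiver(ROUTE, {"alert_type": "prometheus", "environment": "production"}))
print(pick_receiver(ROUTE, {"alert_type": "prometheus", "environment": "testing"}))
```

Under these assumptions, monitoring alerts from testing or preprod go to email, and any other monitoring alert falls through to the final catch-all route and pages the team.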
Events per second processed
Metrics exported
Services monitored
Engineers who have contributed monitoring changes (on a team of 45)
Average delay from event to metrics generation
Build the end-to-end pipeline first
  Learn from real usage to avoid creating extra complexity
Make monitoring components independently usable
  Other services can make use of individual pieces of the pipeline
Tailor custom aggregation to the narrowest areas which require it
  Let services which don't need customization use the standard pipeline
Use standard components where possible
  Custom components have higher implementation costs and higher developer education costs