Sumo Logic Confidential
Understanding Software System Behavior With ML and Time Series Data
QCon.ai SF – April 11, 2018
David Andrzejewski - @davidandrzej Engineering, Sumo Logic
Intro / context
– Sumo Logic since 2011
– Co-organizer: SF ML Meetup
– @davidandrzej on Twitter
– Postdoc at LLNL
– U Wisconsin
Continuous intelligence for machine data
Overview
Trouble in software paradise!
Big Data to the rescue?
DEBUG-level visibility, in production
Not so fast! “Could a Neuroscientist Understand a Microprocessor?”
Jonas & Kording (PLoS Comp Bio 2017)
“Grand challenge” problem
Using data to understand complex, dynamic, multi-scale systems
– Software
– Biological
– Social / economic
New measurements → new science
Operational time series telemetry: the basics
– “Four Golden Signals” (Google SRE book)
– Basic resources: CPU, memory, …
– More granular timings
– Event counts, cache miss rates, other internals…
– “push” agents/daemons (eg, StatsD)
– “pull” metrics endpoints (eg, Prometheus)
– TSDB (time series database)
– OSS / Commercial systems
Operational time series telemetry: why
Q: WTF is my system actually doing?
Monitoring & troubleshooting
Operational time series telemetry: example
“Metrics 2.0”–style key-value identifier
Time:   8:01  8:02  8:03  8:04  8:05  …
Value:  64    128   72    144   96    …
Actual data: sequence
Quantization: rollup / time-based aggregation
Raw event/observation data → coarser, more regular 1-minute aggregations → 1-hour aggregations, etc
Time (1-minute):  8:00  8:01  …     8:58  8:59
Value:            60.1  43.2  33.3  45.1  42.5
Time (1-hour):    6:00  7:00  8:00  9:00  10:00
Value (Min):      …     …     33.3  …     …
Aggregation: map from multiset of floats to some single-valued summary (eg, Min)
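A minimal sketch of the rollup idea in plain Python, assuming timestamps are `datetime` objects. The function name `hourly_rollup`, the calendar date, and the 8:30 timestamp for the 33.3 value are illustrative assumptions; the slide elides most of the hour.

```python
from collections import defaultdict
from datetime import datetime


def hourly_rollup(points, aggregate):
    """Group (timestamp, value) points by hour and summarize each group."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: aggregate(values) for hour, values in sorted(buckets.items())}


# Illustrative 1-minute values; the date and the 8:30 timestamp are assumed.
minute_points = [
    (datetime(2018, 4, 11, 8, 0), 60.1),
    (datetime(2018, 4, 11, 8, 1), 43.2),
    (datetime(2018, 4, 11, 8, 30), 33.3),
    (datetime(2018, 4, 11, 8, 58), 45.1),
    (datetime(2018, 4, 11, 8, 59), 42.5),
]

hourly_min = hourly_rollup(minute_points, min)
hourly_max = hourly_rollup(minute_points, max)
```

Passing a different `aggregate` (for example `max`, `sum`, or `len`) gives a different rollup of the same buckets.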
SRE percentiles
Percentile as guarantee
p99 < 2000 ms translates into unambiguous language:
“No more than 1% of customer requests take longer than 2 seconds to execute”
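The guarantee can be checked mechanically. This is a sketch using a nearest-rank percentile; the helper names `percentile` and `meets_slo` and the latency values are hypothetical, and other percentile definitions (eg, interpolated) may give slightly different answers.

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


def meets_slo(latencies_ms, p=99, limit_ms=2000):
    """True when the p-th percentile latency is below the limit."""
    return percentile(latencies_ms, p) < limit_ms


# Hypothetical data: 990 fast requests and 10 slow ones (exactly 1% slow).
latencies = [120] * 990 + [5000] * 10
ok = meets_slo(latencies)

# One more slow request pushes the slow fraction above 1%.
not_ok = meets_slo(latencies + [5000])
```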
Percentiles via CDF⁻¹ (inverse CDF)
p60 = -1.8 etc...
https://en.wikipedia.org/wiki/Normal_distribution
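A small sketch of reading a percentile off the inverse CDF with the standard library. The parameters μ = −2, σ² = 0.5 are an assumption: they match one of the curves in the Wikipedia figure and happen to reproduce the p60 ≈ −1.8 quoted above, but the slide does not say which distribution was plotted.

```python
from math import sqrt
from statistics import NormalDist

# Assumed distribution: mean -2, variance 0.5.
dist = NormalDist(mu=-2.0, sigma=sqrt(0.5))

# The p-th percentile is the inverse CDF evaluated at p / 100.
p60 = dist.inv_cdf(0.60)
p99 = dist.inv_cdf(0.99)
```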
Algebraic structure for fun and profit
Example: item counts (eg, word counts)
Aggregate of combined data = Combination of aggregates
Monoid homomorphism!
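A sketch of the property using word counts, where list concatenation combines the data and `Counter` addition combines the aggregates. The helper `count_words` and the sample batches are made up for illustration.

```python
from collections import Counter


def count_words(lines):
    """Aggregate: map a list of text lines to word counts."""
    return Counter(word for line in lines for word in line.split())


batch_a = ["error timeout", "ok"]
batch_b = ["ok ok", "error"]

# Aggregate of combined data ...
aggregate_of_combined = count_words(batch_a + batch_b)
# ... equals the combination of the per-batch aggregates.
combination_of_aggregates = count_words(batch_a) + count_words(batch_b)
```

Because the two results agree, each batch can be counted where it lives and only the small counters need to be merged.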
Percentile original sin: $f(X + Y) \neq f(X) \oplus f(Y)$
– p95 of dataset X
– p95 of dataset Y
Not a monoid homomorphism
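A sketch of why no combining function can exist: two pairs of datasets with identical per-dataset p95 values but different p95 for the combined data. The nearest-rank `percentile` helper and all of the data are illustrative assumptions.

```python
def percentile(values, p):
    """Nearest-rank percentile using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


x = list(range(1, 101))       # p95 = 95
y1 = [1000] * 100             # p95 = 1000
y2 = [0] * 94 + [1000] * 6    # p95 = 1000 as well

# Same inputs to any would-be combiner: (95, 1000) in both cases ...
same_inputs = (percentile(x, 95), percentile(y1, 95)) == (percentile(x, 95), percentile(y2, 95))

# ... but the true combined percentiles differ.
combined_1 = percentile(x + y1, 95)
combined_2 = percentile(x + y2, 95)
```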
Basic aggregation: across series
What is max write_latency of entire foobuzz cluster?
f = MAX( ) over host=foobuzz-1, host=foobuzz-2, host=foobuzz-3, …
Time:    8:01  8:02  8:03  8:04  8:05  …
Input:   64    128   72    144   96    …
         23    33    49    57    37    …
         46    101   78    58    39    …
         …     …     …     …     …     …
Output:  55.3  47.1  76.8  52.3  41.7
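A sketch of aggregating across series: at each timestamp, take the maximum over all hosts. Which row belongs to which host is an assumption. Note that the result below is the true maximum of the three rows shown, which does not match the slide's illustrative output values.

```python
# Assumed host-to-row assignment; values are the three rows shown on the slide.
series_by_host = {
    "foobuzz-1": [64, 128, 72, 144, 96],
    "foobuzz-2": [23, 33, 49, 57, 37],
    "foobuzz-3": [46, 101, 78, 58, 39],
}

# zip(*...) walks the series in lockstep, one timestamp at a time.
cluster_max = [max(values) for values in zip(*series_by_host.values())]
```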
Basic aggregation: across time (aka “fold”)
What is average queue depth of each foobuzz host over this time period?
f = AVG( ) per host: host=foobuzz-1, host=foobuzz-2, host=foobuzz-3
Time:  8:01  8:02  8:03  …  →  AVG
       64    128   72    …  →  103.4
       23    33    49    …  →  48.6
       46    101   78    …  →  62.1
       …     …     …     …
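A sketch of the fold across time: each series collapses to one number. Only the three visible points per host are used here, so the averages differ from the slide's values, which cover the full (elided) time period. The host-to-row assignment is again an assumption.

```python
from statistics import mean

# Only the first three points of each series are visible on the slide.
series_by_host = {
    "foobuzz-1": [64, 128, 72],
    "foobuzz-2": [23, 33, 49],
    "foobuzz-3": [46, 101, 78],
}

# Fold each series down to a single summary value.
avg_by_host = {host: mean(values) for host, values in series_by_host.items()}
```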
Time-shifted comparisons
deployment=production cluster=indexer host=foobuzz-21 metric=write_latency units=ms
How does write_latency for this foobuzz instance compare versus yesterday?
Time:              8:01  8:02  8:03  8:04  8:05  …
Now:               64    128   72    144   96    …
Timeshift (-24h):  23    12    18    37    24    …
[Chart: “Comparison” of Now vs. Timeshift, 8:01–8:05]
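A sketch of the time-shift comparison: re-key yesterday's points forward by 24 hours so they line up with today's timestamps, then subtract. The calendar date and the helper name `timeshift` are assumptions; the values are the ones from the slide.

```python
from datetime import datetime, timedelta


def timeshift(series, offset):
    """Re-key each point forward by `offset` so old data lines up with current timestamps."""
    return {ts + offset: value for ts, value in series.items()}


base = datetime(2018, 4, 11, 8, 1)  # assumed date
now = {base + timedelta(minutes=i): v for i, v in enumerate([64, 128, 72, 144, 96])}
yesterday = {
    base - timedelta(days=1) + timedelta(minutes=i): v
    for i, v in enumerate([23, 12, 18, 37, 24])
}

shifted = timeshift(yesterday, timedelta(hours=24))
# Point-by-point difference wherever both series have a value.
delta = {ts: now[ts] - shifted[ts] for ts in now if ts in shifted}
```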
Windowing data
Aka “grouping over time”
– QCon SF 2016 slides
– “Beyond Batch” blog posts Part 1, Part 2
Handling “missing” data
Reality: often messy!
pandas
Fancier model / ML based approaches
– “imputation” (statistics / econometrics)
– inference / sampling (probabilistic models)
[Plots: Original data, Fixed value (mean), Interpolation, Back fill, Forward fill]
(notebook code on Github)
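This is not the notebook code referenced above; it is a dependency-free sketch of the four simple strategies, with `None` marking a missing sample. The function names and data are illustrative. pandas provides equivalents via `Series.fillna`, `Series.ffill`, `Series.bfill`, and `Series.interpolate`.

```python
def forward_fill(values):
    """Carry the last observed value forward into gaps."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def back_fill(values):
    """Pull the next observed value backward into gaps."""
    return forward_fill(values[::-1])[::-1]


def mean_fill(values):
    """Replace every gap with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(values):
    """Fill interior gaps on a straight line between the nearest observed neighbors."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


raw = [10.0, None, None, 40.0, None, 60.0]
```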
Fixed-threshold alerting
“Wake somebody up if the site is down”
MACHINE SCALE = overwhelming complexity!
N ≈ one million series
$O(N^2)$ pairs to compare
expert human time and attention?
ML cheat sheet
Is machine learning right for you?
Do you know what you’re trying to accomplish? (YES / NO)
Can you do it with simple / deterministic analysis? (YES / NO)
Surprise: Your prediction is wrong!
Outlier detection via predictive modeling
KEY ASSUMPTIONS
1. In “steady-state”, data exhibit some regularity / predictability
2. Learn a model of this behavior
3. Major deviations from our expectation represent new underlying behavior or totally novel “exogenous shock”
4. These surprises are valuable to discover
“It’s tough to make predictions, especially about the future”
KEY Qs
1. Is behavior actually regular?
2. How to model behavior?
3. How major is “major”?
4. Are surprises actually valuable?
Simple example: rolling window
– predict as sliding window avg
– standardize on sliding window std dev
– very simple / naïve
– Doesn’t handle well:
– easy to visualize
– people can understand it
aka “Bollinger bands”
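A sketch of the rolling-window detector: predict with the trailing-window average, standardize by the trailing-window standard deviation, and flag large deviations. The window size, the threshold of 3, and the data are assumed for illustration.

```python
from statistics import mean, stdev


def rolling_outliers(values, window=5, threshold=3.0):
    """Flag indices that deviate from the trailing-window mean by more than threshold std devs."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center, spread = mean(history), stdev(history)
        if spread > 0 and abs(values[i] - center) > threshold * spread:
            flagged.append(i)
    return flagged


# Hypothetical series with one spike at index 7.
series = [10, 11, 9, 10, 12, 11, 10, 50, 11, 10, 9, 11]
outliers = rolling_outliers(series)
```

Once the spike enters the trailing window it inflates the standard deviation, so the bands widen for the next few points; this is one of the ways the naive approach can miss things.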
Little fancier: autoregression (AR)
Estimate future based on past
– combination of previous N
– “rolling avg” in this framework?
[Chart: foobuzz write_latency, 8:01–8:05]
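A sketch of the AR prediction step, which also answers the “rolling avg” question: a rolling average is the special case where all N weights equal 1/N. The function name, the data, and the hand-picked weights are hypothetical; in practice the coefficients would be estimated from data (eg, by least squares).

```python
def ar_predict(history, coefficients):
    """Predict the next value as a weighted sum of the most recent values.

    coefficients[0] weights the most recent observation.
    """
    recent = history[-len(coefficients):][::-1]
    return sum(c * y for c, y in zip(coefficients, recent))


write_latency = [4.0, 6.0, 8.0, 10.0, 12.0]  # made-up values

# Rolling average over the last N points: every weight is 1/N.
n = 4
rolling_avg_forecast = ar_predict(write_latency, [1.0 / n] * n)

# Hand-picked weights that extrapolate the last step: 2*y[t-1] - y[t-2].
trend_forecast = ar_predict(write_latency, [2.0, -1.0])
```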
Fixed-length feature vectors
NOTE: can add other variables (eg, host load) to context
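A sketch of turning a series into fixed-length feature vectors with a sliding window, optionally appending a context variable at prediction time. The helper name and the `host_load` values are hypothetical.

```python
def make_lag_features(values, n_lags, context=None):
    """Build (features, target) rows from the previous n_lags values of a series."""
    rows = []
    for t in range(n_lags, len(values)):
        features = list(values[t - n_lags:t])
        if context is not None:
            # Extra variable (eg, host load) observed at time t.
            features.append(context[t])
        rows.append((features, values[t]))
    return rows


latency = [64, 128, 72, 144, 96]
host_load = [0.2, 0.9, 0.3, 0.8, 0.5]  # made-up context variable
rows = make_lag_features(latency, n_lags=3, context=host_load)
```

Each row has the same length, so the rows can be fed to any standard regression model.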
Data with linear trend
Easy to fit a
Estimate future based on past
$y_t = a \cdot t + b + \epsilon_t$
Data with linear trend
simple differencing operation
Estimate future based on past
[Chart: foobuzz disk_used, 7:59–8:05]
[Chart: diff'ed series, 8:02–8:05]
$\Delta y_t = y_t - y_{t-1}$
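A sketch of differencing: subtracting each value from its successor turns a linear trend into a constant series, which is trivial to forecast. The `disk_used` values are invented to resemble the chart's axis range.

```python
def difference(values):
    """First difference: each value minus its predecessor."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


disk_used = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # made-up linear growth
diffed = difference(disk_used)

# Forecast the next point by adding the average step back onto the last value.
next_value = disk_used[-1] + sum(diffed) / len(diffed)
```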
Seasonality
Very common in data linked to human activity
Modeling seasonal data
Detection + Modeling
Detection:
– (p)ACF plots (Rob J. Hyndman)
– FFT spectrum
Modeling:
– “manual” adjustment
– ARIMA
– Seasonal Holt-Winters
– Fourier coefficients (eg, FB Prophet)
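A sketch of the detection side: compute the sample autocorrelation at several lags and pick the lag with the strongest correlation as the candidate period. The synthetic series (period of 6 samples) is made up for illustration; real data would need detrending and more care.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered)
    covariance = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return covariance / variance


# Synthetic series: a 6-sample pattern repeated 10 times.
pattern = [1, 3, 7, 9, 7, 3]
series = pattern * 10

acf = {lag: autocorrelation(series, lag) for lag in range(1, 13)}
best_lag = max(acf, key=acf.get)
```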
Latent state models
associated with a hidden state s
– Rabiner 1989
– Jurafsky & Martin book chapter
latent/hidden state information
Observed data produced by hidden mechanism
Bayesian Change Point Detection
Ryan Prescott Adams & David J.C. MacKay
Example: system occasionally does internal “maintenance”
[Chart: example time series, samples 1–149]
Why are we doing this again?
Which aspects don’t scale “manually”?
– capacity planning
– preventative maintenance
– model accurately characterizes “typical” behavior
– significant surprises may therefore be interesting
– useful if you:
Code & data resources
Try this at home!
– pandas
– StatsModels
– scikit-learn
– Keras
– Numenta Anomaly Benchmark (NAB)
– Yahoo Extensible Generic Anomaly Detection System (EGADS)
– Kaggle time series datasets
Identifying similar behaviors
We have lots of machines
Which other hosts may be
Metric similarity: naïve approach
“Hosts who look like X”
Are these “behaving similarly”?
– $d(x, y) = \lVert x - y \rVert_2$
– Spikes are “disjoint”
– Distance would be large
Intuition: can we slightly shift?
– Would be very similar…
[Chart: series A and B with offset spikes]
Metric similarity: Dynamic Time Warping (DTW)
best alignment
Diagram: “Fast Multisegment Alignments for Temporal Expression Profiles” (UW Madison), Comput Syst Bioinformatics Conf. 2008
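A sketch contrasting Euclidean distance with textbook dynamic-programming DTW on series whose spikes are offset in time. This is the classic unconstrained O(n·m) formulation, not the multisegment method from the cited paper; the host series are invented.

```python
def euclidean(a, b):
    """Point-by-point Euclidean distance between equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


host_a = [1, 1, 6, 1, 1, 1, 1, 1]  # spike early
host_b = [1, 1, 1, 1, 6, 1, 1, 1]  # same spike, shifted
host_c = [1, 1, 1, 1, 1, 1, 1, 1]  # no spike at all
```

Under Euclidean distance, `host_a` looks closer to the flat `host_c` than to `host_b`; under DTW the shifted spike aligns and `host_b` becomes the nearest neighbor.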
Top N similar hosts via DTW
Highly similar hosts
Build a graph of host-host similarity
Edge weight ∝ 1 / (DTW distance)
Spectral clustering
Tutorial (von Luxburg), sklearn implementation
Anatomy of a log message: Five W’s
When? Timestamp with time zone
Where? Host, module, code location
Who? Authentication context
Deriving time series from log data
Logs: what are they good for?
printf("%s Health status check: %s is %s", timestamp, hostid, hoststatus)
02/15/2014 10:03:16 UTC Health status check: zim-5 is OK
02/15/2014 10:03:11 UTC Health status check: gir-3 is OK
02/15/2014 10:03:07 UTC Health status check: gir-2 is TIMED OUT
02/15/2014 10:02:45 UTC Health status check: dib-1 is OK
$DATETIME Health status check: **** is ****
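A sketch of going from raw lines to per-minute template counts. The regular expression is hand-written for this one message type purely for illustration; it is an assumption, not how templates are inferred automatically in a real system.

```python
import re
from collections import Counter

# Hand-written pattern for the single message type shown above.
LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} UTC) "
    r"Health status check: (?P<host>\S+) is (?P<status>.+)$"
)
TEMPLATE = "$DATETIME Health status check: **** is ****"

logs = [
    "02/15/2014 10:03:16 UTC Health status check: zim-5 is OK",
    "02/15/2014 10:03:11 UTC Health status check: gir-3 is OK",
    "02/15/2014 10:03:07 UTC Health status check: gir-2 is TIMED OUT",
    "02/15/2014 10:02:45 UTC Health status check: dib-1 is OK",
]

counts = Counter()
statuses = Counter()
for line in logs:
    match = LINE.match(line)
    if match:
        minute = match.group("ts")[:16]  # eg "02/15/2014 10:03"
        counts[(minute, TEMPLATE)] += 1
        statuses[match.group("status")] += 1
```

Counting matches per minute per template yields one time series per template, which is the raw material for the count-based analyses that follow.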
Log data as (approximate) program execution trace
Logs emitted by printf() can vary with code path
Changes in printf()
behavior changes
Code + Behavior → Logs
Log cluster counts as multivariate time series
[Chart series: Health check OK; Request processed; Txn timeout, retry]
Distances over multivariate count vectors
Comparing printf counts
– track “distance” between recent time and historical average
– do rolling outlier on this quantity
[Chart: “Histogram Distance”, Vector 1 vs. Vector 2 over categories A–D]
$D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$
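A sketch of KL divergence between two count histograms. Add-one smoothing is an assumption added here so that a category missing from one histogram does not produce a division by zero; the counts are invented.

```python
from math import log


def kl_divergence(p_counts, q_counts, smoothing=1.0):
    """KL(P || Q) between two count histograms, with additive smoothing."""
    keys = sorted(set(p_counts) | set(q_counts))
    p_total = sum(p_counts.get(k, 0) + smoothing for k in keys)
    q_total = sum(q_counts.get(k, 0) + smoothing for k in keys)
    divergence = 0.0
    for k in keys:
        p = (p_counts.get(k, 0) + smoothing) / p_total
        q = (q_counts.get(k, 0) + smoothing) / q_total
        divergence += p * log(p / q)
    return divergence


historical = {"A": 30, "B": 20, "C": 10, "D": 5}      # made-up baseline counts
recent_similar = {"A": 28, "B": 22, "C": 9, "D": 6}   # close to baseline
recent_shifted = {"A": 5, "B": 10, "C": 20, "D": 30}  # very different mix

small = kl_divergence(recent_similar, historical)
large = kl_divergence(recent_shifted, historical)
```

Tracking this number over time gives a single series that a rolling outlier detector can watch.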
Multiclass event classification
IDEA: categorize anomalies by difference vector
– neighbors via cosine similarity
[Charts: “Histogram Distance” (Vector 1 vs. Vector 2) and “Difference”, categories A–D]
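A sketch of the classification idea: compute the difference between current and baseline counts, then label it with the most similar previously seen difference vector by cosine similarity. The incident labels and all vectors are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def difference_vector(current, baseline):
    return [c - b for c, b in zip(current, baseline)]


# Hypothetical labeled difference vectors from past incidents (categories A-D).
known_incidents = {
    "timeout storm": [0, 0, 25, -5],
    "traffic drop": [-20, -15, 0, 0],
}

baseline = [30, 20, 10, 5]
current = [29, 21, 30, 2]
diff = difference_vector(current, baseline)

# Nearest neighbor by cosine similarity.
label = max(known_incidents, key=lambda name: cosine(diff, known_incidents[name]))
```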
Some warnings on thresholds
Family-wise Error Rate (FWER)
– Over 1M series you can expect ~100 false positives
– Baron Schwartz (VividCortex), O’Reilly Strata San Jose 2018
– Just divide p-value threshold by number of tests (Bonferroni correction)
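A back-of-the-envelope sketch of the numbers. The per-series false-positive rate of 1e-4 is inferred from the slide's figures (~100 out of 1M) rather than stated on it, the family-wise target of 0.05 is assumed, and the FWER formula assumes independent tests.

```python
n_series = 1_000_000
per_test_alpha = 1e-4  # assumed; chosen so the expected count matches ~100

# Expected number of false alarms if every series is tested independently.
expected_false_positives = n_series * per_test_alpha

# Probability of at least one false alarm across the whole family.
fwer_uncorrected = 1 - (1 - per_test_alpha) ** n_series

# Bonferroni: divide the family-wise threshold by the number of tests.
family_alpha = 0.05  # assumed target family-wise error rate
bonferroni_alpha = family_alpha / n_series
fwer_corrected = 1 - (1 - bonferroni_alpha) ** n_series
```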
Finance: time series epistemology for “fun” and “profit”
“Predicting stock returns with random hybrid convolutional deep recurrent neural networks”
Rich source of errors
(DANGER!)
– Blog takedowns (blog 1, blog 2)
– Knight Capital
Sanity check: historical backtesting
Simulated “replay” of past data
financial domain
data interaction
machine data modeling (?)
Bayesian methods
(one) advantage: explicit uncertainty modeling
Plot from scikit-learn docs (Vincent Dubourg, Jake Vanderplas, Jan Hendrik Metzen)
Hierarchical Bayesian methods
Representations for Scalability
– Emily Fox (Univ of Washington)
Put a prior on your prior!
[Diagram: hierarchical model over time, machine, cluster]
Exploit structure?
Everything we’ve discussed, but more parameters and stuff
– Understanding the problem domain
– Framing the ML problem
– AWS Deep AR service (arXiv)
– Relevant flavors:
– (potential) advantage: certainly should have plenty of machine data to train…
(Probably something relevant has been published on arXiv during this talk…)
“In conclusion, machine data is a land of contrasts…”
Overfitting 101
Too much of a good thing
– Risk of “over”-fitting to idiosyncratic noise in your training data
– Actually degrades true predictive performance
[Chart: generalization error vs. training iteration, with “Overfitting!” region]
K-fold cross validation
Split data into k batches with train/test splits
against “new” data
See CMU lecture notes
Example: K=4
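A sketch of generating the K=4 splits by index. The helper name is hypothetical, and the folds here are contiguous and unshuffled, which is a simplifying assumption.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(n_samples=10, k=4))
```

Each model is trained on the `train` indices and scored on the held-out `test` indices; averaging the k scores gives the cross-validation estimate.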
Advanced overfitting: human-in-the-loop
Manual search: model selection, network architecture, hyperparameters, …
[Chart: “Test” accuracy (k-fold CV) for models A–D]
[Chart: “Human overfitting”: “Test” accuracy vs. true generalization for models A–D]
How to avoid this?
– Related to theory of differential privacy (Cynthia Dwork & Aaron Roth – pdf)
– The reusable holdout: Preserving validity in adaptive data analysis
– Generalization in Adaptive Data Analysis and Holdout Reuse
– Google Research blog post
P-hacking: FWER on steroids
Big data → easy to find “significant” results
doing this!