
Using Multi-System Monitoring Time Series to Predict Performance Events

Andreas Schörgenhumer, Mario Kahlhofer, Peter Chalupar, Hanspeter Mössenböck, Paul Grünbacher. 09.11.2018


  1. Using Multi-System Monitoring Time Series to Predict Performance Events Andreas Schörgenhumer Mario Kahlhofer Peter Chalupar Hanspeter Mössenböck Paul Grünbacher 09.11.2018

  2-8. Motivation: Train an ML model on a time series recorded over time t, then use it to predict on later data. This is straightforward for: • Single system • Single component • Univariate time series

  9-13. Motivation: The setting considered here instead has: • Multiple systems (…) • Multiple, interlinked components • Multivariate time series • Event-to-data connection. The ML model is trained on this data.

  14-17. Approach: Multi-system data and configs go into the preprocessing framework, which produces CSVs for ML. Steps: (1) Data (2) Preprocessing (3) Prediction

  18-23. (1) Data: 250 systems, 20-day export, 1-minute resolution
     Data model: one System has many Services; Services and Hosts are many-to-many; one Host has many Disks and many Network Interfaces
     Events: service slowdowns
     Host: 11 time series (CPU load, Memory available, SWAP available, …)
     Disk: 13 time series (Available, Read time, Write time, …)
     Network Interface: 10 time series (Bytes received, Bytes sent, Packets dropped, …)
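The figures on the data slide can be combined into a quick size estimate. The sketch below only does that arithmetic; the dictionary keys are illustrative names, and the per-series point count assumes a gap-free export, which the slides do not state.

```python
# Time series per monitored entity type, as listed on the data slide.
SERIES_PER_ENTITY = {"Host": 11, "Disk": 13, "NetworkInterface": 10}

# Total number of distinct time series types.
total_series = sum(SERIES_PER_ENTITY.values())

# 20-day export at 1-minute resolution (assumes no missing data points).
EXPORT_DAYS = 20
MINUTES_PER_DAY = 24 * 60
points_per_series = EXPORT_DAYS * MINUTES_PER_DAY

print(total_series)       # 34
print(points_per_series)  # 28800
```

The total of 34 agrees with the "34 time series" figure on the prediction slide. The real number of data points per system also depends on how many hosts, disks and network interfaces each system has, which the slides do not give.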

  24-26. (2) Preprocessing – Framework
     • Input: YAMLs (configurations/configs)
        • Contain all necessary data processing settings
        • Easily changeable due to YAML format
     • Output: CSVs (feature vectors)
        • Portable format, directly usable for ML
     Example config (input):
        systems:
          - "sys1"
          - "sys2"
        timeSeries:
          - CPU_LOAD
        from: "2018-01-19 09:00"
        to: "2018-02-02 09:00"
        ...
        leadTime: 0
        observationWindowsBoxes:
          CPU_LOAD:
            - size: 60
              step: 1
              aggregationFunctions:
                - "AVG"
              combinationFunctions:
                - "AVG"
        samplingMode: "PER_EVENT"
        missingDataPointMode: "NAN"
        addAttributes: true
        ...
     Example feature vectors (output):
        CPU_LOAD:AVG   System   Label
        0.95           sys1     Event
        0.71           sys2     No event
        0.90           sys2     Event
        0.87           sys2     No event
        0.84           sys1     No event
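As a rough illustration of how one feature-vector row such as `CPU_LOAD:AVG, System, Label` could be derived from an observation window, here is a minimal sketch. It is not the framework's implementation. The function name `window_before_event`, the made-up CPU values, and the choice to end the window just before the event minute are all assumptions.

```python
from statistics import mean

def window_before_event(series, event_minute, size, lead_time=0):
    """Return the `size` samples that end `lead_time` minutes before the event.

    `series` holds one value per minute (list index = minute).
    Assumption: the event minute itself is excluded from the window.
    """
    end = event_minute - lead_time  # exclusive end index of the window
    start = end - size
    if start < 0:
        raise ValueError("not enough history before the event")
    return series[start:end]

# 120 minutes of made-up CPU load: low for an hour, then high for an hour.
cpu_load = [0.5] * 60 + [0.9] * 60

# size: 60, leadTime: 0, aggregationFunctions: ["AVG"]
window = window_before_event(cpu_load, event_minute=120, size=60, lead_time=0)
row = {"CPU_LOAD:AVG": mean(window), "System": "sys1", "Label": "Event"}
print(row)
```

With a non-zero lead time the window simply shifts further into the past, so the feature is computed from data that was available that many minutes before the event.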

  27. (2) Preprocessing – Config Settings
     Setting                    Example
     Systems                    [sys1, sys2, ...]
     Time series                [Host: CPU_LOAD, Disk: AVAILABLE, ...]
     Time frame                 From: 19-01-2018 09:00, To: 02-02-2018 09:00
     Sampling mode              PER_EVENT, SLIDE_THROUGH
     Negative sampling source   NON_EVENT_SERVICES, EVENT_SERVICES, ...
     Lead time                  10 min
     Observation windows        60 min, AVG aggregation, AVG combination
     Missing data mode          DROP, NAN, LAST_VALUE, ...
     Metadata                   System, special attributes, ...
     ...                        ...
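The slides name the missing data modes DROP, NAN and LAST_VALUE but do not define them. The sketch below shows one plausible reading at the level of individual data points. The function `fill_missing` and its exact behavior are assumptions, and the framework may apply these modes differently (for example, by dropping whole samples).

```python
import math

def fill_missing(values, mode):
    """Handle missing samples (None) in a window according to `mode`.

    Assumed semantics, not taken from the slides:
      DROP       - remove missing points
      NAN        - replace missing points with NaN
      LAST_VALUE - carry the last seen value forward
    """
    if mode == "DROP":
        return [v for v in values if v is not None]
    if mode == "NAN":
        return [math.nan if v is None else v for v in values]
    if mode == "LAST_VALUE":
        filled, last = [], math.nan  # NaN until a first value is seen
        for v in values:
            if v is not None:
                last = v
            filled.append(last)
        return filled
    raise ValueError(f"unknown mode: {mode}")

samples = [1.0, None, 3.0]
print(fill_missing(samples, "DROP"))
print(fill_missing(samples, "LAST_VALUE"))
```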

  28-33. (2) Preprocessing – Example
     Components shown: Service, Host, Disk 1, Disk 2, Disk 3, Network
     Config:
        ...
        samplingMode: "PER_EVENT"
        leadTime: 5
        observationWindowsBoxes:
          CPU_LOAD:
            - size: 15
              aggregationFunctions:
                - "MIN"
                - "MAX"
              ...
          DISK_WRITE:
            - size: 30
              aggregationFunctions:
                - "AVG"
                - "MIN"
                - "MAX"
                - "STD_DEV"
              combinationFunctions:
                - "AVG"
                - "MIN"
                - "MAX"
                - "AVG"
          BYTES_SENT:
            - size: 5
              aggregationFunctions:
                - "NONE"
              ...
            - size: 30
              aggregationFunctions:
                - "AVG"
              ...
        ...
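The DISK_WRITE entry lists four aggregation functions and four combination functions, and the example host has three disks. One way to read this is that each aggregation is computed per disk over the window, and the matching combination function then merges the per-disk results into a single feature. The sketch below follows that reading. The pairing by list position, the feature naming, and the use of population standard deviation are assumptions, not details confirmed by the slides.

```python
from statistics import mean, pstdev

# Assumed mapping from config names to functions (STD_DEV as population std dev).
FUNCTIONS = {"AVG": mean, "MIN": min, "MAX": max, "STD_DEV": pstdev}

def disk_write_features(windows_per_disk, aggregations, combinations):
    """Aggregate each disk's window, then combine the results across disks.

    Assumption: the i-th aggregation is paired with the i-th combination.
    """
    features = {}
    for agg, comb in zip(aggregations, combinations):
        per_disk = [FUNCTIONS[agg](window) for window in windows_per_disk]
        # Hypothetical feature name; the framework's naming is not shown.
        features[f"DISK_WRITE:{agg}:{comb}"] = FUNCTIONS[comb](per_disk)
    return features

# Made-up write times for Disk 1, Disk 2 and Disk 3 (shortened windows).
windows = [[1.0, 3.0], [2.0, 4.0], [0.0, 10.0]]
features = disk_write_features(
    windows,
    aggregations=["AVG", "MIN", "MAX", "STD_DEV"],
    combinations=["AVG", "MIN", "MAX", "AVG"],
)
for name, value in features.items():
    print(name, value)
```

Combining across disks keeps the feature vector length fixed even when hosts have different numbers of disks, which may be why a combination step is needed at all.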

  34-35. (3) Prediction: 20 days, 250 systems, 34 time series, run through the preprocessing framework. Split along time t: 14 days for training, 6 days for testing.
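A 14-day/6-day split along the time axis can be sketched as follows. The function `split_by_time`, the row layout, and the start timestamp are illustrative assumptions; the slides do not state the export's start date.

```python
from datetime import datetime, timedelta

def split_by_time(rows, start, train_days=14):
    """Split rows chronologically: first `train_days` for training, rest for testing."""
    cutoff = start + timedelta(days=train_days)
    train = [r for r in rows if r["time"] < cutoff]
    test = [r for r in rows if r["time"] >= cutoff]
    return train, test

# Hypothetical start of the 20-day export.
start = datetime(2018, 1, 19, 9, 0)

# One made-up feature-vector row per day of the export.
rows = [
    {"time": start + timedelta(days=d), "CPU_LOAD:AVG": 0.5, "Label": "No event"}
    for d in range(20)
]

train, test = split_by_time(rows, start)
print(len(train), len(test))  # 14 6
```

Splitting by time rather than at random keeps all test data strictly after the training data, so the model is never evaluated on periods it has already seen.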
