Using Machine Learning for Intelligent Storage Performance Anomaly Detection

Using Machine Learning for Intelligent Storage Performance Anomaly Detection Ramakrishna Vadla, IBM Archana Chinnaiah, IBM Acknowledgement : Sumant Padbidri, Anbazhagan Mani Agenda Market Estimates & Forecasts Applications in


  1. Using Machine Learning for Intelligent Storage Performance Anomaly Detection
     Ramakrishna Vadla, IBM
     Archana Chinnaiah, IBM
     Acknowledgement: Sumant Padbidri, Anbazhagan Mani

  2. Agenda
     • Market Estimates & Forecasts
     • Applications in Storage
     • Cloud Architecture
     • Anomaly Detection
     • Performance Anomaly Detection

  3. AI & ML - Market Estimates & Forecasts
     • Worldwide revenues for cognitive and AI systems will increase from $12.5B in 2017 to more than $46B in 2020.
     • IDC forecasts spending on AI and ML will grow from $12B in 2017 to $57.6B by 2021.
     • Machine learning patents grew at a 34% compound annual growth rate between 2013 and 2017, the 3rd-fastest growing category of all patents granted. Source: IFI Claims Patent Services (Patent Analytics). 8 Fastest Growing Technologies SlideShare Presentation.
     Source: http://www.forbes.com

  4. AI & ML - Market Estimates & Forecasts
     Why now?
     • Enormously increased data: 90% of data was created in the last couple of years
     • Substantially more powerful computer hardware: CPU, GPU
     • Cloud makes big data more widely accessible
     • Significantly improved algorithms
     Source: Deloitte Global Predictions 2018 Infographics

  5. Machine Learning Applications in Storage
     Applications:
     • Predictive analytics
     • Capacity forecasting (regression)
     • Power consumption in data centers (regression)
     • Tracking of known issues: learn from other customer issues (classification)
     • Predicting blocks to be accessed in the near future (recommendations)
     • Performance anomaly detection
     • Performance metrics analysis (time-series data analysis)
     • Automated triaging and root cause analysis (classification)
     • Log analysis (clustering)
     • Configuration best practices recommendations
     • Manual upgrades/automated upgrades and health checks
     • Configuration validation to avoid interruptions in service
     • Intelligent performance tuning
     Value proposition:
     • Prevent issues proactively before they occur
     • Avoid downtime and achieve 99.999% uptime
     • Cost efficiency: reduce storage and operational costs
     • Data storage optimization
     • Simplifying the support
     • Proactive notification of risks
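As an illustration of the "capacity forecasting (regression)" item, here is a minimal sketch that fits a straight line to daily used-capacity samples and estimates how many days remain until a volume fills up. The function names, the daily sampling interval, and the numbers are assumptions made for this example; they are not taken from the presentation.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(used_tb: List[float], capacity_tb: float) -> Optional[float]:
    """Estimate days left before usage reaches capacity (one sample per day).

    Returns None when usage is flat or shrinking, because a linear trend
    then never reaches the capacity limit.
    """
    days = [float(d) for d in range(len(used_tb))]
    slope, intercept = fit_line(days, used_tb)
    if slope <= 0:
        return None
    full_day = (capacity_tb - intercept) / slope
    return max(0.0, full_day - days[-1])


# Hypothetical daily usage in TB for a 50 TB volume.
daily_used_tb = [40.0, 41.0, 42.0, 43.0, 44.0]
remaining = days_until_full(daily_used_tb, capacity_tb=50.0)
print(f"Estimated days until full: {remaining}")
```

A linear trend is the simplest possible choice; real usage often has seasonality or step changes that a straight line will miss.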

  6. Cloud Architecture - Storage Analytics
     "The world's most valuable resource is no longer oil, but data" (www.economist.com)
     • Cloud-based scale-out architecture.
     • Storage systems support data collection at high frequencies (seconds, minutes).
     • More data available for analysis.
     • Data lake based on NoSQL, such as Cassandra, deployed on the cloud.
     • All clients send storage metric data to the cloud: performance, config and health data.
     • Multi-tenancy support.
     • Support for integration of ML tools.
     [Diagram: multiple clients send data to a cloud data lake; tools shown: Elastic Search, IBM Watson, Hadoop, Spark]

  7. Machine Learning - Anomaly Detection
     • Supervised Learning
       Predicts based on training data containing the desired outputs. Training data contains both normal and anomaly labels.
       Regression, Classification, Decision trees, Random forests, K-Nearest Neighbor, SVM
     • Unsupervised Learning
       Training data doesn't include desired outputs; the goal is to discover patterns. No labels are provided; the assumption is that anomalies are very rare compared to normal data.
       Clustering (K-Means, Hierarchical, DBSCAN), Time-series analysis, ARIMA
     • Semi-supervised Learning
       Training data includes a few desired outputs. Training data contains only normal labels.
     • Reinforcement Learning
       Rewards from a sequence of actions.
       Agent -> Action -> Environment -> Reward & State -> Agent (Markov Decision Process)
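The semi-supervised case, where training data contains only normal samples, can be sketched as follows: learn a profile from known-normal response times, then flag new values that fall far outside it. The z-score rule, the threshold of 3, and the sample values are assumptions chosen for illustration and are not the presenters' method.

```python
import statistics
from typing import List, Tuple


def fit_normal_profile(normal_samples: List[float]) -> Tuple[float, float]:
    """Learn mean and standard deviation from samples labelled as normal."""
    return statistics.mean(normal_samples), statistics.stdev(normal_samples)


def is_anomaly(value: float, profile: Tuple[float, float],
               z_threshold: float = 3.0) -> bool:
    """Flag a value whose z-score against the normal profile is too large."""
    mean, std = profile
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold


# Hypothetical response times in milliseconds, all labelled normal.
normal_response_ms = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.3, 5.1]
profile = fit_normal_profile(normal_response_ms)

for observed in (5.2, 9.0):
    print(observed, "anomaly" if is_anomaly(observed, profile) else "normal")
```

No anomalous examples are needed for training here, which matches the situation where anomalies are too rare to label reliably.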

  8. Storage Performance Challenges
     Bottlenecks:
     • Disk failure/inaccessible disks
     • Read/write I/O errors
     • Volume issues
     • Port masking
     • Configuration issues: host, storage subsystem, port, interoperability
     • Network congestion
     • Workload configurations
     • UPS battery failure
     • Port congestion
     Metrics:
     • I/O rate R/W
     • Data rate R/W
     • Response time R/W
     • Cache hit R/W
     • Data block size R/W
     • Port data rate R/W
     • Port-local node queue time
     • Port protocol errors
     Correlations:
     • CPU & network traffic
     • CPU & memory
     • Port & host counters
     • IOPs, read rate, CPU & memory

  9. Performance Anomaly Detection
     Clustering - outlier detection
     • K-Means
     • DBSCAN
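To make the clustering approach concrete, here is a small pure-Python sketch of DBSCAN in which points that end up in no cluster (noise) are treated as outliers. The two features (IOPS in thousands and response time in ms), the `eps` and `min_pts` values, and the data points are synthetic assumptions for this example; in practice a library implementation and feature scaling would normally be used.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster label per point; NOISE marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Synthetic samples: (IOPS in thousands, response time in ms).
points = [
    (5.0, 2.0), (5.1, 2.1), (4.9, 1.9), (5.2, 2.0), (5.0, 2.2),  # workload A
    (9.0, 4.0), (9.1, 4.1), (8.9, 3.9), (9.2, 4.0),              # workload B
    (5.0, 9.0),                                                  # slow outlier
]
labels = dbscan(points, eps=0.5, min_pts=3)
outliers = [points[i] for i, label in enumerate(labels) if label == NOISE]
print("labels:", labels)
print("outliers:", outliers)
```

Unlike K-Means, DBSCAN does not need the number of clusters up front and produces an explicit noise label, which is why it maps naturally onto outlier detection.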

  10. Performance Anomaly Detection
      Time Series Anomaly Detection
      • ARIMA - AutoRegressive Integrated Moving Average
      [Chart: IOPs rate with a marked anomaly]
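The idea behind ARIMA-based detection can be sketched with a deliberately simplified model: difference the series once, fit an AR(1) term by least squares, and flag points whose one-step forecast error is large. This corresponds to ARIMA(1,1,0) only; it has no moving-average term, and a full ARIMA fit would normally come from a statistics library. The 3-sigma threshold and the IOPS values are assumptions for illustration.

```python
import statistics
from typing import List, Tuple


def difference(series: List[float]) -> List[float]:
    """First-order differencing (the 'I' in ARIMA with d=1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of d[t] = c + phi * d[t-1]; returns (c, phi, sigma)."""
    x, y = diffs[:-1], diffs[1:]
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    c = mean_y - phi * mean_x
    residuals = [b - (c + phi * a) for a, b in zip(x, y)]
    return c, phi, statistics.pstdev(residuals)


def detect_anomalies(train: List[float], test: List[float], k: float = 3.0) -> List[int]:
    """Indices in `test` whose one-step forecast error exceeds k * sigma."""
    c, phi, sigma = fit_ar1(difference(train))
    history = list(train)
    anomalies = []
    for i, value in enumerate(test):
        last_diff = history[-1] - history[-2]
        forecast = history[-1] + c + phi * last_diff
        if abs(value - forecast) > k * sigma:
            anomalies.append(i)
            history.append(forecast)  # keep the anomaly out of the history
        else:
            history.append(value)
    return anomalies


# Synthetic IOPS samples: a training window assumed normal, then new data.
train_iops = [1000, 1010, 1005, 1015, 1012, 1020, 1018, 1025, 1022, 1030]
new_iops = [1027, 1035, 1400, 1040]
print("anomalous indices:", detect_anomalies(train_iops, new_iops))
```

Replacing a flagged value with its forecast keeps a single spike from distorting the next few predictions, at the cost of ignoring a genuine level shift.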

  11. Log Analysis - Anomaly Detection
      Pipeline: Log Collection -> Log Parsing -> Feature Extraction [timestamp, device, process state] -> Anomaly Detection (Time-series Analysis)
      Sample log line:
      2018-05-05 09:11:20.672 [<Device>] [<Thread>] [INFO] Processing complete.
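The parsing and feature-extraction steps can be sketched with a regular expression that matches the sample log line's layout and then counts events per minute, producing a series that a time-series detector could consume. The device and thread names, the `ERROR` level, and the per-minute bucketing are hypothetical choices for this example; the slide only shows the `[INFO]` line format.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Optional

# Layout assumed from the sample line:
# <timestamp> [<device>] [<thread>] [<level>] <message>
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<device>[^\]]*)\] \[(?P<thread>[^\]]*)\] \[(?P<level>[A-Z]+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one log line into a feature dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record: Dict[str, object] = dict(match.groupdict())
    record["timestamp"] = datetime.strptime(
        match.group("timestamp"), "%Y-%m-%d %H:%M:%S.%f"
    )
    return record


def events_per_minute(lines: Iterable[str], level: str) -> Counter:
    """Count log lines of the given level in one-minute buckets."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["level"] == level:
            minute = record["timestamp"].replace(second=0, microsecond=0)
            counts[minute] += 1
    return counts


# Hypothetical log lines; device and thread names are made up.
sample_lines = [
    "2018-05-05 09:11:20.672 [dev01] [t1] [INFO] Processing complete.",
    "2018-05-05 09:11:45.010 [dev01] [t2] [ERROR] Read timeout.",
    "2018-05-05 09:12:03.500 [dev02] [t1] [ERROR] Read timeout.",
    "not a log line",
]
first = parse_line(sample_lines[0])
error_counts = events_per_minute(sample_lines, level="ERROR")
print(first)
print(dict(error_counts))
```

Lines that do not match the pattern are skipped rather than raising, since collected logs usually contain stack traces and other free-form text.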

  12. Q & A Thank You
