WWW.MANIFOLD.AI
Optimizing Physical Assets with Machine Learning
Rajendra Koppula
About Us
Manifold is a full-service AI development services firm that accelerates AI development for leading companies. Our team has a proven ability to design, build, deploy, and manage data applications at scale.
Audience & Agenda
Audience
- Practitioners with some knowledge of the PyData ecosystem and ML workflows
Slides
www.manifold.ai/2019SensorsExpo
Agenda
- Introduction & Motivation
- Design Patterns
- Conclusion & Key Takeaways
Lean AI
- 1. Build the simplest E2E system first.
- 2. Iterate as quickly as possible.
Case Study
- Leading industrial services company
- “We want to use AI to be more efficient across our operations. The
vision is to create a system for making better decisions.”
Business Understanding Workshop
- I get paid for uptime. How can I make that higher?
- Unplanned maintenance costs me a lot of time and money and erodes customer satisfaction. How can I prevent that?
- I roll trucks every 30 days for preventative maintenance, no matter what. Can I go less often?
- I have sensors on all these units and I’ve been collecting data for a few years. I want to get more value out of this instrumentation.
- Many, many more...
What are your business problems (that you think AI can help you with)?
AI Uncertainty Principle
AI value ≤ business value × data quality × predictive signal
Multiplicative! If any term goes to 0, value goes to 0!
Create an AI Specification
- Predict major faults where machine is
continuously down for >2 hours.
- Predict whether major fault will
happen over a horizon of 1, 2, … , 5 days.
- Use machine-generated data as input
features, e.g., ~30 continuous time series, ~20 discrete time series.
- Use demographic data about
machines, e.g., unit type, location, etc.
- Do not use human-generated service
data because of data quality issues.
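The "major fault" part of this specification can be sketched in a few lines. This is a minimal illustration, assuming the status register has already been reduced to one boolean per minute (True = faulted); the function name `major_fault_starts` and the toy data are hypothetical, not from the original project.

```python
import numpy as np

def major_fault_starts(faulted, min_minutes=120):
    """Return start indices of runs where `faulted` stays True for more than
    `min_minutes` consecutive one-minute samples."""
    faulted = np.asarray(faulted, dtype=bool)
    # Pad with False so every run has a rising and a falling edge.
    padded = np.concatenate(([False], faulted, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # first faulted index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last faulted index
    return starts[(ends - starts) > min_minutes]

# Toy status register: a 30-minute fault (cleared quickly) and a 180-minute fault.
status = np.zeros(1000, dtype=bool)
status[100:130] = True
status[400:580] = True
starts = major_fault_starts(status)
```

Only the 180-minute run qualifies as a major fault; the 30-minute run is treated as a lesser fault and ignored.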
Typical ML Workflow
Diagram stages: Database / S3, Preprocessing, Feature Engineering, Modeling, Model Deployment
Why This Target?
Lookback = 2 days, Horizon = 5 days
- Clear business value: the company gets paid for uptime, and a major fault often leads to a customer call and a truck roll.
- Acceptable data quality: the target is purely machine generated, i.e., it can be read from the status register.
- "Major" is defined as >2 hours continuously in a faulted state. Most lesser faults are cleared automatically or manually before this time.
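As a rough illustration of how the horizon turns into a label, the sketch below assigns y = 1 when a major fault starts within the horizon after a sample ends. The helper `horizon_label`, the minute-index time base, and the example fault time are assumptions made for illustration only.

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60

def horizon_label(fault_start_minutes, t, horizon_days=5):
    """y = 1 if any major fault starts in the window (t, t + horizon]."""
    starts = np.asarray(fault_start_minutes)
    horizon = horizon_days * MINUTES_PER_DAY
    return int(np.any((starts > t) & (starts <= t + horizon)))

# One label per horizon of 1..5 days for a sample ending at minute t.
fault_starts = [10_000]            # hypothetical major-fault start (minute index)
t = 10_000 - 3 * MINUTES_PER_DAY   # the sample ends exactly 3 days before the fault
labels = {d: horizon_label(fault_starts, t, d) for d in range(1, 6)}
```

With the fault three days away, the 1-day and 2-day labels are 0 and the 3-, 4-, and 5-day labels are 1.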
AI Uncertainty Principle
AI value ≤ business value × data quality × predictive signal
De-risked as much as possible. Have to take leap of faith now.
Data Engineering is the Foundation
Figure label: "Foundation" (source: Monica Rogati)
Spec the Requirements
The Constants
- AI/ML is software engineering.
- You will develop locally.
- You will develop in the cloud.
- You will collaborate.
- You will experiment.
- You will deploy.
The Variables
- Volume of data
- Velocity of data
- Source of data
- Important features
- Downstream integrations
- Prediction velocity
- Training velocity
Architecting the Solution
The Constants
- Docker-first ML with Orbyter
- github.com/manifoldai/orbyter-docker
The Variables
- Sampling
- How to generate training and test data?
- TS data subtleties
- Architecting for Volume
- Spark + Dask + HDF5
- Modeling
- Trees and Interpretability
- Feature Engineering
- Evaluation
- Deployment
What is the ML Problem?
- 50+ sensors logged @ 1 minute intervals 24/7
- Pose as supervised learning problem
Sampling for Supervised Learning
- Train a supervised learning algorithm on historical examples. It learns the patterns that precede failures and looks for them in the future.
- This requires passing in historical samples in a clean manner, by slicing the time series the way we need.
Diagram: ETL produces the samples X and the labels y
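The slicing step can be sketched as follows. This is a minimal example for a single unit, assuming its data is already a (channels, time) array and that major-fault start indices are known; `make_samples`, the stride, and the toy sizes are hypothetical choices, not the original sampler.

```python
import numpy as np

def make_samples(series, fault_starts, lookback, horizon, stride):
    """Slice one unit's (channels, time) array into lookback windows X and
    binary labels y (1 if a major fault starts within `horizon` steps
    after the window)."""
    n_channels, n_steps = series.shape
    fault_starts = np.asarray(fault_starts)
    X, y = [], []
    # `end` is the first time step after the lookback window.
    for end in range(lookback, n_steps - horizon + 1, stride):
        X.append(series[:, end - lookback:end])
        hit = (fault_starts >= end) & (fault_starts < end + horizon)
        y.append(int(np.any(hit)))
    return np.stack(X), np.array(y)

rng = np.random.default_rng(0)
series = rng.normal(size=(3, 200))   # 3 channels, 200 time steps of toy data
X, y = make_samples(series, fault_starts=[120], lookback=20, horizon=50, stride=10)
```

Each row of X only contains data from before its label window, so the model never sees the period it is asked to predict.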
Preventing Data Leakage
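- Separate the data into a training set (70%) and a validation set (30%), with no data leakage between them.
- Prevent overfitting.
Figure labels: 700k samples, 200k samples
X: 700k × 54 × 2880 (samples × channels × time steps; 2880 one-minute steps = the 2-day lookback)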
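The slides do not say how the split was made. One common way to avoid leakage with overlapping time-series windows is to split by time and leave a gap at the boundary; the sketch below assumes that approach, and `time_ordered_split` and its `gap` parameter are hypothetical.

```python
import numpy as np

def time_ordered_split(sample_end_times, train_frac=0.7, gap=0):
    """Return (train_idx, valid_idx): the earliest `train_frac` of samples
    train, later ones validate. Samples within `gap` time steps after the
    cutoff are dropped so overlapping windows cannot straddle the boundary."""
    times = np.asarray(sample_end_times)
    order = np.argsort(times, kind="stable")
    n_train = int(round(train_frac * len(times)))
    train_idx = order[:n_train]
    cutoff = times[train_idx].max()
    later = order[n_train:]
    valid_idx = later[times[later] > cutoff + gap]
    return train_idx, valid_idx

end_times = np.arange(0, 1000, 10)   # 100 hypothetical sample end times
train_idx, valid_idx = time_ordered_split(end_times, train_frac=0.7, gap=50)
```

A gap at least as long as the lookback plus the horizon keeps any validation window from sharing raw data with a training window.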
Sample Rebalancing and Filtering
- Failure is a rare event.
- Many more y=0 samples than y=1 samples. May have to rebalance the training dataset.
- Invalid sample rejection, e.g., don’t let a fault predict a fault. The unit is already significantly faulted at this point, so predicting is not really useful.
Figure labels: Horizon = 5 days, Lookback = 2 days
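Both ideas can be sketched together. This example assumes a boolean flag per sample that says whether the unit was already in a major fault during the lookback, and it rebalances by downsampling the negatives; the function name, the 3:1 ratio, and the toy data are illustrative assumptions.

```python
import numpy as np

def filter_and_rebalance(y, already_faulted, neg_per_pos=3, seed=0):
    """Return indices of kept samples: drop samples whose unit was already in
    a major fault during the lookback, then downsample y=0 to at most
    `neg_per_pos` negatives per positive."""
    y = np.asarray(y)
    valid = ~np.asarray(already_faulted, dtype=bool)
    pos = np.flatnonzero(valid & (y == 1))
    neg = np.flatnonzero(valid & (y == 0))
    rng = np.random.default_rng(seed)
    n_neg = min(len(neg), neg_per_pos * len(pos))
    neg_kept = rng.choice(neg, size=n_neg, replace=False)
    return np.sort(np.concatenate([pos, neg_kept]))

y = np.array([0] * 95 + [1] * 5)
already_faulted = np.zeros(100, dtype=bool)
already_faulted[[0, 99]] = True   # one negative and one positive are invalid
keep = filter_and_rebalance(y, already_faulted)
```

Rebalancing would normally be applied to the training set only, so that validation metrics still reflect the true failure rate.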
Feature Engineering Workshop
Desired output = prioritized list of features
- Need the domain experts in the room, i.e., mechanical engineers, head of maintenance, SW engineering
- Feature engineering is the main way you encode their domain knowledge
- Must trade off predictive power against engineering complexity
Feature Engineering
- Continuous Time Series Features
- Mean over lookback
- Variance over lookback
- Fourier Transform
- Trend over lookback
- Discrete Time Series Features
- State counts over lookback
- Demographic Features
- One hot encoded
Feature Matrix: collapse the time dimension
X (700k × 54 × 2880) → F (700k × 54 × N)
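A small NumPy sketch of collapsing the time dimension is shown below. It covers the continuous features (mean, variance, trend, Fourier magnitudes) and discrete state counts; the function names, the least-squares slope as the "trend", and the choice of the first few Fourier bins are assumptions for illustration, not the original feature set.

```python
import numpy as np

def continuous_features(X, n_fft=3):
    """Map X (samples, channels, time) to F (samples, channels, N) with
    N = 3 + n_fft: mean, variance, linear trend, and the first n_fft
    non-DC Fourier magnitudes."""
    n_time = X.shape[-1]
    mean = X.mean(axis=-1)
    var = X.var(axis=-1)
    # Least-squares slope of each series against centered time.
    t = np.arange(n_time) - (n_time - 1) / 2.0
    trend = (X * t).sum(axis=-1) / (t ** 2).sum()
    spectrum = np.abs(np.fft.rfft(X, axis=-1))[..., 1:1 + n_fft]
    basic = np.stack([mean, var, trend], axis=-1)
    return np.concatenate([basic, spectrum], axis=-1)

def state_counts(states, n_states):
    """Count each discrete state over the lookback:
    (samples, channels, time) -> (samples, channels, n_states)."""
    return np.stack([(states == s).sum(axis=-1) for s in range(n_states)], axis=-1)

ramp = 2.0 * np.arange(8).reshape(1, 1, 8)   # one sample, one channel, slope 2
F = continuous_features(ramp)
counts = state_counts(np.array([[[0, 1, 1, 2]]]), n_states=3)
```

For tree models, F would typically be flattened to two dimensions (samples × features) and joined with the one-hot demographic columns.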
Architect for Volume
- Ingest is optimized for
throughput and high availability
- Data from an asset is spread
across many files in S3
- Varying sizes
- Different time periods
- The sampler pipeline works well if all the data from an asset is in one contiguous file => use Spark to gather, massage, and transform
Tools in the Pipeline
- A (very) high-level picture of the pipeline
- Spark for ETL
- Dask for Feature Engineering
- HDF5 as storage engine
Dask: Out of Core
- Create a Dask array from an HDF5 dataset
- 250 GB of data on disk
- Pass the 3D Dask array to the feature engineering step
Dask: Parallelism
- Build the series of features
- All compute is delayed until .compute() is called
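The lazy, chunked style described here looks roughly like the sketch below. To stay self-contained it wraps an in-memory NumPy array; in the out-of-core case an open h5py dataset would be passed to `da.from_array` in the same way. The array sizes and chunk shape are arbitrary assumptions.

```python
import numpy as np
import dask.array as da

# Stand-in for the on-disk tensor (samples, channels, time).
raw = np.random.default_rng(0).normal(size=(200, 54, 288))
X = da.from_array(raw, chunks=(50, 54, 288))   # chunk along the sample axis

# These lines only build a task graph; nothing is computed yet.
mean = X.mean(axis=-1)
var = X.var(axis=-1)
F_lazy = da.stack([mean, var], axis=-1)

# Chunks are loaded and processed only when .compute() is called.
F = F_lazy.compute()
```

Because each chunk is reduced independently, memory use is bounded by the chunk size rather than by the full dataset.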
Dask: Parallelism
- Another example of feature engineering
- Build a lot of histograms
The Fun Stuff
Figure label: "The fun stuff" (source: Monica Rogati)
Create a Baseline Model
- classification > regression
- class errors are easier to understand and learn from
- even for continuous targets, you may want to build a binary (or multiclass) classifier before regression
- random forest > gradient boosted trees > deep learning
- few parameters to tune, robust to overfitting, quick to train
- interpretable feature importance to learn from
- pick a few features to start, then create more features
It’s all about learning! Then iterate, iterate, iterate.
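A baseline along these lines might look like the following scikit-learn sketch. The synthetic feature matrix, the hyperparameters, and the use of `class_weight="balanced"` are assumptions chosen for the example; only feature 0 carries signal, so it should rank first.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_features = 600, 8
F = rng.normal(size=(n_samples, n_features))
# Synthetic rare-event target: only feature 0 carries signal.
y = (F[:, 0] + 0.3 * rng.normal(size=n_samples) > 1.0).astype(int)

model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(F, y)

# Rank features from most to least important to guide the next iteration.
ranking = np.argsort(model.feature_importances_)[::-1]
```

The ranking is a prompt for the next round of feature engineering rather than a final answer about what drives failures.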
Evaluate to Learn
- Aggregate Metrics
- Cross-validated ROC and AUC = the score you improve through iterative modeling
- Feature importance done properly
- Individual Metrics (Sample-level)
- Prediction probability distribution
- “Four corners and the middle” analysis
- most accurate negatives
- most accurate positives
- least accurate negatives
- least accurate positives
- least certain estimates
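The aggregate score and the sample-level groups can be sketched together. This example uses out-of-fold probabilities on synthetic data; the helper `four_corners_and_middle`, the fold setup, and `k` are hypothetical, and real time-ordered data would need folds that respect time.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def four_corners_and_middle(y_true, proba, k=5):
    """Return sample indices per group, ranked by predicted probability of y=1."""
    y_true, proba = np.asarray(y_true), np.asarray(proba)
    neg, pos = np.flatnonzero(y_true == 0), np.flatnonzero(y_true == 1)
    return {
        "most accurate negatives": neg[np.argsort(proba[neg])[:k]],
        "least accurate negatives": neg[np.argsort(proba[neg])[::-1][:k]],
        "most accurate positives": pos[np.argsort(proba[pos])[::-1][:k]],
        "least accurate positives": pos[np.argsort(proba[pos])[:k]],
        "least certain estimates": np.argsort(np.abs(proba - 0.5))[:k],
    }

rng = np.random.default_rng(0)
F = rng.normal(size=(600, 8))
y = (F[:, 0] + 0.3 * rng.normal(size=600) > 1.0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
# Out-of-fold probabilities: each sample is scored by a model that never saw it.
proba = cross_val_predict(model, F, y, cv=cv, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
groups = four_corners_and_middle(y, proba, k=5)
```

Plotting the raw time series for the samples in each group is usually where the learning happens, especially for the least accurate and least certain ones.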
Iterate the Baseline Model
Diagram labels: X (700k × 54 × 2880); Feature Engineering; Feature Matrix F (700k × 54 × N); Deep Learning (CNNs); Tree Methods (RF and GBT); Model Evaluation; Model to Deploy
User Feedback Working Sessions
- Multiple structured sessions with final end users. In our case they were
mechanical engineers and maintenance leads.
- Prototype tooling, e.g., nothing, Excel, Jupyter notebooks.
- Observe their workflow and how they integrate predictions.
Not as Simple as Looking at Predictions
- Most units with a high probability of fault are known stressed units
- Most are in basins where line pressure is high
Example “Stressed” Unit
Prediction Filtering
- Rules on historical predictions to find “interesting events”
- Different filters for different use cases
- Absolute probability => stressed units
- % prob change => “surprising” daily changes
- Tune the rules to the appropriate operating point
- Currently tuned to have low false positives
- Look for a few things and find them accurately; status quo for the rest.
Example:
current probability of failure: .62
average probability over past 3 days: .44
42% increase in chance of failure
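The two filter rules can be sketched as a small function over one unit's prediction history. The thresholds, the function name `flag_interesting`, and the toy probabilities are hypothetical; the relative change is computed against the average of the recent history, which is one plausible reading of the slide.

```python
def flag_interesting(current, history, abs_threshold=0.6, rel_threshold=0.4):
    """Apply two rules to one unit's failure probabilities:
    - "stressed": the current probability is high in absolute terms
    - "surprising": the current probability rose sharply versus the
      average of `history` (e.g., the past 3 daily predictions)."""
    baseline = sum(history) / len(history)
    if baseline > 0:
        rel_change = (current - baseline) / baseline
    else:
        rel_change = 0.0 if current == 0 else float("inf")
    return {
        "stressed": current >= abs_threshold,
        "surprising": rel_change >= rel_threshold,
        "rel_change": rel_change,
    }

# Toy example: probability doubled relative to the 3-day average.
result = flag_interesting(current=0.50, history=[0.20, 0.25, 0.30])
```

Raising either threshold trades missed events for fewer false positives, which matches the low-false-positive tuning described above.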
Need Diagnostics to be Actionable
- AI analyzes 70+ parameters to predict probability of failure
- A human spends 10+ minutes looking at the data and may not be able to see what the AI sees
- Triage needs to be directed to a subset of that parameter set
- Need explainable AI to point the user in the right direction
“Can you tell me where to look?”
Tree Interpreter
- Identify which sensors are driving the increased probability of failure
- Absolute
- Daily Change
- This is a good starting point for the team to look for the causation
Figure panels: Today’s Contributions; Daily Change in Contribution
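The idea behind per-feature contributions can be sketched by hand for a scikit-learn random forest: walk each tree's decision path and credit every change in predicted probability to the feature that was split on. This is an illustrative re-implementation under that assumption, not the API of whatever tool the slide refers to, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tree_contributions(tree, x, n_features):
    """Decompose one tree's P(y=1 | x) into a bias (root value) plus one
    contribution per feature, by walking the decision path."""
    t = tree.tree_
    # Per-node probability of class 1 (normalizing handles counts or fractions).
    p1 = t.value[:, 0, 1] / t.value[:, 0, :].sum(axis=1)
    x = np.asarray(x, dtype=np.float32)   # scikit-learn trees compare in float32
    contrib = np.zeros(n_features)
    node = 0
    while t.children_left[node] != -1:    # -1 marks a leaf
        f = t.feature[node]
        go_left = x[f] <= t.threshold[node]
        child = t.children_left[node] if go_left else t.children_right[node]
        contrib[f] += p1[child] - p1[node]
        node = child
    return p1[0], contrib

def forest_contributions(forest, x):
    """Average the per-tree decompositions over the forest."""
    parts = [tree_contributions(est, x, forest.n_features_in_)
             for est in forest.estimators_]
    bias = float(np.mean([b for b, _ in parts]))
    return bias, np.mean([c for _, c in parts], axis=0)

rng = np.random.default_rng(0)
F = rng.normal(size=(400, 5))
y = (F[:, 0] > 0.5).astype(int)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(F, y)

bias, today = forest_contributions(forest, F[0])
_, yesterday = forest_contributions(forest, F[1])   # stand-in for the prior day
daily_change = today - yesterday
```

The bias plus the sum of contributions equals the forest's predicted probability for that sample, so the largest entries in `today` and `daily_change` indicate where to look first.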
Deliver Workflow Tools, Not Models
- The raw predictions almost always need post-processing before they are useful.
- It is our job as AI engineers to create workflow tools that help users derive value from the AI.
“Build the UI for the AI”