

SLIDE 1

Building an analytic department

From Zero to TensorFlow

SLIDE 2

The Peter principle: People in a hierarchy tend to rise to their “level of incompetence”: An employee is promoted based on their success in previous jobs, until they reach a level at which they are no longer competent, as skills in one job do not necessarily translate to another.

SLIDE 3

Introductions

Antoine Desmet

Analytics manager – Smart Solutions, Komatsu

SLIDE 4

SLIDE 5

Hunter Valley

SLIDE 6

  • 2000: The US Defense Department ends the purposeful degradation of GPS
  • 2008: Komatsu releases Level 4 autonomy, a driverless truck fleet. Operates even if the wireless link is lost

SLIDE 7

Real-time terrain mapping

  • LIDAR on diggers
  • Scans stitched together into a terrain map
  • Compared to plan
  • Operator sees:
      • Red: over-dug
      • Blue: matches plan
      • Green: needs digging

In near real-time

SLIDE 8

Topics

This is the story of a growing analytics team. It’s a business-oriented presentation: a collection of thoughts and discoveries. Sorry if I don’t have all the definitive answers.

  • Background
  • The beginning: small vs. big?
  • Growth
  • R&D
  • Picking your projects
  • Stakeholder management
  • What’s next
SLIDE 9

Background

What data do we have, what do we do with it… and WHY?

SLIDE 10

SLIDE 11

Ore extraction chain:

  • Cost = $20,000/hr
  • Revenue = $40,000/hr
  • Profit when operating = +$20,000/hr
  • Profit on breakdown = -$15,000/hr

A leaking air hose:

  • Time to fix = 1-2 hr
  • Parts + labour = $300
  • Loss of production = $15-30k

The cost of downtime

SLIDE 12

800 sensors, sampling rate: 100 ms max

  • Payload
  • Operator’s joysticks
  • Temperatures
  • Motor currents
  • Machine’s motions
  • Auto-lube system
  • Air pressure
  • Brakes status

SLIDE 13

What we provide

  • The machine’s control system will “fault” if it detects a severe malfunction
  • Unplanned downtime is extremely costly in the mining industry
  • We analyse telemetry data to detect issues before they trigger a system fault
  • It’s not so much about saving the part: by the time we can detect a malfunction, it’s often already beyond repair
  • It’s about giving the customer time to plan maintenance for what would otherwise be a disruptive unplanned breakdown

SLIDE 14

In the beginning

At the peak of the “big data” hype cycle

SLIDE 15

Day 1

  • 2014: one engineer (me) and one manager (sales)
  • At the peak of the “Big Data” craze, but…
  • In the midst of a mining downturn: no budget, pressure to deliver
  • 6 years prior, a visionary set up dataloggers + a backend to harvest hundreds of sensors’ data at high resolution = lots of data
  • Data locked up in antiquated time-series databases
  • Fragile infrastructure
  • Zero process

You are here

SLIDE 16

The Skunk Works

  • Hired a couple of summer interns to boost output
  • Version control = copy/paste into separate folders
    That’s OK because there were only a couple of developers
  • Built a rudimentary “model factory”: a data-dredging algorithm, without any hypothesis or prior assessment. Generally viewed as poor practice…
    That’s OK because it’s machine data: correlations usually indicate something mechanically or electrically coupled. Feature engineering made it work. 3 months = wide “coverage” of the machine.
  • Do everything on your laptop, then go straight to production
    That’s OK because there were no contracts and nothing mission-critical. Mission-critical was demonstrating value.

SLIDE 17

SLIDE 18

Reflections: Small Vs. Big

Small / startup model:

  • Loose plan, objectives and strategy
  • Less capital investment from business, so lower expectations
  • Pick problems yourself: those that seem relevant, and “safe bets” = quick wins in months
  • High risk of picking the wrong projects. Fast but disorganised, bound to run into scaling issues

Big / corporate model:

  • Large investment, financial targets set from the start
  • Regimented methods, pressure to deliver may hinder creativity
  • 1 year, 10 DS: explore, investigate use cases for analytics
  • Well organised, safe-but-slow approach, prepared for the long-term
SLIDE 19

Growing

Product: tick – customers: tick – what’s next?

SLIDE 20

Another start-up that became bloated

Mech/elec engineers were very productive and creative… but things started to tear at the seams:

  • Why document when everyone knows? Bus factor!
  • IT upgrading databases crippled us with rework.
  • Lack of software engineering practices = poor reliability, readability, re-usability.
  • Things started to slow down.
  • Routine means you become blind to your own deficiencies.
  • Hard to see the paradigm shift: “remember how we used to be faster, what happened?”
  • You accept that things are the way they are, and that getting a clean run or working faster isn’t possible.
SLIDE 21

Today

  • 2-3 years later, we welcomed 3 team members, including a senior software dev.
  • The software dev went on a crusade (still going) for unit tests, documentation and libraries.
  • The “old guard” had to lift their game and mature to integrate the “fresh blood”. This helped kick the old counter-productive habits, and work towards increasing quality and pace.

Our team now has:

  • 2 data scientists: the theory
  • 2 engineers: make it work
  • 2 software developers: make it scale
  • 1 analyst / report developer: make it visible
  • 3 subject matter experts: make it relevant
SLIDE 22

Workflow challenges

The release cliff-hanger:

  • Analysts are fluent at developing models on their laptops…
  • Releasing an analytic into production is a rare event. Lack of practice = frequent fails.

Trialling a solution:

  • Start with a test release of a “skeleton”, instead of leaving release as the final step.
  • DevOps 101: release early and frequently!
SLIDE 23

Workflow challenges

From bench to streaming:

  • R&D happens on a static block of time-series data (e.g. one month).
  • The challenge = going from static to live streaming: batch size, handover between batches, catching up (maintain full history) vs. forcing forward (satisfy real-time).

Standardise:

  • Build high-level functions & templates to abstract the real-time execution aspects.
  • Don’t lock down the process and make it hard to build “non-standard” analytics.
  • Standardising helps maintainability, collaboration, etc.
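The “high-level template” idea can be sketched with a small example. This is a minimal, hypothetical illustration (the function name and window/step parameters are my own, not from the talk): one generic function owns the batch handover for a live stream, so each analytic is just a plain function over a window of samples.

```python
from collections import deque

def run_windowed(samples, model, window=5, step=5):
    """Apply `model` to a live stream in fixed-size windows.

    The rolling buffer handles the handover between batches in one
    place, so each analytic doesn't re-implement it.
    """
    buf = deque(maxlen=window)
    for i, sample in enumerate(samples, start=1):
        buf.append(sample)
        if i % step == 0 and len(buf) == window:
            yield model(list(buf))

# Toy "analytic": flag a window whose mean exceeds a threshold.
detect = lambda w: sum(w) / len(w) > 10

stream = [1, 2, 3, 4, 5, 20, 21, 22, 23, 24]
print(list(run_windowed(stream, detect)))  # → [False, True]
```

Because the analytic never touches the buffering, the same `detect` function runs unchanged on a static one-month block or on a live feed.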
SLIDE 24

3 aspects of continuous improvement

  • Streamline actioning the insights
  • Streamline tools for faster analytics development
  • Streamline analytics: generic and re-useable

SLIDE 25

R&D

SLIDE 26

Finance, industrial plants and insurance analytics

Industrial analytics is a niche application, no-one can help me!
What could there be to gain from outside of my industry?

  • Finance: f(A,B) = Ĉ, where C is a share price, and A and B are competitors’ share prices.
    If Ĉ >> C: sell; Ĉ << C: buy; Ĉ ≈ C: do nothing.
  • Insurance: f(A,B) = Ĉ, where C is the amount claimed, and A and B are parameters of the claim.
    Ĉ ≈ C: do nothing; Ĉ << C: investigate a potentially fraudulent claim.
  • Plant analytics: f(A,B) = Ĉ, where C is the temperature of a motor, and A and B are bearing temps.
    Ĉ ≈ C: do nothing; Ĉ << C: motor potentially overheating.

At the right level of abstraction, it all becomes the same.
Talk to people. But I’m preaching to the choir!

SLIDE 27

Interns for R&D

  • Autonomy: R&D can be insulated from the production systems. Low risk to the business.
    Here’s a dataset, install [ your favourite toolset ] and go get it, tiger!
  • This usually produces a proof-of-concept.
  • An intern can clear the fog on that high-risk / high-value project. You can make a sound decision on whether to proceed, without having used any precious permanent-employee time.
  • With the right intern: the newer the tech, the greater the challenge… the more they engage!
  • Co-supervision with an academic will inject a lot of their knowledge into your project. This is often a better solution vs. directly engaging in a research project with academics.
  • You can hire the outstanding ones, risk free!
SLIDE 28

Picking projects

business value vs. geeky indulgence

SLIDE 29

A tale of two companies merging

Komatsu:

  • Mainly sells dump trucks
  • A mine owns 50-200 + spare units
  • Less expensive; a small loss is not mission-critical
  • Analytics strategy focuses on compliance to scheduled maintenance, part sales, operator abuse

P&H:

  • Mainly sells primary digging equipment
  • A mine owns 1-5 of them, no redundancy
  • Very expensive, “top of the pyramid”
  • Analytics strategy focuses on fault prediction & uptime maximisation: keep them running 24/7

SLIDE 30

The “no free lunch” of analytics

  • Leaking air hose: recurrent, low impact, easy → supervised
  • Gearbox failure: rare, extremely high impact, hard → unsupervised

SLIDE 31

TensorFlow to the rescue!

  • Need generic time-series pattern recognition.
  • Wary of the deep-learning hype: the “hot topic” of 2016… at the peak of Gartner’s “hype curve”. Is it just for images? An overkill?
  • A summer intern ran the project with great success (accurate, and generalises).
  • CNN + LSTM is now our standard approach to detect failure patterns in automated systems.

Interested in the details? Data Science Sydney Meetup - Tue 28 May
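The talk names only the CNN + LSTM combination, not its architecture, so here is a minimal Keras sketch of the general shape. All sizes (128-step windows, 8 sensor channels, filter counts) are hypothetical placeholders, not the team’s actual model.

```python
import numpy as np
from tensorflow.keras import layers, models

# Hypothetical shapes: windows of 128 time steps x 8 sensor channels.
model = models.Sequential([
    layers.Input(shape=(128, 8)),
    layers.Conv1D(16, 5, activation="relu"),  # CNN: local pattern extraction
    layers.MaxPooling1D(2),
    layers.LSTM(32),                          # LSTM: temporal context across the window
    layers.Dense(1, activation="sigmoid"),    # probability a failure pattern is present
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Untrained forward pass on random data, just to show the shapes.
x = np.random.rand(4, 128, 8).astype("float32")
print(model.predict(x, verbose=0).shape)  # (4, 1)
```

The convolution picks out short local motifs in the sensor traces; the LSTM then relates those motifs over time, which is what makes the combination generic across failure types.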

SLIDE 32

Anomaly detection with automated data-dredging

  • Lots of correlations across coupled sensors.
  • Leverage the robustness of ensembles: the fault you’re trying to detect doesn’t “spill” over everywhere.
  • Estimate a sensor’s value based on its coupled counterparts + lots of feature engineering.
  • Compare the estimate with reality.
  • Build a model for each pair-wise combination where corr > threshold.
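The bullets above can be sketched as code. This is a deliberately simplified illustration, not the production system: it uses plain linear fits where the talk mentions heavy feature engineering, and the function names are my own.

```python
import numpy as np

def pairwise_models(data, threshold=0.9):
    """'Model factory' sketch: for each sensor pair whose correlation
    exceeds `threshold`, fit a linear estimator of sensor i from sensor j.
    data: (n_samples, n_sensors) array."""
    corr = np.corrcoef(data, rowvar=False)
    models = {}
    n = data.shape[1]
    for i in range(n):
        for j in range(n):
            if i != j and abs(corr[i, j]) > threshold:
                slope, intercept = np.polyfit(data[:, j], data[:, i], 1)
                models[(i, j)] = (slope, intercept)
    return models

def residuals(models, data):
    """Compare each estimate with reality; large residuals flag anomalies."""
    return {(i, j): data[:, i] - (m * data[:, j] + b)
            for (i, j), (m, b) in models.items()}

# Toy data: sensor 1 is coupled to sensor 0, sensor 2 is unrelated noise.
rng = np.random.default_rng(0)
s0 = rng.normal(size=200)
data = np.column_stack([s0,
                        2 * s0 + 0.01 * rng.normal(size=200),
                        rng.normal(size=200)])
models = pairwise_models(data)
assert (0, 1) in models and (0, 2) not in models
```

A genuine fault shows up as a large residual in the few pairs involving the faulty sensor, while the rest of the ensemble stays quiet, which is the “doesn’t spill over everywhere” property.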
SLIDE 33

Stakeholder management

Warning: sarcastic content, rants, memes and exaggeration (for comical purposes only)

SLIDE 34

SLIDE 35

Data-driven vs. storytelling-driven

Data geek + subject geek = successful analytic

  • Field experts in mining are incredibly knowledgeable on the machines: human-Wikipedia level. They are the authority. They know it better than those who designed it.
  • Data-driven approach vs. storytelling approach:
    “Show me examples of what you are looking for” – “It’s a rare one, there’s none in the DB”
    “Let me tell you how the machine works and fails” – “I don’t understand”
    Generate synthetic failure data?
  • Explain the worth of machine learning, when they prefer describing things using a (long) series of logic statements.
  • Then explain that most ML algos are “black-box”: you can’t “trace” an issue, the only fix is more training data.

SLIDE 36

SLIDE 37

Why unsupervised? I know everything!

  • Dealing with people who spent a large chunk of their lives getting to know a particular piece of equipment like the back of their hand.
  • Realistically, they know 99% of what can go wrong.
  • Sell the value of anomaly detection – when they know every single way the machine can fail and would prefer a supervised approach.
  • Yet there are 800 sensors – and sure enough, anomaly detection uncovers unknown (but rare) issues.

SLIDE 38

The tricky bits: dealing with the impact

Some potentially unwanted results:

  • Applying proper statistics tells you “by how much you can’t be sure”. You are trading “ignorant certainty” for “educated uncertainty”… Often people feel like they lost out.
  • The customer claims warranty on broken parts, but the data shows the customer abused the machine.
  • Onsite maintenance crews’ job insecurity: analytics are out to take my job!
  • Some business models involve charging for machines by the “engine-running” hour. You uncover ways for customers to reduce idle time… Sales is going to love you.

SLIDE 39

What’s next

The human element and the edge

SLIDE 40

Next steps

Optimising operations

  • Machines have complex automation; ultimately automation = low variance = good models.
    A human operating the machine = immense variance. Yet there’s a lot of potential in optimising how they use and control the machine.

Streaming analytics

  • Batch = 1,000 lines of code. Re-written in a stream context = 300 lines.

Cloud vs. edge analytics

  • Cloud is great for development, but not for low latency. Edge is great for fast feedback: when an operator misbehaves, they need a notification within 5 seconds. Edge overcomes wireless network latency and reliability issues.

Sharing the insights with the business

  • Connecting with the business: releasing insights into business databases (SalesForce, etc.)
SLIDE 41

Questions?