Building an analytic department: From Zero to TensorFlow

The Peter principle: People in a hierarchy tend to rise to their “level of incompetence”: an employee is promoted based on their success in previous jobs, until they reach a level at which they are no longer competent, as skills in one job do not necessarily translate to another.

Introductions
Antoine Desmet
Analytics manager – Smart Solutions, Komatsu

Hunter Valley

2000: The US Defense Department ended the purposeful degradation of GPS
2008: Komatsu releases Level 4 autonomy, driverless truck fleet. Operates even if wireless link is lost

Real-time terrain mapping
• LIDAR on diggers
• Scans stitched together into terrain map
• Compare to plan
• Operator sees, in near real-time:
  • Red: over-dug
  • Blue: matches plan
  • Green: needs digging
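
The "compare to plan" step can be pictured with a small sketch. This is an illustration only, not the system described on the slide: the grid representation, the function names and the 0.25 m tolerance are all assumptions.

```python
def classify_cell(scanned, planned, tolerance=0.25):
    """Colour for one terrain cell; elevations in metres (tolerance is an assumed value)."""
    diff = scanned - planned
    if diff < -tolerance:
        return "red"    # surface is below the plan: over-dug
    if diff > tolerance:
        return "green"  # surface is above the plan: needs digging
    return "blue"       # within tolerance: matches plan


def colour_map(scan_grid, plan_grid, tolerance=0.25):
    """Compare a scanned elevation grid to the planned grid, cell by cell."""
    return [
        [classify_cell(s, p, tolerance) for s, p in zip(scan_row, plan_row)]
        for scan_row, plan_row in zip(scan_grid, plan_grid)
    ]


# Hypothetical 2x2 patch: the plan is a flat bench at 10 m.
scan = [[10.0, 9.0], [10.1, 12.0]]
plan = [[10.0, 10.0], [10.0, 10.0]]
result = colour_map(scan, plan)
```

Here `result` holds one colour per cell, which is the kind of overlay an operator display could render.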

Topics
This is the story of a growing analytics team. It’s a business-oriented presentation: a collection of thoughts and discoveries, so apologies if I don’t have all the definitive answers.
• Background
• The beginning: small vs. big?
• Growth
• R&D
• Picking your projects
• Stakeholder management
• What’s next

Background
What data do we have, what do we do with it… and WHY?

The cost of downtime
Ore extraction chain:
• Cost = $20,000/hr
• Revenue = $40,000/hr
• Profit when operating = +$20,000/hr
• Profit on breakdown = -$15,000/hr
A leaking air hose:
• Time to fix = 1-2 hr
• Parts + labour = $300
• Loss of production = $15,000-30,000
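
The leaking-hose arithmetic can be reproduced in a few lines. This sketch takes the slide's 15,000 $/hr breakdown figure at face value as the hourly loss while the machine is down; the function and constant names are illustrative.

```python
# Figures from the slide; 15,000 $/hr is treated as the hourly loss during a breakdown.
LOSS_PER_HOUR_DOWN = 15_000  # $/hr
PARTS_AND_LABOUR = 300       # $


def breakdown_cost(hours_down,
                   parts_and_labour=PARTS_AND_LABOUR,
                   loss_per_hour=LOSS_PER_HOUR_DOWN):
    """Total cost of one breakdown: repair bill plus lost production."""
    return parts_and_labour + hours_down * loss_per_hour


for hours in (1, 2):
    total = breakdown_cost(hours)
    share = PARTS_AND_LABOUR / total
    print(f"{hours} hr down: total ${total:,}, repair bill is {share:.1%} of it")
```

Under these assumptions the repair bill is only about 1-2% of the total: almost all of the cost is lost production.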

800 sensors, sampling rate: 100 ms max
• Payload
• Machine’s motions
• Operator’s joysticks
• Motor currents
• Auto-lube system
• Air pressure
• Temperatures
• Brakes status

What we provide
• The machine’s control system will “fault” if it detects a severe malfunction
• Unplanned downtime is extremely costly in the mining industry
• We analyse telemetry data to detect issues before they trigger a system fault
• It’s not so much about saving the part: by the time we can detect a malfunction, it’s often already beyond repair
• It’s about giving the customer time to plan maintenance for what would otherwise be a disruptive unplanned breakdown
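
One simple way to picture "detecting an issue before it triggers a fault" is to extrapolate a trend. The sketch below is an assumption-laden illustration, not the team's method: it fits a straight line to recent readings of a hypothetical air-pressure signal and estimates how long until it crosses a hypothetical low-pressure fault threshold.

```python
def hours_until_fault(times_h, values, fault_threshold):
    """Least-squares line through (time, value), extrapolated to the threshold.

    Returns the estimated hours remaining after the last sample, or None if
    the signal is not falling. Assumes the fault trips on a LOW threshold.
    """
    n = len(times_h)
    t_mean = sum(times_h) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_h, values))
    slope = sxy / sxx
    if slope >= 0:
        return None  # flat or rising: no crossing predicted
    intercept = v_mean - slope * t_mean
    t_cross = (fault_threshold - intercept) / slope
    return max(0.0, t_cross - times_h[-1])


# Hypothetical readings: pressure (bar) dropping 0.2 bar per hour.
remaining = hours_until_fault([0, 1, 2, 3], [8.0, 7.8, 7.6, 7.4], fault_threshold=6.0)
steady = hours_until_fault([0, 1, 2, 3], [8.0, 8.0, 8.0, 8.0], fault_threshold=6.0)
```

The value of `remaining` is the planning window: time the customer could use to schedule the repair instead of suffering an unplanned stop.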

In the beginning
At the peak of the “big data” hype cycle

Day 1
• 2014: one engineer (me) and one manager (sales)
• At the peak of the “Big Data” craze, but…
• In the midst of a mining downturn: no budget, pressure to deliver
• 6 years prior, a visionary set up dataloggers + backend to harvest data from hundreds of sensors at high resolution = lots of data
• Data locked up in antiquated time-series databases
• Fragile infrastructure
• Zero process

The Skunk works
• Hired a couple of summer interns to boost output
• Version control = copy/paste in separate folders
  That’s OK because there were only a couple of developers
• Built a rudimentary “model factory”: a data-dredging algorithm, run without any hypothesis or prior assessment. Generally viewed as poor practice…
  That’s OK because it’s machine data: correlations usually indicate something mechanically or electrically coupled. Feature engineering made it work. 3 months = wide “coverage” of the machine.
• Do everything on your laptop, then straight to Production
  That’s OK because there were no contracts and nothing mission-critical. What was mission-critical was demonstrating value

Reflections: Small vs. Big
Small / startup model:
• Loose plan, objectives and strategy
• Less capital investment from business, so lower expectations
• Pick problems yourself: those that seem relevant, and “safe bets” = quick wins in months
• High risk of picking the wrong projects. Fast but disorganised, bound to run into scaling issues
Big / corporate model:
• Large investment, financial targets set from the start
• Regimented methods, pressure to deliver may hinder creativity
• 1 year, 10 DS: explore, investigate use cases for analytics
• Well organised, safe-but-slow approach, prepared for the long-term

Growing
Product: tick – customers: tick – what’s next?

Another start-up that became bloated
Mech/Elec engineers were very productive and creative… but things started to tear at the seams:
• Why document when everyone knows… bus factor!
• IT upgrading databases crippled us with rework.
• Lack of software engineering practices = poor reliability, readability and re-usability.
• Things started to slow down.
• Routine means you become blind to your own deficiencies.
• Hard to see the paradigm shift: “remember how we used to be faster, what happened?”
• You come to accept that things are the way they are, and that getting a clean run or working faster isn’t possible.

Today
• 2-3 years later, we welcomed 3 team members, including a senior software dev.
• The software dev went on a crusade (still going) for: unit tests, doc, libraries
• The “old guard” had to lift their game and mature to integrate the “fresh blood”. This helped kick the old counter-productive habits and work towards increasing quality and pace
Our team now has:
• 2 Data scientists: the theory
• 2 Engineers: make it work
• 2 Software developers: make it scale
• 1 Analyst / report developer: make it visible
• 3 Subject matter experts: make it relevant

Workflow challenges
The release cliff-hanger:
• Analysts are fluent at developing models on their laptop…
• Releasing an analytic into production is a rare event. Lack of practice = frequent fails
Trialling a solution:
• Start with a test release of a “skeleton” to PROD, instead of leaving release as the final step
• DevOps 101: release early and frequently!

Workflow challenges
From bench to streaming:
• R&D happens on a static block of time-series data (e.g. one month).
• Challenge = going from static to live streaming: batch size, handover between batches, catching up (maintain full history) vs. forcing forward (satisfy real-time)
Standardise:
• Build high-level functions & templates to abstract real-time execution aspects.
• Don’t lock down the process and make it hard to build “non-standard”
• Standardising helps maintainability, collaboration, etc.
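
The "handover between batches" problem can be illustrated with a rolling mean. In the sketch below (class and function names are hypothetical, and a rolling mean stands in for whatever analytic is being run), the streaming version carries the tail of the previous batch forward so that its output matches the bench computation on the full static block.

```python
from collections import deque


def rolling_mean_static(series, window):
    """Bench version: rolling mean over a complete, static block of data."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


class StreamingRollingMean:
    """Streaming version: state is handed over from one batch to the next."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque(maxlen=window)  # last samples seen, kept across batches

    def process_batch(self, batch):
        out = []
        for x in batch:
            self.buffer.append(x)  # oldest sample drops out automatically
            if len(self.buffer) == self.window:
                out.append(sum(self.buffer) / self.window)
        return out


series = [1, 2, 3, 4, 5, 6, 7, 8]
stream = StreamingRollingMean(window=3)
streamed = []
for batch in ([1, 2], [3, 4, 5], [6, 7, 8]):  # arbitrary batch sizes
    streamed.extend(stream.process_batch(batch))
```

Wrapping the carried-over state in a reusable class is one example of the "high-level functions & templates" idea: the analyst writes the analytic once and the batch boundaries stay out of sight.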

3 aspects of continuous improvement
• Streamline actioning the insights
• Streamline tools for faster analytics development
• Streamline analytics: generic and re-usable

R&D

Finance, industrial plants and insurance analytics
Industrial analytics are a niche application, no-one can help me! What could there be to gain by looking outside of my industry?
Finance: f(A, B) = Ĉ, where C is a share price, A and B the competitors’ share prices
• If Ĉ >> C: sell, Ĉ << C: buy, Ĉ ≈ C: do nothing
Insurance: f(A, B) = Ĉ, where C is the amount claimed, A and B some parameters of the claim
• Ĉ ≈ C: do nothing, Ĉ << C: investigate a potentially fraudulent claim
Plant analytics: f(A, B) = Ĉ, where C is the temperature of a motor, A and B are bearing temperatures
• Ĉ ≈ C: do nothing, Ĉ << C: motor potentially overheating
At the right level of abstraction, it all becomes the same. Talk to people. But I’m preaching to the choir!
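
The shared pattern, predict Ĉ from A and B and then act on the gap between Ĉ and C, can be sketched for the plant case. Everything specific here is an assumption for illustration: f is a deliberately simple stand-in (average of the two bearing temperatures plus an offset learned from healthy data), and the 3-sigma alarm band is an arbitrary choice.

```python
from statistics import mean, stdev


def fit_offset_model(a, b, c):
    """Learn f(A, B) = mean(A, B) + offset from healthy data.

    Returns the offset and the spread (standard deviation) of the residuals.
    """
    base = [(x + y) / 2 for x, y in zip(a, b)]
    offset = mean(ci - bi for ci, bi in zip(c, base))
    residuals = [ci - (bi + offset) for ci, bi in zip(c, base)]
    return offset, stdev(residuals)


def classify(a, b, c, offset, sigma, k=3.0):
    """Compare the measured C with the prediction Ĉ = f(A, B)."""
    c_hat = (a + b) / 2 + offset
    if c - c_hat > k * sigma:
        return "motor potentially overheating"  # Ĉ << C
    return "do nothing"                         # Ĉ ≈ C


# Hypothetical healthy history: two bearing temperatures and the motor temperature (°C).
bearing_a = [60, 62, 64, 66]
bearing_b = [62, 64, 66, 68]
motor = [71.0, 73.5, 75.0, 77.5]
offset, sigma = fit_offset_model(bearing_a, bearing_b, motor)

normal = classify(65, 67, 76.5, offset, sigma)
hot = classify(65, 67, 90.0, offset, sigma)
```

Swapping the inputs for share prices or claim parameters, and the labels for buy/sell or investigate, leaves the structure unchanged, which is the point of the slide.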

Interns for R&D
• Autonomy: R&D can be insulated from the production systems. Low risk to business.
  Here’s a dataset, install [your favourite toolset] and go get it, tiger!
• This usually produces a proof-of-concept
• An intern can clear the fog on that high risk/high value project. You can make a sound decision to proceed, without having used any precious permanent employee time
• With the right intern: the newer the tech, the greater the challenge… the more they engage!
• Co-supervision with an academic will inject a lot of their knowledge into your project. This is often a better solution than engaging directly in a research project with academics
• You can hire the outstanding ones, risk free!

Picking projects
Business value vs. geeky indulgence

A tale of two companies merging
P&H:
• Mainly sells primary digging equipment
• A mine owns 1-5 of them, no redundancy
• Very expensive, “top of the pyramid”
• Analytics strategy focuses on fault prediction & uptime maximisation: keep them running 24/7
Komatsu:
• Mainly sells dump trucks
• A mine owns 50-200 + spare units
• Less expensive, a small loss is not mission-critical
• Analytics strategy focuses on compliance with scheduled maintenance, part sales, operator abuse

The “no free lunch” of analytics
• Leaking air hose: recurrent, low impact, easy: supervised
• Gearbox failure: rare, extremely high impact, hard: unsupervised

TensorFlow to the rescue!
• Need a generic time-series pattern recognition method
• Wary of the deep-learning hype: “hot topic” of 2016… at the peak of Gartner’s “hype curve”
• Is it just for images? Overkill?
• A summer intern ran the project with great success (accurate and generalises)
• CNN + LSTM is our standard approach to detect failure patterns in automated systems.
Interested in the details? Data Science Sydney Meetup - Tue 28 May
