WELCOME
WHO ARE WE?
David Brader
Director, Process Improvement
Titan America LLC
Noah Erickson
Data Scientist
Titan America LLC
WHAT WE LEARNED…
Highway To The Danger Zone
- the key to staying out of it…
Mediators, Moderators and M2’s Oh My
- when variables model differently than in the process world
Making Data Science Meaningful In Real-Time
EVER NOTICE EVERYONE IS A DATA SCIENTIST?
And they almost invariably have the wrong perspective?
- Like this one, the typical Senior Manager And Above:
CEO & E-SUITE PERSPECTIVE
Data Science
SIMPLE RIGHT? …JUST GET THE DATA…
STILL GETTING THE DATA…
NO SUCH THING AS BAD DATA,
JUST OPPORTUNITIES FOR A LOT OF DATA CLEANING
DATA SCIENCE – THE REAL PYRAMID
WHY DOMO AND DOMO DATA SCIENCE?
Each Step Is Somewhat Automated Within Its Function But Often Manually Transported Through To The Next Stage
But Being Forward Looking, and In Order To Ensure Proper Human Capital Consumption (i.e. lazy) Our Vision Was….
Standard Playbook: Data Preparation → Analytics (ML) → Visualize
DATA SCIENCE CONSUMPTION AS A CONTINUOUS OPERATING PROCESS
- Data Acquisition and Preparation
- Low Level Alert and Notification
- Data Analysis (ML)
- High Level Alert and Notification
- Near Real Time Process Parameter Manipulation
At Present Within DOMO:
- >100 Data Sources
- 20,300 Unique Columns
- >381 M Rows
- Process Data Being Logged Every 10 Minutes at 5 s Frequency
Every Step Happens Automatically As New Data Is Uploaded In Near Real Time
VISUALIZATION
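The stages on this slide can be sketched as a small chain of functions that runs once per uploaded batch. This is only an illustration, not the DOMO implementation: every function name, the missing-data rule, the 90% threshold, and the batch-mean stand-in for the ML step are assumptions. It also reads "Every 10 Minutes at 5 s Frequency" as one reading every 5 seconds delivered in 10-minute batches, which is an interpretation of the slide.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 5          # from the slide: 5 s logging frequency
BATCH_PERIOD_S = 10 * 60       # from the slide: data arrives every 10 minutes
READINGS_PER_BATCH = BATCH_PERIOD_S // SAMPLE_INTERVAL_S  # readings per signal per batch


def prepare(batch):
    """Data preparation: drop missing readings (hypothetical cleaning rule)."""
    return [x for x in batch if x is not None]


def low_level_alert(clean, expected=READINGS_PER_BATCH, min_fraction=0.9):
    """Low-level alert: too few usable readings in the batch (assumed threshold)."""
    return len(clean) < min_fraction * expected


def analyze(clean):
    """Stand-in for the ML step: here simply the batch mean."""
    return mean(clean)


def high_level_alert(score, limit):
    """High-level alert: the model output crosses a process limit."""
    return score > limit


def run_batch(batch, limit):
    """Run every stage automatically for one newly uploaded batch."""
    clean = prepare(batch)
    if low_level_alert(clean):
        return {"status": "data_alert", "score": None}
    score = analyze(clean)
    status = "process_alert" if high_level_alert(score, limit) else "ok"
    return {"status": status, "score": score}
```

Under this reading each signal contributes 600 / 5 = 120 readings per batch, and a batch with half its readings missing stops at the low-level alert before any modeling runs.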
WE WANTED TO BE ABLE TO RAPIDLY DEPLOY THAT MODEL
So We Teamed Up With DOMO Data Science. We just needed them to show us how to use the R and Python Tiles in DOMO… But They Made Us Do All This Other Work:
- Data Glossary and Definition Sheets
- Give Them Process Flow Charts (Actual Process, Not Data Flow)
- Have Meetings…..
Why? We Were Already In The Data Science Zone?!?
…THEN, A WEEKEND GETS RUINED…
SHOW ME….
Turns Out They Really Read That 300+ Page Dissertation I Sent Them!
- We Were Here
THE FUNDAMENTAL THEORETICAL RELATIONSHIP BETWEEN THE PRIMARY PREDICTIVE FEATURE AND THE VALUE WE WERE TRYING TO PREDICT WAS NOT PRESENT IN OUR DATA!?!
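One way to catch this kind of gap early is to check both how strongly a feature tracks the target in the collected data and how much of the feature's expected operating range the data actually spans. The sketch below is an assumed diagnostic, not the check the DOMO team ran: the function names, the use of Pearson correlation, and the idea of an "expected range" taken from theory are all illustrative.

```python
def pearson(xs, ys):
    """Pearson correlation between a feature and the target."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def range_coverage(xs, expected_low, expected_high):
    """Fraction of the theoretically expected feature range seen in the data."""
    return (max(xs) - min(xs)) / (expected_high - expected_low)
```

A feature that theory says matters but that shows weak correlation and low coverage (for example, values between 4.0 and 5.0 when the expected range is 0 to 10) suggests the dataset has not seen enough variation to expose the relationship.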
NOAH (& TONY) GOT BACK TO WORK….
Expanded Dataset To Encompass More Variation And Proper Relationships And….
Last Version Of Modeling Gave Us A 33% Improvement Over The Baseline R² From Historical Modeling In The Literature!
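For reference, R² and a relative improvement over a baseline can be computed as below. The slide does not give the baseline value or say whether the 33% is relative or absolute, so the helper names and the 0.60 → 0.80 figures in the usage note are purely hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def relative_improvement(baseline_r2, new_r2):
    """Improvement of new_r2 over baseline_r2, as a fraction of the baseline."""
    return (new_r2 - baseline_r2) / baseline_r2
```

If the 33% is a relative figure, a made-up baseline R² of 0.60 rising to 0.80 would be one example, since (0.80 - 0.60) / 0.60 ≈ 0.33.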
SUCCESS!....?
But we had an inline model that outperformed those in the literature, so why the "maybe"? Remember, we wanted something that would be inline and near real time. It turns out the DOMO Data Science process showed us something very different…
WE WEREN’T READY FOR THE REAL WORLD…
The bulk of the features that were significant in predicting the target value are ones that we currently ONLY GET AT THE SAME TIME as the value we wanted to predict, which is 2 days after the actual processing.
WE NEED NEW MEANS BY WHICH TO GET THE NECESSARY DATA ANALYZED AND CAPTURED IN REAL TIME!
THIS MEANS CAPEX! NEW ANALYTICAL EQUIPMENT ON ORDER
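The availability problem can be made explicit by recording, for each feature, how long after processing its value becomes available, and then keeping only the features that arrive in time for the decision. This is a minimal sketch: the `Feature` class, the feature names, and the one-hour decision window are hypothetical; only the 2-day (48-hour) lag comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    lag_hours: float  # hours after processing until the value is available


def usable_features(features, decision_lag_hours):
    """Names of features that are available within the decision window."""
    return [f.name for f in features if f.lag_hours <= decision_lag_hours]


# Hypothetical feature catalog: one inline signal, one result that arrives
# together with the target value, 2 days (48 h) after processing.
features = [
    Feature("inline_sensor_a", lag_hours=0.0),
    Feature("lab_measurement_b", lag_hours=48.0),
]
```

With a near-real-time window such as `usable_features(features, decision_lag_hours=1.0)`, only the inline signal survives, which is the situation the slide describes: a model can score well offline and still be unusable inline.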
NOW FOR A LOOK UNDER THE HOOD….
LESSONS LEARNED
- The Process Will Set You Free, Or At Least Keep You Out Of Trouble
- Be Prepared For Features & Variables That Will Behave Far Differently In A Real World Operating Process Than They Do In Controlled Situations
- Institutionalizing Your Solutions Requires Them To Operate At The Same Frequency As Your Decision Making