Machine Learning Fast & Slow




slide-1
SLIDE 1

Machine Learning Fast & Slow

Suman Deb Roy

Lead Data Scientist @ betaworks

slide-2
SLIDE 2
slide-3
SLIDE 3

bot

www.digg.com/messaging

www.poncho.is

www.rundexter.com

slide-4
SLIDE 4

Runway

Art & Science

The Last 10%

slide-5
SLIDE 5

1: Poncho

  • A weather cat that sends you personalized weather messages.
  • Algorithms + Humans
  • Not every feature in weather data has equal importance: what's actionable?

slide-6
SLIDE 6

2: Digg Trending

  • Ranked each day:

– 10 million RSS feeds, 200 million tweets, 7.5 million new articles

m.me/digg

slide-7
SLIDE 7

3: Digg Deeper

slide-8
SLIDE 8

4: Instapaper’s InstaRank

slide-9
SLIDE 9

5: Scale Model

Communities, Not Keywords

slide-10
SLIDE 10

MACHINE LEARNING WAS HARD

IT'S STILL HARD

slide-11
SLIDE 11

VALUE of Algorithms vs. Data

[Chart labels: Varied Distribution; Historical Data; Similarity between training & test distributions (less varied dist); Prediction Error; Impact of a more complex algorithm; Historical Data Value]
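
One rough way to put a number on the "similarity between training & test distributions" label from this slide is a two-sample Kolmogorov-Smirnov statistic. This is only an illustrative sketch, not something shown in the talk: the function name `ks_statistic` and the sample values are assumptions made up for the example.

```python
from bisect import bisect_right


def ks_statistic(train, test):
    """Largest gap between the empirical CDFs of two samples.

    0.0 means the samples look identical, 1.0 means they do not overlap.
    """
    a, b = sorted(train), sorted(test)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical feature values: one test set resembles training, the other has drifted.
train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
similar_test = [0.15, 0.25, 0.35, 0.45, 0.55]
shifted_test = [1.1, 1.2, 1.3, 1.4, 1.5]

print(ks_statistic(train, similar_test))  # small gap
print(ks_statistic(train, shifted_test))  # no overlap at all
```

A small statistic suggests historical data is still representative; a large one suggests the test data has moved away from what the model was trained on.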
slide-12
SLIDE 12

Moving fast and slow

  • Fast:

– Experience, similar problems, pre-existing pipelines

  • Slow:

– New type of data, bootstrap, scaling

  • Main challenge:

– How to jump between states, when to change gears.

slide-13
SLIDE 13

[Diagram labels: Conscious Slow; Conscious Fast; Unconscious Slow; Unconscious Fast; Fast; Planned; Slow]

slide-14
SLIDE 14

Effects of moving Fast

  • Technical debt?

– Refactoring code
– Improving unit tests
– Deleting dead code
– Reducing dependencies
– Tightening APIs
– Improving documentation

slide-15
SLIDE 15

Effects of moving Slow

  • Growth debt?

– Waiting teammates
– Uncertain quality assurance
– Piling up further requests
– Hypothesis might not be feedback driven
– Overthinking the solution

slide-16
SLIDE 16

Maintenance

  • Code Level

– How researchable, reusable, deployable

  • System Level

– Eroding abstraction boundaries

  • Data Level

– Data influences ML behavior.

slide-17
SLIDE 17

Data vs. Code Organization

  • Snapshotting ... detects bias
  • Interface at the method, be procedural

– Easy to execute portions of the code.

  • Separate hyper-arguments from parameters

– Parameter: how your model is specified
– Hyper-argument: how your algorithm should run
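
The separation between parameters and hyper-arguments can be sketched with two small configuration objects. This is one possible reading of the slide, not the speaker's code: the class names, the fields (`l2_penalty`, `learning_rate`, `max_steps`, `verbose`) and the toy `fit_slope` function are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameters:
    """How the model is specified (hypothetical field)."""
    l2_penalty: float = 0.0


@dataclass(frozen=True)
class HyperArguments:
    """How the algorithm should run (hypothetical fields)."""
    learning_rate: float = 0.1
    max_steps: int = 200
    verbose: bool = False


def fit_slope(xs, ys, params, run):
    """Fit y ~ w * x by gradient descent on mean squared error plus an L2 penalty."""
    w = 0.0
    n = len(xs)
    for step in range(run.max_steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        grad += 2 * params.l2_penalty * w
        w -= run.learning_rate * grad
        if run.verbose:
            print(step, w)
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = fit_slope(xs, ys, Parameters(), HyperArguments())
print(round(w, 3))
```

Keeping the two objects apart means a run can be repeated with more steps or logging switched on without touching the model specification.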

slide-18
SLIDE 18

Unstable APIs

  • Who owns the data stream?
  • Who owns the model?
  • Ownership by

– Entire solution
– Expertise? DB? Pipelines? Algorithms? Stats?

  • Debug?

– Frozen versioning instead of continual
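
One way to read "frozen versioning" is to pin each model release to a fingerprint of the exact data it saw, so a bug can be reproduced against a fixed pair. The sketch below is an assumption about how that might look, not the approach described in the talk; `snapshot_id`, `frozen_version`, the version string and the sample rows are hypothetical.

```python
import hashlib
import json


def snapshot_id(records):
    """Short, stable fingerprint of a data snapshot (a list of JSON-serialisable rows)."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def frozen_version(model_version, records):
    """Label that pins a model version to the exact data it was trained on."""
    return f"{model_version}+data.{snapshot_id(records)}"


# Hypothetical rows from two consecutive days.
monday = [{"story": "a", "clicks": 10}, {"story": "b", "clicks": 3}]
tuesday = monday + [{"story": "c", "clicks": 7}]

print(frozen_version("trending-1.4.0", monday))
print(frozen_version("trending-1.4.0", tuesday))
```

Two runs with the same label are then known to share both code version and data, which narrows down where a regression came from.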

slide-19
SLIDE 19

Feature Erosion

  • User behavior with new model could make features of current model unimportant

  • How can we detect this?
  • How can we prevent this?
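
As one possible answer to the detection question, a feature's association with the target can be compared before and after a model change. The sketch below uses plain Pearson correlation as a stand-in for feature importance. That choice, the function names, the `min_drop` threshold and the data are all assumptions for illustration, not the method used in the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def eroded_features(before, after, target_before, target_after, min_drop=0.5):
    """Names of features whose |correlation| with the target fell by at least min_drop."""
    flagged = []
    for name in before:
        old = abs(pearson(before[name], target_before))
        new = abs(pearson(after[name], target_after))
        if old - new >= min_drop:
            flagged.append(name)
    return flagged


# Hypothetical feature values and targets from two time windows.
before = {"clicks": [1, 2, 3, 4, 5], "dwell": [2, 1, 4, 3, 5]}
after = {"clicks": [1, 2, 3, 4, 5], "dwell": [3, 1, 4, 1, 5]}
target_before = [1, 2, 3, 4, 5]
target_after = [3, 1, 4, 1, 5]

print(eroded_features(before, after, target_before, target_after))
```

Running such a check on a schedule gives an early warning that a feature the current model relies on has stopped carrying signal.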
slide-20
SLIDE 20

Predictor Variables

  • Myth: If you add a few more variables, the predictor will be better.
  • If the predictors have realistic priors, their coefficients could be appropriately pulled down (in expectation) and overfitting shouldn’t be such a problem.
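
The "pulled down" effect can be seen in the simplest case: a single slope with a zero-mean Gaussian prior, where the most probable estimate has the same closed form as ridge regression. This is a minimal sketch under that assumption; the function name `map_slope`, the `prior_strength` argument and the data points are hypothetical.

```python
def map_slope(xs, ys, prior_strength=0.0):
    """Most probable slope for y ~ w * x under a zero-mean Gaussian prior on w.

    prior_strength is (noise variance / prior variance); 0 gives ordinary least squares.
    Assumes at least one non-zero x or a positive prior_strength.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + prior_strength)


# Hypothetical noisy observations of roughly y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.2, 3.9, 6.1]

print(map_slope(xs, ys))        # no prior
print(map_slope(xs, ys, 14.0))  # strong prior shrinks the slope toward zero
```

A stronger prior moves the coefficient toward zero, which is why extra weak predictors do less damage when priors are realistic.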

slide-21
SLIDE 21

Visualizations

Any ML algorithm must be seen to be believed.

slide-22
SLIDE 22

Visualizations

slide-23
SLIDE 23

Research vs. Production

  • Collaboration looks very different based on the end goals
  • Do you need to master git or just get by?
  • How quickly can you move something from iPython to production grade?

slide-24
SLIDE 24

Even the best tools..

  • Let's talk about iPython notebooks:

– Version control
– Fragmented code is deadly for production grade.
– Security issue: all those open ports
– Code reviews and pull requests.

slide-25
SLIDE 25

Heuristic Escape

“Heuristic is an algorithm in a clown suit. It’s less predictable, it’s more fun, and it comes without a 30-day, money-back guarantee.”

― Steve McConnell, Code Complete

slide-26
SLIDE 26

Domain of Impact

  • Most engineers and computer scientists will conceptualize domains as primarily a rational, evidence-based, problem-solving enterprise focused on well-defined conditions.
  • But the real world is ... more complex!
  • e.g.: Trending News Algorithms
slide-27
SLIDE 27

Invention vs. Innovation

  • What is ML good at? Both?
  • Not outside the box; instead, connect them.
  • Innovation = improve significantly by adjusting an ML method
  • Invention = a totally new ML method.
slide-28
SLIDE 28

Fitting ML into the betaworks model

Nexus

Product C Company A Research Company B

slide-29
SLIDE 29

Code & Data Residence

  • ML module transfer

– Code transfer

  • Core module
  • Model updating component
  • Analysis component

– Data transfer

  • Infrastructure rebuild?
  • Performance
  • Maintenance
slide-30
SLIDE 30

Research-ready pipelines

Powered by deepNews

slide-31
SLIDE 31

Second-order Analysis

Powered by deepNews + Scale Model

slide-32
SLIDE 32

Conversational Software

slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35

HBI

HUMAN BOT INTERCONNECTION

slide-36
SLIDE 36

[Diagram: a range from MANY automated solutions to ZERO automated solutions. Labels: APIs; Apps for transactional tasks; Topic Modeling; DBpedia; Freebase; Trending topics; Digg Deeper; Affective Computing]

slide-37
SLIDE 37

[Diagram: a range from HIGH VALUE of historical data to LOW VALUE of historical data. Labels: APIs; Apps for transactional tasks; LDA; LSA; DBpedia; Freebase; Trending topics; Digg Deeper; LSTM?; Tone Analyzer?]

slide-38
SLIDE 38

Data Types by Company

  • Digg has topic modeling / news data
  • Scale Model has social graph data
  • Poncho has weather data / editorialized personality
  • Giphy has gifs (emotion++)
  • Instapaper has reading data
  • Dexter has hooks to APIs
slide-39
SLIDE 39

Transfer Learning

Yosinski et al., "How transferable are features in deep neural networks?", NIPS 2014

slide-40
SLIDE 40

To Sum up

  • Constraints to ML solutions occur at three levels:

– Algorithmic
– Data
– Humans

  • These parameters lead to several oscillating cycles of fast and slow impact of ML
  • What's good for you?
slide-41
SLIDE 41

ML 2016

  • Understood by few, hyped by some, revered by most.
  • Can be the difference between a company scaling vs. closing shop.
  • Almost every company can have at least 1 product feature powered by ML.
  • Be careful about bias in data.
slide-42
SLIDE 42

data.betaworks.com Suman Deb Roy suman@betaworks.com | @_roysd