Machine Learning from Development to Production at Instacart



slide-1
SLIDE 1

Machine Learning from Development to Production at Instacart

Montana Low
Machine Learning Engineer, Instacart

slide-2
SLIDE 2

Instacart value proposition

Groceries
From stores you love
Delivered to your doorstep
In as little as an hour

+ + + =

slide-3
SLIDE 3

Four sided marketplace

  • Customers
  • Shoppers
  • Products (Advertisers)
  • Stores (Retailers)

Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty

slide-4
SLIDE 4

Customer experience

  • Choose a store
  • Shop for groceries
  • Checkout
  • Select delivery time
  • Delivered to doorstep

slide-5
SLIDE 5

Personal shopper experience

  • Accept order
  • Find the groceries
  • Scan barcode
  • Out for delivery
  • Delivered to doorstep

slide-6
SLIDE 6

Search & discovery

slide-7
SLIDE 7

  • Milk
  • Milk chocolate
  • Chocolate milk

Supervised learning

slide-8
SLIDE 8

Features

  • Brand
  • Fat Content
  • USDA Grade
  • Organic?
  • Pasteurized?
  • Homogenized?
  • Volume
  • Geography
  • Dominant Color
slide-9
SLIDE 9

Encoding
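
A minimal sketch of what encoding a categorical feature such as Brand could look like, assuming a simple integer-id scheme with 0 reserved for values never seen in training. The names `fit_vocabulary` and `encode` and the brand values are hypothetical; this is not Lore's encoder API.

```python
def fit_vocabulary(values):
    """Map each distinct training value to a positive integer id (0 is reserved for unseen values)."""
    vocabulary = {}
    for value in values:
        if value not in vocabulary:
            vocabulary[value] = len(vocabulary) + 1
    return vocabulary


def encode(vocabulary, values):
    """Replace each value with its id, falling back to 0 for values absent from training."""
    return [vocabulary.get(value, 0) for value in values]


# Hypothetical brand column; the values are illustrative only.
training_brands = ["Brand A", "Brand B", "Brand A"]
brand_ids = fit_vocabulary(training_brands)
encoded = encode(brand_ids, ["Brand B", "Brand A", "Brand C"])
```

Reserving an id for unseen values lets a model trained on existing products still accept a new product whose brand it has never encountered.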

slide-10
SLIDE 10

Supervised learning

Milk

slide-11
SLIDE 11

Milk

New products

  • Kirkland signature
  • 2% Fat
  • Milk
  • Vitamin A
  • Vitamin D
  • Grade A
  • Pasteurized
  • Homogenized
  • 1 gallon
  • 2 count
  • Tertiary color
slide-12
SLIDE 12

Competitive products

Cola

slide-13
SLIDE 13

Recommended products

Peanut butter

slide-14
SLIDE 14

Project implementation w/ Lore

slide-15
SLIDE 15
slide-16
SLIDE 16

Create a project and model

$ pip install lore
$ lore init loss_prevention
$ lore generate scaffold delivery_disputes --regression
loss_prevention in development on montanalow@localhost
CREATED loss_prevention/models/delivery_disputes.py
CREATED loss_prevention/estimators/delivery_disputes.py
CREATED loss_prevention/pipelines/delivery_disputes.py
CREATED tests/unit/test_delivery_disputes.py
CREATED notebooks/delivery_disputes/features.ipynb
CREATED notebooks/delivery_disputes/architecture.ipynb

slide-17
SLIDE 17
slide-18
SLIDE 18

Extract

loss_prevention/extracts/credit_card_disputes.sql

SELECT
  visits.ip_address,
  deliveries.latitude,
  deliveries.longitude,
  charge_logs.is_disputed
FROM deliveries
  JOIN visits ON visits.id = deliveries.visit_id
  JOIN charge_logs ON charge_logs.id = deliveries.charge_id

slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21

Pipeline

loss_prevention/pipelines/credit_card_disputes.py

class Pipeline(lore.pipelines.holdout.Base):
    @timed(logging.INFO)
    def get_data(self):
        return lore.io.redshift.dataframe(
            filename='credit_card_disputes',
            cache=True
        )
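
The pipeline derives from a holdout base class. As a rough illustration of the idea, here is a minimal sketch of a holdout split, assuming a shuffled 80/10/10 division into training, validation, and test rows. The function `holdout_split` and its fractions are hypothetical and do not reproduce Lore's implementation; the fractions were chosen only because they match the 80 and 10 sample counts reported in the training output of this deck.

```python
import random


def holdout_split(rows, validate_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle rows and split them into train, validation, and test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validate = int(len(shuffled) * validate_fraction)
    n_test = int(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_validate - n_test
    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]
    return train, validate, test


# 100 placeholder row ids stand in for the extracted dataframe.
train, validate, test = holdout_split(range(100))
```

The test partition is held back entirely during fitting, so it can give an unbiased estimate of performance after early stopping has been tuned against the validation partition.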

slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24

Pipeline

loss_prevention/pipelines/credit_card_disputes.py

class Pipeline(lore.pipelines.holdout.Base):
    ...
    def get_encoders(self):
        return (
            Norm(
                Distance(
                    'latitude',
                    'longitude',
                    GeoIP('ip_address', 'latitude'),
                    GeoIP('ip_address', 'longitude')
                )
            ),
        )
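
The encoder above feeds the model one feature: the distance between the delivery coordinates and the coordinates that GeoIP resolves from the visitor's IP address. A minimal sketch of such a distance, assuming a great-circle (haversine) formula, follows. Whether Lore's `Distance` transformer uses this exact formula is an assumption, and `haversine_km` and the sample coordinates are illustrative only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (latitude, longitude) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Illustrative coordinates only: a delivery address and a GeoIP estimate for the ordering device.
delivery = (37.7749, -122.4194)  # San Francisco
geoip = (40.7128, -74.0060)      # New York
distance_km = haversine_km(*delivery, *geoip)
```

An order placed from a device thousands of kilometers away from the delivery address is a plausible signal for a disputed charge, which is why the distance is normalized and passed to the model.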

slide-25
SLIDE 25

Pipeline

loss_prevention/pipelines/credit_card_disputes.py

class Pipeline(lore.pipelines.holdout.Base):
    ...
    def get_output_encoder(self):
        return Pass('is_disputed')

slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28

Model

loss_prevention/models/credit_card_disputes.py

from loss_prevention.pipelines.credit_card_disputes import Pipeline

class DeepLearning(lore.models.keras.Base):
    def __init__(self):
        super(DeepLearning, self).__init__(
            pipeline=Pipeline(),
            estimator=lore.estimators.keras.BinaryClassifier()
        )

slide-29
SLIDE 29
slide-30
SLIDE 30

Run tests

$ lore test
loss_prevention in test on montanalow@localhost
RUNNING all tests
..
Ran 2 tests in 3.846s

OK

slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33

Train the model

$ lore fit loss_prevention.models.delivery_disputes.DeepLearning
loss_prevention in development on montanalow@localhost
Using TensorFlow backend.
Train on 80 samples, validate on 10 samples
Epoch 1
32/80 [===========>................] - ETA: 15s - loss: 1.5831

slide-34
SLIDE 34

Early Stopping

$ lore fit loss_prevention.models.delivery_disputes.DeepLearning
loss_prevention in development on montanalow@localhost
Using TensorFlow backend.
Train on 80 samples, validate on 10 samples
...
Epoch 57
80/80 [========================] - loss: 0.55 val_loss: 0.58
Epoch 58
80/80 [========================] - loss: 0.53 val_loss: 0.57
Epoch 59
80/80 [========================] - loss: 0.52 val_loss: 0.58
Early Stopping
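
In the log above, training halts at epoch 59, right after validation loss rises from 0.57 back to 0.58 while training loss keeps falling. A minimal sketch of a patience-based early stopping rule that behaves this way is shown here. The patience of one epoch is an assumption, since the slides do not state the setting used, and `early_stopping_epoch` is a hypothetical helper rather than part of Lore or Keras.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return the 1-based position at which training stops, or None if it never stops early.

    Training stops once validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# The last three validation losses shown on the slide (epochs 57 to 59).
stopped_at = early_stopping_epoch([0.58, 0.57, 0.58], patience=1)
```

With these three values the rule stops at the third entry, which corresponds to epoch 59 in the log. A larger patience would tolerate more noisy epochs before giving up.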

slide-35
SLIDE 35
slide-36
SLIDE 36

Important Files

requirements.txt
runtime.txt
config/
    database.cfg
data/query_cache/
    loss_prevention.pipelines.delivery_disputes.Pipeline.get_data.XY.pickle
models/loss_prevention.models.delivery_disputes/DeepLearning/1/
    model.pickle
    weights.h5
logs/
    development.log

slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39

Extract

loss_prevention/extracts/credit_card_disputes.sql

SELECT
  visits.ip_address,
  deliveries.latitude,
  deliveries.longitude,
  charge_logs.is_disputed
FROM deliveries
  JOIN visits ON visits.id = deliveries.visit_id
  JOIN charge_logs ON charge_logs.id = deliveries.charge_id

slide-40
SLIDE 40

Extract

loss_prevention/extracts/credit_card_disputes.sql.j2

SELECT
  visits.ip_address,
  deliveries.latitude,
  deliveries.longitude,
  charge_logs.is_disputed
FROM deliveries
  JOIN visits ON visits.id = deliveries.visit_id
  JOIN charge_logs ON charge_logs.id = deliveries.charge_id
{% if delivery_id %}
WHERE deliveries.id = {delivery_id}
{% endif %}
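
The template above lets one query serve two purposes: without a `delivery_id` it returns every delivery, and with one it returns a single row. The same idea can be sketched with only the standard library, assuming SQLite as a stand-in database and a bound parameter in place of Jinja rendering. The helper `build_query` and all table rows are made up for illustration; this is not how Lore renders its templates.

```python
import sqlite3


def build_query(delivery_id=None):
    """Return (sql, params): all deliveries, or a single delivery when an id is given."""
    sql = (
        "SELECT visits.ip_address, deliveries.latitude, deliveries.longitude, "
        "charge_logs.is_disputed "
        "FROM deliveries "
        "JOIN visits ON visits.id = deliveries.visit_id "
        "JOIN charge_logs ON charge_logs.id = deliveries.charge_id"
    )
    params = {}
    if delivery_id is not None:
        # Bind the id as a parameter instead of formatting it into the SQL string.
        sql += " WHERE deliveries.id = :delivery_id"
        params["delivery_id"] = delivery_id
    return sql, params


# Tiny in-memory database with made-up rows, only to exercise both branches.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE visits (id INTEGER PRIMARY KEY, ip_address TEXT);
    CREATE TABLE charge_logs (id INTEGER PRIMARY KEY, is_disputed INTEGER);
    CREATE TABLE deliveries (id INTEGER PRIMARY KEY, visit_id INTEGER, charge_id INTEGER,
                             latitude REAL, longitude REAL);
    INSERT INTO visits VALUES (1, '203.0.113.5'), (2, '198.51.100.7');
    INSERT INTO charge_logs VALUES (1, 0), (2, 1);
    INSERT INTO deliveries VALUES (123, 1, 1, 37.77, -122.42), (124, 2, 2, 40.71, -74.01);
""")

all_rows = db.execute(*build_query()).fetchall()
one_row = db.execute(*build_query(delivery_id=123)).fetchall()
```

Keeping a single query definition for both cases means the features computed for one delivery come from exactly the same joins as the full extract.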

slide-41
SLIDE 41

Pipeline

loss_prevention/pipelines/credit_card_disputes.py

class Pipeline(lore.pipelines.holdout.Base):
    @timed(logging.INFO)
    def get_data(self):
        return lore.io.redshift.dataframe(
            filename='credit_card_disputes',
            cache=True
        )

slide-42
SLIDE 42

Pipeline

loss_prevention/pipelines/credit_card_disputes.py

class Pipeline(lore.pipelines.holdout.Base):
    @timed(logging.INFO)
    def get_data(self, delivery_id=None):
        if delivery_id:
            # Single delivery: read from Postgres and skip the cache
            interpolate = {'delivery_id': delivery_id}
            connection = lore.io.postgres
            cache = False
        else:
            # No delivery given: read the full extract from Redshift and cache it
            interpolate = {}
            connection = lore.io.redshift
            cache = True
        sql = connection.template('delivery_disputes', delivery_id=delivery_id)
        return connection.dataframe(sql=sql, cache=cache, **interpolate)

slide-43
SLIDE 43

Model

loss_prevention/models/credit_card_disputes.py

from loss_prevention.pipelines.credit_card_disputes import Pipeline

class DeepLearning(lore.models.keras.Base):
    def __init__(self):
        super(DeepLearning, self).__init__(
            pipeline=Pipeline(),
            estimator=lore.estimators.keras.BinaryClassifier()
        )

slide-44
SLIDE 44

Model

loss_prevention/models/credit_card_disputes.py

from loss_prevention.pipelines.credit_card_disputes import Pipeline

class DeepLearning(lore.models.keras.Base):
    def __init__(self):
        super(DeepLearning, self).__init__(
            pipeline=Pipeline(),
            estimator=lore.estimators.keras.BinaryClassifier()
        )

    @timed(logging.INFO)
    def predict(self, dataframe):
        data = self.pipeline.get_data(delivery_id=dataframe.delivery_id)
        return self.estimator.predict(data)

slide-45
SLIDE 45

Lore Server

$ lore server &
loss_prevention in development on montanalow@localhost
Using TensorFlow backend.
 * Serving Flask app "lore.www"
$ curl http://localhost:5000/delivery_disputes.DeepLearning/predict.json -d "delivery_id=123"
True
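
The curl call above sends a form-encoded POST to the prediction endpoint. A minimal sketch of building the same request from Python with the standard library follows. The helper name `build_predict_request` is hypothetical, the URL and parameter are copied from the slide, and nothing is assumed about the response beyond the `True` shown there.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_predict_request(delivery_id, base_url="http://localhost:5000"):
    """Build the same form-encoded POST that the curl command on the slide sends."""
    url = base_url + "/delivery_disputes.DeepLearning/predict.json"
    body = urlencode({"delivery_id": delivery_id}).encode("ascii")
    return Request(url, data=body, method="POST")


request = build_predict_request(123)
# With a Lore server running locally, urllib.request.urlopen(request).read()
# would send the request and return the raw response body.
```

Building the request separately from sending it keeps the sketch usable without a running server.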

slide-46
SLIDE 46
slide-47
SLIDE 47

Transformers

  • GeoIP
  • Distance
  • DateTime
  • String
  • EmailDomain
  • AreaCode
  • Log/PlusOne
  • ...
  • NameAge
  • NameSex
  • NameFamilial
  • NamePopulation
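
To give a feel for what transformers like these do, here is a minimal sketch of two of them, assuming that EmailDomain extracts the part after the `@` and that Log/PlusOne applies log(1 + x). Both behaviors are inferred from the names on the slide, and the functions `email_domain` and `log_plus_one` are hypothetical stand-ins, not Lore's classes.

```python
import math


def email_domain(address):
    """Return the lowercased domain part of an email address, or None if there is no '@'."""
    _, separator, domain = address.rpartition("@")
    return domain.lower() if separator else None


def log_plus_one(value):
    """Compress a non-negative count with log(1 + x), which maps 0 to 0."""
    return math.log1p(value)


domain = email_domain("Shopper@Example.com")
compressed = [log_plus_one(x) for x in (0, 9, 99)]
```

Transformers of this kind turn a raw column into something an encoder can consume, for example a domain string that can then be mapped to an integer id.
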
slide-48
SLIDE 48

Encoders

  • Norm
  • Quantile
  • Discrete
  • Boolean
  • Enum
  • Unique
  • Token
  • Glove
  • MiddleOut
  • Equals
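
As an illustration of the first encoder in the list, here is a minimal sketch of a normalizing encoder, assuming that Norm standardizes a numeric column to zero mean and unit variance using statistics fitted on training data. That behavior is an assumption based on the name, and `NormSketch` is a hypothetical class, not Lore's implementation.

```python
import statistics


class NormSketch:
    """Standardize a numeric column to zero mean and unit variance (hypothetical stand-in)."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        # Guard against a constant column, where the standard deviation is zero.
        self.std = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(value - self.mean) / self.std for value in values]


# Fit on training values, then apply the same statistics to new data.
norm = NormSketch().fit([2.0, 4.0, 6.0])
scaled = norm.transform([2.0, 4.0, 6.0, 8.0])
```

Fitting on training data only and reusing the stored mean and standard deviation at prediction time keeps the feature scale consistent between training and serving.
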
slide-49
SLIDE 49

Algorithms

  • Keras/Tensorflow
  • XGBoost
  • SciKit Learn
slide-50
SLIDE 50

WE’RE HIRING!

montana@instacart.com

slide-51
SLIDE 51

“It is not the strongest of species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.”

Charles Darwin