From POC to Production in Minimal Time: Avoiding Pain in ML Projects, Dr Janet Bastiman



SLIDE 1

StoryStream.ai

From POC to Production in Minimal Time – Avoiding Pain in ML Projects

Dr Janet Bastiman @yssybyl

SLIDE 2

Project timings

SLIDE 3

About StoryStream

The world’s leading automotive content platform

StoryStream is a dedicated automotive content platform, trusted by some of the world’s leading car brands. Specifically created to help automotive brands provide a more relevant, engaging customer experience, fuelled with authentic content and designed for efficiently scaling content operations across global teams.

The Core StoryStream Benefits

  • Grow customer engagement and conversions by up to 25%
  • Reduce content creation and management costs by up to 60%
  • Provide a more authentic customer experience
  • Understand your customer in a deeper way


SLIDE 6

“[Client] needs this to go live at the end of the month, I promised them we could deliver...”

Every salesperson ever

SLIDE 7

Project timings

  • 35 models = 1050 days (one person linear)
  • ~ 5 years for one person working Mon-Fri - who is allowed holidays :)
  • 250 days with parallelisation of tasks and data upfront
  • 150 days on worksheet, balanced by an increase in ongoing license
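
A quick check of the arithmetic on this slide, as a minimal sketch. The model count and day totals are the slide's figures; the 225 working days per year is an assumed value (Mon-Fri minus holidays), not one given in the talk.

```python
# Figures taken from the slide.
MODELS = 35
TOTAL_DAYS_LINEAR = 1050      # one person, one model after another
TOTAL_DAYS_PARALLEL = 250     # parallel tasks, data available upfront

# Assumption for illustration only: working days in a year after holidays.
WORKING_DAYS_PER_YEAR = 225

days_per_model = TOTAL_DAYS_LINEAR / MODELS
years_one_person = TOTAL_DAYS_LINEAR / WORKING_DAYS_PER_YEAR
parallel_speedup = TOTAL_DAYS_LINEAR / TOTAL_DAYS_PARALLEL

print(days_per_model)                # 30.0
print(round(years_one_person, 1))    # 4.7
print(round(parallel_speedup, 1))    # 4.2
```

Under that assumption the linear estimate comes out a little under five years, which matches the "~ 5 years" on the slide.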

SLIDE 8

Can you guess what happened next?

SLIDE 9

What would it take to get it done in that time?

The Core (2003) Paramount Pictures

SLIDE 10

“They don’t have any data to give us”

SLIDE 11

If you are dealing with any critical inferencing, do not take shortcuts. Do it properly and rigorously, stand up to the company and say no, and make it clear that the timelines will be longer to get it right.

SLIDE 12

Without Data ML is just a Random Result

  • Legal public sources
      • https://github.com/awesomedata/awesome-public-datasets
      • https://www.kaggle.com/datasets
  • Take your own pictures/videos
      • Access/permission?
      • Slow and inconsistent
  • Scrape the client site with permission

SLIDE 13

How much data?

  • Vision: 1000 images per output class but depends on complexity of the problem
  • Time series: at least double the time period over which you are predicting, but be cautious of data becoming irrelevant
  • Text: very variable depending on the problem
  • This also changes if you already have pre-trained networks that you’re updating
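
The two numeric rules of thumb above can be written down as a small helper. This is only a sketch: the function names are hypothetical, and the defaults (1000 images per class, twice the prediction horizon) simply restate the slide's starting points, which it says vary with problem complexity and with pre-trained networks.

```python
def min_vision_images(num_classes: int, per_class: int = 1000) -> int:
    """Rule-of-thumb image count: `per_class` images for every output class."""
    return num_classes * per_class


def min_history_length(prediction_horizon: float, factor: float = 2.0) -> float:
    """Rule-of-thumb history length: at least `factor` times the horizon."""
    return prediction_horizon * factor


print(min_vision_images(12))     # 12000
print(min_history_length(30))    # 60.0
```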

SLIDE 14

What do you do with the Data?

  • Selection bias
  • Random Sampling
  • Overcoverage
  • Undercoverage
  • Measurement (Response) error
  • Processing errors
  • Participation bias
SLIDE 15

What do you do with the Data?

Photos, Scrape → S3 bucket

  • Unique filename
  • Source
  • Set uuid (if multiple images of same car)
  • Date taken
  • S3 bucket per vehicle variant

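
One way to hold the per-image fields listed on this slide is a small record type. This is a sketch only: `PhotoRecord`, `new_record`, the bucket name and the date are hypothetical illustrations, not the schema used in the talk.

```python
import uuid
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PhotoRecord:
    filename: str                       # unique filename
    source: str                         # e.g. own photos or a scrape
    bucket: str                         # one S3 bucket per vehicle variant
    date_taken: Optional[date] = None
    set_uuid: Optional[str] = None      # shared by multiple images of the same car


def new_record(source, bucket, date_taken=None, set_uuid=None, extension="jpg"):
    """Create a record with a freshly generated unique filename."""
    filename = f"{uuid.uuid4().hex}.{extension}"
    return PhotoRecord(filename, source, bucket, date_taken, set_uuid)


# Two images of the same car share one set uuid but get distinct filenames.
car_set = str(uuid.uuid4())
front = new_record("scrape", "example-variant-bucket", date(2019, 10, 1), car_set)
rear = new_record("scrape", "example-variant-bucket", date(2019, 10, 1), car_set)
```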
SLIDE 16

What do you do with the Data?

Photos, Scrape → Car Detector → S3 Bucket → Manual verification

  • Extra field for label
  • S3 bucket name became mostly irrelevant

SLIDE 17

Crowdsource labelling

https://xkcd.com/1897/

SLIDE 18

Data Pipeline

Data In → Object detector → Images saved → Auxiliary info saved → Temp public access → Extract for Turk → Import of results → Dashboard → Expert clean → Data Ready

SLIDE 19

Transfer Learning

  • Use transfer learning - fix most of the weights of a good network and adapt the last few layers
  • Fast and easy retraining and works with smaller data sets in a variety of fields
  • (image) https://arxiv.org/abs/1903.02196
  • (series) https://arxiv.org/abs/1907.01332
  • (audio) https://arxiv.org/abs/1909.07526

Deep Learning for Vision Systems, Mohamed Elgendy
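
The idea of fixing most of the weights and adapting only the last layer can be shown with a toy, standard-library-only sketch. Everything here is an assumption for illustration: the "pre-trained network" is a tiny fixed feature map, the data set is made up, and a real project would use a deep learning framework and a genuinely pre-trained model.

```python
import math

# Stand-in for a pre-trained network: fixed weights that are never updated.
FROZEN_WEIGHTS = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))


def frozen_features(x):
    """Frozen layers: tanh of a fixed linear map from 2 inputs to 3 features."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def predict(head, bias, x):
    """Trainable last layer: logistic regression on the frozen features."""
    z = bias + sum(h * f for h, f in zip(head, frozen_features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(head, bias, data):
    total = 0.0
    for x, y in data:
        p = min(max(predict(head, bias, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_head(data, epochs=200, learning_rate=0.5):
    """Stochastic gradient descent on the head only; frozen weights are untouched."""
    head, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(head, bias, x) - y   # gradient of log loss w.r.t. z
            feats = frozen_features(x)
            head = [h - learning_rate * error * f for h, f in zip(head, feats)]
            bias -= learning_rate * error
    return head, bias


# Tiny made-up data set: label 1 when x0 + x1 > 0.
DATA = [((1.0, 1.0), 1), ((2.0, 0.5), 1), ((0.5, 1.5), 1),
        ((-1.0, -1.0), 0), ((-0.5, -2.0), 0), ((-1.5, 0.2), 0)]

loss_before = mean_log_loss([0.0, 0.0, 0.0], 0.0, DATA)
head, bias = train_head(DATA)
loss_after = mean_log_loss(head, bias, DATA)
print(loss_before, loss_after)
```

Only four numbers (three head weights and a bias) are trained here, which is why retraining is fast and can work with small data sets.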

SLIDE 20

Unbalanced Data

SLIDE 21

https://www.designhacks.co/products/cognitive-bias-codex-poster

SLIDE 22

Stand on the shoulders of giants…

  • For some problems CNNs are robust to noisy labels: up to 20 times as many noisy labels as real labels can still give business-level accuracy https://arxiv.org/pdf/1705.10694.pdf
  • Find the right architecture http://www.asimovinstitute.org/neural-network-zoo/

SLIDE 23

Go old school

Reduce the dimensionality of the problem and use a Bayesian approach, KNN or SVM

https://xkcd.com/2059/
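
As an illustration of the KNN option, here is a minimal k-nearest-neighbours classifier using only the standard library. It assumes the dimensionality reduction has already happened, so each sample is a short feature tuple; the data and labels are made up.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-feature samples standing in for a reduced-dimension problem.
TRAIN = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(TRAIN, (1.1, 1.0)))   # a
print(knn_predict(TRAIN, (4.9, 5.0)))   # b
```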

SLIDE 24

Choose wisely

SLIDE 25

Simplify the problem

Removal of camera artefacts in eye images to make detection easier - Jeffrey De Fauw

http://blog.kaggle.com/2015/08/10/detecting-diabetic-retinopathy-in-eye-images/

Image → Specific Vehicle
Image → Car? → Make? → Specific Vehicle

Removal of Doppler effect on moving source using fractional octave band shifting, F Mobley

https://asa.scitation.org/doi/pdf/10.1121/2.0000578?class=pdf

$\Delta n = -r\left[\log_2\left(1 - M\cos\theta\sin\phi\right)\right]$
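
The staged vehicle flow on this slide (Image, Car?, Make?, Specific Vehicle) can be sketched as a cascade of simple classifiers in place of one large one. All names below are hypothetical, and the stub stages stand in for trained models.

```python
from typing import Callable, Dict, Optional


def identify_vehicle(image, is_car: Callable, predict_make: Callable,
                     variant_models: Dict[str, Callable]) -> Optional[str]:
    """Run the stages in order and stop as soon as one rules the image out."""
    if not is_car(image):
        return None                              # Car? -> no
    make = predict_make(image)                   # Make?
    variant_model = variant_models.get(make)
    if variant_model is None:
        return None                              # no specialist model for this make
    return f"{make} {variant_model(image)}"      # Specific vehicle


# Stub stages operating on a dict that stands in for an image.
def looks_like_car(image):
    return image.get("kind") == "car"


def guess_make(image):
    return image["badge"]


def guess_make_a_variant(image):
    return "hatchback" if image["doors"] <= 3 else "saloon"


VARIANT_MODELS = {"MakeA": guess_make_a_variant}

print(identify_vehicle({"kind": "car", "badge": "MakeA", "doors": 5},
                       looks_like_car, guess_make, VARIANT_MODELS))   # MakeA saloon
```

Each stage only has to solve a narrower problem than the single-step route from image to specific vehicle.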

SLIDE 26

Get every last drop from what you have

Statistical anatomical modelling for efficient and personalised spine biomechanical models - I Castro Mateos, PhD thesis

Have a toolkit of augmentation approaches but choose what’s relevant to your needs...

SLIDE 27

Augmentation - detail

  • Flip L/R U/D
  • Rotations
  • Reduce or enlarge bounding box coordinates by N%
  • Add occlusions https://www.umbc.edu/rssipl/people/aplaza/Papers/Journals/2019.GRSL.Occlusion.pdf
  • Change hue saturation and value of colours in the image https://arxiv.org/pdf/1902.06543.pdf
  • Copypairing - https://arxiv.org/abs/1909.00390

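
Three of the augmentations listed above (flip L/R, flip U/D, and changing the bounding box by N%) can be sketched in plain Python. The assumptions are mine: an image is a list of pixel rows, and a box is `(x_min, y_min, x_max, y_max)` in pixel-edge coordinates with the maximum edges exclusive.

```python
def flip_lr(image, box):
    """Mirror the image left/right and move the box with it."""
    width = len(image[0])
    x0, y0, x1, y1 = box
    return [row[::-1] for row in image], (width - x1, y0, width - x0, y1)


def flip_ud(image, box):
    """Mirror the image top/bottom and move the box with it."""
    height = len(image)
    x0, y0, x1, y1 = box
    return image[::-1], (x0, height - y1, x1, height - y0)


def scale_box(box, percent, width, height):
    """Enlarge (positive percent) or reduce (negative) a box about its centre,
    clamped to the image bounds."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * percent / 200.0    # half of the change on each side
    dy = (y1 - y0) * percent / 200.0
    return (max(0.0, x0 - dx), max(0.0, y0 - dy),
            min(float(width), x1 + dx), min(float(height), y1 + dy))


IMAGE = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
BOX = (0, 0, 2, 1)    # covers pixels 0 and 1 on the top row

print(flip_lr(IMAGE, BOX))    # ([[3, 2, 1, 0], [7, 6, 5, 4]], (2, 0, 4, 1))
print(flip_ud(IMAGE, BOX))    # ([[4, 5, 6, 7], [0, 1, 2, 3]], (0, 1, 2, 2))
print(scale_box((10, 10, 20, 20), 20, 100, 100))    # (9.0, 9.0, 21.0, 21.0)
```

Whenever the pixels move, the box has to move with them; otherwise the augmented sample carries a wrong label.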
SLIDE 28

Infrastructure

Diagram components: Data In, Data Store, Taxonomy, Classifier Definition, Test Set, DockerHub Setup, Codeship Project, GitHub Setup, Notification, Slack, Email, Template, AWS Image, Scripts, Dashboard

SLIDE 29

Cloud Formation

SLIDE 30

Automation

Diagram steps: Delete local data, Build container, Get model and key, Run test harness, Validate container, Run container, Report results, Dashboard, Commit, Build new Container

SLIDE 31

Stack Automation

Add new container → Start stack → Run stack test harness → Compare results → Better?

  • Yes: Create docs → Update CF → Live
  • No: Human investigation
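
The "Better?" decision in this flow can be sketched as a small gate. The function name, the single score per stack and the `min_gain` threshold are assumptions for illustration; the talk does not say which metric was compared.

```python
def next_steps(candidate_score: float, live_score: float, min_gain: float = 0.0):
    """Decide what follows the stack test harness comparison."""
    if candidate_score > live_score + min_gain:
        # Better: document the new stack and promote it.
        return ["create docs", "update CF", "live"]
    # Not better: hand over to a person instead of promoting automatically.
    return ["human investigation"]


print(next_steps(0.93, 0.90))    # ['create docs', 'update CF', 'live']
print(next_steps(0.88, 0.90))    # ['human investigation']
```

A non-zero `min_gain` is one way to avoid promoting a stack on a difference that is within the noise of the test set.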

SLIDE 32

Automatic Documentation

Diagram steps: LaTeX templates, Pweave, .tex files and images, Save with model files, Convert to PDF, Run LaTeX, If live, save in live docs, Email to team
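
The template-filling step can be illustrated with the standard library's `string.Template`. This is a stand-in, not Pweave (the tool named on the slide), and the template text, field names and values are all made up for the example.

```python
from string import Template

# Hypothetical LaTeX template; $name fields are filled in per model.
REPORT_TEMPLATE = Template(r"""\documentclass{article}
\usepackage{graphicx}
\begin{document}
\section*{Model report: $model_name}
Test harness accuracy: $accuracy\%

\includegraphics{$figure_path}
\end{document}
""")


def render_report(model_name: str, accuracy: float, figure_path: str) -> str:
    """Return the .tex source for one model's report."""
    return REPORT_TEMPLATE.substitute(
        model_name=model_name,
        accuracy=f"{accuracy:.1f}",
        figure_path=figure_path,
    )


tex = render_report("example-model", 91.5, "confusion_matrix.png")
print(tex)
```

The resulting string would be written to a `.tex` file next to the model files and compiled to PDF by a separate LaTeX run.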

SLIDE 33

Did we make it?

  • Some really difficult images
  • Only expected images were given
  • Where it was wrong it was (mostly) sensibly wrong
  • Client happy
  • Cool automated system

SLIDE 34

The Playbook

ai-playbook.com

SLIDE 35

Thank You

https://xkcd.com/2191/