Machine learning on mobile and edge devices with TensorFlow Lite


SLIDE 1

Machine learning on mobile and edge devices with TensorFlow Lite

SLIDE 2

Daniel Situnayake (@dansitu), developer advocate for TensorFlow Lite. Co-wrote this book →

SLIDE 3

TensorFlow Lite is a production-ready, cross-platform framework for deploying ML on mobile devices and embedded systems

SLIDE 4

Goals

• Inspiration: see what’s possible with machine learning on-device
• Understanding: learn how on-device machine learning works, the things it can do, and how we can use it
• Actionable next steps: know what to learn next, and decide what to build first

SLIDE 5

What is machine learning?

SLIDE 6

Traditional programming: rules (expressed in code) plus data produce answers (returned from code).

calcPE(stock) {
  price = readPrice();
  earnings = readEarnings();
  return (price / earnings);
}

SLIDE 7

if (ball.collide(brick)) {
  removeBrick();
  ball.dx = -1 * (ball.dx);
  ball.dy = -1 * (ball.dy);
}

SLIDE 8

Traditional programming: Rules + Data → Answers

SLIDE 9

Traditional programming: Rules + Data → Answers
Machine learning: Answers + Data → Rules

SLIDE 10

Activity recognition

if (speed < 4) {
  status = WALKING;
}

SLIDE 11

Activity recognition

if (speed < 4) {
  status = WALKING;
} else {
  status = RUNNING;
}

SLIDE 12

Activity recognition

if (speed < 4) {
  status = WALKING;
} else if (speed < 12) {
  status = RUNNING;
} else {
  status = BIKING;
}

SLIDE 13

Activity recognition

if (speed < 4) {
  status = WALKING;
} else if (speed < 12) {
  status = RUNNING;
} else {
  status = BIKING;
}
// Oh crap

SLIDE 14

Traditional programming: Rules + Data → Answers
Machine learning: Answers + Data → Rules

SLIDE 15

Activity Recognition

0101001010100101010 1001010101001011101 0100101010010101001 0101001010100101010 → Label = WALKING
1010100101001010101 0101010010010010001 0010011111010101111 1010100100111101011 → Label = RUNNING
1001010011111010101 1101010111010101110 1010101111010101011 1111110001111010101 → Label = BIKING
1111111111010011101 0011111010111110101 0101110101010101110 1010101010100111110 → Label = GOLFING
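
To make slide 15 concrete: instead of hand-writing speed thresholds, you train a model on the labeled sensor data. A minimal, hypothetical Keras sketch (not part of the talk; `sensor_windows` and `labels` are assumed NumPy arrays of sensor windows and activity labels):

import tensorflow as tf

# Learn the "rules" from answers + data: a tiny classifier over
# fixed-size windows of sensor readings.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(4, activation="softmax"),  # WALKING, RUNNING, BIKING, GOLFING
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(sensor_windows, labels, epochs=10)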

SLIDE 16

Demo: Machine learning in 2 minutes

SLIDE 17

Key terms

• Model
• Training
• Dataset
• Inference

SLIDE 18

What inference looks like

1. Load your model
2. Transform data
3. Run inference
4. Use the resulting output

SLIDE 19

Input data → application code → output:

• Pre-processing: transforms input to be compatible with the model
• Interpreter: runs inference using the model
• Model: trained to make predictions on data
• Post-processing: interprets the model’s output and makes decisions
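
For readers who want to map this diagram to code, here is a minimal Python sketch of the same pipeline using the standard tf.lite.Interpreter API; "model.tflite" is a placeholder path, and pre- and post-processing are reduced to comments:

import numpy as np
import tensorflow as tf

# Load the model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Pre-processing: build input shaped the way the model expects.
input_data = np.zeros(input_details[0]["shape"], dtype=np.float32)

# Run inference.
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Post-processing: read and interpret the model's output.
output = interpreter.get_tensor(output_details[0]["index"])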

SLIDE 20

(Same inference pipeline diagram as Slide 19.)

SLIDE 21

Understanding TensorFlow Lite

• Introduction
• Getting started with TensorFlow Lite
• Making the most of TensorFlow Lite
• Running TensorFlow Lite on MCUs
SLIDE 22

Edge ML Explosion

SLIDE 23

Edge ML Explosion

• Lower latency & close-knit interactions

SLIDE 24

Edge ML Explosion

• Lower latency & close-knit interactions
• Network connectivity

SLIDE 25

Edge ML Explosion

• Lower latency & close-knit interactions
• Network connectivity
• Privacy preserving

SLIDE 26

On-device ML allows for a new generation of products
SLIDE 27

SLIDE 28

SLIDE 29

SLIDE 30

Photos, GBoard, Cloud, YouTube, Assistant, Hike, Uber, ML Kit, iQiyi, Tencent: thousands of production apps use it globally.

SLIDE 31

More than 3 billion mobile devices globally have now deployed TensorFlow Lite in production.

SLIDE 32

TensorFlow Lite

Android & iOS, microcontrollers, embedded Linux (Raspberry Pi), hardware accelerators (Edge TPU)

SLIDE 33

Why on-device ML is amazing

What makes it different?

SLIDE 34

ML on the edge

SLIDE 35

ML on the edge

SLIDE 36

Bandwidth

SLIDE 37

Latency

SLIDE 38

Privacy & security

SLIDE 39

Complexity

SLIDE 40

Challenges

• Uses little compute power

SLIDE 41

Challenges

• Uses little compute power
• Works on limited-memory platforms

SLIDE 42

Challenges

• Uses little compute power
• Works on limited-memory platforms
• Consumes less battery

SLIDE 43

Convert once, deploy anywhere

We’re simplifying on-device ML
SLIDE 44

Getting Started with TensorFlow Lite

Model conversion and deployment

SLIDE 45

Dance Like

Built on TensorFlow Lite using the latest in segmentation, pose and GPU techniques, all on-device.

SLIDE 46

SLIDE 47

SLIDE 48

We’ve made it easy to deploy ML on-device

SLIDE 49

SLIDE 50

Workflow

1. Get a TensorFlow Lite model
2. Deploy and run on edge device

SLIDE 51

SLIDE 52

Image Segmentation

Bokeh effect, replace background

SLIDE 53

PoseNet

Estimate locations of body and limbs

SLIDE 54

MobileBERT

Answers users’ questions on a corpus of text
SLIDE 55

Across audio, image, speech, text, and content:

Classification, prediction, recognition, text to speech, speech to text, object detection, object location, OCR, gesture recognition, facial modelling, segmentation, clustering, compression, super resolution, translation, voice synthesis, video generation, text generation, audio generation

SLIDE 56

Converting Your Model

TensorFlow (Keras or Estimator) → SavedModel → TF Lite converter → TF Lite model

SLIDE 57

# Build and save Keras model.
model = build_your_model()
tf.keras.experimental.export_saved_model(model, saved_model_dir)

# Convert Keras model to TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

SLIDE 58

Running Your Model

1. Load your model into the TF Lite interpreter
2. Preprocess data
3. Run inference
4. Use the resulting output

SLIDE 59

private fun initializeInterpreter() {
    val model = loadModelFile(context.assets)
    this.interpreter = Interpreter(model)
}

private fun classify(bitmap: Bitmap): String {
    val resizedImage = Bitmap.createScaledBitmap(bitmap, ...)
    val inputByteBuffer = convertBitmapToByteBuffer(resizedImage)
    val output = Array(1) { FloatArray(OUTPUT_CLASSES_COUNT) }
    this.interpreter?.run(inputByteBuffer, output)
    ...
}

SLIDE 60

SLIDE 61

(Same classifier code as Slide 59.)

SLIDE 62

The new TF Lite Support Library makes development easier

• APIs for simplifying pre- and post-processing (launched)
• Autogenerates pre- and post-processing (in progress)

SLIDE 63

/** Without TensorFlow Lite Support Library */

/** 1. Load your model. */
ImageClassifier(Activity activity) throws IOException {
  tfliteModel = loadModelFile(activity);
  tflite = new Interpreter(tfliteModel, tfliteOptions);
  imgData = ByteBuffer.allocateDirect(
      DIM_BATCH_SIZE * getImageSizeX() * getImageSizeY()
          * DIM_PIXEL_SIZE * getNumBytesPerChannel());
  imgData.order(ByteOrder.nativeOrder());
}

/** 2. Transform data. */
protected void loadAndProcessBitmap(Bitmap rgbFrameBitmap) {
  Bitmap croppedBitmap = Bitmap.createBitmap(
      classifier.getImageSizeX(), classifier.getImageSizeY(), Config.ARGB_8888);
  final Canvas canvas = new Canvas(croppedBitmap);
  canvas.drawBitmap(rgbFrameBitmap, frameToCropTransform, null);
  imgData.rewind();
  croppedBitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0,
      bitmap.getWidth(), bitmap.getHeight());
  for (int i = 0, pixel = 0; i < getImageSizeX(); ++i) {
    for (int j = 0; j < getImageSizeY(); ++j) {
      final int val = intValues[pixel++];
      imgData.putFloat((((val >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
      imgData.putFloat((((val >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
      imgData.putFloat(((val & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
    }
  }
}

/** 3. Run inference. */
public List<Classification> classifyImage(Bitmap rgbFrameBitmap) {
  loadAndProcessBitmap(rgbFrameBitmap);
  tflite.run(imgData, labelProbArray);

  /** 4. Use the resulting output. */
  PriorityQueue<Classification> pq = new PriorityQueue<Classification>(
      3,
      new Comparator<Classification>() {
        public int compare(Classification lhs, Classification rhs) {
          return Float.compare(rhs.getConfidence(), lhs.getConfidence());
        }
      });
  for (int i = 0; i < labels.size(); ++i) {
    pq.add(new Classification(
        "" + i,
        labels.size() > i ? labels.get(i) : "unknown",
        getNormalizedProbability(i)));
  }
  final ArrayList<Classification> results = new ArrayList<Classification>();
  int resultSize = Math.min(pq.size(), MAX_RESULTS);
  for (int i = 0; i < resultSize; ++i) {
    results.add(pq.poll());
  }
  return results;
}

SLIDE 64

/** With TensorFlow Lite Support Library */

// 1. Load your model.
MyImageClassifier classifier = new MyImageClassifier(activity);
MyImageClassifier.Inputs inputs = classifier.createInputs();

// 2. Transform your data.
inputs.loadImage(rgbFrameBitmap);

// 3. Run inference.
MyImageClassifier.Outputs outputs = classifier.run(inputs);

// 4. Use the resulting output.
Map<String, Float> labeledProbabilities = outputs.getOutput();

SLIDE 65

Running Your Model

Converter → Interpreter, backed by op kernels and delegates

SLIDE 66

Language Bindings

• New language bindings (Swift, Obj-C, C# and C) for iOS, Android and Unity
• Community language bindings (Rust, Go, Flutter/Dart)

SLIDE 67

Running TensorFlow Lite on Microcontrollers

SLIDE 68

What are they?

An MCU is a small computer on a single circuit (CPU, RAM, ROM and I/O):

• No operating system
• Tens of KB of RAM & flash
• Only CPU, memory & I/O peripherals
• Exist all around us

SLIDE 69

Cascade diagram: Input → MCU (“Is there any sound?”) → Class 1 / Class 2 → Input → MCU (“Is that human speech?”) → Class 1 / Class 2 → deeper network on the application processor

SLIDE 70

TensorFlow Lite for microcontrollers

TensorFlow provides a single framework to deploy on microcontrollers as well as phones.

TensorFlow SavedModel → TensorFlow Lite FlatBuffer format → TensorFlow Lite Interpreter (phones) or TensorFlow Lite Micro Interpreter (MCUs)
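
As a rough sketch of the conversion step for the micro path (assuming a SavedModel in `saved_model_dir` and a `representative_data` iterator of typical inputs), microcontroller deployments generally use full-integer quantization:

import tensorflow as tf

# Representative samples let the converter calibrate int8 ranges.
def representative_dataset():
    for sample in representative_data:
        yield [sample]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
# The resulting flatbuffer is then compiled into firmware, e.g. as a C array.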

SLIDE 71

SLIDE 72

Example

SLIDE 73

What can you do on an MCU?

• Simple speech recognition
• Person detection using a camera
• Gesture recognition using an accelerometer
• Predictive maintenance

SLIDE 74

Speech Detection on an MCU

• Recognizes “Yes” and “No”
• Retrainable for other words
• 20KB model
• 7 million ops per second

SLIDE 75

Person Detection on an MCU

• Recognizes if a person is visible in the camera feed
• Retrainable for other objects
• 250KB MobileNet model
• 60 million ops per inference

SLIDE 76

Gesture Detection on an MCU

• Spots wand gestures
• Retrainable for other gestures
• 20KB model

SLIDE 77

Improving your model performance

SLIDE 78

Incredible Performance

Enable your models to run as fast as possible on all hardware: CPU, GPU, DSP, NPU

SLIDE 79

Incredible Performance

MobileNet V1, Pixel 4, single-threaded CPU, October 2019:

• CPU, floating point: 37 ms (baseline)
• CPU, quantized fixed-point: 13 ms (2.8x faster)
• GPU, float16 via OpenCL: 6 ms (6.2x faster)
• Edge TPU, quantized fixed-point: 2 ms (18.5x faster)

SLIDE 80

Common techniques to improve model performance

• Use quantization
• Use pruning
• Leverage hardware accelerators
• Use mobile-optimized model architectures
• Per-op profiling

SLIDE 81

Utilizing quantization for CPU, DSP & NPU optimizations

Reduce the precision of static parameters (e.g. weights) and dynamic values (e.g. activations)
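
A minimal sketch of post-training quantization at conversion time, assuming a SavedModel in `saved_model_dir` (the talk doesn’t show this code; it follows the standard converter API):

import tensorflow as tf

# Optimize.DEFAULT quantizes weights, shrinking the model roughly 4x
# and typically speeding up CPU inference.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()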

SLIDE 82

Pruning

Remove connections during training in order to increase sparsity.
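
A hedged sketch of what this looks like with the TensorFlow Model Optimization Toolkit (illustrative only; `model`, `x_train` and `y_train` are assumed to exist):

import tensorflow_model_optimization as tfmot

# Progressively zero out low-magnitude weights during training,
# ramping sparsity from 0% to 50%.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5,
    begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

# UpdatePruningStep keeps the pruning masks in sync each training step.
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
pruned_model.fit(x_train, y_train, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])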

SLIDE 83

Running Your Model

Converter → Interpreter, backed by op kernels and delegates:

• Op kernels: highly optimized for the ARM NEON instruction set
• Delegates: target accelerators like GPU, DSP and Edge TPU, and integrate with the Android Neural Networks API

SLIDE 84

Utilizing Accelerators via Delegates

Diagram: the interpreter core walks the graph of ops and activations, dispatching each op either to CPU operation kernels or to an accelerator delegate.

SLIDE 85

GPU delegation enables faster float execution

• 2–7x faster than the floating point CPU implementation
• Uses OpenGL & OpenCL on Android and Metal on iOS
• Accepts float models (float16 or float32)

SLIDE 86

DSP delegation through the Qualcomm Hexagon DSP

• Use the Hexagon delegate on Android O & below
• Use the NN API on Android P & beyond
• Accepts integer models (uint8)
• Launching soon!

SLIDE 87

Delegation through the Android Neural Networks API

• Enables graph acceleration on DSP, GPU and NPU
• Supports 30+ ops in Android P, 90+ ops in Android Q
• Accepts float (float16, float32) and integer models (uint8)

SLIDE 88

/** Initializes an {@code ImageClassifier}. */
ImageClassifier(Activity activity) throws IOException {
  tfliteModel = loadModelFile(activity);
  delegate = new GpuDelegate();
  tfliteOptions.addDelegate(delegate);
  tflite = new Interpreter(tfliteModel, tfliteOptions);
  ...
}

SLIDE 89

/** Initializes an {@code ImageClassifier}. */
ImageClassifier(Activity activity) throws IOException {
  tfliteModel = loadModelFile(activity);
  delegate = new NnApiDelegate();
  tfliteOptions.addDelegate(delegate);
  tflite = new Interpreter(tfliteModel, tfliteOptions);
  ...
}

SLIDE 90

Model Comparison

Inception V3 vs. MobileNet V1:

• Top-1 accuracy: 77.9% vs. 68.3% (−11%)
• Top-5 accuracy: 93.8% vs. 88.1% (−6%)
• Inference latency: 1433 ms vs. 95.7 ms (15x faster)
• Model size: 95.3 MB vs. 10.3 MB (9.3x smaller)

SLIDE 91

Per-op Profiling

bazel build -c opt \
  --config=android_arm64 --cxxopt='--std=c++11' \
  --copt=-DTFLITE_PROFILING_ENABLED \
  //tensorflow/lite/tools/benchmark:benchmark_model

adb push .../benchmark_model /data/local/tmp
adb shell taskset f0 /data/local/tmp/benchmark_model

SLIDE 92

Per-op Profiling

Number of nodes executed: 31
============== Summary by node type ==============
[node type]         [count]  [avg ms]  [avg %]   [cdf %]
CONV_2D                15      1.406   89.270%    89.270%
DEPTHWISE_CONV_2D      13      0.169   10.730%   100.000%
SOFTMAX                 1      0.000    0.000%   100.000%
RESHAPE                 1      0.000    0.000%   100.000%
AVERAGE_POOL_2D         1      0.000    0.000%   100.000%

SLIDE 93

Improving your operator coverage

SLIDE 94

Expand operators, reduce size

• Utilize TensorFlow ops if an op is not natively supported
• Only include required ops to reduce the runtime’s size

SLIDE 95

Using TensorFlow operators

• Enables hundreds more ops from TensorFlow on CPU
• Caveat: binary size increase (~6MB compressed)

SLIDE 96

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
SLIDE 97

Reduce overall runtime size

• Selectively include only the ops required by the model
• Pares down the size of the binary

SLIDE 98

/* my_inference.cc */

// Forward declaration for RegisterSelectedOps.
void RegisterSelectedOps(::tflite::MutableOpResolver* resolver);
…
::tflite::MutableOpResolver resolver;
RegisterSelectedOps(&resolver);
std::unique_ptr<::tflite::Interpreter> interpreter;
::tflite::InterpreterBuilder(*model, resolver)(&interpreter);
…

SLIDE 99

gen_selected_ops(
    name = "my_op_resolver",
    model = ":my_tflite_model",
)

cc_library(
    name = "my_inference",
    srcs = ["my_inference.cc", ":my_op_resolver"],
)

SLIDE 100

How to get started

SLIDE 101

Brand new course launched on Udacity for TensorFlow Lite

SLIDE 102

Intro to embedded deep learning with TensorFlow Lite

SLIDE 103

Monthly meetups on embedded ML

• Santa Clara: tinyurl.com/tinyml-santaclara
• Austin: tinyurl.com/tinyml-austin
• More coming soon!

SLIDE 104

Visit tensorflow.org/lite

Questions?

tflite@tensorflow.org