Sense Making in an IoT World: Sensor Data Analysis with Deep Learning



SLIDE 1

Sense Making in an IoT World: Sensor Data Analysis with Deep Learning

Natalia Vassilieva, PhD, Senior Research Manager

GTC 2016

SLIDE 2

Deep learning proof points as of today


  • Vision: Search & information extraction; Security/video surveillance; Self-driving cars; Robotics
  • Speech: Interactive voice response (IVR) systems; Voice interfaces (mobile, cars, gaming, home); Security (speaker identification); Health care; People with disabilities
  • Text: Search and ranking; Sentiment analysis; Machine translation; Question answering
  • Other: Recommendation engines; Advertising; Fraud detection; AI challenges; Drug discovery; Sensor data analysis; Diagnostic support

SLIDE 3

Why Deep Learning & Sensor Data?

Deep Learning is about …

– Huge volumes of training data (labeled and unlabeled)
– Multidimensional and complex data with non-trivial patterns (spatial or temporal)
– Replacement of manual feature engineering with unsupervised feature learning
– Cross-modality feature learning

Sensor Data is about …

– Huge volumes of data (mostly unlabeled)
– Complex data with non-trivial patterns (mostly temporal)
– Variety of data representations; feature engineering is hard
– Multiple modalities

Deep learning already works well for speech, and most sensor data is likewise time series.

SLIDE 4

This talk


  • Does Deep Learning work for sensor data?
  • Do existing infrastructure and algorithms fit sensor data?
  • The Machine and Distributed Mesh Computing

SLIDE 5

Part I

Case Study: Sensor Data Analysis with Deep Learning


SLIDE 6

Patient activity recognition from accelerometer data

– Scripted video and accelerometer data from one sensor and 52 subjects (~20 min per subject)
– Accelerometer data: 500 Hz x 4 dimensions = 12000 measurements per minute per person
– 16 classes
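As a concrete illustration of how such a recording is typically turned into classifier inputs, here is a minimal sketch of slicing a multi-channel accelerometer stream into fixed-length, overlapping frames. The sampling rate, frame length and overlap are illustrative assumptions, not values from the talk.

    import numpy as np

    def frame_signal(signal, frame_len, hop):
        """Split a (n_samples, n_channels) signal into overlapping frames.

        Returns an array of shape (n_frames, frame_len, n_channels).
        """
        n_samples = signal.shape[0]
        starts = range(0, n_samples - frame_len + 1, hop)
        return np.stack([signal[s:s + frame_len] for s in starts])

    # Illustrative values only: a 4-channel accelerometer stream,
    # 1-second frames with 50% overlap at an assumed 500 Hz rate.
    fs = 500                                       # assumed sampling rate, Hz
    recording = np.random.randn(20 * 60 * fs, 4)   # ~20 minutes of synthetic data
    frames = frame_signal(recording, frame_len=fs, hop=fs // 2)
    print(frames.shape)                            # (2399, 500, 4)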

SLIDE 7

Accelerometer data (X axes)


SLIDE 8

Data distribution

[Chart: data distribution of frames; y-axis fractions 0.1–0.8]

Total number of frames: ~3.35M

SLIDE 9

Approaches


Baselines

  • ZeroR (majority class predictor)
  • Support Vector Machines
  • Decision Trees (C50 implementation)
  • Shallow Neural Networks (FANN library)
  • Features: manually engineered

Deep Neural Networks

  • Fully-connected hidden layers (pre-trained with stacked sparse autoencoders) + softmax
  • Time-delay layers
  • Recurrent layers
  • Deep Neural Networks and Conditional Random Fields
  • Multiple meta-parameter configurations
  • Features: amplitude spectrum
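The amplitude spectrum listed here as the DNN input can be computed per frame and per channel with a plain FFT. A minimal sketch, assuming frames shaped as (n_frames, frame_len, n_channels); the Hann window and the flattening of channels into one feature vector are my assumptions:

    import numpy as np

    def amplitude_spectrum(frames):
        # frames: (n_frames, frame_len, n_channels) -> (n_frames, n_features)
        # Magnitude of the one-sided FFT per channel, flattened per frame.
        windowed = frames * np.hanning(frames.shape[1])[None, :, None]
        spectrum = np.abs(np.fft.rfft(windowed, axis=1))
        return spectrum.reshape(frames.shape[0], -1)

    features = amplitude_spectrum(np.random.randn(100, 500, 4))
    print(features.shape)    # (100, 1004): (500 // 2 + 1) bins x 4 channels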
SLIDE 10

Approaches


Baselines

  • ZeroR (majority class predictor)
  • Support Vector Machines
  • Decision Trees (C50 implementation)
  • Shallow Neural Networks (FANN library)
  • Features: manually engineered

Deep Neural Networks

  • Fully-connected: stacked sparse autoencoders + softmax
  • Time-delay layers
  • Recurrent layers
  • Deep Neural Networks and Conditional Random Fields
  • Multiple meta-parameter configurations
  • Features: amplitude spectrum
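The fully-connected variant is described as stacked sparse autoencoders topped with a softmax classifier. Below is a rough PyTorch sketch of greedy layer-wise pretraining in that spirit; the layer sizes, the L1 sparsity penalty and the training schedule are my assumptions, not the configuration used in the talk.

    import torch
    import torch.nn as nn

    def pretrain_layer(inputs, hidden_dim, sparsity_weight=1e-3, epochs=10):
        # Train one sparse autoencoder layer; return its encoder and the codes.
        in_dim = inputs.shape[1]
        encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        decoder = nn.Linear(hidden_dim, in_dim)
        opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            codes = encoder(inputs)
            loss = nn.functional.mse_loss(decoder(codes), inputs) \
                   + sparsity_weight * codes.abs().mean()   # L1 penalty as a simple sparsity proxy
            loss.backward()
            opt.step()
        return encoder, encoder(inputs).detach()

    # Synthetic spectra: 1000 frames, 1004 spectral features, 16 activity classes.
    x = torch.randn(1000, 1004)
    enc1, h1 = pretrain_layer(x, 200)
    enc2, h2 = pretrain_layer(h1, 200)

    # Stack the pretrained encoders, add a linear output layer, then fine-tune
    # the whole network on the labeled frames (fine-tuning loop omitted).
    model = nn.Sequential(enc1, enc2, nn.Linear(200, 16))
    logits = model(x)    # CrossEntropyLoss applies the softmax internally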
SLIDE 11

Growing class-separation power of deeper representations

[Scatter plots: raw data, first level of representations, second level of representations]

SLIDE 12

Results: single person

Baseline methods, engineered features


Set type   Size
train      388,518
cv         129,510
test       129,510
total      647,538

Method         Accuracy (%)
ZeroR          71.2
SVM (binary)   97.6
Shallow NN     98.6
c50            99.6
SVM            98.03

Deep Neural Networks, amplitude spectrum

Architecture      Accuracy (%)
1533-200-200-16   99.7

SLIDE 13

Results: 52 subjects, subject-independent


Method      Accuracy (%)
ZeroR       69.7
c50         71.6
DNN         84.5
DNN + CRF   95.1

Set type   Size
train      2,608,637
cv         -
test       738,180
total      3,346,817

Baselines vs. DNN

[Visualization: predicted label sequences from DNN and DNN + CRF vs. true labels]

SLIDE 14

Results: 52 subjects, subject-independent


Method      Accuracy (%)
ZeroR       69.7
c50         71.6
DNN         84.5
DNN + CRF   95.1

Set type   Size
train      2,608,637
cv         -
test       738,180
total      3,346,817

Baselines vs. DNN. Deep models:

  • are better at classification on sensor data (generalize better)
  • do not require sophisticated feature engineering
  • require a significant number of iterations to converge
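The gap between DNN (84.5) and DNN + CRF (95.1) comes from decoding temporally consistent label sequences instead of classifying each frame in isolation. The talk does not spell out the CRF; a common linear-chain approach decodes the best sequence with the Viterbi algorithm over the DNN's frame scores, sketched below (the transition scores would normally be learned; here they are a hand-set assumption that favours staying in the same activity):

    import numpy as np

    def viterbi(frame_log_probs, log_transitions):
        # frame_log_probs: (n_frames, n_classes) log-scores from the DNN.
        # log_transitions: (n_classes, n_classes) log-score of moving between labels.
        n_frames, n_classes = frame_log_probs.shape
        score = frame_log_probs[0].copy()
        backptr = np.zeros((n_frames, n_classes), dtype=int)
        for t in range(1, n_frames):
            total = score[:, None] + log_transitions   # rows: previous label
            backptr[t] = total.argmax(axis=0)
            score = total.max(axis=0) + frame_log_probs[t]
        path = [int(score.argmax())]
        for t in range(n_frames - 1, 0, -1):
            path.append(int(backptr[t, path[-1]]))
        return path[::-1]

    # Toy example: 16 classes; self-transitions dominate, smoothing isolated errors.
    n_classes = 16
    trans = np.full((n_classes, n_classes), np.log(0.01))
    np.fill_diagonal(trans, np.log(0.85))
    dnn_log_probs = np.log(np.random.dirichlet(np.ones(n_classes), size=300))
    smoothed = viterbi(dnn_log_probs, trans)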
SLIDE 15

Part II

Today’s infrastructure and Deep Learning


SLIDE 16

Today’s scale

Model size, data size, compute requirements

Vision
  Model: 1.7×10^9 parameters (~6.8 GB)
  Training data: 14×10^6 images; ~2.5 TB (256×256) to ~10 TB (512×512)
  FLOP per epoch: 6 × 1.7×10^9 × 14×10^6 ≈ 1.4×10^17
  Training time: 3 days × 16,000 cores; 2 days × 16 servers × 4 GPUs; 8 hours × 36 servers × 4 GPUs

Speech
  Model: 60×10^6 parameters (~240 MB)
  Training data: 100K hours of audio; ~34×10^9 frames; ~50 TB
  FLOP per epoch: 6 × 60×10^6 × 34×10^9 ≈ 1.2×10^19
  Training time: days × 8 GPUs

Text
  Model: 6.5×10^6 parameters (~260 MB)
  Training data: 856×10^6 words
  FLOP per epoch: 6 × 6.5×10^6 × 856×10^6 ≈ 3.3×10^16
  Training time: 4 weeks

Signals
  Model: 1.2×10^6 parameters (~4.8 MB)
  Training data: 3×10^6 frames
  FLOP per epoch: 6 × 1.2 × 3×10^6 × 3×10^6 ≈ 6.5×10^13
  Training time: days
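The FLOP-per-epoch column follows a common back-of-the-envelope rule: roughly 6 FLOP per parameter per training sample (forward plus backward pass). A small sketch of that estimate, which reproduces the table's orders of magnitude; the one-epoch-per-hour conversion anticipates the compute requirement listed on the next slide:

    def flop_per_epoch(n_params, n_samples):
        # ~2 FLOP per weight forward, ~4 backward, per sample.
        return 6 * n_params * n_samples

    def sustained_flops(flop, seconds=3600):
        # Sustained throughput needed to finish that many FLOP in the given time.
        return flop / seconds

    vision = flop_per_epoch(1.7e9, 14e6)   # ~1.4e17
    speech = flop_per_epoch(60e6, 34e9)    # ~1.2e19
    print(f"vision: {vision:.1e} FLOP/epoch, "
          f"{sustained_flops(vision) / 1e12:.0f} TFLOPS for 1 epoch/hour")
    print(f"speech: {speech:.1e} FLOP/epoch, "
          f"{sustained_flops(speech) / 1e12:.0f} TFLOPS for 1 epoch/hour")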

SLIDE 17

Challenges of DNN training

Slow and expensive

– Very large number of parameters (>10^6), huge (unlabeled) data sets for training (10^6 – 10^9)
– Computationally expensive: requires O(model size × data size) FLOP per epoch
– Needs many iterations (and epochs) to converge
– Needs frequent synchronization to converge fast


Today’s hardware:

NVIDIA Titan X: 7 TFLOPS SP, 12 GB memory
NVIDIA Tesla M40: 7 TFLOPS SP, 12 GB memory
NVIDIA Tesla K40: 4.29 TFLOPS SP, 12 GB memory
NVIDIA Tesla K80: 5.6 TFLOPS SP, 24 GB memory
Intel Xeon Phi: 2.4 TFLOPS SP

Compute requirements today:

10^13 – 10^19 FLOP per epoch
1 epoch per hour: ~10x TFLOPS SP

SLIDE 18

Scalability of DNN training for time series

Hard to scale


  • J. Dean et al., Large Scale Distributed Deep Networks

– Google Brain: 1,000 machines (16,000 CPUs) × 3 days
– COTS HPC systems: 16 machines × 4 GPUs × 2 days
– Deep Image by Baidu: 36 machines × 4 GPUs × ~8 hours
– Deep Speech by Baidu: 8 GPUs × ~weeks
– Deep Speech 2 by Baidu: 8 or 16 GPUs × 3 to 5 days

Limited scalability of training for speech/time-series data!

SLIDE 19

Types of artificial neural networks

Topology to fit data characteristics

Images: locally connected, convolutional
Speech, time series, sequences: fully connected, recurrent

[Diagrams: two network topologies, each showing Input, Hidden Layers 1–3, Output]
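To make "topology to fit data characteristics" concrete, here is a tiny sketch comparing parameter counts for the same layer width: a fully connected layer, a locally connected (convolutional) layer that shares weights across positions, and a simple recurrent layer that adds hidden-to-hidden weights. The sizes are illustrative assumptions, not values from the talk.

    # Illustrative layer sizes only.
    n_in, n_out = 1024, 1024       # input and output units
    kernel, channels = 9, 16       # a small 1-D convolution over one input channel

    fully_connected = n_in * n_out                   # every input connected to every output
    convolutional = kernel * channels                # weights shared across time/positions
    recurrent = n_in * n_out + n_out * n_out         # input weights + hidden-to-hidden weights

    print(fully_connected, convolutional, recurrent)   # 1048576 144 2097152

The fully connected and recurrent layers that suit time series carry far more parameters per layer, which is part of why their training is harder to scale out than convolutional vision models.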

SLIDE 20

Today’s hardware (scale-out or scale-up)

[Diagram of a CPU/GPU cluster: servers with CPU, memory and GPUs on PCIe, connected by InfiniBand]
[Diagram of a multi-socket large-memory machine: four NUMA nodes (CPU + memory) connected by QPI links]

InfiniBand: ~12 GB/s; PCIe: ~16 GB/s; QPI link: ~12.8 GB/s per direction

SLIDE 21

Part III

The Machine and Distributed Mesh Computing


SLIDE 22

Processor-centric computing


Memory-Driven Computing

[Diagram of processor-centric computing: each SoC paired with its own memory]
[Diagram of Memory-Driven Computing: a shared pool of memory and fabric connected to many SoCs]

SLIDE 23


[Diagram label: I/O over copper]

SLIDE 24


[Diagram label: copper]

SLIDE 25


[Diagram label: copper]

SLIDE 26


SLIDE 27

Processor-centric computing


GPU, ASIC, Quantum, RISC-V: Open Architecture

Memory-Driven Computing

[Diagram: SoCs and heterogeneous compute elements sharing a common pool of memory]

SLIDE 28

The Machine will be ported to different scales and form-factors


NVM = Non-volatile memory

The Machine

  • Many cores
  • Massive pool of NVM
  • Photonics

SLIDE 29

The evolution of the IoT

Gen 0 (Yesteryears): Things on a network
  Still works well for small, local, custom systems with strict performance needs.

Gen 1 (Today): The Cloud-centric IoT
  Good choice for low-cost “things” where data can easily be moved, with few ramifications.

Gen 2 (Tomorrow): Edge analytics
  Ideal for “things” producing large volumes of data that are difficult, costly or sensitive to move.

Gen 3 (The future): Distributed Mesh Computing
  Multi-party “things” autonomously collaborate with privacy intact.


SLIDE 30

Tomorrow: Deep Learning and Edge Analytics


Center:
  • Collects all data
  • Trains model
  • Sends model to edge nodes

Edge node:
  • Gets trained model
  • Uses the model in real time
  • Collects data
  • Sends some data to the center

SLIDE 31

The Future: Deep Learning and Distributed Mesh Computing


The mesh:
  • Distributed training
  • Sends model as needed

Edge node:
  • Participates in training
  • Uses the model in real time
  • Collects data
  • Sends some data in the mesh
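The slide leaves "distributed training" across the mesh unspecified. One plausible scheme, sketched below, is periodic model averaging: every edge node trains on its own data and the nodes exchange and average parameters, so raw data never leaves a node. The class names and the update rule are illustrative assumptions, not HPE's design.

    import numpy as np

    class EdgeNode:
        # A mesh participant that trains a small linear model on local data only.
        def __init__(self, n_features, n_classes):
            self.weights = np.zeros((n_features, n_classes))

        def local_update(self, x, y, lr=0.1, steps=10):
            # Least-squares gradient steps on one-hot targets: a stand-in for
            # whatever local training a real node would run.
            targets = np.eye(self.weights.shape[1])[y]
            for _ in range(steps):
                grad = x.T @ (x @ self.weights - targets) / len(x)
                self.weights -= lr * grad

    def mesh_average(nodes):
        # Only model parameters travel through the mesh, never the raw data.
        mean = np.mean([n.weights for n in nodes], axis=0)
        for n in nodes:
            n.weights = mean.copy()

    nodes = [EdgeNode(n_features=8, n_classes=4) for _ in range(3)]
    for _ in range(5):                             # a few mesh rounds
        for n in nodes:
            x = np.random.randn(64, 8)             # each node's private data
            y = np.random.randint(0, 4, size=64)
            n.local_update(x, y)
        mesh_average(nodes)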

SLIDE 32

Summary

  • Does Deep Learning work for sensor data? Yes, we have proof points.
  • Do existing infrastructure and algorithms fit sensor data? No, training deep models for sensor data is slow and expensive today.
  • The Machine and Distributed Mesh Computing: we believe this changes everything.


SLIDE 33

Thank you!


Natalia Vassilieva nvassilieva@hpe.com

To learn more about Hewlett Packard Labs, visit: http://www.labs.hpe.com
To learn more about The Machine, visit: www.hpe.com/themachine