Model Assertions for Monitoring and Improving ML Models


  1. Model Assertions for Monitoring and Improving ML Models. Daniel Kang*, Deepti Raghavan*, Peter Bailis, Matei Zaharia. DAWN Project, Stanford InfoLab. http://dawn.cs.stanford.edu/

  2. Machine learning is deployed in mission-critical settings with few checks » Errors can have life-changing consequences (Tesla’s Autopilot repeatedly accelerated towards lane dividers; an Uber autonomous vehicle was involved in a fatal crash) » No standard way of quality assurance!

  3. Software 1.0 is also deployed in mission-critical settings! Important software goes through rigorous engineering / QA processes » Assertions » Unit tests » Regression tests » Fuzzing » … (Software powers medical devices, etc.)

  4. Our research: Can we design QA methods that work across the ML deployment stack? This talk: model assertions, a method for checking outputs of models for both runtime monitoring and improving model quality

  5. Key insight: models can make systematic errors. Examples: cars should not flicker in and out of video; boxes of cars should not highly overlap (see paper for examples). We can specify errors in models without knowing root causes or fixes!

  6. “As the [automated driving system] changed the classification of the pedestrian several times— alternating between vehicle, bicycle, and an other — the system was unable to correctly predict the path of the detected object,” the board’s report states.

  7. Model assertions at deployment time: assert(cars should not flicker in and out) runs as part of runtime monitoring and triggers a corrective action when it fires. [Figure: three consecutive video frames (Frame 1, Frame 2, Frame 3).]

  8. Model assertions at train time: the set of inputs that triggered an assertion is used for model retraining via two paths: active learning with human-generated labels, and weak supervision via correction rules with weak labels.

  9. Outline » Using model assertions » Overview » For active learning » For weak supervision » For monitoring » Model assertions API & examples » Evaluation of model assertions

  10. Model assertions in context: many users, potentially not the model builders, can collaboratively add assertions

  11. Outline » Using model assertions » Overview » For active learning » For weak supervision » For monitoring » Model assertions API & examples » Evaluation of model assertions

  12. How should we select data points to label for active learning? » Many assertions can flag the same data point » The same assertion can flag many data points » Which points should we label? [Figure: overlapping sets of data points flagged by Assertion 1 and Assertion 2.]

  13. How should we select data points to label for active learning? » We designed a bandit algorithm for data selection (BAL) » Idea: select model assertions with highest reduction in assertions triggered
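      A minimal sketch of what such a selection loop could look like, assuming an epsilon-greedy bandit over assertions; the flagged-point pools, the observe_reduction callback, and the batch sizes are illustrative assumptions for this sketch, not the BAL algorithm from the paper.

      import random
      from collections import defaultdict

      def bandit_select(assertions, flagged, observe_reduction,
                        rounds=5, batch_size=100, epsilon=0.1):
          """Epsilon-greedy sketch of assertion-based data selection.

          assertions: list of assertion names
          flagged: dict mapping assertion name -> list of data points it flagged
          observe_reduction: caller-supplied callback returning the measured drop
              in triggered assertions after labeling/retraining on a batch
          """
          total_reward = defaultdict(float)
          pulls = defaultdict(int)
          selected = []
          for _ in range(rounds):
              if random.random() < epsilon:          # explore
                  choice = random.choice(assertions)
              else:                                  # exploit the best estimate so far
                  choice = max(assertions,
                               key=lambda a: total_reward[a] / max(pulls[a], 1))
              pool = flagged[choice]
              batch = random.sample(pool, min(batch_size, len(pool)))
              selected.extend(batch)
              total_reward[choice] += observe_reduction(choice, batch)
              pulls[choice] += 1
          return selected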

  14. Outline » Using model assertions » Overview » For active learning » For weak supervision » For monitoring » Model assertions API & examples » Evaluation of model assertions

  15. Correction rules for weak supervision: flickering. [Figure: three consecutive frames; the missing box in frame two is filled in from the surrounding frames.]
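      A rough sketch of such a correction rule under simple assumptions (one box per frame, a hypothetical Box type, and averaging of the two neighboring boxes); the paper's actual rule may differ.

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class Box:
          x1: float
          y1: float
          x2: float
          y2: float

      def fill_flickered_box(prev_box: Optional[Box],
                             cur_box: Optional[Box],
                             next_box: Optional[Box]) -> Optional[Box]:
          """If an object is detected in the surrounding frames but missing in the
          current frame, propose a weak label by averaging the neighboring boxes."""
          if cur_box is not None or prev_box is None or next_box is None:
              return cur_box  # nothing to correct
          return Box(
              x1=(prev_box.x1 + next_box.x1) / 2,
              y1=(prev_box.y1 + next_box.y1) / 2,
              x2=(prev_box.x2 + next_box.x2) / 2,
              y2=(prev_box.y2 + next_box.y2) / 2,
          )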

  16. Automatic correction rules: consistency API
      Identifier | Timestamp | Attribute 1 (gender) | Attribute 2 (hair color)
      1          | 1         | M                    | Brown
      1          | 2         | M                    | Black
      1          | 4         | F                    | Brown
      2          | 5         | M                    | Grey
      Propose ‘M’ as an updated label
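      A minimal sketch of how such a correction proposal could be computed, assuming records are plain dicts and the majority value per identifier is proposed as the weak label; the field names and the majority-vote rule are illustrative assumptions.

      from collections import Counter

      def propose_corrections(records, attribute):
          """records: list of dicts with 'identifier', 'timestamp', and attribute keys.
          For each identifier, propose the majority attribute value as the weak label
          for any row that disagrees with it."""
          by_id = {}
          for r in records:
              by_id.setdefault(r["identifier"], []).append(r)
          proposals = []
          for identifier, rows in by_id.items():
              majority, count = Counter(r[attribute] for r in rows).most_common(1)[0]
              if count == len(rows):
                  continue  # already consistent
              for r in rows:
                  if r[attribute] != majority:
                      proposals.append((identifier, r["timestamp"], attribute, majority))
          return proposals

      # Example mirroring the slide: identifier 1 has gender M, M, F, so 'M' is
      # proposed as the updated label for the row with timestamp 4.
      rows = [
          {"identifier": 1, "timestamp": 1, "gender": "M", "hair": "Brown"},
          {"identifier": 1, "timestamp": 2, "gender": "M", "hair": "Black"},
          {"identifier": 1, "timestamp": 4, "gender": "F", "hair": "Brown"},
          {"identifier": 2, "timestamp": 5, "gender": "M", "hair": "Grey"},
      ]
      print(propose_corrections(rows, "gender"))  # [(1, 4, 'gender', 'M')]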

  17. Outline » Using model assertions » Model assertions API & examples » Evaluation of model assertions

  18. Specifying model assertions: black-box functions over model inputs and outputs
      def flickering(
          recent_frames: List[PixelBuf],
          recent_outputs: List[BoundingBox]
      ) -> Float
      Model assertion inputs are a history of inputs and predictions. Model assertions output a severity score, where 0 is an abstention.
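      As a rough illustration, one possible body for this assertion, assuming one (possibly missing) detection per recent frame; PixelBuf and BoundingBox are the types named on the slide, and the "present before and after but missing in between" rule is a simplification for this sketch.

      from typing import List, Optional

      def flickering(recent_frames: List["PixelBuf"],
                     recent_outputs: List[Optional["BoundingBox"]]) -> float:
          """One output per recent frame (None if no detection). Severity counts
          frames where a box is present before and after but missing in between.
          Returning 0.0 means the assertion abstains."""
          severity = 0.0
          for prev, cur, nxt in zip(recent_outputs, recent_outputs[1:], recent_outputs[2:]):
              if prev is not None and nxt is not None and cur is None:
                  severity += 1.0
          return severity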

  19. Predictions from different AV sensors should agree

  20. Assertions can be specified in little code
      def sensor_agreement(lidar_boxes, camera_boxes):
          failures = 0
          for lidar_box in lidar_boxes:
              if no_overlap(lidar_box, camera_boxes):
                  failures += 1
          return failures
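      The no_overlap helper is not shown on the slide; one plausible implementation, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples and an IoU test, might be:

      def no_overlap(lidar_box, camera_boxes, iou_threshold=0.0):
          """Return True if the LIDAR box overlaps none of the camera boxes."""
          def iou(a, b):
              ax1, ay1, ax2, ay2 = a
              bx1, by1, bx2, by2 = b
              ix1, iy1 = max(ax1, bx1), max(ay1, by1)
              ix2, iy2 = min(ax2, bx2), min(ay2, by2)
              inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
              union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
              return inter / union if union > 0 else 0.0
          return all(iou(lidar_box, cb) <= iou_threshold for cb in camera_boxes)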

  21. Specifying model assertions: consistency API
      Identifier | Timestamp | Attribute 1 (gender) | Attribute 2 (hair color)
      1          | 1         | M                    | Brown
      1          | 2         | M                    | Black
      1          | 4         | F                    | Brown
      2          | 5         | M                    | Grey
      Attributes with the same identifier must agree; transitions cannot happen too quickly.
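      A minimal sketch of an assertion over this consistency API, assuming dict-shaped records and an illustrative minimum transition gap; both are assumptions for this sketch, not the paper's implementation.

      def consistency_violations(records, attribute, min_transition_gap=30):
          """Count violations of the two rules above: attributes with the same
          identifier must agree, and transitions cannot happen within
          min_transition_gap time units (an assumed threshold)."""
          violations = 0
          by_id = {}
          for r in records:
              by_id.setdefault(r["identifier"], []).append(r)
          for rows in by_id.values():
              rows.sort(key=lambda r: r["timestamp"])
              # Rule 1: attributes with the same identifier must agree.
              if len({r[attribute] for r in rows}) > 1:
                  violations += 1
              # Rule 2: transitions cannot happen too quickly.
              for prev, cur in zip(rows, rows[1:]):
                  if (prev[attribute] != cur[attribute]
                          and cur["timestamp"] - prev["timestamp"] < min_transition_gap):
                      violations += 1
          return violations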

  22. Model assertions for TV news analytics: overlapping boxes in the same scene should agree on attributes. Automatically specified via consistency assertions.

  23. Model assertions for ECG readings: classifications should not change from normal to AF and back (Normal → AF → Normal) within 30 seconds. Automatically specified via consistency assertions.

  24. Outline » Using model assertions » Model assertions API & examples » Evaluation of model assertions » Evaluation setup » Evaluating the precision of model assertions (monitoring) » Evaluating the accuracy gains from model assertions (training)

  25. Evaluation setup: datasets and tasks
      Setting             | Task                      | Model         | Assertions
      Visual analytics    | Object detection          | SSD           | Flicker, appear, multibox
      Autonomous vehicles | Object detection          | SSD, VoxelNet | Consistency, multibox
      ECG analysis        | AF detection              | ResNet-34     | Consistency
      TV news             | Identifying TV news hosts | Several       | Consistency

  26. Evaluation Setup: Examples. [Figures: security camera footage (original SSD), medical time series data, point cloud data (NuScenes).]

  27. Outline » Using model assertions » Model assertions API & examples » Evaluation of model assertions » Evaluation setup » Evaluating the precision of model assertions (monitoring) » Evaluating the accuracy gains from model assertions (training)

  28. Evaluating Model Assertion Precision: Can assertions catch mistakes?
      Assertion  | True positive rate
      Flickering | 96%
      Multibox   | 100%
      Appearing  | 88%
      LIDAR      | 100%
      ECG        | 100%

  29. Outline » Using model assertions » Model assertions API & examples » Evaluation of model assertions » Evaluation setup » Evaluating the precision of model assertions (monitoring) » Evaluating the accuracy gains from model assertions (training)

  30. Evaluating Model Quality after Retraining: Metrics » Video analytics: box mAP » Autonomous vehicle sensing: box mAP » AF classification: accuracy

  31. Evaluating Model Quality after Retraining (multiple assertions): Can collecting training data via assertions improve model quality via active learning? » Fine-tuned model with 100 examples each round » 3 assertions to choose frames from: » Flickering » Multibox » Appearing » Compare against: » Random sampling » Uncertainty sampling » Randomly sampling from assertions

  32. Model assertions can be used for active learning more efficiently than alternatives (video analytics). Using assertions outperforms uncertainty and random sampling; our bandit algorithm outperforms uniformly sampling from assertions.

  33. Model assertions also outperform on autonomous vehicle datasets (NuScenes). Using assertions outperforms uncertainty and random sampling.

  34. Evaluating Model Quality after Retraining: Can correction rules improve model quality without human labeling via weak supervision? Using weak supervision to label training examples caught by assertions improves model quality. Full experimental details in paper.

  35. Further results in paper » Model assertions can find high-confidence errors » Model assertions for validating human labels (video analytics) » Active learning results with a single model assertion (ECG) [Figure: an incorrect annotation from Scale AI.]

  36. Future work » What is the language to specify model assertions? » How can we choose thresholds in model assertions automatically? » How can we apply model assertions to other domains such as text?

  37. Conclusion: Assertions can be useful in ML! » No standard way of doing quality assurance for ML » Model assertions can be used for: » Monitoring ML at deployment time » Improving models at train time » Preliminary results show significant model improvement » Contact: ddkang@stanford.edu
