Supervised Sequential Classification Under Budget Constraints

  1. Supervised Sequential Classification Under Budget Constraints. Kirill Trapeznikov and Venkatesh Saligrama, Boston University, May 1st, 2013. (Slide 1 / 50)

  2. Overview. Introduce sequential decision problem. Myopic approach: relies on current uncertainty to make a decision. Consider synthetic examples. Why it does not always work. Our approach: incorporate future uncertainty in current decision. Examine a two-stage system. Reduce to supervised learning. Experiment. Extend to multiple stages. Generalization results.

  5. The Problem: Sequential Decision System. [Figure: cascade of decisions $f_1, f_2, \ldots, f_K$ running from a cheap/fast sensor to a slow/costly sensor; each stage either classifies or rejects to the next stage.] $K$-stage decision system: stage $k$ can use sensor $k$ for a cost $c_k$; measurements can be high dimensional; the order of stages/sensors is fixed. Decision at each stage: classify using current measurements, or request (reject to) the next sensor. Goal: find decisions $F = \{f_1, f_2, \ldots, f_K\}$ that trade off error rate against average acquisition cost.
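
The trade-off stated in the goal can be made concrete with a small sketch. This is an illustration under stated assumptions, not the presenters' implementation: the names `run_cascade`, `evaluate`, `f1`, and `f2` are hypothetical, a stage signals "reject" by returning `None`, and the last stage falls back to class 0 if it also rejects.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical convention: a stage sees all measurements acquired so far and
# returns a class label, or None to reject to the next stage.
Stage = Callable[[Sequence[float]], Optional[int]]


def run_cascade(stages: Sequence[Stage], costs: Sequence[float],
                measurements: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted label, total acquisition cost) for one example."""
    if not stages or len(stages) != len(costs):
        raise ValueError("need one cost per stage and at least one stage")
    acquired: List[float] = []
    total_cost = 0.0
    for k, (f_k, c_k) in enumerate(zip(stages, costs)):
        acquired.append(measurements[k])  # sensor k is paid for only if reached
        total_cost += c_k
        label = f_k(acquired)
        if label is not None:
            return label, total_cost
    # Assumption: if even the last stage rejects, fall back to class 0.
    return 0, total_cost


def evaluate(stages, costs, data) -> Tuple[float, float]:
    """Error rate and average acquisition cost over (measurements, label) pairs."""
    errors, cost_sum = 0, 0.0
    for measurements, y in data:
        pred, cost = run_cascade(stages, costs, measurements)
        errors += int(pred != y)
        cost_sum += cost
    return errors / len(data), cost_sum / len(data)


def f1(m):
    # Stage 1: classify only when sensor 1 is far from zero, else reject.
    return (1 if m[0] > 0 else 0) if abs(m[0]) > 1 else None


def f2(m):
    # Stage 2: always classify, using sensor 2.
    return 1 if m[1] > 0 else 0


data = [([2.0, 2.0], 1), ([-2.0, -1.0], 0), ([0.5, -3.0], 0), ([-0.2, 1.5], 1)]
error_rate, avg_cost = evaluate([f1, f2], [0.0, 1.0], data)
print(error_rate, avg_cost)
```

In this toy data, two of the four examples are rejected to the second stage, so the second sensor's cost is paid only for those two.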

  6. Example: Sensors of Increasing Resolutions. Classify handwritten digit images. [Figure: cascade $f_1, f_2, f_3, f_4$ over sensors ranging from low resolution (cheap) to high resolution (expensive).] Do we need all sensors for every decision?

  11. Difficult Decision. [Figure: cascade $f_1, f_2, f_3, f_4$; the image is rejected at the earlier stages and finally classified as 8.] High acquisition cost: need full resolution to make a decision.

  15. Easy Decision. [Figure: cascade $f_1, f_2, f_3, f_4$; the image is classified as 1.] Small acquisition cost: full resolution is unnecessary.

  18. How to reduce sensor cost? Sensor 1 is cheap, Sensor 2 is expensive. Centralized strategy: use both sensors (high cost, low error). Non-adaptive strategy: only use sensor 1 (low cost, high error). [Figure: Centralized panel with Sensor 1 and Sensor 2; Non-Adaptive panel with Sensor 1 only.]

  19. A better strategy: be adaptive. Only request the 2nd sensor on difficult examples. [Figure: the Stage 1 Decision uses Sensor 1 and either classifies or rejects; the Stage 2 Decision uses Sensor 1 and Sensor 2.]

  20. How does it compare? Same error rate as centralized for half the cost. [Figure: error rate (ticks .1, .2) vs. average cost per sample (ticks .5, 1) for the centralized, non-adaptive, and adaptive strategies; 1st sensor cost = 0, 2nd sensor cost = 1.]
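
The "half the cost" figure can be checked with one line of arithmetic. This is a sketch, assuming the adaptive system rejects half of the samples to the second stage; the symbol $p_{\text{reject}}$ is introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
\text{average cost}_{\text{adaptive}}
  &= c_1 + p_{\text{reject}} \, c_2 = 0 + 0.5 \times 1 = 0.5, \\
\text{average cost}_{\text{centralized}}
  &= c_1 + c_2 = 0 + 1 = 1 .
\end{align*}
```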

  22. Deciding to reject. How to decide whether to use the next sensor? [Figure: input $x$ goes to $f_1$ on the cheap/fast sensor, which classifies or rejects to $f_2$ on the expensive/slow sensor, which classifies.] Risk of a decision: $\min\big[\,\underbrace{\text{current uncertainty}}_{\text{classify}},\ \underbrace{\alpha \times \text{cost} + \text{future uncertainty}}_{\text{reject to next stage}}\,\big]$ (uncertainty is in correct classification). Does the acquisition cost justify the reduction in uncertainty?

  23. Deciding to reject. Risk $= \min\big[\,\underbrace{\text{current uncertainty}}_{\text{classify}},\ \underbrace{\alpha \times \text{cost} + \text{future uncertainty}}_{\text{reject to next stage}}\,\big]$. Difficulty: sensor output is not known since it has not been acquired. How to determine future uncertainty? Must base decision on collected measurements! [Figure: 1st sensor and 2nd sensor.]

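  26. Myopic Approach. Not clear how to determine uncertainty of the future: $\min\big[\,\underbrace{\text{current uncertainty}}_{\text{classify}},\ \underbrace{\alpha \times \text{cost} + \text{future uncertainty}}_{\text{reject to next stage}}\,\big]$. Ignore the future, and only use current uncertainty to make a decision: $\min\big[\,\underbrace{\text{current uncertainty}}_{\text{classify}},\ \underbrace{\alpha \times \text{cost}}_{\text{reject to next stage}}\,\big]$. Reduces to: $\text{decision} = \begin{cases} \text{classify}, & \text{uncertainty} < \text{threshold} \\ \text{reject}, & \text{uncertainty} \geq \text{threshold} \end{cases}$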
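
The reduction to a threshold test can be written out directly. This is a minimal sketch, assuming the threshold equals $\alpha \times \text{cost}$ as the two-term minimum suggests; the function name `myopic_decision` is hypothetical.

```python
def myopic_decision(current_uncertainty: float, alpha: float, cost: float) -> str:
    """Myopic rule: pick the smaller of current uncertainty and alpha * cost.

    Future uncertainty is ignored, so the rule is a plain threshold test.
    """
    threshold = alpha * cost
    return "classify" if current_uncertainty < threshold else "reject"


print(myopic_decision(0.1, alpha=0.5, cost=0.6))  # low uncertainty
print(myopic_decision(0.4, alpha=0.5, cost=0.6))  # high uncertainty
```

Raising `alpha` makes acquisition look more expensive, so fewer examples are rejected to the next stage.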

  27. Myopic in Discriminative Setting. Train a classifier $h(x)$ at a stage. Classifier uncertainty ≈ distance to decision boundary (margin). Small distance → high uncertainty. Large distance → low uncertainty. [Figure: examples whose $h(x)$ falls within the threshold of the boundary are rejected to the next stage.] Related work: [Liu et al., 2008]
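
A margin-based reject option might look like the following. This is a sketch, assuming a linear classifier $h(x) = w \cdot x + b$ with labels in $\{-1, +1\}$ and using $|h(x)|$ as a proxy for the distance to the boundary; `linear_score` and `margin_reject` are hypothetical names, not taken from the slides or from the cited work.

```python
from typing import Optional, Sequence


def linear_score(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """h(x) = w . x + b for an assumed linear stage classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_reject(score: float, threshold: float) -> Optional[int]:
    """Return +1 or -1 when |h(x)| is large enough, else None (reject).

    A small |h(x)| means the point is near the boundary, i.e. high uncertainty.
    """
    if abs(score) < threshold:
        return None
    return 1 if score > 0 else -1


w, b = [1.0, 0.0], 0.0  # stage 1 only looks at sensor 1
print(margin_reject(linear_score(w, [2.0, 5.0], b), threshold=0.5))
print(margin_reject(linear_score(w, [-0.2, 5.0], b), threshold=0.5))
```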

  28. Example 1. Data. [Figure: Sensor 1 vs. Sensor 2, both axes from −6 to 6.]

  29. Example 1. 1st Stage Classifier: only utilizes Sensor 1. [Figure: Sensor 1 vs. Sensor 2, both axes from −6 to 6.]

  30. Example 1. 2nd Stage Classifier: utilizes Sensors 1 and 2. [Figure: Sensor 1 vs. Sensor 2, both axes from −6 to 6.]

  31. Example 1: Myopic Reject Classifier. [Figure: two panels, Stage 1 Decision and Stage 2 Decision, showing the Reject and Classify regions; all axes from −6 to 6.]

  32. Example 1: Myopic Reject Classifier. Requests sensor 2 where sensor 1 is ambiguous. Current uncertainty seems to be a good criterion for rejecting. [Figure: Sensor 1 vs. Sensor 2, both axes from −6 to 6, with the region rejected to the 2nd stage (request 2nd sensor) marked.]

  33. Example 1: Error vs Budget. Sweep the threshold to generate different operating points. [Figure: error (0.01 to 0.09) vs. budget (0 to 1) for the optimal and myopic strategies, next to the Sensor 1 vs. Sensor 2 plot.] Good performance: close to optimal, seems to work.
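
Sweeping the threshold can be sketched as a loop over candidate values, recording one (error, budget) pair per threshold. This is an illustration under assumptions: a two-stage system, labels in $\{-1, +1\}$, a free first sensor, and stage-2 predictions computed in advance; the name `operating_points` and the toy numbers are hypothetical.

```python
def operating_points(stage1_scores, stage2_preds, labels, thresholds,
                     sensor2_cost=1.0):
    """Return one (threshold, error, budget) triple per threshold.

    Budget is the average acquisition cost, assuming sensor 1 is free.
    """
    n = len(labels)
    points = []
    for t in thresholds:
        errors, rejected = 0, 0
        for s, p2, y in zip(stage1_scores, stage2_preds, labels):
            if abs(s) < t:
                rejected += 1          # ambiguous at stage 1: request sensor 2
                pred = p2
            else:
                pred = 1 if s > 0 else -1
            errors += int(pred != y)
        points.append((t, errors / n, sensor2_cost * rejected / n))
    return points


scores = [2.0, -1.5, 0.3, -0.4]   # stage-1 margins h(x)
stage2 = [1, -1, -1, 1]           # what stage 2 would predict
labels = [1, -1, -1, 1]
for point in operating_points(scores, stage2, labels, [0.0, 0.35, 0.5]):
    print(point)
```

On this toy data a larger threshold rejects more examples, so error falls as budget rises.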

  34. Example 2. [Figure: Sensor 1 vs. Sensor 2, both axes from 0 to 2.5.]

  35. Example 2. 1st Stage Classifier: only utilizes Sensor 1. [Figure: Sensor 1 vs. Sensor 2, both axes from 0 to 2.5.]

  36. Example 2. 2nd Stage Classifier: utilizes Sensors 1 and 2. [Figure: Sensor 1 vs. Sensor 2, both axes from 0 to 2.5.]

  37. Example 2. Region 1: separable only with sensor 2. [Figure: Sensor 1 vs. Sensor 2, both axes from 0 to 2.5.]

  38. Example 2. Region 2: neither sensor helps. [Figure: Sensor 1 vs. Sensor 2, both axes from 0 to 2.5.]

  39. Example 2: Myopic Reject Decision. Sensor 1 uncertainty is equally distributed between regions 1 and 2. Uniformly rejects in both regions. [Figure: Sensor 1 vs. Sensor 2, both axes from 0 to 2.5, with the region rejected to the 2nd stage marked.]

  40. Example 2: Myopic Reject Decision. Current uncertainty is equally distributed between regions 1 and 2. Without future uncertainty, the myopic rule cannot tell where sensor 2 is useful. [Figure: Sensor 1 vs. Sensor 2 plot next to error (0.12 to 0.26) vs. budget (0 to 1) for the myopic and optimal strategies.]

  41. Myopic. [Figure: two error vs. budget plots comparing the myopic and optimal strategies. Panel "Myopic Works": error from 0.01 to 0.09. Panel "Myopic Fails": error from 0.12 to 0.26. Budget from 0 to 1 in both.]

  42. Future Uncertainty is Important. Need to incorporate future uncertainty in the decision: $\min\big[\,\underbrace{\text{current uncertainty}}_{\text{classify}},\ \underbrace{\alpha \times \text{cost} + \text{future uncertainty}}_{\text{reject to next stage}}\,\big]$
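
The effect of the extra term can be seen on numbers shaped like Example 2. This is only a sketch: it assumes an estimate of future uncertainty is already available for each example, which is exactly what the slides say is hard to obtain, and it does not show how the presenters estimate it. The function name and the uncertainty values are hypothetical.

```python
def risk_aware_decision(current_uncertainty: float, future_uncertainty: float,
                        alpha: float, cost: float) -> str:
    """Pick the smaller of the two risks; ties go to classifying (no cost paid)."""
    classify_risk = current_uncertainty
    reject_risk = alpha * cost + future_uncertainty
    return "classify" if classify_risk <= reject_risk else "reject"


alpha, cost = 0.2, 1.0
# Hypothetical (current, future) uncertainties mirroring Example 2:
regions = {
    "region 1": (0.5, 0.0),  # separable only with sensor 2
    "region 2": (0.5, 0.5),  # neither sensor helps
}
for name, (current, future) in regions.items():
    # The myopic rule drops the future term and compares against alpha * cost.
    myopic = "classify" if current < alpha * cost else "reject"
    aware = risk_aware_decision(current, future, alpha, cost)
    print(name, "myopic:", myopic, "with future term:", aware)
```

Both regions look identical to the myopic rule, so it rejects in both. With the future term, only region 1 is rejected, which is where the second sensor actually reduces uncertainty.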
