
Trust But Verify: Optimistic Visualizations of Approximate Queries



  1. Trust But Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data. Dominik Moritz (@domoritz), Danyel Fisher (@FisherDanyel), Bolin Ding (@AtlasDing), Chi Wang. Paul G. Allen School of CSE, University of Washington; HCI and DMX, Microsoft Research

  2. What's the distribution of flight distances?

  3. Visual Analysis
 $ wget https://www.transtats.bts.gov/download.zip ==========================================> 70GB -> Done
 $ import download.zip -> Done
 $ SELECT bin(distance), count(*) 
 FROM flights -> Running Query. Please wait ...
 (Icons: Computer by Simple Icons and analyst by Gregor Cresnar, from the Noun Project)

  4. Visual Analysis (Icons: Computer by Simple Icons and analyst by Gregor Cresnar, from the Noun Project)

  5. Big Data Visual Analysis: Query finished! (Icon: Coffee by jeff, from the Noun Project)

  6. State of the Art in Big Data Exploration
 $ SELECT bin(distance), count(*) 
 FROM flights ->
 $ SELECT bin(distance), count(*) 
 FROM flights 
 WHERE airline = 'hi' -> Running Query. Please wait ...

  7. State of the Art in Big Data Exploration
 Distributed Systems: Expensive and high latency.
 Indexes (Data Cubes): Require precomputation and support only limited queries.
 Sampling: Use a representative subset of the data.
 (Icons: Rubik's Cube by Aleks and Cluster servers by Branis Panos, from the Noun Project)

  8. Sampling and Approximate Query Processing (AQP): Use a representative subset of the data and estimate the true values of aggregate results.

  9. Sampling and Approximate Query Processing (AQP): Use a representative subset of the data and estimate the true values of aggregate results. Decide on an acceptable uncertainty or timeout. Example: Sum of 25% = 42 ➞ Sum of 100% = 168 ±10 (estimate ± uncertainty).
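
A minimal sketch of the idea on this slide, assuming a uniform random sample and a normal-approximation confidence interval. The function name `estimate_sum`, the synthetic data, and the 95% level are illustrative assumptions and do not come from the talk.

```python
import math
import random
import statistics

def estimate_sum(sample, population_size, z=1.96):
    """Scale a uniform random sample up to an estimate of the population sum.

    Returns (estimate, margin). The margin is an approximate 95% confidence
    half-width (normal approximation with a finite population correction).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)  # sample standard deviation
    estimate = population_size * mean
    # The correction shrinks the margin to zero once the sample is the whole table.
    fpc = math.sqrt(1 - n / population_size)
    margin = z * population_size * std / math.sqrt(n) * fpc
    return estimate, margin

random.seed(0)
population = [random.uniform(0, 10) for _ in range(10_000)]  # synthetic values
sample = random.sample(population, 2_500)  # a 25% sample
estimate, margin = estimate_sum(sample, len(population))
print(f"estimated sum = {estimate:.0f} +/- {margin:.0f}; true sum = {sum(population):.0f}")
```

The margin is only a probabilistic statement: with this construction the true sum falls outside the interval in roughly 1 of 20 runs.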

  10. Progressive Visualization with Online Aggregation: Growing sample ➞ continuously improving results. Analysts watch updates until the error bounds are low enough. Example: Sum of 25% = 42 ➞ Sum of 100% = 168 ±10; Sum of 35% = 59 ➞ Sum of 100% = 168 ±5; Sum of 50% = 84 ➞ Sum of 100% = 168 ±1. Query finished!
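
The growing-sample loop can be sketched as a generator that re-estimates the sum after each batch. This is a toy illustration, assuming the rows arrive in random order so that every prefix is a uniform sample; `online_sum`, the batch size, and the synthetic data are hypothetical.

```python
import math
import random

def online_sum(values, population_size, batch_size, z=1.96):
    """Yield (fraction_seen, estimate, margin) after each batch of rows.

    Assumes `values` is in random order, so each prefix is a uniform sample.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0 or n == population_size:
            variance = m2 / (n - 1) if n > 1 else 0.0
            # Finite population correction: no uncertainty once all rows are seen.
            fpc = max(0.0, 1 - n / population_size)
            margin = z * population_size * math.sqrt(variance / n * fpc)
            yield n / population_size, population_size * mean, margin

random.seed(1)
data = [random.uniform(0, 10) for _ in range(10_000)]  # synthetic values
random.shuffle(data)
updates = list(online_sum(data, len(data), batch_size=2_500))
for fraction, estimate, margin in updates:
    print(f"{fraction:4.0%}: sum ~ {estimate:.0f} +/- {margin:.0f}")
```

An analyst-facing tool would redraw the chart on every update and let the user stop once the margin is small enough for the decision at hand.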

  11. Challenges with AQP
 $ SELECT bin(distance), count(*) 
 FROM flights 
 WHERE airline = 'hi' -> No Results
 $ SELECT bin(distance), count(*) 
 FROM flights 
 WHERE airline = 'ha' -> Running Query. Please wait ...

  12. Challenges with AQP: Approximate results ➞ convey uncertainty. Probabilistic guarantees. Unbounded errors. Arbitrary aggregation or joins. (Figure labels: Max, Estimate)

  13. Optimistic Visualization: A UX approach to challenges with AQP traditionally treated as database problems.

  14. Optimistic Visualization: Assume that the approximation is mostly right, but offer a way to detect and recover from mistakes. Analysts use initial estimates, run the precise query in the background, and confirm results later. This gives users confidence in using AQP.
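
The workflow described on this slide can be sketched as follows: answer immediately from a sample, run the precise query in the background, and compare the two once the precise result arrives. This is not Pangloss's implementation; the thread pool, the 1% sample, the bin width, and the synthetic flight distances are all assumptions made for illustration.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

BIN_WIDTH = 500  # hypothetical bin width in miles

def histogram(distances, scale=1.0):
    """Count distances per bin; `scale` extrapolates counts from a sample."""
    counts = Counter((int(d) // BIN_WIDTH) * BIN_WIDTH for d in distances)
    return {b: c * scale for b, c in sorted(counts.items())}

random.seed(2)
flights = [random.expovariate(1 / 800) for _ in range(200_000)]  # synthetic data

with ThreadPoolExecutor(max_workers=1) as pool:
    # Start the precise query in the background.
    precise_future = pool.submit(histogram, flights)

    # Meanwhile, answer right away from a 1% uniform sample.
    sample = random.sample(flights, len(flights) // 100)
    approximate = histogram(sample, scale=len(flights) / len(sample))
    # ... the analyst keeps exploring with `approximate` here ...

    # Later: collect the precise result to verify the estimate.
    precise = precise_future.result()

# Per-bin difference between the optimistic estimate and the precise answer.
difference = {b: approximate.get(b, 0.0) - count for b, count in precise.items()}
worst_bin = max(difference, key=lambda b: abs(difference[b]))
print(f"largest deviation in bin {worst_bin}: {difference[worst_bin]:+.0f} flights")
```

Bins that exist only in `precise` show up in `difference` with a negative value equal to the full count, which is one way a verification step can surface groups the sample missed.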

  15. Pangloss implements Optimistic Visualization: Query Specification

  16. Pangloss implements Optimistic Visualization: Visualization View

  17. Pangloss implements Optimistic Visualization: Approximation, Expected Error (Uncertainty)

  18. Pangloss implements Optimistic Visualization: Annotation + Remember Button

  19. Pangloss implements Optimistic Visualization: History

  20. Pangloss implements Optimistic Visualization

  21. 170 Million flights (30 years). ~100ms query time

  22. Text annotations help analysts clarify observations.

  23. "Remember" button moves query into the background

  24. Continue exploration without waiting

  25. Orange ➞ Approximate; Blue ➞ Precise

  26. Difference Visualization

  27. Evaluation
 Lab Study: 5 users; flight delay data (170 million records); 1 hour each.
 Case Study: 3 teams; product insights, social media, Bing; ~1+ hour exploration.

  28. Findings from the study
 AQP works: “seeing something right away at first glimpse is really great”
 Optimism works: “I was thinking what to do next, and I saw that it had loaded, so I went back and checked it . . . [the passive update is] very nice for not interrupting your workflow.”
 Need for guarantees: “[with a competitor] I was willing to wait 70-80 seconds. It wasn’t ideally interactive, but it meant I was looking at all the data.”

  29. Findings from the study (cont.)
 “When I’m using your system, there is a path that I need to follow.”
 “Now that I’ve been sitting here for an hour, after I go back, it makes a lot of sense [to have these annotations], but as I was doing it, I was thinking, ‘I want to move on, I want to move on.’”

  30. Conclusions: Fundamental problems with AQP addressed as a UX problem. Gives analysts confidence in AQP. Future: Alerting, Remembering, Progressive + Optimistic.

  31. AQP needs Multi-Disciplinary Solutions: Danyel - HCI; Chi - DB; Dominik - Vis+DB; Bolin - DB

  32. Implications for the Database Community HILDA at SIGMOD 2017

  33. Trust But Verify: Optimistic Visualizations for AQP. Dominik Moritz (@domoritz), Danyel Fisher (@FisherDanyel), Bolin Ding (@AtlasDing), Chi Wang. Fundamental problems with AQP addressed as a UX problem. Optimistic Visualization gives analysts confidence in AQP. Integrates well into existing Visual Analysis tools. Future: Alerting, Remembering, Progressive. Details: bit.ly/2pwQQg7. Query finished!

  34. Backup Slides

  35. Histogram of Distances for Hawaiian Airlines

  36. Distribution Uncertainty [Figure: bar charts that each sum to 12. Panels: Approximation (distribution uncertainty: 4); Within Distribution Uncertainty; Outside Distribution Uncertainty. Panel labels give errors of 4, 4, and 6.]

  37. Distribution Uncertainty

  38. Filtering can show new groups: new predicate → new query → different sample → different groups
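
One way to read this slide is that a rare group can be absent from a sample of the whole table yet appear once a filter narrows the query and a new sample is drawn. The toy sketch below assumes a fresh uniform sample is drawn from the rows matching each predicate; that strategy, the helper `approx_group_counts`, the carrier codes, and the row counts are hypothetical and are not taken from the talk.

```python
import random
from collections import Counter

def approx_group_counts(rows, predicate, sample_size, rng):
    """Approximate a per-carrier count(*) over the rows matching `predicate`.

    Toy strategy: sample uniformly from the matching rows and scale counts up.
    """
    matching = [row for row in rows if predicate(row)]
    k = min(sample_size, len(matching))
    sample = rng.sample(matching, k)
    scale = len(matching) / k if k else 0.0
    counts = Counter(carrier for carrier, _ in sample)
    return {carrier: n * scale for carrier, n in counts.items()}

rng = random.Random(3)
# Synthetic (carrier, distance) rows: two large carriers and one rare one.
rows = [("AA", rng.uniform(100, 1500)) for _ in range(60_000)]
rows += [("DL", rng.uniform(100, 1500)) for _ in range(40_000)]
rows += [("DL", 2200.0)] * 50
rows += [("HA", 2500.0)] * 20  # rare carrier, long-haul flights only

everything = approx_group_counts(rows, lambda row: True, 100, rng)
long_haul = approx_group_counts(rows, lambda row: row[1] > 2000, 100, rng)

print("groups without filter:", sorted(everything))
print("groups with distance > 2000:", sorted(long_haul))
```

With only 20 rare rows among roughly 100,000, a 100-row sample of the full table will usually miss "HA", while the filtered query matches just 70 rows, all of which fit in the sample.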

  39. Precise results can show new groups (panels: Approximate, Precise)

  40. Vocabulary of visual cues: Heatmap, Bar chart
