

SLIDE 1

“The human understanding, on account of its own nature, readily supposes a greater order and uniformity in things than it finds. And ... it devises parallels and correspondences and relations which are not there.”

Francis Bacon, 1620

Monday, May 16, 2011
SLIDE 2

“The human understanding, on account of its own nature, readily supposes a greater order and uniformity in things than it finds. And ... it devises parallels and correspondences and relations which are not there.”

Francis Bacon, 1620

Is what we see really there?

SLIDE 3

May 2011

Hadley Wickham, Dianne Cook, Heike Hofmann, Andreas Buja, Mahbubul Majumder

Graphical inference

SLIDE 4
  • 1. Line up protocol
  • 2. Rorschach protocol
  • 3. Case study
  • 4. Future work
SLIDE 5

Line up

SLIDE 6

SLIDE 7

19 of those plots were null plots: plots of data drawn from the null hypothesis, a quadratic relationship between x and y. 1 plot was the real data. Under the null hypothesis, there is a 1/20 chance of picking the correct plot. If we do pick it as being different, we have a p-value of 0.05. We have just performed a statistically valid test!
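The 1/20 figure is the probability that an observer who is merely guessing lands on the real plot. A minimal Python sketch, assuming each observer picks one of `n_plots` panels uniformly at random under the null and that observers are independent; the multi-observer generalisation (a binomial tail probability) and the name `lineup_p_value` are illustrative assumptions, not taken from the slides:

```python
from math import comb


def lineup_p_value(n_plots: int, n_observers: int = 1, n_correct: int = 1) -> float:
    """P(at least `n_correct` of `n_observers` pick the real plot | null).

    Assumes every observer independently chooses one panel uniformly at
    random, so the number of correct picks is Binomial(n_observers, 1/n_plots).
    """
    p = 1 / n_plots
    return sum(
        comb(n_observers, k) * p**k * (1 - p) ** (n_observers - k)
        for k in range(n_correct, n_observers + 1)
    )


print(lineup_p_value(20))  # 0.05: one observer, 20 panels, picks the real one
```

With a single observer this reduces to 1/n_plots, the value on the slide; several observers independently agreeing on the real plot give much smaller p-values (two out of two gives 0.0025).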

SLIDE 8

Protocol

  • Generate n-1 decoys (null datasets)
  • Plot the decoys + the real data (randomly positioned)
  • Show to an impartial observer. Can they spot the real data?
  • If so, you have evidence for a true difference (p-value = 1/n)
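The protocol can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a dataset can be any object, `generate_null` is a hypothetical caller-supplied function that returns one decoy, and positions are 0-based.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def make_lineup(real: T, generate_null: Callable[[], T], n: int = 20,
                rng: Optional[random.Random] = None) -> Tuple[List[T], int]:
    """Build a line-up: n - 1 decoys from `generate_null` plus the real data,
    inserted at a uniformly random position. Returns (panels, position)."""
    rng = rng if rng is not None else random.Random()
    panels = [generate_null() for _ in range(n - 1)]  # the n - 1 null datasets
    position = rng.randrange(n)                       # 0 .. n-1, all equally likely
    panels.insert(position, real)                     # hide the real data among decoys
    return panels, position


def observer_found_real(choice: int, position: int) -> bool:
    """True if the observer's chosen panel is the real data.
    Under the null this happens with probability 1/n."""
    return choice == position


if __name__ == "__main__":
    # Toy data: the "real" dataset is just a label; every decoy is "null".
    panels, pos = make_lineup("real", lambda: "null", n=20, rng=random.Random(42))
    print(len(panels), panels[pos])  # 20 real
```

Keeping `position` hidden from the observer until after they have chosen is what keeps the observer impartial and the 1/n p-value valid.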

SLIDE 9
  • E. L. Scott, C. D. Shane, and M. D. Swanson. Comparison of the synthetic and actual distribution of galaxies on a photographic plate. Astrophysical Journal, 119:91–112, Jan. 1954.

SLIDE 10
  • A. M. Noll. Human or machine: A subjective comparison of Piet Mondrian’s “Composition with Lines” (1917) and a computer-generated picture. The Psychological Record, 16:1–10, 1966.

SLIDE 11

Plot | Task
Scatterplot | Are the two variables independent?
Tag cloud | Do the words come from the same distribution?
Time series | Is there a trend in mean or variability?
Choropleth map | Is there a spatial trend?
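The slides do not say how the decoys for each task are generated. For the scatterplot question, one standard choice (an assumption here, not taken from the slides) is to permute one variable: this destroys any x-y association while keeping both marginal distributions intact. A minimal Python sketch, with the hypothetical helper name `permutation_null`:

```python
import random
from typing import List, Optional, Tuple


def permutation_null(x: List[float], y: List[float],
                     rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """One decoy for "are x and y independent?": shuffle y against x.

    The marginal distributions of x and y are unchanged; only the pairing
    (and hence any dependence between them) is randomised.
    """
    rng = rng if rng is not None else random.Random()
    y_perm = list(y)  # copy so the caller's data is not modified
    rng.shuffle(y_perm)
    return list(x), y_perm


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.1, 1.9, 3.2, 3.9, 5.1]
    decoys = [permutation_null(x, y, random.Random(seed)) for seed in range(19)]
    print(len(decoys))  # 19
```

Other tasks in the table would need their own null generators (for example, a spatial trend test would permute values across map regions); those choices are likewise assumptions, not taken from the slides.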

SLIDE 12

[Figure: five tag clouds built from the words believe, case, closely, descendants, few, long, modified, variations, very and view]

Five tag clouds of selected words from the 1st (red) and 6th (blue) editions of Darwin’s “Origin of Species”. Four of the tag clouds were generated under the null hypothesis of no difference between editions, and one is the true data. Can you spot it?
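The caption says four clouds were generated under the null of no difference between editions, but not how. One plausible way, assumed here, is to pool every word occurrence from both editions and shuffle the edition labels. A minimal Python sketch; `permute_editions` and the toy counts are hypothetical:

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def permute_editions(tokens: List[Tuple[str, str]],
                     rng: Optional[random.Random] = None) -> Dict[str, Counter]:
    """Null tag-cloud data under "no difference between editions".

    `tokens` holds one (word, edition) pair per occurrence. Shuffling the
    edition labels keeps each word's total count and each edition's total
    size, but makes the split of each word between editions purely random.
    """
    rng = rng if rng is not None else random.Random()
    words = [word for word, _ in tokens]
    editions = [edition for _, edition in tokens]
    rng.shuffle(editions)
    counts: Dict[str, Counter] = {}
    for word, edition in zip(words, editions):
        counts.setdefault(edition, Counter())[word] += 1
    return counts


if __name__ == "__main__":
    # Toy counts only; these are NOT the real word frequencies from Darwin.
    tokens = (
        [("believe", "1st")] * 5 + [("believe", "6th")] * 9
        + [("case", "1st")] * 7 + [("case", "6th")] * 6
    )
    null_counts = permute_editions(tokens, random.Random(1))
    print(null_counts["1st"], null_counts["6th"])
```

Each null cloud would then be drawn from one such set of counts, with word size by count and colour by edition.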

SLIDE 13

[Figure: the five tag clouds of selected words from the 1st and 6th editions of Darwin’s “Origin of Species”, shown again]

SLIDE 14

SLIDE 15

SLIDE 16

Once we’ve seen the plot, we’re no longer impartial

SLIDE 17

Solutions

  • Show to colleagues/collaborators
  • Automated visual testing service using Amazon Mechanical Turk

SLIDE 18

SLIDE 19
vs. classical tests

Of course, if we know what we’re looking for, we can always develop an algorithm or numerical test.

The advantage of visual inference is that it works for very general tasks, including when you don’t know exactly what you’re looking for.
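When the feature is known in advance, for example (hypothetically) a linear association in a scatterplot, the "algorithm or numerical test" can be a conventional permutation test on the sample correlation. A minimal Python sketch under that assumption; the function names are illustrative, not from the slides:

```python
import math
import random
from typing import List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_permutation_test(x: List[float], y: List[float], n_perm: int = 999,
                                 rng: Optional[random.Random] = None) -> float:
    """Two-sided permutation p-value for "x and y are independent".

    Uses |r| as the test statistic, which only makes sense when linear
    association is the specific feature we are looking for.
    """
    rng = rng if rng is not None else random.Random()
    observed = abs(pearson(x, y))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            exceed += 1
    # Count the observed arrangement as one permutation so p is never 0.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [float(v) for v in range(10)]
    y = [2.0 * v + 1.0 for v in x]
    print(correlation_permutation_test(x, y, rng=random.Random(0)))
```

The contrast with the line-up is that this statistic has to be chosen before looking; the visual test lets the observer respond to whatever feature makes the real panel look different.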

SLIDE 20

Power

[Figure: power (0.0 to 1.0) of the theoretical test and the visual test, with lower_CL and upper_CL confidence limits, over x values from -15 to 15; panels for sigma = 5 and sigma = 12, and for sample size = 100 and sample size = 300]

Recent work suggests that the power of the visual test is only a little worse than that of the classical test.
SLIDE 21

Rorschach

SLIDE 22

Rorschach

We’re surprisingly bad at appreciating the amount of variation in random data. Showing only null plots is a good way to calibrate our intuition. We also plan to use these plots as an empirical tool to understand which features people pick up on. Anecdotally, undergrads focus too much on outliers.
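A Rorschach grid is simply a line-up with no real panel. A minimal Python sketch, assuming histograms of Uniform(0, 1) draws as the null (a hypothetical choice for illustration; the slides do not specify one): every panel comes from the same flat distribution, yet the bin counts vary noticeably from panel to panel, which is exactly the variation we tend to underestimate.

```python
import random
from typing import List, Optional


def null_histogram(n_obs: int, n_bins: int, rng: random.Random) -> List[int]:
    """Bin counts for n_obs draws from Uniform(0, 1): a truly flat distribution."""
    counts = [0] * n_bins
    for _ in range(n_obs):
        # min() guards against floating-point rounding pushing a draw past the last bin
        counts[min(int(rng.random() * n_bins), n_bins - 1)] += 1
    return counts


def rorschach(n_panels: int = 9, n_obs: int = 500, n_bins: int = 10,
              rng: Optional[random.Random] = None) -> List[List[int]]:
    """Rorschach-style grid: every panel is null, none is the real data."""
    rng = rng if rng is not None else random.Random()
    return [null_histogram(n_obs, n_bins, rng) for _ in range(n_panels)]


if __name__ == "__main__":
    for i, counts in enumerate(rorschach(rng=random.Random(2011)), start=1):
        # Each bin "should" hold 50 draws; any spread is pure sampling noise.
        print(i, counts, "range:", max(counts) - min(counts))
```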

SLIDE 23

[Figure: nine histograms (panels 1 to 9) of count against result, with result from 0.0 to 1.0]

SLIDE 24

Case study

SLIDE 25

[Figure: scatterplot of cty against displ, coloured by factor(year) (1999, 2008)]
SLIDE 26

[Figure: scatterplot of 1/cty * 100 against displ, coloured by factor(year) (1999, 2008)]
SLIDE 27

[Figure: line-up of 20 scatterplots (panels 1 to 20) of 1/cty * 100 against displ, coloured by factor(year) (1999, 2008)]
SLIDE 28

[Figure: the same line-up of 20 scatterplots of 1/cty * 100 against displ, shown again]
SLIDE 29

Is a linear model with displacement as single predictor adequate?
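The slides do not show how the decoys for this question were made. One common approach, assumed here, is to fit the single-predictor model and simulate new responses from it, so every decoy genuinely satisfies the model. A minimal Python sketch; `fit_line`, `null_from_linear_model` and the toy values are hypothetical, not the case-study data:

```python
import math
import random
from typing import List, Optional, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of y = a + b * x.

    Returns the intercept a, slope b and residual standard deviation sigma
    (using n - 2 degrees of freedom).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(rss / (n - 2))


def null_from_linear_model(x: List[float], y: List[float],
                           rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """One decoy: same x, new y drawn from a + b * x + N(0, sigma^2).

    If the real panel still stands out among such decoys, that is evidence
    the single-predictor linear model is inadequate.
    """
    rng = rng if rng is not None else random.Random()
    a, b, sigma = fit_line(x, y)
    return list(x), [a + b * xi + rng.gauss(0.0, sigma) for xi in x]


if __name__ == "__main__":
    # Toy values for illustration only.
    displ = [1.8, 2.0, 2.8, 3.1, 4.2, 5.3, 5.7, 6.5]
    gp100m = [5.6, 5.3, 6.3, 5.6, 7.1, 8.3, 7.7, 9.1]
    decoys = [null_from_linear_model(displ, gp100m, random.Random(s)) for s in range(19)]
    print(len(decoys))  # 19
```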

SLIDE 30

[Figure: line-up of 20 plots (panels 1 to 20) of gp100m against displ, coloured by factor(year) (1999, 2008)]
SLIDE 31

[Figure: the same line-up of 20 plots of gp100m against displ, shown again]
SLIDE 32

Maybe there are fewer bigger cars?

SLIDE 33

[Figure: line-up of 20 histograms (panels 1 to 20) of count against displ, by factor(year) (1999, 2008)]

SLIDE 34

[Figure: the same line-up of 20 histograms of count against displ, shown again]

SLIDE 35

Future work

SLIDE 36

Future work

  • How can visual inference be integrated into visualisation software at a fundamental level?
  • Is it possible to guess plausible null hypotheses from the plot specification?
  • How does training affect results?
  • How do novices and experts differ?
  • What patterns do people pick up on?
  • What are the alternatives that people respond to?

SLIDE 37

Questions?

SLIDE 38

SLIDE 39

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
