
Automatic Selection of Partitioning Variables for Small Multiple Displays
Anushka Anand, Justin Talbot
Presented by Yujie Yang, CPSC 547 Information Visualization

Agenda
- Introduction
- Goodness-of-Split Criteria
- Algorithm
- Validation
- Conclusion
- Comments
CPSC547 Presentation - Yujie Yang 2015/11/26

Introduction
- Authors (from Tableau Research)
  - Anushka Anand
  - Justin Talbot
- IEEE Transactions on Visualization and Computer Graphics (TVCG)
  - January 2016

Introduction
- What: multidimensional data sets
- Why: for small multiples, how can the partitioning variables be selected automatically?
- How?
  - Cognostics
    - First introduced by John and Paul Tukey
    - Wilkinson extended the original idea
    - "Judge the relative interest of different displays"
  - Scagnostics: scatterplot diagnostics

Introduction - Scagnostics

Goodness-of-Split Criteria
- Visually rich
  - Convey rich visual patterns
- Informative
  - More informative than the input
- Well-supported
  - Convey robust and reliable patterns
- Parsimonious
  - All things being equal, prefer fewer partitions

Algorithm
- Automatically select interesting partitioning dimensions
- Select small multiples that have scagnostic values that are unlikely to be due to chance
- Likelihood of a small multiple's scagnostic value (a smaller likelihood means the value is unlikely to be due to chance)

Algorithm
[Diagram labels: Input, Output, Scatterplot, Scagnostic, Scores, Potential partitioning variables]

Algorithm
- Input:
  - Scatterplot
  - Scagnostic: skewed
  - Partitioning variable: distance to employment center
Data:
X: proportion of old houses built before 1940 for census tracts in Boston
Y: median value of owner-occupied houses

Algorithm
[Figure panels: (a) Input scatterplot; (b) Partitioned by distance; (c) Partitioned by random permutation; (d) Distribution of Skewed value]
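The random-permutation panels can be illustrated with a short sketch. It shuffles the partition labels (keeping partition sizes fixed) and records a per-partition measure for each shuffle. Everything named here is an assumption for illustration: `skew_proxy` is a simple stand-in (Pearson's second skewness coefficient of the y values), not the graph-theoretic Skewed scagnostic, and the toy data and the labels `near`/`far` are made up.

```python
import random
import statistics

def skew_proxy(points):
    """Stand-in measure (assumption): Pearson's second skewness
    coefficient of the y values, not the real Skewed scagnostic."""
    ys = [y for _, y in points]
    sd = statistics.pstdev(ys)
    if sd == 0:
        return 0.0
    return 3 * (statistics.mean(ys) - statistics.median(ys)) / sd

def split(points, labels):
    """Group points into partitions keyed by their label."""
    parts = {}
    for point, label in zip(points, labels):
        parts.setdefault(label, []).append(point)
    return parts

def permutation_null(points, labels, measure, n_perm=200, seed=0):
    """Collect the measure of each partition under shuffled labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = {label: [] for label in set(labels)}
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same label counts, random assignment
        for label, part in split(points, shuffled).items():
            null[label].append(measure(part))
    return null

# Hypothetical toy data: 60 points, two partitions of 30 points each.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.expovariate(1.0)) for _ in range(60)]
labels = ["near" if i < 30 else "far" for i in range(60)]
null = permutation_null(points, labels, skew_proxy)
```

Each list in `null` plays the role of the distribution in panel (d): the values the measure takes when the partitioning carries no information.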

Algorithm
- Permutation test
- Chebyshev's inequality: $P(|X - \mu| \ge k\sigma) \le 1/k^2$
- z-score: $z_i = (X_i - \mu_i) / \sigma_i$
- Output: $\max_i z_i$
where $X_i$ is the true scagnostic value of the i-th partition and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the scagnostic measures over the repeated random permutations of the i-th partition.
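A minimal sketch of the scoring step, assuming the per-partition permutation samples are already available. The function names, the use of the sample standard deviation, and the toy numbers are assumptions made for illustration; the slides do not specify them.

```python
import statistics

def z_score(observed, null_samples):
    """z-score of the observed value against the permutation samples."""
    mu = statistics.mean(null_samples)
    sigma = statistics.stdev(null_samples)  # sample std dev (assumption)
    if sigma == 0:
        return 0.0
    return (observed - mu) / sigma

def partition_score(observed_by_part, null_by_part):
    """Score a partitioning by the maximum z-score over its partitions."""
    return max(
        z_score(observed_by_part[key], null_by_part[key])
        for key in observed_by_part
    )

# Hypothetical values: one partition stands out, the other does not.
observed = {"near": 0.9, "far": 0.5}
null = {"near": [0.4, 0.5, 0.6], "far": [0.4, 0.5, 0.6]}
score = partition_score(observed, null)  # approximately 4.0
```

Here the `near` partition lies about four standard deviations above its permutation mean, so it determines the score of the whole partitioning.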

Algorithm
Algorithm: Automatic selection of partitioning variables
What: Data    Multidimensional data sets; scatterplot
Why: Task     Automatically select variables to divide scatterplot into small multiples
How: Facet    Small multiples
How: Input    Scatterplot; scagnostic; partitioning variables
How: Output   Max of z-scores
Scale         Items: thousands; dimensions: dozens
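The summary above can be sketched end to end: score every candidate partitioning variable by its maximum z-score and sort. This is an illustrative sketch, not the authors' implementation. `abs_pearson` is an assumed stand-in for a monotonic-style measure, and the toy data (an increasing and a decreasing trend overlaid) and the variable names `trend` and `parity` are hypothetical.

```python
import math
import random
import statistics

def abs_pearson(points):
    """Stand-in measure (assumption): |Pearson r| within a partition."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return abs(sxy / math.sqrt(sxx * syy))

def measure_by_partition(points, labels, measure):
    """Apply the measure to each partition induced by the labels."""
    parts = {}
    for point, label in zip(points, labels):
        parts.setdefault(label, []).append(point)
    return {label: measure(part) for label, part in parts.items()}

def score_variable(points, labels, measure, n_perm=200, seed=0):
    """Max z-score of the observed measures against shuffled labels."""
    rng = random.Random(seed)
    observed = measure_by_partition(points, labels, measure)
    null = {label: [] for label in observed}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for label, value in measure_by_partition(points, shuffled, measure).items():
            null[label].append(value)
    z_scores = []
    for label, x_i in observed.items():
        mu = statistics.mean(null[label])
        sigma = statistics.stdev(null[label])
        z_scores.append((x_i - mu) / sigma if sigma > 0 else 0.0)
    return max(z_scores)

def rank_variables(points, candidates, measure):
    """Sort candidate partitioning variables by descending score."""
    scores = {name: score_variable(points, labels, measure)
              for name, labels in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: 20 points on y = x overlaid with 20 points on y = -x.
points = [(i % 20 + 1, (i % 20 + 1) if i < 20 else -(i % 20 + 1))
          for i in range(40)]
candidates = {
    "trend": ["up" if i < 20 else "down" for i in range(40)],
    "parity": ["even" if i % 2 == 0 else "odd" for i in range(40)],
}
ranking = rank_variables(points, candidates, abs_pearson)
```

In this toy setup `trend` separates the two overlaid lines and ranks first, while `parity` mixes them evenly and scores below its own permutation baseline.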

Validation - Visually rich
- Visually striking clumps and striation patterns
Data:
X: linolenic measurement in olive oil specimens in Italy
Y: linoleic measurement in olive oil specimens in Italy

Validation - Visually rich
- Scagnostic: striated
- Partitioning variable: region

Validation - Informative
- Increasing and decreasing trends seem to be overlaid
Data:
X: death rate of world countries
Y: birth rate of world countries

Validation - Informative
- Best case
  - Scagnostic: monotonic
  - Partitioning variable: GDP
- Worst case
  - Scagnostic: monotonic
  - Partitioning variable: category dominant religion

Validation - Well-supported
- Run the algorithm on input data of different sizes
Data:
X: admission rate at US universities
Y: graduation rate at US universities

Validation - Well-supported
- Random 10% of full dataset
  - Scagnostic: monotonic
  - Partitioning variable: admit ACT scores
  - Z-score: 3.6
- Full dataset
  - Scagnostic: monotonic
  - Partitioning variable: admit ACT scores
  - Z-score: 16.4
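As a rough worked example, the two z-scores can be turned into upper bounds on the chance probability with Chebyshev's inequality. Reading the z-score as the multiplier $k$ is an interpretation made here for illustration; the resulting numbers are not figures from the slides.

```latex
P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}

k = 3.6:  \quad \frac{1}{3.6^2}  = \frac{1}{12.96}  \approx 0.077
k = 16.4: \quad \frac{1}{16.4^2} = \frac{1}{268.96} \approx 0.0037
```

Under this reading, the full dataset bounds the chance probability roughly twenty times more tightly than the 10% sample.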

Validation - Parsimonious
- Artificially generated dataset
- Scagnostic: clumpy
[Figure panels: Best case; Second best case; Worst case]

Conclusion
- Described a set of goodness criteria for evaluating small multiples
- Proposed a method for automatically ranking the small multiple displays created by the partitioning variables in a data set
- Demonstrated that the method meets the criteria
- Future:
  - Scatterplot -> different visualization types
  - Scagnostics -> wide range of quality measures
  - Evaluating small multiples -> different analytic goals

Comments
- As mentioned in their discussion:
  - Lack of examples for different visualization types or analytic goals
  - Does not deal with correlation between input and partitioning variables
  - Max of z-scores vs. average of z-scores
- More critiques:
  - Does their method meet their criteria?
  - Uses the idea of a permutation test, but gives no exact likelihood (or p-value) for the cognostic score in the examples
  - Weak evidence that the method supports the criteria
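The max-versus-average question can be made concrete with a small sketch. The z-scores below are hypothetical values chosen for illustration, not results from the paper.

```python
import statistics

def aggregate(z_scores):
    """Summarize per-partition z-scores by their max and their mean."""
    return {"max": max(z_scores), "mean": statistics.mean(z_scores)}

# Hypothetical per-partition z-scores for two candidate partitionings.
one_striking = aggregate([0.1, 0.2, 5.0])        # one unusual partition
uniformly_moderate = aggregate([1.8, 1.8, 1.8])  # all mildly unusual

# The two aggregations disagree about which partitioning ranks higher.
prefers_striking_by_max = one_striking["max"] > uniformly_moderate["max"]
prefers_moderate_by_mean = uniformly_moderate["mean"] > one_striking["mean"]
```

Taking the max rewards a partitioning with a single striking panel, while the average rewards one whose panels are all moderately unusual.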

Thank you!

Reference
[1] Anand A, Talbot J. Automatic Selection of Partitioning Variables for Small Multiple Displays[J]. 2016.
[2] Friedman J H, Stuetzle W. John W. Tukey's work on interactive graphics[J]. Annals of Statistics, 2002: 1629-1639.
[3] Wilkinson L, Anand A, Grossman R L. Graph-Theoretic Scagnostics[C]//INFOVIS. 2005, 5: 21.
[4] Wilkinson L, Wills G. Scagnostics distributions[J]. Journal of Computational and Graphical Statistics, 2008, 17(2): 473-491.
