
Information Theory in Visual Analytics (Min Chen)



  1. Information Theory in Visual Analytics. Min Chen, Professor of Scientific Visualization, Oxford e-Research Centre, University of Oxford (min.chen@oerc.ox.ac.uk), including recent work in collaboration with Amos Golan, American University, USA. The 41st CREST Open Workshop, UCL, London, 27-28 April 2015. Links on the slide: http://www.bslhands4u.com/fingerspelling/4545036827 http://www.infoplease.com/ipa/A0200808.html

  2. Three Visualization Subsystems (N marks noise entering a stage)
     - vis-encoder: Source -> raw data D -> Filtering -> information I -> Visual Mapping -> geometry & labels G -> Rendering -> image V
     - vis-channel: image V -> Displaying -> optical signal S -> Optical Transmission -> optical signal S' -> Viewing -> image V'
     - vis-decoder: image V' -> Perception -> information I' -> Cognition -> knowledge K -> Destination
     - A General Communication System: Source -> message M -> Encoder (Transmitter) -> signal S -> Channel -> signal S' -> Decoder (Receiver) -> message M' -> Destination

  3. Existing Uses of Information Theory
     - Data processing
     - View optimization
     - Glyph design
     - ...
     - Theoretical framework
     - Measuring visualization capacity, and related quantities
     - Explaining phenomena in visualization processes
     - Defining laws (mathematically-validated guidelines)
     - Defining algorithm- or data-driven metrics
     - Confirming the significance of visual analytics

  4. Example: Visual Multiplexing
     - Location p can be associated with spatial domain D in the source data or determined by a spatial mapping.
     - X can be a data record or a set of partially encoded visual attributes.
     - Perceived information may include estimated values and relationships with data conveyed by other signals.
     - [Diagram: information X at p -> MUX -> channels $c_1, c_2, c_3, c_4, \ldots, c_k$ (with other signals and noise) -> DEMUX -> information about X at p; spatial domain D and temporal domain T; vis-encoder, vis-link (consisting of many vis-channels), vis-decoder]
     M. Chen et al., “Visual multiplexing,” Computer Graphics Forum, 2014

  5. Data Processing Inequality
     - Markov chain: X -> Process 1 -> Y -> Process 2 -> Z, with $p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$
     - $I(X;Y) \ge I(X;Z)$
     - “No clever manipulation of data can improve the inferences that can be made from the data” [Cover and Thomas, 2006]
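
The inequality on this slide can be checked numerically. The following is a minimal sketch, not taken from the slides: it draws random distributions $p(x)$, $p(y|x)$ and $p(z|y)$ for a Markov chain X -> Y -> Z and compares $I(X;Y)$ with $I(X;Z)$. The function names and the alphabet size of 4 are assumptions made for illustration.

```python
import math
import random


def mutual_information(joint):
    """Return I(A;B) in bits for a joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi


def random_distribution(n, rng):
    """Return a random probability vector of length n."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def markov_chain_informations(n=4, seed=0):
    """Build a random Markov chain X -> Y -> Z and return (I(X;Y), I(X;Z))."""
    rng = random.Random(seed)
    px = random_distribution(n, rng)
    py_given_x = [random_distribution(n, rng) for _ in range(n)]
    pz_given_y = [random_distribution(n, rng) for _ in range(n)]
    # p(x, y) = p(x) p(y|x)
    pxy = [[px[x] * py_given_x[x][y] for y in range(n)] for x in range(n)]
    # p(x, z) = sum over y of p(x) p(y|x) p(z|y)
    pxz = [
        [sum(pxy[x][y] * pz_given_y[y][z] for y in range(n)) for z in range(n)]
        for x in range(n)
    ]
    return mutual_information(pxy), mutual_information(pxz)


i_xy, i_xz = markov_chain_informations()
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
```

Whatever seed is chosen, the first value should not be smaller than the second, because Z depends on X only through Y.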

  6. Data Processing Inequality: Big Data Input?
     - Workflow: Big Data -> alphabet $Z_1$ -> Process 1 -> alphabet $Z_2$ -> Process 2 -> alphabet $Z_3$ -> ... -> alphabet $Z_{L-1}$ -> Process L-1 -> alphabet $Z_L$ -> Process L -> alphabet $Z_{L+1}$ -> Decision
     - Entropy $H(Z_1)$, entropy $H(Z_{L+1})$, mutual information $I(Z_1; Z_{L+1})$

  7. DPI is not Ubiquitous
     - Markov chain: X -> Process 1 -> Y -> Process 2 -> Z, with $p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$, for which $I(X;Y) \ge I(X;Z)$
     - Markov chain conditions:
       - Closed coupling: (X, Y), (Y, Z)
       - X and Z are conditionally independent (given Y)
     - What if one of the conditions is broken? For example, interactions $U_1$ and $U_2$ may feed into Process 1 and Process 2, or domain knowledge about X may enter the processes.
     - In visual analytics, both conditions are usually broken, so $I(X;Y) \ge I(X;Z)$ is no longer guaranteed.
     M. Chen and H. Jänicke, “An information-theoretic framework for visualization,” IEEE Transactions on Visualization and Computer Graphics, 2010
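
A small constructed case shows how extra knowledge about X can break the inequality. This is a hypothetical toy example, not one from the slides: X is uniform over four values, Process 1 keeps only the high bit, and Process 2 is assumed to receive the low bit as "domain knowledge about X", so Z is no longer conditionally independent of X given Y.

```python
import math
from collections import defaultdict


def mutual_information(pairs):
    """Return I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in pairs.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in pairs.items()
        if p > 0
    )


def broken_chain():
    """Return (I(X;Y), I(X;Z)) for a workflow with side knowledge about X."""
    pxy, pxz = defaultdict(float), defaultdict(float)
    for x in range(4):        # X is uniform over {0, 1, 2, 3}
        p = 0.25
        y = x // 2            # Process 1 keeps only the high bit of X
        u = x % 2             # hypothetical domain knowledge about X (the low bit)
        z = 2 * y + u         # Process 2 combines Y with that knowledge
        pxy[(x, y)] += p
        pxz[(x, z)] += p
    return mutual_information(pxy), mutual_information(pxz)


i_xy, i_xz = broken_chain()
print(i_xy, i_xz)  # 1.0 2.0
```

Here $I(X;Z) = 2$ bits exceeds $I(X;Y) = 1$ bit, which a true Markov chain X -> Y -> Z would not allow.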

  8. Soft Knowledge in Decision Space
     - Workflow: Big Data -> alphabet $Z_1$ -> Process 1 -> alphabet $Z_2$ -> Process 2 -> alphabet $Z_3$ -> ... -> alphabet $Z_{L-1}$ -> Process L-1 -> alphabet $Z_L$ -> Process L -> alphabet $Z_{L+1}$ -> Decision
     - All possible decisions under different conditions: a) totally data-driven; b) totally instinct-driven; c) data-informed; d) due to unknown or uncontrollable factors
     - $x \in X$ is a piece of soft knowledge
     - Entropy $H(X)$, entropy $H(Z_1)$, mutual information $I(Z_1; X)$

  9. An Example Data Analysis and Visualization Process (each step is a machine process, M, or a human process, H)
     - $Z_1$ Raw Data: r time series, 1 hour long at 5-second resolution; 720 data points each series; $2^{32}$ valid values each point; $H_{max} = 23040r$
     - $Z_2$ Aggregated Data: r time series at 1-minute resolution; 60 data points; $2^{32}$ valid values; $H_{max} = 1920r$
     - $Z_3$ Time Series Plots: r time series; 60 data points; 128 valid values; $H_{max} = 420r$
     - $Z_4$ Feature Recognition: r time series; 10 features; 8 valid values; $H_{max} = 30r$
     - $Z_5$ Correlation Indices: r(r-1)/2 data points; $2^{30.7}$ valid values; $H_{max} \approx 15r(r-1)$
     - $Z_6$ Graph Visualization: r(r-1)/2 connections; 5 valid values; $H_{max} \approx 1.16r(r-1)$
     - $Z_7$ Decision: r decisions; 3 valid values each (e.g., buy, sell, hold); $H_{max} \approx 1.58r$
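
The $H_{max}$ figures on this slide are consistent with taking the maximal entropy of an alphabet as the number of data points multiplied by $\log_2$ of the number of valid values per point. The sketch below recomputes them under that assumption; the function names and dictionary keys are chosen here for illustration only.

```python
import math


def h_max(num_points, num_valid_values):
    """Maximal entropy in bits: every point takes each valid value equally often."""
    return num_points * math.log2(num_valid_values)


def workflow_h_max(r):
    """Return the maximal entropy of each alphabet for r time series."""
    pairs = r * (r - 1) / 2  # number of pairwise correlations among r series
    return {
        "Z1 raw data": h_max(720 * r, 2 ** 32),
        "Z2 aggregated data": h_max(60 * r, 2 ** 32),
        "Z3 time series plots": h_max(60 * r, 128),
        "Z4 feature recognition": h_max(10 * r, 8),
        "Z5 correlation indices": h_max(pairs, 2 ** 30.7),
        "Z6 graph visualization": h_max(pairs, 5),
        "Z7 decision": h_max(r, 3),
    }


for name, bits in workflow_h_max(r=10).items():
    print(f"{name}: {bits:.1f} bits")
```

With these numbers the maximal entropy falls from 23040r bits for the raw data to about 1.58r bits for the decision.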

  10. A Sequential Workflow and Two Basic Metrics
     - Workflow: Data -> alphabet $Z_1$ -> Process 1 -> alphabet $Z_2$ -> ... -> alphabet $Z_s$ -> Process s -> alphabet $Z_{s+1}$ -> ... -> alphabet $Z_L$ -> Process L -> alphabet $Z_{L+1}$ -> Decision
     - Defined on the slide: the s-th function (process), the Alphabetic Compression Ratio (ACR), a reverse “guessing” process, and the Potential Distortion Ratio (PDR)
     M. Chen and A. Golan, “What may visualization processes optimize?,” under review, 2015
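
The formulas for these two metrics did not survive in this transcript. As a sketch only, the block below assumes that ACR is the entropy of the output alphabet $Z_{s+1}$ divided by the entropy of the input alphabet $Z_s$, and that PDR is the Kullback-Leibler divergence of the reverse-guessed alphabet $Z'_s$ from $Z_s$, divided by the entropy of $Z_s$. These definitions, the function names, and the example distributions are assumptions for illustration and may differ from what the slide showed.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def acr(p_in, p_out):
    """Assumed ACR: output entropy relative to input entropy."""
    return entropy(p_out) / entropy(p_in)


def pdr(p_in, p_guess):
    """Assumed PDR: divergence of the reverse guess from the input, per input bit."""
    return kl_divergence(p_guess, p_in) / entropy(p_in)


# Hypothetical process: merge four letters pairwise into two.
p_s = [0.5, 0.25, 0.125, 0.125]           # input alphabet Z_s
p_next = [p_s[0] + p_s[1], p_s[2] + p_s[3]]  # output alphabet Z_{s+1}
# Reverse guess: spread each merged letter's probability evenly over its sources.
p_guess = [p_next[0] / 2, p_next[0] / 2, p_next[1] / 2, p_next[1] / 2]

print(f"ACR = {acr(p_s, p_next):.4f}, PDR = {pdr(p_s, p_guess):.4f}")
```

Under these assumptions, a process that compresses the alphabet strongly gives a small ACR, while a reverse guess that strays from the original distribution gives a larger PDR.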

  11. Cost-Benefit Ratio
     - Workflow: Data -> alphabet $Z_1$ -> Process 1 -> alphabet $Z_2$ -> ... -> alphabet $Z_s$ -> Process s -> alphabet $Z_{s+1}$ -> ... -> alphabet $Z_L$ -> Process L -> alphabet $Z_{L+1}$ -> Decision
     - Defined on the slide: the Effectual Compression Ratio (ECR) and the Incremental Cost-Benefit Ratio (ICBR)
     - Cost can be measured in energy, time, money, etc.
     M. Chen and A. Golan, “What may visualization processes optimize?,” under review, 2015

  12. Four Levels of Visualization
     - Level 1, Disseminative (This is A!): a presentational aid for disseminating information or insight to others. The creator does not expect to gain much new knowledge.
     - Level 2, Observational / Operational (What, when, where?): an operational aid that enables intuitive and/or speedy observation of captured data. Often part of routine operations. Confirmatory observation, anomaly detection, etc.
     - Level 3, Analytical (Does A relate to B? Why?): an investigative aid for examining and understanding complex relationships (e.g., correlation, causality, contradiction). Evaluating hypotheses, models, methods, algorithms and systems.
     - Level 4, Model-developmental (How does A lead to B?): a developmental aid for improving existing models, methods, algorithms and systems, as well as for creating new ones.

  13. Levels 1, 2, 3
     - $V_D$: Disseminative Visualization
     - $V_O$: Observational Visualization
     - $V_A$: Analytical Visualization
     - When will workflow $W_3$ work and when will it not?
     - [Diagram: workflows $W_1$ to $W_4$ leading from Data, some through a Model. Legend: V = visual mapping with interaction (optional); H = human perception, cognition, and action; M = predefined machine processing or dynamically-modifiable machine processing]

  14. Level 4
     - When workflow $W_3$ does not work well, then ...
     - $V_M$: Model-developmental Visualization
     - [Diagram: workflows $W_3$, $W_5$ and $W_6$ leading from Data through a Model. Legend: V = visual mapping with interaction (optional); H = human perception, cognition, and action; M = predefined machine processing or dynamically-modifiable machine processing]

  15. Example: Level 1
     - Entropy of Data Alphabet: $H(Z) = -\sum_{t=1}^{64} \sum_{i=0}^{255} \frac{1}{256} \log_2 \frac{1}{256} = 512$
     - Binary Pixel Plot: 4x4 pixels per bit, $2^{13}$ bits
     - Time Series Plot: minimal 256x64 pixels ($2^{14}$ bits)
     - The more compact, the better?
     - [Figure: a binary pixel plot of bytes 1 to 64 with bits 7 to 0 (minimal 8 pixels wide, 64 pixels high) beside a time series plot of byte value (0 to 256) against byte index (0 to 64)]
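
The three sizes on this slide follow from simple arithmetic. The sketch below reproduces them, assuming 64 bytes whose 256 values are equally likely and one bit of display capacity per binary pixel; the function name and dictionary keys are chosen here for illustration.

```python
import math


def level1_sizes(num_bytes=64, pixels_per_bit_side=4):
    """Return the data entropy and the display sizes (in bits) for the two plots."""
    # Each byte has 256 equally likely values, so it carries log2(256) = 8 bits.
    data_entropy = num_bytes * math.log2(256)
    # Binary pixel plot: every data bit is drawn as a 4x4 block of binary pixels.
    binary_plot = num_bytes * 8 * pixels_per_bit_side ** 2
    # Time series plot: 256 rows (one per byte value) by 64 columns (one per byte).
    time_series_plot = 256 * num_bytes
    return {
        "data entropy": data_entropy,
        "binary pixel plot": binary_plot,
        "time series plot": time_series_plot,
    }


sizes = level1_sizes()
print(sizes)
```

The time series plot uses twice the display space of the binary pixel plot, which is what prompts the slide's question about whether the more compact form is the better one.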

  16. Example: Level 2
     - Real-time or offline annotation results in a huge spreadsheet of events
     Legg et al., “MatchPad: interactive glyph-based visualization for real-time sports performance analysis,” Computer Graphics Forum, 2012

  17. Example: Level 3
     D. Oelke, D. Spretke, A. Stoffel, D. Keim, “Visual readability analysis: How to make your writings easier to read,” IEEE VAST, 2010

  18. Example: Level 4
     - Expression Recognition
       - Humans are very good at
       - Machine vision is far behind
       - Limited understanding
     - Data
       - Video
       - Feature changes
       - Time series
     - Challenges
       - A lot of features
       - A lot of ways of measuring features
       - Non-uniform temporal behavior
     Tam et al., “Visualization of time-series data in parameter space for understanding facial dynamics,” Computer Graphics Forum, 2011

  19. Parallel Coordinates
     - Multi-dimensional data visualization

  20. Interactive Visualization: Formulating Decisions

  21. Telephone
     - In the 1870s, Bell travelled around to give demos ‘in concert halls, where full orchestras and choruses played “America” and “Auld Lang Syne” into his gadgetry.’
     - Around 1880, Queen Victoria installed a pair of telephones at Windsor and Buckingham Palace
     Primary source: J. Gleick, book, 2012
