Visual Analytics and Information Retrieval Giuseppe Santucci - - PowerPoint PPT Presentation

SLIDE 1

Visual Analytics and Information Retrieval

Giuseppe Santucci Dipartimento di Informatica e Sistemistica Sapienza Università di Roma santucci@dis.uniroma1.it

SLIDE 2

Who am I?

(University of Rome is so big…)

  • VisDis and the Database & User Interface groups are two tightly connected research groups at the Department of Computer and System Science (32 full, 19 associate, and 13 assistant professors) of the Rome Faculty of Engineering & ICT

  • The background of the VisDis and Database/User Interface groups covers:

– Visual Information Access
– Data quality
– Data integration
– User Centered Design
– Usability and Accessibility
– Infovis evaluation
– Visual quality metrics
– Visual Analytics
    • Data sampling
    • Density map optimization
– Information Retrieval (& VA)

Fire 2012, Kolkata 19 December 2012 VA & IR - Giuseppe Santucci 2

SLIDE 3

Outline

  • Information Visualization

– Definitions
– Main issues

  • Data overload

– Visual Analytics
– Visual Analytics challenges

  • One methodological example
  • VA and Information Retrieval
  • Demo

SLIDE 4

Information Visualization?

  • Old stuff…

SLIDE 5

Visualization for Problem Solving

  • Mystery: what caused the cholera epidemic in London in 1854?

SLIDE 6

Visualization for Problem Solving


Illustration by Dr. John Snow (1854): dots indicate the locations of deaths; X marks indicate the locations of water pumps

[From Visual Explanations by Edward Tufte, Graphics Press, 1997]

SLIDE 7

Visualization for Problem Solving


  • Dr. Snow deduced that the cholera epidemic was caused by a contaminated water pump!

Closing that pump quickly solved the problem. By the way, workers at the nearby brewery were noted to be relatively free of cholera… Today the John Snow pub in London stands close to the water pump!

SLIDE 8

Visualization for Explaining


What happened during Napoleon’s Russian campaign?

SLIDE 9

Charles Joseph Minard’s map (1869)

SLIDE 10

Visualization for Decision Making


Traveling in London by Underground: how can I get to Queen’s Park from Victoria station?

SLIDE 11

London Underground Map 1927

SLIDE 12

Harry Beck’s idea

  • The real position of stations does not matter when traveling underground
  • Only the station sequences matter, together with their connections
  • Beck proposed a “distorted” map
  • Today, underground maps all over the world follow Beck’s approach
  • He got only a small payment (London Underground was not sure about the idea)
  • Still true today: infovis people do not become rich…
  • That likely holds for VA and IR as well
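Beck’s insight amounts to modelling the network as a graph. Stations are nodes, connections are edges, and geographic position is dropped. The sketch below answers the question from the previous slide (Victoria to Queen’s Park) with a breadth-first search over such a graph. The station list is a small, hand-picked, simplified excerpt assumed for illustration, not an authoritative TfL dataset. The names `LINKS`, `build_graph`, and `route` are hypothetical.

```python
from collections import deque

# Hypothetical, partial excerpt of the network: (station, station, line).
# Only topology is stored; there are no coordinates, just as on Beck's map.
LINKS = [
    ("Victoria", "Green Park", "Victoria"),
    ("Green Park", "Oxford Circus", "Victoria"),
    ("Oxford Circus", "Regent's Park", "Bakerloo"),
    ("Regent's Park", "Baker Street", "Bakerloo"),
    ("Baker Street", "Marylebone", "Bakerloo"),
    ("Marylebone", "Edgware Road", "Bakerloo"),
    ("Edgware Road", "Paddington", "Bakerloo"),
    ("Paddington", "Warwick Avenue", "Bakerloo"),
    ("Warwick Avenue", "Maida Vale", "Bakerloo"),
    ("Maida Vale", "Kilburn Park", "Bakerloo"),
    ("Kilburn Park", "Queen's Park", "Bakerloo"),
]


def build_graph(links):
    """Undirected adjacency list: station -> list of (neighbour, line)."""
    graph = {}
    for a, b, line in links:
        graph.setdefault(a, []).append((b, line))
        graph.setdefault(b, []).append((a, line))
    return graph


def route(graph, start, goal):
    """Breadth-first search: returns the hops of a fewest-stops path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        if station == goal:
            break
        for neighbour, line in graph.get(station, []):
            if neighbour not in previous:
                previous[neighbour] = (station, line)
                queue.append(neighbour)
    if goal not in previous:
        return None
    # Walk back from the goal to rebuild the path.
    hops, node = [], goal
    while previous[node] is not None:
        prev, line = previous[node]
        hops.append((prev, node, line))
        node = prev
    return hops[::-1]


if __name__ == "__main__":
    for frm, to, line in route(build_graph(LINKS), "Victoria", "Queen's Park"):
        print(f"{frm} -> {to} ({line} line)")
```

Only the connections are stored, so the route comes out the same however the stations are drawn. That independence from geography is exactly why Beck could distort the map.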

SLIDE 13

London Underground Map 1990s

SLIDE 14

Moving to the present time

  • What is modern Information Visualization?
  • First of all, what is Visualization?
  • Visualize: to form a mental model or mental image of something
  • It is a cognitive activity, and it has nothing to do with computers

SLIDE 15

What is Information Visualization?

Information visualization is the use of computer-supported, interactive, visual representations of abstract data to amplify cognition.

[Card et al. ’99]

SLIDE 16

Information visualization!

1. Infovis is perfect for exploration, when we don’t know exactly what to look for: it supports vague goals
2. Infovis is perfect for explaining complex data and supporting decisions

  • Other approaches to data analysis

– Statistics: strong verification, but does not support exploration and vague goals
– Data mining: actionable and reliable, but a black box, not interactive, question-response style
– Visual Analytics (formerly Visual Data Mining) tries to join the two worlds

SLIDE 17

…computer-supported and interactive

  • Computer-supported

– Yes, we use computers, but we must always remember that a cognitive activity is involved in the process

  • Interactive

– To exploit the full power of infovis techniques, interaction is mandatory

SLIDE 18

Interaction example

  • Agronomists are testing 7 treatments (anti-parasite, fertilizer, etc.) on 10 different crops (corn, tomatoes, etc.)

  • A black square indicates success
  • Does this visualization help?

[Figure: matrix of crops (rows 1–10) × treatments (columns A–G); black squares indicate success]

SLIDE 19

Interaction example

  • Let’s rearrange the rows

[Figure: the same matrix before and after rearrangement; in the rearranged version the treatments are ordered A D C E G B F and the crops 10 1 3 8 2 6 4 7 9 5]

There are 10! possible row orderings, so VA can help…
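Brute force is hopeless even for this tiny matrix: there are 10! = 3,628,800 row orders before the columns are even touched. Seriation heuristics are used instead. The sketch below applies one common heuristic, barycenter ordering, alternately to rows and columns. The success matrix is invented for illustration and is not the data on the slide. All names are hypothetical.

```python
from math import factorial

TREATMENTS = list("ABCDEFG")
# Hypothetical 10 x 7 matrix: crop -> success flags for treatments A..G (1 = success).
MATRIX = {
    1: [1, 0, 0, 1, 0, 0, 0],
    2: [0, 1, 0, 0, 0, 1, 0],
    3: [1, 0, 1, 1, 0, 0, 0],
    4: [0, 1, 0, 0, 1, 1, 0],
    5: [0, 0, 0, 0, 0, 1, 1],
    6: [0, 0, 1, 0, 1, 0, 0],
    7: [0, 1, 0, 0, 0, 0, 1],
    8: [1, 0, 1, 0, 0, 0, 0],
    9: [0, 0, 0, 0, 0, 1, 1],
    10: [1, 0, 0, 1, 0, 0, 0],
}


def barycenter(cells):
    """Mean position of the successes in a row or column (inf if there are none)."""
    hits = [i for i, v in enumerate(cells) if v]
    return sum(hits) / len(hits) if hits else float("inf")


def reorder(matrix, n_cols, sweeps=4):
    """Alternately sort rows and columns by barycenter; returns (row order, column order)."""
    rows = list(matrix)
    cols = list(range(n_cols))
    for _ in range(sweeps):
        # Rows: mean position of their successes under the current column order.
        rows.sort(key=lambda r: barycenter([matrix[r][c] for c in cols]))
        # Columns: mean position of their successes under the current row order.
        cols.sort(key=lambda c: barycenter([matrix[r][c] for r in rows]))
    return rows, cols


def show(matrix, rows, cols):
    print("    " + " ".join(TREATMENTS[c] for c in cols))
    for r in rows:
        print(f"{r:>3} " + " ".join("#" if matrix[r][c] else "." for c in cols))


if __name__ == "__main__":
    print("possible row orders:", factorial(len(MATRIX)))
    show(MATRIX, *reorder(MATRIX, len(TREATMENTS)))
```

The heuristic gives no guarantee of an optimal order. It cheaply pulls successes toward a diagonal band, and an analyst can then refine the result interactively.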

SLIDE 20

…it is about abstract data

  • Abstract data

– Information visualization deals with images that do not refer to a physical situation. In other words, it is NOT scientific/geographic visualization

  • Scientific visualization primarily relates to and represents something physical or geometric
  • Examples

– Air flow over a wing
– Weather over the USA
– Torrents inside a tornado
– Organs in the human body
– Molecular bonding…

SLIDE 21

Scientific/geographic visualization


Earthquake intensity

SLIDE 22

…abstract data

  • Items that do not have a direct physical/visual correspondence
  • Examples: sport statistics, stock trends, query results, software data, IR metrics, etc.
  • Items are mapped onto a 2D/3D space using their numerical characteristics (attributes)
  • The visualization is useful for analysis and decision making (not just for fun or pretty colors)
  • E.g., postal parcels:

– Shipping date
– Volume
– Weight
– Sender country
– Receiver country
– …
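Here is a minimal sketch of how abstract items get a position on screen. Two numeric attributes are scaled linearly to pixel coordinates, and a categorical attribute is mapped to color. The `Parcel` fields, their units, the choice of weight for x and volume for y, and the palette are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Parcel:
    shipping_date: date
    volume_l: float   # volume in litres (assumed unit)
    weight_kg: float  # weight in kilograms (assumed unit)
    sender: str       # country code
    receiver: str


def linear_scale(domain, rng):
    """Return a function mapping the interval `domain` linearly onto `rng`."""
    d0, d1 = domain
    r0, r1 = rng
    span = (d1 - d0) or 1.0  # avoid division by zero for a constant attribute
    return lambda v: r0 + (v - d0) / span * (r1 - r0)


def to_points(parcels, width=800, height=600):
    """Map each parcel to (x, y, color): weight -> x, volume -> y, sender -> color."""
    weights = [p.weight_kg for p in parcels]
    volumes = [p.volume_l for p in parcels]
    sx = linear_scale((min(weights), max(weights)), (0, width))
    # Screen y grows downwards, so the largest volume goes to the top (y = 0).
    sy = linear_scale((min(volumes), max(volumes)), (height, 0))
    colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
    palette = {}
    points = []
    for p in parcels:
        # A new sender gets the next palette color; a known one keeps its color.
        color = palette.setdefault(p.sender, colors[len(palette) % len(colors)])
        points.append((sx(p.weight_kg), sy(p.volume_l), color))
    return points


SAMPLE = [  # invented parcels, for illustration only
    Parcel(date(2012, 12, 1), 10.0, 2.0, "IT", "IN"),
    Parcel(date(2012, 12, 2), 40.0, 12.0, "IN", "IT"),
    Parcel(date(2012, 12, 3), 25.0, 7.0, "IT", "FR"),
]

if __name__ == "__main__":
    for point in to_points(SAMPLE):
        print(point)
```

With hundreds of thousands of parcels the mapping stays the same, but overplotting becomes the real problem. That problem is what density-oriented techniques address.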

SLIDE 23

Abstract data


A 2D scatterplot showing about 200,000 postal parcels

SLIDE 24

Mixed visualization

Byte traffic into the ANS/NSFNET T3 backbone in 1993

SLIDE 25

Amplifying cognition using human vision

  • Highest-bandwidth human sense
  • Fast, parallel
  • Pattern recognition
  • Extends memory and cognitive capacity
  • People think visually (in most languages, “I see…” also means “I understand”)
  • Amplify cognition
  • Pre-attentive (we use only the eyes, not the brain)
  • Two quick examples (4 seconds each)
SLIDE 26

Three simple questions

SLIDE 27

The quick answers

SLIDE 28

One (very) simple question

  • How many 3s are there?
  • You have 4 seconds…

458757626808609928083982698028 747976296262867897187743671947 746588786758967329667287682085

SLIDE 29

So?

  • Time was not enough?
  • You can do that in less than 0.2 seconds!
  • Let’s try a different visualization…
SLIDE 30
  • Color is pre-attentive (it pops out)
  • No cognitive effort is required
  • Many of these issues are already well understood
  • Most people ignore them...
  • It is not enough to use bells and whistles
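The color-based version of the question can be reproduced in a terminal. The sketch prints the same digits with every 3 colored, so the targets pop out pre-attentively, and it also counts them. It assumes an ANSI-capable terminal. The escape code and helper names are choices made for this illustration.

```python
# The digit rows from the "How many 3s" slide.
DIGITS = [
    "458757626808609928083982698028",
    "747976296262867897187743671947",
    "746588786758967329667287682085",
]

RED = "\033[1;31m"  # ANSI escape: bold red foreground (assumes an ANSI terminal)
RESET = "\033[0m"   # ANSI escape: back to default attributes


def highlight(line, target="3"):
    """Wrap every occurrence of `target` in a color escape so it pops out."""
    return line.replace(target, f"{RED}{target}{RESET}")


def count(lines, target="3"):
    """Count occurrences of `target` across all lines."""
    return sum(line.count(target) for line in lines)


if __name__ == "__main__":
    for line in DIGITS:
        print(highlight(line))
    print("number of 3s:", count(DIGITS))
```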

SLIDE 31

Canonical steps in Infovis – STEP 1


DATA → Internal Representation

– Encoding of values: univariate data, bivariate data, trivariate data, multidimensional data
– Encoding of relationships: temporal data, maps & diagrams, graphs/trees, data streams

[Figure: example categories Sport, Literature, Mathematics, Physics, History, Geography, Art, Chemistry]

SLIDE 32

Canonical steps in Infovis – STEP 2

Internal Representation → Presentation

– Space limitations: scrolling, overview + details, distortion, suppression, zoom & pan, semantic zoom
– Time limitations
– Perceptual issues
– Cognitive issues
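Distortion is one of the space-limitation techniques listed above. A one-dimensional version of the well-known graphical fisheye transformation of Sarkar and Brown, G(x) = (d + 1)x / (dx + 1), illustrates it. Here x is the normalized distance from the focus and d is the distortion factor. This is one possible distortion, not necessarily the one used in the talk. The function names and the default d = 3 are assumptions.

```python
def fisheye(x, d=3.0):
    """Sarkar-Brown transform for a normalized distance x in [0, 1] from the focus."""
    return (d + 1) * x / (d * x + 1)


def distort_1d(p, focus, lo, hi, d=3.0):
    """Move coordinate p (lo <= p <= hi) so that the area around `focus` is magnified.

    Distances are normalized by the gap between the focus and the screen edge
    on the same side, so the edges and the focus itself stay fixed.
    """
    if p >= focus:
        span = hi - focus
        return focus if span == 0 else focus + fisheye((p - focus) / span, d) * span
    span = focus - lo
    return focus if span == 0 else focus - fisheye((focus - p) / span, d) * span


if __name__ == "__main__":
    xs = [0, 20, 40, 50, 60, 80, 100]
    print([round(distort_1d(x, focus=50, lo=0, hi=100), 1) for x in xs])
```

Points near the focus are spread apart and points near the edges are compressed. Detail and context therefore share the same screen space.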

SLIDE 33

Problem solved!


We have (roughly) agreed-upon and (roughly) mature solutions for the representation and presentation of a large variety of data

So I’m done! Questions?

SLIDE 34

Data size and complexity!

  • 100 million FedEx transactions per day
  • 150 million VISA credit card transactions per day
  • 300 million AT&T long-distance calls per day
  • 50 billion e-mails per day
  • 600 billion IP packets per day
  • 1 trillion (10¹²) web pages (according to Google), corresponding to about 3 petabytes of data
  • Google processes 20 petabytes of data per day
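The daily figures are easier to grasp as per-second rates, and the Google numbers imply an average page size. The arithmetic is sketched below. It assumes an even spread over 24 hours and decimal units (1 PB = 10¹⁵ bytes), which gives roughly 3 KB per page. These are back-of-the-envelope averages derived from the slide’s figures, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

# Daily volumes quoted on the slide.
DAILY = {
    "FedEx transactions": 100e6,
    "VISA transactions": 150e6,
    "AT&T long-distance calls": 300e6,
    "e-mails": 50e9,
    "IP packets": 600e9,
}


def per_second(per_day):
    """Average rate, assuming the daily volume is spread evenly over 24 hours."""
    return per_day / SECONDS_PER_DAY


def bytes_per_page(total_bytes, pages):
    """Average size of one page."""
    return total_bytes / pages


if __name__ == "__main__":
    for what, n in DAILY.items():
        print(f"{what:>26}: {per_second(n):>12,.0f} per second")
    # 3 PB over 10**12 pages, using decimal units (1 PB = 10**15 bytes).
    print(f"average page size: {bytes_per_page(3e15, 1e12):,.0f} bytes")
```

For example, 50 billion e-mails per day works out to roughly 580,000 e-mails every second.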

SLIDE 35

Size matters but complexity matters as well!

  • Formal definition of the PROMISE experimental data (FIRE will use very similar pieces of information)
  • Metadata

– ~100 metrics per topic
– It is not a BIG number, but…

  • Different levels of analysis

– Per topic
– Per experiment
– ...

  • Different levels of abstraction

– Simple (!) metrics
– Aggregate metrics
– Statistics
– Meta-statistics (e.g., correlation, ANOVA, etc.)
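The sketch below makes these levels concrete. It uses standard average precision (AP) as a per-topic metric and mean average precision (MAP) as a per-experiment aggregate. Pearson correlation between two experiments’ per-topic AP serves as a meta-statistic. These are common IR choices used here as examples; they are not claimed to be the PROMISE metric set. The runs and relevance judgments are invented.

```python
from statistics import mean

# Hypothetical relevance judgments (topic -> relevant docs) and two runs (topic -> ranking).
QRELS = {"t1": {"d1", "d3"}, "t2": {"d2"}, "t3": {"d4", "d5"}}
RUN_A = {"t1": ["d1", "d2", "d3"], "t2": ["d3", "d2"], "t3": ["d5", "d1", "d4"]}
RUN_B = {"t1": ["d2", "d1", "d3"], "t2": ["d2", "d3"], "t3": ["d1", "d2", "d5"]}


def average_precision(ranking, relevant):
    """Per-topic metric: mean of precision@k over the ranks k of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)  # relevant docs never retrieved contribute 0


def per_topic_ap(run, qrels):
    return {t: average_precision(run.get(t, []), rel) for t, rel in qrels.items()}


def mean_ap(run, qrels):
    """Per-experiment aggregate: MAP over all judged topics."""
    return mean(list(per_topic_ap(run, qrels).values()))


def pearson(xs, ys):
    """Meta-statistic: Pearson correlation (undefined if either vector is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    a, b = per_topic_ap(RUN_A, QRELS), per_topic_ap(RUN_B, QRELS)
    print("MAP A:", round(mean_ap(RUN_A, QRELS), 3), "MAP B:", round(mean_ap(RUN_B, QRELS), 3))
    print("correlation:", round(pearson([a[t] for t in QRELS], [b[t] for t in QRELS]), 3))
```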

SLIDE 36

Rescuing information

  • In many situations, people need to exploit hidden information buried in large and/or complex, unexplored data sets
  • Several techniques exist for this aim

– Automatic analysis techniques (e.g., data mining)
– Manual analysis techniques (e.g., information visualization)

  • Large and complex datasets require a joint effort:

SLIDE 37

Visual Analytics

SLIDE 38

VA is highly interdisciplinary

[Figure: VA components: Scientific & Information Visualisation, Data Management, Data Mining, Spatio-Temporal Data, Human Perception + Cognition, Infrastructure, Evaluation]

Each component presents challenging issues

SLIDE 39

Perception and cognition

  • A critical element is the human being (☺)

– Visual analysis tasks require the careful design of appropriate human-computer interfaces
– Challenge: integrating psychology, sociology, neuroscience, and design issues

SLIDE 40


Let’s have fun: different kinds of blindness

In the video, the girl in the white T-shirt will receive the ball several times. Count how many times she receives it (not counting passes that bounce on the floor)

SLIDE 41


Ready?

SLIDE 42


So...

  • 6 times ?
  • 7 times ?
  • 8 times ?
  • 9 times ?
  • 10 times ?
SLIDE 43


Fine… and now another question…

  • How many gorillas were in the video?
SLIDE 44


Let’s take a closer look

Same video…

SLIDE 45


Inattentional blindness

  • Just one gorilla…
  • It looks like a joke, but it reflects real problems that we must not neglect
  • Inattentional (and change) blindness must be carefully considered when designing (critical) systems
  • Animation, interaction, and alternative communication channels (e.g., sounds) can mitigate the problem

SLIDE 46

A methodological example

SLIDE 47

A Visual Analytics example

Deriving new values from the dataset for ad-hoc visualization

  • How can we visually compare books by J. London and M. Twain?
  • [D. A. Keim and D. Oelke. Literature Fingerprinting: A New Method for Visual Literary Analysis. 2007 IEEE Symp. on Visual Analytics Science and Technology (VAST '07)]

1. Split the book into several text blocks (e.g., pages, paragraphs, sentences)
2. Measure a relevant feature for each text block (e.g., average sentence length, word usage, etc.)
3. Map the feature to a visual attribute (e.g., color)
4. Visualize it
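The four steps translate almost directly into code. The sketch uses sentences as the unit and groups a fixed number of them into a block. It takes average sentence length (in words) as the feature and maps it to a blue-to-red color ramp. The naive sentence splitter, the block size, and the color ramp are simplifying assumptions, not the choices made by Keim and Oelke.

```python
import re


def split_blocks(text, sentences_per_block=3):
    """Step 1: naive sentence split on . ! ? followed by whitespace, then group into blocks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [sentences[i:i + sentences_per_block]
            for i in range(0, len(sentences), sentences_per_block)]


def avg_sentence_length(block):
    """Step 2: the feature, i.e. mean number of words per sentence in the block."""
    lengths = [len(s.split()) for s in block]
    return sum(lengths) / len(lengths)


def to_color(value, lo, hi):
    """Step 3: linear ramp from blue (shortest) to red (longest), as an RGB hex string."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    r, b = round(255 * t), round(255 * (1 - t))
    return f"#{r:02x}00{b:02x}"


def fingerprint(text, sentences_per_block=3):
    """Steps 1-3 combined: one (feature value, color) pair per text block."""
    values = [avg_sentence_length(b) for b in split_blocks(text, sentences_per_block)]
    if not values:
        return []
    lo, hi = min(values), max(values)
    return [(v, to_color(v, lo, hi)) for v in values]


if __name__ == "__main__":
    # Step 4 would draw one colored square per block; here we just print them.
    sample = "Call me. It was a long and winding road that led home. Yes. No way at all."
    for value, color in fingerprint(sample, sentences_per_block=2):
        print(f"{value:5.2f} {color}")
```

Laying the colored blocks out in reading order, one row or column per book, yields the kind of comparison shown on the next slides.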

SLIDE 48

J. London vs. M. Twain: average sentence length

SLIDE 49

User interaction (a non-uniform book?)

SLIDE 50

Details of a book

SLIDE 51

What about the Bible?

SLIDE 52

The Vismaster CA European project

SLIDE 53

The new (European) book on VA

  • Illuminating the Path: The Research and Development Agenda for Visual Analytics

– 2005, focusing on US homeland security

  • Mastering the Information Age: Solving Problems with Visual Analytics

– One of the major outcomes of VisMaster
– 2010, much broader focus

SLIDE 54

Now, let’s move to IR. A case study: the PROMISE project

  • Step 1: Data preprocessing

SLIDE 55

1: Clear understanding of experimental data


Enriching the data:

– Define a data structure
– Define formal transformations on the data

SLIDE 56

2: Visualization

SLIDE 57

2a: Define a visual reference architecture

SLIDE 58

2b: Define a set of visualizations

SLIDE 59

3: Defining analytical models and their relationship with visualizations

SLIDE 60

3: Automated analysis (machine learning, clustering, etc.) for ranking analysis

  • Emanuele Di Buccio, Marco Dussin, Nicola Ferro, Ivano Masiero, Giuseppe Santucci, and Giuseppe Tino. To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma? Proc. CLEF 2011, Amsterdam.
  • Marco Angelini, Nicola Ferro, Guido Granato, Giuseppe Santucci, and Gianmaria Silvello. Information Retrieval Failure Analysis: Visual Analytics as a Support for Interactive “What-If” Investigation. VAST 2012.
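Cumulated-gain curves are one common way to analyze rankings visually. As a rough illustration of the idea, and not the cited papers’ implementation, the sketch computes the discounted cumulated gain (DCG) of Järvelin and Kekäläinen at every rank. It then compares the curve with the ideal ordering of the same gains and reports the per-rank gap that a failure analysis could visualize. The graded gains are invented, and the ideal ranking is simplified to re-sorting the retrieved documents only.

```python
from math import log


def dcg_curve(gains, base=2):
    """DCG at every rank: gains before rank `base` are not discounted,
    later ones are divided by log_base(rank)."""
    curve, total = [], 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < base else gain / log(rank, base)
        curve.append(total)
    return curve


def ideal_gap(gains, base=2):
    """Per-rank loss of a run relative to the ideal (descending) order of the same gains."""
    actual = dcg_curve(gains, base)
    ideal = dcg_curve(sorted(gains, reverse=True), base)
    return [i - a for i, a in zip(ideal, actual)]


if __name__ == "__main__":
    run = [3, 2, 3, 0, 1, 2]  # hypothetical graded relevance of the top-6 documents
    for rank, (dcg, gap) in enumerate(zip(dcg_curve(run), ideal_gap(run)), start=1):
        print(f"rank {rank}: DCG={dcg:.2f}  loss vs ideal={gap:.2f}")
```

The ideal curve is never below the actual one, so ranks where the gap jumps point to the documents whose misplacement costs the most. An analyst might then try re-ranking or re-querying them.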

SLIDE 61

4: Knowledge (hmm, still waiting for the system to be used by real IR experts…)

SLIDE 62

Demo

SLIDE 63

Demo!

SLIDE 64

Conclusions

  • Visual Analytics is a new, exciting, emerging research field
  • Infovis and data mining are core components of VA
  • It is highly interdisciplinary and requires a collaborative approach
  • It is more a methodology than a technique
  • To succeed, it has to tackle several high-risk issues
  • It is the only chance we have to master large and complex datasets
  • Including IR evaluation data
