Visual Search and Analysis in Textual and Non-Textual Document Repositories Approaches, Applications, and Research Challenges Tobias Schreck Visual Analytics Group Computer and Information Science University of Konstanz, Germany CLEF 2012


slide-1
SLIDE 1

Visual Search and Analysis in Textual and Non-Textual Document Repositories: Approaches, Applications, and Research Challenges

Tobias Schreck

Visual Analytics Group, Computer and Information Science, University of Konstanz, Germany

CLEF 2012: Conference and Labs of the Evaluation Forum 2012

19.09.2012

slide-2
SLIDE 2

1. Need for Search and Analysis in Large Data

Technological progress: Information Overload

– Acquisition, production, storage
– Data integration, data mining
→ Large and increasing amounts of data

Data-intensive application domains

– Business
– Research
– Engineering

Need for new technologies

– "… to unite the seemingly conflicting requirements of scalability and usability in making sense of the data" [VisMaster 2010]

Share of digital information

– 2000: 25%
– 2002: 50% (beginning of the digital age)
– 2007: 94% (300 exabytes)

Estimated growth rates (1986-2007)

– Storage: 23%
– Network: 28%
– Compute: 56%

Source: Science, according to [F&L 3/2011]

slide-3
SLIDE 3

1. Data Examples

Textual Data Repositories

– Digital Libraries
– Web
– Social Media

Non-textual Data Repositories

– Image repositories
– 3D Object repositories
– Data repositories

Examples shown: Sloan Digital Sky Survey (http://www.sdss.org/), PROBADO3D Archive (http://www.probado.de/3d.html), Victoria State Library Image Collection (http://www.slv.vic.gov.au/), customer reviews (Amazon.com), digital libraries, www.facebook.com, www.twitter.com

slide-4
SLIDE 4

1. How to Make Use of Large Data Repositories?

Searching

– Find information entities of interest
– Reuse, comparison
– Based on specification of queries

Analyzing

– Find structures and abstractions ("understand" the data set as a whole)
– Check hypotheses
– Make interesting, actionable observations

Interdependence

– Cycles of searching and analyzing

slide-5
SLIDE 5

1. Visual Search and Analysis

Visual representation of the search and analysis process [Shneiderman 1996]

Goals of Visual Information Systems

– Intuitive access, direct manipulation
– Leverage human visual perception
– Encourage exploration

Classic visual search systems

– FilmFinder [Ahlberg and Shneiderman 1994]
– TimeSearcher [Hochheiser and Shneiderman 2004]

Classic visual analysis systems

– SPIRE/IN-SPIRE [Wise et al 1995]
– Visual decision tree construction and analysis [Teoh and Ma 2003]

[Ahlberg and Shneiderman 1994] [Wise et al 1995]

slide-6
SLIDE 6

Propositions of this Talk

1. Emerging large, complex data sources pose new challenges to Information Retrieval and Understanding
2. Visual-interactive methods are useful to support retrieval and data understanding
3. Promising research opportunities at the intersection of visualization, information retrieval, and evaluation

slide-7
SLIDE 7

Outline

1. Introduction
2. Overview Visualization for Large Text
   2.1 Feature-based Text Visualization
   2.2 Attribute-based Text Visualization
   2.3 Visual Document Summarization
   2.4 Geo-referenced Micro Blogging Text
3. Visual Search in Non-Textual Data
4. Promising Research Opportunities
5. Conclusions
slide-8
SLIDE 8

2.1 Sentiment Analysis

  • Opinion score derived from adjectives, nouns, and verbs
  • Identifies positive and negative sections

→ Overview of large document corpora
→ Find articles which suit the mood of the reader

[Keim, Mansmann et al., 2008]
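
A minimal sketch of a lexicon-based opinion score per document section, to make the idea above concrete. The slides do not specify how the cited work computes its score, so the toy lexicon, the function names, and the averaging rule are all assumptions; part-of-speech filtering to adjectives, nouns, and verbs is omitted here.

```python
import re

# Hypothetical toy lexicon; a real system would use a large opinion-word list.
LEXICON = {"good": 1, "great": 1, "win": 1, "bad": -1, "crisis": -1, "fail": -1}

def opinion_score(text):
    """Return the mean lexicon polarity of the words in text (0.0 if none match)."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def section_scores(document):
    """Score each blank-line-separated section of a document."""
    sections = [s for s in document.split("\n\n") if s.strip()]
    return [opinion_score(s) for s in sections]
```

Each section score could then be mapped to a color (for example red for negative, green for positive) to give an overview of a corpus.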

slide-9
SLIDE 9

2.1 Sentiment Analysis: News Overview

slide-10
SLIDE 10

2.1 Pixel-based Approach

Feature: average sentence length

[Oelke et al., 2008]
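
As a rough illustration of the feature named above, the following sketch computes the average sentence length for each text block. The sentence splitting rule and the function names are assumptions, not the cited implementation; in a pixel-based view, each returned value would determine the color of one cell.

```python
import re

def avg_sentence_length(text):
    """Average number of words per sentence; sentences end at '.', '!' or '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

def feature_per_block(blocks):
    """Return one feature value per text block (e.g. per paragraph)."""
    return [avg_sentence_length(b) for b in blocks]
```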

slide-11
SLIDE 11

2.1 Readability Features

[Oelke, Spretke et al., 2010]

slide-12
SLIDE 12

2.1 Readability Features: Vocabulary Difficulty of 2009 German Election Programs

Feature: Vocabulary Difficulty

Panels: Die Linke, Piraten

[Oelke, Spretke et al., 2010]

slide-13
SLIDE 13

2.2 Attribute-based: Story, Character Complexity

Examples: King's "It", Rowling's "Harry Potter"

[Wanner, Fuchs et al., 2011]

slide-14
SLIDE 14

2.2 Attribute-based: Visual Review Analysis

  • User opinions abundantly available
    – Forums, Blogs
    – E-commerce
    – …
  • Many application possibilities
    – Product reviews for customers
    – Market analysis
    – Customer relationship management

Amazon customer reviews (amazon.com)

slide-15
SLIDE 15

2.2 Attribute-based: Visual Review Analysis

  • Basic method
    – Identify product attributes
    – Identify positive/negative opinions
    – Calculate weighted attribute vector
  • Visual comparison of sets of reviews
    – Glyph matrix approach
    – Cluster analysis
  • Applied to printer product reviews

Example attributes: cartridge, paper tray, price, printer, scanner, software (opinion values shown in the figure: -1, +1, +1)

[Oelke, Hao et al., 2009]
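
The basic method can be sketched as follows, under strong simplifying assumptions: attributes are matched by substring, opinions come from two tiny hypothetical word sets, and an opinion is attributed to every attribute mentioned in the same sentence. The attribute names are taken from the slide; everything else (names, word lists, the unweighted sum) is illustrative and not the cited method.

```python
import re

# Attribute names from the slide; opinion word sets are hypothetical.
ATTRIBUTES = ["cartridge", "paper tray", "price", "printer", "scanner", "software"]
POSITIVE = {"good", "great", "cheap", "fast"}
NEGATIVE = {"bad", "poor", "expensive", "slow"}

def attribute_vector(review):
    """Return one summed polarity per attribute, scored sentence by sentence."""
    vector = [0] * len(ATTRIBUTES)
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        # Polarity of the sentence: positive hits minus negative hits.
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        for i, attribute in enumerate(ATTRIBUTES):
            if attribute in sentence:
                vector[i] += polarity
    return vector
```

Vectors of this kind, one per review, are what a glyph matrix or a cluster analysis would then compare.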

slide-16
SLIDE 16

2.2 Attribute-based: Visual Review Analysis

[Oelke, Hao et al., 2009]

slide-17
SLIDE 17

2.2 Attribute-based: Customer Segmentation

[Oelke, Hao et al., 2009]

slide-18
SLIDE 18

2.3 Visual Content Overviewing

  • Visual abstract for scientific articles
    – Extraction of important figures and keywords
    – Layout of elements in generalized word cloud
  • Overviewing
  • Navigation
  • Comparison

[Strobelt, Oelke et al., 2009]

slide-19
SLIDE 19

2.3 Visual Content Overviewing

[Strobelt, Oelke et al., 2009]

slide-20
SLIDE 20

2.3 Visual Content Overviewing

[Strobelt, Oelke et al., 2009]

slide-21
SLIDE 21

2.4 Georeferenced Microblogging Text

  • Microblogging text (e.g., Twitter)
    – Short text messages
    – Time stamp
    – GPS position
  • Potential analytic use
    – Trend analysis
    – Marketing, reputation monitoring
    – Situational awareness for civil defense or crisis management

Example messages: "Nice view, all fine …", "Stuck in a jam after traffic accident …"

[www.google.com]

slide-22
SLIDE 22

2.4 SensePlace2 Tool

[MacEachren, Jaiswal et al., 2011]

slide-23
SLIDE 23

2.4 VAST Micro Blogging Challenge

  • VAST Challenge 2011
    – Fictitious city including street network and POIs
    – 1 million microblogging messages over 20 days, incl. spatial position
    – Fictitious hidden epidemic scenario
  • Task
    – Find possible epidemics and their characteristics

[http://hcil.cs.umd.edu/localphp/hcil/vast11/]

slide-24
SLIDE 24

2.4 VAST Micro Blogging Challenge

[Bertini, Buchmüller et al., 2011]

slide-25
SLIDE 25

2.4 Concentration on Bridges

slide-26
SLIDE 26

2.4 Concentration in Hospitals

slide-27
SLIDE 27

2.4 Message Distribution (19.05.) – Filtered for Symptom Keywords
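
A filtered distribution of this kind could be computed roughly as below. This is only a sketch: the symptom keyword list, the message layout `(text, day, x, y)`, the day encoded as a string, and the square grid binning are all assumptions made for illustration and are not taken from the challenge entry.

```python
import re
from collections import Counter

# Hypothetical symptom keywords; the actual filter terms are not given in the slides.
SYMPTOMS = {"fever", "cough", "chills", "nausea"}

def symptom_cells(messages, day, cell_size=1.0):
    """Count symptom-related messages of one day per square grid cell.

    messages: iterable of (text, day, x, y) tuples (assumed layout).
    Returns a Counter mapping (column, row) cell indices to message counts.
    """
    counts = Counter()
    for text, msg_day, x, y in messages:
        if msg_day != day:
            continue
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & SYMPTOMS:
            counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts
```

The per-cell counts can then be drawn as a density map over the city layout.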

slide-28
SLIDE 28

2.4 VAST Micro Blogging Challenge

slide-29
SLIDE 29

Remainder of this Talk

1. Introduction
2. Overview Visualization for Large Text
   2.1 Feature-based Text Visualization
   2.2 Attribute-based Text Visualization
   2.3 Visual Document Summarization
   2.4 Geo-referenced Micro Blogging Text
3. Visual Search in Non-Textual Data
   3.1 Sketch-based 3D Object Retrieval
   3.2 Retrieval in Bivariate Measurement Data
4. Promising Research Opportunities
5. Conclusions
slide-30
SLIDE 30

3. Visual Search in Non-Textual Data

Multitude of complex document types

– Images
– Video
– 3D Objects
– Multivariate Research Data
– Etc.

Research questions to address

– Similarity functions?
– Query types to support?
– How to evaluate?

Examples shown: PROBADO3D Archive (http://www.probado.de/3d.html), Sloan Digital Sky Survey (http://www.sdss.org/), Victoria State Library Image Collection (http://www.slv.vic.gov.au/)

slide-31
SLIDE 31

3.1 Query-by-Example and Sketch-Based Retrieval

Problems:

1. How to compare structurally different views?
2. How to evaluate different sketching styles?

slide-32
SLIDE 32

3.1 Gradient Features, Suggestive Contours

[Yoon et al., 2010] [DeCarlo et al., 2003]

slide-33
SLIDE 33

3.1 Sketch-Based 3D Object Retrieval

  • 14-class subset of the Princeton Shape Benchmark [Shilane et al 2004]
  • Collection of 20 user sketches per class
  • Evaluation of retrieval performance (per class, given user sketch)

[Yoon et al., 2010]

slide-34
SLIDE 34

3.1 SHREC'12 Track: Sketch-Based 3D Retrieval

[SHREC 2012 Sketch-based 3D Retrieval Track]

slide-35
SLIDE 35

3.1 Large-Scale Sketch Benchmark

Crowd-sourced approach of [Eitz et al., 2012a]

  • 20,000 sketches from 1300 users
  • 250 representative object categories
  • Basis for improved benchmarking study [Eitz et al., 2012b]

Recognition experiment

  • Avg. human accuracy: 73%
  • Avg. automatic accuracy: 56%

[Eitz et al., 2012a]

slide-36
SLIDE 36

3.2 Visual Search in Bivariate (Research) Data

  • Jim Gray's Fourth Paradigm and emerging research data repositories [Hey, Tansley, Tolle 2009]
  • Prominent type of quantitative data: bivariate and multivariate data
  • Common visual representation
    – Scatter plot
    – Scatter plot matrix
  • Content-based support for visual search and analysis in this data?

[Pangaea]

slide-37
SLIDE 37

3.2 Regressional Feature Vector for Comparing Scatter Plots

Perform regressions (linear, square, log, …)

Form feature vectors

  • Goodness of fit scores
  • Coefficient parameters

[Scherer, Bernard et al., 2011]
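
The two steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the cited implementation: each model is taken to have the form y ≈ a + b·g(x) for a transform g (identity, square, logarithm), goodness of fit is measured by R², and plots are compared by Euclidean distance between feature vectors. All function names are hypothetical.

```python
import math

def fit_line(u, y):
    """Least-squares fit y ≈ a + b*u; returns (a, b, r_squared).

    Assumes the values in u are not all identical.
    """
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    b = s_uy / s_uu
    a = mean_y - b * mean_u
    ss_res = sum((yi - (a + b * ui)) ** 2 for ui, yi in zip(u, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r2

# Assumed model family: y ≈ a + b*g(x) for a few transforms g.
TRANSFORMS = {
    "linear": lambda x: x,
    "square": lambda x: x * x,
    "log": lambda x: math.log(x),  # requires x > 0
}

def regression_features(xs, ys):
    """Concatenate (r_squared, a, b) of every model into one feature vector."""
    features = []
    for g in TRANSFORMS.values():
        a, b, r2 = fit_line([g(x) for x in xs], ys)
        features.extend([r2, a, b])
    return features

def distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(f1, f2)))
```

A query such as "sort by similarity to a given function" then amounts to sampling that function, computing its feature vector, and ranking all stored plots by `distance`.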

slide-38
SLIDE 38

3.2 Search and Analysis Application

[Scherer, Bernard et al., 2011]

Figure labels: cluster; altitude vs. PPPP (pressure, hPa); sort by similarity to f(x) = e^-x; query by example; spatial reference of data sets

slide-39
SLIDE 39

3.2 A Benchmark for Earth Observation Data

  • But how to create a benchmark data set for automatic evaluation?
  • Input data
    – BSRN earth observation data (radiation, temperature, etc.) for 40 stations
    – 24,700 bivariate plots generated
  • Tobler's First Law of Geography for similarity class formation
    – 18x6 longitude/latitude grid
    – Month of year
    – Parameters of measurement
    → 1608 similarity classes
  • Evaluation of nine feature vectors
    – Retrieval precision
    – Timing

Figure labels: pressure, temp, alt, CO2, O3, …; Position x Month x Parameter

[Scherer, v. Landesberger et al., 2012]

[Pangaea]
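
A sketch of how such similarity classes and a retrieval precision score might be computed. It assumes equal-angle grid cells (20° of longitude by 30° of latitude for an 18x6 grid), an ordered parameter pair, and precision measured over the top k results; these details and all names are assumptions, since the slides only name the three class dimensions.

```python
def similarity_class(lon, lat, month, param_x, param_y, lon_cells=18, lat_cells=6):
    """Map a bivariate plot to a class key: grid cell x month x parameter pair."""
    # Shift coordinates to [0, 360] and [0, 180], then bin into equal-angle cells.
    lon_idx = min(int((lon + 180.0) / (360.0 / lon_cells)), lon_cells - 1)
    lat_idx = min(int((lat + 90.0) / (180.0 / lat_cells)), lat_cells - 1)
    return (lon_idx, lat_idx, month, param_x, param_y)

def precision_at_k(query_class, ranked_classes, k):
    """Fraction of the top-k retrieved plots that share the query's class."""
    top = ranked_classes[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c == query_class) / len(top)
```

Plots from nearby stations, the same month, and the same parameters fall into one class, which is how Tobler's law supplies ground truth without manual labeling.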

slide-40
SLIDE 40

3.2 A Benchmark for Scatter Plot Retrieval

[Scherer, v. Landesberger et al., 2012]

  • Benchmark used to evaluate 9 feature vectors for scatter plot (image) data
    – Regressional features
    – KDE
    – Edge histogram
    – Etc.
  • Image-based methods perform best,
  • but are not as intuitive as the regressional method

slide-41
SLIDE 41

Propositions of this Talk

1. Emerging large, complex data sources pose new challenges to Information Retrieval and Understanding
2. Visual-interactive methods are useful to support retrieval and data understanding
3. Promising research opportunities
slide-42
SLIDE 42

4. Promising Research Opportunities

Fields: Information Retrieval and Analysis; Information Visualization; Evaluation

slide-43
SLIDE 43

4. Research Challenges

  • Data complexity and user needs
    – Multiple aspects: text, image, relations, time, geo, etc.
    – Compound data, streaming, large data
    – Which queries to support?
    – Users and analysis
  • Explorative search systems
    – Automatic interestingness estimation
    – Relevance feedback for analysis
    – Hypothesis specification as query modality
    – How to objectively measure insight?

slide-44
SLIDE 44

5. Conclusions

  • Novel data formats raise opportunities for retrieval and analysis systems
  • Discussed examples from textual and non-textual data
  • Visual-interactive methods can be useful
  • Many promising research directions
  • Data collection, use case modeling and experimentation are required!

slide-45
SLIDE 45

Thank you very much for your kind attention. Questions and comments very welcome.

Acknowledgments

Daniela Oelke, Christian Rohrdantz, Franz Wanner, Juri Buchmüller, Florian Stoffel, Sang-Min Yoon, Max Scherer, Hendrik Strobelt, Enrico Bertini, Daniel Keim

GK Explorative Analysis and Visualization of Large Information Spaces

slide-46
SLIDE 46

E-Mail: Tobias.Schreck(at)uni-konstanz.de
Web: http://cms.uni-konstanz.de/informatik/schreck/
Phone: (+49) 07531 883375
Fax: (+49) 07531 883065
Mail: University of Konstanz, Computer and Information Science, Universitaetsstrasse 10, Box 78, D-78457 Konstanz, Germany

slide-47
SLIDE 47

References (1)

  • [Ahlberg and Shneiderman 1994] Ahlberg, C., Shneiderman, B.: Visual information seeking: tight coupling of dynamic query filters with starfield displays. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 313–317 (1994).
  • [Bertini, Buchmüller et al., 2011] E. Bertini, J. Buchmüller, F. Fischer, S. Huber, T. Lindemeier, F. Maaß, F. Mansmann, T. Ramm, M. Regenscheit, C. Rohrdantz, C. Scheible, T. Schreck, S. Sellien, F. Stoffel, M. Tautzenberger, M. Zieker, and D. Keim. Visual analytics of terrorist activities related to epidemics. In IEEE Symposium on Visual Analytics Science and Technology, pages 329-330, 2011.
  • [DeCarlo et al., 2003] Doug DeCarlo, Adam Finkelstein, Szymon Rusinkiewicz, Anthony Santella. Suggestive Contours for Conveying Shape. In SIGGRAPH 2003, pp. 848-855.
  • [Eitz et al., 2012a] Mathias Eitz, James Hays and Marc Alexa: How Do Humans Sketch Objects? ACM Transactions on Graphics, Proc. SIGGRAPH 2012.
  • [Eitz et al., 2012b] Mathias Eitz, Ronald Richter, Tamy Boubekeur, Kristian Hildebrand and Marc Alexa: Sketch-based Shape Retrieval. ACM Transactions on Graphics, Proc. SIGGRAPH 2012.
  • [F&L 3/2011] Forschung und Lehre, Deutscher Hochschulverband DHV, 3/2011.
  • [Hey, Tansley, Tolle 2009] Hey, Tansley, Tolle (Eds.): The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, 2009. http://research.microsoft.com/en-us/collaboration/fourthparadigm/.
  • [Hochheiser and Shneiderman 2004] Hochheiser, H., Shneiderman, B., Dynamic Query Tools for Time Series Data Sets: Timebox Widgets for Interactive Exploration, Information Visualization 3, 1 (March 2004), 1-18.

slide-48
SLIDE 48

References (2)

  • [Keim, Mansmann et al., 2008] D. A. Keim, F. Mansmann, D. Oelke and H. Ziegler. Visual Analytics: Combining Automated Discovery with Interactive Visualizations. Proceedings of the 11th International Conference on Discovery Science (DS 2008), Springer-Verlag, pages 2-14, 2008.
  • [MacEachren, Jaiswal et al., 2011] Alan M. MacEachren, Anuj R. Jaiswal, Anthony C. Robinson, Scott Pezanowski, Alexander Savelyev, Prasenjit Mitra, Xiao Zhang, Justine Blanford: SensePlace2: GeoTwitter analytics support for situational awareness. IEEE VAST 2011: 181-190.
  • [Oelke et al., 2008] D. Oelke, P. Bak, D. A. Keim, M. Last and G. Danon. Visual evaluation of text features for document summarization and analysis. Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST 2008), pages 75-82, 2008.
  • [Oelke, Hao et al., 2009] D. Oelke, M. C. Hao, C. Rohrdantz, D. A. Keim, U. Dayal, L.-E. Haug and H. Janetzko. Visual Opinion Analysis of Customer Feedback Data. Proceedings of the 2009 IEEE Symposium on Visual Analytics Science and Technology (VAST '09), pages 187-194, 2009.
  • [Oelke, Spretke et al., 2010] D. Oelke, D. Spretke, A. Stoffel and D. A. Keim. Visual Readability Analysis: How to make your writings easier to read. Proceedings of IEEE Conference on Visual Analytics Science and Technology (VAST '10), pages 123-130, 2010.
  • [Pangaea] Pangaea - Data Publisher for Earth & Environmental Science. http://www.pangaea.de/.
  • [Scherer, v. Landesberger et al., 2012] M. Scherer, T. von Landesberger and T. Schreck: A Benchmark for Content-Based Retrieval in Bivariate Data Collections. Proc. Int. Conference on Theory and Practice of Digital Libraries, 2012.

slide-49
SLIDE 49

References (3)

  • [Scherer, Bernard et al., 2011] M. Scherer, J. Bernard and T. Schreck. Retrieval and exploratory search in multivariate research data repositories using regressional features. Proc. ACM/IEEE Joint Conference on Digital Libraries, pages 363-372, 2011.
  • [Shilane et al 2004] Philip Shilane, Patrick Min, Michael Kazhdan and Thomas Funkhouser: The Princeton Shape Benchmark. Proc. Shape Modeling International, Genoa, Italy, June 2004.
  • [Shneiderman 1996] Shneiderman, B.: The eyes have it: A task by data type taxonomy for information visualizations. In: IEEE Visual Languages, pp. 336–343 (1996).
  • [SHREC 2012 Sketch-based 3D Retrieval Track] B. Li, T. Schreck, B. Bustos, A. Godil, M. Alexa, T. Boubekeur, J. Chen, M. Eitz, T. Furuya, K. Hildebrand, S. Huang, H. Johan, A. Kuijper, R. Ohbuchi, R. Richter, J. Saavedra, M. Scherer, T. Yanagimachi, G. Yoon and S. Yoon. SHREC'12 Track: Sketch-Based 3D Shape Retrieval. Proc. EG Workshop on 3D Object Retrieval, Eurographics Association, pages 109-118, 2012.
  • [Strobelt, Oelke et al., 2009] H. Strobelt, D. Oelke, C. Rohrdantz, A. Stoffel, D. A. Keim and O. Deussen. Document Cards: A Top Trumps Visualization for Documents. IEEE Transactions on Visualization and Computer Graphics, 15(6):1145-1152, 2009.
  • [Teoh and Ma 2003] Soon Tee Teoh, Kwan-Liu Ma: PaintingClass: interactive construction, visualization and exploration of decision trees. Proc. KDD 2003: 667-672.
  • [VisMaster 2010] D. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann, editors. Mastering The Information Age - Solving Problems with Visual Analytics. Eurographics, 2010.

slide-50
SLIDE 50

References (4)

  • [Wanner, Fuchs et al., 2011] F. Wanner, J. Fuchs, D. Oelke and D. A. Keim. Are my Children Old Enough to Read these Books? Age Suitability Analysis. POLIBITS - Research journal on Computer science and computer engineering with applications, 2011.
  • [Wise et al 1995] Wise, J.A., Thomas, J.J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., Crow, V.: Visualizing the non-visual: spatial analysis and interaction with information from text documents. In: Proc. IEEE Symposium on Information Visualization, pp. 51–58 (1995).
  • [Yoon et al., 2010] S. Yoon, M. Scherer, T. Schreck and A. Kuijper. Sketch-based 3D model retrieval using diffusion tensor fields of suggestive contours. Proc. ACM Multimedia, pages 193-200, 2010.