Visual Search and Analysis in Textual and Non-Textual Document Repositories Approaches, Applications, and Research Challenges Tobias Schreck Visual Analytics Group Computer and Information Science University of Konstanz, Germany CLEF 2012
2
- 1. Need for Search and Analysis in Large Data
Technological progress: Information Overload
– Acquisition, production, storage – Data integration, data mining
Large and increasing amounts of data
Data-intensive application domains
– Business – Research – Engineering
Need for new technologies
– "… to unite the seemingly conflicting requirements of scalability and usability in making sense of the data" [VisMaster 2010]
Share of digital information
– 2000: 25% – 2002: 50% (beginning of the digital age) – 2007: 94% (300 exabytes)
Estimated growth rates (1986-2007)
– Storage: 23% – Network: 28% – Compute: 56%
Source: Science, according to [F&L 3/2011]
3
- 1. Data Examples
Textual Data Repositories
– Digital Libraries – Web – Social Media
Non-textual Data Repositories
– Image repositories – 3D Object repositories – Data repositories
Sloan Digital Sky Survey (http://www.sdss.org/)
PROBADO3D Archive (http://www.probado.de/3d.html)
Victoria State Library Image Collection (http://www.slv.vic.gov.au/)
Customer Reviews (Amazon.com)
Digital Libraries
www.facebook.com
www.twitter.com
4
- 1. How to Make Use of Large Data Repositories?
Searching
– Find information entities of interest – Reuse, comparison – Based on specification of queries
Analyzing
– Find structures and abstractions (“Understand” data set as a whole) – Check hypotheses – Make interesting, actionable observations
Interdependence
– Cycles of searching and analyzing
5
- 1. Visual Search and Analysis
Visual representation of the search and analysis process [Shneiderman 1996]
Goals of Visual Information Systems
– Intuitive access, direct manipulation – Leverage human visual perception – Encourage exploration
Classic visual search systems
– Filmfinder [Ahlberg and Shneiderman 1994] – Time Searcher [Hochheiser and Shneiderman 2004]
Classic visual analysis systems
– Spire/In-Spire [Wise et al 1995] – Visual decision tree construction and analysis [Teoh and Ma 2003]
[Ahlberg and Shneiderman 1994]
[Wise et al 1995]
6
Propositions of this Talk
- 1. Emerging large, complex data sources pose new challenges to Information Retrieval and Understanding
- 2. Visual-interactive methods are useful to support retrieval and data understanding
- 3. Promising research opportunities at the intersection of visualization, information retrieval, and evaluation
7
Outline
- 1. Introduction
- 2. Overview Visualization for Large Text
2.1 Feature-based Text Visualization
2.2 Attribute-based Text Visualization
2.3 Visual Document Summarization
2.4 Geo-referenced Micro Blogging Text
- 3. Visual Search in Non-Textual Data
- 4. Promising Research Opportunities
- 5. Conclusions
8
2.1 Sentiment Analysis
- Opinion score derived from adjectives, nouns, and verbs
- Identifies positive and negative sections
Overview of large document corpora
Find articles which suit the mood of the reader
[Keim, Mansmann et al., 2008]
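The opinion score idea can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon (OPINION_LEXICON is hypothetical and not taken from the cited work) and skips part-of-speech tagging, so it only approximates the adjective/noun/verb scoring named on the slide.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would use a curated list of opinion-bearing words.
OPINION_LEXICON = {
    "good": 1, "great": 1, "success": 1, "win": 1,
    "bad": -1, "terrible": -1, "crisis": -1, "fail": -1,
}

def opinion_score(text):
    """Mean polarity of the lexicon words found in text (0.0 if none)."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [OPINION_LEXICON[w] for w in words if w in OPINION_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def score_sections(document):
    """Score each blank-line-separated section of a document."""
    sections = [s.strip() for s in document.split("\n\n") if s.strip()]
    return [(s, opinion_score(s)) for s in sections]
```

Scores near +1 or -1 mark positive or negative sections; these values could then be mapped to colors in an overview display.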
9
2.1. Sentiment Analysis: News Overview
10
2.1 Pixel-based Approach
Feature: average sentence length
[Oelke et al., 2008]
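A minimal sketch of the feature named above: average sentence length, computed per paragraph so that each value could be mapped to one colored pixel. The sentence splitting rule and the paragraph granularity are assumptions, not details from [Oelke et al., 2008].

```python
import re

def average_sentence_length(text):
    """Average number of words per sentence (naive split on . ! ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

def feature_per_paragraph(document):
    """One feature value per blank-line-separated paragraph."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    return [average_sentence_length(p) for p in paragraphs]
```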
11
2.1 Readability Features
[Oelke, Spretke et al., 2010]
12
2.1 Readability Features: Vocabulary Difficulty of 2009 German Election Programs
Feature: Vocabulary Difficulty
Die Linke Piraten
[Oelke, Spretke et al., 2010]
13
2.2 Attribute-based: Story, Character Complexity
King's IT
Rowling's Harry Potter
[Wanner, Fuchs et al., 2011]
14
2.2 Attribute-based: Visual Review Analysis
- User opinions abundantly available
– Forums, Blogs – E-commerce – …
- Many application possibilities
– Product reviews for customers – Market analysis – Customer relationship management
Amazon customer reviews (amazon.com)
15
2.2 Attribute-based: Visual Review Analysis
- Basic method
– Identify product attributes – Identify positive/negative opinions – Calculate weighted attribute vector
- Visual comparison of sets of reviews
– Glyph matrix approach – Cluster analysis
- Applied to printer product reviews
Figure labels: cartridge, paper tray, price, printer, scanner, software; opinion values -1, +1, +1
[Oelke, Hao et al., 2009]
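The basic method can be sketched roughly as follows. The attribute list comes from the slide; the opinion word sets, the sentence-level co-occurrence rule, and the clamping to [-1, +1] are assumptions made for illustration, since the slide does not say how the vector is weighted.

```python
import re

ATTRIBUTES = ["cartridge", "paper tray", "price", "printer", "scanner", "software"]

# Hypothetical opinion word sets (assumption, not from the cited work).
POSITIVE = {"good", "great", "fast", "cheap", "reliable"}
NEGATIVE = {"bad", "slow", "expensive", "broken", "jams"}

def attribute_vector(review):
    """One value in [-1, +1] per attribute, in the order of ATTRIBUTES."""
    totals = {a: 0 for a in ATTRIBUTES}
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        # Net polarity of the sentence: positive minus negative opinion words.
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        for attr in ATTRIBUTES:
            if attr in sentence:  # attribute mentioned in this sentence
                totals[attr] += polarity
    # Clamp so that every review yields a comparable vector.
    return [max(-1, min(1, totals[a])) for a in ATTRIBUTES]
```

Vectors of this kind, one per review, are what a glyph matrix or a cluster analysis would then compare.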
2.2 Attribute-based: Visual Review Analysis
16
[Oelke, Hao et al., 2009]
17
2.2 Attribute-based: Customer Segmentation
[Oelke, Hao et al., 2009]
18
2.3 Visual Content Overviewing
- Visual abstract for scientific articles
– Extraction of important figures and keywords – Layout of elements in generalized word cloud
- Overviewing
- Navigation
- Comparison
[Strobelt, Oelke et al., 2009]
19
2.3 Visual Content Overviewing
[Strobelt, Oelke et al., 2009]
20
2.3 Visual Content Overviewing
[Strobelt, Oelke et al., 2009]
21
2.4 Georeferenced Microblogging Text
- Microblogging text (e.g., Twitter)
– Short text messages – Time stamp – GPS position
- Potential analytic use
– Trend analysis – Marketing, reputation monitoring – Situational awareness for civil defense or crisis management
Example messages: "Nice view, all fine …", "Stuck in a jam after traffic accident …"
[www.google.com]
22
2.4 SensePlace2 Tool
[MacEachren, Jaiswal et al., 2011]
23
2.4 VAST Micro Blogging Challenge
- VAST Challenge 2011
– Fictitious city including street network and POIs – 1 million microblogging messages over 20 days, including spatial position – Fictitious hidden epidemic scenario
- Task
– Find possible epidemics and their characteristics
[http://hcil.cs.umd.edu/localphp/hcil/vast11/]
24
2.4 VAST Micro Blogging Challenge
[Bertini, Buchmüller et al., 2011]
25
2.4 Concentration on Bridges
26
2.4 Concentration in Hospitals
27
2.4 Message Distribution (19.05.) – Filtered for Symptom Keywords
28
2.4 VAST Micro Blogging Challenge
29
Remainder of this Talk
- 1. Introduction
- 2. Overview Visualization for Large Text
2.1 Feature-based Text Visualization
2.2 Attribute-based Text Visualization
2.3 Visual Document Summarization
2.4 Geo-referenced Micro Blogging Text
- 3. Visual Search in Non-Textual Data
3.1 Sketch-based 3D Object Retrieval
3.2 Retrieval in Bivariate Measurement Data
- 4. Promising Research Opportunities
- 5. Conclusions
30
- 3. Visual Search in Non-Textual Data
Multitude of complex document types
– Images – Video – 3D Objects – Multivariate Research Data – Etc.
Research questions to address
– Similarity functions? – Query types to support? – How to evaluate?
PROBADO3D Archive [http://www.probado.de/3d.html] Sloan Digital Sky Survey (http://www.sdss.org/) Victoria State Library Image Collection (http://www.slv.vic.gov.au/)
31
3.1 Query-by-Example and Sketch-Based Retrieval
Problems:
- 1. How to compare structurally different views?
- 2. How to evaluate different sketching styles?
32
3.1 Gradient Features, Suggestive Contours
[Yoon et al., 2010] [DeCarlo et al., 2003]
33
Subset of 14 classes of the Princeton Shape Benchmark [Shilane et al 2004]
Collection of 20 user sketches per class
Evaluation of retrieval performance (per class, given user sketch)
3.1 Sketch-Based 3D Object Retrieval
[Yoon et al., 2010]
34
[SHREC 2012 Sketch-based 3D Retrieval Track]
3.1 SHREC’12 Track: Sketch-Based 3D Retrieval
35
3.1 Large-Scale Sketch Benchmark
Crowd-sourced approach of [Eitz et al., 2012a]
- 20,000 sketches from 1,300 users
- 250 representative object categories
- Basis for improved benchmarking study
[Eitz et al., 2012b]
Recognition experiment
- Avg. human accuracy: 73%
- Avg. automatic accuracy: 56%
[Eitz et al., 2012a]
36
3.2 Visual Search in Bivariate (Research) Data
- Jim Gray's Fourth Paradigm and emerging research data repositories [Hey, Tansley, Tolle 2009]
- Prominent type of quantitative data: bivariate and multivariate data
- Common visual representation
– Scatter plot – Scatter plot matrix
- Content-based support for visual search and analysis in this data?
[Pangaea]
37
3.2 Regressional Feature Vector for Comparing Scatter Plots
Perform regressions (linear, quadratic, log, …)
Form feature vectors
- Goodness of fit scores
- Coefficient parameters
[Scherer, Bernard et al., 2011]
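A minimal sketch of such a regressional feature vector, assuming three model families (linear, quadratic, log), R^2 as the goodness-of-fit score, and plain least squares via the normal equations. The exact model set, score, and ordering in [Scherer, Bernard et al., 2011] may differ; the helper names here are hypothetical. The log model assumes x > 0, and the fit assumes the x values are not all identical.

```python
import math

def solve(A, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(basis, xs, ys):
    """Least-squares fit of y to a sum of basis functions; returns (coefficients, R^2)."""
    rows = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coeffs = solve(ata, aty)
    preds = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return coeffs, r2

# Basis functions per model family (constant term first).
MODELS = {
    "linear": [lambda x: 1.0, lambda x: x],
    "quadratic": [lambda x: 1.0, lambda x: x, lambda x: x * x],
    "log": [lambda x: 1.0, lambda x: math.log(x)],
}

def regressional_features(xs, ys):
    """Concatenate coefficients and R^2 of each model into one feature vector."""
    features = []
    for name in ("linear", "quadratic", "log"):
        coeffs, r2 = fit(MODELS[name], xs, ys)
        features.extend(coeffs + [r2])
    return features
```

Two scatter plots can then be compared by a distance between their feature vectors, for example the Euclidean distance; that choice is also an assumption here.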
38
3.2 Search and Analysis Application
[Scherer, Bernard et al., 2011]
Figure labels: cluster; altitude vs. PPPP (pressure, hPa); sort by similarity to f(x) = e^-x; query by example; spatial reference of data sets
39
3.2 A Benchmark for Earth Observation Data
- But how to create a benchmark data set for automatic evaluation?
- Input data
– BSRN earth observation data (radiation, temperature, etc.) for 40 stations – 24,700 bivariate plots generated
- Tobler’s First Law of Geography for similarity class formation
– 18x6 longitude/latitude grid – Month of year – Parameters of measurement
– Result: 1608 similarity classes
- Evaluation of nine feature vectors
– Retrieval precision – Timing
Figure labels: pressure, temp, alt, CO2, O3, …; Position x Month x Parameter
[Scherer, v. Landesberger et al., 2012]
[Pangaea]
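The similarity class formation (Position x Month x Parameter) and the retrieval precision measure can be sketched as below. The equal-angle grid cells (20 degrees of longitude, 30 degrees of latitude for an 18x6 grid) and the function names are assumptions; the slide only gives the grid size.

```python
def similarity_class(lon, lat, month, parameters, lon_cells=18, lat_cells=6):
    """Class key: grid cell of the station, month of year, measured parameters."""
    # Map longitude [-180, 180] and latitude [-90, 90] to cell indices,
    # clamping the upper boundary into the last cell.
    col = min(int((lon + 180.0) / 360.0 * lon_cells), lon_cells - 1)
    row = min(int((lat + 90.0) / 180.0 * lat_cells), lat_cells - 1)
    return (col, row, month, tuple(parameters))

def precision_at_k(query_class, ranked_classes, k):
    """Fraction of the top-k retrieved plots that share the query's class."""
    top = ranked_classes[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c == query_class) / len(top)
```

Plots that fall into the same class count as relevant to each other, which allows retrieval precision to be computed without manual labeling.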
40
3.2 A Benchmark for Scatter Plot Retrieval
[Scherer, v. Landesberger et al., 2012]
- Benchmark used to evaluate 9 feature vectors for scatter plot (image) data
– Regressional features – KDE – Edge histogram – Etc.
- Image-based methods perform best, but are not as intuitive as the regressional method
41
Propositions of this Talk
- 1. Emerging large, complex data sources pose new challenges to Information Retrieval and Understanding
- 2. Visual-interactive methods are useful to support retrieval and data understanding
- 3. Promising research opportunities
42
- 4. Promising Research Opportunities
Information Retrieval and Analysis Information Visualization Evaluation
43
- 4. Research Challenges
- Data complexity and user needs
– Multiple aspects: text, image, relations, time, geo, etc. – Compound data, streaming, large data – Which queries to support? – Users and analysis
- Explorative search systems
– Automatic interestingness estimation – Relevance feedback for analysis – Hypothesis specification as query modality – How to objectively measure insight?
44
- 5. Conclusions
- Novel data formats raise opportunities for retrieval and analysis systems
- Discussed examples from textual and non-textual data
- Visual-interactive methods can be useful
- Many promising research directions
- Data collection, use case modeling, and experimentation are required!
45
Thank you very much for your kind attention Questions and comments very welcome
Acknowledgments
Daniela Oelke, Christian Rohrdantz, Franz Wanner, Juri Buchmüller, Florian Stoffel, Sang-Min Yoon, Max Scherer, Hendrik Strobelt, Enrico Bertini, Daniel Keim
GK Explorative Analysis and Visualization of Large Information Spaces
46
E-Mail: Tobias.Schreck(at)uni-konstanz.de
Web: http://cms.uni-konstanz.de/informatik/schreck/
Phone: (+49) 07531 883375
Fax: (+49) 07531 883065
Mail: University of Konstanz, Computer and Information Science, Universitaetsstrasse 10, Box 78, D-78457 Konstanz, Germany
47
References (1)
- [Ahlberg and Shneiderman 1994] Ahlberg, C., Shneiderman, B.: Visual information seeking: tight
coupling of dynamic query filters with starfield displays. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 313–317 (1994).
- [Bertini, Buchmüller et al., 2011] E. Bertini, J. Buchmüller, F. Fischer, S. Huber, T. Lindemeier, F.
Maaß, F. Mansmann, T. Ramm, M. Regenscheit, C. Rohrdantz, C. Scheible, T. Schreck, S. Sellien, F. Stoffel, M. Tautzenberger, M. Zieker, and D. Keim. Visual analytics of terrorist activities related to epidemics. In IEEE Symposium on Visual Analytics Science and Technology, pages 329-330, 2011.
- [DeCarlo et al., 2003] Doug DeCarlo, Adam Finkelstein, Szymon Rusinkiewicz, Anthony Santella.
Suggestive Contours for Conveying Shape. In SIGGRAPH 2003, pp. 848-855.
- [Eitz et al., 2012a] Mathias Eitz, James Hays and Marc Alexa: How Do Humans Sketch Objects?
ACM Transactions on Graphics, Proc. SIGGRAPH 2012.
- [Eitz et al., 2012b] Mathias Eitz, Ronald Richter, Tamy Boubekeur, Kristian Hildebrand and Marc
Alexa: Sketch-based Shape Retrieval ACM Transactions on Graphics, Proc. SIGGRAPH 2012.
- [F&L 3/2011] Forschung und Lehre, Deutscher Hochschulverband DHV, 3/2011.
- [Hey, Tansley, Tolle 2009] Hey, Tansley, Tolle (Eds.): The Fourth Paradigm: Data-Intensive
Scientific Discovery. Microsoft Research, 2009. http://research.microsoft.com/en-us/collaboration/fourthparadigm/.
- [Hochheiser and Shneiderman 2004] Hochheiser, H., Shneiderman, B., Dynamic Query Tools for
Time Series Data Sets: Timebox Widgets for Interactive Exploration, Information Visualization 3, 1 (March 2004), 1-18.
48
References (2)
- [Keim, Mansmann et al., 2008] D. A. Keim, F. Mansmann, D. Oelke and H. Ziegler. Visual
Analytics: Combining Automated Discovery with Interactive Visualizations. Proceedings of the 11th International Conference on Discovery Science (DS 2008), Springer-Verlag, pages 2-14, 2008.
- [MacEachren, Jaiswal et al., 2011] Alan M. MacEachren, Anuj R. Jaiswal, Anthony C. Robinson,
Scott Pezanowski, Alexander Savelyev, Prasenjit Mitra, Xiao Zhang, Justine Blanford: SensePlace2: GeoTwitter analytics support for situational awareness. IEEE VAST 2011: 181-190.
- [Oelke et al., 2008] D. Oelke, P. Bak, D. A. Keim, M. Last and G. Danon. Visual evaluation of text
features for document summarization and analysis. Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST 2008), pages 75 - 82, 2008.
- [Oelke, Hao et al., 2009] D. Oelke, M. C. Hao, C. Rohrdantz, D. A. Keim, U. Dayal, L.-E. Haug and
H. Janetzko. Visual Opinion Analysis of Customer Feedback Data. Proceedings of the 2009 IEEE Symposium on Visual Analytics Science and Technology (VAST '09), pages 187-194, 2009.
- [Oelke, Spretke et al., 2010] D. Oelke, D. Spretke, A. Stoffel and D. A. Keim. Visual Readability
Analysis: How to make your writings easier to read. Proceedings of IEEE Conference on Visual Analytics Science and Technology (VAST '10), pages 123 - 130, 2010.
- [Pangaea] Pangaea - Data Publisher for Earth & Environmental Science. http://www.pangaea.de/.
- [Scherer, v. Landesberger et al., 2012] M. Scherer, T. von Landesberger and T. Schreck: A
Benchmark for Content-Based Retrieval in Bivariate Data Collections. Proc. Int. Conference on Theory and Practice of Digital Libraries, 2012.
49
References (3)
- [Scherer, Bernard et al., 2011] M. Scherer, J. Bernard and T. Schreck. Retrieval and exploratory
search in multivariate research data repositories using regressional features. Proc. ACM/IEEE Joint Conference on Digital Libraries, pages 363-372, 2011.
- [Shilane et al 2004] Philip Shilane, Patrick Min, Michael Kazhdan and Thomas Funkhouser: The
Princeton Shape Benchmark. Proc. Shape Modeling International, Genoa Italy, June 2004.
- [Shneiderman 1996] Shneiderman, B.: The eyes have it: A task by data type taxonomy for
information visualizations. In: IEEE Visual Languages, pp. 336–343 (1996).
- [SHREC 2012 Sketch-based 3D Retrieval Track] B. Li, T. Schreck, B. Bustos, A. Godil, M Alexa, T
Boubekeur, J. Chen, M. Eitz, T. Furuya, K. Hildebrand, S. Huang, H. Johan, A. Kuijper, R. Ohbuchi, R. Richter, J. Saavedra, M. Scherer, T. Yanagimachi, G. Yoon and S. Yoon. SHREC'12 Track: Sketch-Based 3D Shape Retrieval. Proc. EG Workshop on 3D Object Retrieval, Eurographics Association, pages 109-118, 2012.
- [Strobelt, Oelke et al., 2009] H. Strobelt, D. Oelke, C. Rohrdantz, A. Stoffel, D. A. Keim and O.
Deussen. Document Cards: A Top Trumps Visualization for Documents. IEEE Transactions on Visualization and Computer Graphics, 15(6):1145-1152, 2009.
- [Teoh and Ma 2003] Soon Tee Teoh, Kwan-Liu Ma: PaintingClass: interactive construction,
visualization and exploration of decision trees. Proc. KDD 2003: 667-672.
- [VisMaster 2010] D. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann, editors. Mastering The
Information Age - Solving Problems with Visual Analytics. Eurographics, 2010.
50
References (4)
- [Wanner, Fuchs et al., 2011] F. Wanner, J. Fuchs, D. Oelke and D. A. Keim. Are my Children Old
Enough to Read these Books? Age Suitability Analysis. POLIBITS - Research journal on Computer science and computer engineering with applications, 2011.
- [Wise et al 1995] Wise, J.A., Thomas, J.J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., Crow, V.:
Visualizing the non-visual: spatial analysis and interaction with information from text documents. In: Proc. IEEE Symposium on Information Visualization, pp. 51–58 (1995).
- [Yoon et al., 2010] S. Yoon, M. Scherer, T. Schreck and A. Kuijper. Sketch-based 3D model