Exploring ENRON Email with NetLens

SLIDE 1

Exploring ENRON Email with NetLens

Catherine Plaisant, Benjamin B. Bederson, Hyunmo Kang, Bongshin Lee

Human-Computer Interaction Laboratory, University of Maryland

Joint Institute for Knowledge Discovery

SLIDE 2

Our research focus: Alternative UIs to Graph Visualization

How to avoid this…

Node-link diagrams have many limitations: they are hard to read, may show clusters but not much else, and do not scale well.

SLIDE 3

NetLens

Iterative Exploration of Content-Actor Network Data

User interface for exploratory search

Generalizable to a variety of data

Provide consistent interface

Easy to learn and use

  • Kang et al., Proc. of Visual Analytics Science and Technology Conference (VAST 06)
  • Kang et al., Poster/Demo at Joint Conference on Digital Libraries, 2006

SLIDE 4

NetLens

Iterative Exploration of Content-Actor Network Data

Paired networks of Content and Actors, e.g.

Paired networks of Papers and Authors

  • Papers refer to other papers
  • Authors have advisors

Paired networks of Emails and People

  • Emails respond to or include emails
  • People have assistants who send email for them

Paired networks of Products and Companies

  • Products replace or integrate products
  • Companies are bought or merge
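
The paired-network idea above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the NetLens implementation: the class name `ContentActorNetwork`, its attributes, and the sample addresses are all hypothetical.

```python
from collections import defaultdict


class ContentActorNetwork:
    """Two entity types (content, actors) and three kinds of links.

    Hypothetical sketch: none of these names come from NetLens itself.
    """

    def __init__(self):
        self.content_links = defaultdict(set)      # content -> content (e.g. email replies to email)
        self.actor_links = defaultdict(set)        # actor -> actor (e.g. assistant of a person)
        self.content_to_actors = defaultdict(set)  # content -> related actors
        self.actor_to_contents = defaultdict(set)  # actor -> related content

    def link_content(self, src, dst):
        self.content_links[src].add(dst)

    def link_actors(self, src, dst):
        self.actor_links[src].add(dst)

    def relate(self, content, actor):
        # Record the content-actor relationship in both directions.
        self.content_to_actors[content].add(actor)
        self.actor_to_contents[actor].add(content)

    def actors_of(self, contents):
        """Move from a set of content items to the actors related to them."""
        return set().union(*(self.content_to_actors.get(c, set()) for c in contents))

    def contents_of(self, actors):
        """Move from a set of actors to the content related to them."""
        return set().union(*(self.actor_to_contents.get(a, set()) for a in actors))


# Made-up sample data for the Emails and People pairing.
net = ContentActorNetwork()
net.relate("email-1", "alice@example.com")
net.relate("email-1", "bob@example.com")
net.relate("email-2", "bob@example.com")
net.link_content("email-2", "email-1")                         # email-2 responds to email-1
net.link_actors("assistant@example.com", "alice@example.com")  # assistant sends email for alice

print(sorted(net.actors_of({"email-1"})))
print(sorted(net.contents_of({"bob@example.com"})))
```

Keeping the content-actor relationship indexed in both directions makes it cheap to switch between the two sides, which is the kind of back-and-forth the iterative exploration relies on.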
SLIDE 5

Content-actor model

[Diagram: Entity E1 and Entity E2, each with a self-relationship, connected by a relationship]

Examples for scientific papers:

SLIDE 6

Toward SCALABILITY

Total Enron email (non-duplicate): 249,760 emails, 87,673 people

SLIDE 7

Email overview by year

People (addresses) overview by domain

SLIDE 8

Alternative overviews:

  • Emails by day of the week, grouped by year
  • People by connectance magnitude (low, medium, high)
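
The overview groupings can be illustrated with simple counting. The sketch below is an assumption about how such overviews might be computed, not a description of NetLens internals; the record layout, function names, and sample data are hypothetical.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample records: (sent date, sender address).
emails = [
    (date(2000, 5, 1), "a@enron.com"),
    (date(2001, 3, 12), "b@enron.com"),
    (date(2001, 7, 4), "c@example.org"),
]


def emails_by_year(records):
    """Count emails per calendar year."""
    return Counter(sent.year for sent, _ in records)


def emails_by_weekday_and_year(records):
    """Count emails per (year, day of the week) pair."""
    return Counter((sent.year, WEEKDAYS[sent.weekday()]) for sent, _ in records)


def people_by_domain(records):
    """Count distinct addresses per mail domain."""
    addresses = {addr.lower() for _, addr in records}
    return Counter(addr.rsplit("@", 1)[-1] for addr in addresses)


year_counts = emails_by_year(emails)
weekday_counts = emails_by_weekday_and_year(emails)
domain_counts = people_by_domain(emails)

print(year_counts)
print(weekday_counts)
print(domain_counts)
```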

SLIDE 9

Multiple email search capabilities

1- Keyword Search: here, a search on “California”

2- Similarity Search: find emails similar to one or more selected emails

Result set loaded in “My list” (with Doug Oard’s team)
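
The two search modes can be sketched as follows. The slides do not say which similarity measure is used, so cosine similarity over raw word counts is an assumption made purely for illustration; the function names and sample emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(emails, keyword):
    """Return ids of emails whose body contains the keyword as a whole token."""
    kw = keyword.lower()
    return [eid for eid, body in emails.items() if kw in tokenize(body)]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similar_emails(emails, selected_ids, top_n=3):
    """Rank the unselected emails by similarity to the selected ones combined."""
    query = Counter()
    for eid in selected_ids:
        query.update(tokenize(emails[eid]))
    scored = [
        (cosine(query, Counter(tokenize(body))), eid)
        for eid, body in emails.items()
        if eid not in selected_ids
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [eid for score, eid in scored[:top_n] if score > 0]


# Made-up sample emails.
emails = {
    "e1": "Power prices in California are rising",
    "e2": "California power market update",
    "e3": "Lunch on Friday?",
}

hits = keyword_search(emails, "California")
my_list = similar_emails(emails, ["e1"])
print(hits, my_list)
```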

SLIDE 10

Social network analysis:

  • Number of neighbors
  • Connectance
  • Centrality
  • Average Path Length
  • Here: selected people with high connectance

With Jen Golbeck

SLIDE 11

Social network analysis:

  • Number of neighbors
  • Connectance
  • Centrality
  • Average Path Length
  • Here: selected people with high connectance

With Jen Golbeck

Explanations of the meaning of the attributes
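
To make some of these attributes concrete, the sketch below computes number of neighbors, one possible centrality (degree centrality), and average path length from a person on a tiny made-up graph. The slides do not define how NetLens computes connectance or centrality, so those definitions are not reproduced here; the graph and function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected "who exchanged email with whom" graph.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
}


def number_of_neighbors(g, node):
    """Number of distinct people directly connected to the node."""
    return len(g[node])


def degree_centrality(g, node):
    """Fraction of all other people the node is directly connected to."""
    return len(g[node]) / (len(g) - 1) if len(g) > 1 else 0.0


def average_path_length(g, source):
    """Mean shortest-path distance from source to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        cur = queue.popleft()
        for nxt in g[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others) if others else 0.0


for person in sorted(graph):
    print(person,
          number_of_neighbors(graph, person),
          degree_centrality(graph, person),
          average_path_length(graph, person))
```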

SLIDE 12

People bios

Using signatures and directory info

  • with Jen Golbeck

SLIDE 13

Integrated Phone calls

Replay

Separate conversations

Direct access to mentions of: subject, names, keywords (with Carol Espy’s team)

SLIDE 14

Thread Summaries

  • List of emails in same thread
  • Access to thread
  • Access to thread summary

With Bonnie Dorr and Doug Oard’s teams

SLIDE 15

TreePlus to browse a subset of network connections

SLIDE 16

SLIDE 17

TreePlus

  • Visualizing Graphs as Trees: plant a seed and watch it grow
  • Faster, more accurate, and preferred over traditional graphs for tasks that involve reading and exploration of connections

To show hidden graph structure:

  • Highlight and preview of adjacent nodes
  • Animated change of tree structure
  • Visual hints about graph structure

  • B. Lee, C.S. Parr, C. Plaisant, B.B. Bederson, V.D. Veksler, W.D. Gray, C. Kotfila (2006). TreePlus: Interactive Exploration of Networks with Enhanced Tree Layouts. To appear in TVCG Special Issue on Visual Analytics.
  • B. Lee, C.S. Parr, C. Plaisant, B.B. Bederson (2005). Visualizing Graphs as Trees: Plant a seed and watch it grow. Proceedings of GD 2005 (poster), LNCS, pp. 516-518.
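
The "plant a seed" idea can be illustrated by growing a tree from a chosen node and tracking which graph edges the tree does not draw. This sketch uses a plain breadth-first spanning tree, which is an assumption for illustration and not the TreePlus layout algorithm; the function name `grow_tree` and the sample graph are hypothetical.

```python
from collections import deque


def grow_tree(graph, seed):
    """Grow a breadth-first tree from seed.

    Returns (children, hidden): children maps each node to its tree
    children; hidden maps each node to adjacent nodes that are connected
    in the graph but are neither its tree parent nor its tree child.
    """
    parent = {seed: None}
    children = {seed: []}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nb in sorted(graph[node]):  # sorted for a deterministic tree
            if nb not in parent:
                parent[nb] = node
                children[nb] = []
                children[node].append(nb)
                queue.append(nb)
    # Graph edges that the tree leaves out: candidates for visual hints.
    hidden = {
        node: sorted(
            nb for nb in graph[node]
            if parent[node] != nb and parent.get(nb) != node
        )
        for node in parent
    }
    return children, hidden


# Made-up undirected graph.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

children, hidden = grow_tree(graph, "a")
print(children)
print(hidden)
```

Here the edge between b and c exists in the graph but not in the tree, so it shows up in `hidden` for both nodes; a tree-based view would need some hint or preview to reveal it.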

SLIDE 18

Generalization to other datasets

e.g. NetLens for Scientific Publications (Papers and Authors)

SLIDE 19

User evaluation

Heuristic review at NIST (5 people, self-trained with video)

Usability study (9 people, training, debriefing)

Other improvements

  • Improved feedback
  • +++ Improvement of flow management
  • Addition of My List
  • Adaptive explanations of views
  • Video training
  • Documentation of source / processing of variables

SLIDE 20

Implementation

C# (using Piccolo toolkit)

MS Access database

NetLens component code available on request

SLIDE 21

Conclusions - Future Directions

Conclusions

  • Simple content-actor model helpful
  • Powerful yet simple
  • Training about flow behavior

Future directions

  • Continue integration with other JIKD data, e.g. entity resolution
  • Evaluation (case studies of analysis)
  • Needs for Proto Tool
  • Facilitate code customization for different applications
  • Flexible entity switching (to handle any choice of pairs)
  • Usability

SLIDE 22

Thank You

plaisant@cs.umd.edu (301) 405-2768

bederson@cs.umd.edu (301) 405-2764

NetLens: www.cs.umd.edu/hcil/netlens

TreePlus: www.cs.umd.edu/hcil/treeplus

Papers and video demonstrations available from website.

Source code available on request.

SLIDE 23

OTHER relevant HCIL projects

SLIDE 24

Temporal Data (Categorical): PatternFinder for Patient History Search

Fails, Karlson, Shahamat & Shneiderman, VAST 2006

SLIDE 25

Systematic & Flexible Network Exploration with SocialAction

Clustering shows grouping

Abstraction reveals relationships

Perer & Shneiderman, InfoVis 2006

SLIDE 26

Network Visualization with Semantic Substrates

  • Meaningful layout of nodes
  • User-controlled visibility of links

Shneiderman & Aris, InfoVis 2006