Integrating Predictive Models with Interactive Visualization



slide-1
SLIDE 1

Integrating Predictive Models with Interactive Visualization

Jian Zhao, Ph.D., Assistant Professor
Cheriton School of Computer Science
University of Waterloo
www.jeffjianzhao.com | jianzhao@uwaterloo.ca

slide-2
SLIDE 2

Short bio

2015 2019 2016 2009

Assistant Professor @ U Waterloo
Researcher @ FXPAL, Palo Alto
Researcher @ Autodesk, Toronto
Ph.D. @ U Toronto

slide-3
SLIDE 3

Data Machines Humans

All continuously growing fast!

slide-4
SLIDE 4

I investigate advanced visualizations (vis) that promote the interplay among data, machines (models), and humans (users) in real-world data science applications.

slide-5
SLIDE 5

Bella, Data Scientist

“My input data looks similar, but my classifier performs quite differently… Why?”

slide-6
SLIDE 6

Matejka et al., Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing, CHI’17

slide-7
SLIDE 7

Bella, Data Scientist

“I’m building a neural network classifier. I tried many ways, but it doesn’t work… Why?”

Black box

slide-8
SLIDE 8

TensorFlow Playground, http://playground.tensorflow.org/

slide-9
SLIDE 9

Bella, Data Scientist

“I finally got some good results, but my boss couldn’t understand them...”

slide-10
SLIDE 10
slide-11
SLIDE 11

Visualization is critical in the data analysis workflow

Make sense of data: data exploration
Make sense of models: model explanation
Make sense of results: results communication

slide-12
SLIDE 12

Top machine learning and data science methods used at work

http://businessoverbroadway.com/top-machine-learning-and-data-science-methods-used-at-work

slide-13
SLIDE 13

Creating effective visualizations is hard

Problem/domain specific

No easy one-size-fits-all solution

Technical skills

Matplotlib, D3.js, ggplot2, …

Sense of design

Huge design space

slide-14
SLIDE 14

Prediction, Recommendation, …
Tables, Networks, Text & Images, …
Data analysts, General users, …

VIS

Make sense of data, make sense of models, make sense of results

slide-15
SLIDE 15

Make sense of data

ChartSeer: Explore complex data with visualization recommendations

Make sense of models

MissBiN: Comprehend missing link prediction in bipartite networks

Make sense of results

MOOCex: Leverage video recommendations in online learning

slide-16
SLIDE 16

Make sense of data

slide-17
SLIDE 17

Exploring large information space

???

slide-18
SLIDE 18

Challenges

Continuously making decisions in a large parameter space

Which data variables to explore?
What kind of charts to use?

Lacking a holistic view of the analysis space

What is the current status?
Where am I?

slide-19
SLIDE 19

Exploring large information space with recommendation

slide-20
SLIDE 20
  • J. Zhao, M. Fan, M. Feng, ChartSeer: Interactive Steering Exploratory Visual Analysis with Machine Intelligence, TVCG

ChartSeer

slide-21
SLIDE 21
slide-22
SLIDE 22

System architecture

slide-23
SLIDE 23

Chart summarization

Figure labels: chart clusters, variables used, chart glyphs, analysis space

slide-24
SLIDE 24

Controlled user study

Between-subjects design

24 participants (13 females and 11 males)

Interface conditions

ChartSeer vs. Baseline

Dataset

US college statistics (18 variables)

Tasks

Summarization task
Exploration task

slide-25
SLIDE 25

Results of user behaviors

Participants added more charts but updated fewer charts using ChartSeer
ChartSeer led to a broader range of data variables and visual encodings
ChartSeer encouraged more focused exploration of data variables
ChartSeer allowed for data exploration from more heterogeneous visual perspectives

ChartSeer Baseline

slide-26
SLIDE 26

Questionnaire results

ChartSeer Baseline

slide-27
SLIDE 27

Make sense of models

slide-28
SLIDE 28
slide-29
SLIDE 29

“Missing” links in bipartite networks

[Figure: bipartite customer-product network, nodes A to E and 1 to 5, with a "???" annotation]

slide-30
SLIDE 30

Missing link prediction

C – 5: 0.974
D – 2: 0.965
E – 1: 0.873
B – 3: 0.852
…

[Figure: the bipartite network with nodes A to E and 1 to 5]

Black box

slide-31
SLIDE 31

Analysts’ questions

What are the missing links?

Why is a link missing?

How does a missing link affect the network?

slide-32
SLIDE 32

MissBiN

A missing link prediction algorithm

What are the missing links?

An interactive visualization

Why is a link missing?

A comparative analysis approach

How does a missing link affect the network?

  • J. Zhao, M. Sun, F. Chen, P. Chiu, MissBiN: Visual Analysis of Missing Links in Bipartite Networks, VIS’19
  • J. Zhao, M. Sun, F. Chen, P. Chiu, Understanding Missing Links in Bipartite Networks with MissBiN, TVCG
slide-33
SLIDE 33

Addressing the questions with MissBiN

A missing link prediction algorithm

What are the missing links?

An interactive visualization

Why is a link missing?

A comparative analysis approach

How does a missing link affect the network?

slide-34
SLIDE 34

Prediction of missing links

1. Predict the missing links with standard methods (e.g., common neighbors [Chang12])
2. Discover all maximal bicliques (complete bipartite subgraphs) of the network (e.g., using MBEA [Zhang14])
3. Re-rank the missing links based on the overlap of bicliques
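
Steps 1 and 2 can be sketched on a toy network. This is only an illustration under stated assumptions: the edge list is hypothetical, the common-neighbors score uses one plausible bipartite adaptation (neighbors of the left node that are also two hops from the right node) that may differ from [Chang12], and the biclique enumeration is a brute-force stand-in for MBEA [Zhang14]. Step 3 is not shown because the slides do not spell out the overlap weight.

```python
from itertools import combinations

# Hypothetical toy network: customers (letters) on the left, products (digits) on the right.
# Left and right names must be distinct because they share one adjacency dict.
EDGES = [("A", "1"), ("A", "2"), ("B", "1"), ("B", "2"), ("B", "3"),
         ("C", "2"), ("C", "3")]

def build_neighbors(edges):
    """Undirected adjacency sets for every node."""
    nbrs = {}
    for left, right in edges:
        nbrs.setdefault(left, set()).add(right)
        nbrs.setdefault(right, set()).add(left)
    return nbrs

def common_neighbor_score(nbrs, left, right):
    """Step 1 (assumed bipartite variant): count neighbors of `left`
    that are also neighbors-of-neighbors of `right`."""
    two_hop = set()
    for other_left in nbrs[right]:
        two_hop |= nbrs[other_left]
    return len(nbrs[left] & two_hop)

def maximal_bicliques(nbrs, left_nodes):
    """Step 2 by brute force (exponential in the number of left nodes; not MBEA).
    Each left subset is closed into the maximal biclique that contains it."""
    found = set()
    for size in range(1, len(left_nodes) + 1):
        for subset in combinations(left_nodes, size):
            right_side = set.intersection(*(nbrs[u] for u in subset))
            if not right_side:
                continue
            left_side = set.intersection(*(nbrs[v] for v in right_side))
            found.add((frozenset(left_side), frozenset(right_side)))
    return found

nbrs = build_neighbors(EDGES)
left_nodes = sorted({l for l, _ in EDGES})
right_nodes = sorted({r for _, r in EDGES})

# Candidate missing links are the left-right pairs with no edge.
missing = [(u, v) for u in left_nodes for v in right_nodes if v not in nbrs[u]]
scores = {link: common_neighbor_score(nbrs, *link) for link in missing}
bicliques = maximal_bicliques(nbrs, left_nodes)
```

On this toy input the candidates are A-3 and C-1, and both receive the same step-1 score, which is the kind of tie that a biclique-based re-ranking could break.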

slide-35
SLIDE 35

In step 3, for each pair of bicliques, …

Area(M1), Area(M2 + M3 + M4 + M5)

[Figure: two bicliques with node sets Xi, Yi and Xj, Yj, and regions M1 to M5]

slide-36
SLIDE 36

Re-ranking predicted missing links

$s'_i = w_i \cdot s_i$

$w_i$: weight computed in step 3, based on biclique information
$s_i$: score computed in step 1, based on standard methods
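
The re-ranking multiplies each step-1 score by its step-3 weight and sorts by the product. A minimal sketch follows, assuming hypothetical scores and weights (the slides do not give numeric inputs) and a default weight of 1.0 for links without biclique information, which is also an assumption.

```python
def rerank(scores, weights, default_weight=1.0):
    """Return (link, new_score) pairs sorted by weight * score, highest first.

    `default_weight` is an assumed fallback for links that have no weight.
    """
    reranked = {
        link: weights.get(link, default_weight) * score
        for link, score in scores.items()
    }
    return sorted(reranked.items(), key=lambda item: item[1], reverse=True)

# Hypothetical inputs, for illustration only.
step1_scores = {("C", "5"): 0.9, ("D", "2"): 0.8, ("E", "1"): 0.5}
step3_weights = {("C", "5"): 1.0, ("D", "2"): 1.5}

ranking = rerank(step1_scores, step3_weights)
top_link = ranking[0][0]
```

Here D-2 overtakes C-5 because its weight lifts a lower base score, which is the intended effect of re-ranking by biclique overlap.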

slide-37
SLIDE 37

Evaluation of missing link prediction

Test on 3 datasets

Person-place network from Atlantic Storm corpus [Hughes05]
User-conversation network from Slack group communication

Compare with 5 base methods

Jaccard coefficient (JA)
Common neighbors (CN)
Adamic-Adar coefficient (AA)
Preferential attachment (PA)
Random walk (RW)
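
For reference, four of the base methods have short textbook definitions over neighbor sets. The sketch below uses the standard unipartite forms; how each was adapted to bipartite networks in the evaluation is not stated on the slides, so treat the function names and inputs as illustrative. Random walk is omitted because it needs the full graph rather than two neighbor sets.

```python
import math

def common_neighbors(nu, nv):
    """CN: number of shared neighbors."""
    return len(nu & nv)

def jaccard(nu, nv):
    """JA: shared neighbors divided by all neighbors of either node."""
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def adamic_adar(nu, nv, degree):
    """AA: shared neighbors weighted by 1 / log(degree).
    Neighbors of degree 1 are skipped to avoid dividing by log(1) = 0."""
    return sum(1.0 / math.log(degree[z]) for z in nu & nv if degree[z] > 1)

def preferential_attachment(nu, nv):
    """PA: product of the two node degrees."""
    return len(nu) * len(nv)

# Hypothetical neighbor sets and degrees, for illustration only.
nu = {"a", "b", "c"}
nv = {"b", "c", "d"}
degree = {"a": 1, "b": 2, "c": 4, "d": 3}

cn = common_neighbors(nu, nv)
ja = jaccard(nu, nv)
aa = adamic_adar(nu, nv, degree)
pa = preferential_attachment(nu, nv)
```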

slide-38
SLIDE 38

Link prediction results

In most cases, PA has the largest performance gain
CN also performs well

Jaccard coefficient (JA), common neighbors (CN), Adamic-Adar coefficient (AA), preferential attachment (PA), random walk (RW)

Chart legend: original method, our method, performance gain

slide-39
SLIDE 39

Addressing the questions with MissBiN

A missing link prediction algorithm

What are the missing links?

An interactive visualization

Why is a link missing?

A comparative analysis approach

How does a missing link affect the network?

slide-40
SLIDE 40
slide-41
SLIDE 41
slide-42
SLIDE 42
slide-43
SLIDE 43
slide-44
SLIDE 44
slide-45
SLIDE 45

Evaluation of MissBiN

Interview study

A management school professor on exploring organizational communication networks
A computer scientist on investigating relationships of crimes and locations in Washington DC

Case study

The Sign of the Crescent [Hughes03]

41 fictional intelligence reports

Extracted person-location network

49 persons and 104 locations, with 328 links

Analysis task

Identify suspicious persons and activities from the reports

slide-46
SLIDE 46

Make sense of results

slide-47
SLIDE 47

Exploring large information space with recommendation

slide-48
SLIDE 48

Current interfaces: ranked lists

slide-49
SLIDE 49

Linear ranked list is not enough

A semantic map significantly improves users’ comprehension capability compared to a ranked list [Peltonen 2017]
Orienteering helps users understand and trust the answers using both prior and contextual information [Teevan 2004]
Stepping behavior can be supported by clustering the information or suggesting query refinements [Teevan 2004]

slide-50
SLIDE 50

Mike, the confused

Wants to solve an optimization problem at work
Just watched #19 – choosing step size and convergence criteria

Recommendations:
1. Sparse models selection
2. Dirichlet distribution
3. Gradient descent intuition
4. Hill climbing
5. …

slide-51
SLIDE 51
  • J. Zhao, C. Bhatt, M. Cooper, D. Shamma, Flexible Learning with Semantic Visual Exploration and Sequence-Based Recommendation of MOOC Videos, CHI’18

MOOCex

slide-52
SLIDE 52
slide-53
SLIDE 53

Current course
Current video
Neighboring videos (learning context)
Recommendation
Topics & keywords (sub-region)
Projection based on semantics and context

slide-54
SLIDE 54

Zhao et al, Flexible Learning with Semantic Visual Exploration and Sequence-Based Recommendation of MOOC Videos, CHI’18

slide-55
SLIDE 55

System architecture

slide-56
SLIDE 56

Recommendation engine

Content-based recommendation

Based on TF-IDF

Sequence-based re-ranking

Topic similarity score (TS)
Global sequence score (GS)
Local sequence score (LS)

Sub-sequence aggregation

Greedy search down the ranked list

Dataset

~4000 videos, ~350 hours running time, from Coursera, EdX, and Udacity
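
The content-based stage can be sketched with a small TF-IDF ranker. The slide only says the recommendation is "based on TF-IDF"; the whitespace tokenizer, the cosine similarity, the `recommend` function, and the four transcripts below are assumptions made for illustration, and the sequence-based re-ranking (TS, GS, LS) is not shown because its formulas are not given.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(docs, current, k=3):
    """Indices of the k documents most similar to docs[current]."""
    vectors = tfidf_vectors(docs)
    others = [i for i in range(len(docs)) if i != current]
    others.sort(key=lambda i: cosine(vectors[current], vectors[i]), reverse=True)
    return others[:k]

# Hypothetical video transcripts, for illustration only.
videos = [
    "gradient descent step size convergence",
    "gradient descent intuition for optimization",
    "dirichlet distribution and topic models",
    "hill climbing search for optimization",
]
top = recommend(videos, current=0, k=1)
```

A ranker like this only sees word overlap, which is why a second stage that accounts for course order and learning context is useful.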

slide-57
SLIDE 57

Visualization generation

Multidimensional scaling (MDS) in feature space

Rotate to comply with left-right browsing flow
Tune positions to avoid overlap
Merge consecutive videos

Hierarchical clustering

Context-based region division
Voronoi tessellation

Topical keywords extraction

Force-directed placement
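
The MDS step can be illustrated with classical (Torgerson) MDS, which turns pairwise distances into 2D positions. The slides do not say which MDS variant is used, so this is a sketch under that assumption; it uses pure-Python power iteration with deflation, which is adequate only for small inputs, and the four sample points are hypothetical.

```python
import math
import random

def _matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def classical_mds(dist, dims=2, iters=200, seed=0):
    """Embed n points in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (a Gram matrix of centered points).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the current largest eigenpair of B.
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, _matvec(b, v)))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Hypothetical feature-space points: the corners of a 4 x 2 rectangle.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
distances = [[math.dist(p, q) for q in points] for p in points]
layout = classical_mds(distances)
```

The recovered layout matches the input distances only up to rotation and reflection, which is why a separate rotation step is needed to align it with a left-right browsing flow.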

slide-58
SLIDE 58

Scenario I: “Did I miss anything?”

Mike: confused about this lecture; wants to check whether he missed anything.

slide-59
SLIDE 59

Scenario II: “I want to know more.”

Lisa: already knows about this topic; wants to broaden her horizons.

slide-60
SLIDE 60

Used by MOOC instructors

Semi-structured interviews with two university instructors

“I normally don’t look at what others teach, but the tool provides the awareness of related lectures, so I could borrow some materials to enhance my lecture, and avoid unnecessary duplication.”

“If you see one lecture is here [on the Exploration Canvas], then you go very far for the second lecture, and back here again for the third lecture, you should really think about reordering the content presented in the videos.”

slide-61
SLIDE 61

One more thing…

slide-62
SLIDE 62

Thank all my collaborators!

Available on https://www.jeffjianzhao.com/webapp/EgoLines/egolines.html

slide-63
SLIDE 63

Another thing…

slide-64
SLIDE 64

Welcome to apply to Waterloo HCI

http://hci.cs.uwaterloo.ca/

slide-65
SLIDE 65

Integrating Predictive Models with Interactive Visualization

Jian Zhao, Ph.D., Assistant Professor
Cheriton School of Computer Science
University of Waterloo
www.jeffjianzhao.com | jianzhao@uwaterloo.ca