Analytics Building Blocks, Duen Horng (Polo) Chau


slide-1
SLIDE 1

poloclub.github.io/#cse6242


CSE6242/CX4242: Data & Visual Analytics


Analytics Building Blocks

Duen Horng (Polo) Chau
Associate Professor, College of Computing
Associate Director, MS Analytics
Georgia Tech

Mahdi Roozbahani
Lecturer, Computational Science & Engineering, Georgia Tech
Founder of Filio, a visual asset management platform

Partly based on materials by Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos

slide-2
SLIDE 2

Collection, Cleaning, Integration, Visualization, Analysis, Presentation, Dissemination

slide-3
SLIDE 3

Building blocks. Not Rigid “Steps”.

Can skip some
Can go back (two-way street)

  • Data types inform visualization design
  • Data size informs choice of algorithms
  • Visualization motivates more data cleaning
  • Visualization challenges algorithm assumptions
    (e.g., user finds that results don’t make sense)


slide-4
SLIDE 4

How does “big data” affect the process?

(Hint: almost everything is harder!)

The Vs of big data (3Vs originally, then 7, now 42)

  • Volume: “billions”, “petabytes” are common
  • Velocity: think Twitter, fraud detection, etc.
  • Variety: text (webpages), video (YouTube)…
  • Veracity: uncertainty of data
  • Variability
  • Visualization
  • Value

http://www.ibmbigdatahub.com/infographic/four-vs-big-data
http://dataconomy.com/seven-vs-big-data/
https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx

slide-5
SLIDE 5

Two Example Projects from Polo Club

slide-6
SLIDE 6

Apolo Graph Exploration: Machine Learning + Visualization

Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011.

slide-7
SLIDE 7

slide-8
SLIDE 8

Beautiful Hairball Death Star Spaghetti

slide-9
SLIDE 9

Finding More Relevant Nodes

[Figure: citation network containing an HCI paper and a Data Mining paper]

slide-10
SLIDE 10


slide-11
SLIDE 11

Finding More Relevant Nodes

Apolo uses guilt-by-association (Belief Propagation)

[Figure: citation network containing an HCI paper and a Data Mining paper]

slide-12
SLIDE 12

Demo: Mapping the Sensemaking Literature

Nodes: 80k papers from Google Scholar (node size: number of citations)
Edges: 150k citations

slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15

Key Ideas (Recap)

  • Specify exemplars
  • Find other relevant nodes (BP)
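
The two key ideas above can be sketched in a few lines of Python. This is a minimal illustration of guilt-by-association via loopy belief propagation with two classes, not Apolo's actual implementation. The function name `belief_propagation`, the `homophily` parameter, the prior values, and the toy paper names are all assumptions made for this sketch.

```python
from collections import defaultdict


def belief_propagation(edges, priors, homophily=0.9, n_iters=20):
    """Loopy sum-product BP on an undirected graph with two classes (0 and 1).

    edges:     list of (u, v) pairs
    priors:    dict node -> [p(class 0), p(class 1)]; others default to [0.5, 0.5]
    homophily: assumed probability that two linked nodes share a class
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)
    prior = {n: priors.get(n, [0.5, 0.5]) for n in nodes}
    # psi[a][b]: compatibility of class a on one end of an edge with class b on the other
    psi = [[homophily, 1 - homophily], [1 - homophily, homophily]]
    # msg[(u, v)][b]: what u tells v about v belonging to class b
    msg = {(u, v): [1.0, 1.0] for u in nodes for v in neighbors[u]}

    for _ in range(n_iters):
        new_msg = {}
        for (u, v) in msg:
            # Combine u's prior with messages from every neighbor except v
            local = list(prior[u])
            for w in neighbors[u]:
                if w != v:
                    local = [local[a] * msg[(w, u)][a] for a in (0, 1)]
            out = [sum(local[a] * psi[a][b] for a in (0, 1)) for b in (0, 1)]
            total = sum(out)
            new_msg[(u, v)] = [x / total for x in out]
        msg = new_msg  # synchronous update

    beliefs = {}
    for n in nodes:
        b = list(prior[n])
        for w in neighbors[n]:
            b = [b[c] * msg[(w, n)][c] for c in (0, 1)]
        total = sum(b)
        beliefs[n] = [x / total for x in b]
    return beliefs


# Hypothetical toy citation chain between two user-chosen exemplars.
edges = [("hci_exemplar", "paper_a"), ("paper_a", "paper_b"),
         ("paper_b", "paper_c"), ("paper_c", "dm_exemplar")]
priors = {"hci_exemplar": [0.99, 0.01],  # class 0 = HCI
          "dm_exemplar": [0.01, 0.99]}   # class 1 = Data Mining
beliefs = belief_propagation(edges, priors)
for paper in ("paper_a", "paper_b", "paper_c"):
    print(paper, [round(p, 3) for p in beliefs[paper]])
```

In this toy chain, the paper next to the HCI exemplar leans toward HCI, the paper next to the Data Mining exemplar leans toward Data Mining, and the paper in the middle stays undecided. Ranking unlabeled nodes by such beliefs is one way to decide which nodes to show next.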

slide-16
SLIDE 16

What did Apolo go through?

Collection, Cleaning, Integration, Visualization, Analysis, Presentation, Dissemination

  • Collection: Scrape Google Scholar. No API. 😪
  • Analysis: Design inference algorithm (Which nodes to show next?)
  • Visualization: Interactive visualization you just saw
  • Presentation: Paper, talks, lectures
  • Dissemination: You will use a new Apolo prototype (called Argo)

slide-17
SLIDE 17

Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011.

slide-18
SLIDE 18

NetProbe: Fraud Detection in Online Auction

NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007

slide-19
SLIDE 19

NetProbe: The Problem

Find bad sellers (fraudsters) on eBay who don’t deliver their items

Buyer → $$$ → Seller

Non-delivery fraud is a common auction fraud

source: https://www.fbi.gov/contact-us/field-offices/portland/news/press-releases/fbi-tech-tuesday---building-a-digital-defense-against-auction-fraud

slide-20
SLIDE 20

slide-21
SLIDE 21

NetProbe: Key Ideas

  • Fraudsters fabricate their reputation by “trading” with their accomplices
  • Fake transactions form near bipartite cores
  • How to detect them?

slide-22
SLIDE 22

NetProbe: Key Ideas

Use Belief Propagation

F: Fraudster, A: Accomplice, H: Honest

Darker means more likely
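
To make the three-state idea concrete, here is a minimal sketch of a single belief propagation step for one node whose trading partners have no other links, so that one round of messages is exact. It is not NetProbe's implementation. The compatibility table `PSI`, the noise level `EPS`, the prior values, and the function names are assumptions chosen for illustration; the table only encodes the intuition that fraudsters trade mostly with accomplices and rarely with each other.

```python
STATES = ("fraud", "accomplice", "honest")
EPS = 0.05  # illustrative noise level (an assumption for this sketch)

# PSI[a][b]: how compatible a node in state a is with a neighbor in state b.
PSI = [
    [EPS, 1 - 2 * EPS, EPS],              # fraudsters trade mostly with accomplices
    [0.5, 2 * EPS, 0.5 - 2 * EPS],        # accomplices trade with fraudsters and honest users
    [EPS, (1 - EPS) / 2, (1 - EPS) / 2],  # honest users rarely trade with fraudsters
]


def message(sender_prior):
    """Message sent by a neighbor with no other links: its prior pushed through PSI."""
    out = [sum(sender_prior[a] * PSI[a][b] for a in range(3)) for b in range(3)]
    total = sum(out)
    return [x / total for x in out]


def belief(node_prior, neighbor_priors):
    """Belief of a node after multiplying its prior by all incoming messages."""
    b = list(node_prior)
    for p in neighbor_priors:
        m = message(p)
        b = [b[s] * m[s] for s in range(3)]
    total = sum(b)
    return dict(zip(STATES, (x / total for x in b)))


unknown = [1 / 3, 1 / 3, 1 / 3]      # no prior evidence about this user
known_fraudster = [0.9, 0.05, 0.05]  # hypothetical confirmed fraudster

# A user with no prior evidence who traded with two known fraudsters.
suspect = belief(unknown, [known_fraudster, known_fraudster])
print(max(suspect, key=suspect.get), suspect)
```

Under these assumed compatibilities, the user who traded with two known fraudsters is scored as a likely accomplice rather than a fraudster, because fraudsters seldom link to each other directly. This is the opposite of the "similar nodes link together" assumption used for finding relevant papers.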

slide-23
SLIDE 23

NetProbe: Main Results

slide-24
SLIDE 24

slide-25
SLIDE 25

slide-26
SLIDE 26

“Belgian Police”

slide-27
SLIDE 27

slide-28
SLIDE 28

What did NetProbe go through?

Collection, Cleaning, Integration, Visualization, Analysis, Presentation, Dissemination

  • Collection: Scraping (built a “scraper”/“crawler”)
  • Analysis: Design detection algorithm
  • Presentation: Paper, talks, lectures
  • Dissemination: Not released

slide-29
SLIDE 29

NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210.

slide-30
SLIDE 30

Homework 1

  • Simple “End-to-end” analysis
  • Collect data about LEGO via API
  • Store in SQLite database
  • Create graph from data
  • Analyze, using SQL queries (e.g., create graph’s degree distribution)

  • Visualize graph using ARGO Lite
  • Describe your discoveries
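
As a rough idea of what the "store in SQLite" and "analyze using SQL queries" steps might look like, here is a small sketch that computes a degree distribution from an edge table. The homework's real schema is not given here, so the table name `edges`, its columns `source` and `target`, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")

# Hypothetical undirected edges linking LEGO sets to the parts they contain.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [
        ("set_a", "brick_1"),
        ("set_a", "brick_2"),
        ("set_b", "brick_1"),
        ("set_c", "brick_1"),
    ],
)

# Degree of a node = number of edge endpoints it appears in.
# Degree distribution = how many nodes have each degree.
DEGREE_DISTRIBUTION = """
WITH endpoints AS (
    SELECT source AS node FROM edges
    UNION ALL
    SELECT target AS node FROM edges
),
degrees AS (
    SELECT node, COUNT(*) AS degree
    FROM endpoints
    GROUP BY node
)
SELECT degree, COUNT(*) AS num_nodes
FROM degrees
GROUP BY degree
ORDER BY degree
"""

rows = conn.execute(DEGREE_DISTRIBUTION).fetchall()
for degree, num_nodes in rows:
    print(f"degree {degree}: {num_nodes} node(s)")
conn.close()
```

Treating each edge as undirected (counting both endpoints) is a design choice of this sketch; a directed graph would need separate in-degree and out-degree queries.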
