Data & Visual Analytics: Mahdi Roozbahani, Lecturer

  1. CX4242: Data & Visual Analytics. Mahdi Roozbahani, Lecturer, Computational Science and Engineering, Georgia Tech

  2. Assignments Overview (Tentative and subject to change) CX 4242

  3. Assignment 1
     Platforms, Languages & Technologies: Python, Gephi, SQLite, D3, OpenRefine
     Questions:
     • Q1: Collecting and visualizing data (Python & Gephi)
     • Q2: Analysing data using SQLite
     • Q3: D3 Warmup
     • Q4: Analysing data through OpenRefine

  4. Assignment 2
     Platforms, Languages & Technologies: D3, Tableau
     Questions:
     • Q1: Designing a good table and visualizing data with Tableau
     • Q2: Force directed graph using D3
     • Q3: Scatter plots using D3
     • Q4: Heatmap using D3
     • Q5: Interactive visualization using D3
     • Q6: Choropleth map using D3
     • Q7: Pros and cons of various visualization tools

  5. Assignment 3
     Platforms, Languages & Technologies: Java, Hadoop, Spark, Pig, Azure
     Questions:
     • Q1: Analyzing a graph with Hadoop/Java
     • Q2: Analyzing a graph with Spark/Scala on Databricks
     • Q3: Analyzing data with Pig on AWS
     • Q4: Analyzing a graph using Hadoop on Microsoft Azure
     • Q5: Regression using Azure ML Studio

  6. Assignment 4
     Platforms, Languages & Technologies: PyPy, PageRank, Random Forest, scikit-learn
     Questions:
     • Q1: Scalable single-machine PageRank
     • Q2: Implementing a random forest classifier
     • Q3: Using scikit-learn for running various classifiers
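For readers unfamiliar with the PageRank named in Assignment 4, the following is a minimal sketch of the standard power-iteration formulation on an in-memory edge list. It is not the assignment's required solution: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy edges are all assumptions chosen for illustration.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) pairs.

    Nodes are integers 0..num_nodes-1. All parameter defaults are
    illustrative assumptions, not values taken from the course.
    """
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Every node receives the same "teleport" share.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        # Each edge passes an equal share of the source's rank to the target.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        share = damping * dangling / num_nodes
        rank = [r + share for r in new_rank]
    return rank


# Hypothetical toy graph: a 3-cycle (0 -> 1 -> 2 -> 0) plus node 3 pointing at 0.
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(toy_edges, num_nodes=4)
print(ranks)
```

Each iteration makes a single pass over the edge list, so memory use grows with the number of nodes rather than with a dense adjacency matrix, which is one common reason this formulation is chosen for single-machine settings.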

  7. Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination

  8. Building blocks, not rigid “steps” (Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination)
     • Can skip some
     • Can go back (two-way street)
     • Data types inform visualization design
     • Data size informs choice of algorithms
     • Visualization motivates more data cleaning
     • Visualization challenges algorithm assumptions, e.g., user finds that results don’t make sense

  9. How does “big data” affect the process? (Hint: almost everything is harder!)
     The Vs of big data (3Vs originally, then 7, now 42):
     • Volume: “billions”, “petabytes” are common
     • Velocity: think Twitter, fraud detection, etc.
     • Variety: text (webpages), video (YouTube)…
     • Veracity: uncertainty of data
     • Variability
     • Visualization
     • Value
     Sources:
     http://www.ibmbigdatahub.com/infographic/four-vs-big-data
     http://dataconomy.com/seven-vs-big-data/
     https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx

  10. Three Example Projects from Polo and Mahdi Research group

  11. Apolo Graph Exploration: Machine Learning + Visualization. Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011.

  12. Beautiful Hairball Death Star Spaghetti

  13. Finding More Relevant Nodes. Figure labels: HCI Paper, Data Mining Paper, Citation network. Apolo uses guilt-by-association (Belief Propagation).

  14. Demo: Mapping the Sensemaking Literature
      • Nodes: 80k papers from Google Scholar (node size: number of citations)
      • Edges: 150k citations

  15. Key Ideas (Recap)
      • Specify exemplars
      • Find other relevant nodes (BP)

  16. What did Apolo go through?
      • Collection: Scrape Google Scholar. No API. 😪
      • Cleaning
      • Integration
      • Analysis: Design inference algorithm (Which nodes to show next?)
      • Visualization: Interactive visualization you just saw
      • Presentation: Paper, talks, lectures
      • Dissemination: You will a new Apolo prototype (called Argo)

  17. Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011.

  18. NetProbe: Fraud Detection in Online Auction. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007.

  19. NetProbe: The Problem
      • Find bad sellers (fraudsters) on eBay who don’t deliver their items
      • Diagram labels: $$$, Buyer, Seller
      • Non-delivery fraud is a common auction fraud
      Source: https://www.fbi.gov/contact-us/field-offices/portland/news/press-releases/fbi-tech-tuesday---building-a-digital-defense-against-auction-fraud

  21. NetProbe: Key Ideas
      • Fraudsters fabricate their reputation by “trading” with their accomplices
      • Fake transactions form near bipartite cores
      • How to detect them?

  22. NetProbe: Key Ideas
      • Use Belief Propagation
      • Node states: F = Fraudster, A = Accomplice, H = Honest
      • Darker means more likely
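The belief propagation idea on this slide can be sketched as a small loopy sum-product routine over three node states (Fraudster, Accomplice, Honest). This is only an illustrative sketch, not the NetProbe implementation: the function name `loopy_bp`, the compatibility values in `COMPAT`, the priors, and the toy graph are all assumptions. The matrix merely encodes the intuition from the slides that fraudsters mostly link to accomplices and rarely to each other.

```python
def loopy_bp(edges, priors, compat, iterations=20):
    """Loopy sum-product belief propagation on an undirected graph.

    edges:  list of (u, v) pairs, each undirected edge listed once
    priors: dict mapping node -> list of k prior probabilities
    compat: k x k matrix; compat[i][j] scores a node in state i
            having a neighbor in state j
    Returns a dict mapping node -> list of k normalized beliefs.
    """
    k = len(compat)
    neighbors = {n: [] for n in priors}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # messages[(u, v)][j]: u's current opinion that v is in state j
    messages = {(u, v): [1.0 / k] * k for u in neighbors for v in neighbors[u]}
    for _ in range(iterations):
        new_messages = {}
        for (u, v) in messages:
            msg = []
            for j in range(k):
                total = 0.0
                for i in range(k):
                    # Combine u's prior with what u hears from everyone except v.
                    incoming = 1.0
                    for w in neighbors[u]:
                        if w != v:
                            incoming *= messages[(w, u)][i]
                    total += priors[u][i] * compat[i][j] * incoming
                msg.append(total)
            norm = sum(msg)
            new_messages[(u, v)] = [m / norm for m in msg]
        messages = new_messages

    beliefs = {}
    for n in neighbors:
        belief = list(priors[n])
        for w in neighbors[n]:
            belief = [belief[i] * messages[(w, n)][i] for i in range(k)]
        norm = sum(belief)
        beliefs[n] = [b / norm for b in belief]
    return beliefs


# State order: 0 = Fraudster, 1 = Accomplice, 2 = Honest.
# Illustrative values only (an assumption, not taken from the slides).
COMPAT = [
    [0.05, 0.90, 0.05],    # a fraudster's neighbor is most likely an accomplice
    [0.50, 0.10, 0.40],    # an accomplice trades with fraudsters and honest users
    [0.05, 0.475, 0.475],  # an honest user rarely trades with a fraudster
]

# Hypothetical toy graph: f1 and f2 both trade with a1 and a2 (a bipartite core),
# and the accomplices also trade with h1.
toy_edges = [("f1", "a1"), ("f1", "a2"), ("f2", "a1"), ("f2", "a2"),
             ("a1", "h1"), ("a2", "h1")]
uniform = [1 / 3, 1 / 3, 1 / 3]
toy_priors = {"f1": [0.9, 0.05, 0.05], "f2": uniform,
              "a1": uniform, "a2": uniform, "h1": uniform}

for node, belief in loopy_bp(toy_edges, toy_priors, COMPAT).items():
    print(node, [round(b, 3) for b in belief])
```

Swapping `COMPAT` for a matrix that favors equal states on both ends of an edge would give the guilt-by-association behavior described for Apolo, since neighbors would then pull each other toward the same label.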

  23. NetProbe: Main Results

  24. “Belgian Police”

  26. What did NetProbe go through?
      • Collection: Scraping (built a “scraper”/“crawler”)
      • Cleaning
      • Integration
      • Analysis: Design detection algorithm
      • Visualization
      • Presentation: Paper, talks, lectures
      • Dissemination: Not released

  27. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210.

  28. FONT TELLER

  29. Homework 1 (out next week; tasks subject to change)
      • Simple “end-to-end” analysis
      • Collect data using API
      • Store in SQLite database
      • Create graph from data
      • Analyze, using SQL queries (e.g., create graph’s degree distribution)
      • Visualize graph using Gephi
      • Describe your discoveries
      Pipeline stages shown alongside: Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination
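The "store in SQLite, then analyze with SQL" steps of Homework 1 might look roughly like the sketch below, which computes a graph's degree distribution with a single query. The table name `edges`, its columns, and the toy edge list are assumptions made for illustration; the actual homework schema and data may differ.

```python
import sqlite3

# Hypothetical toy edge list; in the homework the data would come from an API.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", toy_edges)

# Treat the graph as undirected: every edge counts toward both endpoints.
# First compute each node's degree, then count how many nodes share each degree.
query = """
WITH endpoints AS (
    SELECT source AS node FROM edges
    UNION ALL
    SELECT target AS node FROM edges
),
degrees AS (
    SELECT node, COUNT(*) AS degree FROM endpoints GROUP BY node
)
SELECT degree, COUNT(*) AS num_nodes
FROM degrees
GROUP BY degree
ORDER BY degree
"""
degree_distribution = conn.execute(query).fetchall()
conn.close()

for degree, num_nodes in degree_distribution:
    print(f"degree {degree}: {num_nodes} node(s)")
```

The resulting (degree, count) pairs could then be exported, for example as CSV, for plotting or for inspection alongside the Gephi visualization.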
