Welcome to DS504/CS586: Big Data Analytics (Review)



SLIDE 1

Pick up a handout on the front table
SLIDE 2

Welcome to DS504/CS586: Big Data Analytics (Review)

Prof. Yanhua Li

Time: 6:00pm–8:50pm R
Location: AK232
Fall 2016

SLIDE 3

Today

  • 1. Review
– Key topics and techniques discussed in the semester
  • 2. Future opportunities
– Big data analytics
– Urban Computing
  • 10-minute break
  • 3. Team 1 presentation
  • 4. Course evaluation
  • 5. Group discussion for final projects
SLIDE 4

Introduction

What is “Big Data”?

SLIDE 5

Big Data Analytics: techniques and tools for managing, analyzing, and extracting knowledge from “big data”

SLIDE 6

CS586/DS504-2016Fall

  • 1. Data Acquisition & Measurement: representative data collection (sampling techniques)
  • 2. Data Preprocessing/Cleaning: error correction, map-matching
  • 3. Data Management: indexing, query processing
  • 4. Big Data Mining: graph mining, data clustering, recommender systems, deep learning
  • 5. Applications: urban computing, social network analysis, networking

Sampling and index; Clustering

  • 4. K-means, DBSCAN
  • 4. BFR, DENCLUE
  • 4. Trajectory Clustering
  • 5. Urban: Bike sharing
  • 1. Graph Mining
  • 3. Index, Query
  • 4. Data Collection
  • 2. Map-Matching
  • 4. Recommender Systems
  • 4. Deep Learning (Guest)

More techniques
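This slide lists indexing and query processing under data management, and the roadmap names B-trees, quad-trees, and R-trees, but none of them is shown in action. As a rough illustration, the Python sketch below is a minimal point quad-tree with a rectangular range query. The class name `QuadTree`, the `capacity` and `max_depth` parameters, and the half-open box convention are our own assumptions, not code from the course.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadTree:
    """Point quad-tree over the half-open box [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4    # max points a leaf holds before it splits (assumed value)
    depth: int = 0
    max_depth: int = 16  # stop splitting here so duplicate points cannot recurse forever
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def insert(self, p: Point) -> bool:
        """Insert p; return False if p lies outside this node's box."""
        if not self.contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        return self._insert_into_children(p)

    def _insert_into_children(self, p: Point) -> bool:
        # The four quadrants partition this box, so exactly one child accepts p.
        for child in self.children:
            if child.insert(p):
                return True
        return False

    def _split(self) -> None:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        boxes = [
            (self.x0, self.y0, mx, my), (mx, self.y0, self.x1, my),
            (self.x0, my, mx, self.y1), (mx, my, self.x1, self.y1),
        ]
        self.children = [
            QuadTree(*box, capacity=self.capacity, depth=self.depth + 1,
                     max_depth=self.max_depth)
            for box in boxes
        ]
        # Push the points this leaf held down into the new quadrants.
        old, self.points = self.points, []
        for q in old:
            self._insert_into_children(q)

    def range_query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        """Return all stored points inside the half-open query box [qx0, qx1) x [qy0, qy1)."""
        # Prune whole subtrees whose box does not overlap the query box.
        if qx1 <= self.x0 or qx0 >= self.x1 or qy1 <= self.y0 or qy0 >= self.y1:
            return []
        hits = [p for p in self.points if qx0 <= p[0] < qx1 and qy0 <= p[1] < qy1]
        for child in self.children or []:
            hits.extend(child.range_query(qx0, qy0, qx1, qy1))
        return hits
```

For example, after inserting points into `QuadTree(0, 0, 100, 100, capacity=2)`, a call to `range_query(0, 0, 50, 50)` only descends into the lower-left quadrant; the other three subtrees are pruned without being visited.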

SLIDE 7

Big Data Mining Topics

  • 1. Graph Mining: graph sampling, node importance ranking, topic-sensitive PageRank, Facebook/social graph estimation, social influence
  • 2. Clustering: hierarchical, K-means, BFR, DBSCAN, DENCLUE, trajectory clustering
  • 3. Recommender Systems: content-based; collaborative filtering (user-user and item-item); location-based recommender systems; personalized geo-social recommendation
  • 4. Deep Learning: deep neural networks, AlphaGo
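Node importance ranking and topic-sensitive PageRank appear above only by name. A minimal power-iteration sketch in Python follows, assuming the graph is a dict mapping each node to its list of out-neighbors. The function name `pagerank`, the `teleport` argument (uniform jumps for plain PageRank, jumps restricted to topic pages for the topic-sensitive variant), and the 0.85 damping default are illustrative choices, not the course's implementation.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # node -> out-neighbors; every neighbor must also be a key


def pagerank(graph: Graph, damping: float = 0.85,
             teleport: Optional[Dict[str, float]] = None,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank.

    teleport=None: ordinary PageRank, random jumps land on any node uniformly.
    teleport={...}: topic-sensitive variant, random jumps land only on the
    given (topic) nodes, in proportion to the supplied weights.
    """
    nodes = list(graph)
    n = len(nodes)
    if teleport is None:
        v = {u: 1.0 / n for u in nodes}
    else:
        total = sum(teleport.values())
        v = {u: teleport.get(u, 0.0) / total for u in nodes}

    r = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is redistributed via v,
        # together with the (1 - damping) random-jump mass.
        dangling = sum(r[u] for u in nodes if not graph[u])
        new = {u: (1.0 - damping + damping * dangling) * v[u] for u in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = damping * r[u] / len(out)
                for w in out:
                    new[w] += share
        delta = sum(abs(new[u] - r[u]) for u in nodes)
        r = new
        if delta < tol:
            break
    return r
```

For example, `pagerank(g, teleport={"a": 1.0})` sends every random jump to node `a`, so nodes reachable from `a` in a few out-links score higher than under plain PageRank.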

SLIDE 8

Roadmap

  • 1. Sampling & Indexing
– Random prefix/region/zoom-in region sampling
– Index structures: B-tree, quad-tree, R-tree, etc.
  • 2. Clustering
– Hierarchical
– K-means, DBSCAN
  • 3. Recommender systems, deep learning, map-matching, etc.
  • 4. Applications
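K-means is named in the roadmap but not spelled out. Below is a small sketch of Lloyd's algorithm in pure Python. The random-point initialization, the `seed` parameter, and the stopping rule (stop when no assignment changes) are assumptions made for this illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Initialization picks k input points uniformly at random (an assumption;
    k-means++ seeding is a common alternative). Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

On two well-separated blobs, such as three points near (0, 0) and three near (10, 10), `kmeans(points, 2)` places each blob in its own cluster. In general the result depends on initialization, which is why practical implementations use several restarts or k-means++ seeding.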
SLIDE 9

Sampling Big Data

1.1 Random sampling (uniform & independent)
– Vertex sampling
– Edge sampling

1.2 Crawling
– BFS sampling
– Random walk sampling
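The slide separates uniform, independent sampling (vertex and edge sampling) from crawling (BFS and random walks). The Python sketch below illustrates all four on an undirected graph stored as an adjacency dict with integer node IDs; every function name is hypothetical. It also includes 1/degree re-weighting for random-walk samples, because a simple random walk over-visits high-degree nodes.

```python
import random
from collections import deque
from typing import Callable, Dict, List, Tuple

Graph = Dict[int, List[int]]  # undirected: each edge is listed under both endpoints


def uniform_vertex_sample(graph: Graph, k: int, rng: random.Random) -> List[int]:
    """Vertex sampling: k distinct nodes chosen uniformly at random."""
    return rng.sample(list(graph), k)


def uniform_edge_sample(graph: Graph, k: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Edge sampling: k distinct undirected edges chosen uniformly at random."""
    edges = sorted((u, w) for u in graph for w in graph[u] if u < w)
    return rng.sample(edges, k)


def bfs_sample(graph: Graph, start: int, budget: int) -> List[int]:
    """Crawling with BFS: collect nodes in breadth-first order until `budget` is reached."""
    order, seen, queue = [start], {start}, deque([start])
    while queue and len(order) < budget:
        u = queue.popleft()
        for w in graph[u]:
            if w not in seen and len(order) < budget:
                seen.add(w)
                order.append(w)
                queue.append(w)
    return order


def random_walk_sample(graph: Graph, start: int, steps: int, rng: random.Random) -> List[int]:
    """Crawling with a simple random walk: move to a uniformly random neighbor each step."""
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def reweighted_mean(graph: Graph, walk: List[int], f: Callable[[int], float]) -> float:
    """Estimate the average of f over all nodes from a random-walk sample.

    On a connected undirected graph, a simple random walk visits node u with
    long-run frequency proportional to deg(u), so each visit is weighted by
    1/deg(u) to undo the bias toward high-degree nodes.
    """
    num = sum(f(u) / len(graph[u]) for u in walk)
    den = sum(1.0 / len(graph[u]) for u in walk)
    return num / den
```

On a small hub-and-spoke graph, the plain average over a long walk is pulled toward the hub, while `reweighted_mean` approaches the true average over nodes.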

SLIDE 10

Class Outcomes

SLIDE 11

What is DS504/CS586 about?

We’ll learn about:
– Advanced techniques for big data analytics
  • Large-scale data sampling and estimation
  • Data cleaning
  • Graph data mining
  • Data management, clustering, etc.
– Applications of big data analytics
  • Urban computing
  • Social network analysis
  • Recommender systems, etc.

Learning outcomes:
– Understand and explain the challenges and advances in the state of the art in big data analytics.
– Design, develop, and fully execute a big data analytics project.
– Communicate ideas effectively to a technical audience through presentations and written documents.

SLIDE 12

CS586/DS504-2016Fall

  • 1. Data Acquisition & Measurement: representative data collection (sampling techniques)
  • 2. Data Preprocessing/Cleaning: error correction, map-matching
  • 3. Data Management: indexing, query processing
  • 4. Big Data Mining: graph mining, data clustering, recommender systems, deep learning
  • 5. Applications: urban computing, social network analysis, networking

Sampling and index; Clustering

  • 4. K-means, DBSCAN
  • 4. BFR, DENCLUE
  • 4. Trajectory Clustering
  • 1. Graph Mining
  • 3. Index, Query
  • 4. Data Collection
  • 2. Map-Matching
  • 4. Recommender Systems
  • 4. Deep Learning (Guest)

More techniques
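DBSCAN appears on this slide by name only. The brute-force Python sketch below shows the core-point expansion it is built on. The function names, the `NOISE = -1` label, and the convention that a point counts toward its own eps-neighborhood are assumptions for this illustration; the O(n^2) neighbor scan is the part a spatial index, such as the quad-trees or R-trees named in the roadmap, would replace.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def _region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if its eps-neighborhood has at least min_pts
    points; clusters grow by repeatedly expanding from core points.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = _region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = deque(neighbors)
        while seeds:
            j = seeds.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = _region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core, so keep expanding
    return labels
```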

SLIDE 13

Project 1 (Single Data Source)

  • T1: Allstate Claim Prediction Challenge
  • T2: Predicting YouTube 3D Videos Trends
  • T3: Sampling Method for Sum Aggregation of Points of Interest on a Map
  • T4: Mining of Stack Overflow reviews for insights
  • T5: Measuring restaurant diversity index for different cities

  • T6: GitHub – Sizing up online social networks
SLIDE 14

Project 2 (Heterogeneous Data)

  • T1: Restaurants Location Recommendation
  • T2: Online learning performance vs. Offline Geographic Information
  • T3: Airbnb user behavior prediction
  • T4: Community detection in large networks
  • T5: Demand-Supply analysis on Regional Restaurant Distribution
  • T6: Social Network Marketing through Influence Prediction

  • Real application problems
  • Data collection/processing/management/mining/evaluation/visualization

SLIDE 15

Logistics

Workload

  • Focus more on critical thinking, problem solving, and “heads-on/hands-on” experiences!
  • Understand, formulate, and solve problems
  • Read and critique research papers
  • Two course projects
  • Oral presentation
  • Teamwork
  • Coding

SLIDE 16
Workload and Grading

  • Grading
– Projects (40%)
  • Project 1 (10%)
  • Project 2 (30%)
– Final reports in the discussion forum (by 11:59pm 12/13)
– Self- and peer-evaluation form for Project 2 (by 11:59pm 12/13)
– Written work (30%):
  • Critiques + project reports (20%)
  • Quizzes (10%, 5% each)
– Oral work (30%):
  • Presentation

SLIDE 17

Problems

[Figure: the data scientist connects problems to data models and algorithms; panels show matrices indexed by categories, regions, and time slots, a time-series model over steps t-1, t, and t+1, and an artificial neural network (ANN) with its weights and biases.]

SLIDE 18

Next Session: Final Project Presentation

  • 12/15 R
  • 22 minutes per team (including Q&A)
  • Team 1, Team 2, Team 3, Team 4, Team 5, Team 6
  • Snacks and drinks will be provided.

SLIDE 19

Want to learn more? Future Opportunities.

SLIDE 20

Spring 2017

  • DS595/CS525 Special Topics in DS/CS: Urban Computing, Applications and Methodologies
SLIDE 21

Urban Computing Research Group at WPI

  • Hub-and-Spoke Urban Transportation
SLIDE 22

Urban Computing Research Group at WPI

  • Most influential k-location mining

[Figure panels: (a) EV charging station placement; (b) advertisement placement; (c) observation station placement.]
SLIDE 23

Urban Computing Research Group at WPI

  • Human-in-Loop Urban Computing
SLIDE 24

Research opportunities are available in my group. Contact: yli15@wpi.edu. Website: http://wpi.edu/~yli15/index.html