
DS504/CS586: Big Data Analytics: Introduction & Logistics



  1. Welcome to DS504/CS586: Big Data Analytics (Introduction & Logistics). Prof. Yanhua Li. Time: Thursday, 6:00pm–8:50pm. Location: KH 116. Fall 2017.

  2. Who am I? Yanhua Li, PhD, Assistant Professor, Computer Science & Data Science. PhD, Computer Science, University of Minnesota, 2013. PhD, Electrical Engineering, BUPT, 2009. Research interests: big data analytics, smart cities, measurement, spatio-temporal data mining. Industrial experience: Bell Labs, Microsoft Research, Huawei Research Labs.

  3. What is DS504/CS586 about? • A second-level DS/CS course, primarily for graduate students: CS/DS PhD students in big data analytics and related areas first, then other PhD or MS students with experience in databases and/or data mining, or equivalent knowledge. • Sufficient programming experience is expected, so that you are comfortable undertaking a course project.

  4. Introduction: What is “Big Data”?

  5. Big Data – What Is It? • A “big” buzzword… • No single standard definition… • Ask 1,000 people and you will get 1,000 “definitions”… “Big Data” is data whose scale, diversity, complexity, and/or quality require new architectures, techniques, algorithms, analytics, and interfaces to manage it and to extract value and hidden knowledge from it…

  6. Why Now? Big Data and Big Challenges

  7. Big Data • Volume • Variety • Velocity • Veracity

  8–11. Big Data • Volume • Variety • Velocity (image credit: http://www-01.ibm.com/software/data/bigdata/images/4-Vs-of-big-data.jpg)

  12. The 4 Vs

  13. The Model Has Changed… The old model of generating and consuming data has changed. Old model: a few privileged companies generate and “own” data, and everyone else consumes it (in controlled packages).

  14. The Model Has Changed… • New model: Producers: everyone (men, women, and children) and devices. Consumers: professionals, businesses, scientists, and us. • Everyone wants a piece of this pie…

  15. What Sectors Can Benefit? • Businesses • Transportation • Science & Engineering • Governments • Energy • Healthcare • Education • Entertainment. Goal: use data to improve people’s quality of life.

  16. Big Data Analytics: techniques and tools for managing, analyzing, and extracting knowledge from “big data”.

  17. Roadmap: (1) Introduction to big data analytics, followed by a 5-minute break; (2) Logistics, followed by a 10-minute break to talk with other students; (3) Application stories. Also: self-introductions (and group forming); hand in your survey; you will be emailed about permission (or not); you will need to find your team and let me know.

  18. Done with the high-level introduction; now on to application stories.

  19. Big Challenges in Big Cities

  20. Big Data in Cities

  21. Urban Computing: tackle the big challenges in big cities using big data! [Figure: the urban computing framework, bottom to top] Urban sensing & data acquisition (participatory sensing, crowd sensing, mobile sensing) over data sources: human mobility, meteorology, social media, road networks, air quality, energy, POIs, traffic. Urban data management (spatio-temporal index, streaming, trajectory, and graph data management,...). Urban data analytics (data mining, machine learning, visualization). Service providing (improve urban planning, ease traffic congestion, save energy, reduce air pollution,...). Goal: a win-win-win for people, the environment, and cities (“Cities OS”). Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.

  22. [Figure: the urban computing framework (urban sensing & data acquisition → urban data management → urban data analytics → service providing), highlighting the challenges at the sensing layer] • Data sparsity and missing data • Skewed sample distribution • Limited resources. Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.

  23. Urban Sensing: from a sample of data → an entire dataset • Data sparsity and missing data • Biased distribution [Figure: taxi flow vs. entire traffic flow; air quality monitoring stations S1–S22] Zheng, Y., et al. U-Air: When Urban Air Quality Inference Meets Big Data. KDD 2013. Inferring Gas Consumption and Pollution Emission of Vehicles throughout a City. KDD 2014.
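Slide 23 frames urban sensing as inferring an entire dataset from a sparse sample. U-Air itself uses a more sophisticated semi-supervised learning approach; as a much simpler, swapped-in baseline, inverse distance weighting (IDW) estimates a reading at an unmonitored location as a distance-weighted average of station readings. The station coordinates, PM2.5 values, and the `power` parameter below are hypothetical.

```python
import math


def idw_estimate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (a simple baseline).

    stations: list of ((x, y), value) pairs from monitoring stations.
    target: (x, y) location that has no sensor.
    power: how quickly a station's influence decays with distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station: use its reading
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical PM2.5 readings; coordinates in km
stations = [((0.0, 0.0), 80.0), ((10.0, 0.0), 120.0), ((0.0, 10.0), 60.0)]
print(round(idw_estimate(stations, (5.0, 0.0)), 1))  # → 96.4
```

IDW ignores the traffic, POI, and meteorological signals that the slides emphasize, so it is mainly useful as a point of comparison for richer inference models.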

  24. Urban Sensing with limited resources (budget, labor, land…) • Static sensing: where to deploy sensors to maximize the gain? • Crowdsensing: how to arrange the incentives dynamically? [Figure: monitoring stations S1–S22 on a city map] Suggesting locations for monitoring stations, KDD 2015
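The static-sensing question on slide 24 (where to deploy a limited number of sensors to maximize the gain) can be sketched with the standard greedy maximum-coverage heuristic. This is a swapped-in illustration, not the method of the KDD 2015 paper cited on the slide; “gain” is assumed here to mean the number of hypothetical grid cells a candidate site can monitor, and the site names and cell IDs are made up.

```python
def greedy_placement(candidates, budget):
    """Greedy maximum coverage: repeatedly pick the site that covers the
    most cells not yet covered, until the budget is spent.

    candidates: dict mapping site name -> set of grid cells it can monitor.
    budget: maximum number of sensors to deploy.
    Returns (chosen sites in pick order, set of covered cells).
    """
    chosen = []
    covered = set()
    remaining = dict(candidates)
    for _ in range(budget):
        if not remaining:
            break
        # Marginal gain = cells this site would add to the current coverage
        site = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[site] - covered:
            break  # no remaining site adds new coverage
        chosen.append(site)
        covered |= remaining.pop(site)
    return chosen, covered


# Hypothetical candidate sites and the grid cells each one would cover
candidates = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
sites, cells = greedy_placement(candidates, budget=2)
print(sites)  # → ['C', 'A']
```

Greedy selection is a common choice here because coverage is a submodular objective, for which the greedy pick is known to be within a constant factor (1 − 1/e) of optimal.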

  25. Improving Medical Emergency Services Using Big Data [Figure: dispatching center, ambulance stations, patients, and hospital] Save 30+% time! • Select locations for ambulance stations • Dynamic ambulance allocation. Yilun Wang, Yu Zheng, et al. Travel Time Estimation of a Path Using Sparse Trajectories. KDD 2014. Location Selection for Ambulance Stations: A Data-Driven Approach. ACM SIGSPATIAL 2015.

  26. [Figure: the urban computing framework (urban sensing & data acquisition → urban data management → urban data analytics → service providing), highlighting the challenges at the data management layer] • Management in spatio-temporal spaces • Multi-modality data • Dynamic data with high velocity and volume. Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.

  27. Urban Data Management • Managing multi-modality data: categorical and numeric data; different scales, densities, updating frequencies, and spatio-temporal (ST) properties. • Dynamic and big volume: group query strategy; computing in parallel. Data types: point-based data include POI distributions (spatial static), weather/AQI station data (spatio-temporal, static locations with temporally dynamic readings), and crowdsourcing data (spatio-temporal dynamic); network-based data include road/transportation networks (spatial static), road traffic data (spatio-temporal, static locations with temporally dynamic readings), and trajectory data (spatio-temporal dynamic). Yu Zheng. Trajectory Data Mining: An Overview. ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2015.
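Slide 27 lists group queries over dynamic, high-volume spatio-temporal data among the data management challenges. A minimal sketch, assuming point records (such as taxi GPS fixes) with planar coordinates and a timestamp in seconds, buckets each record by grid cell and time slot so that a range query scans only the buckets overlapping the query box. The class name, cell size, and slot length are hypothetical choices for illustration, not a specific system's index.

```python
from collections import defaultdict


class GridSTIndex:
    """Bucket (x, y, t, payload) records by spatial grid cell and time slot."""

    def __init__(self, cell_size, slot_length):
        self.cell_size = cell_size
        self.slot_length = slot_length
        self.buckets = defaultdict(list)

    def _key(self, x, y, t):
        # Floor division maps a point to its (cell_x, cell_y, time_slot) bucket
        return (int(x // self.cell_size),
                int(y // self.cell_size),
                int(t // self.slot_length))

    def insert(self, x, y, t, payload):
        self.buckets[self._key(x, y, t)].append((x, y, t, payload))

    def range_query(self, x0, x1, y0, y1, t0, t1):
        """Return payloads with x0<=x<=x1, y0<=y<=y1, and t0<=t<=t1."""
        cx0, cy0, s0 = self._key(x0, y0, t0)
        cx1, cy1, s1 = self._key(x1, y1, t1)
        results = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for s in range(s0, s1 + 1):
                    # .get avoids creating empty buckets in the defaultdict
                    for x, y, t, payload in self.buckets.get((cx, cy, s), []):
                        # Boundary buckets may hold points outside the box
                        if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1:
                            results.append(payload)
        return results


idx = GridSTIndex(cell_size=1.0, slot_length=3600)
idx.insert(0.5, 0.5, 100, "taxi-1")
idx.insert(2.5, 0.5, 200, "taxi-2")
idx.insert(0.7, 0.2, 7200, "taxi-3")
print(idx.range_query(0, 1, 0, 1, 0, 3599))  # → ['taxi-1']
```

Because every bucket is independent, the per-bucket scans are also a natural unit of work for the parallel computing the slide mentions.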

  28. [Figure: the urban computing framework (urban sensing & data acquisition → urban data management → urban data analytics → service providing), highlighting the shifts at the data analytics layer] • Texts and images → spatial and spatio-temporal data • A single data source → data across different domains • Separate data mining algorithms → machine learning + data management. Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology.

  29. Data Integration vs. Knowledge Fusion (A) Paradigm of conventional data fusion: datasets A, B, and C pass through schema mapping, duplicate object detection, and data merge into a single domain S. (B) Cross-domain data fusion: knowledge is extracted separately from dataset A (domain A), dataset B (domain B), and dataset C (domain C), and the extracted knowledge is then combined by knowledge fusion into a latent object. Yu Zheng. Methodologies for Cross-Domain Data Fusion: An Overview. IEEE Transactions on Big Data, 1(1), 2015.
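The conventional data-fusion paradigm on slide 29 chains schema mapping, duplicate object detection, and data merge. The sketch below strings those three steps together for hypothetical POI records. It assumes duplicates share an exact key after schema mapping, which is a deliberate simplification (real duplicate detection usually needs fuzzy matching), and it does not attempt the cross-domain knowledge fusion shown in the slide's second paradigm. The function names and field names are illustrative only.

```python
def map_schema(records, mapping):
    """Schema mapping: rename each record's fields into the target schema."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


def merge_datasets(sources, key):
    """Schema-map each source, detect duplicates by `key`, and merge them.

    sources: list of (records, mapping) pairs, one per dataset.
    Earlier sources take precedence; later ones only fill missing fields.
    """
    merged = {}
    for records, mapping in sources:
        for rec in map_schema(records, mapping):
            k = rec[key]
            if k in merged:  # duplicate detected: same real-world object
                for field, value in rec.items():
                    merged[k].setdefault(field, value)
            else:
                merged[k] = dict(rec)
    return list(merged.values())


# Hypothetical datasets that name the identifier field differently
dataset_a = [{"poi_id": "p1", "name": "Cafe"}]
dataset_b = [{"ID": "p1", "category": "food"}, {"ID": "p2", "category": "park"}]
sources = [(dataset_a, {"poi_id": "id"}), (dataset_b, {"ID": "id"})]
print(merge_datasets(sources, key="id"))
# → [{'id': 'p1', 'name': 'Cafe', 'category': 'food'}, {'id': 'p2', 'category': 'park'}]
```

The result lives in one unified schema (the slide's domain S), which is exactly what cross-domain knowledge fusion avoids requiring when datasets come from very different domains.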

  30. Multi-View-Based Learning

  31. Urban Computing for Urban Planning: Best Paper Nominee Award at UbiComp 2011; the most cited paper.

  32. City-Wide Traffic Modeling • Partition a city into regions using major roads • Regions are the root causes of the problem. Yu Zheng, et al. Urban Computing with Taxicabs. In Proc. of UbiComp 2011.
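Slide 32 partitions a city into regions using major roads. A simplified sketch, assuming the road network has already been rasterized into a grid in which major-road cells are marked `1`, labels each connected block of non-road cells as one region using a breadth-first flood fill with 4-connectivity. The grid encoding and the function name are hypothetical, and the cited UbiComp 2011 work's actual segmentation procedure may include further map-processing steps.

```python
from collections import deque


def partition_regions(grid):
    """Label connected non-road cells as regions.

    grid[r][c] == 1 marks a major-road cell, 0 a non-road cell.
    Returns a grid of the same shape: 0 for road cells, 1..k for regions.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 or labels[r][c]:
                continue  # road cell, or already assigned to a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # BFS flood fill; roads act as region boundaries
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 0 and not labels[nr][nc]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


# Hypothetical 5x5 raster: one horizontal and one vertical major road
city = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
for row in partition_regions(city):
    print(row)
```

The two crossing roads split this toy grid into four regions, labeled 1 to 4 in scan order; region-to-region traffic (for example, from taxi trajectories) can then be aggregated per labeled region.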

  33. Shanghai Big Data Hotpot Restaurant

  34. When Urban Air Meets Big Data KDD 2013 http://urbanair.msra.cn/

  35. Air Pollution: A Global Concern! Air quality monitoring stations measure PM2.5, PM10, NO₂, SO₂, CO, and O₃. [Figure: monitoring stations S1–S22 across a 50 km × 40 km area]
