


  1. Big Data Analytics for extracting mobility patterns in a large urban center ICIST 2019 - Paulo Figueiras (paf@uninova.pt)

  2. Summary
     - Motivation
     - UNINOVA Big Data Architecture
     - Processing & Analytics
     - Performance
     - Visualization
     - Conclusions

  3. Motivation
     Public transportation in Lisbon, Portugal
     - Independent public/private operators
     - One association (OTLIS) handles data coming from all operators
       - Ticket validations
       - Station/stop locations and information
       - Users
     - Data sharing between operators is a challenge
       - Legal/business advantage issues
       - Privacy concerns
     Analytics performed with traditional techniques
     - Data gathered through questionnaires and human observations
     - Difficulty getting meaningful insights with traditional data warehouse (DW) approaches

  4. Research question
     - Which technologies can provide useful insights into mobility patterns in large urban centers, given large volumes of ticketing data from different operators?

  5. UNINOVA Big Data Architecture

  6. Processing & Analytics
     - Clean original data
       - Duplicates
       - Erroneous validations (e.g. consecutive entry validations at the same station)
       - Validations without location information
       - Consecutive entry and/or exit validations less than 5 minutes apart
     - Harmonize original data into three distinct formats: Validations, Users, Locations
     - Provide semantics via GTFS mappings of locations and routes
     - Create new knowledge/insights from collected data (about connections, transfers and pendular, i.e. commuting, movements)
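The cleaning rules on this slide can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the talk: the record layout (`card_id`, `station`, `kind`, `ts`) and the function name `clean_validations` are hypothetical, and the 5-minute rule is interpreted here as "same card, same kind of validation, less than 5 minutes after the previous kept one", which is one possible reading of the slide.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=5)  # threshold stated on the slide


def clean_validations(records):
    """Apply the slide's cleaning rules to a list of validation dicts.

    Assumed (hypothetical) record layout:
    {"card_id": str, "station": str or None, "kind": "entry" | "exit", "ts": datetime}
    """
    seen = set()      # keys of records already encountered (duplicate filter)
    last_kept = {}    # card_id -> last record kept for that card
    cleaned = []
    for rec in sorted(records, key=lambda r: (r["card_id"], r["ts"])):
        key = (rec["card_id"], rec["station"], rec["kind"], rec["ts"])
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        if not rec["station"]:
            continue  # validation without location information
        prev = last_kept.get(rec["card_id"])
        if (
            prev is not None
            and prev["kind"] == rec["kind"]
            and rec["ts"] - prev["ts"] < MIN_GAP
        ):
            continue  # consecutive same-kind validation less than 5 minutes apart
        cleaned.append(rec)
        last_kept[rec["card_id"]] = rec
    return cleaned


# Small made-up sample to exercise each rule.
t0 = datetime(2018, 5, 1, 8, 0)
sample = [
    {"card_id": "A", "station": "S1", "kind": "entry", "ts": t0},
    {"card_id": "A", "station": "S1", "kind": "entry", "ts": t0},  # duplicate
    {"card_id": "A", "station": "S1", "kind": "entry", "ts": t0 + timedelta(minutes=2)},
    {"card_id": "A", "station": "S2", "kind": "exit", "ts": t0 + timedelta(minutes=20)},
    {"card_id": "A", "station": None, "kind": "exit", "ts": t0 + timedelta(minutes=40)},
    {"card_id": "B", "station": "S3", "kind": "entry", "ts": t0},
    {"card_id": "B", "station": "S4", "kind": "entry", "ts": t0 + timedelta(minutes=10)},
]
cleaned = clean_validations(sample)
```

Sorting by card and timestamp first keeps the consecutive-validation check a single pass per card; at the data volumes mentioned in the talk, the same logic would presumably run inside the processing framework rather than in plain Python.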

  7. Processing & Analytics

  8. Performance
     - Test:
       - One month of data (May 2018): more than 55 million records
     - Before:
       - Oracle Cloud with traditional DW processes
       - Only pre-processing and visualization
       - Time span: several days to one week
     - With UNINOVA Big Data Architecture:
       - Single node (AMD Ryzen 5 1600 with 12 logical CPUs, 32 GB RAM (Corsair Vengeance LPX), 120 GB SSD + 1 TB HDD)
       - Pre-processing + analytics
       - Time span: 4 hours (reading/writing to MongoDB at each stage, no indexes)
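To put the reported figures in perspective, the arithmetic below derives an approximate throughput and a speedup. It is a back-of-the-envelope sketch: 55 million is taken as a lower bound for the record count, and only the upper end of the old time span (one week) is used, since "several days" is not quantified on the slide. The comparison is also not like-for-like, because the old process covered only pre-processing and visualization while the new run also includes analytics.

```python
records = 55_000_000        # "more than 55 million" on the slide, used as a lower bound
new_hours = 4               # single-node run reported on the slide
old_hours_week = 7 * 24     # upper end of the old time span (one week)

# Average records processed per second during the 4-hour run.
throughput = records / (new_hours * 3600)

# Speedup relative to a one-week run of the traditional DW process.
speedup_vs_week = old_hours_week / new_hours

print(f"throughput: about {throughput:,.0f} records/s")
print(f"speedup vs. one week: {speedup_vs_week:.0f}x")
```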

  9. Visualization

  10. Conclusions
      - Novel Big Data architecture for efficient processing and analytics of public transportation data
      - The architecture spans the whole life cycle of Big Data
      - Development of an unsupervised approach to collect and process data, and to produce meaningful insights
      - Compared with traditional DW processes, the architecture delivers much better performance, even on a single machine
        - Lower costs (with dedicated Cloud services)
        - Better knowledge and insights
        - Possibility of an efficient in-house solution
