
Spaten: a Spatio-Temporal and Textual Big Data Generator



  1. Spaten: a Spatio-Temporal and Textual Big Data Generator
     Thaleia Dimitra Doudali*, Ioannis Konstantinou, Nectarios Koziris

  2. Motivation
     1. Geo-Social Networking Graph
     2. Spatio-temporal and textual data

  3. Motivation
     3. Daily routes with check-ins × millions of daily users = part of Big Geo-Social Data

  4. Motivation: new or extended Big Data engines for spatial data
     Input dataset → Big Spatial Data Engine → Performance evaluation
     SpatialHadoop: easy access to large spatial datasets (real or synthetic):
     ● OpenStreetMap (60 GB, real)
     ● NASA (4.6 TB, real)
     ● SYNTH (128 GB, synthetic)

  5. Problem Statement: new or extended Big Data engines for Geo-Social data
     Input dataset → Big Data Engine → Performance evaluation
     Availability of Geo-social data:
     ● Small scale: real ✔, synthetic ✔
     ● Large scale: real ❌, synthetic ✔
     Can we create realistic Geo-social data (combining real sources with synthetic generation) at a large scale, for performance and scalability evaluations?

  6. Our Contributions
     ● Build Spaten: a Spatio-Temporal and Textual Big Data Generator.
       ○ Configurable, open source.
     ● Successfully create a large, realistic Geo-social dataset.
     ● Show how the generated data can be stored and queried using state-of-the-art NoSQL database systems.

  7. Overview
     Input: 1. Social network graph, 2. Points of Interest (POIs), 3. Configuration parameters
     Spaten: creates daily routes with check-ins of users to POIs
     Output: Geo-Social network

  8. Input Data
     1. Social network graph: connections between users (User to User)
     2. Points of Interest (POIs):
        ● POI: Latitude, Longitude, Name, Address, Review list
        ● Review: Rating, Title, Text
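
A minimal Python sketch of the two inputs as in-memory records. The field names follow the slide; the class names Review and POI, the helper build_graph, and the choice to treat each graph edge as directed are hypothetical. Spaten's actual input formats are defined in its repository and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """One review of a POI (fields as listed on the slide)."""
    rating: float
    title: str
    text: str


@dataclass
class POI:
    """A Point of Interest: location, descriptive fields and its review list."""
    latitude: float
    longitude: float
    name: str
    address: str
    reviews: list[Review] = field(default_factory=list)


def build_graph(edges: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Build an adjacency map: user id -> set of connected user ids.

    Hypothetical helper. Edges are treated as directed (src -> dst); this is an
    assumption, since the slide only shows User-to-User connections.
    """
    graph: dict[int, set[int]] = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # every user appears as a key
    return graph
```

For example, build_graph([(1, 2), (1, 3)]) maps user 1 to {2, 3} and gives users 2 and 3 empty sets.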

  9. Data Generation Process - Example
     Spaten generates the day of a user who walks near his home or hotel and checks into POIs:
     ● 9:00 am, 4/5 stars: “you should try the french toast with homemade jam, it’s so tasty!”
     ● 11:05 am, 5 stars: “the cold brew was so refreshing!”
     ● 12:17 pm, 5 stars: “delicious food and excellent service”
     The walking segments in this example are 0.1 miles (3 min) and 0.8 miles (15 min).
     The configuration parameters control:
     ● how many daily routes?
     ● when does the day start and end?
     ● how many check-ins in a day?
     ● how long will a check-in last?
     ● how far can the user walk?
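
The slide names the configuration knobs but not the algorithm. Below is a minimal Python sketch of one plausible way to generate such a day, assuming: the next POI is picked uniformly at random among unvisited, reviewed POIs within the walking radius; distance is straight-line (haversine) distance; and walking time comes from a constant 3 mph walking speed. Config, generate_day and generate_routes are hypothetical names, not Spaten's code (the use case mentions the Google Maps API, which this sketch does not use).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical parameter names for the knobs listed on the slide."""
    days: int = 1                # how many daily routes
    day_start: int = 9 * 60      # day starts at 9:00 (minutes since midnight)
    day_end: int = 23 * 60       # day ends at 23:00
    checkins_per_day: int = 5    # how many check-ins in a day
    checkin_minutes: int = 120   # how long a check-in lasts
    max_walk_miles: float = 0.5  # how far the user may walk between stops
    walk_mph: float = 3.0        # assumed walking speed (not from the slides)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def generate_day(home, pois, cfg, rng):
    """Return one day's route as a list of (minute_of_day, poi_name, review).

    home: (lat, lon); pois: list of (name, lat, lon, list_of_reviews).
    """
    t, here, visited, route = cfg.day_start, home, set(), []
    while len(route) < cfg.checkins_per_day:
        # Candidates: unvisited POIs that have reviews and are within walking range.
        nearby = [p for p in pois
                  if p[0] not in visited and p[3]
                  and haversine_miles(here[0], here[1], p[1], p[2]) <= cfg.max_walk_miles]
        if not nearby:
            break
        name, lat, lon, reviews = rng.choice(nearby)
        walk_min = haversine_miles(here[0], here[1], lat, lon) / cfg.walk_mph * 60
        arrive = t + round(walk_min)
        if arrive + cfg.checkin_minutes > cfg.day_end:
            break  # this check-in would not finish before the day ends
        route.append((arrive, name, rng.choice(reviews)))
        visited.add(name)
        here, t = (lat, lon), arrive + cfg.checkin_minutes
    return route


def generate_routes(home, pois, cfg, seed=0):
    """Generate cfg.days independent daily routes for one user."""
    rng = random.Random(seed)
    return [generate_day(home, pois, cfg, rng) for _ in range(cfg.days)]
```

Uniform choice among nearby POIs is the simplest policy; weighting the choice (for instance by rating) would be an equally plausible variant.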

  10. Output Data
     ● Social network: connections between users (User to User)
     ● Check-ins of each user: POI, Review, Time - Date
     ● GPS traces of each user: Latitude, Longitude, Time - Date
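
The slide lists the output records but not how GPS traces are produced. A minimal sketch, assuming time is simplified to minutes since midnight and the trace between two consecutive stops is a straight line sampled every few minutes (real walking routes would follow streets). CheckIn, GpsPoint and interpolate_trace are hypothetical names; linear interpolation in latitude/longitude is only a reasonable approximation over short walks like the sub-mile ones on the previous slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    user: int
    poi: str
    review: str
    minute: int  # simplified "Time - Date": minutes since midnight


@dataclass(frozen=True)
class GpsPoint:
    user: int
    latitude: float
    longitude: float
    minute: int


def interpolate_trace(user, start, end, t0, t1, step):
    """Straight-line GPS trace from start to end, one point every `step` minutes.

    start/end are (lat, lon); the end point is always included at minute t1.
    """
    if t0 > t1 or step <= 0:
        raise ValueError("need t0 <= t1 and step > 0")
    points = []
    t = t0
    while t <= t1:
        f = (t - t0) / (t1 - t0) if t1 > t0 else 1.0  # fraction of the walk done
        points.append(GpsPoint(user,
                               start[0] + f * (end[0] - start[0]),
                               start[1] + f * (end[1] - start[1]),
                               t))
        t += step
    if points[-1].minute != t1:
        points.append(GpsPoint(user, end[0], end[1], t1))
    return points
```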

  11. Storage - Queries
     The Geo-Social network is stored in a database indexed by “user”. Queries, for a random user:
     ● News Feed: show all friend check-ins in chronological order.
     ● What are the favorite places that his friends have visited?
     ● How many times have his friends been to their favorite place?
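
The three queries can be sketched over plain dictionaries keyed by user id, mimicking a store indexed by “user”. This is a minimal Python sketch, not the queries run on HBase: news_feed and favorite_places are hypothetical names, and reading a friend's “favorite place” as the POI that friend checked into most often is an assumption (highest-rated would be another reading).

```python
from collections import Counter

# Hypothetical in-memory stand-ins, both keyed by user id:
#   graph:    user -> set of friends
#   checkins: user -> list of (timestamp, poi)


def news_feed(graph, checkins, user):
    """All friend check-ins as (timestamp, friend, poi), oldest first."""
    feed = [(ts, friend, poi)
            for friend in graph.get(user, ())
            for ts, poi in checkins.get(friend, [])]
    feed.sort()  # chronological order
    return feed


def favorite_places(graph, checkins, user):
    """friend -> (that friend's most visited POI, number of visits there)."""
    result = {}
    for friend in graph.get(user, ()):
        counts = Counter(poi for _, poi in checkins.get(friend, []))
        if counts:
            result[friend] = counts.most_common(1)[0]
    return result
```

Keying both maps by user means each query touches only the requesting user's friend list and one entry per friend, which is the access pattern a user-indexed table favors.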

  12. Use Case
     Input: Twitter graph (14 GB) and TripAdvisor restaurants (13 GB).
     Spaten configuration:
     ● 2 months
     ● ~10,000 users (due to limited use of the Google Maps API)
     ● day from 9 am to 11 pm
     ● ~5 check-ins / day
     ● ~2 hours / check-in
     ● <0.5 miles between check-ins
     Output: Geo-Social network of 14 + 3 = 17 GB, stored in a 32-node HBase cluster and queried concurrently.
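
A back-of-envelope check of the use-case configuration, assuming every user gets one daily route for about 60 days and hits the approximate averages exactly. These are derived estimates, not figures reported on the slide.

```python
users = 10_000          # "~10,000 users"
days = 60               # "2 months", assumed to mean ~60 daily routes per user
checkins_per_day = 5    # "~5 check-ins / day"
hours_per_checkin = 2   # "~2 hours / check-in"
day_hours = 23 - 9      # the day runs from 9 am to 11 pm

total_checkins = users * days * checkins_per_day
hours_checked_in = checkins_per_day * hours_per_checkin
hours_spare = day_hours - hours_checked_in  # left for walking between POIs

print(f"{total_checkins:,} check-ins")                          # 3,000,000 check-ins
print(f"{hours_checked_in} h at POIs, {hours_spare} h spare per day")  # 10 h at POIs, 4 h spare per day
```

The 4 spare hours per day comfortably cover walks of under 0.5 miles between consecutive check-ins.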

  13. Summary
     Spaten → Geo-Social network → Big Data Engine → Performance Evaluation
     Code: https://github.com/Thaleia-DimitraDoudali/Spaten
     Dataset: http://research.cslab.ece.ntua.gr/datasets/ikons/Spaten/
