Spaten: a Spatio-Temporal and Textual Big Data Generator

SLIDE 1

Spaten: a Spatio-Temporal and Textual Big Data Generator

Thaleia Dimitra Doudali*, Ioannis Konstantinou, Nectarios Koziris *

SLIDE 2

Motivation

  • 1. Geo-Social Networking Graph
  • 2. Spatio-temporal and textual data

SLIDE 3

Motivation

  • 3. Daily routes with check-ins

× millions of daily users = part of Big Geo-Social Data

SLIDE 4

Motivation

New or extended Big Data Engines for spatial data.

Input dataset → Big Spatial Data Engine (Spatial Hadoop) → Performance Evaluation

  • OpenStreetMap (60 GB - real)
  • NASA (4.6 TB - real)
  • SYNTH (128 GB - synthetic)

Easy access to large spatial datasets (real or synthetic).

SLIDE 5

Problem Statement

New or extended Big Data Engines for Geo-Social data.

Input dataset → Big Data Engine → Performance Evaluation

Dataset size   Real   Synthetic
Small          ✔      ✔
Large          ❌     ✔

Can we create realistic (real source, synthetic combination) Geo-social data at a large scale, for performance and scalability evaluations?

SLIDE 6

Our Contributions

  • Build Spaten: a Spatio-Temporal and Textual Big Data Generator.
    ○ configurable, open source.
  • Show how we can store and query the generated data, using state-of-the-art NoSQL database systems.
  • Successfully create a large realistic Geo-social dataset.

SLIDE 7

Overview

Input

  • 1. Social network graph
  • 2. Points of Interest (POIs)
  • 3. Configuration Parameters

Spaten

Creates daily routes with check-ins of users to POIs.

Output

Geo-Social network

SLIDE 8

Input Data

  • 1. Social network graph
  • 2. Points of Interest (POIs)

Social network graph: User ↔ User

POI

  • Latitude
  • Longitude
  • Name
  • Address
  • Review list

Review

  • Rating
  • Title
  • Text
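
The two inputs above could be modelled roughly as follows. This is only a sketch: the class and field names mirror the slide, while the 1-5 rating scale, the integer user ids, the undirected edges and the `add_friendship` helper are assumptions made for illustration, not Spaten's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Review:
    rating: int  # star rating (assumed to be on a 1-5 scale)
    title: str
    text: str


@dataclass
class POI:
    latitude: float
    longitude: float
    name: str
    address: str
    reviews: List[Review] = field(default_factory=list)


# Social network graph as an adjacency map: user id -> ids of friends.
SocialGraph = Dict[int, Set[int]]


def add_friendship(graph: SocialGraph, a: int, b: int) -> None:
    """Record an edge between users a and b (treated as undirected here)."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
```

A directed graph (for example, follower relations) would only need the first of the two `add` calls.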

SLIDE 9

Data Generation Process - Example

Spaten generates the day of a user who walks near their home or hotel and checks into POIs.

  • 9am - 4/5 stars - “you should try the french toast with homemade jam, it’s so tasty!”
  • 11.05am - 5 stars - “the cold brew was so refreshing!”
  • 12.17pm - 5 stars - “delicious food and excellent service”

Walks between check-ins: 0.1 miles (3 min), 0.8 miles (15 min)

The configuration parameters control:

  • how many daily routes?
  • when does the day start and end?
  • how many check-ins in a day?
  • how long will a check-in last?
  • how far can the user walk?
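
A minimal sketch of how such parameters might drive the generation of one daily route. Everything here is illustrative: the `Config` field names, the random choice of the next POI, the fixed walking speed and the haversine straight-line distance are assumptions. The slides mention the Google Maps API, so the real tool may obtain walking distances and times differently.

```python
import math
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


@dataclass
class Config:
    # All field names are hypothetical; defaults echo the use case in the slides.
    day_start_hour: int = 9       # when does the day start?
    day_end_hour: int = 23        # when does the day end?
    checkins_per_day: int = 5     # how many check-ins in a day?
    checkin_minutes: int = 120    # how long will a check-in last?
    max_walk_miles: float = 0.5   # how far can the user walk?
    walk_mph: float = 3.0         # assumed walking speed


def miles_between(a: Point, b: Point) -> float:
    """Great-circle (haversine) distance in miles between two points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3958.8 * 2 * math.asin(math.sqrt(h))


def generate_day(home: Point, pois: Sequence[Point], day: datetime,
                 cfg: Config, rng: random.Random) -> List[Tuple[datetime, Point]]:
    """Return the (arrival time, POI) check-ins of one user for one day."""
    now = day.replace(hour=cfg.day_start_hour, minute=0, second=0, microsecond=0)
    end = day.replace(hour=cfg.day_end_hour, minute=0, second=0, microsecond=0)
    stay = timedelta(minutes=cfg.checkin_minutes)
    position, visited, route = home, set(), []
    for _ in range(cfg.checkins_per_day):
        # Candidate POIs: not yet visited and within walking range.
        nearby = [i for i, p in enumerate(pois)
                  if i not in visited
                  and miles_between(position, p) <= cfg.max_walk_miles]
        if not nearby:
            break
        i = rng.choice(nearby)
        walk = timedelta(hours=miles_between(position, pois[i]) / cfg.walk_mph)
        arrival = now + walk
        if arrival + stay > end:  # the check-in would run past the end of the day
            break
        route.append((arrival, pois[i]))
        visited.add(i)
        position, now = pois[i], arrival + stay
    return route
```

The number of daily routes would then simply be the number of times `generate_day` is called per user, once for each simulated day.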

SLIDE 10

Output Data

Social network: User ↔ User

Check-ins: User → Check-in

  • POI
  • Review
  • Time - Date

GPS traces: User → GPS Trace

  • Latitude
  • Longitude
  • Time - Date
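
As an illustration of the two output record types, the sketch below defines them as Python dataclasses and fills the gap between two check-ins with evenly spaced GPS points. The straight-line interpolation and the `straight_line_traces` helper are assumptions made for this example only; the slides do not say how Spaten derives its traces.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class CheckIn:
    user_id: int
    poi_name: str      # stands in for the full POI record
    review_text: str   # stands in for the full review
    timestamp: datetime


@dataclass
class GPSTrace:
    user_id: int
    latitude: float
    longitude: float
    timestamp: datetime


def straight_line_traces(user_id: int,
                         start: Tuple[float, float], end: Tuple[float, float],
                         depart: datetime, arrive: datetime,
                         steps: int) -> List[GPSTrace]:
    """Emit steps + 1 evenly spaced GPS points from start to end (inclusive)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    traces = []
    for i in range(steps + 1):
        f = i / steps  # fraction of the walk completed
        traces.append(GPSTrace(
            user_id,
            start[0] + f * (end[0] - start[0]),
            start[1] + f * (end[1] - start[1]),
            depart + f * (arrive - depart),
        ))
    return traces
```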

SLIDE 11

Storage - Queries

Storage

Geo-Social Network → Database, indexed by “user”

Queries

For a random user:

  • News Feed: show all friend check-ins in chronological order.
  • What are the favorite places that the user's friends have visited?
  • How many times have the user's friends been to their favorite place?
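
The queries above can be sketched over plain Python dictionaries keyed by user id, which loosely mimics the "indexed by user" layout. This is not HBase client code, and the function names are hypothetical. "Favorite" is interpreted here as "most visited", which is an assumption.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Set, Tuple

# Both tables are keyed by user id.
Friends = Dict[int, Set[int]]
CheckIns = Dict[int, List[Tuple[datetime, str]]]  # user -> [(time, POI name)]


def news_feed(user: int, friends: Friends,
              checkins: CheckIns) -> List[Tuple[datetime, int, str]]:
    """All check-ins of the user's friends, in chronological order."""
    feed = [(t, friend, poi)
            for friend in friends.get(user, ())
            for t, poi in checkins.get(friend, ())]
    return sorted(feed)


def favourite_places(user: int, friends: Friends, checkins: CheckIns,
                     top: int = 3) -> List[Tuple[str, int]]:
    """POIs most visited by the user's friends, with their visit counts."""
    visits = Counter(poi
                     for friend in friends.get(user, ())
                     for _, poi in checkins.get(friend, ()))
    return visits.most_common(top)
```

With `top=1`, `favourite_places` answers the last query directly: it returns the single most visited place together with the number of visits.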

SLIDE 12

Use Case

Spaten configuration

  • 2 months
  • 9 am - 11 pm
  • ~5 check-ins / day
  • ~2 hours / check-in
  • <0.5 miles between check-ins

Input

  • TripAdvisor restaurants = 13 GB
  • Twitter Graph = 14 GB

Output

  • Geo-Social Network: 14 + 3 = 17 GB
  • ~10,000 users (limited use of Google Maps API)

Concurrent Queries

  • HBase cluster, 32 nodes
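
A quick back-of-the-envelope check of what these parameters imply. The only figure not taken from the slide is the assumption that "2 months" means about 60 days.

```python
users = 10_000
days = 60                # "2 months", assumed to be about 60 days
checkins_per_day = 5
hours_per_checkin = 2
day_hours = 23 - 9       # 9 am to 11 pm

total_checkins = users * days * checkins_per_day
hours_at_pois = checkins_per_day * hours_per_checkin
hours_left_for_walking = day_hours - hours_at_pois

print(total_checkins)          # 3000000
print(hours_at_pois)           # 10
print(hours_left_for_walking)  # 4
```

So the configuration yields on the order of three million check-ins, and five two-hour check-ins fit inside the 14-hour day with time left over for walking.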

SLIDE 13

Summary

Spaten → Geo-Social network → Big Data Engine → Performance Evaluation

Code: https://github.com/Thaleia-DimitraDoudali/Spaten
Dataset: http://research.cslab.ece.ntua.gr/datasets/ikons/Spaten/