Real-Time Geo #rtgeo Who am i? Giving a real-time geo talk at - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Real-Time Geo #rtgeo

slide-2
SLIDE 2

Who am i?

slide-3
SLIDE 3

Giving a real-time geo talk at @where20. How do you build stuff? #rtgeo.

19 Apr via Twitter for iPhone from Santa Clara Convention Center, 5001 Great America Parkway, Santa Clara, CA 95054 (View Tweets at this place)

slide-4
SLIDE 4

Background

Wherehoo (2000)
⇢ “The Stuff Around You”
⇢ “Wherehoo Server: An interactive location service for software agents and intelligent systems” - J. Youll, R. Krikorian
⇢ In your /etc/services file!

BusRadio (2004)
⇢ Designed mobile computers to play media while also transmitting telemetry
⇢ Looked and sounded like a radio, but really a Linux computer

OneHop (2007)
⇢ Bluetooth proximity-based social networking

[] raffi@~/: cat /etc/services | grep wherehoo
wherehoo        5859/udp    # WHEREHOO
wherehoo        5859/tcp    # WHEREHOO

slide-5
SLIDE 5

Background

Twitter

⇢ Originally tech lead of API / Platform team
⇢ Built the first geo-based infrastructure before acquisition of Mixer Labs in December of 2009
⇢ Now lead of the Application Services group
⇢ Runs five teams focused on scalable infrastructure around “core” data objects
⇢ Tweets, users, timelines, places, etc.
⇢ Delivery, authentication, APIs, etc.

slide-6
SLIDE 6
slide-7
SLIDE 7

Table of contents

Background
⇢ Why are we interested in this?

Twitter’s geo APIs
⇢ How do we allow people to talk about place?
⇢ Context around “place”

Problem statement
⇢ What do we want our system to do?

Infrastructure
⇢ How is Twitter solving this problem?

slide-8
SLIDE 8

People want to talk about places

slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

Twitter’s Geo APIs

What’s happening here?

slide-15
SLIDE 15

Original attempts

Adding it to the tweet
⇢ Use myloc.me, et al. to add text to the tweet
⇢ Puts location “in band”
⇢ Takes from the 140 characters

Setting profile level locations
⇢ Set the user/location of a Twitter user
⇢ There’s an API for that!
⇢ Not on a per-tweet basis
⇢ Not intended for high frequency alterations

slide-16
SLIDE 16
slide-17
SLIDE 17

[] raffi@~/: twurl -d location="San Francisco, California" \
    http://twitter.com/account/update_location.xml
<user>
  <id>8285392</id>
  <name>raffi</name>
  <screen_name>raffi</screen_name>
  <location>San Francisco, California</location>
  ...
</user>

Profile level changes

slide-18
SLIDE 18

Geotagging API

slide-19
SLIDE 19

Geotagging API

Adding it to the tweet

⇢ Per-tweet basis
⇢ Out of band and pure metadata
⇢ Does not take from the 140 characters

Native Twitter support

⇢ Simple way to update status with location data
⇢ Ability to remove geotags from your tweets en masse
⇢ Using GeoRSS and GeoJSON as the encoding format
⇢ Across all Twitter APIs (REST, Search, and Streaming)

slide-20
SLIDE 20

[] raffi@~/: twurl -d "status=hey-ho&lat=37.3&long=-121.9" \
    http://api.twitter.com/1/status/update.xml
<status>
  <text>hey-ho</text>
  ...
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>37.3 -121.9</georss:point>
  </geo>
  ...
</status>

status/update

slide-21
SLIDE 21

[] raffi@~/: curl "http://search.twitter.com/search.atom?geocode=40.757929%2C-73.985506%2C25km&source=foursquare"
...
<title>On the way to ace now, so whenever you can make it I'll be there. (@ Port Imperial Ferry in Weehawken) http://4sq.com/2rq0vO</title>
...
<twitter:geo>
  <georss:point>40.7759 -74.0129</georss:point>
</twitter:geo>
...

Search

geocode parameter takes “latitude,longitude,radius” where radius has units of mi or km
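
A minimal sketch of building that query string, assuming the endpoint and the source parameter exactly as they appear in the curl call on this slide (the endpoint may no longer exist). The helper names geocode_param and search_url are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint copied from the slide; treat it as historical.
SEARCH_URL = "http://search.twitter.com/search.atom"


def geocode_param(lat: float, lon: float, radius: float, unit: str = "km") -> str:
    """Format "latitude,longitude,radius" with a mi or km unit suffix."""
    if unit not in ("mi", "km"):
        raise ValueError("radius unit must be 'mi' or 'km'")
    if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
        raise ValueError("coordinates out of range")
    return f"{lat},{lon},{radius:g}{unit}"


def search_url(lat: float, lon: float, radius: float, unit: str = "km", **extra: str) -> str:
    """Build the full search URL; urlencode percent-escapes the commas."""
    params = {"geocode": geocode_param(lat, lon, radius, unit), **extra}
    return f"{SEARCH_URL}?{urlencode(params)}"


url = search_url(40.757929, -73.985506, 25, "km", source="foursquare")
print(url)
```

The commas come out as %2C, which matches the escaped form used in the curl example.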

slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26

geohose

slide-27
SLIDE 27

[] raffi@~/: curl "http://stream.twitter.com/1/statuses/filter.xml?locations=-74.5129,40.2759,-73.5019,41.2759"

location filtering

locations is a bounding box specified by “long1,lat1,long2,lat2”; up to 10 boxes can be tracked, each at most 1 degree square (~60 miles square and enough to cover most metropolitan areas)
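
A rough sketch of preparing the locations value on the client side, under the limits quoted here (10 boxes, 1 degree per side). The function names, the south-west-then-north-east corner order, and the four-decimal formatting are assumptions for illustration, not documented API behaviour.

```python
from typing import Iterable, List, Tuple

# (long1, lat1, long2, lat2); assumed to be south-west corner, then north-east corner.
Box = Tuple[float, float, float, float]

MAX_BOXES = 10      # limit quoted on the slide
MAX_SIDE_DEG = 1.0  # limit quoted on the slide


def box_around(lat: float, lon: float, side_deg: float = MAX_SIDE_DEG) -> Box:
    """Square box of the given side length centred on a point."""
    half = side_deg / 2.0
    return (lon - half, lat - half, lon + half, lat + half)


def locations_param(boxes: Iterable[Box]) -> str:
    """Validate the boxes and join them into one comma-separated value."""
    boxes = list(boxes)
    if not 1 <= len(boxes) <= MAX_BOXES:
        raise ValueError(f"between 1 and {MAX_BOXES} boxes are allowed")
    fields: List[str] = []
    for lon1, lat1, lon2, lat2 in boxes:
        width, height = lon2 - lon1, lat2 - lat1
        if width <= 0 or height <= 0:
            raise ValueError("box must run from south-west to north-east")
        # Small tolerance so a box built from floats is not rejected by rounding.
        if width > MAX_SIDE_DEG + 1e-9 or height > MAX_SIDE_DEG + 1e-9:
            raise ValueError("box is larger than 1 degree on a side")
        fields.extend(f"{v:.4f}" for v in (lon1, lat1, lon2, lat2))
    return ",".join(fields)


param = locations_param([box_around(40.7759, -74.0129)])
print(param)
```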

slide-28
SLIDE 28
slide-29
SLIDE 29

Trends API

slide-30
SLIDE 30
slide-31
SLIDE 31

Trends API

Global Trends

⇢ Analysis of “hot conversations”
⇢ Does not take from the 140 characters

Location specific trends

⇢ Tweets being localized through a variety of means internally
⇢ Locations exposed over the API as WOEIDs and Twitter IDs
⇢ Can ask for available trends sorted by distance

slide-32
SLIDE 32

[] raffi@~/: curl "http://api.twitter.com/1/trends/available.xml"
<locations type="array">
  <location>
    <woeid>2487956</woeid>
    <name>San Francisco</name>
    <placeTypeName code="7">Town</placeTypeName>
    <country type="Country" code="US">United States</country>
    <url>http://where.yahooapis.com/v1/place/2487956</url>
  </location>
  ...
</locations>

available locations

Can optionally take a lat and long parameter to have trends locations returned, sorted, as distance from you.

slide-33
SLIDE 33

[] raffi@~/: curl "http://api.twitter.com/1/trends/2487956.xml"
<matching_trends type="array">
  <trends as_of="2009-12-15T20:19:09Z">
    ...
    <trend url="http://search.twitter.com/search?q=Golden+Globe+nominations" query="Golden+Globe+nominations">Golden Globe nominations</trend>
    <trend url="http://search.twitter.com/search?q=%23somethingaintright" query="%23somethingaintright">#somethingaintright</trend>
    ...
  </trends>
</matching_trends>

a Local trend

Look up a trend at a given WOEID

slide-34
SLIDE 34

What’s in a name?

slide-35
SLIDE 35

A place is a name

5001 Great America Parkway, Santa Clara, CA 95054
Great America Parkway and Tasman Drive
The Bay Area
Santa Clara convention center
Twitter ID 3b7dd0d93e661e18

slide-36
SLIDE 36

how do users want to share “where”?

slide-37
SLIDE 37

Sharing coordinates

More aptly named “geotagging”
Good for sharing photos
Possibly good for talking about a specific place (e.g. store, restaurant)
People don’t understand numbers and without

slide-38
SLIDE 38

Sharing polygons

Privacy implications are potentially better
If you thought sharing one pair of numbers was bad...
Questions around polygon definition
Still unable to visualize unless on a map
slide-39
SLIDE 39

Sharing names

Has the potential to make a connection with users
Distinguishes a “named place” from simply a “place”
Inverse relationship between granularity and connection
Rather large internationalization / context implications

slide-40
SLIDE 40

Geo-place API

slide-41
SLIDE 41

Geo-place API

Support for “names”

⇢ Not just coordinates
⇢ More contextually relevant
⇢ Positive privacy benefits

Increased complexity

⇢ Need to be able to look up a list of places
⇢ Requires a “reverse geocoder”
⇢ Human driven tagging and not possible to be fully automatic

slide-42
SLIDE 42

[] raffi@~/: curl "http://api.twitter.com/1/geo/search.json?lat=37.3&long=-121.9"
...
"place_type":"neighborhood",
"country_code":"US",
"contained_within": [...],
"full_name":"Willow Glen",
"bounding_box": {
  "type":"Polygon",
  "coordinates": [[
    [-121.92481908, 37.275903],
    [-121.88083608, 37.275903],
    [-121.88083608, 37.31548203],
    [-121.92481908, 37.31548203]
  ]]
},
"name":"Willow Glen",
"id":"46bc64ecd1da2a46",
...

Search

slide-43
SLIDE 43

[] raffi@~/: twurl -d "status=hey-ho&place_id=46bc64ecd1da2a46" \
    http://api.twitter.com/1/status/update.xml

<status>
  <text>hey-ho</text>
  ...
  <place xmlns:georss="http://www.georss.org/georss">
    <id>46bc64ecd1da2a46</id>
    <name>Willow Glen</name>
    <full_name>Willow Glen</full_name>
    <place_type>neighborhood</place_type>
    <url>http://api.twitter.com/1/geo/id/46bc64ecd1da2a46.json</url>
    <country code="US">United States</country>
  </place>
  ...
</status>

Tweeting with a place

slide-44
SLIDE 44
slide-45
SLIDE 45

Problem statement

What do we want our system to do?

slide-46
SLIDE 46

what do we need to build?

Database of places

⇢ Given a real-world location, find places
⇢ Spatial search

Method to store places with content

⇢ Per user basis
⇢ Per tweet basis

slide-47
SLIDE 47

spatial lookup and index

slide-48
SLIDE 48

as background... MySQL +

Ability to index points and do a spatial query

⇢ For example, get points within a bounding rectangle
⇢ SELECT MBRContains(GeomFromText('Polygon((0 0, 0 3, 3 3, 3 0, 0 0))'), coord) FROM geometry

Hard to cache the spatial query
Possibly requires a DB hit on every query

slide-49
SLIDE 49

Options

Grid / quad-tree
⇢ Create a grid (possibly nested) of the entire Earth

Geohash
⇢ Arbitrarily precise and hierarchical spatial data reference

Space filling curves
⇢ Mapping 2D space into 1D while preserving locality

R-Tree
⇢ Spatial access data structure

slide-50
SLIDE 50

Grid / Quad-Tree

slide-51
SLIDE 51

Grid / Quad-Tree

Recursively subdivide regions
Trie structure to store “prefixes”
Spatially oriented data structure
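
One way to picture the recursive subdivision and the prefixes: split the world into four quadrants, record which quadrant the point falls in, and repeat inside that quadrant. The sketch below is an illustration only; the quadrant numbering (bit 0 = east half, bit 1 = north half) and the function name quadkey are assumptions and do not follow any particular tiling scheme.

```python
def quadkey(lat: float, lon: float, depth: int) -> str:
    """Return one digit (0-3) per level of subdivision, coarsest first."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2.0
        lon_mid = (lon_lo + lon_hi) / 2.0
        digit = 0
        if lon >= lon_mid:   # east half
            digit |= 1
            lon_lo = lon_mid
        else:                # west half
            lon_hi = lon_mid
        if lat >= lat_mid:   # north half
            digit |= 2
            lat_lo = lat_mid
        else:                # south half
            lat_hi = lat_mid
        digits.append(str(digit))
    return "".join(digits)


here = quadkey(37.3, -121.9, 8)
nearby = quadkey(37.33, -121.89, 8)
print(here, nearby)
```

Keys like these can be stored in a trie: every node is a region, and descending one level appends one digit. Two points in the same cell at a given depth share the whole key up to that depth.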

slide-52
SLIDE 52

Geohash

slide-53
SLIDE 53

geohash

37°18’N 121°54’W = 9q9k4
Hierarchical spatial data structure
Precision encoded
Distance captured

⇢ Nearby places (usually) share the same prefix
⇢ The longer the string match, the closer the places are
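
A short sketch of the standard geohash encoding, to show where the string comes from: bits alternate between longitude and latitude, each bit halves the remaining interval, and every five bits become one base-32 character. The function name geohash_encode is made up for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, length: int = 5) -> str:
    """Encode a coordinate into a geohash of the requested length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True     # the first bit, and every other bit after it, is longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2.0
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(37.3, -121.9))      # 37°18'N 121°54'W as decimal degrees
print(geohash_encode(37.3, -121.9, 8))   # a longer hash extends the same prefix
```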

slide-54
SLIDE 54

Geohash

9q9k4 = 01001 / 10110 / 01001 / 10010 / 00100

Bits alternate, starting with longitude; each value below is the midpoint of the interval that remains after applying the bit in parentheses.

Longitude bits = 0010100101010
⇢ -90.0 (0), -135.0 (0), -112.5 (1), -123.75 (0), -118.125 (1), -120.9375 (0), -122.34375 (0), -121.640625 (1), -121.9921875 (0), -121.81640625 (1), -121.904296875 (0), -121.8603515625 (1), -121.88232421875 (0) = 121°53’W

Latitude bits = 101101010000
⇢ 45.0 (1), 22.5 (0), 33.75 (1), 39.375 (1), 36.5625 (0), 37.96875 (1), 37.265625 (0), 37.6171875 (1), 37.44140625 (0), 37.353515625 (0), 37.3095703125 (0), 37.28759765625 (0) = 37°17’N
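
The same walk can be done in code. This is a sketch of standard geohash decoding; the function name geohash_decode and the choice to return half-widths as error bounds are assumptions for illustration.

```python
from typing import Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_decode(code: str) -> Tuple[float, float, float, float]:
    """Return (lat, lon, lat_error, lon_error) for the cell a geohash names."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    use_lon = True  # bits alternate, longitude first
    for ch in code:
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2.0
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2.0
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    lat = (lat_lo + lat_hi) / 2.0
    lon = (lon_lo + lon_hi) / 2.0
    return lat, lon, (lat_hi - lat_lo) / 2.0, (lon_hi - lon_lo) / 2.0


lat, lon, lat_err, lon_err = geohash_decode("9q9k4")
print(lat, lon, lat_err, lon_err)
```

The error terms are what "precision encoded" means in practice: a five-character hash pins the point down to a cell roughly 0.04 degrees on a side, and each extra character shrinks it further.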

slide-55
SLIDE 55

Geohash

Possible to do range query in database

⇢ Matching based on prefix will return all the points that fit in the “grid”
⇢ Able to store 2D data in a 1D space
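
A prefix match is a range scan over sorted keys, which is why an ordinary one-dimensional index can serve it. The sketch below uses a sorted in-memory list in place of a database index; the class name, the place names, and the geohash values are all made up for illustration.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


class GeohashIndex:
    """Sorted list of (geohash, name) rows with prefix lookup."""

    def __init__(self, rows: Iterable[Tuple[str, str]]) -> None:
        self._rows = sorted(rows)
        self._keys = [geohash for geohash, _ in self._rows]

    def with_prefix(self, prefix: str) -> List[str]:
        """Names of all rows whose geohash starts with the prefix."""
        lo = bisect_left(self._keys, prefix)
        # "\x7f" sorts after every character of the geohash alphabet,
        # so this is the first key that no longer shares the prefix.
        hi = bisect_left(self._keys, prefix + "\x7f")
        return [name for _, name in self._rows[lo:hi]]


index = GeohashIndex([
    ("9q9k4b2", "place A"),   # hypothetical rows
    ("9q9k4zz", "place B"),
    ("9q9k50", "place C"),
    ("9q8yyk", "place D"),
    ("dr5reg", "place E"),
])

print(index.with_prefix("9q9k4"))
print(index.with_prefix("9q9"))
```

In a relational store the equivalent is a range or LIKE 'prefix%' condition on an indexed string column. Points that sit just across a cell boundary do not share the prefix, so a real lookup would usually also check the neighbouring cells.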

slide-56
SLIDE 56

Space filling curve

slide-57
SLIDE 57

Space filling curve

Generalization of geohash

⇢2D to 1D mapping ⇢Nearness is captured

Can recursively fill up space depending on the resolution required
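
As one concrete case, here is a sketch of the Z-order (Morton) curve, chosen because it is the curve geohash itself follows; the slides do not say which curve they have in mind, and other curves such as Hilbert preserve locality better. The function names and the default of 16 bits per axis are assumptions.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] onto an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    index = int((value - lo) / (hi - lo) * cells)
    return min(index, cells - 1)  # keep value == hi inside the last cell


def interleave(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y, with x taking the higher bit of each pair."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key


def z_order_key(lat: float, lon: float, bits: int = 16) -> int:
    """One-dimensional key for a 2D coordinate; more bits give finer resolution."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)


key = z_order_key(37.3, -121.9, bits=10)
print(key, format(key, "020b"))
```

With 10 bits per axis the 20-bit key, read five bits at a time, gives the base-32 digits 9, 22, 9, 18, which spell 9q9k in the geohash alphabet. That is the sense in which the curve generalizes geohash.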

slide-58
SLIDE 58

R-Tree

slide-59
SLIDE 59

R-Tree

Height-balanced tree data structure for spatial data
Uses hierarchically nested bounding boxes
nearby elements are

slide-60
SLIDE 60

Representations

slide-61
SLIDE 61

GeoRSS / GeoJSON

http://www.georss.org/ & http://geojson.org/

<georss:point>37.3 -121.9</georss:point>
{ "type":"Point", "coordinates":[-121.9, 37.3] }
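
Note that the two encodings order the coordinates differently: GeoRSS writes latitude first, GeoJSON writes longitude first. A small sketch of producing both and converting between them; the function names are made up, and the string-splitting parser is a shortcut for this example rather than a real XML parser.

```python
import json


def georss_point(lat: float, lon: float) -> str:
    """GeoRSS-Simple point: latitude, then longitude, separated by a space."""
    return f"<georss:point>{lat} {lon}</georss:point>"


def geojson_point(lat: float, lon: float) -> str:
    """GeoJSON point: coordinates are [longitude, latitude]."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})


def georss_to_geojson(element: str) -> str:
    """Convert a single georss:point element (as text) to a GeoJSON point."""
    text = element.split(">", 1)[1].rsplit("<", 1)[0]
    lat, lon = (float(part) for part in text.split())
    return geojson_point(lat, lon)


print(georss_point(37.3, -121.9))
print(geojson_point(37.3, -121.9))
print(georss_to_geojson("<georss:point>37.3 -121.9</georss:point>"))
```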

slide-62
SLIDE 62

How do you store precision?

“Precision” is a hard thing to encode
Accuracy can be encoded with an error radius
Twitter opts for tracking the number of decimals passed

⇢ 140.0 != 140.00
⇢ DecimalTrackingFloat
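
The slide names DecimalTrackingFloat but does not show it. The sketch below is a guess at the idea only: keep the parsed value together with the number of decimals the client sent, so that the value can be compared and echoed back at the same precision. The field and method names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalTrackingFloat:
    """A float that remembers how many decimal places it was written with."""

    value: float
    decimals: int

    @classmethod
    def parse(cls, text: str) -> "DecimalTrackingFloat":
        """Parse plain decimal notation such as "37.3" or "-121.90"."""
        text = text.strip()
        _, _, fraction = text.partition(".")
        return cls(float(text), len(fraction))

    def __str__(self) -> str:
        # Render with exactly the precision that was passed in.
        return f"{self.value:.{self.decimals}f}"


a = DecimalTrackingFloat.parse("140.0")
b = DecimalTrackingFloat.parse("140.00")
print(a, b, a == b, a.value == b.value)
```

Equality compares both fields, so the two inputs stay distinct even though the underlying floats are equal.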

slide-63
SLIDE 63
slide-64
SLIDE 64
slide-65
SLIDE 65
slide-66
SLIDE 66

Ruby on Rails-ish frontend
Scala-based services backend
MySQL and soon to be Cassandra as the store
RPC to back-end or put items into queues

Twitter infrastructure

slide-67
SLIDE 67
slide-68
SLIDE 68

Simplified architecture

R-Tree for spatial lookup

⇢ Data provider for front-end lookups
⇢ Store place object with envelope of place in R-Tree

Mapping from ID to place object
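
A toy sketch of those two pieces, an ID map plus an envelope-first spatial lookup. It is not the actual service: a linear scan over envelopes stands in for the R-Tree, the exact test is a plain ray-casting check, and the class and method names are made up. The example place reuses the Willow Glen bounding box from the geo/search response earlier in the deck.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) = (long, lat)


@dataclass(frozen=True)
class Place:
    id: str
    name: str
    ring: Sequence[Point]  # polygon boundary; first vertex is not repeated

    @property
    def envelope(self) -> Tuple[float, float, float, float]:
        """Bounding box (min_x, min_y, max_x, max_y) of the polygon."""
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        return min(xs), min(ys), max(xs), max(ys)

    def contains(self, x: float, y: float) -> bool:
        """Ray casting: count boundary crossings of a ray going east."""
        inside = False
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


class PlaceStore:
    def __init__(self) -> None:
        self._by_id: Dict[str, Place] = {}

    def add(self, place: Place) -> None:
        self._by_id[place.id] = place

    def get(self, place_id: str) -> Place:
        return self._by_id[place_id]

    def contained_within(self, lat: float, lon: float) -> List[Place]:
        hits = []
        for place in self._by_id.values():  # an R-Tree would prune this scan
            min_x, min_y, max_x, max_y = place.envelope
            if min_x <= lon <= max_x and min_y <= lat <= max_y:
                if place.contains(lon, lat):
                    hits.append(place)
        return hits


store = PlaceStore()
store.add(Place("46bc64ecd1da2a46", "Willow Glen", [
    (-121.92481908, 37.275903), (-121.88083608, 37.275903),
    (-121.88083608, 37.31548203), (-121.92481908, 37.31548203),
]))
print([p.name for p in store.contained_within(37.3, -121.9)])
```

The envelope check is cheap and rejects most places; only the survivors pay for the exact polygon test.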

slide-69
SLIDE 69

Java Topology Suite (JTS)

http://www.vividsolutions.com/jts/jtshome.htm
Open source
Good for representing and manipulating “geometries”
Has support for fundamental geometric operations
⇢ contains
⇢ envelope
Has an R-Tree implementation

slide-70
SLIDE 70

pointInside in polygon? true
pointOutside in polygon? false

slide-71
SLIDE 71

at (0.0, 0.0)

  • - region 1

at (1.0, 1.0)

  • - region 1
  • - region 2

at (2.0, 2.0)

  • - region 1
  • - region 2

at (3.0, 3.0)

  • - region 2

at (4.0, 4.0)

  • - empty
slide-72
SLIDE 72

Java Topology Suite (JTS)

Serializers and deserializers

⇢ Well-known text (WKT)
⇢ Well-known binary (WKB)
⇢ No GeoRSS or GeoJSON support

slide-73
SLIDE 73

interface / RPC

RockDove is a backend service

⇢ Data provider for front-end lookups
⇢ Uses some form of RPC (Thrift, Avro, etc.) to communicate with
⇢ Data could be cached on frontend to prevent lookups

Simple RPC interface

⇢ get(id)
⇢ containedWithin(lat, long)

slide-74
SLIDE 74
slide-75
SLIDE 75

Interface / RPC

Watch those RPC queues!
Fail fast and potentially throw “over capacity” messages

⇢ get(id) throws OverCapacity
⇢ containedWithin(lat, long) throws OverCapacity
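
A minimal sketch of the fail-fast idea, assuming a bounded in-process queue in front of the workers: when the queue is full, the caller gets an OverCapacity error immediately instead of waiting. The class name BoundedRpcQueue and the queue size are made up; the real service's queueing is not shown in the slides.

```python
import queue
from typing import Any


class OverCapacity(Exception):
    """Raised instead of queueing when the backend is already saturated."""


class BoundedRpcQueue:
    def __init__(self, max_pending: int) -> None:
        self._pending: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)

    def submit(self, request: Any) -> None:
        """Enqueue a request, or fail immediately if the queue is full."""
        try:
            self._pending.put_nowait(request)
        except queue.Full:
            raise OverCapacity("too many pending requests") from None

    def take(self) -> Any:
        """Called by a worker to pick up the next request."""
        return self._pending.get_nowait()


rpc = BoundedRpcQueue(max_pending=2)
rpc.submit(("get", "46bc64ecd1da2a46"))
rpc.submit(("containedWithin", 37.3, -121.9))
try:
    rpc.submit(("get", "3b7dd0d93e661e18"))
    rejected = False
except OverCapacity:
    rejected = True
print("third request rejected:", rejected)
```

Rejecting early keeps latency bounded for the requests that are accepted, and gives the front-end a clear signal to back off or serve from cache.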

Distinguish between write path and read path

slide-76
SLIDE 76

georuby

http://georuby.rubyforge.org/
Open source
OpenGIS Simple Features Interface Standard
Only good for representing geometric entities

GeoRuby::SimpleFeatures::Geometry::from_ewkb

slide-77
SLIDE 77
slide-78
SLIDE 78

“front-end”

slide-79
SLIDE 79

where do you actually get location from?

slide-80
SLIDE 80

Triangulation: Cellular

200m to 1km accuracy
Measuring signal strength to cell towers with known locations
If only one cell tower is visible, fall back to cell tower identification - better than nothing, but really inaccurate

slide-81
SLIDE 81

Triangulation: Wifi

Sub 20m accuracy
Works indoors and in urban areas
Doesn’t need dedicated hardware, just an 802.11 radio
Relatively quick time to get a position

slide-82
SLIDE 82

Triangulation: GPS

Sub 1m accuracy
Need dedicated GPS hardware
Prone to multi-path confusion, especially in cities
Needs line of sight to the sky
Doesn’t work well indoors

slide-83
SLIDE 83

Association

IP address to geographical mapping
All done on the server side
Maybe “good” for city level
⇢ Maxmind has 83% at 40km
⇢ Very error prone
⇢ Gets wonky when dealing with cellular connections or rather large ISPs
Database needs to be refreshed fairly frequently

slide-84
SLIDE 84

Extraction

Read the text and understand intent
Hard to understand whether talking from a place, or about a place
Running text through a geocoder (Google, Yahoo, Geocoder.us)
Parsing structured URLs and then crawling “place pages”

slide-85
SLIDE 85

location in browser

Geolocation API Specification for JavaScript
navigator.geolocation.getCurrentPosition
Does a callback with a position object
position.coords has
⇢ latitude and longitude
⇢ accuracy
⇢ other stuff
Support in Firefox 3.5, Chrome 5, Opera 10.6, and others with Google Gears

slide-86
SLIDE 86
slide-87
SLIDE 87

Questions?

Follow me at twitter.com/raffi