Architecture of the Triposo travel guide, Douwe Osinga (@dosinga)


SLIDE 1

Architecture of the Triposo travel guide

Douwe Osinga (@dosinga)

Thursday, 17 October 13
SLIDE 2

The Company

SLIDE 3

Founded by ex-Googlers

SLIDE 4

In Sydney

SLIDE 5

Financed in California

SLIDE 6

Headquartered in Berlin

SLIDE 7

Distributed Team

SLIDE 8

Quarterly Jamborees

SLIDE 9

Last one on a bus in Spain

SLIDE 10

Algorithms are King

SLIDE 11

No Human Editing in our Guides

SLIDE 12

The Product

SLIDE 13

Our Mission:

Build the best travel guide

SLIDE 14

Mobile

SLIDE 15

Works offline

SLIDE 16

Covers the World

SLIDE 17

Smart

SLIDE 18

25,000 destinations around the world

500,000 points of interest

~5,000,000 downloads

SLIDE 19

The Platform

SLIDE 20

Big data

SLIDE 21

Companies with Big data

SLIDE 22

Do you have big data?

SLIDE 23

We didn’t.

SLIDE 24

Nice, but pricey

SLIDE 25

Our current server room

SLIDE 26

Coders are more expensive than Python code is slow

SLIDE 27

Our System Architecture

SLIDE 28

Building a database of the world

SLIDE 29

The Flow

[Diagram; labels: Wikipedia, OSM, Wikitravel, Dropbox, Google Spreadsheet, S3, Berlin, Uluru, Potala, Beansie, Crawlers, Pipeline, Snapshot, Buildguide]

SLIDE 30

Crawl 20 Sources

SLIDE 31

Split everything into POIs & Locs

SLIDE 32

Put everything back together

SLIDE 33

When are two things the same?

SLIDE 34

When they are: Similar in location and name

SLIDE 35

Wikipedia Wikitravel OSM

Geohashing
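
The slide names geohashing without showing how it is applied. As a rough sketch of the standard geohash encoding (not necessarily the code Triposo runs), the function below interleaves longitude and latitude bits and emits one base-32 character per five bits. The function name `geohash` and the default precision are assumptions made for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # number of bits collected for the current character
    value = 0       # the bits collected so far, as an integer
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash(57.64911, 10.40744, 5))  # u4pru
```

Records from different sources that share a geohash prefix fall in the same grid cell, so they can be grouped as merge candidates before any name comparison. Two nearby points can still land in adjacent cells, so a real matcher would presumably also look at neighbouring cells.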

SLIDE 36

Suroit Camping vs Camping le Suroite

SLIDE 37

Shingling! Suroit Camping vs Camping le Suroite

SLIDE 38

ampi, camp, itca, mpin, oitc, ping, roit, suro, tcam, uroi

vs

ampi, camp, esur, gles, ingl, lesu, mpin, ngle, oite, ping, roit, suro, uroi

60% Overlap
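
The shingle lists above can be reproduced with a few lines of Python. This is a sketch under assumptions: names are lowercased and stripped of everything but letters and digits, shingles are 4 characters long (as the lists suggest), and overlap is measured with the Dice coefficient. The slide does not say which overlap measure is used; Dice is chosen here because it gives about 0.61 for this pair, which is consistent with the "60%" on the slide.

```python
import re


def shingles(name, k=4):
    """Return the set of k-character substrings of the normalized name."""
    text = re.sub(r"[^a-z0-9]", "", name.lower())
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def overlap(a, b):
    """Dice coefficient of two shingle sets (an assumed measure)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


a = shingles("Suroit Camping")
b = shingles("Camping le Suroite")
print(sorted(a))
print(sorted(b))
print(round(overlap(a, b), 2))  # 0.61
```

Because shingles ignore word order, "Suroit Camping" and "Camping le Suroite" score high even though a plain string comparison would see them as very different.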

SLIDE 39

John’s Bar and Grill vs Restaurant John

SLIDE 40

Stop Shingles!
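
The slides do not define stop shingles. One simple reading, by analogy with stop words, is to discard the parts of a name that are too generic to identify a place. The sketch below does this at the word level: it drops generic words before shingling. The word list `STOP_WORDS`, the function names, and the Dice overlap are all assumptions made for illustration, not Triposo's actual implementation.

```python
import re

# Hypothetical list of generic words; a real list would likely be derived
# from how often each word or shingle occurs across all names.
STOP_WORDS = {"bar", "and", "grill", "restaurant", "cafe", "hotel"}


def name_shingles(name, k=4, stop_words=frozenset()):
    """Shingle a name after removing apostrophes and any stop words."""
    cleaned = re.sub(r"['’]", "", name.lower())
    words = [w for w in re.findall(r"[a-z0-9]+", cleaned) if w not in stop_words]
    text = "".join(words)
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def dice(a, b):
    """Dice coefficient of two shingle sets."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


x, y = "John’s Bar and Grill", "Restaurant John"
print(round(dice(name_shingles(x), name_shingles(y)), 2))  # 0.08
print(round(dice(name_shingles(x, stop_words=STOP_WORDS),
                 name_shingles(y, stop_words=STOP_WORDS)), 2))  # 0.67
```

With the generic words removed, only "johns" and "john" are compared, so the two names go from almost no overlap to a clear match.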

SLIDE 41

Haarlem Library vs Haarlem City Hall

SLIDE 42

Location Stop Shingles

SLIDE 43

Van Gogh Museum vs Van Gogh Hotel

SLIDE 44

Types from names
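
A minimal sketch of how a type might be guessed from a name and then used to block a merge, assuming a small hand-written keyword table. `TYPE_KEYWORDS`, `guess_type`, and `types_conflict` are hypothetical names; the slides do not show how Triposo derives types.

```python
# Hypothetical keyword table mapping a word in the name to a place type.
TYPE_KEYWORDS = {
    "museum": "museum",
    "hotel": "hotel",
    "hostel": "hotel",
    "cafe": "cafe",
    "restaurant": "restaurant",
    "library": "library",
}


def guess_type(name):
    """Return the type of the first keyword found in the name, or None."""
    for word in name.lower().split():
        if word in TYPE_KEYWORDS:
            return TYPE_KEYWORDS[word]
    return None


def types_conflict(name_a, name_b):
    """True when both names yield a type and the two types differ."""
    type_a, type_b = guess_type(name_a), guess_type(name_b)
    return type_a is not None and type_b is not None and type_a != type_b


print(types_conflict("Van Gogh Museum", "Van Gogh Hotel"))  # True
print(types_conflict("Cafe Sydney Opera House", "Sydney Opera House"))  # False
```

The second call shows the limit of this check: "Sydney Opera House" carries no type keyword, so the cafe inside it is not separated from the building by type alone.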

SLIDE 45

Cafe Sydney Opera House vs Sydney Opera House

SLIDE 46

Hmm

SLIDE 47

Side effect: Cuisine guesser

SLIDE 48

Making data useful

SLIDE 49

Some further processing...

SLIDE 50

Learning from pictures

SLIDE 51

Learning from Wikipedia

SLIDE 52

Wikipedia article distribution

SLIDE 53

Learning from ratings?

SLIDE 54

No Soup for you

SLIDE 55

Opinion Mining

SLIDE 56

Natural Language Processing

SLIDE 57

If you count RegExps...

SLIDE 58

http://labs.triposo.com

Further experiments

SLIDE 59

Can I suggest something?

SLIDE 60

The Weather

SLIDE 61

Your location

SLIDE 62

Time

SLIDE 63

Personalization

SLIDE 64

Personalization

SLIDE 65

Personalization

SLIDE 66

Conclusion

SLIDE 67

Conclusion

If you can’t solve a problem in 50 lines of Python running on a server in the kitchen, it must be really hard.

SLIDE 68

Thanks!

Douwe Osinga (@dosinga)
