Nov 7, Douwe Osinga, @dosinga Smart Big Data What is Smart Travel



slide-1
SLIDE 1

Nov 7, Douwe Osinga, @dosinga

slide-2
SLIDE 2

Smart Big Data

slide-3
SLIDE 3

What is

slide-4
SLIDE 4

Smart Travel Guides

slide-5
SLIDE 5

Algorithm based.

slide-6
SLIDE 6

Covering the entire world

slide-7
SLIDE 7

Suggestions

slide-8
SLIDE 8

Nearby

slide-9
SLIDE 9

Start with the web

slide-10
SLIDE 10

Put it back together

slide-11
SLIDE 11

Push it to the users

slide-12
SLIDE 12

Stuff nearby

slide-13
SLIDE 13

Weather based

slide-14
SLIDE 14

Weather data

slide-15
SLIDE 15

Usage on a sunny day

slide-16
SLIDE 16

Pictures on a rainy day

slide-17
SLIDE 17

Weather suggestions

slide-18
SLIDE 18

Time based

slide-19
SLIDE 19

Users keep time

slide-20
SLIDE 20

Spider the web at large

slide-21
SLIDE 21

Opinion mining

slide-22
SLIDE 22

Time based

slide-23
SLIDE 23

Done!

slide-24
SLIDE 24

@dosinga

Thanks!

slide-25
SLIDE 25

Building data-intensive services (aka. immutability and idempotence)

@knutin GameAnalytics

slide-26
SLIDE 26

Instrument your game to send events on user actions, such as login, purchase, level up, etc.

  • Analyse game performance with UI.
  • Improve game.
slide-27
SLIDE 27

[Architecture diagram. Labels: User, SDK, Collection API, Log, Stream analytics, Funnels, …]

slide-28
SLIDE 28
slide-29
SLIDE 29
  • 15M devices daily
  • 3B events per day (35k per second)
  • 750 GB uncompressed
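
The figures above can be cross-checked with a little arithmetic. The sketch below assumes decimal gigabytes and an even spread of events over the day, neither of which the slide states; the variable names are illustrative only.

```python
# Figures taken from the slide.
DEVICES_PER_DAY = 15_000_000
EVENTS_PER_DAY = 3_000_000_000
BYTES_PER_DAY = 750 * 10**9  # assumption: 1 GB = 10^9 bytes

SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if events arrived evenly over 24 hours (real traffic will peak higher).
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Average uncompressed size of one event.
bytes_per_event = BYTES_PER_DAY / EVENTS_PER_DAY

# Average number of events sent by one device per day.
events_per_device = EVENTS_PER_DAY / DEVICES_PER_DAY

print(f"{events_per_second:,.0f} events/s")
print(f"{bytes_per_event:.0f} bytes/event")
print(f"{events_per_device:.0f} events/device/day")
```

The average rate comes out just under 35k events per second, which matches the rounded figure on the slide.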
slide-30
SLIDE 30

Lesson 1

Store events in a log (immutability)

slide-31
SLIDE 31

Lesson 1

  • Log: immutable, write by appending
  • Split producer & consumers
  • High-availability write path (S3)

[Diagram: log with entries 1 2 3 4]

slide-32
SLIDE 32

Lesson 1

  • Log: immutable, write by appending
  • Split producer & consumers
  • High-availability write path (S3)

[Diagram: log with entries 1 2 3 4 5; the producer appends entry 5]

slide-33
SLIDE 33

Lesson 1

  • Log: immutable, write by appending
  • Split producer & consumers
  • High-availability write path (S3)

[Diagram: log with entries 1 2 3 4 5; a consumer reads existing entries while the producer appends entry 5]
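
A minimal sketch of the idea on this slide, assuming an in-memory list in place of the real log (the talk mentions S3 for the write path). `AppendOnlyLog` and `Consumer` are hypothetical names for illustration, not GameAnalytics code.

```python
class AppendOnlyLog:
    """In-memory stand-in for an immutable, append-only event log."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        """Add an event at the end and return its offset; earlier entries never change."""
        self._entries.append(event)
        return len(self._entries) - 1

    def read(self, start, end=None):
        """Return a copy of the entries in [start, end)."""
        return list(self._entries[start:end])

    def __len__(self):
        return len(self._entries)


class Consumer:
    """Reads the log at its own pace and tracks only its own offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return every entry appended since the last poll."""
        events = self.log.read(self.offset)
        self.offset += len(events)
        return events


log = AppendOnlyLog()
for n in (1, 2, 3, 4):
    log.append(n)

consumer = Consumer(log)
first_batch = consumer.poll()    # the consumer catches up to entry 4
log.append(5)                    # the producer appends without coordinating with the consumer
second_batch = consumer.poll()   # the consumer picks up entry 5 later
```

Because the producer only appends and each consumer only remembers an offset, either side can stall or restart without affecting the other.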

slide-34
SLIDE 34

Lesson 2

If you mess up, redo it (idempotency)

slide-35
SLIDE 35

Lesson 2

slide-36
SLIDE 36

Lesson 2

  • get_checkpoint() returns “2014-10-01”
slide-37
SLIDE 37

Lesson 2

  • get_checkpoint() returns “2014-10-01”
  • Process events from log offset “2014-10-01” to log offset “2014-10-02”

slide-38
SLIDE 38

Lesson 2

  • get_checkpoint() returns “2014-10-01”
  • Process events from log offset “2014-10-01” to log offset “2014-10-02”
  • When all messages for 2014-10-01 are processed, write to DB, overwriting any existing data (idempotence)

slide-39
SLIDE 39

Lesson 2

  • get_checkpoint() returns “2014-10-01”
  • Process events from log offset “2014-10-01” to log offset “2014-10-02”
  • When all messages for 2014-10-01 are processed, write to DB, overwriting any existing data (idempotence)

  • set_checkpoint(“2014-10-02”)
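
The four steps can be sketched as below. This is an illustration under stated assumptions, not the speaker's implementation: `LOG`, `DB` and `STATE` are hypothetical in-memory stand-ins for the log, the results database and the checkpoint store, the checkpoint is a `date` rather than the string on the slide, and the per-day aggregate is assumed to be a simple event count.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical stand-ins for the real log, results database and checkpoint store.
LOG = {
    date(2014, 10, 1): ["login", "purchase", "login"],
    date(2014, 10, 2): ["login"],
}
DB = {}
STATE = {"checkpoint": date(2014, 10, 1)}


def get_checkpoint():
    return STATE["checkpoint"]


def set_checkpoint(day):
    STATE["checkpoint"] = day


def process_next_day():
    """Aggregate one day of events and advance the checkpoint."""
    day = get_checkpoint()
    next_day = day + timedelta(days=1)

    # Process all events between the checkpoint and the next day.
    counts = Counter(LOG.get(day, []))

    # Overwrite the stored result instead of incrementing it,
    # so running this step twice leaves the same data in the DB.
    DB[day] = dict(counts)

    # Advance the checkpoint only after the write has succeeded.
    set_checkpoint(next_day)
    return day


process_next_day()
after_first_run = dict(DB)

# Simulate a crash after the DB write but before the checkpoint was saved:
# the checkpoint still points at 2014-10-01, so the day is processed again.
set_checkpoint(date(2014, 10, 1))
process_next_day()
after_second_run = dict(DB)
```

Had the write incremented counters instead of overwriting them, the replay would have double-counted the day.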
slide-40
SLIDE 40

Where can I get one?

  • Apache Samza!
  • Does everything we do and much more
  • Released after we went live … :/
slide-41
SLIDE 41

Thank you

slide-42
SLIDE 42

Q & A

slide-43
SLIDE 43

Building data-intensive services (aka. immutability and idempotence)

@knutin GameAnalytics

slide-44
SLIDE 44

(c) 2014 streamdrill Mikio Braun Why real-time analytics don't have to be exact

slide-45
SLIDE 45

Why you don’t want your realtime analytics to be exact

Mikio Braun, TU Berlin/streamdrill @mikiobraun GOTO Berlin, Nov 7, 2014

slide-46
SLIDE 46

Analyzing User Interaction

Scale

slide-47
SLIDE 47

What can we do besides scaling? Approximate? But is that ok? Do we really want our analytics to be exact?

slide-48
SLIDE 48

Why you don't want your real-time analytics to be exact

  • 1. Results are changing all the time anyway.
  • 2. You can't have exactness, real-time, and big data at the same time (or it costs a lot).
  • 3. Exactness is often not necessary.
  • 4. You probably already have a batch system in place.

slide-49
SLIDE 49

Reason 2: You can't have exactness, real-time, and big data at the same time (or it costs a lot)

[Diagram labels: Exactness, Real-Time, Big Data]

Source: http://www.slideshare.net/acunu/realtime-analytics-with-casaandra

slide-50
SLIDE 50

Why you don't want your real-time analytics to be exact

  • 1. Results are changing all the time anyway.
  • 2. You can't have exactness, real-time, and big data at the same time (or it costs a lot).
  • 3. Exactness is often not necessary.
  • 4. You probably already have a batch system in place.

slide-51
SLIDE 51

Reason 3: Exactness is often not necessary

slide-52
SLIDE 52

Why you don't want your real-time analytics to be exact

  • 1. Results are changing all the time anyway.
  • 2. You can't have exactness, real-time, and big data at the same time (or it costs a lot).
  • 3. Exactness is often not necessary.
  • 4. You probably already have a batch system in place.

slide-53
SLIDE 53

streamdrill

  • Core Engine
    – approximate counting and trends
    – rolling time windows based on exponential decay
    – secondary indices
  • Features
    – true real-time, low latency (ms)
    – Dashboard & REST interface
    – about 20 events/sec, track 1M objects/1GB RAM
  • Applications
    – real-time user profiling
    – recommendation
    – ...
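
To illustrate what a rolling time window based on exponential decay can look like, here is a minimal sketch of a decaying counter. It is a generic textbook construction, not streamdrill's actual engine; `DecayingCounter` and its `half_life_seconds` parameter are hypothetical names, and timestamps are assumed to arrive in non-decreasing order.

```python
import math


class DecayingCounter:
    """Counter whose value halves every `half_life_seconds` without new events.

    Only the current value and the last update time are stored, so memory use
    per tracked object is constant however many events arrive.
    """

    def __init__(self, half_life_seconds):
        self.decay_rate = math.log(2) / half_life_seconds
        self.value = 0.0
        self.last_update = None

    def _decay_to(self, now):
        # Shrink the stored value by the time elapsed since the last update.
        if self.last_update is not None:
            elapsed = now - self.last_update
            self.value *= math.exp(-self.decay_rate * elapsed)
        self.last_update = now

    def add(self, now, amount=1.0):
        """Record an event at time `now` (seconds)."""
        self._decay_to(now)
        self.value += amount

    def read(self, now):
        """Return the decayed count as of time `now` (seconds)."""
        self._decay_to(now)
        return self.value


counter = DecayingCounter(half_life_seconds=60.0)
counter.add(now=0.0)
counter.add(now=0.0)
score_after_one_half_life = counter.read(now=60.0)  # two events, one half-life later
```

Old events fade out gradually instead of dropping out at a hard window boundary, which is why no per-event history has to be kept.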

slide-54
SLIDE 54

Dashboard

slide-55
SLIDE 55

Trend view

slide-56
SLIDE 56

Real-time Recommendation at serienjunkies.de

slide-57
SLIDE 57

Realtime User Profiles

slide-58
SLIDE 58

Realtime User Profiles

slide-59
SLIDE 59

Realtime User Profiles

slide-60
SLIDE 60

Summary

  • real-time doesn't have to be exact
  • streamdrill: real-time analytics platform
  • Contact us at info@streamdrill.com if you're interested in
    – real-time profiling
    – real-time recommendation
    – anything else real-time related!

slide-61
SLIDE 61
