Building a relevance platform with Couchbase and Elasticsearch

SLIDE 1

OneHippo @ Goto follow the Hippo trail

Building a relevance platform with Couchbase and Elasticsearch

@jreijn | Hippo #gotoams, June 18

Tuesday, 18 June 2013

SLIDE 2

About me

  • Architect @ Hippo
  • DevOps guy
  • Blogger @ http://blog.jeroenreijn.com

SLIDE 3

About Hippo

SLIDE 4

Relevance?

SLIDE 5

“The capability of a search engine or function to retrieve data appropriate to a user's needs.”

http://www.thefreedictionary.com/relevance

SLIDE 6

SLIDE 7

How we deliver relevant content @Hippo

SLIDE 8

Registration

  • Visitor: entity making HTTP requests
  • Collector: records data about a visitor or his behavior
    Example: location collector (GeoIPCollector)
  • Targeting Data: all data about a specific visitor
    Example: IP address is located in Amsterdam
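The registration terms above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Collector` protocol, the `register` function, and the in-memory IP lookup table are hypothetical stand-ins, not Hippo's actual API. Only the `GeoIPCollector` name and the `remoteAddr` field come from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


class Collector(Protocol):
    """Records one kind of data about a visitor (hypothetical interface)."""

    collector_id: str

    def collect(self, request: Dict[str, Any]) -> Dict[str, Any]:
        ...


@dataclass
class GeoIPCollector:
    """Looks up a city for the request's IP address.

    The dict passed in stands in for a real GeoIP database.
    """

    ip_to_city: Dict[str, str]
    collector_id: str = "geo"

    def collect(self, request: Dict[str, Any]) -> Dict[str, Any]:
        return {"city": self.ip_to_city.get(request["remoteAddr"], "")}


@dataclass
class Visitor:
    """An entity making HTTP requests, plus everything collected about it."""

    visitor_id: str
    targeting_data: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def register(visitor: Visitor, request: Dict[str, Any],
             collectors: List[Collector]) -> Visitor:
    """Run every collector and store its result under the collector's id."""
    for collector in collectors:
        visitor.targeting_data[collector.collector_id] = collector.collect(request)
    return visitor


geo = GeoIPCollector({"203.0.113.7": "Amsterdam"})
visitor = register(Visitor("v1"), {"remoteAddr": "203.0.113.7"}, [geo])
```

After the call, `visitor.targeting_data` holds one entry per collector, which mirrors the per-collector layout of the visitor document shown later in the deck.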

SLIDE 9

Matching

  • Characteristic: a type of fact about visitors
    Example: "comes from a city", "experiences a type of weather"
  • Target Group: the specification of a Characteristic
    Example: "comes from a European city", "comes from Amsterdam"
  • Persona: one or more target groups that describe a certain type of visitor
    Example: "Jim, the European urban consumer", "Alice, the Pet owner"
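A minimal sketch of how these three concepts could fit together is shown below. The scoring rule (the fraction of a persona's target groups that match) is an assumption made for illustration; the slides do not say how Hippo computes persona scores. The class names, the `score` function, and the city set are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

TargetingData = Dict[str, Dict[str, Any]]


@dataclass
class TargetGroup:
    """A concrete specification of one characteristic, e.g. 'comes from Amsterdam'."""

    name: str
    characteristic: str  # key into the visitor's targeting data, e.g. "geo"
    matches: Callable[[Dict[str, Any]], bool]


@dataclass
class Persona:
    """One or more target groups describing a type of visitor."""

    name: str
    target_groups: List[TargetGroup]


def score(persona: Persona, data: TargetingData) -> float:
    """Assumed rule: share of the persona's target groups the visitor matches."""
    if not persona.target_groups:
        return 0.0
    hits = sum(
        1 for group in persona.target_groups
        if group.matches(data.get(group.characteristic, {}))
    )
    return hits / len(persona.target_groups)


EUROPEAN_CITIES = {"Amsterdam", "Paris", "Berlin"}  # illustrative only

from_european_city = TargetGroup(
    name="comes from a European city",
    characteristic="geo",
    matches=lambda geo: geo.get("city") in EUROPEAN_CITIES,
)
jim = Persona("Jim, the European urban consumer", [from_european_city])

amsterdam_score = score(jim, {"geo": {"city": "Amsterdam"}})
boston_score = score(jim, {"geo": {"city": "Boston"}})
```

Keeping the match rule as a plain callable means a new characteristic only needs a new collector key and a predicate, with no change to the scoring code.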

SLIDE 10

What do we store?

  • Request log
  • Targeting data
  • Statistics
    Averages, e.g. how many visitors became which persona
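As a rough illustration of the statistics item, the share of visitors per persona can be computed with a counter. The input shape (one persona id per visitor id) is an assumption; the slides do not describe how the statistics are actually stored or computed.

```python
from collections import Counter
from typing import Dict


def persona_shares(assignments: Dict[str, str]) -> Dict[str, float]:
    """Return the fraction of visitors assigned to each persona.

    `assignments` maps a visitor id to a persona id (hypothetical shape).
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {persona: n / total for persona, n in counts.items()}


shares = persona_shares({"v1": "jim", "v2": "alice", "v3": "jim", "v4": "jim"})
```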

SLIDE 11

Real-time analysis

SLIDE 12

Architecture

SLIDE 13

[Diagram: an app server running the Hippo Delivery Tier and the Hippo Repository, backed by an RDBMS. Output formats: XML, JSON, (X)HTML.]

SLIDE 14

Delivery Tier

Request → URL Matching → Fetch content → Compose output → Response
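The three delivery stages can be pictured as a small function pipeline. Everything in this sketch is hypothetical (the route table, the content store, the HTML output); it only shows the order of the stages named on the slide.

```python
from typing import Dict, Optional

# Hypothetical route table and content store, for illustration only.
ROUTES: Dict[str, str] = {"/news": "news-overview"}
CONTENT: Dict[str, Dict[str, str]] = {"news-overview": {"title": "News"}}


def match_url(path_info: str) -> Optional[str]:
    """Stage 1: map the request path to a content id."""
    return ROUTES.get(path_info)


def fetch_content(content_id: str) -> Dict[str, str]:
    """Stage 2: load the content for that id."""
    return CONTENT[content_id]


def compose_output(content: Dict[str, str]) -> str:
    """Stage 3: render the content into the response body."""
    return "<h1>{}</h1>".format(content["title"])


def handle(path_info: str) -> str:
    """Run the stages in order; unknown paths short-circuit."""
    content_id = match_url(path_info)
    if content_id is None:
        return "404 Not Found"
    return compose_output(fetch_content(content_id))


news_page = handle("/news")
```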

SLIDE 15

Delivery Tier

[Diagram: the request pipeline between Request and Response, extended with targeting. Stages shown:]

  • URL Matching
  • Targeting Data Collection
  • Fetch content
  • Scoring
  • Compose output

SLIDE 16

Scaling

SLIDE 17

Scaling out

[Diagram: two app servers, each running a Hippo Delivery Tier and a Hippo Repository, in front of one RDBMS.]

SLIDE 18

Scaling out

[Diagram: two app servers, each running a Delivery Tier and a Repository, in front of one RDBMS, with a separate Targeting Datastore added.]

SLIDE 19

What kind of ‘storage’?

SLIDE 20

Distributed Cache?

SLIDE 21

We have a winner!

SLIDE 22

Requirements change!

SLIDE 23

NoSQL to the rescue

SLIDE 24

Suitable types

  • Key-value store
  • Document database

SLIDE 25

Assessment Criteria

  • Maturity
  • Data model
  • Consistency model
  • Performance
  • Replication
  • Caching model
  • Query model
  • Monitoring
  • Scalability
  • Reliability
  • Support

SLIDE 26

Selection Criteria

  • Performance!
  • Scalability
  • Schema flexibility
  • Simplicity
  • Monitoring
  • Support

SLIDE 27

Performance !!

SLIDE 28

Scalability

SLIDE 29

Schema flexibility

SLIDE 30

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}

Request log document
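To show what schema flexibility means for a consumer of such documents, here is a small Python sketch that reads a request log document defensively: optional parts fall back to defaults instead of failing. The `summarize` helper and its output shape are hypothetical, and treating `timestamp` as milliseconds since the Unix epoch is an assumption based on its magnitude. The field names and values are taken from the document above.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# A subset of the request log document from the slide, as a Python dict.
REQUEST_LOG: Dict[str, Any] = {
    "visitorId": "7a1c7e75-8539-40",
    "pathInfo": "/news",
    "timestamp": 1371419505909,
    "collectorData": {
        "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
        "returningvisitor": False,
        "channel": "English Website",
    },
    "personaIdScores": [],
}


def summarize(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Extract a few fields, tolerating documents without collector data."""
    collector_data = doc.get("collectorData", {})
    geo = collector_data.get("geo", {})
    # Assumption: the timestamp is in milliseconds since the Unix epoch.
    when = datetime.fromtimestamp(doc["timestamp"] / 1000, tz=timezone.utc)
    return {
        "visitorId": doc["visitorId"],
        "path": doc["pathInfo"],
        "city": geo.get("city") or None,  # empty string means "unknown"
        "channel": collector_data.get("channel"),
        "day": when.date().isoformat(),
    }


summary = summarize(REQUEST_LOG)
```

Because every optional field is read with a default, a document written before a new collector was added still passes through `summarize` unchanged.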

SLIDE 31

{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}

Visitor document

SLIDE 32

Simplicity

SLIDE 33

Monitoring

SLIDE 34

Support

SLIDE 35

Couchbase

SLIDE 36

Why Couchbase?

  • Drop-in replacement for memcached
  • Read/Write-through cache
  • High throughput
  • Easy scalability
  • Schema flexibility
  • Low latency
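The read/write-through idea from the list above can be illustrated with a few lines of Python. This is a sketch of the general caching pattern only, not of Couchbase internals; the class name and the plain dict used as the backing store are assumptions.

```python
from typing import Any, Dict, Optional


class ReadWriteThroughCache:
    """Cache that sits in front of a backing store.

    Reads fall through to the store on a miss and populate the cache;
    writes go to the store and the cache in the same call.
    """

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = store          # stand-in for a database
        self._cache: Dict[str, Any] = {}
        self.store_reads = 0         # counts cache misses, for demonstration

    def get(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        self.store_reads += 1
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value     # write-through: persist first
        self._cache[key] = value     # then keep the cache consistent


store = {"visitor:v1": {"city": "Amsterdam"}}
cache = ReadWriteThroughCache(store)
first = cache.get("visitor:v1")   # miss: read from the store
second = cache.get("visitor:v1")  # hit: served from the cache
cache.set("visitor:v2", {"city": "Paris"})
```

The application only ever talks to the cache object, which is the property that makes a store with a built-in cache layer attractive as a memcached replacement.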

SLIDE 37

Couchbase

  • Open Source
  • Document-oriented
  • Easily scalable
  • Consistently high performance

SLIDE 38

Performance

  • Object managed cache
  • Write Queue to disk
  • Avoids Cold Cache
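The combination of a managed in-memory cache and a write queue to disk can be sketched as a write-behind store: a write is visible in memory immediately and reaches disk later, when the queue is drained. The class below is a simplified illustration of that pattern with hypothetical names; it does not reproduce how Couchbase implements persistence.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional


class WriteBehindStore:
    """In-memory store whose writes are persisted asynchronously."""

    def __init__(self) -> None:
        self.memory: Dict[str, Any] = {}
        self.disk: Dict[str, Any] = {}       # stand-in for on-disk storage
        self._queue: Deque[str] = deque()    # keys waiting to be persisted

    def set(self, key: str, value: Any) -> None:
        """Acknowledge the write from memory and queue it for disk."""
        self.memory[key] = value
        self._queue.append(key)

    def get(self, key: str) -> Optional[Any]:
        """Reads are always served from memory."""
        return self.memory.get(key)

    def flush(self) -> int:
        """Drain the queue to disk; returns the number of queued writes handled."""
        written = 0
        while self._queue:
            key = self._queue.popleft()
            self.disk[key] = self.memory[key]
            written += 1
        return written


wb = WriteBehindStore()
wb.set("request:1", {"pathInfo": "/news"})
visible_before_flush = wb.get("request:1")
on_disk_before_flush = "request:1" in wb.disk
flushed = wb.flush()
```

The trade-off of this design is latency versus durability: a write that has been acknowledged but not yet flushed would be lost if the process died at that moment.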

SLIDE 39

Easily scalable

  • Auto sharding
  • Cross cluster replication (XDCR)
  • Master - Master replication
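Auto sharding can be pictured as hashing each document key into a fixed number of partitions and mapping partitions to nodes. The sketch below is a simplified illustration: the plain CRC32-modulo hash, the partition count of 1024, the round-robin assignment, and the node names are all assumptions made for the example and should not be read as Couchbase's exact algorithm.

```python
import zlib
from typing import List

NUM_PARTITIONS = 1024  # illustrative partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash a document key into a partition number (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_partition_map(nodes: List[str], num_partitions: int) -> List[str]:
    """Assign partitions to nodes round-robin (hypothetical strategy)."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def node_for(key: str, partition_map: List[str]) -> str:
    """Clients look up the owning node locally, without asking a coordinator."""
    return partition_map[partition_for(key, len(partition_map))]


two_nodes = build_partition_map(["node-a", "node-b"], NUM_PARTITIONS)
owner = node_for("7a1c7e75-8539-40", two_nodes)
```

Because keys map to partitions and only the partition-to-node table changes when a node is added, rebalancing moves whole partitions rather than rehashing every key.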

SLIDE 40

Flexible data model

  • Native JSON support
  • Incremental Map Reduce
  • Gives power to the developer
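Incremental map-reduce means the index is updated per changed document instead of being recomputed over the whole data set. The Python class below emulates that idea in memory. It is a sketch under assumptions: Couchbase views are defined as JavaScript map and reduce functions on the server, and the `IncrementalView` class and `map_channel` function here are hypothetical. The `collectorData.channel` field comes from the request log document in this deck.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, int]


def map_channel(doc: Dict[str, Any]) -> Iterable[Row]:
    """Map step: emit (channel, 1) for documents that have a channel."""
    channel = doc.get("collectorData", {}).get("channel")
    if channel:
        yield channel, 1


class IncrementalView:
    """Keeps per-key sums up to date as single documents change."""

    def __init__(self, map_fn: Callable[[Dict[str, Any]], Iterable[Row]]) -> None:
        self._map_fn = map_fn
        self._emitted: Dict[str, List[Row]] = {}   # rows last emitted per doc id
        self.totals: Dict[str, int] = defaultdict(int)

    def upsert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # Undo the rows this document contributed before, if any.
        for key, value in self._emitted.pop(doc_id, []):
            self.totals[key] -= value
        # Apply the rows for the new version of the document.
        rows = list(self._map_fn(doc))
        for key, value in rows:
            self.totals[key] += value
        self._emitted[doc_id] = rows


view = IncrementalView(map_channel)
view.upsert("r1", {"collectorData": {"channel": "English Website"}})
view.upsert("r2", {"collectorData": {"channel": "English Website"}})
view.upsert("r1", {"collectorData": {"channel": "Dutch Website"}})  # r1 changes
```

Only the rows belonging to the changed document are touched on each `upsert`, which is what keeps the cost of maintaining the view proportional to the change rather than to the data set.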

SLIDE 41

How we run Couchbase @Hippo

SLIDE 42

[Diagram: a load balancer in front of the Hippo Delivery Tier, backed by a database cluster and a Couchbase cluster.]

Couchbase cluster

  • Request log data
  • Targeting data
  • Statistics data

SLIDE 43

Query capabilities

  • Querying via views
  • Secondary indexes via views
  • Views based on Map - Reduce
  • Lacks some advanced query capabilities

SLIDE 44

Elasticsearch

  • Apache Lucene
  • Designed to be distributed
  • Schema free
  • Apache 2 licensed
  • RESTful API

SLIDE 45

Added value of ES

  • Full text search
  • Faceted search
  • Geo spatial search
  • All in (near) real-time
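The three search features above can be combined in a single Elasticsearch request body. The sketch below builds such a body as a Python dict. Several things are assumptions: it uses the `bool` / `geo_distance` / `aggs` query DSL of later Elasticsearch versions rather than the facets API that was current when this talk was given, it assumes a `location` field mapped as `geo_point` (the documents in this deck store separate `latitude` and `longitude` values), and it assumes `collectorData.channel` is indexed as an exact-value field. The `build_query` helper is hypothetical.

```python
import json
from typing import Any, Dict


def build_query(text: str, lat: float, lon: float,
                distance: str = "25km") -> Dict[str, Any]:
    """Full-text match, restricted by distance, with counts per channel."""
    return {
        "query": {
            "bool": {
                # Full text search on the page URL.
                "must": [{"match": {"pageUrl": text}}],
                # Geospatial restriction; `location` is an assumed geo_point field.
                "filter": [{
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }],
            }
        },
        # Faceted search: number of matching requests per channel.
        "aggs": {"by_channel": {"terms": {"field": "collectorData.channel"}}},
        "size": 10,
    }


query = build_query("news", lat=52.37, lon=4.89)
body = json.dumps(query)  # what would be sent to the search endpoint
```

The resulting JSON would be posted to an index's `_search` endpoint over the RESTful API; the index name depends on how replication from Couchbase is configured.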

SLIDE 46

Replicating to ES

[Diagram: the Hippo Delivery Tier reads and writes through the Java API. The Couchbase Server Cluster replicates documents over XDCR to the Elasticsearch Server Cluster via the Couchbase ES Transport plugin.]

SLIDE 47

Demo time!

SLIDE 48

What’s Next?

SLIDE 49

Advanced analytics

SLIDE 50

Thank you! Questions?

j.reijn@onehippo.com @jreijn

  • P.S. We’re hiring!
