Yokozuna NoSQL Search Amsterdam 2013 Me What is Yokozuna? Source: - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Yokozuna

NoSQL Search Amsterdam 2013

slide-2
SLIDE 2

Me

slide-3
SLIDE 3

What is Yokozuna?

slide-4
SLIDE 4

Source: http://katrinainjapan.files.wordpress.com/2013/08/yokozuna.jpg

slide-5
SLIDE 5

Sumo Wrestling Term

“Horizontal rope. The top rank in sumo, usually translated Grand Champion. The name comes from the rope a yokozuna wears for the dohyō-iri.”

Source: http://en.wikipedia.org/wiki/Glossary_of_sumo_terms

slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8

Riak

  + Amazing KV Store
  + Distributed
  + Highly Available
  + Easily Scalable
  + Self Healing
  + Open Source

slide-9
SLIDE 9

Consistent Hashing
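
As a rough illustration of the idea on this slide, the sketch below hashes a bucket/key pair onto a ring that is split into equal partitions, then picks consecutive partitions for the replicas. The ring size, the way bucket and key are joined before hashing, and the function names are assumptions made for readability; Riak's own hashing and claim logic differ in detail.

```python
import hashlib

RING_SIZE = 8        # number of partitions (assumed small for readability)
RING_TOP = 2 ** 160  # size of the SHA-1 hash space


def partition_for(bucket, key):
    """Map a bucket/key pair to a partition index on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_TOP // RING_SIZE)


def preference_list(bucket, key, n_val=3):
    """Return the n_val consecutive partitions that would hold replicas."""
    first = partition_for(bucket, key)
    return [(first + i) % RING_SIZE for i in range(n_val)]
```

Because the partition depends only on the hash of the bucket/key pair, any node can compute where a key lives without consulting a central directory.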

slide-10
SLIDE 10

Replication

slide-11
SLIDE 11

x

Self Healing

x x x x x x x

slide-12
SLIDE 12

Riak Questions?

slide-13
SLIDE 13

Riak

  • Limited Query Ability
  • Query Performance
  • Index Entropy Repair
  • Limited Full Text Search
slide-14
SLIDE 14
slide-15
SLIDE 15

Solr

Not Solr Cloud

  + Amazing Query Support
  + Robust Inverted Index
  + Near Real-time Indexing
  + Sophisticated Analyzers
  + Language Support
  + Features: facets, highlighting, storing, sorting
  + Gold Standard

slide-16
SLIDE 16

Solr

Not Solr Cloud

  • HA is secondary to search
  • Manual everything
  • No entropy
  • Key value
slide-17
SLIDE 17
slide-18
SLIDE 18

Combine FTW

  • Amazing KV Store
  • Distributed
  • Highly Available
  • Easily Scalable
  • Self Healing
  • Amazing Query Support
  • Sophisticated Analyzers

  • Language Support
  • Great Features
slide-19
SLIDE 19

Why Yokozuna?

What about Riak Search?

slide-20
SLIDE 20

Riak Search

  + Term-based sometimes better
  + Pure Erlang
  + Relatively small code base

slide-21
SLIDE 21

Riak Search

  • Large result sets (> 100k)
  • Memory pressure
  • Lack of facet query
  • Language support
  • Basic analyzers
  • Entropy & Repair
slide-22
SLIDE 22

Integrate Search

  • Riak Search & Basho can’t keep pace with Lucene/Solr

  • Don’t re-invent the search
  • Basho’s strength is distributed databases
slide-23
SLIDE 23

What About 2i?

  • Query one index [field] at a time
  • No notion of ranking
  • Range and exact term only
  • Must use leveldb or memory
  • No full text search
  • Basic types - string and int
slide-24
SLIDE 24

Goals of Yokozuna

slide-25
SLIDE 25

Goals of Yokozuna

  • Provide robust query against KV data
  • Require minimal work from user
  • Don’t concern user with distribution
  • Replace Riak Search (and then some)
slide-26
SLIDE 26

How does it work?

slide-27
SLIDE 27

Yokozuna

  • Erlang application like Riak KV
  • Erlang supervisor for Solr process
slide-28
SLIDE 28

Solr & JVM

  • Configurable jvm_args in riak.conf

yokozuna.solr_jvm_args = -Xms256m -Xmx256m -XX:+UseStringCache -XX:+UseCompressedOops

slide-29
SLIDE 29

Indexing

  • Each Riak node runs a Solr instance
  • Store schema; create index; associate bucket
  • Data is automatically indexed as it is added
  • Index repair is provided through AAE
  • Extendable through custom extractors
slide-30
SLIDE 30

Store Schema

<field name="commit_repo" type="string" indexed="true" stored="true"/>
<field name="commit_hash" type="string" indexed="true" stored="true"/>
<field name="commit_author" type="string" indexed="true" stored="true"/>
<field name="commit_dt" type="date" indexed="true" stored="true"/>
<field name="commit_subject" type="text_general" indexed="true" stored="true"/>
<field name="commit_body" type="text_general" indexed="true" stored="true"/>

curl -XPUT -i -H 'content-type: application/xml' 'http://localhost:10018/yz/schema/cls' --data-binary @cls.xml

slide-31
SLIDE 31

Create Index

curl -XPUT -i -H 'content-type: application/json' 'http://localhost:10018/yz/index/cls' -d '{"schema":"cls"}'

slide-32
SLIDE 32

Associate Bucket

curl -XPUT -i -H 'content-type: application/json' 'http://localhost:10018/buckets/my_bucket/props' -d '{"props":{"yz_index":"my_index"}}'
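
The three curl calls (store schema, create index, associate bucket) could be scripted roughly as follows. This is a sketch only: the host, port, and paths are copied from the slides, the helper names `put` and `setup_requests` are made up for illustration, and the requests are built but not sent. Check the paths against the Riak release you run before relying on them.

```python
import json
import urllib.request

BASE = "http://localhost:10018"  # dev-node address taken from the slides


def put(path, body, content_type):
    """Build (but do not send) an HTTP PUT request."""
    return urllib.request.Request(
        BASE + path,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def setup_requests(schema_name, schema_xml, index_name, bucket):
    """Return the three requests in the order the slides show them."""
    index_body = json.dumps({"schema": schema_name}).encode("utf-8")
    props_body = json.dumps({"props": {"yz_index": index_name}}).encode("utf-8")
    return [
        put(f"/yz/schema/{schema_name}", schema_xml, "application/xml"),
        put(f"/yz/index/{index_name}", index_body, "application/json"),
        put(f"/buckets/{bucket}/props", props_body, "application/json"),
    ]
```

Passing each returned request to `urllib.request.urlopen` would perform the call against a running node.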

slide-33
SLIDE 33

Replication

k k1 k2 k3

slide-34
SLIDE 34

Three Replicas

k k1 k2 k3

Riak kv

i1 i2 i3

Solr index

slide-35
SLIDE 35

Features

slide-36
SLIDE 36

Solr has it? Yokozuna has it!*

* http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

slide-37
SLIDE 37

Powerful Analysis

  • Full-text to tokens
  • Lowercasing
  • Stemming
  • Synonyms
  • Stop-word removal
  • Language support
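
To make the analysis steps concrete, here is a toy pipeline that applies them in sequence. It is a simplified stand-in for illustration, not Solr's analyzer chain: the stop-word list, the synonym map, and the suffix-stripping stemmer are all assumptions.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "to", "is"}  # assumed sample list
SYNONYMS = {"vnode": "partition"}                  # assumed sample mapping


def stem(token):
    """Very naive stemmer: strip one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Full text -> tokens -> lowercase -> stop words -> stem -> synonyms."""
    tokens = re.findall(r"[A-Za-z0-9_]+", text)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]
```

With these sample settings, `analyze("Fixed the handoff of vnodes")` yields `['fix', 'handoff', 'partition']`, so a search for "partition" would also find a commit message that only mentions vnodes.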
slide-38
SLIDE 38

Querying

slide-39
SLIDE 39

?q=<field>:<term>

  • Single Term

?q=commit_repo:riak_kv

  • Boolean (OR, default)

?q=commit_repo:riak_kv%20commit_repo:riak_core

  • Boolean (AND)

?q=commit_repo:riak_kv%20AND%20commit_author:"Ryan%20Zezeski"

  • Boolean (NOT)

?q=commit_repo:riak_kv%20NOT%20commit_author:"Ryan%20Zezeski"
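
Query strings like these are easy to get wrong by hand, so a small helper can do the quoting and percent-encoding. The sketch below uses only the Python standard library; `clause` and `query_string` are hypothetical names, and unlike the slides it also encodes the double quotes (as `%22`).

```python
from urllib.parse import quote


def clause(field, term):
    """Build a <field>:<term> clause, quoting terms that contain spaces."""
    if " " in term:
        term = f'"{term}"'
    return f"{field}:{term}"


def query_string(*clauses, op=None):
    """Join clauses with an optional Boolean operator and URL-encode them."""
    joiner = f" {op} " if op else " "
    return "?q=" + quote(joiner.join(clauses), safe=":")
```

For example, `query_string(clause("commit_repo", "riak_kv"), clause("commit_author", "Ryan Zezeski"), op="AND")` produces `?q=commit_repo:riak_kv%20AND%20commit_author:%22Ryan%20Zezeski%22`.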

slide-40
SLIDE 40

?q=<field>:<term>

  • Range (good for dates; Solr has “date math”)

?q=commit_dt:[NOW-1YEAR TO NOW]

  • Wildcard everything (good catch all)

?q=*:*

  • Wildcard terms

?q=commit_repo:riak_*

  • Wildcard Regex

?q=NoExample

slide-41
SLIDE 41

?q=<field>:<term>

  • Term (Full Text)

?q=commit_subject:vnode%20AND%20commit_body:vnode

  • Phrase/Proximity (exact match)

?q=commit_body:"hinted handoff"

  • Phrase/Proximity (“slop”/“edit distance” of 4)

?q=commit_body:"partition vnode"~4

  • Fuzzy (slop at word level for misspellings)

?q=commit_body:behaviour~1
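
The `~1` in the fuzzy query is an edit-distance budget: a term matches if it can be turned into the query term with at most one single-character change. The sketch below uses plain Levenshtein distance to show the idea; Lucene's own fuzzy matching is implemented differently and may count edits slightly differently, so treat this as an illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_matches(query_term, indexed_term, max_edits=1):
    """True if indexed_term is within max_edits of query_term."""
    return edit_distance(query_term, indexed_term) <= max_edits
```

Under this model `behaviour~1` matches both "behaviour" and the American spelling "behavior", which is one deletion away.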

slide-42
SLIDE 42

Sort & Rank

  • Sorting (good for dates with ranges)

?q=commit_dt:[NOW-1YEAR TO NOW]&sort=commit_dt%20asc

  • Ranking

?q=commit_body:"hinted handoff"&fl=commit_*,score

slide-43
SLIDE 43

Tagging

  • Adds 2i like functionality
  • Indexes via object metadata
  • Index tags that do not affect the object
  • Useful for binary objects
slide-44
SLIDE 44

Facets

slide-45
SLIDE 45

Highlighting

slide-46
SLIDE 46

Self Healing

slide-47
SLIDE 47

Hinted Handoff

slide-48
SLIDE 48

Replication

k k1 k2 k3

slide-49
SLIDE 49

Three Replicas

k k1 k2 k3

Riak kv

i1 i2 i3

Solr index

slide-50
SLIDE 50

x

Node Failure

x x x x x x x

k k1 k3 k2

slide-51
SLIDE 51

Fallback Replica

k k1 k2 k3

Riak kv

i1 i3

Solr index

x

slide-52
SLIDE 52

Hinted Handoff

k1 k2 k3

Riak kv

i1 i3

Solr index

k2 i2

slide-53
SLIDE 53

Hinted Handoff

  • When a node in Riak fails, fallbacks are used
  • When the node returns, data is handed back
  • As data is “handed-off” from fallback to primary, it is indexed on the primary
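
The three bullets above can be played through in a small in-memory simulation. The `Node` class and the `write` and `handoff` functions are hypothetical and greatly simplified; in particular, the sketch does not model whether the fallback indexes the data it holds, only that the primary indexes it when it is handed back.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.kv = {}        # key -> value stored on this node
        self.index = set()  # keys indexed in this node's Solr instance
        self.hints = {}     # primary name -> {key: value} held on its behalf

    def put(self, key, value):
        """Store the object and index it locally."""
        self.kv[key] = value
        self.index.add(key)


def write(key, value, primaries, fallback):
    """Write to every primary; park data for down primaries on the fallback."""
    for node in primaries:
        if node.up:
            node.put(key, value)
        else:
            fallback.hints.setdefault(node.name, {})[key] = value


def handoff(fallback, primary):
    """Hand parked data back to a returned primary, which indexes it."""
    for key, value in fallback.hints.pop(primary.name, {}).items():
        primary.put(key, value)
```

Marking one primary as down, calling `write`, bringing the node back, and then calling `handoff` leaves the key both stored and indexed on the returned primary.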

slide-54
SLIDE 54

Active Anti-Entropy

slide-55
SLIDE 55

AAE

  • Two systems (Riak & Solr) increase chances of inconsistency
  • Files can become corrupted/truncated
  • Solr indexes could be accidentally removed
  • Handles malformed KV data
slide-56
SLIDE 56

AAE

  • It uses hash trees
  • Updates in real time
  • It’s non-blocking
  • Periodically exchanged
  • Periodically expired and rebuilt
  • It invokes read-repair and re-index on divergence

slide-57
SLIDE 57

AAE - Exchange

slide-58
SLIDE 58

AAE - Exchange

TOP HASHES DON’T MATCH - SOMETHING IS DIFFERENT

slide-59
SLIDE 59

AAE - Exchange

NARROW DOWN THE DIVERGENT SEGMENT

slide-60
SLIDE 60

AAE - Exchange

NARROW DOWN THE DIVERGENT SEGMENT CONT...

slide-61
SLIDE 61

AAE - Exchange

ITER FINAL LIST OF HASHES TO FIND DIVERGENT KEYS

slide-62
SLIDE 62

AAE - Exchange

REPAIR (RE-INDEX) KEYS THAT ARE DIVERGENT (RED)
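
The exchange walked through on the preceding slides can be sketched with a two-level hash tree: compare the top hashes, narrow down to the segments whose hashes differ, then compare the per-key hashes inside those segments. The segment count, the use of SHA-1, and the function names are assumptions for illustration; the trees shown on the slides have more levels than this sketch.

```python
import hashlib

SEGMENTS = 4  # assumed; kept small for readability


def h(data):
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def build_tree(objects):
    """objects maps key -> object hash. Returns (top, segment_hashes, segments)."""
    segments = [dict() for _ in range(SEGMENTS)]
    for key, obj_hash in objects.items():
        segments[int(h(key), 16) % SEGMENTS][key] = obj_hash
    segment_hashes = [h(repr(sorted(seg.items()))) for seg in segments]
    top = h("".join(segment_hashes))
    return top, segment_hashes, segments


def divergent_keys(kv_tree, index_tree):
    """Return the keys whose hashes differ between the two trees."""
    top_a, segs_a, data_a = kv_tree
    top_b, segs_b, data_b = index_tree
    if top_a == top_b:          # top hashes match: nothing to repair
        return set()
    keys = set()
    for i in range(SEGMENTS):   # narrow down to divergent segments
        if segs_a[i] != segs_b[i]:
            for key in data_a[i].keys() | data_b[i].keys():
                if data_a[i].get(key) != data_b[i].get(key):
                    keys.add(key)
    return keys
```

The keys returned by `divergent_keys` are the ones that would be re-indexed; when the top hashes match, the comparison stops after a single check.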

slide-63
SLIDE 63

Learn More

  • Mailing list at docs.basho.com
  • #riak IRC room on irc.freenode.net
  • http://bit.ly/riak-2-0
slide-64
SLIDE 64

Questions?

Thanks very much dbrown@basho.com