SLIDE 1

Sphinx search technical overview

Vladimir Fedorkov, Open Source Search Devroom, FOSDEM’15

SLIDE 2

About me

  • Performance geek

– Blog: http://astellar.com
– Twitter: @vfedorkov

  • Enjoy LAMP stack tuning

– Especially the database backend

  • Love speaking at conferences
  • Using Sphinx in production since 2006

Open Source Search Devroom, FOSDEM’15 ASTELLAR.COM

SLIDE 3

Meet Sphinx

  • Created in the early 2000s as an alternative to MySQL full-text search
  • Written in C++
  • Runs as a separate daemon (searchd)
  • Runs on various platforms: *nix, Windows, etc.

– Seen on iPhones and WiFi routers

  • Now serving installations with billions of documents

SLIDE 4

Architecture sample: querying

SLIDE 5

Agenda

  • Loading data
  • Current storage types
  • Querying Sphinx
  • Full-text vs non-full-text
  • Getting results
  • Life after the search
  • Growing Sphinx from a single node to a cluster

SLIDE 6

Loading data into Sphinx

  • Sphinx talks to databases directly to pull data

– MySQL, PostgreSQL, MSSQL and any ODBC source

  • Loading structured data in XML format (xmlpipe2)

– Useful for loading data from NoSQL stores such as MongoDB
– Can be used for document pre-processing

  • SQL-style updates
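The XML route means your own script prints an xmlpipe2 stream that Sphinx reads while indexing, which is also the natural place for pre-processing. A minimal sketch, assuming one full-text field `title` and one integer attribute `channel_id` (both hypothetical names) and documents already fetched as Python dicts from whatever store you use:

```python
from xml.sax.saxutils import escape


def to_xmlpipe2(docs):
    """Render dicts as an xmlpipe2 document stream (sketch; schema names are hypothetical)."""
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>"]
    # The schema block declares which tags are full-text fields and which are attributes.
    out.append("<sphinx:schema>")
    out.append('<sphinx:field name="title"/>')
    out.append('<sphinx:attr name="channel_id" type="int"/>')
    out.append("</sphinx:schema>")
    for doc in docs:
        # Each document carries a unique positive integer id.
        out.append(f'<sphinx:document id="{int(doc["id"])}">')
        # Text must be XML-escaped (&, <, >) or the stream will not parse.
        out.append(f"<title>{escape(doc['title'])}</title>")
        out.append(f"<channel_id>{int(doc['channel_id'])}</channel_id>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


if __name__ == "__main__":
    # In practice these rows would come from MongoDB or another NoSQL store.
    print(to_xmlpipe2([{"id": 1, "title": "I love Sphinx", "channel_id": 285}]))
```

Such a script would be hooked up through a source with `type = xmlpipe2` and an `xmlpipe_command` pointing at it in sphinx.conf; check the exact schema options against the documentation for your Sphinx version.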

SLIDE 7

Storage types

  • Real-time indexes

– Push mode: the application pushes data to Sphinx
– Ideal for frequently updated data

  • On-disk (plain) indexes

– Pull mode: Sphinx handles indexing by itself
– Ideal for static data

  • Or else:

SLIDE 8

On-disk vs real-time indexes

SLIDE 9

Querying

  • SphinxQL:

– Speaks the MySQL protocol, so a standard MySQL client library can connect to Sphinx
– Available in most programming languages

  • Legacy API

– PHP, Python, Java, Ruby and C clients are included in the distribution
– .NET and Rails (via Thinking Sphinx) are supported through third-party libraries

mysql> SELECT * FROM sphinx_index
    -> WHERE MATCH('I love Sphinx')
    -> AND news_channel = 285
    -> LIMIT 5;
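When the MATCH() text comes from users, it has to be escaped twice: once for Sphinx's query syntax (so a stray `-` or `@` is not read as an operator) and once as an SQL string literal. A minimal sketch, assuming the special-character set escaped by the bundled PHP API's EscapeString(); verify it against your Sphinx version. The helper names are hypothetical:

```python
# Characters with operator meaning in the extended query syntax (assumed to
# match the set escaped by the PHP API's EscapeString()).
_MATCH_SPECIAL = set('\\()|-!@~"&/^$=<')


def escape_match(text: str) -> str:
    """Backslash-escape query operators so user input is searched literally."""
    return "".join("\\" + ch if ch in _MATCH_SPECIAL else ch for ch in text)


def sql_string(value: str) -> str:
    """Quote a value as a SphinxQL string literal (backslashes first, then quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(index: str, user_text: str, channel: int, limit: int = 5) -> str:
    # `index` is assumed to be a trusted constant; integers go through int().
    return (
        f"SELECT * FROM {index} "
        f"WHERE MATCH({sql_string(escape_match(user_text))}) "
        f"AND news_channel = {int(channel)} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_query("sphinx_index", "I love Sphinx", 285))
```

The resulting string can be sent with any MySQL driver connected to searchd's SphinxQL port.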

SLIDE 10

How does it work?

  • Query pre-processing
  • Full-text search stage
  • Non-full-text filtering
  • Ranking / Grouping / Ordering
  • Applying limit
  • Sending results back
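The order of these stages can be modelled as a toy in-memory pipeline. This is a sketch of the stage ordering only, not Sphinx's implementation: the `Doc` type, the whitespace tokenizer and the "count keyword occurrences" weight are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: int
    text: str
    news_channel: int


def search(docs, query_words, channel, limit):
    """Toy model of the stage order: match, filter, rank, order, limit."""
    # Query pre-processing stand-in: just lower-case the keywords.
    words = [w.lower() for w in query_words]
    results = []
    for d in docs:
        tokens = d.text.lower().split()
        # Full-text stage: every keyword must occur (implicit AND).
        if not all(w in tokens for w in words):
            continue
        # Non-full-text filtering, like WHERE news_channel = 285.
        if d.news_channel != channel:
            continue
        # Ranking: a deliberately naive weight, total keyword occurrences.
        weight = sum(tokens.count(w) for w in words)
        results.append((d.id, weight))
    # Ordering by weight (descending, ties by id), then applying the limit.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:limit]
```

Filtering runs only on full-text matches, and LIMIT applies last, after sorting.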

SLIDE 11

Query & text pre-processing

  • Removing stop words
  • Transforming text

– Applying morphology, blended chars, filters, replacements

  • Prefix/infix indexing
  • Other “magic”
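Two of these steps, stop-word removal and prefix indexing, can be illustrated with a toy term generator. A sketch under stated assumptions: the stop-word list is hypothetical, `min_prefix_len` borrows the name of the sphinx.conf directive, and the real on-disk layout differs (with `dict = keywords`, for example, wildcards are expanded at query time rather than stored).

```python
import re

STOPWORDS = {"the", "a", "is", "i"}  # hypothetical stop-word list


def tokenize(text):
    # Lower-case and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def index_terms(text, min_prefix_len=3):
    """Terms a toy prefix index would store for `text`."""
    terms = set()
    for word in tokenize(text):
        if word in STOPWORDS:
            continue  # stop words are never indexed
        terms.add(word)
        # Store every proper prefix of at least min_prefix_len characters,
        # so a query like 'sphi*' becomes a plain term lookup.
        for n in range(min_prefix_len, len(word)):
            terms.add(word[:n] + "*")
    return terms
```

The trade-off is visible even here: each word adds several extra terms, which is why prefix/infix indexing grows the index.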

SLIDE 12

Full-Text support

  • And, Or

– hello & world, hello | world

  • Not

– hello -world

  • Per-field search

– @title hello @body world

  • Field combination

– @(title, body) hello world

  • Search within the first N positions of a field

– @body[50] hello

  • Phrase search

– "hello world"

  • Per-field weights
  • Proximity search

– "hello world"~10

  • Distance support

– hello NEAR/10 world

  • Quorum matching

– "the world is a wonderful place"/3

  • Exact form modifier

– "raining =cats and =dogs"

  • Strict order
  • Sentence / Zone / Paragraph
  • Custom document weighting & ranking, etc.
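Quorum and proximity matching are easy to misread, so here is a toy checker for both on a tokenized field. A sketch, not Sphinx's matcher: quorum passes when at least N distinct query words occur; for proximity it follows the rule the Sphinx documentation gives with its example, where "cat dog mouse"~5 requires a span of fewer than 8 words (N plus the number of words) containing all of them.

```python
def quorum_match(doc_words, query_words, threshold):
    """'w1 ... wn'/N: true if at least `threshold` distinct query words occur."""
    present = set(doc_words)
    return sum(1 for w in set(query_words) if w in present) >= threshold


def proximity_match(doc_words, query_words, distance):
    """'w1 ... wk'~N: true if some span of fewer than N + k words holds all k words."""
    need = set(query_words)
    k = len(need)
    best = None
    # Brute force: shortest window starting at each position that covers all words.
    for start in range(len(doc_words)):
        seen = set()
        for end in range(start, len(doc_words)):
            if doc_words[end] in need:
                seen.add(doc_words[end])
            if len(seen) == k:
                span = end - start + 1
                best = span if best is None else min(best, span)
                break
    return best is not None and best < distance + k
```

Usage: `quorum_match("a wonderful world".split(), "the world is a wonderful place".split(), 3)` holds because three of the six words are present.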

SLIDE 13

Non-text filters

  • In SphinxQL terms, WHERE conditions

– a = 5, a < 5, a > 5, a BETWEEN 3 AND 5

  • Integer, floating-point and string attributes are supported
  • JSON

– SELECT ALL(x>3 AND x<7 FOR x IN j.intarray)
– SELECT j.users[3].address[2].streetname
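The two JSON expressions above can be read as ordinary collection logic. A sketch of that reading in Python, where the sample document and helper names are hypothetical; Sphinx's own handling of edge cases (empty arrays, missing keys, type mismatches) may differ from Python's.

```python
import json


def all_between(values, low, high):
    """Reading of ALL(x>low AND x<high FOR x IN arr): every element in the open range."""
    return all(low < x < high for x in values)


def json_path(obj, *steps):
    """Walk object keys (str) and array indexes (int); None if any step is missing."""
    for step in steps:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


j = json.loads(
    '{"intarray": [4, 5, 6],'
    ' "users": [{}, {}, {}, {"address": [{}, {}, {"streetname": "Main St"}]}]}'
)
# Equivalent of j.intarray test and j.users[3].address[2].streetname:
print(all_between(j["intarray"], 3, 7), json_path(j, "users", 3, "address", 2, "streetname"))
```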

SLIDE 14

Special integers: MVAs (multi-valued attributes)

  • Built-in “one-to-many” attributes
  • A set of integers stored in a single attribute
  • Useful for

– Page tag IDs
– Multi-category items
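A filter on an MVA is usually read as "any stored value matches", so requiring several tags means combining several filters. A sketch of that semantics, assuming the any-value behaviour; the tag data is hypothetical:

```python
# doc id -> tag IDs stored in the MVA (hypothetical data)
items = {101: [3, 7, 12], 102: [7], 103: [5, 12]}


def mva_match(values, wanted):
    """Model of a filter on an MVA: passes if ANY stored value is in `wanted`."""
    return any(v in wanted for v in values)


# Like WHERE tags IN (12): documents carrying tag 12.
with_12 = [d for d, tags in items.items() if mva_match(tags, {12})]
# Two filters ANDed together: documents carrying both tag 7 and tag 12.
with_7_and_12 = [
    d for d, tags in items.items() if mva_match(tags, {7}) and mva_match(tags, {12})
]
print(with_12, with_7_and_12)
```

This is what makes MVAs handy for tags: one document row, many values, and no join table.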

SLIDE 15

Geo-distance support

  • Boosting and/or filtering local results

– Just add float latitude and longitude attributes, and…

  • GEODIST(Lat, Long, Lat2, Long2) in Sphinx
  • Supports unit options: mi/km/m, deg/rad, etc.
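What GEODIST() computes is a great-circle distance. A haversine sketch for reference, assuming a mean Earth radius of 6371 km (Sphinx's internal constant and formula may differ slightly). Note that in SphinxQL the coordinates are, as far as I know, taken in radians unless a degrees option is given, so unit mismatches are the usual source of odd results.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in metres


def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111.2 km.
print(round(geodist(0.0, 0.0, 0.0, 1.0) / 1000, 1))
```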

SLIDE 16

Relevance tuning

  • Weighting

– Per field
– Per index

  • Expression-based ranking

– 15+ text signals, plus any number of your own non-text ones

  • OPTION ranker=expr('1000*sum(lcs)+bm25')
  • OPTION ranker=expr('700*sum(lcs)+bm25f(1.4, 0.8, {title=3, content=1})')

– Several built-in rankers are available
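To see why a large multiplier on `sum(lcs)` dominates the score, here is a toy version of `1000*sum(lcs)+bm25`. A sketch only: `lcs` is approximated as the longest run of query words appearing verbatim and in order in a field (Sphinx's exact definition differs in details), and `bm25` is passed in rather than computed.

```python
def lcs(query, field):
    """Longest run of query words that appears verbatim, in order, in the field."""
    best = 0
    prev = [0] * (len(field) + 1)
    # Longest-common-substring dynamic programming over word tokens.
    for q in query:
        cur = [0] * (len(field) + 1)
        for j, f in enumerate(field, start=1):
            if q == f:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def expr_rank(query, fields, bm25, lcs_factor=1000):
    """Toy '1000*sum(lcs)+bm25': fields maps field name -> token list."""
    return lcs_factor * sum(lcs(query, f) for f in fields.values()) + bm25


q = "i love sphinx".split()
fields = {"title": "i love sphinx".split(), "content": "we love sphinx search".split()}
print(expr_rank(q, fields, bm25=652))
```

Because bm25 stays well below the multiplier, documents are ordered primarily by phrase proximity and only then by term statistics.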

SLIDE 17

Reading results

mysql> SELECT * FROM idx
    -> WHERE MATCH('I love Sphinx') LIMIT 5
    -> OPTION field_weights=(title=100, content=1);

+---------+--------+------------+------------+
| id      | weight | channel_id | ts         |
+---------+--------+------------+------------+
| 7637682 | 101652 |     358842 | 1112905663 |
| 6598265 | 101612 |     454928 | 1102858275 |
| 6941386 | 101612 |     424983 | 1076253605 |
| 6913297 | 101584 |     419235 | 1087685912 |
| 7139957 |   1667 |     403287 | 1078242789 |
+---------+--------+------------+------------+
5 rows in set (0.00 sec)

SLIDE 18

Life after search

  • CALL SNIPPETS: building excerpts
  • Building facets (brands, price ranges)
  • Showing related items
  • Correcting misspellings
  • “Did you mean” service
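Facets are per-value match counts for the current result set; in SphinxQL they are typically one GROUP BY query per facet (recent versions also offer a FACET clause). A sketch of the counting itself, with hypothetical `brand`/`price` fields and price ranges given as ascending bucket edges:

```python
import bisect
from collections import Counter


def price_bucket(price, edges):
    """Label the price range that `price` falls into; `edges` must be ascending."""
    i = bisect.bisect_right(edges, price)
    low = edges[i - 1] if i > 0 else 0
    return f"{low}+" if i == len(edges) else f"{low}-{edges[i]}"


def facets(rows, edges):
    """Per-brand and per-price-range match counts for one result set."""
    brands = Counter(r["brand"] for r in rows)
    prices = Counter(price_bucket(r["price"], edges) for r in rows)
    return brands, prices


rows = [{"brand": "Acme", "price": 30}, {"brand": "Acme", "price": 75}, {"brand": "Zen", "price": 120}]
print(facets(rows, [50, 100]))
```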

SLIDE 19

Combining indexes

  • On a single box

– Main + Delta
– Main + Delta + RT

  • On a cluster

– Local and distributed indexes
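With main + delta, the same document can exist in both indexes, so the stale copy in main has to be suppressed; Sphinx's mechanism for this is the kill-list (`sql_query_killlist`). A toy model of the merge, with hypothetical names and data, not Sphinx's code:

```python
def merge_main_delta(main_hits, delta_hits, killed_ids):
    """Combine results from a main and a delta index (toy model).

    main_hits / delta_hits: dict doc_id -> weight. killed_ids: ids deleted or
    re-indexed since main was built; their stale main copies are dropped.
    """
    merged = {doc_id: w for doc_id, w in main_hits.items() if doc_id not in killed_ids}
    # The delta holds the fresh version of changed documents.
    merged.update(delta_hits)
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Doc 2 was edited (now in delta), doc 3 was deleted, doc 4 is new.
print(merge_main_delta({1: 10, 2: 8, 3: 5}, {2: 9, 4: 7}, {2, 3}))
```

Only the small delta is rebuilt frequently; main is rebuilt rarely and the kill-list keeps results consistent in between.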

SLIDE 20

Distributed search

  • Static node configuration (for now)
  • Weighted round-robin querying
  • Load-based distribution
  • Failover node
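Weighted round-robin with failover can be modelled with the "smooth" weighted round-robin scheme known from nginx. This is a swapped-in illustration, not Sphinx's agent code (Sphinx exposes its choices through the `ha_strategy` setting); node names and weights are hypothetical.

```python
class WeightedRoundRobin:
    """Smooth weighted round-robin over mirror nodes, skipping ones marked dead."""

    def __init__(self, weights):
        self.weights = dict(weights)  # node -> static weight
        self.current = {n: 0 for n in self.weights}
        self.dead = set()  # failover: nodes excluded until they recover

    def pick(self):
        alive = [n for n in self.weights if n not in self.dead]
        if not alive:
            raise RuntimeError("no live nodes")
        total = sum(self.weights[n] for n in alive)
        # Every live node gains its weight; the leader is chosen and pays back the total.
        for n in alive:
            self.current[n] += self.weights[n]
        best = max(alive, key=lambda n: self.current[n])
        self.current[best] -= total
        return best


rr = WeightedRoundRobin({"node-a": 2, "node-b": 1})
print([rr.pick() for _ in range(3)])
```

The smooth variant interleaves picks (a, b, a) instead of sending bursts to the heaviest node.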

SLIDE 21

Sphinx search cluster architecture

SLIDE 22

Sphinx cluster data flow

SLIDE 23

News from the Lab

  • New index format in Sphinx 3.0

– Faster indexing and search

  • No more legacy 4/16 GB per-index attribute limits
  • Data replication between nodes
  • HTTP/REST interface
  • Even faster snippets
  • Some secret projects I can’t talk about 

SLIDE 24

Find more about Sphinx

  • Official website: http://sphinxsearch.com
  • My blog: http://astellar.com

– Some information you may find useful
– Slides will be there

  • Twitter: @vfedorkov

– Mainly Sphinx and MySQL performance

SLIDE 25

QUESTIONS!

SLIDE 26

THANK YOU!
