Drupal and Solr, Saturday, August 30, 2008

SLIDE 1

Drupal and Solr

1 Saturday, August 30, 2008

SLIDE 2

Drupal and Solr - Alexandru Badiu

Hello

I’m Alexandru Badiu

SLIDE 3

Hello

I’m Alexandru Badiu. I come from the land of vampires. We’re going to talk about Solr.

SLIDE 4

So what is Solr?

  • Solr is an enterprise search server
  • It is based on Lucene
  • It is an Apache Software Foundation project
  • It has some cool features

SLIDE 5

Why would I use it?

  • It is 4-5 times faster than the standard Drupal search
  • Your database will be happier (search queries no longer hit it)
  • It can deliver better search results (ever used site:drupal.org in Google?)
  • It has replication and distributed search (for that really big content website)
  • It is in select company: CNet, Netflix, Internet Archive, Digg
  • Some cool features:
    • Facets
    • More options when searching
    • Geographical search

SLIDE 6

Using it: the easy way

  • Use the Apache Solr module: http://drupal.org/project/apachesolr
  • Very easy to install
  • Solr runs in a servlet container: you can use Tomcat, Jetty or Resin
  • I recommend Jetty if you have a choice

SLIDE 7

Using it: the not so easy way

  • Build your own app
  • To do that, we’ll learn about Solr
  • Since it’s a BoF (Birds of a Feather session), let’s get interactive
  • Make sure you have Java (preferably 1.5)
  • Download the zip files from http://voidberg.org/drupalcon/ and unpack them
  • cd to solr/example and run: java -jar start.jar
  • Go to http://localhost:8983/solr/
  • If it works, you just installed Solr
  • Let’s use it: go to http://localhost:8983/solr/admin and search for *:*
  • This is how you query Solr (*:* matches every document)
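The same *:* query can also be sent from a script instead of the admin page. Below is a minimal Python 3 sketch, assuming the example server started above is listening on localhost:8983 and that its select handler returns JSON when asked with wt=json; select_url and count_all are illustrative names, not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the example Jetty server from this slide, on its default port
SOLR_BASE = "http://localhost:8983/solr"


def select_url(base, **params):
    """Build a /select URL; urlencode escapes characters such as * and :."""
    return "%s/select?%s" % (base, urlencode(params))


def count_all(base=SOLR_BASE):
    """Run q=*:* (match every document) and return numFound from the JSON reply."""
    url = select_url(base, q="*:*", rows=0, wt="json")
    with urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["response"]["numFound"]
```

With the example server running, count_all() should return the number of indexed documents; rows=0 asks Solr for the count without sending the documents themselves.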

SLIDE 8

Solr concepts

  • You have a collection of documents
  • Every document has fields
  • You define these fields in a schema (schema.xml)
  • Lots of options here:
    • Data types
    • Analyzers
    • Tokenizers
    • Dynamic fields
    • Field copy (copyField)
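Dynamic fields let one schema entry cover many field names. As the example schema.xml describes it, an explicitly declared field wins; otherwise a pattern with a single * at its start or end is used, longer patterns are tried first, and ties go to the pattern that appears first in the schema. The Python sketch below models only that lookup; resolve_field and the field names and patterns in the usage are hypothetical.

```python
def resolve_field(name, fields, dynamic_patterns):
    """Return the schema entry that would handle `name`, or None if nothing matches.

    fields: explicitly declared field names.
    dynamic_patterns: dynamic field patterns in schema order, e.g. ["*_s", "tag_*"].
    """
    if name in fields:
        return name  # explicit fields take precedence over dynamic ones
    best = None
    for pattern in dynamic_patterns:
        stem = pattern.strip("*")
        if pattern.startswith("*"):
            matches = name.endswith(stem)    # "*_s" matches "author_s"
        else:
            matches = name.startswith(stem)  # "tag_*" matches "tag_drupal"
        # Only a strictly longer pattern replaces the current one,
        # so among equal lengths the earlier pattern in the schema is kept.
        if matches and (best is None or len(pattern) > len(best)):
            best = pattern
    return best
```

For example, with patterns ["*_s", "*_tag_s"], the name cms_tag_s resolves to the longer *_tag_s even though *_s also matches.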

SLIDE 9

Searching

  • Uses the Lucene query syntax
  • Lots of options when searching:
    • everything (the default search field): keyword
    • in a specific field: fieldname:keyword
    • phrases: fieldname:"keyword1 keyword2"
    • wildcards: key?ord and keywo*
    • fuzzy: keyword~
    • proximity: "keyword1 keyword2"~10 (words within 10 positions of each other)
    • range: created:[* TO 2003-01-01T00:00:00Z] or created:{2002-01-01T00:00:00Z TO NOW}
    • operators: AND, -, +, NOT
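Building these query strings in code mostly comes down to escaping user input, since characters such as + - : ( ) " ~ * ? are query syntax. A minimal Python sketch; the helper names are illustrative, and the escaped character set mirrors the one used by Lucene's QueryParser.escape (newer Lucene versions also treat / as special, so it is escaped here too).

```python
# Characters the Lucene query parser treats as syntax
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape(term):
    """Backslash-escape every query-syntax character in a user-supplied term."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in term)


def field_query(field, keyword):
    """fieldname:keyword, with the keyword escaped."""
    return "%s:%s" % (field, escape(keyword))


def phrase_query(field, phrase, slop=None):
    """fieldname:"..."; inside quotes only " and \\ need escaping. ~N sets proximity."""
    body = phrase.replace("\\", "\\\\").replace('"', '\\"')
    q = '%s:"%s"' % (field, body)
    return q if slop is None else "%s~%d" % (q, slop)


def range_query(field, low, high, inclusive=True):
    """[a TO b] includes the endpoints, {a TO b} excludes them; * means open-ended."""
    left, right = ("[", "]") if inclusive else ("{", "}")
    return "%s:%s%s TO %s%s" % (field, left, low, high, right)
```

Escaping keeps a search for something like c++ from being parsed as two required-term operators.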

SLIDE 10

Searching

  • You can group boolean queries with parentheses
  • To sort, add ‘sort’ to the query: sort=field1 asc, field2 desc
  • Pagination: ‘start’ and ‘rows’
  • To specify which fields are retrieved, use fl: fl=*,score
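Pagination maps directly onto a page number: start is the offset of the first result, so page n with a page size of rows starts at (n - 1) * rows. A Python sketch; page_params and its 1-based page convention are assumptions for illustration.

```python
def page_params(q, page=1, rows=10, sort=None, fl="*,score"):
    """Select parameters for one page of results; `page` is 1-based."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    params = [("q", q),
              ("start", (page - 1) * rows),  # offset of the first result on this page
              ("rows", rows),                # page size
              ("fl", fl)]                    # fields to return; "score" adds relevance
    if sort:
        params.append(("sort", sort))        # e.g. "created desc, title asc"
    return params
```

The list of pairs can be passed straight to urllib.parse.urlencode to produce the query string.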

SLIDE 11

How do I add stuff?

  • Take a look in the exampledocs directory
  • Take a look at post.sh
  • Programmatically: hook_user, hook_nodeapi, hook_update_index
  • Generate a document and post it to the update handler
  • $xml can also be <commit />, <optimize /> or <delete><query>id:123</query></delete>
  • Querying works the same way against /select; useful parameters: wt=json&json.nl=map, indent=on

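```php
// SOLR_URL, SOLR_PATH and $xml are assumed to be defined elsewhere.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, SOLR_URL . SOLR_PATH . "/update");
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: text/xml; charset=utf-8'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the response instead of printing it
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);      // <add>, <commit />, <optimize /> or <delete> message
$result = curl_exec($ch);
curl_close($ch);
```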
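For comparison, here is a Python 3 sketch that generates the <add> message and posts it like the PHP snippet does. It assumes the example server's update handler at http://localhost:8983/solr/update, and the field names (id, title, tag) are illustrative: they must exist in your schema.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen


def add_xml(docs):
    """Build an <add> message: one <doc> per dict, one <field> per value.

    A list or tuple value produces repeated <field> elements (multi-valued field).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")  # ElementTree escapes &, < and >


def post_update(xml, update_url="http://localhost:8983/solr/update"):
    """POST an update message (<add>, <commit/>, <optimize/>, <delete>) and return the reply."""
    req = Request(update_url, data=xml.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

A typical sequence would be post_update(add_xml([...])) followed by post_update("<commit/>") so the new documents become searchable.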

SLIDE 12

Facets

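  • What are facets? Counts of the matching documents grouped by field value, which users can pick to narrow the results
  • You are not limited to a specific order
  • Solr has built-in support:
    • facet=true
    • facet.field=field
    • facet.query=field:value
    • facet.sort=true
    • facet.mincount=x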
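A facet request usually asks for counts only. Without json.nl=map, Solr's JSON writer returns each field's facet counts as a flat list [term1, count1, term2, count2, ...], which the sketch below turns into pairs. The helper names and the hand-made sample response in the usage are hypothetical.

```python
def facet_params(q, fields, mincount=1):
    """Request facet counts only (rows=0) for the given fields, as JSON."""
    params = [("q", q), ("rows", 0), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", mincount)]
    # facet.field may be repeated, once per field to facet on
    params.extend(("facet.field", f) for f in fields)
    return params


def facet_pairs(flat):
    """Turn a flat [term1, count1, term2, count2, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def field_facets(response, field):
    """Extract the (term, count) pairs for one field from a parsed JSON response."""
    return facet_pairs(response["facet_counts"]["facet_fields"][field])
```

With json.nl=map the same counts arrive as a JSON object instead, and facet_pairs is unnecessary; the flat form has the advantage of preserving the sort order Solr returned.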

SLIDE 13

Request handlers

  • What are request handlers? Named entry points, configured in solrconfig.xml, that decide how a request is processed
  • You’ll probably use standard, dismax, spellchecker and morelikethis
  • Solr has others
  • You can have several instances of a handler with different configurations
  • You can write your own too

SLIDE 14

Request handlers - Dismax

  • qt=dismax
  • Boost documents based on your interest:
    • qf=title^2 body^1 tag^0.5
    • bq=cms:Drupal^2
  • No wildcards
  • To get all documents, use q.alt=*:*
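Putting the dismax parameters together: since dismax has no *:* syntax, an empty user query falls back to q.alt. A Python sketch; it assumes a request handler named dismax exists in solrconfig.xml, and the qf and bq defaults simply reuse the values from this slide.

```python
def dismax_params(user_query, qf="title^2 body^1 tag^0.5", bq=None):
    """Parameters for the dismax handler; qf boosts fields, bq boosts matching documents."""
    params = [("qt", "dismax"), ("qf", qf)]
    if user_query.strip():
        params.append(("q", user_query))  # plain user text; wildcards are not supported
    else:
        # q.alt is used when q is missing or blank, e.g. to list every document
        params.append(("q.alt", "*:*"))
    if bq:
        params.append(("bq", bq))  # e.g. "cms:Drupal^2"
    return params
```

Keeping the boosts in code (or in the handler's defaults in solrconfig.xml) means users can type free text without knowing any field names.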

SLIDE 15

Request handlers - MoreLikeThis and SpellChecker

  • MoreLikeThis returns documents that are similar to the ones you specify
  • SpellChecker... spell checks

SLIDE 16

Caching

We all know what caching is

SLIDE 17

Caching

We all know what caching is. And now for some useless trivia.

(Who doesn’t use Drupal)

SLIDE 18

Caching

  • Solr has a lot of caches
  • Some you can influence easily, some not:
    • filterCache (fq=query)
    • queryResultCache
    • documentCache
  • Auto warming
  • Explicit warming
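Moving reusable restrictions out of q and into fq is how you put the filterCache to work: each fq is cached on its own, so a restriction that repeats across requests can be reused while the user's free-text query stays in q. A Python sketch; filtered_query and the fields type and language are hypothetical.

```python
from urllib.parse import urlencode


def filtered_query(user_query, filters):
    """Keep the user's text in q and put each reusable restriction in its own fq.

    One fq per restriction (rather than one combined fq) lets each be cached
    and reused independently by other requests.
    """
    params = [("q", user_query)] + [("fq", f) for f in filters]
    return urlencode(params)
```

For example, filtered_query("solr", ["type:story", "language:en"]) produces two fq parameters, and a later search for a different term with the same filters can be served from the cached filter sets.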

SLIDE 19

Caching

```python
#!/usr/bin/env python3
# Warm Solr's caches: run a broad query, then one facet request per field, for each core.
from urllib.request import urlopen


def query(url):
    print(url)
    with urlopen(url) as u:
        u.read()  # the body is discarded; the point is to make Solr fill its caches


# Core name -> comma-separated fields to facet on
config = {
    'imoostiri': 'articol_data, articol_tag',
    'rez': 'tipans,judet,zona,oras,pmp,status,dcomp,pstart,ncam,stot,tipv,limba,stip,tag,',
    'ci': 'type,added,tag,',
}

warm = 'q=text:[a%20TO%20z]'  # a broad range query over the text field
# "solrurl" is a placeholder for your Solr host
url = 'http://solrurl/%s/select?%s&fl=*&wt=python'
furl = ('http://solrurl/%s/select?%s&wt=python&facet=true&facet.field=%s'
        '&facet.zeros=true&rows=0&facet.limit=-1')

for core, facets in config.items():
    query(url % (core, warm))
    for facet in facets.split(','):
        facet = facet.strip()
        if facet:  # skip empty entries left by trailing commas
            query(furl % (core, warm, facet))
```

SLIDE 20

Geolocation and Solr

  • No real solution out of the box
  • The easy way: lat:[l1 TO l2] AND lon:[lo1 TO lo2]
  • LocalSolr: a port of LocalLucene, with a radius parameter
  • http://localhost:8983/localcinema/
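For the easy way, the range bounds come from a box around the search point. A Python sketch assuming a spherical Earth (mean radius 6371 km), lat and lon indexed with a numeric type that supports range queries, and no handling of the poles or the 180th meridian; bounding_box and bbox_query are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation


def bounding_box(lat, lon, radius_km):
    """Return (lat_min, lat_max, lon_min, lon_max) of a box around the point.

    The box corners are farther away than radius_km, so results may still
    need an exact distance check afterwards.
    """
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with cos(latitude)
    dlon = dlat / math.cos(math.radians(lat))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def bbox_query(lat, lon, radius_km):
    """The two range clauses from this slide, filled in for a center and radius."""
    lat_min, lat_max, lon_min, lon_max = bounding_box(lat, lon, radius_km)
    return "lat:[%.6f TO %.6f] AND lon:[%.6f TO %.6f]" % (
        lat_min, lat_max, lon_min, lon_max)
```

At latitude 60 the longitude span comes out twice the latitude span, since cos(60°) is 0.5.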

SLIDE 21

Geolocation and Solr

C-Squares

SLIDE 22

Geolocation and Solr

C-Squares for latitude 38.8894 and longitude -77.0356:

  • 0.0005-degree square: 7307:487:380:383:495:2
  • 0.001-degree square: 7307:487:380:383:495
  • 0.005-degree square: 7307:487:380:383:4
  • 0.01-degree square: 7307:487:380:383
  • 0.05-degree square: 7307:487:380:3
  • 0.1-degree square: 7307:487:380
  • 0.5-degree square: 7307:487:3
  • 1-degree square: 7307:487
  • 5-degree square: 7307:4
  • 10-degree square: 7307

SLIDE 23

Geolocation and Solr

  • C-Squares
  • Add a field, csquare, allowed to have multiple values
  • Index all sizes of the C-Square
  • Convert your position to a C-Square using a “radius”
  • Do a search like csquare:mypos
  • You’ll get all documents in that square
  • There’s also GeoHash
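Because every coarser square is a prefix of a finer one, indexing all sizes amounts to storing every prefix, and searching a size means truncating your own code to that size. A Python sketch assuming csquare is a multi-valued string (untokenized) field; choosing the square size from your radius is left to the caller, and the helper names are hypothetical.

```python
# Length of the code for each square size in degrees, matching the example codes
SIZE_TO_LENGTH = {10: 4, 5: 6, 1: 8, 0.5: 10, 0.1: 12, 0.05: 14,
                  0.01: 16, 0.005: 18, 0.001: 20, 0.0005: 22}


def csquare_levels(code):
    """Every square containing `code`, coarsest first: the values to index."""
    # Each finer level adds two characters: either ":d" or "dd"
    return [code[:n] for n in range(4, len(code) + 1, 2)]


def csquare_query(code, size_deg, field="csquare"):
    """Query for all documents in the `size_deg` square around the position `code`."""
    square = code[:SIZE_TO_LENGTH[size_deg]]
    # Quote the value: ':' is query syntax and the code must match as one term
    return '%s:"%s"' % (field, square)
```

A document would then be indexed with something like {"id": ..., "csquare": csquare_levels(code)}, and csquare_query(mypos, 0.1) returns everything in the same 0.1-degree square.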

SLIDE 24

Resources

  • http://lucene.apache.org/solr
  • http://lucene.apache.org/java/docs/queryparsersyntax.html
  • http://wiki.apache.org/solr/
  • http://www.marine.csiro.au/csquares/about-csquares.htm
  • http://geohash.org/
  • http://sourceforge.net/projects/locallucene/

SLIDE 25

Thank you

Alexandru Badiu
http://www.voidberg.org
i@voidberg.org
