Apache Lucene - a library retrieving data for millions of users
Simon Willnauer Apache Lucene Core Committer & PMC Chair
simonw@apache.org / simon.willnauer@searchworkings.org
Friday, October 14, 2011
SpanQueries
are using Lucene directly
Katta
programming model
recognition, coreference resolution
ancient APIs and file formats for good.
It's all about the user!
compared to 3.x
IndexWriter IndexReader
expensive.
all documents in the search field)
procedures
[Figure: automaton diagram; transition labels: \u0000-f, g, h-n, o, p-\uffff, d]
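// Lucene 4.x package names
import org.apache.lucene.index.Term;
import org.apache.lucene.search.AutomatonQuery;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.BasicAutomata;
import org.apache.lucene.util.automaton.BasicOperations;
import org.apache.lucene.util.automaton.LevenshteinAutomata;

// a term representative of the query, containing the field.
// term text is not important and only used for toString() and such
Term term = new Term("body", "dogs~1");

// builds a DFA for all strings within an edit distance of 1 from "dogs"
Automaton fuzzy = new LevenshteinAutomata("dogs").toAutomaton(1);

// concatenate this with another DFA equivalent to the "*" operator
Automaton fuzzyPrefix = BasicOperations.concatenate(fuzzy, BasicAutomata.makeAnyString());

// build a query, search with it to get results.
AutomatonQuery query = new AutomatonQuery(term, fuzzyPrefix);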
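To make concrete which terms a fuzzy-prefix automaton like the one above accepts, here is a minimal plain-Java sketch that checks the same condition by brute force: a term matches if some prefix of it is within a given edit distance of the pattern. This is not how Lucene does it (Lucene compiles a DFA and intersects it with the term dictionary); the class name FuzzyPrefixSketch and the method matches are hypothetical, the sketch compares UTF-16 chars rather than Unicode code points, and it counts only insertions, deletions and substitutions.

```java
// Hypothetical illustration only; not part of the Lucene API.
final class FuzzyPrefixSketch {

    /**
     * Returns true if some prefix of candidate is within maxEdits
     * (insert, delete, substitute) of pattern. This is the language
     * "Levenshtein(pattern, maxEdits) followed by any string".
     */
    static boolean matches(String pattern, int maxEdits, String candidate) {
        int m = pattern.length();
        int n = candidate.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];

        // Row 0: turning the empty pattern into candidate[0..j) costs j insertions.
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= m; i++) {
            // Turning pattern[0..i) into the empty prefix costs i deletions.
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = pattern.charAt(i - 1) == candidate.charAt(j - 1) ? 0 : 1;
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(delete, insert), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // prev[j] now holds the edit distance between pattern and candidate[0..j).
        for (int j = 0; j <= n; j++) {
            if (prev[j] <= maxEdits) {
                return true;
            }
        }
        return false;
    }
}
```

With the pattern "dogs" and one allowed edit, matches accepts "dogsled" (exact prefix), "dog" (one deletion) and "digs and cats" (one substitution in the prefix), and rejects "cats". The DFA approach reaches the same answers without computing a distance table per term, which is the point of building the automaton once per query.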
Indexing with Lucene 4.0 Indexing with Lucene 3.x
very often is crucial to get things right.
responsibilities
same seed
implementations without much fear!