Accelerating Document Retrieval and Ranking for Cognitive - - PowerPoint PPT Presentation
Accelerating Document Retrieval and Ranking for Cognitive Applications
Presenters: Tim Kaldewey, Performance Architect; David Wendt, Performance Engineer
Disclaimer
The author's views expressed in this presentation do not necessarily reflect the views of IBM.
Watson evolution
40x*
*http://www-03.ibm.com/software/businesscasestudies/us/en/corp?synkey=Y362451T34615G34
A “brainwave” for answering a question
(Figure; axis: Time [ms])
Background
- Querying unstructured data (text) to identify relevant documents is a prerequisite for many cognitive data processing tasks (NLP)
- The large number of queries and the volume of unstructured data require a highly performant mechanism
Example:
- Lucene index of Wikipedia (5 million docs) is 105GB
- Average search comprises 7 terms (keywords)
- On average 115 thousand documents scored per search
- Scoring of candidate documents and passages is highly parallelizable.
➔ Acceleration can be leveraged to improve response time and/or enable more complex queries to improve accuracy
Document Search
- Retrieve the documents that are most likely to have the answer(s) to the question
- Search for documents that contain the words from the question
- Rank the documents based on
– How frequently the words and word combinations appear in each document
– The distance between these words in those documents
This provincial government of Canada is officially known as the government of Newfoundland and what region?
Index is organized in term-document format
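As a rough illustration of a term-document layout, the sketch below builds a toy index that maps each term to the documents containing it, together with the word positions inside each document. The function name `build_index` and the sample documents are made up for this example; Lucene's real on-disk format is far more elaborate.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a toy term-document index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

# Hypothetical two-document collection
docs = {
    0: "government of canada",
    1: "newfoundland government and the government of canada",
}
index = build_index(docs)
print(index["government"])  # documents and positions for one term
```

Looking up a term returns its candidate documents directly, which is why matching terms to documents is fast with this layout.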
Anatomy of Lucene Query
- Words are stemmed and some stop words (the, of, as, …) are removed.
- Keywords become term clauses: canada newfoundland provinci govern offici …
– Scores are computed based on term frequency.
- Word pairs (phrases) become span clauses: "provinci govern"~2 …
– Scores are computed based on the frequency of the phrase and the distance between its words
- Complex queries (e.g. nested span clauses) can improve accuracy by scoring more relevant documents higher.
This provincial government of Canada is officially known as the government of Newfoundland and what region?
+canada +newfoundland +provinci +govern +offici +known^0.5 +region "provinci govern"~2 "govern canada"~2 "offici known"~2^0.9 "known govern"~2 "govern newfoundland"~2 "offici region"~3
Turn text into a Lucene query to retrieve relevant documents.
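The assembly of term clauses and span clauses can be sketched as follows. This is a minimal illustration, not the presenters' query builder: it assumes the tokens are already stemmed, uses a small made-up stop-word list, takes the phrase pairs as input, and omits boosts such as ^0.5. The name `build_query` is hypothetical.

```python
# Illustrative subset of stop words (an assumption, not Lucene's list)
STOP_WORDS = {"the", "of", "as", "is", "and", "this", "what"}

def build_query(stemmed_tokens, phrase_pairs, default_slop=2):
    """Build a Lucene-style query string from stemmed tokens and word pairs."""
    terms = [t for t in stemmed_tokens if t not in STOP_WORDS]
    # dict.fromkeys removes duplicates while keeping first-seen order
    term_clauses = ["+" + t for t in dict.fromkeys(terms)]
    span_clauses = ['"%s %s"~%d' % (a, b, default_slop) for a, b in phrase_pairs]
    return " ".join(term_clauses + span_clauses)

tokens = ["this", "provinci", "govern", "of", "canada", "is", "offici",
          "known", "as", "the", "govern", "of", "newfoundland", "and",
          "what", "region"]
pairs = [("provinci", "govern"), ("govern", "canada")]
query = build_query(tokens, pairs)
print(query)
```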
Scoring term clauses
- Lucene is very efficient, making only one pass to match and score
- Index format is optimized for speed in matching terms to documents
- For each document, score each term clause and then sum the scores
- Scorer takes three values:
– Term frequency
– Document length
– Term probability
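The slides do not say which scoring formula was used. As one possibility that happens to take the same three inputs, the sketch below uses a Dirichlet-smoothed language-model score; the formula choice, the smoothing parameter `mu`, and the function names are all assumptions made for illustration.

```python
import math

def term_score(term_freq, doc_len, term_prob, mu=2000.0):
    """Dirichlet-smoothed score for one term in one document (assumed formula).

    term_freq: occurrences of the term in the document
    doc_len:   number of terms in the document
    term_prob: probability of the term in the whole collection
    """
    score = (math.log(1.0 + term_freq / (mu * term_prob))
             + math.log(mu / (doc_len + mu)))
    return max(score, 0.0)  # clamp negative scores to zero

def document_score(term_stats, doc_len):
    """Sum the per-term scores; term_stats is a list of (term_freq, term_prob)."""
    return sum(term_score(tf, doc_len, p) for tf, p in term_stats)

# Hypothetical document of 100 terms matching two query terms
print(document_score([(5, 0.001), (1, 0.001)], doc_len=100))
```

Because each document's score depends only on its own statistics, the same function can be evaluated for many documents independently.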
Scoring span clauses
Scoring here uses a ‘sloppy’ frequency value calculated based on how often the term pair appears and how close together the terms are.
Clause form: span(term1,term2,slop,order)
Example: span(provinci,govern,2,false)
"provinci govern"~2 "govern canada"~2 "offici known"~2 "known govern"~2 "govern newfoundland"~2 "offici region"~3
Scoring span clauses – continued
- Position vectors vary in length per term per document.
span(provinci,govern,2,false)
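To make the role of the position vectors concrete, here is a simplified sketch of a sloppy frequency for one document. It compares every position of the first term with every position of the second, keeps pairs with at most `slop` words between them, and weights each pair by 1/(gap + 1). The all-pairs loop and the weighting are assumptions for illustration; Lucene's actual span matching handles more cases.

```python
def sloppy_freq(positions_a, positions_b, slop, in_order=False):
    """Simplified sloppy frequency of a term pair within one document."""
    total = 0.0
    for pa in positions_a:
        for pb in positions_b:
            if pa == pb:
                continue                 # same slot, not a pair
            if in_order and pb < pa:
                continue                 # second term must follow the first
            gap = abs(pb - pa) - 1       # words between the two terms
            if gap <= slop:
                total += 1.0 / (gap + 1) # closer pairs count more
    return total

# Hypothetical position vectors for "provinci" and "govern" in one document
print(sloppy_freq([1, 10], [2, 13, 30], slop=2))
```

Since the vectors differ in length per term and per document, the amount of work per document varies, which matters when mapping this loop onto GPU threads.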
Analysis
- Scoring for each document is independent from other documents
- At the end, scores are sorted to provide the document rank order
Perfect for GPU
- Floating point operations for thousands of items (documents) that can occur in parallel
- Each query clause is implemented as a set of kernels and the scores accumulate in a float array where each element is the score for a unique document
- The top N ranked document ids are returned to the host application
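The accumulate-then-rank pattern described above can be sketched on the CPU as follows. The Python list stands in for the float array on the device, and each call to `accumulate` stands in for one clause's kernels; the function names and sample scores are hypothetical.

```python
import heapq

def accumulate(scores, clause_hits):
    """Add one clause's per-document scores into the shared score array."""
    for doc_id, s in clause_hits:
        scores[doc_id] += s

def top_n(scores, n):
    """Return the n highest-scoring (doc_id, score) pairs, skipping zero scores."""
    ranked = heapq.nlargest(n, enumerate(scores), key=lambda pair: pair[1])
    return [(doc_id, s) for doc_id, s in ranked if s > 0.0]

scores = [0.0] * 6                                     # one slot per document
accumulate(scores, [(1, 0.5), (3, 1.25), (4, 0.25)])   # e.g. a term clause
accumulate(scores, [(3, 0.5), (4, 1.0)])               # e.g. a span clause
best = top_n(scores, 2)
print(best)
```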
Scoring on the GPU
- We used the Thrust library for sorting and intersecting to more easily include a CPU-only alternative
- All term clauses are scored first and can be calculated in a single kernel (loop)
- Spans are computed to maximize caching of term position values
- Once scored, the results are sorted and the top N document ids are returned along with their scores
Only 5 custom kernels were required.
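The intersecting step presumably finds the documents that contain both terms of a span before their positions are compared. A plain CPU sketch of intersecting two sorted document id lists is shown below; it mirrors what a sorted-set intersection (such as Thrust's set_intersection) computes, but the function name and data are made up for this example.

```python
def intersect_sorted(ids_a, ids_b):
    """Return the ids present in both sorted lists (two-pointer merge)."""
    i = j = 0
    common = []
    while i < len(ids_a) and j < len(ids_b):
        if ids_a[i] == ids_b[j]:
            common.append(ids_a[i])
            i += 1
            j += 1
        elif ids_a[i] < ids_b[j]:
            i += 1
        else:
            j += 1
    return common

# Hypothetical sorted document ids for two terms
print(intersect_sorted([1, 4, 7, 9], [2, 4, 9, 12]))
```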
Results
Making it Real
- Accessing the index data: ids, frequencies, positions
- Managing GPU access
- Recursion for nested clauses
- Scoring special cases
- Coverage of query types
Shared index data
- First approach was to create a custom index with only the values we needed for scoring.
- Sharing the index with the rest of Lucene would be ideal, but how much would this cost us?
Shared index data - results
Managing GPU access
- Need to handle simultaneous queries from many host threads
- A dedicated set of streams – one per host thread – to handle each query
- Limited the number of streams based on the available GPU memory and index size
- Once the GPU is fully utilized, additional host threads can be blocked or can fall back to calling Lucene directly
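One way to picture this policy is a fixed pool of stream slots guarded by a semaphore. The sketch below is an assumption about how such a pool could look, not the presenters' code: `StreamPool`, `max_streams`, `gpu_search` and `cpu_search` are hypothetical names, and the memory sizing rule is a guess at what "based on the available GPU memory and index size" might mean.

```python
import threading

def max_streams(gpu_mem, index_size, per_query_mem):
    """Assumed sizing rule: memory left after the index, divided per query."""
    return max(0, (gpu_mem - index_size) // per_query_mem)

class StreamPool:
    """Limits concurrent GPU queries; extra callers block or fall back."""

    def __init__(self, stream_count):
        self._slots = threading.BoundedSemaphore(stream_count)

    def run(self, query, gpu_search, cpu_search, block=False):
        # With block=False a busy pool sends the query to the CPU path
        if not self._slots.acquire(blocking=block):
            return cpu_search(query)
        try:
            return gpu_search(query)
        finally:
            self._slots.release()

pool = StreamPool(max_streams(gpu_mem=16, index_size=10, per_query_mem=2))
print(pool.run("canada", lambda q: "gpu:" + q, lambda q: "cpu:" + q))
```

Passing `block=True` makes additional host threads wait for a free slot instead of falling back.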
Recursion for nested spans
- Although CUDA supports recursion, having an unknown stack-size becomes an issue.
- Implemented the recursions as loops and managed a fake stack in global memory
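The loop-plus-stack idea can be illustrated on the CPU with a nested clause tree evaluated without recursion. In this sketch a Python list plays the role of the stack kept in global memory, and the evaluation only checks which documents contain the required terms (it ignores positions and slop). The clause encoding and the name `matching_docs` are assumptions made for this example.

```python
def matching_docs(clause, postings):
    """Evaluate a nested clause iteratively.

    clause is ("term", name), ("near", [children]) or ("or", [children]);
    children lists are assumed to be non-empty.
    """
    work = [(clause, False)]   # explicit stack of (node, children_done)
    results = []               # value stack of document id sets
    while work:
        node, children_done = work.pop()
        kind = node[0]
        if kind == "term":
            results.append(set(postings.get(node[1], ())))
        elif not children_done:
            work.append((node, True))        # revisit after the children
            for child in node[1]:
                work.append((child, False))
        else:
            child_sets = [results.pop() for _ in node[1]]
            if kind == "near":
                results.append(set.intersection(*child_sets))
            else:
                results.append(set.union(*child_sets))
    return results.pop()

# Hypothetical postings: term -> document ids
postings = {"provinci": [1, 2, 5], "govern": [2, 5, 7],
            "canada": [5, 9], "newfoundland": [2]}
clause = ("near", [
    ("near", [("term", "provinci"), ("term", "govern")]),
    ("or", [("term", "canada"), ("term", "newfoundland")]),
])
print(matching_docs(clause, postings))
```

Because the stack is an ordinary array, its maximum size can be bounded up front from the nesting depth of the query, which avoids the unknown stack size that true recursion would bring.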
Query Types vs Coverage
- Query types are unique combinations of search clauses: terms, spanNear, spanOr, nested spans, etc.
- Coverage progression is from most common clause type to least common.
Scoring span clauses has special cases
- There are some special cases like when phrases overlap.
Conclusion
- Speed up by half an order of magnitude
- Many challenges: shared index, query types, recursion, …
- GPU performance is even higher for complex queries
– Words resulting in many documents requiring more threads
– Complex span clauses with many position values
- Speeding up query allows building more complex queries and