query languages

Query Languages, R. Baeza-Yates and B. Ribeiro-Neto - PDF document



  1. Query Languages
     R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Chapter 4
     Jon Atle Gulla, NTNU/IDI/IS
     TDT4215 Query Languages

     Query Languages
     • How do you specify your information needs?
     • Query types:
       – Keyword-based querying
       – Pattern matching
       – Structural queries

  2. Keyword-based querying: single-word queries
     • Example queries:
       skiing
       TDT4215 exam NTNU Trondheim
       skiing norway snowboarding
     • Result is a set of documents containing at least one of the words of the query
     • Documents are ranked according to relevance
     • Web extensions:
       skiing telemark
       skiing +telemark
       skiing -telemark

     Keyword-based querying: context queries
     Words appearing near each other may signal a higher relevance than words far apart.
     • Phrases: a sequence of single-word queries
       “new york times”
       “to be or not to be”
       “olympic games” london
     • Proximity: a sequence of words is given together with a maximum allowed distance between them
       Query: ntnu trondheim
       “…the university in trondheim is ntnu…”
       “…ntnu is situated in trondheim…”
       Query: eggen rbk
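The web extensions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `+term` is read as "must occur", `-term` as "must not occur", and the number of matched query words stands in for a real relevance score. The function names and the sample documents are hypothetical and not part of the slides.

```python
def parse_query(query):
    """Split a query into optional, required (+) and excluded (-) terms."""
    optional, required, excluded = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return optional, required, excluded


def search(query, docs):
    """Return ids of documents containing at least one query word.

    Assumption: documents matching more query words rank higher;
    ties are broken by document id.
    """
    optional, required, excluded = parse_query(query)
    results = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if not required <= words:      # a +term is missing
            continue
        if excluded & words:           # a -term is present
            continue
        matched = (optional | required) & words
        if matched:
            results.append((len(matched), doc_id))
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]


# Hypothetical mini collection
DOCS = {
    "d1": "skiing in norway",
    "d2": "telemark skiing in norway",
    "d3": "snowboarding in the alps",
}
```

With this collection, `search("skiing telemark", DOCS)` returns both skiing documents with `d2` first, `search("skiing +telemark", DOCS)` keeps only `d2`, and `search("skiing -telemark", DOCS)` keeps only `d1`.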

  3. Phrasing or not phrasing
     Query: new york times
     • How to deal with queries that have potential phrases?
       – How to recognize a potential phrase?
       – How to interpret potential phrases?
     • Possible interpretations:
       “new york times”
       “new york” times
       new york times
     • Interpretation affects ranking!

     Proximity search
     Query: new york times
     Documents:
       – “It's 3 pm in New York, what time is it in the rest of the world?”
       – “For your reading pleasure, we present historic issues from the New York Times.”
       – “City of York Council - list of new library opening times and addresses.”
       – “Three webcam views of Times Square, New York.”
     • Which document is the most relevant one?
     • How do we achieve this ranking?
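One possible answer to the ranking question is to score each document by the smallest window of text that contains all query words. The sketch below is an assumption, not the method the slides prescribe: it ignores word order, does no stemming (so "time" does not match "times"), and drops documents that miss a query word. The document ids are made up for illustration.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def smallest_window(query_terms, tokens):
    """Length (in tokens) of the shortest span containing every query term.

    Returns None when a term does not occur at all.
    """
    positions = []
    for term in query_terms:
        hits = [i for i, tok in enumerate(tokens) if tok == term]
        if not hits:
            return None
        positions.append(hits)
    # Brute force over one position per term; fine for short documents.
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


def rank_by_proximity(query, docs):
    """Return (window, doc_id) pairs, tightest window first."""
    terms = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        window = smallest_window(terms, tokenize(text))
        if window is not None:
            scored.append((window, doc_id))
    return sorted(scored)


# The four example documents from the slide, with hypothetical ids
DOCS = {
    "clock": "It's 3 pm in New York, what time is it in the rest of the world?",
    "nyt": "For your reading pleasure, we present historic issues "
           "from the New York Times.",
    "council": "City of York Council - list of new library opening times "
               "and addresses.",
    "webcam": "Three webcam views of Times Square, New York.",
}
```

Under these assumptions `rank_by_proximity("new york times", DOCS)` places the historic-issues document first (window of 3 tokens), then the webcam document (4), then the council document (8). The first document is dropped because it contains "time" rather than "times".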

  4. Keyword-based querying: Boolean queries
     • Boolean operators:
       – OR (e1 OR e2)
       – AND (e1 AND e2)
       – BUT (e1 BUT e2), used in place of NOT: documents that satisfy e1 but not e2
     • No ranking of documents provided
     • “Fuzzy Boolean”: meaning of AND and OR relaxed
     Natural language:
     • Query is an enumeration of words and context queries
     • All documents matching a portion of the user query are retrieved
     • Higher ranking is assigned to those documents matching more parts of the query
     • Query and documents viewed as vectors

     Pattern matching
     • A pattern is a set of syntactic features that must occur in a text segment, ranging from simple (e.g. words) to complex (e.g. regular expressions) terms
     • Typical patterns:
       – words
       – prefixes: ‘comput’ -> ‘computer’, ‘computation’, ‘computing’
       – suffixes: ‘ters’ -> ‘computers’, ‘testers’, ‘printers’
       – substrings: ‘tal’ -> ‘coastal’, ‘talk’, ‘metallic’
       – ranges: between ‘held’ and ‘hold’ -> ‘hoax’, ‘hissing’
       – allowing errors
       – regular expressions
       – extended patterns
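Both slides above reduce to small set and string operations, which the following sketch illustrates. It assumes a toy inverted index built by whitespace splitting, maps OR, AND and BUT to set union, intersection and difference, and implements the prefix, suffix, substring and range patterns as plain string tests over a vocabulary. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Boolean operators as set operations on result sets
def q_or(a, b):
    return a | b


def q_and(a, b):
    return a & b


def q_but(a, b):
    """e1 BUT e2: documents in e1 that are not in e2."""
    return a - b


# Simple pattern types evaluated against a vocabulary
def match_prefix(prefix, vocab):
    return sorted(w for w in vocab if w.startswith(prefix))


def match_suffix(suffix, vocab):
    return sorted(w for w in vocab if w.endswith(suffix))


def match_substring(sub, vocab):
    return sorted(w for w in vocab if sub in w)


def match_range(low, high, vocab):
    """Words lying between low and high in lexicographic order."""
    return sorted(w for w in vocab if low <= w <= high)


DOCS = {
    "d1": "skiing norway",
    "d2": "skiing telemark norway",
    "d3": "snowboarding norway",
}
INDEX = build_index(DOCS)

# Vocabulary taken from the pattern examples on the slide
VOCAB = ["computer", "computation", "computing", "computers", "testers",
         "printers", "coastal", "talk", "metallic", "hoax", "hissing"]
```

For example, `q_but(INDEX["skiing"], INDEX["telemark"])` leaves only `d1`, and `match_range("held", "hold", VOCAB)` returns `hissing` and `hoax`. Note that no ranking comes out of the Boolean part, in line with the slide.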

  5. Structural queries
     • Allowing the user to query documents based on their structure (not on their content)
     • Mixing content and structure in a query allows us to pose more expressive queries
     • Three main structures:
       – form-like fixed structures
       – hypertext structures
       – hierarchical structures

     Fixed structure
     • Document has a fixed set of fields, much like a filled form
     • Intended for document collections with fixed structures
     • Example:
       – Mail archive as a set of mails
       – Each mail has a standard set of fields:
         • sender
         • receiver
         • subject
         • date
         • body
       – User can search for mails sent to a given person with “football” in the subject field
     • Leads to the relational model
       – Extend SQL to full text retrieval -> SFQL
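The mail archive example can be sketched as a filter over records with fixed fields. This is a plain Python illustration, not SFQL; the `Mail` class, the `search_mails` function and the sample addresses are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    """One mail with the standard fields listed on the slide."""
    sender: str
    receiver: str
    subject: str
    date: str
    body: str


def search_mails(mails, receiver, subject_word):
    """Mails sent to `receiver` whose subject field contains `subject_word`."""
    word = subject_word.lower()
    return [
        mail for mail in mails
        if mail.receiver == receiver and word in mail.subject.lower().split()
    ]


# Hypothetical archive
ARCHIVE = [
    Mail("anna@example.org", "ole@example.org", "Football on Friday",
         "2010-03-01", "Are you joining?"),
    Mail("anna@example.org", "kari@example.org", "Football tickets",
         "2010-03-02", "I have two spare tickets."),
    Mail("per@example.org", "ole@example.org", "Exam schedule",
         "2010-03-03", "The football pitch is closed during exams."),
]
```

Here `search_mails(ARCHIVE, "ole@example.org", "football")` returns only the first mail: the third one mentions football in the body, not in the subject field, which is exactly the distinction a fixed-structure query makes.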

  6. Hypertext
     • Hypertext is a directed graph where the nodes hold some text and the links represent connections between nodes
     • Search by following hyperlinks
     • “give me documents that link to X”

     Hierarchical structure
     • Hierarchical structure is an intermediate structuring model that lies between fixed structure and hypertext structure
     • Sample of hierarchical models:
       – PAT expressions: structure is marked in the text as tags (e.g. HTML)
       – Overlapped lists: hierarchical, partly overlapping regions of text defined
       – Lists of references: querying path expressions in text
       – Proximal nodes: many fixed hierarchical structures of text defined
       – Tree matching: document and query give a tree structure
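The two hypertext operations above can be sketched on a directed graph stored as an adjacency list. The graph, the node names and both function names are assumptions for illustration; "following hyperlinks" is interpreted here as a breadth-first traversal.

```python
from collections import deque


def documents_linking_to(target, links):
    """Answer "give me documents that link to X": all nodes with a link to target."""
    return sorted(src for src, targets in links.items() if target in targets)


def reachable_from(start, links):
    """Nodes that can be reached from `start` by following hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return sorted(seen)


# Hypothetical hypertext: node -> list of nodes it links to
LINKS = {
    "A": ["B", "X"],
    "B": ["X"],
    "C": ["A"],
    "X": [],
}
```

In this graph `documents_linking_to("X", LINKS)` yields `A` and `B`, while `reachable_from("C", LINKS)` yields `A`, `B` and `X`.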

  7. Conclusions
     • Query types:
       – Keyword-based queries:
         • Single-word queries
         • Context queries
         • Boolean queries
         • Natural language
       – Pattern matching
       – Structural queries:
         • Fixed structure
         • Hypertext
         • Hierarchical structure
