
Linguistics 384: Language and Computers, Topic 2: Searching



  1. Language and Computers, Topic 2: Searching
     Linguistics 384: Language and Computers
     Scott Martin∗, Dept. of Linguistics, OSU, Spring 2008
     ∗ The course was created by Chris Brew, Markus Dickinson and Detmar Meurers.

  2. Outline
     ◮ Introduction
       ◮ Text
       ◮ Speech
     ◮ Searching in a Library Catalogue
       ◮ Special characters
       ◮ Operators
     ◮ Searching the web
       ◮ Operators
       ◮ Improving searching
       ◮ Ranking of results
       ◮ Evaluating search results
     ◮ Advanced searches with regular expressions
       ◮ Syntax of regular expressions
       ◮ Grep: An example for using regular expressions
       ◮ Text corpora and searching them

  6. Searching
     ◮ An astounding number of information resources are available: books, databases, the web, newspapers, ...
     ◮ To locate relevant information, we need to be able to search these resources, which are often written texts:
       ◮ Searching in a library catalogue (e.g., using OSCAR)
       ◮ Searching the web (e.g., using Google)
       ◮ Advanced searching in text corpora (e.g., using regular expressions in Opus)

  7. Searching in speech
     ◮ One might also want to search speech, e.g., to find a particular sentence spoken in an interview of which one has only a recording (audio file).
     ◮ With current technology, this is only possible if the interview is transcribed, using the IPA or another writing system.
     ◮ It is, however, already possible to
       ◮ detect the language of a spoken conversation, e.g., when listening in on a telephone conversation
       ◮ detect that a new topic has been started in a conversation
     ◮ In the following, we focus on searching in text.

  8. Searching in a library catalogue
     ◮ To find articles, books, and other library holdings, a library generally provides a database containing information on its holdings.
     ◮ OSCAR is the database frontend providing access to the library database at OSU.
     ◮ OSCAR makes it possible to search for literal strings in the author, title, keywords, call number, etc. associated with an item held by the library.
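To make "searching for literal strings in catalogue fields" concrete, here is a minimal Python sketch. The records, their values, the field names (author, title, keywords, call_number) and the search helper are all hypothetical illustrations, not OSCAR's actual data model or code.

```python
# Hypothetical catalogue records; field names loosely follow the slide, values are made up.
CATALOGUE = [
    {"author": "Doe, Jane", "title": "An Introduction to Art Therapy",
     "keywords": "art therapy; psychology", "call_number": "AB 123"},
    {"author": "Roe, Richard", "title": "Vitamin C in the Diet",
     "keywords": "nutrition; vitamins", "call_number": "CD 456"},
]


def search(catalogue, literal, field=None):
    """Return the records whose given field (or, if None, any field) contains literal."""
    hits = []
    for record in catalogue:
        values = [record.get(field, "")] if field else list(record.values())
        # Plain substring test: this sketch is case-sensitive.
        if any(literal in value for value in values):
            hits.append(record)
    return hits


print([r["title"] for r in search(CATALOGUE, "Therapy", field="title")])
print([r["title"] for r in search(CATALOGUE, "Roe")])
```

Unlike this sketch, OSCAR ignores the difference between upper- and lower-case letters in literal strings.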

  9. Basic searching in OSCAR
     ◮ Literal strings are composed of characters, which naturally must be in the same character encoding system (e.g., ASCII, ISO-8859-1, UTF-8) as the strings encoded in the database.
     ◮ For literal strings, OSCAR does not distinguish between upper- and lower-case letters (i.e., they aren't so literal after all).
     ◮ Adjacent words are searched as a phrase:
       ◮ art therapy
       ◮ vitamin c
     ◮ In addition to querying literal strings, the query language of OSCAR also supports
       ◮ special characters to abbreviate multiple options
       ◮ special operators for combining two query strings (boolean operators) or modifying the meaning of a single string (unary operators)
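Two points on this slide can be checked in a few lines of Python: byte-level matching only works when query and data use the same encoding, and a case-insensitive phrase search treats adjacent words as one unit. The example strings and the phrase_search helper are hypothetical; the helper uses regular expressions and is not how OSCAR is implemented.

```python
import re

# The same title encoded in two different character encodings.
title = "Müller: Art Therapy"
utf8_bytes = title.encode("utf-8")
latin1_bytes = title.encode("iso-8859-1")

# A UTF-8 query matches UTF-8 data byte for byte, but not ISO-8859-1 data,
# because "ü" is stored as two bytes in UTF-8 and one byte in ISO-8859-1.
query = "Müller"
print(query.encode("utf-8") in utf8_bytes)    # True
print(query.encode("utf-8") in latin1_bytes)  # False


def phrase_search(text, phrase):
    """Case-insensitive search for the words of phrase, adjacent and in order."""
    words = phrase.split()
    # \b keeps "vitamin c" from matching inside "vitamin cocktail".
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


print(phrase_search("Vitamin C in the Diet", "vitamin c"))  # True
print(phrase_search("therapy through art", "art therapy"))  # False
print(phrase_search("a vitamin cocktail", "vitamin c"))     # False
```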

  10. OSCAR: Special characters
     ◮ Use * for 1–5 characters at the end of or within a word.
       ◮ art* finds arts, artists, artistic
       ◮ gentle*n
     ◮ Use ** for any number of characters at the end of a word.
       ◮ art** finds artificial, artillery
     ◮ Use ? for a single character at the end of or within a word.
       ◮ gentlem?n
     ◮ The special * and ? characters must have at least 2 characters to their left (→ for efficiency reasons).
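One way to see what these wildcards mean is to translate them into regular expressions, the subject of the last part of this topic. The sketch below assumes that * stands for 1 to 5 word characters, ** for zero or more, and ? for exactly one; oscar_to_regex is a hypothetical helper, and OSCAR's real matcher may differ (for instance in whether ** can match zero characters).

```python
import re


def oscar_to_regex(term):
    """Translate an OSCAR-style wildcard term into a compiled regular expression.

    Hypothetical helper; assumed meanings: '*' = 1-5 characters,
    '**' = any number (here: zero or more), '?' = exactly one character.
    """
    wildcard_positions = [i for i, ch in enumerate(term) if ch in "*?"]
    # OSCAR requires at least 2 characters to the left of * and ?.
    if wildcard_positions and wildcard_positions[0] < 2:
        raise ValueError("a wildcard needs at least 2 characters to its left")
    parts = []
    i = 0
    while i < len(term):
        if term.startswith("**", i):
            parts.append(r"\w*")
            i += 2
        elif term[i] == "*":
            parts.append(r"\w{1,5}")
            i += 1
        elif term[i] == "?":
            parts.append(r"\w")
            i += 1
        else:
            parts.append(re.escape(term[i]))
            i += 1
    # Ignore case, as OSCAR does for literal strings.
    return re.compile("".join(parts), re.IGNORECASE)


WORDS = ["art", "arts", "artists", "artistic", "artificial", "artillery",
         "gentleman", "gentlemen"]

for term in ["art*", "art**", "gentle*n", "gentlem?n"]:
    regex = oscar_to_regex(term)
    # fullmatch: the pattern must cover the whole word, not just part of it.
    print(term, [w for w in WORDS if regex.fullmatch(w)])
```

Under these assumptions, art* matches arts, artists and artistic but not artificial or artillery, which need art**, as on the slide.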

  11. OSCAR: Literal Strings and Operators (I)
     ◮ Use “and” or “or” to specify multiple words in any field, in any order.
       ◮ art and therapy
       ◮ art or therapy
     ◮ Use “and not” to exclude words.
       ◮ art and not therapy
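Boolean operators correspond directly to set operations on the records containing each word: and is intersection, or is union, and and not is set difference. The records and the build_index and matching helpers below are hypothetical; a real catalogue would presumably index each field separately.

```python
# Hypothetical records: record ID -> all words from its fields, run together.
RECORDS = {
    1: "art therapy for children",
    2: "music therapy and dance therapy",
    3: "art of the renaissance",
    4: "physical therapy",
}


def build_index(records):
    """Map each lower-cased word to the set of record IDs containing it (an inverted index)."""
    index = {}
    for record_id, text in records.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(record_id)
    return index


INDEX = build_index(RECORDS)


def matching(word):
    """Record IDs that contain word (empty set if none do)."""
    return INDEX.get(word.lower(), set())


print(sorted(matching("art") & matching("therapy")))  # art and therapy     -> [1]
print(sorted(matching("art") | matching("therapy")))  # art or therapy      -> [1, 2, 3, 4]
print(sorted(matching("art") - matching("therapy")))  # art and not therapy -> [3]
```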

  12. OSCAR: Operators (II)
     ◮ Use parentheses to group words together when using more than one operator.
       ◮ art therapy and not ((music or dance) therapy)
     ◮ Use “near” to specify words within 10 words of each other, in any order.
       ◮ art near therapy
     ◮ Use “within n” to specify words within n words of each other. The value of n has no limit.
       ◮ art within 12 therapy
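The proximity operators can be sketched by comparing word positions. The positions, within and near helpers are hypothetical; the sketch assumes that “within n words” means the two positions differ by at most n, and that within, like near, ignores word order, which the slide does not state explicitly.

```python
def positions(tokens, word):
    """Indices at which word occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word]


def within(text, word1, word2, n):
    """True if word1 and word2 occur within n words of each other, in either order."""
    tokens = text.lower().split()
    return any(
        abs(i - j) <= n
        for i in positions(tokens, word1.lower())
        for j in positions(tokens, word2.lower())
    )


def near(text, word1, word2):
    """The slide's 'near': within 10 words, in any order."""
    return within(text, word1, word2, 10)


text = "therapy sessions that use painting drawing and other forms of visual art"
print(near(text, "art", "therapy"))        # False: the words are 11 positions apart
print(within(text, "art", "therapy", 12))  # True
```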

  13. Searching the web
     A computer user
     ◮ wants to find something on “the web”, i.e., in files accessible via the hypertext transfer protocol (HTTP) on the internet
     ◮ goes to a search engine = a program that matches documents to a user’s search requests
     ◮ enters a query = a request for information
     ◮ gets a list of websites that might be relevant to the query
     ◮ evaluates the results: either picks a website with the information looked for or reformulates the query
