Information Retrieval Introduction 1 Hamid Beigy Sharif university - - PowerPoint PPT Presentation



SLIDE 1

Information Retrieval

Information Retrieval

Introduction [1]

Hamid Beigy

Sharif university of technology

October 6, 2018

[1] Some slides have been adapted from slides of Manning, Yannakoudakis, and Schütze.

Hamid Beigy | Sharif university of technology | October 6, 2018 1 / 18

SLIDE 2

Information Retrieval

Table of contents

  • 1. Course Information
  • 2. Introduction
  • 3. Course overview

SLIDE 3

Information Retrieval | Course Information

Outline

  • 1. Course Information
  • 2. Introduction
  • 3. Course overview

SLIDE 4

Information Retrieval | Course Information

Course Information

  • 1. Course name: Modern Information Retrieval
  • 2. Instructor: Hamid Beigy (Email: beigy@sharif.edu)
  • 3. Course website: http://ce.sharif.edu/courses/97-98/1/ce324-2/
  • 4. Lectures: Sat-Mon (10:30-12:00)
  • 5. TAs: Faeze Ghorbanpour (Email: f.gorbanpor93@students.sharif.ir)

SLIDE 5

Information Retrieval | Course Information

Course evaluation

Evaluation:

  • Mid-term exam: 20% (1398/7/28)
  • Mid-term exam: 20% (1397/8/28)
  • Final exam: 30%
  • Practical assignments: 25%
  • Quiz: 10%

SLIDE 6

Information Retrieval | Course Information

Main Reference

SLIDE 7

Information Retrieval | Course Information

References

  • R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, USA, 2nd edition, 2011.
  • G. Kowalski. Information Retrieval Architecture and Algorithms. Springer-Verlag, Berlin, Heidelberg, 1st edition, 2010.
  • C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

SLIDE 8

Information Retrieval | Introduction

Outline

  • 1. Course Information
  • 2. Introduction
  • 3. Course overview

SLIDE 9

Information Retrieval | Introduction

Definition of information retrieval

  • 1. Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
  • 2. Document collection: the units we have built an IR system over. Documents can be
      • memos
      • book chapters
      • paragraphs
      • scenes of a movie
      • turns in a conversation
      • ...
  • 3. These days we frequently think first of web search, but there are many other cases:
      • E-mail search
      • Searching your laptop
      • Corporate knowledge bases
      • Legal information retrieval

SLIDE 10

Information Retrieval | Introduction

Structured vs Unstructured Data

Unstructured data means that a formal, semantically overt, easy-for-computer structure is missing.

  • In contrast to the rigidly structured data used in DB-style searching (e.g. product inventories, personnel records):

        SELECT * FROM business-catalogue WHERE category = "florist" AND city-zip = "cb1"

  • This does not mean that there is no structure in the data:
      • Document structure (headings, paragraphs, lists, ...)
      • Explicit markup formatting (e.g. in HTML, XML, ...)
      • Linguistic structure (latent, hidden)
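As a rough illustration of DB-style searching, the structured query on this slide can be run against a small in-memory SQLite table. This is only a sketch: the rows and the helper florists_in are invented for the example, and the table and column names use underscores (business_catalogue, city_zip) because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_catalogue (name TEXT, category TEXT, city_zip TEXT)"
)
conn.executemany(
    "INSERT INTO business_catalogue VALUES (?, ?, ?)",
    [
        ("Rose & Thorn", "florist", "cb1"),
        ("Petal Works", "florist", "cb2"),
        ("Mill Road Bakery", "bakery", "cb1"),
    ],
)


def florists_in(zip_code):
    """Return the names of florists in the given zip area, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM business_catalogue "
        "WHERE category = ? AND city_zip = ? ORDER BY name",
        ("florist", zip_code),
    )
    return [name for (name,) in rows]


print(florists_in("cb1"))  # prints ['Rose & Thorn']
```

Every row either matches the exact field values or it does not; there is no notion of a partially relevant record, which is the main contrast with retrieval over unstructured text.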

SLIDE 11

Information Retrieval | Introduction

Information Needs and Relevance

  • 1. Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
  • 2. An information need is the topic about which the user desires to know more.
  • 3. A query is what the user conveys to the computer in an attempt to communicate the information need.
  • 4. Types of information needs:
      • Known-item search
      • Precise information seeking search
      • Open-ended search (topical search)

SLIDE 12

Information Retrieval | Introduction

Structured vs Unstructured data growth

SLIDE 13

Information Retrieval | Introduction

Relevance

  • 1. A document is relevant if the user perceives that it contains information of value with respect to their personal information need.
  • 2. Are the retrieved documents
      • about the target subject?
      • up-to-date?
      • from a trusted source?
      • satisfying the user's needs?
  • 3. How should we rank documents in terms of these factors?

SLIDE 14

Information Retrieval | Introduction

Information Retrieval Basics

[Figure: diagram with the labels IR System, Query, Document Collection, and Set of relevant documents]

SLIDE 15

Information Retrieval | Introduction

How well has the system performed?

The effectiveness of an IR system (i.e., the quality of its search results) is determined by two key statistics about the system's returned results for a query:

  • Precision: What fraction of the returned results are relevant to the information need?
  • Recall: What fraction of the relevant documents in the collection were returned by the system?

What is the best balance between the two?

  • Easy to get perfect recall: just retrieve everything
  • Easy to get good precision: retrieve only the most relevant
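The two statistics can be computed directly from the set of returned documents and the set of relevant documents. The sketch below is illustrative only: the function name precision_recall, the document ids, and the convention of reporting 0.0 when a set is empty are assumptions made for this example, not part of the slides.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a single query.

    returned: set of document ids the system returned
    relevant: set of document ids that are relevant to the information need
    """
    hits = len(returned & relevant)
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection of 10 documents, 4 of which are relevant.
collection = set(range(10))
relevant = {1, 3, 5, 7}

print(precision_recall({1, 3, 8}, relevant))   # two of three returned are relevant
print(precision_recall(collection, relevant))  # retrieve everything: (0.4, 1.0)
print(precision_recall({1}, relevant))         # one sure hit: (1.0, 0.25)
```

Retrieving the whole collection gives perfect recall at the cost of precision, while returning a single safe document gives perfect precision at the cost of recall, which is why the balance between the two matters.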

SLIDE 16

Information Retrieval | Introduction

A short history of IR

1945 1950s 1960s 1970s 1980s 1990s 2000s

[Figure: timeline over the decades listed above. Labels, in order of appearance: memex; term "IR" coined by Calvin Mooers; literature searching systems; evaluation by P&R (Alan Kent); Cranfield experiments; Boolean IR; SMART; Salton; VSM; pagerank; TREC; multimedia; multilingual (CLEF); recommendation systems. Inset plot with axes labelled precision and recall.]

SLIDE 17

Information Retrieval | Introduction

IR for non-textual media

SLIDE 18

Information Retrieval | Introduction

Unstructured data in 1650

Which plays of Shakespeare contain the words Brutus and Caesar, but not Calpurnia?

One could grep all of Shakespeare’s plays for Brutus and Caesar, then strip out lines containing Calpurnia.

Why is grep not the solution?

  • Slow (for large collections)
  • grep is line-oriented, IR is document-oriented
  • “not Calpurnia” is non-trivial
  • Other operations (e.g., find the word Romans near countryman) not feasible
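A document-oriented alternative to grep can be sketched with a small term-to-documents index. Everything here is an assumption for illustration: the three toy "plays" are invented text rather than real Shakespeare, and tokenize, build_index, and near are hypothetical helper names, not an established API.

```python
import re
from collections import defaultdict

# Hypothetical toy "plays": invented text, not real Shakespeare excerpts.
plays = {
    "Play A": "Brutus speaks to Caesar. Calpurnia warns Caesar.",
    "Play B": "Brutus and Caesar meet.\nRomans gather with each countryman.",
    "Play C": "Caesar is mentioned alone here.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def near(text, a, b, k):
    """Return True if terms a and b occur within k tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= k for i in pos_a for j in pos_b)


index = build_index(plays)

# Brutus AND Caesar AND NOT Calpurnia, evaluated per document, not per line.
result = (index["brutus"] & index["caesar"]) - index["calpurnia"]
print(sorted(result))  # prints ['Play B']

# Proximity: is "romans" within 5 tokens of "countryman" in Play B?
print(near(plays["Play B"], "romans", "countryman", 5))  # prints True
```

Once the index is built, the Boolean query is answered with set operations over whole documents, so "not Calpurnia" becomes a set difference rather than a line filter. The proximity check here rescans the text; storing token positions in the index would be the natural next step.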

SLIDE 19

Information Retrieval | Introduction

Web Information Retrieval

[Figure: diagram with the labels IR System, Query, web pages, and Set of relevant web pages]

SLIDE 20

Information Retrieval | Course overview

Outline

  • 1. Course Information
  • 2. Introduction
  • 3. Course overview

SLIDE 21

Information Retrieval | Course overview

Course overview

  • Introduction
  • Indexing and text operations
  • IR models (Boolean, vector space, probabilistic)
  • Evaluation of IR systems
  • Query operations
  • Machine learning in IR (classification, clustering, and ranking)
  • Web information retrieval
  • Some advanced topics
