1. 8/26/2012

Searching Text and Images in the Medical Domain
Allan Hanbury and Henning Müller

Allan Hanbury
- M.Sc. in Physics (University of Cape Town, South Africa)
- Ph.D. in Applied Mathematics (MINES ParisTech, France)
- Habilitation in Informatics (Vienna University of Technology, Austria)
- Senior Researcher at the Vienna University of Technology
- Scientific Coordinator of the Khresmoi project

2. Vienna University of Technology
- Austria's largest technical university
- 27,000 students
- Faculty of Informatics
  - Over 1,000 new student admissions per year
  - Five research foci:
    - Computational Intelligence
    - Distributed and Parallel Systems
    - Media Informatics and Visual Computing
    - Computer Engineering
    - Business Informatics

Henning Müller
- Studies of medical informatics in Heidelberg, Germany (1992-97)
- Work at Daimler-Benz Research, USA (1997-98)
- PhD in image processing, University of Geneva, Switzerland (1998-2002)
- Work on artificial intelligence at Monash University, Melbourne, Australia (2001)
- Medical Informatics Service, University and Hospitals of Geneva (2002-)
- HES-SO, Business Information Systems, Sierre (2007-)
- Coordinator of Khresmoi, organizer of ImageCLEF

3. HES-SO Sierre (part of HES-SO)
- 2,000 students
- Economy, tourism, business informatics
- Institute of Business Information Systems
  - Research in focused domains:
    - Internet of things, RFID
    - Mobile applications
    - Energy, Green ICT
    - SAP Center
    - eHealth
    - Information retrieval and management

Khresmoi
[Figure: Khresmoi overview diagram with the labels Images, Language Resources, Books, Queries, Questions, Websites, Information, Answers, Semantic Data, Journals]

4. Khresmoi partners

Visit the Khresmoi Stand!

5. Course Contents

Allan:
- Introduction to Information Retrieval
- Who searches for medical information and how do they search?
- Search in the medical domain
- Improving search in the medical domain (Discussion)

Henning:
- Searching for medical images
- Who searches medical images and how do they search?
- Combining text and visual search
- Challenges for search in the medical domain (Discussion)

6. Contents
- Information Retrieval (IR)
  - Indexing
  - Queries
- Information Retrieval Models
  - Boolean Model
  - Ranking Model
  - Advantages and Limitations
- Web Search

7. Information Retrieval (Sec. 1.1)
- Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
- Key characteristics:
  - Unstructured information
  - Separation of indexing and query-time processing
  - Strong empirical method

8. IR vs. Databases
- Structured vs. unstructured data
- Structured data tends to refer to information in "tables":

  Employee  Manager  Salary
  Smith     Jones    50000
  Chang     Smith    60000
  Ivy       Smith    50000

- Typically allows numerical range and exact match (for text) queries, e.g., Salary < 60000 AND Manager = Smith.
From: http://nlp.stanford.edu/IR-book/

Unstructured Information
- Text
- Images
- Music
- Videos
As opposed to:
- Relational databases
- Lists of numbers
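To make the contrast concrete, here is a minimal Python sketch of the structured query from the IR vs. Databases slide. The table is held as a list of dictionaries, an illustrative representation chosen for brevity and not how a relational database stores data. Because every field has a known name and type, the query reduces to a numeric range test combined with an exact string match.

```python
# Toy version of the Employee/Manager/Salary table from the slide
# (illustrative in-memory representation, not a real database).
employees = [
    {"employee": "Smith", "manager": "Jones", "salary": 50000},
    {"employee": "Chang", "manager": "Smith", "salary": 60000},
    {"employee": "Ivy", "manager": "Smith", "salary": 50000},
]

def structured_query(rows):
    """Salary < 60000 AND Manager = Smith: a numerical range test
    combined with an exact match on a named text field."""
    return [r["employee"] for r in rows
            if r["salary"] < 60000 and r["manager"] == "Smith"]

print(structured_query(employees))  # ['Ivy']
```

Chang is excluded because the range test is strict (60000 is not < 60000), and Smith because his manager is Jones. No comparable field names exist in free text, which is what makes unstructured retrieval a different problem.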

9. Semi-structured Data
- In fact, almost no data is "unstructured"
- For example:
  - This slide has distinctly identified zones such as the Title and Bullets
  - Journal articles contain Title, Abstract, Authors, … sections
- This facilitates "semi-structured" search such as:
  - Title contains data AND Bullets contain search
From: http://nlp.stanford.edu/IR-book/

Separation of Indexing & Query Time
- IR is about large-scale data collections
- The collection of information cannot be searched directly in interactive time
- Therefore we need to separate the process into:
  1. Offline (crawl/index) time processing
  2. Online query time processing
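The semi-structured query "Title contains data AND Bullets contain search" can be sketched in Python by storing each document as a set of named zones and testing each (zone, term) condition separately. The document ids and zone texts below are hypothetical, and tokenization is a simple lower-case word split chosen for illustration; a real system would index zones ahead of time rather than tokenize on every query.

```python
import re

# Hypothetical documents split into named zones, like a slide's Title and Bullets.
docs = {
    "slide9": {"title": "Semi-structured data",
               "bullets": "Facilitates semi-structured search"},
    "slide8": {"title": "IR vs. databases",
               "bullets": "Structured data tends to refer to information in tables"},
    "slide10": {"title": "Empirical method",
                "bullets": "Need to show whether one system is better than another"},
}

def tokens(text):
    """Lower-case word tokens of a zone (naive tokenizer, an assumption)."""
    return set(re.findall(r"\w+", text.lower()))

def zone_query(docs, conditions):
    """Return sorted ids of documents in which every (zone, term) condition
    holds, i.e. the conditions are combined with AND."""
    return sorted(
        doc_id for doc_id, zones in docs.items()
        if all(term in tokens(zones.get(zone, "")) for zone, term in conditions)
    )

# Title contains "data" AND Bullets contain "search"
print(zone_query(docs, [("title", "data"), ("bullets", "search")]))  # ['slide9']
```

Note that "slide8" mentions "data" only in its Bullets zone (and "databases" in its Title), so it fails the zone-restricted condition even though a plain full-text match on "data" would accept it.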

10. Empirical Method
- Need to show whether one system is better than another
  - Better systems produce more relevant information
- We need reproducibility
- Evaluation is required
- Key evaluation measures:
  - Precision
  - Recall

Precision and Recall
- A query returns n ranked documents from a database of many.
- Each one is judged as relevant or not:

  Rank  Relevant
  1     YES
  2     YES
  3     NO
  4     YES
  5     NO
  …     NO
  n
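One common way to score a judged ranking like the one above, not named on the slide itself, is precision at rank k: the fraction of the top k results judged relevant. This minimal Python sketch uses only the five judgments the slide shows (YES, YES, NO, YES, NO); the function name is illustrative.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k ranked documents judged relevant.
    `judgments` is a list of booleans in rank order."""
    if k <= 0 or k > len(judgments):
        raise ValueError("k must be between 1 and the number of judged documents")
    return sum(judgments[:k]) / k  # True counts as 1, False as 0

# Judgments for ranks 1-5 from the slide: YES, YES, NO, YES, NO
judged = [True, True, False, True, False]
for k in range(1, len(judged) + 1):
    print(k, round(precision_at_k(judged, k), 2))
```

Precision falls from 1.0 at ranks 1 and 2 to 0.6 at rank 5, showing how each non-relevant document near the top of the list lowers the score.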

11. Precision and Recall Concepts
[Figure: Venn diagram of All Documents containing the overlapping sets Relevant Documents and Retrieved Documents]

Retrieval Effectiveness
- Precision: How happy are we with what we've got?
  Precision = Number of relevant documents retrieved / Number of documents retrieved
- Recall: How much more could we have had?
  Recall = Number of relevant documents retrieved / Number of relevant documents
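The two ratios above translate directly into set operations: the "relevant documents retrieved" are the intersection of the Relevant and Retrieved sets in the Venn diagram. In this Python sketch the document ids are hypothetical, and returning 0.0 when a denominator is empty is an assumed convention (the slide does not define that case).

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall.

    precision = |relevant ∩ retrieved| / |retrieved|
    recall    = |relevant ∩ retrieved| / |relevant|
    Empty denominators yield 0.0 (an assumed convention).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d4", "d7", "d8", "d9"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

Retrieving more documents can only keep or raise recall, but it usually lowers precision, which is why the two measures are reported together.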

12. Search to the People!
- The Internet has democratised search
- Before the Web, computerised IR was usually done by specialised users, such as librarians and journalists
- The Internet is now accessed by 75% of the US adult population. 91% of those who use the Internet use Web search engines (Pew Internet survey 2008)

Conceptual Model for Search
[Figure: Documents → Indexing → Document Representation; Information Need → Formulation → Query; both feed a Retrieval Function → Retrieved Documents, followed by Relevance Feedback, Query Reformulation, Query Expansion and Further Analysis of the Documents]

13. Indexing
- How an IR system DOES NOT work:
  - The user types in a query
  - Then the system scans through all documents and returns those that match the query
- This would not allow rapid searching
- For this reason, the system first runs an indexing stage before any querying can be done

14. Aim of Indexing
- Storage of information in a way that supports efficient retrieval
- Two main points of consideration:
  - Accuracy of representation
  - Space and time efficiency
- The basic indexing process is essentially the same for all search engines

Overview of Indexing Process
- Basic concept
[Figure: a Document Collection of short text excerpts mapped to an Index of words such as "laugh", "brace", "necessity", "chest", "word", "piano", "rug", "water", "instrument"]
From: http://nlp.stanford.edu/IR-book/
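The figure's mapping from a document collection to an index of words is commonly realised as an inverted index: each term points to the list of documents containing it. The slide does not name a data structure, so treat this Python sketch as one illustrative choice. `build_index` is the offline step, run once before any query; `and_query` is the online step, which intersects postings lists instead of scanning every document. The four documents are shortened excerpts adapted from the figure, and the tokenizer (lower-case word split, no stemming, so "laughter" is not matched by "laugh") is an assumption.

```python
import re
from collections import defaultdict

def build_index(collection):
    """Offline step: map each term to the sorted ids of the documents
    that contain it (its postings list)."""
    postings = defaultdict(set)
    for doc_id, text in collection.items():
        for term in re.findall(r"\w+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def and_query(index, terms):
    """Online step: intersect the postings lists of all query terms
    (Boolean AND) without touching the original documents."""
    result = None
    for term in terms:
        ids = set(index.get(term.lower(), []))
        result = ids if result is None else result & ids
    return sorted(result or [])

collection = {
    1: "I like to laugh. It is a tonic.",
    2: "Laughter is a physiological necessity.",
    3: "Without a word, Mr. Stevens caught up the tray from the piano.",
    4: "The untiring efforts of genius have succeeded in producing a musical instrument.",
}
index = build_index(collection)            # built once, before any query
print(index["laugh"])                      # [1]
print(and_query(index, ["a", "piano"]))    # [3]
```

Query cost now depends on the length of the postings lists for the query terms rather than on the size of the whole collection, which is the reason for separating indexing time from query time.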
