707.009 Foundations of Knowledge Management: "Latent Semantic Analysis" (PowerPoint presentation)



SLIDE 1

707.009 Foundations of Knowledge Management: "Latent Semantic Analysis"

Markus Strohmaier

  • Univ. Ass. / Assistant Professor

Knowledge Management Institute
Graz University of Technology, Austria
e-mail: markus.strohmaier@tugraz.at
web: http://www.kmi.tugraz.at/staff/markus


SLIDE 2

Slides in part based on

  • Slides of Melanie Martin, "An Introduction to Latent Semantic Analysis"
  • "An Introduction to Latent Semantic Analysis" by Thomas K. Landauer, Peter W. Foltz, and Darrell Laham
    Link: http://lsa.colorado.edu/dp1.LSAintro.pdf


SLIDE 3

Overview

Today's Agenda: Latent Semantic Analysis

  • Motivation & Approach
  • Examples
  • Evaluation


SLIDE 4

Knowledge organization: two approaches

Formal vs. content-based structure: much information exists as unstructured free text (content-based structure). It is expressive, but hard to evaluate automatically. Two approaches lead from free text to a semantic representation:

– Use a standardized language a priori (strongly formalized): taxonomies, ontologies, semantic networks. Examples: http://dir.yahoo.com/, http://www.dmoz.org/
– Interpret the heterogeneous language a posteriori (NLP, ...): keyword extraction, folksonomies. Example: http://delicious.com/?view=tags

SLIDE 5

What are concept systems?

Concept systems are systems of distinguishable concepts that are related to one another by means of relations and can be formulated in a natural language.

Objective: developing and establishing a shared understanding.

Semiotic triangle (figure): word / expression / symbol (language), concept (knowledge), object ("real world"). Representation systems: human language, logic, "computer languages".


SLIDE 6

Distributional Hypothesis

Linguists have long conjectured that the context in which a word occurs determines its meaning:

  • you shall know a word by the company it keeps (Firth);
  • the meaning of a word is defined by the way it is used (Wittgenstein).

This leads to the distributional hypothesis about word meaning:

  • the context surrounding a given word provides information about its meaning;
  • words are similar if they share similar linguistic contexts;
  • semantic similarity can therefore be defined as distributional similarity (see the sketch below).
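
To make the last point concrete, here is a minimal sketch (not from the slides) of distributional similarity: each word is represented by counts of its context words, and cosine similarity over those counts serves as a proxy for semantic similarity. The toy corpus and window size are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

corpus = [
    "the doctor examined the patient",
    "the physician examined the patient",
    "the graph contains ordered trees",
]

def context_vector(target, sentences, window=2):
    """Count words occurring within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for t in tokens[lo:hi] if t != target)
    return counts

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "doctor" and "physician" share contexts, so their similarity is high;
# "doctor" and "graph" share almost none, so theirs is low.
print(cosine(context_vector("doctor", corpus), context_vector("physician", corpus)))
print(cosine(context_vector("doctor", corpus), context_vector("graph", corpus)))
```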


SLIDE 7

What is LSA?

LSA is a fully automatic mathematical/statistical technique for extracting and inferring relations of expected contextual usage of words in passages of discourse.

It is not a traditional natural language processing or artificial intelligence program; it uses no humanly constructed dictionaries, knowledge bases, semantic networks, grammars, syntactic parsers, morphologies, or the like, and takes as its input only raw text parsed into words defined as unique character strings and separated into meaningful passages or samples such as sentences or paragraphs. Instead, LSA represents the meaning of a word as a kind of average of the meaning of all the passages in which it appears.

SLIDE 8

What is LSA?

The LSA mechanism that solves the problem consists simply of accommodating a very large number of local co-occurrence relations (between the right kinds of observational units) simultaneously in a space of the right dimensionality.

A look back at Hearst patterns (!):
(S1) The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string.

Hypothetically, the optimal space for the reconstruction has the same dimensionality as the source that generates discourse, that is, the human speaker's or writer's semantic space.

SLIDE 9

Excursus

Introduction to Information Retrieval, http://informationretrieval.org
IIR 18: Latent Semantic Indexing (see additional slides)
Hinrich Schütze, Institute for Natural Language Processing, Universität Stuttgart, 2009-07-21


SLIDE 10

What is LSA?

In SVD, a rectangular matrix is decomposed into the product of three other matrices:

  • one component matrix describes the original row entities as vectors of derived orthogonal factor values,
  • another describes the original column entities in the same way, and
  • the third is a diagonal matrix containing scaling values such that, when the three components are matrix-multiplied, the original matrix is reconstructed.

There is a mathematical proof that any matrix can be so decomposed perfectly, using no more factors than the smallest dimension of the original matrix; the sketch below illustrates this.
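
A small sketch of this decomposition, assuming NumPy; the matrix here is arbitrary illustrative data:

```python
import numpy as np

X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# X = U @ diag(s) @ Vt: U describes the rows, Vt the columns, and s holds
# the diagonal scaling values (the singular values).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Multiplying the three components back together reconstructs X perfectly
# (up to floating-point error), using at most min(rows, cols) = 3 factors.
X_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(X, X_rebuilt))  # True
```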


SLIDE 11

LSA

Idea (Deerwester et al.): "We would like a representation in which a set of terms, which by itself is incomplete and unreliable evidence of the relevance of a given document, is replaced by some other set of entities which are more reliable indicants. We take advantage of the implicit higher-order (or latent) structure in the association of terms and documents to reveal such relationships."


SLIDE 12

LSA Implementation: four basic steps

– Build a term-by-document matrix (more generally, term-by-context); these matrices tend to be sparse.
– Convert matrix entries to weights, typically L(i,j) * G(i), a local weight times a global weight:

  • local: a_ij -> log(freq(a_ij) + 1), weighting the entry directly by the estimated importance of the word in the passage;
  • global: divide by the entropy of the word's row, -sum(p log p) over the entries p of the row, weighting inversely by the degree to which knowing that the word occurred provides information about the passage it appeared in (see the sketch below).
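
A minimal sketch of this log-entropy weighting, assuming NumPy; adding 1 to the entropy divisor is an assumption on my part (one common stabilization so that words occurring in a single passage, with entropy 0, stay well-defined):

```python
import numpy as np

def log_entropy_weight(A):
    """Each cell a_ij becomes log(1 + a_ij) divided by (1 + row entropy),
    with H_i = -sum_j p_ij log p_ij and p_ij = a_ij / sum_j a_ij.
    Terms spread evenly over many passages (high entropy, uninformative)
    are weighted down."""
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    p = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=1, keepdims=True)   # H_i for each term row
    return np.log1p(A) / (1.0 + entropy)

counts = np.array([[1, 1, 0, 0],    # toy term-by-passage counts
                   [2, 1, 1, 1]])
print(log_entropy_weight(counts))
```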


SLIDE 13

LSA

Four basic steps (continued)

– Perform a rank-reduced Singular Value Decomposition (SVD) on the matrix:

  • all but the k highest singular values are set to 0;
  • this produces a k-dimensional approximation of the original matrix (in the least-squares sense);
  • this is the "semantic space".

– Compute similarities between entities in the semantic space (usually with the cosine); see the sketch below.
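
A minimal sketch of these two steps, assuming NumPy; taking term vectors as the rows of U_k · diag(s_k) is one common convention, and the matrix and k here are illustrative:

```python
import numpy as np

def lsa_term_vectors(X, k):
    """Truncate the SVD to the k largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]              # one k-dimensional vector per term

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

X = np.array([[1, 1, 0, 0, 0],           # toy term-by-document counts
              [1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

T = lsa_term_vectors(X, k=2)
print(cosine(T[3], T[4]))                # identical usage -> cosine 1.0
print(cosine(T[0], T[4]))                # unrelated terms -> much lower
```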


SLIDE 14

LSA

SVD

– a unique mathematical decomposition of a matrix into the product of three matrices:

  • two with orthonormal columns
  • one with singular values on the diagonal

– a tool for dimension reduction
– a similarity measure based on co-occurrence
– finds the optimal projection into a low-dimensional space


SLIDE 15

A Small Example

To see how this works, let's look at a small example.

This example is taken from: Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and Harshman, R. A. (1990). "Indexing by latent semantic analysis." Journal of the American Society for Information Science, 41(6), 391-407. Slides are from a presentation by Tom Landauer and Peter Foltz.


SLIDE 16

A Small Example

Technical Memo Titles

c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

Bag of words, no knowledge of e.g. POS tags: the original matrix has nine columns, and we have given it 12 rows, each corresponding to a content word used in at least two of the titles.

           c1  c2  c3  c4  c5  m1  m2  m3  m4
human       1   0   0   1   0   0   0   0   0
interface   1   0   1   0   0   0   0   0   0
computer    1   1   0   0   0   0   0   0   0
user        0   1   1   0   1   0   0   0   0
system      0   1   1   2   0   0   0   0   0
response    0   1   0   0   1   0   0   0   0
time        0   1   0   0   1   0   0   0   0
EPS         0   0   1   1   0   0   0   0   0
survey      0   1   0   0   0   0   0   0   1
trees       0   0   0   0   0   1   1   1   0
graph       0   0   0   0   0   0   1   1   1
minors      0   0   0   0   0   0   0   1   1
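
For readers who want to follow along, here is the same matrix as a NumPy sketch; the singular values it prints can be checked against {Σ} on slide 19:

```python
import numpy as np

X = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
], dtype=float)  # columns: c1 ... c5, m1 ... m4

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.round(s, 2))  # [3.34 2.54 2.35 1.64 1.5 1.31 0.85 0.56 0.36]
```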

SLIDE 17

A Small Example – 2

           c1  c2  c3  c4  c5  m1  m2  m3  m4
human       1   0   0   1   0   0   0   0   0
interface   1   0   1   0   0   0   0   0   0
computer    1   1   0   0   0   0   0   0   0
user        0   1   1   0   1   0   0   0   0
system      0   1   1   2   0   0   0   0   0
response    0   1   0   0   1   0   0   0   0
time        0   1   0   0   1   0   0   0   0
EPS         0   0   1   1   0   0   0   0   0
survey      0   1   0   0   0   0   0   0   1
trees       0   0   0   0   0   1   1   1   0
graph       0   0   0   0   0   0   1   1   1
minors      0   0   0   0   0   0   0   1   1

Correlations in the raw data: r(human, user) = -.38; r(human, minors) = -.29

SLIDE 18

A Small Example – 4

{U} =

(rows in term order: human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors)

 0.22  -0.11   0.29  -0.41  -0.11  -0.34   0.52  -0.06  -0.41
 0.20  -0.07   0.14  -0.55   0.28   0.50  -0.07  -0.01  -0.11
 0.24   0.04  -0.16  -0.59  -0.11  -0.25  -0.30   0.06   0.49
 0.40   0.06  -0.34   0.10   0.33   0.38   0.00   0.00   0.01
 0.64  -0.17   0.36   0.33  -0.16  -0.21  -0.17   0.03   0.27
 0.27   0.11  -0.43   0.07   0.08  -0.17   0.28  -0.02  -0.05
 0.27   0.11  -0.43   0.07   0.08  -0.17   0.28  -0.02  -0.05
 0.30  -0.14   0.33   0.19   0.11   0.27   0.03  -0.02  -0.17
 0.21   0.27  -0.18  -0.03  -0.54   0.08  -0.47  -0.04  -0.58
 0.01   0.49   0.23   0.03   0.59  -0.39  -0.29   0.25  -0.23
 0.04   0.62   0.22   0.00  -0.07   0.11   0.16  -0.68   0.23
 0.03   0.45   0.14  -0.01  -0.30   0.28   0.34   0.68   0.18

SLIDE 19

A Small Example – 5

{Σ} = diag(3.34, 2.54, 2.35, 1.64, 1.50, 1.31, 0.85, 0.56, 0.36)


SLIDE 20

A Small Example – 6

{V} =

(as printed, one column per title c1 ... c5, m1 ... m4; one row per singular dimension)

 0.20   0.61   0.46   0.54   0.28   0.00   0.01   0.02   0.08
-0.06   0.17  -0.13  -0.23   0.11   0.19   0.44   0.62   0.53
 0.11  -0.50   0.21   0.57  -0.51   0.10   0.19   0.25   0.08
-0.95  -0.03   0.04   0.27   0.15   0.02   0.02   0.01  -0.03
 0.05  -0.21   0.38  -0.21   0.33   0.39   0.35   0.15  -0.60
-0.08  -0.26   0.72  -0.37   0.03  -0.30  -0.21   0.00   0.36
 0.18  -0.43  -0.24   0.26   0.67  -0.34  -0.15   0.25   0.04
-0.01   0.05   0.01  -0.02  -0.06   0.45  -0.76   0.45  -0.07
-0.06   0.24   0.02  -0.08  -0.26  -0.62   0.02   0.52  -0.45

SLIDE 21

A Small Example – 7

             c1     c2     c3     c4     c5     m1     m2     m3     m4
human       0.16   0.40   0.38   0.47   0.18  -0.05  -0.12  -0.16  -0.09
interface   0.14   0.37   0.33   0.40   0.16  -0.03  -0.07  -0.10  -0.04
computer    0.15   0.51   0.36   0.41   0.24   0.02   0.06   0.09   0.12
user        0.26   0.84   0.61   0.70   0.39   0.03   0.08   0.12   0.19
system      0.45   1.23   1.05   1.27   0.56  -0.07  -0.15  -0.21  -0.05
response    0.16   0.58   0.38   0.42   0.28   0.06   0.13   0.19   0.22
time        0.16   0.58   0.38   0.42   0.28   0.06   0.13   0.19   0.22
EPS         0.22   0.55   0.51   0.63   0.24  -0.07  -0.14  -0.20  -0.11
survey      0.10   0.53   0.23   0.21   0.27   0.14   0.31   0.44   0.42
trees      -0.06   0.23  -0.14  -0.27   0.14   0.24   0.55   0.77   0.66
graph      -0.06   0.34  -0.15  -0.30   0.20   0.31   0.69   0.98   0.85
minors     -0.04   0.25  -0.10  -0.21   0.15   0.22   0.50   0.71   0.62

Correlation (Pearson's r): r(human, user) = .94; r(human, minors) = -.83
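
A sketch of how this table is produced, assuming NumPy: keep only the two largest singular values and multiply the truncated factors back together; np.corrcoef (Pearson's r) reproduces the printed correlations:

```python
import numpy as np

X = np.array([  # term-by-title matrix from slide 16 (rows: human ... minors)
    [1,0,0,1,0,0,0,0,0],[1,0,1,0,0,0,0,0,0],[1,1,0,0,0,0,0,0,0],
    [0,1,1,0,1,0,0,0,0],[0,1,1,2,0,0,0,0,0],[0,1,0,0,1,0,0,0,0],
    [0,1,0,0,1,0,0,0,0],[0,0,1,1,0,0,0,0,0],[0,1,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1,0],[0,0,0,0,0,0,1,1,1],[0,0,0,0,0,0,0,1,1],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]   # rank-2 approximation
print(np.round(X2, 2))                        # reproduces the table above

# "human" and "user" never co-occur in the raw counts, yet their
# reconstructed rows are strongly correlated.
print(np.corrcoef(X2[0], X2[3])[0, 1])        # human vs. user:   ~ .94
print(np.corrcoef(X2[0], X2[11])[0, 1])       # human vs. minors: ~ -.83
```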

SLIDE 22

A Small Example – 2 reprise

           c1  c2  c3  c4  c5  m1  m2  m3  m4
human       1   0   0   1   0   0   0   0   0
interface   1   0   1   0   0   0   0   0   0
computer    1   1   0   0   0   0   0   0   0
user        0   1   1   0   1   0   0   0   0
system      0   1   1   2   0   0   0   0   0
response    0   1   0   0   1   0   0   0   0
time        0   1   0   0   1   0   0   0   0
EPS         0   0   1   1   0   0   0   0   0
survey      0   1   0   0   0   0   0   0   1
trees       0   0   0   0   0   1   1   1   0
graph       0   0   0   0   0   0   1   1   1
minors      0   0   0   0   0   0   0   1   1

Correlation (Pearson's r): r(human, user) = -.38; r(human, minors) = -.29

SLIDE 23

LSA titles example: correlations between titles

Correlations between titles in the raw data:

      c1     c2     c3     c4     c5     m1     m2     m3
c2  -0.19
c3   0.00   0.00
c4   0.00   0.00   0.47
c5  -0.33   0.58   0.00  -0.31
m1  -0.17  -0.30  -0.21  -0.16  -0.17
m2  -0.26  -0.45  -0.32  -0.24  -0.26   0.67
m3  -0.33  -0.58  -0.41  -0.31  -0.33   0.52   0.77
m4  -0.33  -0.19  -0.41  -0.31  -0.33  -0.17   0.26   0.56

Correlations in first-two-dimension space:

      c1     c2     c3     c4     c5     m1     m2     m3
c2   0.91
c3   1.00   0.91
c4   1.00   0.88   1.00
c5   0.85   0.99   0.85   0.81
m1  -0.85  -0.56  -0.85  -0.88  -0.45
m2  -0.85  -0.56  -0.85  -0.88  -0.44   1.00
m3  -0.85  -0.56  -0.85  -0.88  -0.44   1.00   1.00
m4  -0.81  -0.50  -0.81  -0.84  -0.37   1.00   1.00   1.00

Mean correlation (c-c / c-m / m-m): raw data 0.02 / -0.30 / 0.44; first-two-dimension space 0.92 / -0.72 / 1.00

SLIDE 24

Evaluation – Synonym Detection

It is claimed that LSA, on average, represents words of similar meaning in similar ways. When one compares words with similar vectors as derived from large text corpora, the claim is largely but not entirely fulfilled at an intuitive level. Most very near neighbors appear closely related in some manner. In one scaling (an LSA/SVD analysis) of an encyclopedia, "physician," "patient," and "bedside" were all close to one another, cos > .5.


SLIDE 25

Evaluation – Synonym Detection

The TOEFL vocabulary test consists of items in which

  • the question part is usually a single word, and
  • there are four alternative answers, usually single words, from which the test taker is supposed to choose the one most similar in meaning.


SLIDE 26

Evaluation – Synonym Detection

LSA's knowledge of synonyms was assessed with a standard vocabulary test. The 80-item test was taken from retired versions of the Educational Testing Service (ETS) Test of English as a Foreign Language (TOEFL; for which we are indebted to Larry Frase and ETS). LSA was trained by running the SVD analysis on a large corpus of representative English. In various studies, collections of newspaper text from the Associated Press news wire, Grolier's Academic American Encyclopedia (a work intended for students), and a representative collection of children's reading have been used.


SLIDE 27

Evaluation – Synonym Detection

In one experiment, LSA was:

  • trained on a total of 4.5 million words of text,
  • roughly equivalent to what a child would have read by the end of eighth grade,
  • resulting in a vector for each of 60 thousand words.


SLIDE 28

Evaluation – Synonym Detection

LSA approach:

  • To simulate human performance, the cosine between the question word and each alternative was calculated, and
  • the LSA model chose the alternative closest to the stem (see the sketch below).
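
A minimal sketch of this selection rule, assuming NumPy; the word vectors and the item below are illustrative stand-ins, not the original TOEFL data:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def answer_item(stem, alternatives, vectors):
    """Pick the alternative whose vector is closest (by cosine) to the stem."""
    return max(alternatives, key=lambda w: cosine(vectors[stem], vectors[w]))

# Hypothetical 300-d "LSA" vectors; "huge" is made a near-synonym of
# "enormous" so the selection rule has something to find.
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=300) for w in
           ["enormous", "huge", "tiny", "unique", "accurate"]}
vectors["huge"] = vectors["enormous"] + 0.1 * rng.normal(size=300)

print(answer_item("enormous", ["huge", "tiny", "unique", "accurate"], vectors))
```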

Result:

  • LSA got 65% correct, identical to the average score of a large sample of students applying for college entrance in the United States from non-English speaking countries.


SLIDE 29

Evaluation – Synonym Detection

Influence of the number of dimensions (figure)


SLIDE 30

Summary: Some Issues

– SVD algorithm complexity: O(n^2 k^3)

  • n = number of terms
  • k = number of dimensions in the semantic space (typically small, ~50 to 350)
  • for a stable document collection, the SVD only has to be run once
  • for dynamic document collections, the SVD might need to be rerun, but new documents can also be "folded in" (see the sketch below)
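
A minimal sketch of folding in, assuming NumPy: a new document's raw term-count vector d is projected into the existing k-dimensional space as d·U_k·Σ_k⁻¹ (the standard LSI folding-in rule), so the SVD does not have to be recomputed. The new "document" below is illustrative.

```python
import numpy as np

X = np.array([  # term-by-title matrix from slide 16 (rows: human ... minors)
    [1,0,0,1,0,0,0,0,0],[1,0,1,0,0,0,0,0,0],[1,1,0,0,0,0,0,0,0],
    [0,1,1,0,1,0,0,0,0],[0,1,1,2,0,0,0,0,0],[0,1,0,0,1,0,0,0,0],
    [0,1,0,0,1,0,0,0,0],[0,0,1,1,0,0,0,0,0],[0,1,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1,0],[0,0,0,0,0,0,1,1,1],[0,0,0,0,0,0,0,1,1],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, sk = U[:, :2], s[:2]        # keep the existing 2-d semantic space

def fold_in(d, Uk, sk):
    """Project a new document's raw term vector into the semantic space."""
    return (d @ Uk) / sk

d_new = np.zeros(12)
d_new[[3, 5, 6]] = 1.0          # a new title mentioning user, response, time
print(fold_in(d_new, Uk, sk))   # its 2-d coordinates, no SVD rerun needed
```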


SLIDE 31

Summary: Some Issues

– Finding the optimal dimension for the semantic space

  • The number of dimensions retained in LSA is an empirical issue, because the underlying principle is that the original data should not be perfectly regenerated; rather, an optimal dimensionality should be found that will cause correct induction of the underlying relations.
  • Run the SVD once with a big dimension, say k = 1000; then any dimension <= k can be tested without recomputing (see the sketch below).
  • In many tasks 150-350 works well; there is still room for research.
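
A sketch of why one large SVD suffices, assuming NumPy: testing a smaller dimensionality is just slicing the stored factors. The `evaluate` function is a hypothetical stand-in for a task-specific score such as synonym-test accuracy.

```python
import numpy as np

X = np.array([  # term-by-title matrix from slide 16 (rows: human ... minors)
    [1,0,0,1,0,0,0,0,0],[1,0,1,0,0,0,0,0,0],[1,1,0,0,0,0,0,0,0],
    [0,1,1,0,1,0,0,0,0],[0,1,1,2,0,0,0,0,0],[0,1,0,0,1,0,0,0,0],
    [0,1,0,0,1,0,0,0,0],[0,0,1,1,0,0,0,0,0],[0,1,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1,0],[0,0,0,0,0,0,1,1,1],[0,0,0,0,0,0,0,1,1],
], dtype=float)

def evaluate(term_vectors):               # hypothetical task evaluation
    return float(np.linalg.norm(term_vectors))

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # run once with full rank
for k in (2, 4, 6, 8):                            # then test any k <= max
    term_vectors = U[:, :k] * s[:k]               # slicing, no recompute
    print(k, evaluate(term_vectors))
```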


SLIDE 32

Summary: Some Issues

– SVD assumes normally distributed data

  • term occurrence is not normally distributed
  • but matrix entries are weights, not counts, and the weights may be normally distributed even when the counts are not


SLIDE 33

Summary

LSA has proved to be a valuable tool in many areas of NLP as well as IR:

– summarization
– cross-language IR
– topic segmentation
– text classification
– question answering
– more


SLIDE 34

Summary

Ongoing research and extensions include:

– Probabilistic LSA (Hofmann)
– Iterative Scaling (Ando and Lee)
– Psychology:
  • model of semantic knowledge representation
  • model of semantic word learning


SLIDE 35

Some History

The first papers about LSI:

– Dumais, S. T., Furnas, G. W., Landauer, T. K. and Deerwester, S. (1988). "Using latent semantic analysis to improve information retrieval." In Proceedings of CHI'88: Conference on Human Factors in Computing, New York: ACM, 281-285.
– Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and Harshman, R. A. (1990). "Indexing by latent semantic analysis." Journal of the American Society for Information Science, 41(6), 391-407.
– Foltz, P. W. (1990). "Using Latent Semantic Indexing for Information Filtering." In R. B. Allen (Ed.), Proceedings of the Conference on Office Information Systems, Cambridge, MA, 40-47.

SLIDE 36

Any questions? See you next week!
