Computational semantics for the humanities



  1. Computational semantics for the humanities. Diarmuid Ó Séaghdha, Natural Language and Information Processing Group, Computer Laboratory, University of Cambridge, do242@cam.ac.uk. Translation and the Digital, 25 April 2014.

  2. Introduction
     ◮ "Big Data" revolution:
       ◮ We have access to more textual data than any human could ever read.
       ◮ We can perform some kinds of automated analysis over large datasets.
     ◮ For humanities researchers:
       ◮ Data mining is a tool that facilitates asking questions about language use.
       ◮ Data mining is not a question or an answer.
     ◮ Natural Language Processing (NLP) research gives us computational methods for analysing and interpreting text.

  3. Corpus frequency
     [Figure: proportional frequencies of "computer" and "mouse" in the Google Books corpus, 1900-2000; y-axis in units of 10⁻⁴.]
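     The curves behind this slide come from simple counting: for each year, a word's occurrence count is divided by the total number of tokens published that year. A minimal sketch in Python, assuming locally downloaded Google Books 1-gram and total-count files in a tab-separated layout (the file names and column order are illustrative assumptions, not the official format):

```python
# Sketch: proportional word frequencies per year from Google Books 1-gram counts.
# File names and column layout are illustrative assumptions.
import csv
from collections import defaultdict

def yearly_proportions(ngram_file, totals_file, words):
    totals = {}                                      # year -> total token count
    counts = defaultdict(lambda: defaultdict(int))   # word -> year -> count

    with open(totals_file, encoding="utf-8") as f:
        for year, total, *rest in csv.reader(f, delimiter="\t"):
            totals[int(year)] = int(total)

    with open(ngram_file, encoding="utf-8") as f:
        for ngram, year, match_count, *rest in csv.reader(f, delimiter="\t"):
            if ngram.lower() in words:
                counts[ngram.lower()][int(year)] += int(match_count)

    # Proportional frequency = word count / all tokens published that year.
    return {w: {y: c / totals[y] for y, c in counts[w].items() if y in totals}
            for w in words}

# Hypothetical usage:
# props = yearly_proportions("eng-1gram.tsv", "eng-totalcounts.tsv",
#                            {"computer", "mouse"})
```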

  4. Semantics: The distributional hypothesis
     ◮ Imagine that tezgüino is a rare English word, and you saw the word used in the following sentences:
       1. A bottle of tezgüino is on the table.
       2. Everyone likes tezgüino.
       3. Tezgüino makes you drunk.
       4. We make tezgüino out of corn. (Lin, 1998)
     ◮ Can you guess what tezgüino means?
     ◮ What kind of things do you expect will be similar to tezgüino?
     ◮ The Distributional Hypothesis: Two words are expected to be semantically similar if they have similar patterns of co-occurrence in observed text.

  5. Co-occurrences and similarity
     ◮ We can produce a distributional "profile" of a word from a corpus:
       farmer: part-time, sheep, peasant, tenant, wife, crop, ...
       doctor: nurse, junior, prescribe, consult, patient, surgery, ...
       hospital: psychiatric, memorial, discharge, admission, clinic, ...
     ◮ We can compute similarity between words by comparing their profiles.
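     A profile can be built by counting which words occur near the target word, and two profiles can be compared with a standard measure such as cosine similarity. The slides do not specify the exact context definition or similarity measure, so the window size and the cosine choice below are assumptions, and the toy corpus is purely illustrative:

```python
# Sketch: bag-of-words co-occurrence profiles and cosine similarity.
from collections import Counter
import math

def profiles(sentences, window=2):
    """Map each word to a Counter of words seen within `window` tokens of it."""
    prof = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            context = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            prof.setdefault(word, Counter()).update(context)
    return prof

def cosine(p, q):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(c * q[w] for w, c in p.items() if w in q)
    norm = math.sqrt(sum(c * c for c in p.values())) * \
           math.sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0

# Toy usage with a two-sentence "corpus":
corpus = [["the", "doctor", "consulted", "the", "patient"],
          ["the", "nurse", "examined", "the", "patient"]]
prof = profiles(corpus)
print(cosine(prof["doctor"], prof["nurse"]))   # high: similar contexts
```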

  6. Semantic space visualisation
     [Figure: 2-D visualisation of a semantic space (British National Corpus, top 5000 dependencies). Plotted words include kangaroo, shark, woman, pet, worker, man, doctor, cat, nurse, chicken, surgeon, dog, vet, fish, apple, wine, hospital, food, factory, salad, cinema, hammer, beer, pizza, surgery, tool, computer.]
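     One way to produce a picture like this is to collect each word's profile into a vector and project the vectors to two dimensions. The sketch below uses PCA from scikit-learn and matplotlib, reusing the `profiles` function from the previous sketch; the choice of PCA is an assumption, as the slide does not say which projection was used:

```python
# Sketch: project distributional profiles into 2-D and plot them.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_semantic_space(prof, words):
    # Shared context vocabulary across the chosen words.
    contexts = sorted({c for w in words for c in prof[w]})
    X = np.array([[prof[w][c] for c in contexts] for w in words], dtype=float)
    coords = PCA(n_components=2).fit_transform(X)
    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), w in zip(coords, words):
        plt.annotate(w, (x, y))
    plt.show()

# With profiles built from a real corpus, e.g.:
# plot_semantic_space(prof, ["doctor", "nurse", "hospital", "computer"])
```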

  7. Discovering semantic classes
     BNC nouns, method related to Latent Dirichlet Allocation (topic modelling)

     Class 1    Class 2      Class 3   Class 4
     attack     test         line      university
     raid       examination  axis      college
     assault    check        section   school
     campaign   testing      circle    polytechnic
     operation  exam         path      institute
     incident   scan         track     institution
     bombing    assessment   arrow     library
     offensive  sample       curve     hospital
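     The slide describes the method only as "related to Latent Dirichlet Allocation", so the sketch below substitutes plain LDA as a stand-in: each noun's co-occurrence profile is treated as a pseudo-document, and the induced topics play the role of semantic classes. The number of classes and the scikit-learn implementation are assumptions:

```python
# Sketch: induce semantic classes for nouns with standard LDA, used here as a
# stand-in for the slide's LDA-related method. Each noun's co-occurrence
# profile is treated as a pseudo-document of its context words.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def noun_classes(prof, num_classes=4):
    nouns = sorted(prof)
    contexts = sorted({c for w in nouns for c in prof[w]})
    # Noun-by-context count matrix (pseudo-document term counts).
    X = np.array([[prof[w][c] for c in contexts] for w in nouns], dtype=float)
    lda = LatentDirichletAllocation(n_components=num_classes, random_state=0)
    theta = lda.fit_transform(X)              # noun-by-class distribution
    # Assign each noun to its most probable class.
    return {w: int(theta[i].argmax()) for i, w in enumerate(nouns)}

# e.g. noun_classes(profiles(large_corpus)) might group "attack" with "raid"
# and "test" with "exam", as on the slide (large_corpus is hypothetical).
```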

  8. Tracking meaning over time
     ◮ Ongoing project (with Meng Zhang).
     ◮ We know that language changes over time.
     ◮ Words change their meaning by adding and losing senses and associations.
     ◮ Can we study this behaviour in a large corpus?
     ◮ Goal: "word biographies".
     ◮ A historian of ideas might be interested in what a word meant to people at different points in time.

  9. Tracking meaning over time
     [Figure: meaning consistency of "computer" and "mouse" in the Google Books corpus, 1900-2000; y-axis scale 0.2-1.]
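     The consistency measure itself is not defined on the slide; one plausible reading, sketched below purely as an assumption, is to build a separate profile for each time slice and score each slice against a reference slice with cosine similarity, reusing `profiles` and `cosine` from the earlier sketch:

```python
# Sketch (assumption): meaning consistency as cosine similarity between a
# word's per-decade profile and its profile in a reference decade.
def consistency_over_time(decade_corpora, word, reference_decade):
    """decade_corpora maps a decade label to a list of tokenised sentences."""
    ref = profiles(decade_corpora[reference_decade]).get(word, {})
    scores = {}
    for decade, sentences in sorted(decade_corpora.items()):
        prof = profiles(sentences).get(word, {})
        scores[decade] = cosine(ref, prof)
    return scores
```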

  10. Conclusion
     ◮ We have methods for extracting meaning from document collections:
       ◮ Comparing words and texts
       ◮ Clustering words/concepts
       ◮ Identifying themes in a corpus
       ◮ Identifying associations between words/concepts
     ◮ We need users in other fields to provide interesting questions.
     ◮ If you have ideas, say hi! Or send me an email at do242@cam.ac.uk.
