computational methods for text analysis



  1. COMPUTATIONAL METHODS FOR TEXT ANALYSIS. BA PROGRAM “SOCIOLOGY AND SOCIAL INFORMATICS”. Kirill Maslinsky, 2020. Higher School of Economics, Saint Petersburg.

  2. WHY DO YOU NEED IT

  3. THE TIME WE LIVE IN

  4. JUST TO LEARN TO MAKE THOSE PICTURES¹
     ¹ Just kiddin'

  5. IMAGINED AUDIENCE: ALLEGED GOALS
     • Practical type: learn to make those pictures
     • Sentimental type: make the machine understand those texts for me
     • Philosopher type: why on earth does it work at all?


  8. COURSE GOALS
     • provide a basic understanding of how to properly use collections of texts as quantitative evidence,
     • and to make this knowledge practical

  9. COURSE CONTENT

  10. BREAD AND BUTTER: TOPIC MODELING

  11. KILLER FEATURE: WORD EMBEDDINGS

  12. THE ICING ON THE CAKE: SENTIMENT ANALYSIS


  15. COURSE TOPICS
      Basic word statistics:
      • lexical statistics (word frequency distributions),
      • distributional semantics (word co-occurrence patterns),
      • vector representation of text.
      Methods for supervised and unsupervised modeling:
      • dictionary methods,
      • document classification and clustering,
      • topic modeling,
      • word embeddings,
      • language models.
      Applied tasks:
      • automating content analysis (extracting theme and topic),
      • sentiment analysis,
      • information extraction from unstructured text.
      • this is a really very boring slide, isn’t it?

  16. WHAT TO EXPECT

  17. HOW COURSEWORK WILL BE ORGANIZED
      Format:
      • OFFLINE: lectures, discussions, student presentations
      • ONLINE: practical work (programming exercises), tech support
      Content:
      • NLP basics
      • discussion of several recent articles (understanding methodology, reproducing parts of it)
      • practicing analysis of textual data (with R)

  18. EXPECTATIONS
      Practical work with real texts in class and at home.
      • command line
      • mining your own text collection
      • R scripts
      • bugs in scripts, googling, bugs in scripts again
      • seeking and getting help from your peers and course instructor
      • happy end

  19. WORK IN GROUPS

  20. WHAT YOU CAN LEARN
      • State of the art of natural language processing:
        • solved problems
        • topical issues and unsolved problems
      • Terms:
        • a minimal vocabulary of necessary linguistic terms (with meanings! :))
        • appropriate keywords to search for current research and tools
      • Where to apply methods for computational text analysis and how to interpret their results
      • Tools:
        • existing software for text analysis (for Russian and English)
        • existing linguistic resources: dictionaries, corpora, pre-trained models (for Russian and English)

  21. GRADING
      • 25% student presentations/paper summaries
      • 10% practical exercises
      • 45% lab works (3)
      • 20% final project (?)
