
SLIDE 1

COMPUTATIONAL METHODS FOR TEXT ANALYSIS

BA PROGRAM “SOCIOLOGY AND SOCIAL INFORMATICS”

Kirill Maslinsky 2020

Higher School of Economics, Saint Petersburg

SLIDE 2

WHY DO YOU NEED IT

SLIDE 3

THE TIME WE LIVE IN

SLIDE 4

JUST TO LEARN TO MAKE THOSE PICTURES [1]

[1] Just kidding.

SLIDE 5

IMAGINED AUDIENCE — ALLEGED GOALS

  • Practical type: learn to make those pictures
  • Sentimental type: make the machine understand those texts for me
  • Philosopher type: why on earth does it work at all?

SLIDE 8

COURSE GOALS

  • provide a basic understanding of how to properly use collections of texts as quantitative evidence,
  • and make this knowledge practical

SLIDE 9

COURSE CONTENT

SLIDE 10

BREAD AND BUTTER: TOPIC MODELING

SLIDE 11

KILLER FEATURE: WORD EMBEDDINGS

SLIDE 12

THE ICING ON THE CAKE: SENTIMENT ANALYSIS

SLIDE 15

COURSE TOPICS

  • Basic word statistics:
      • lexical statistics (word frequency distributions),
      • distributional semantics (word co-occurrence patterns),
      • vector representation of text.
  • Methods for supervised and unsupervised modeling:
      • dictionary methods,
      • document classification and clustering,
      • topic modeling,
      • word embeddings,
      • language models.
  • Applied tasks:
      • automating content analysis (extracting themes and topics),
      • sentiment analysis,
      • information extraction from unstructured text.
  • this is a really very boring slide, isn’t it?
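
To make the "basic word statistics" items less abstract, here is a minimal sketch of all three on a toy corpus. It is an illustration only, not course material: the three sentences are invented, tokenization is plain whitespace splitting, and it is written in Python with the standard library, whereas the coursework itself uses R.

```python
from collections import Counter
from itertools import combinations

# Toy corpus invented for this illustration (an assumption, not course data).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Naive whitespace tokenization; real pipelines normalize and lemmatize.
tokenized = [doc.split() for doc in docs]

# Lexical statistics: corpus-wide word frequency distribution.
word_freq = Counter(tok for doc in tokenized for tok in doc)

# Vector representation: one count vector per document over a shared,
# alphabetically sorted vocabulary (a bag-of-words matrix).
vocab = sorted(word_freq)
doc_vectors = [[Counter(doc)[w] for w in vocab] for doc in tokenized]

# Co-occurrence patterns: number of documents in which both words of a
# pair appear. Pairs are stored in alphabetical order.
cooc = Counter()
for doc in tokenized:
    for pair in combinations(sorted(set(doc)), 2):
        cooc[pair] += 1

print(word_freq.most_common(1))   # [('the', 6)]
print(vocab)
print(doc_vectors[0])
print(cooc[("cat", "the")])       # 2
```

Counting co-occurrence per document is only one possible choice; a sliding window of a few words is another common one, and the two give different pictures of which words "go together".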

SLIDE 16

WHAT TO EXPECT

SLIDE 17

HOW COURSEWORK WILL BE ORGANIZED

Format:

  • OFFLINE: lectures, discussions, student presentations
  • ONLINE: practical work (programming exercises), tech support

Content:

  • NLP basics
  • discussion of several recent articles (understanding the methodology, reproducing parts of it)
  • practicing analysis of textual data (with R)

SLIDE 18

EXPECTATIONS

Practical work with real texts in class and at home.

  • command line
  • mining your own text collection
  • R scripts
  • bugs in scripts, googling, bugs in scripts again
  • seeking and getting help from your peers and course instructor
  • happy end

SLIDE 19

WORK IN GROUPS

SLIDE 20

WHAT YOU CAN LEARN

  • State of the art in natural language processing:
      • solved problems
      • topical issues and unsolved problems
  • Terms:
      • a minimal vocabulary of necessary linguistic terms (with meanings! :))
      • appropriate keywords to search for current research and tools
  • Tools:
      • where to apply methods for computational text analysis and how to interpret their results
      • existing software for text analysis (for Russian and English)
      • existing linguistic resources: dictionaries, corpora, pre-trained models (for Russian and English)

SLIDE 21

GRADING

  • 25% student presentations/paper summaries
  • 10% practical exercises
  • 45% lab assignments (3)
  • 20% final project (?)
