
slide-1
SLIDE 1

Computational methods for text analysis

BA program “Sociology and Social Informatics”

Kirill Maslinsky 2018

Higher School of Economics, Saint Petersburg

1/12

slide-2
SLIDE 2

Why do you need it

slide-3
SLIDE 3

Just to learn to make those pictures [1]

[1] Just kidding

2/12

slide-4
SLIDE 4

Scale up

  • population studied: “all social media users of a town”
  • time spans: “all of the Post-Soviet history”
  • geographical scope: “all educational migration in Russia”

3/12

slide-5
SLIDE 5

Course goals

  • to provide a basic understanding of how to properly use collections of texts as quantitative evidence,
  • and to make this knowledge practical

4/12

slide-6
SLIDE 6

Course content

slide-7
SLIDE 7

Bread and butter: Topic modeling

5/12

slide-8
SLIDE 8

Killer feature: Word embeddings

6/12

slide-9
SLIDE 9

The icing on the cake: Sentiment analysis

7/12

slide-12
SLIDE 12

Course topics

  • Basic word statistics:
      • lexical statistics (word frequency distributions),
      • distributional semantics (word co-occurrence patterns),
      • vector representation of text.
  • Methods for supervised and unsupervised modeling:
      • dictionary methods,
      • document classification and clustering,
      • topic modeling,
      • word embeddings,
      • sequence modeling.
  • Applied tasks:
      • automating content analysis (extracting themes and topics),
      • sentiment analysis,
      • information extraction from unstructured text.
  • This is a really very boring slide, isn’t it?
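
To make the first group of topics less abstract, here is a minimal sketch of word frequency counts, document-level co-occurrence counts, and a bag-of-words vector for each document. It is written in Python purely for illustration (the coursework itself is described as using R scripts), and the toy corpus, the `tokenize` helper, and all variable names are assumptions invented for this example, not course material.

```python
from collections import Counter
from itertools import combinations
import re

# Toy corpus invented for illustration; not course data.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


tokens_per_doc = [tokenize(d) for d in docs]

# Lexical statistics: corpus-wide word frequency distribution.
word_freq = Counter(tok for toks in tokens_per_doc for tok in toks)

# Co-occurrence: count unordered pairs of distinct words that share a document.
cooc = Counter()
for toks in tokens_per_doc:
    for a, b in combinations(sorted(set(toks)), 2):
        cooc[(a, b)] += 1

# Vector representation: one bag-of-words count vector per document,
# with dimensions ordered by the sorted vocabulary.
vocab = sorted(word_freq)
doc_vectors = [[Counter(toks)[w] for w in vocab] for toks in tokens_per_doc]

print(word_freq.most_common(3))
print(cooc[("cat", "the")])
print(vocab)
print(doc_vectors[0])
```

Real collections would need proper tokenization and usually lemmatization, especially for Russian; the regular expression here only handles lowercase Latin letters.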

8/12

slide-13
SLIDE 13

What to expect

slide-14
SLIDE 14

How coursework will be organized

  • An interesting recent article,
  • with an explanation of the necessary concepts and methods during the lecture,
  • followed by a detailed analysis of the method in class,
  • concluded by a task to reproduce the method with your own data.

9/12

slide-15
SLIDE 15

Expectations

Practical work with real texts in class and at home.

  • command line
  • mining your own text collection
  • R scripts
  • bugs in scripts, googling, bugs in scripts again
  • seeking and getting help from your peers and course instructor

  • happy end

10/12

slide-16
SLIDE 16

Work in groups

11/12

slide-17
SLIDE 17

What you can learn

  • State of the art in natural language processing:
      • solved problems
      • topical issues and unsolved problems
  • Terms:
      • a minimal vocabulary of necessary linguistic terms (with meanings! :))
      • appropriate keywords to search for current research and tools
  • Tools:
      • where to apply methods for computational text analysis and how to interpret their results
      • existing software for text analysis (for Russian and English)
      • existing linguistic resources: dictionaries, corpora, pre-trained models (for Russian and English)

12/12