From Emotion Analysis and Topic Extraction to Narrative Modeling



SLIDE 1

From Emotion Analysis and Topic Extraction to Narrative Modeling

Andreea Kremm Mohammed Ibraaz Syed

SLIDE 2

About us

  • Andreea Kremm
      • Founder of Netex Group (www.netex.ai)
      • M.Sc. Psychology (University of Roehampton, London)
      • Research Interests: combining the power of AI, neural networks, and psychology for economic applications through Narrative Economics

  • Mohammed Ibraaz Syed
      • B.A. Economics, B.Sc. Mathematics (University of Maryland, College Park)
      • Master’s in Applied Economics (UCLA)
      • Research Interests: applying AI and machine learning to extract narratives from text

SLIDE 3

What are Narratives? How do Narratives affect the Economy?

"This past year has been the most difficult and painful year of my career. It was excruciating."

Elon Musk, New York Times interview, 08/16/2018 https://www.nytimes.com/2018/08/16/business/elon-musk-interview-tesla.html

SLIDE 4

Effect of Musk's narrative on Tesla's stock:

https://finance.yahoo.com/chart/TSLA

SLIDE 5

What is Narrative Economics?

SLIDE 6

How do Narratives spread?

  • Kermack-McKendrick (1927): mathematical theory of disease epidemics
  • SIR model: S = susceptible, I = infected, R = recovered, where N = S + I + R is assumed constant
  • Powerful narratives spread, mutate, and propagate like a virus
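
The slide names the SIR model without writing out its dynamics. In the standard Kermack-McKendrick form they are dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, and dR/dt = gamma*I. The following is a minimal sketch of those equations using forward-Euler integration. The rates `beta` and `gamma`, the population sizes, and the step size `dt` are illustrative assumptions and do not come from the presentation.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the SIR equations.

    beta  : contact (transmission) rate, assumed value
    gamma : recovery rate, assumed value
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0  # total population, held constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Illustrative run: 10 initial "carriers" of a narrative among 1,000 people.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.4, gamma=0.1, days=100)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected at t = {peak_time:.1f}")
```

In the narrative analogy, I would stand for the people currently retelling a story, so the peak of the I curve marks the point of widest circulation.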

SLIDE 7

The Structure of a Narrative

  • The Plot (Overcoming the Monster, Rags to Riches, Voyage and Return, Comedy, Tragedy, Rebirth, etc.)
  • The Characters (Hero, Villain, Maiden, King, etc.)
  • Emotionally engaging
  • Take-Away / Lesson, Call to Action
  • A good story is easily remembered and gladly retold
SLIDE 8

Narrative Modeling Algorithm

  • Analyze a Narrative:
      ✓ Emotion Analysis
      ✓ Entity-Relation Extraction
      ✓ Topic Extraction
      ✓ Subject Modeling
  • Insert into an SIR Disease Epidemics Model Equation
  • Predict Narrative Spread and Economic Consequences
SLIDE 9

Emotion Analysis Showcase

  • Task: recognize emotions in written English text
  • Solution: Bi-LSTM trained as a classifier
  • Resources:

      ✓ NRC-EmoLex (National Research Council Canada Word-Emotion Association Lexicon)
      ✓ Facebook's fastText
      ✓ Training dataset: 7,665 emotion-labeled sentences from the Association for the Advancement of Affective Computing (AAAC)

SLIDE 10

Methodology

SLIDE 11

Emotion Analysis Results

  • Random accuracy: 20% (Baseline)
  • Softmax (word counting) accuracy: 21%
  • IBM Watson Tone Analyzer: 39%
  • IBM Watson NLU: 58%
  • Bi-LSTM with 128 LSTM cells in one layer: 66%
  • Bi-LSTM with 32 LSTM cells in four layers: 71%
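
The slides do not describe how the "Softmax (word counting)" baseline was built. One plausible reading, sketched here purely as an assumption, is to count lexicon hits per emotion and pass the counts through a softmax. The five emotion labels are guessed from the 20% random baseline, and `TOY_LEXICON` is a hypothetical stand-in, not actual NRC-EmoLex content.

```python
import math

# Hypothetical toy lexicon (word -> emotion); NOT real NRC-EmoLex entries.
TOY_LEXICON = {
    "happy": "joy", "delighted": "joy",
    "afraid": "fear", "scared": "fear",
    "furious": "anger", "angry": "anger",
    "miserable": "sadness", "painful": "sadness",
    "astonished": "surprise",
}
# Assumed label set: five classes would match a 20% random baseline.
EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence):
    """Count lexicon hits per emotion, then softmax the counts."""
    counts = [0] * len(EMOTIONS)
    for word in sentence.lower().split():
        emotion = TOY_LEXICON.get(word.strip(".,!?\"'"))
        if emotion is not None:
            counts[EMOTIONS.index(emotion)] += 1
    probs = softmax(counts)
    best = max(range(len(EMOTIONS)), key=probs.__getitem__)
    return EMOTIONS[best], probs


label, probs = classify("It was the most painful and miserable year.")
_, no_hit_probs = classify("The meeting is on Tuesday.")
```

A sentence with no lexicon words gets a uniform distribution, so its predicted label is arbitrary. That may be one reason a pure word-counting approach stays close to the random baseline.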
SLIDE 12

Visualizing Entity Embeddings

SLIDE 13

Challenges and Limitations

  • Limited size of the training dataset
  • Limited size of NRC-EmoLex
  • Single label emotions
  • No subject modeling
  • No information about the author's context
  • Topic was disregarded
SLIDE 14

Topic Extraction Showcase

  • Four Key Problems to Solve:
      1. Where do we find an appropriate data set of narrative-rich text?
      2. How do we pre-process the data to facilitate narrative extraction?
      3. How do we estimate the number of narratives (topics)?
      4. How do we estimate narrative similarity and model their evolution?
SLIDE 15

Selecting an Appropriate Dataset

  • Politicians are often responsible for spreading narratives
  • Press releases issued by politicians
  • Politicians’ social media accounts
  • Social media messages often lack context
  • News data is often labeled with categories / topics and related issues
  • Social media and news data can complement each other
SLIDE 16

Data and Pre-Processing

  • Data sets selected (solution to 1st problem):
      • White House Press Briefings from January 20, 2017 onwards
      • Tweets by President Donald Trump from January 20, 2017 onwards
  • Narrative extraction-specific pre-processing (solution to 2nd problem):
      • Pre-existing labels incorporated into document strings
      • Summaries of documents also added to their strings
      • 2017 data divided into six 2-month time periods:

Period     Months
Period 1   January – February
Period 2   March – April
Period 3   May – June
Period 4   July – August
Period 5   September – October
Period 6   November – December
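
The two-month bucketing above can be sketched as a simple mapping from a document's date to its period number. The sample documents and the helper name `two_month_period` are hypothetical and only illustrate the split.

```python
from collections import defaultdict
from datetime import date


def two_month_period(d):
    """Map a 2017 date to its period number: 1 (Jan-Feb) through 6 (Nov-Dec)."""
    if d.year != 2017:
        raise ValueError("this sketch only buckets 2017 dates")
    return (d.month - 1) // 2 + 1


# Hypothetical placeholder documents: (publication date, text).
documents = [
    (date(2017, 1, 20), "inauguration briefing text"),
    (date(2017, 8, 25), "hurricane briefing text"),
    (date(2017, 12, 31), "year-end tweet text"),
]

# Group document texts by period so each period can be analyzed separately.
by_period = defaultdict(list)
for published, text in documents:
    by_period[two_month_period(published)].append(text)
```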

SLIDE 17

Methodology (1)

  • Additional pre-processing
      • Stop words removed
      • Terms appearing in 90%+ of documents ignored
      • Unigrams and bigrams considered
  • Conversion into TFIDF matrix to emphasize the most important words
      • Documents as rows, words as columns
      • Entries correspond to word counts in each document
      • Entries of words occurring in many documents downweighted
      • Different matrix for each time period
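
The pre-processing and weighting steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the stop-word list is a tiny placeholder, the sample documents are invented, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` is one common choice that the slides do not specify.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "of", "to", "and", "in", "on", "is"}


def tokenize(text):
    """Return unigrams plus bigrams, formed after stop-word removal."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def tfidf_matrix(documents, max_df=0.9):
    """Rows are documents, columns are terms, entries are count * idf."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(term for c in counts for term in c)
    # Ignore terms that appear in 90% or more of the documents.
    vocab = sorted(t for t, df in doc_freq.items() if df / n_docs < max_df)
    matrix = [
        [c[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1) for t in vocab]
        for c in counts
    ]
    return vocab, matrix


# Invented sample documents standing in for one time period.
docs = [
    "the president nominated a judge to the supreme court",
    "the president criticized the media",
    "the hurricane relief effort is led by the president",
    "the president praised the senate vote on the supreme court nominee",
]
vocab, matrix = tfidf_matrix(docs)
```

Here "president" occurs in every document and is dropped, while "supreme court" survives as a bigram with a lower weight than terms that occur in only one document.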
SLIDE 18

Methodology (2)

  • Hierarchical (agglomerative) clustering algorithm (solution to 3rd problem):
      • HAC used on each of the 6 TFIDF matrices
      • Linkage criterion: Ward’s method (minimizes variance of new clusters)
      • Cut-off of 70% of final merge used to estimate optimal number of clusters
  • Output:
SLIDE 19

Methodology (3)

  • Hierarchical clustering thresholds: the number of clusters increases non-linearly

[Figure panels: 2 clusters, 5 clusters, 16 clusters]
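
The cluster-count estimate can be illustrated with a small pure-Python version of Ward-linkage clustering. This sketch assumes that "70% of final merge" means cutting the tree at 0.7 times the height of the last merge and counting the clusters that remain; the slides do not state this precisely. The merge height is scaled so that two single points merge at their Euclidean distance, and the 2-D points are invented stand-ins for TFIDF rows.

```python
import math


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ward_height(a, b):
    """Ward merge height: sqrt(2 * increase in within-cluster sum of squares)."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return math.sqrt(2 * len(a) * len(b) / (len(a) + len(b)) * sq_dist)


def ward_merge_heights(points):
    """Greedy agglomerative clustering; returns the height of every merge."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose merge has the lowest Ward height.
        h, i, j = min(
            (ward_height(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        heights.append(h)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return heights


def estimate_n_clusters(heights, fraction=0.7):
    """Count clusters left when the tree is cut at fraction * final merge height."""
    threshold = fraction * heights[-1]
    merges_below = sum(1 for h in heights if h < threshold)
    return len(heights) + 1 - merges_below


# Invented data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
heights = ward_merge_heights(points)
n_clusters = estimate_n_clusters(heights)
```

Lowering `fraction` cuts the tree closer to the leaves and yields more clusters, which is the non-linear growth the slide refers to.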

SLIDE 20

Methodology (4)

  • Latent Dirichlet Allocation (LDA) algorithm used to extract topics
      • Each topic comes with probabilities of generating particular words
      • Used separately for each time period (6 times total)
      • Cutoff from hierarchical clustering used to determine # of topics
  • Sample Outputs of LDA:
      • Supreme Court Nomination topic:
      • Federal Emergency topic:
SLIDE 21

Methodology (5)

  • Dissimilarity / distance measures can be used to compare:
      • Two points in space (straight line)
      • Two points on a sphere (great circle distance)
      • Even two probability distributions
  • Hellinger Distance used to compare topics (solution to 4th problem):
      • Can effectively determine similar topics
      • Can be applied to track topic evolution over time
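
For two discrete distributions P and Q over the same vocabulary, the Hellinger distance is H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) / sqrt(2), which ranges from 0 (identical) to 1 (no shared support). The sketch below applies it to match a topic from one period to its closest counterpart in the next. The vocabulary, the topic names, and all probabilities are invented for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same vocabulary")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2)


# Invented word probabilities over a shared vocabulary:
# ["court", "judge", "nominee", "hurricane", "relief"]
period1_topic = [0.40, 0.30, 0.30, 0.00, 0.00]
period2_topics = {
    "topic_a": [0.35, 0.25, 0.30, 0.05, 0.05],
    "topic_b": [0.00, 0.05, 0.00, 0.50, 0.45],
}

# The period-2 topic with the smallest distance is treated as the
# continuation of the period-1 topic.
closest = min(period2_topics,
              key=lambda name: hellinger(period1_topic, period2_topics[name]))
```

Repeating this match for each pair of consecutive periods gives a chain of related topics, which is one way to read "track topic evolution over time".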
SLIDE 22

Key Results (1)

  • Estimated # of clusters from HAC led to coherent topics
  • Similar topics (through Hellinger distance) could be compared over time to track topic evolution:

[Figure panels: Time period 1 (January & February, 2017); Time period 2 (March & April, 2017); Time period 3 (May & June, 2017)]
SLIDE 23

Key Results (2)

  • A “Make America Great” topic was generally the most common
  • Discovered the Supreme Court nomination process as a major topic in early 2017
  • Criticism of the media was a major topic through multiple time periods
  • Model was able to distinguish between unique topics:
      • Various foreign policy topics
      • Natural disasters: Hurricanes Harvey (Aug. 2017) & Maria (Sep. 2017)
SLIDE 24

Conclusions and Limitations

  • Different tools can be effectively combined to model narratives
  • Can generate quantitative data on narratives and their evolution
  • Narrative Economics is a very young field
  • Various avenues for future research:
      • New data sources
      • Alternate pre-processing methods
      • Different thresholds / time intervals / other parameter tuning
SLIDE 25

Future Research

  • Analyze a Narrative:
      ✓ Emotion Analysis
      ✓ Entity-Relation Extraction
      ✓ Topic Extraction
      ✓ Subject Modeling
  • Insert into an SIR Disease Epidemics Model Equation
  • Predict Narrative Spread and Economic Consequences
SLIDE 26

Acknowledgments

www.narrativeeconomics.com

  • Naveed Ghaffar, co-founder Narrative Economics (naveedgh@gmail.com)
  • Dr. Rashed Iqbal, co-founder Narrative Economics (rashed_iqbal@econ.ucla.edu)

SLIDE 27

Thank you for listening!

Any Questions? Get in touch:

  • Mohammed Ibraaz Syed: ibraaz@g.ucla.edu
  • Andreea Kremm: kremm@netex.ai