Shallow Reading with Deep Learning: Predicting popularity of online content using only its title




slide-1
SLIDE 1

Polish-Japanese Academy of Information Technology, 
 Warsaw University of Technology
 Tooploox

Shallow Reading with Deep Learning

Predicting popularity of online content using only its title

  • W. Stokowiec, T. Trzciński, K. Wołk, K. Marasek and P. Rokita
slide-2
SLIDE 2

Presentation plan

  • 1. Popularity. What is it exactly?
  • 2. Datasets description
  • 3. Baselines
      • 1. BoW + SVM
      • 2. CNN
  • 4. Bi-LSTM (our approach)
  • 5. Results
slide-3
SLIDE 3

Problem

Given a title, predict whether the content will perform well with respect to a given popularity metric (views, reactions, etc.).

slide-4
SLIDE 4

Problem definition

Title: This Syrian-American poet just lost 10 family members in Syria — her story will break your heart (via NowThis Politics)
Views: 5,623,842

Title: ‘He colluded or obstructed’: Trump turns Russia suspicions against Obama
Reactions: 4,336

slide-5
SLIDE 5

Problem definition

Figure (left): Distribution of log-transformed views for the NowThisNews dataset
Figure (right): Distribution of views for the NowThisNews dataset

  • Popularity prediction is framed as a binary classification task
  • Population split into classes according to the median of the normalized popularity metric distribution
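The median split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the logarithm serves as the normalization, that every count is positive, and that items strictly above the median form the "popular" class. The function name and the view counts are hypothetical.

```python
import math
import statistics


def median_split_labels(view_counts):
    """Label an item 1 if its log-views exceed the median log-views, else 0."""
    # Assumption: the log transform is the normalization step; counts must be > 0.
    log_views = [math.log(v) for v in view_counts]
    threshold = statistics.median(log_views)
    labels = [1 if x > threshold else 0 for x in log_views]
    return labels, threshold


# Hypothetical view counts, for illustration only.
views = [120, 5_623_842, 4_336, 980, 75_000, 310]
labels, threshold = median_split_labels(views)
print(labels)
```

Splitting at the median yields two classes of (nearly) equal size, so accuracy is a meaningful metric without class re-weighting.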

slide-6
SLIDE 6

Datasets

Popularity proxy - number of views one week after publication

NowThis News (4K)

slide-7
SLIDE 7

Datasets

Popularity proxy - number of comments under the article

The BreakingNews Dataset (38K)

slide-8
SLIDE 8

This is amazing!

slide-9
SLIDE 9

Last chance for a good title

slide-10
SLIDE 10

Keyword analysis

slide-11
SLIDE 11

Related work

  • Most of the work focuses on Twitter and its specific characteristics, such as retweets or social graph analysis (Hong, 2011).
  • Recently, several works have touched on multimodal popularity prediction (Trzcinski, 2017; Chen, 2016).
  • Prediction of the popularity of online articles based on their whole text (Ramisa, 2016).
  • In our work, we focus only on the title, ignoring everything else.
slide-12
SLIDE 12

Baselines

Bag of Words + SVM with Linear Kernel

CNN (Ramisa, 2016)

  • Representing a title by a D x N matrix of concatenated GloVe word vectors.
  • 256 convolution filters with width 5 and stride equal to 1, and max pooling (x3).
  • FC layer with L2 regularization, ReLU, dropout
  • Final FC layer with sigmoid*

Convolutional Neural Network (Ramisa, 2016)
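To make the CNN input and the width-5, stride-1 convolution concrete, here is a minimal pure-Python sketch with a single filter followed by a max over all positions. It is a simplification under stated assumptions, not the baseline itself: the embeddings are toy values rather than GloVe vectors, there is no padding, only one filter is used instead of 256, and a single global max stands in for the pooling configuration on the slide. All function names are hypothetical.

```python
def title_matrix(tokens, embeddings, dim):
    """Stack word vectors as columns: returns `dim` rows of length N = len(tokens)."""
    zero = [0.0] * dim  # assumption: unknown words map to a zero vector
    columns = [embeddings.get(tok, zero) for tok in tokens]
    return [[col[d] for col in columns] for d in range(dim)]


def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide one (dim x width) filter over the title with stride 1, then take the max."""
    dim, n = len(matrix), len(matrix[0])
    width = len(kernel[0])
    responses = []
    for start in range(n - width + 1):  # no padding: N - width + 1 positions
        total = bias
        for d in range(dim):
            for k in range(width):
                total += kernel[d][k] * matrix[d][start + k]
        responses.append(total)
    return max(responses), len(responses)


tokens = "he colluded or obstructed trump turns russia".split()
# Toy 2-dimensional embeddings standing in for GloVe vectors.
embeddings = {tok: [float(i), 1.0] for i, tok in enumerate(tokens)}
matrix = title_matrix(tokens, embeddings, dim=2)
kernel = [[1.0] * 5, [0.0] * 5]  # hypothetical filter of width 5
pooled, positions = conv_max_pool(matrix, kernel)
print(pooled, positions)
```

With N = 7 tokens and a filter of width 5, the filter fits at 7 - 5 + 1 = 3 positions; the sketch assumes titles have at least 5 tokens.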

slide-13
SLIDE 13

Bidirectional Long Short-Term Memory Network

Architecture

  • 1-of-K word encoding
  • GloVe as an embedding layer
  • Bidirectional LSTM for title encoding
  • Regularization (Dropout, L2)
  • Sigmoid on the top
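The first two steps of the architecture, 1-of-K encoding followed by an embedding layer, amount to a table lookup: multiplying a 1-of-K vector by a K x D embedding table selects one row. The sketch below illustrates this with a tiny hand-made table; the vocabulary-building scheme, the reserved unknown-word index, and the table values are assumptions for illustration, not details from the slides.

```python
def build_vocab(titles):
    """Assign each distinct lowercase token an integer index (0 is reserved for unknown words)."""
    vocab = {"<unk>": 0}
    for title in titles:
        for tok in title.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(index, size):
    """1-of-K encoding: a vector of zeros with a single 1 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed(one_hot_vec, table):
    """Multiply a 1-of-K row vector by the K x D table; this selects one row."""
    dim = len(table[0])
    return [
        sum(one_hot_vec[k] * table[k][d] for k in range(len(table)))
        for d in range(dim)
    ]


titles = ["Last chance for a good title"]
vocab = build_vocab(titles)
# Hypothetical 2-dimensional table standing in for pretrained GloVe vectors.
table = [[float(k), float(k) / 10] for k in range(len(vocab))]
idx = vocab["good"]
vector = embed(one_hot(idx, len(vocab)), table)
print(idx, vector)
```

In practice, deep learning frameworks skip the explicit multiplication and index the table directly, which gives the same result at far lower cost.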
slide-14
SLIDE 14

Results

We used a k-fold evaluation protocol with k=5
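A k-fold protocol partitions the data into k folds and uses each fold once for testing while training on the rest. The following is a minimal sketch of the index bookkeeping only; it assumes the examples were shuffled beforehand and makes no claim about whether the authors stratified the folds. The function name is hypothetical.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional item each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print(len(train), test)
```

Reported scores are then typically averaged over the k test folds.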

slide-15
SLIDE 15

Results

slide-16
SLIDE 16

BiLSTM Hidden State Interpretation

  • The concatenation of the forward and backward hidden states at time t can be seen as a context-dependent vector representation of word w_t
  • This allows us to introspect a given title and approximate the contribution of each word in the sequence to the popularity
  • The output of the last fully-connected layer could be interpreted as the context-dependent influence of a word w_t on popularity

Visualization of context-dependent word influence
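One way to read the interpretation above is to apply the output layer to the concatenated hidden states at every time step and treat the result as a per-word score. The sketch below does this under simplifying assumptions: a single fully-connected layer with a sigmoid, and made-up hidden states and weights rather than values from a trained model. The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def word_influences(forward_states, backward_states, weights, bias):
    """Score each word by applying the output layer to [forward_t ; backward_t]."""
    scores = []
    for h_fwd, h_bwd in zip(forward_states, backward_states):
        h_t = h_fwd + h_bwd  # list concatenation of the two hidden states
        logit = sum(w * h for w, h in zip(weights, h_t)) + bias
        scores.append(sigmoid(logit))
    return scores


tokens = ["this", "is", "amazing"]
# Made-up hidden states (2 units per direction) and output weights.
forward_states = [[0.0, 0.0], [0.0, -1.0], [1.0, 0.0]]
backward_states = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
weights = [1.0, 1.0, 1.0, 1.0]
scores = word_influences(forward_states, backward_states, weights, bias=0.0)
for tok, score in zip(tokens, scores):
    print(tok, round(score, 3))
```

Scores above 0.5 would mark words that push the prediction toward the popular class in their context, and scores below 0.5 the opposite; coloring each word by its score gives a visualization of this kind.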

slide-17
SLIDE 17
slide-18
SLIDE 18

Conclusions

  • To our knowledge, this is the first attempt at predicting the performance of content on social media using only textual information from its title.
  • We show that our method consistently outperforms baseline models.
  • We are able to introspect the model and use the hidden states to better understand audience preferences.

slide-19
SLIDE 19

Thank you!

slide-20
SLIDE 20

References

  • 1. A. Ramisa, F. Yan, F. Moreno-Noguer, K. Mikolajczyk. BreakingNews: Article Annotation by Image and Text Processing. arXiv:1603.07141 [cs.CV], 2016.
  • 2. J. Chen, X. Song, L. Nie, X. Wang, H. Zhang, and T. Chua. Micro tells macro: Predicting the popularity of micro-videos via a transductive model. In ACMMM, 2016.
  • 3. L. Hong, O. Dan, and B. Davison. Predicting popular messages in twitter. In Proc. International Conference Companion on World Wide Web, 2011.
  • 4. T. Trzcinski, P. Rokita. Predicting popularity of online videos using Support Vector Regression. IEEE Trans. Multimedia (TMM), 2017.