
SLIDE 1

Introduction Framework Model (Dongyu Ru) Baseline (Kun Chen) Experiments Conclusion

Research Paper Recommender System Based on Deep Text Comprehension

Dongyu Ru Kun Chen

SJTU

May 27, 2018

Dongyu Ru RPRS based on DTC 1/29

SLIDE 2


Introduction

A recommender system is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item.


SLIDE 4

Introduction

Many classes of recommendation approaches have been used over the past few years. The following two are the most popular:

Content-Based Filtering
Collaborative Filtering

SLIDE 5

Introduction

Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties.

Collaborative filtering approaches build a model from a user's past behaviour as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.

SLIDE 6

Framework

[Figure: framework diagram with the components Web Data, Papers Corpus, Text Preprocess, Tag, Candidate Papers, User Profile, DTC Model, and Recommended Papers]

SLIDE 7

Framework

We built the framework above on a typical Content-Based Filtering (CBF) model. The main difference is that we replace the original matching model in the CBF system with our DTC model, because we claim that our Deep Text Comprehension model has a higher capacity to recognize patterns in the given text than simple n-gram or TF-IDF based models.
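As a rough illustration of the framework, the sketch below ranks candidate papers against a user profile through a pluggable matching function, which is the slot the DTC model fills. Everything in it is an assumption made for illustration and is not taken from the slides: the names `recommend` and `word_overlap`, the choice to score a candidate by its best match against any profile paper, and the toy Jaccard scorer standing in for a real matching model.

```python
from typing import Callable, List, Tuple


def word_overlap(a: str, b: str) -> float:
    """Toy matching model (hypothetical stand-in): Jaccard overlap of word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def recommend(
    profile: List[str],
    candidates: List[str],
    match: Callable[[str, str], float],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank candidates by their best match score against any profile paper."""
    if not profile:
        return []
    scored = [(c, max(match(p, c) for p in profile)) for c in candidates]
    # Highest score first; list.sort is stable, so ties keep candidate order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


profile = ["deep learning for text matching"]
candidates = [
    "text matching with deep networks",
    "protein folding simulation",
    "learning to rank text",
]
ranked = recommend(profile, candidates, word_overlap)
```

Swapping `word_overlap` for the scoring function of a trained model would leave the rest of this pipeline unchanged.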

SLIDE 8

Framework

We consider the remaining parts of the framework, apart from the DTC model, to be relatively mature and well explored. We therefore focus on the DTC model, which is a deep neural network.

SLIDE 9

Model

[Figure: DTC model architecture. The word embedding and char embedding are concatenated, then passed through the highway layer, contextual embedding layer, attention flow layer, modeling layer, and dense layer.]

SLIDE 10

Model

The DTC model is a deep LSTM-based neural network consisting of 7 main layers, as shown in the figure above. It takes the words and characters of the paper text as input and outputs a similarity score between the input papers. The detailed structure is introduced in the following slides.

SLIDE 11

Model

Character Embedding Layer: This layer maps each word to a vector space using a character-level CNN (Convolutional Neural Network). Let $a = \{a_1, a_2, \ldots, a_T\}$ and $b = \{b_1, b_2, \ldots, b_T\}$ represent the input words of the two papers. Characters are embedded into vectors, which serve as 1D inputs to the CNN; the size of these vectors is the input channel size of the CNN. The outputs of the CNN are max-pooled over the entire width to obtain a fixed-size vector for each word.
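The max-pooling step is what makes the word vector fixed-size regardless of word length. A minimal pure-Python sketch of that idea follows; the function name `char_cnn_word_vector`, the toy filter weights, the zero-padding of short words, and the omission of bias and nonlinearity are all simplifying assumptions, not details from the slides.

```python
from typing import List

Vector = List[float]


def char_cnn_word_vector(
    char_vectors: List[Vector], filters: List[List[Vector]]
) -> Vector:
    """Convolve each filter over the character sequence, then max-pool over positions.

    char_vectors: one embedding per character (length = input channels), non-empty.
    filters: each filter holds `width` weight vectors, one per character offset.
    """
    channels = len(char_vectors[0])
    out = []
    for filt in filters:
        width = len(filt)
        # Zero-pad short words so that at least one window exists.
        padded = char_vectors + [[0.0] * channels] * max(0, width - len(char_vectors))
        best = float("-inf")
        for start in range(len(padded) - width + 1):
            activation = sum(
                w * x
                for offset in range(width)
                for w, x in zip(filt[offset], padded[start + offset])
            )
            best = max(best, activation)
        out.append(best)  # one pooled value per filter
    return out


filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
short_word = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_word = short_word + [[0.5, 0.5], [0.0, 2.0]]
short_vec = char_cnn_word_vector(short_word, filters)
long_vec = char_cnn_word_vector(long_word, filters)
```

Both words yield a vector with one entry per filter, even though they contain different numbers of characters.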

SLIDE 12

Model

Word Embedding Layer: This layer maps each word to a high-dimensional vector space. Pretrained GloVe word vectors are used to obtain a fixed embedding for each word. The outputs of the Word Embedding Layer and the Character Embedding Layer are concatenated to form the representation of the input text.

SLIDE 13

Model

Highway Layer: This layer takes as input the word-level concatenation of the two sequences of embedding vectors. It acts as a gate that passes part of the original input information directly to the next layer. Let $x$ represent the input.

$$T(x) = \sigma(W_T x + b_T)$$
$$o(x) = \mathrm{relu}(W_o x + b_o)$$
$$O(x) = T(x) \cdot o(x) + (1 - T(x)) \cdot x \qquad (1)$$
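Equation (1) can be checked numerically with a small pure-Python sketch. The helper names (`affine`, `highway`) and the toy weights are assumptions chosen for illustration; a real implementation would use a tensor library and learned parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x: Vector, W_T: Matrix, b_T: Vector, W_o: Matrix, b_o: Vector) -> Vector:
    """O(x) = T(x) * o(x) + (1 - T(x)) * x, applied element-wise."""
    t = [sigmoid(z) for z in affine(W_T, x, b_T)]    # transform gate T(x)
    o = [max(0.0, z) for z in affine(W_o, x, b_o)]   # candidate output o(x), relu
    return [ti * oi + (1.0 - ti) * xi for ti, oi, xi in zip(t, o, x)]


x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# Gate almost closed (very negative bias): the input is carried through unchanged.
carried = highway(x, zeros, [-50.0, -50.0], identity, [0.0, 0.0])
# Gate almost open (very positive bias): the output is relu(x).
transformed = highway(x, zeros, [50.0, 50.0], identity, [0.0, 0.0])
```

With the gate near 0 the layer returns roughly `x`, and with the gate near 1 it returns roughly `relu(x)`, which is the "leak" behaviour the slide describes.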

SLIDE 14

Model

Contextual Embedding Layer: In this layer, an LSTM (Long Short-Term Memory) network is applied to the Highway Layer output. The output states of the LSTM are concatenated and passed to the next layer. At this point, feature representations at different granularities have been obtained.

$$y_t = \mathrm{BiLSTM}(y_{t-1}, x_t) \qquad (2)$$

SLIDE 15

Model

Attention Flow Layer: Here, the contextual embedding outputs of the two papers are fed into the Attention Flow Layer to obtain a mutually aware representation of the input papers.

SLIDE 16

Model

Modeling Layer: The Modeling Layer is built from another LSTM layer. Its input is the stacked attention output. It captures the interactions within the mutually aware representation of the input papers.

SLIDE 17

Model

Dense Layer: The Dense Layer acts as the output layer of this model. It takes the final state of the Modeling Layer as input and applies a fully-connected layer followed by a sigmoid function to obtain the similarity score.

$$\mathrm{score} = \mathrm{sigmoid}(W_s^T M) \qquad (3)$$
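Equation (3) reduces to a dot product followed by a sigmoid. The short sketch below assumes that $M$ is a single final-state vector and that $W_s$ is a weight vector of the same length; the function name and the numbers are hypothetical.

```python
import math
from typing import List


def similarity_score(final_state: List[float], weights: List[float]) -> float:
    """score = sigmoid(W_s^T M) for a single final-state vector M."""
    if len(final_state) != len(weights):
        raise ValueError("weights and final state must have the same length")
    z = sum(w * m for w, m in zip(weights, final_state))
    return 1.0 / (1.0 + math.exp(-z))


final_state = [0.3, -1.2, 0.8]
neutral = similarity_score(final_state, [0.0, 0.0, 0.0])    # z = 0
confident = similarity_score(final_state, [2.0, -1.0, 1.5])  # z is about 3
```

The sigmoid keeps the score in (0, 1), so it can be thresholded or ranked directly.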

SLIDE 18

Baseline

Two baselines are selected for comparison with our model on matching performance:

TF-IDF
Simhash

SLIDE 19

Baseline

TF-IDF, short for term frequency-inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.

Simhash is a technique for quickly estimating how similar two sets are. The algorithm is used by the Google Crawler to find near-duplicate pages.

SLIDE 20

Baseline

Some important formulas of TF-IDF:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$\mathrm{idf}_i = \log \frac{|D|}{1 + |\{j : t_i \in d_j\}|}$$
$$\mathrm{tfidf}(i, j, D) = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$$
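The three TF-IDF formulas translate directly into a few lines of Python. This is a minimal sketch that assumes documents are already tokenized into word lists; the function names and the toy corpus are illustrative only.

```python
import math
from collections import Counter
from typing import List


def tf(term: str, doc: List[str]) -> float:
    """Term frequency: occurrences of the term divided by all tokens in the document."""
    if not doc:
        return 0.0
    return Counter(doc)[term] / len(doc)


def idf(term: str, corpus: List[List[str]]) -> float:
    """Inverse document frequency with the +1 smoothing in the denominator."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))


def tfidf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    return tf(term, doc) * idf(term, corpus)


corpus = [
    ["neural", "paper", "recommender"],
    ["paper", "survey"],
    ["paper", "hashing"],
    ["graph", "theory"],
]
rare = tfidf("neural", corpus[0], corpus)    # (1/3) * log(4 / 2)
common = tfidf("paper", corpus[0], corpus)   # (1/3) * log(4 / 4) = 0
```

Note that with this smoothing a term that occurs in almost every document gets an IDF of zero or below, so it contributes nothing (or negatively) to the match.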

SLIDE 21

Baseline

Main procedure of simhash:

Default hash size B = 64; let V = [0] * B
Break the phrase up into features, and hash each feature using a normal 64-bit hashing algorithm
For each hash, if bit i is set then add 1 to V[i], otherwise subtract 1 from V[i]
Bit i of the simhash is 1 if V[i] > 0 and 0 otherwise
Sort all hash values and check adjacent ones, then rotate by 1 bit; repeat B times
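The fingerprinting steps above can be sketched as follows. Two choices here are assumptions rather than part of the slides: whitespace-separated words serve as the features, and the first 8 bytes of an MD5 digest serve as the "normal 64-bit hash". The final sort-and-rotate search over many fingerprints is not covered; the sketch compares two fingerprints directly by Hamming distance.

```python
import hashlib
from typing import List

B = 64  # fingerprint size in bits


def hash64(feature: str) -> int:
    """64-bit hash of a feature (assumption: first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(features: List[str]) -> int:
    v = [0] * B
    for feature in features:
        h = hash64(feature)
        for i in range(B):
            # Add 1 if bit i of the feature hash is set, otherwise subtract 1.
            v[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(B):
        if v[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


doc_a = "research paper recommender system based on deep text comprehension".split()
doc_b = "research paper recommender system based on deep text matching".split()
distance = hamming_distance(simhash(doc_a), simhash(doc_b))
```

A smaller Hamming distance between two fingerprints suggests more similar feature sets.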

SLIDE 22

Experiments

To show that our model performs better as the matching model in a research paper recommender system, we collected a dataset to evaluate our model and the baselines. Restricted by limited computation power, we randomly selected 1M papers from the dataset for validation. After filtering out bad cases, we performed the experiments on a final dataset of 200K papers.

SLIDE 23

Experiments

30% of the dataset is reserved as the test set. The baseline experiments are run directly on the test set, without training.

We evaluate our DTC (Deep Text Comprehension) model with the ROC (Receiver Operating Characteristic) curve, shown in the following figure, and the AUC (Area Under the Curve), shown in the following table.

SLIDE 24


Experiments


SLIDE 25

Experiments

Table: AUC comparison of matching performance

Model     AUC
TF-IDF    0.65
Simhash   0.61
DTC       0.95
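For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen matching pair receives a higher score than a randomly chosen non-matching pair. The sketch below computes it that way on made-up labels and scores; it is not the evaluation code used for the experiments, and the quadratic pairwise loop is only practical for small inputs.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]
partly_right = auc(labels, [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ranked correctly
perfect = auc(labels, [0.9, 0.8, 0.2, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of matching from non-matching pairs.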

SLIDE 26

Experiments

One more point worth mentioning is that our model runs slower than the two baselines. On the same dataset of 60K items, TF-IDF runs for 3 minutes and Simhash for 3 hours, both on an i5-5200U CPU. Our model needs about 10 hours and additionally requires GPU support.

SLIDE 27

Conclusion

We proposed a Deep Text Comprehension based recommender system, which replaces the original matching model in a CBF research paper recommender system with a deep neural network. We claim that the DTC model has a higher capacity to recognize patterns in text. Experiments indicate that our DTC model performs better than the two baselines mentioned in the report. However, the extra running time and computing power it requires remain a problem to be solved.

SLIDE 28

References

Beel J, Gipp B, Langer S, et al. Research-paper recommender systems: a literature survey[J]. International Journal on Digital Libraries, 2016, 17(4): 305-338.
Ferrara F, Pudota N, Tasso C. A keyphrase-based paper recommender system[C]//Italian Research Conference on Digital Libraries. Springer, Berlin, Heidelberg, 2011: 14-25.
Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
Kim Y, Jernite Y, Sontag D, et al. Character-aware neural language models[C]//AAAI. 2016: 2741-2749.
Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.
Srivastava R K, Greff K, Schmidhuber J. Highway networks[J]. arXiv preprint arXiv:1505.00387, 2015.
Seo M, Kembhavi A, Farhadi A, et al. Bidirectional attention flow for machine comprehension[J]. arXiv preprint arXiv:1611.01603, 2016.

SLIDE 29


Questions

Thank you for your time!

