#TagSpace: Semantic Embeddings from Hashtags (Jason Weston, Sumit Chopra, Keith Adams) – PowerPoint PPT Presentation



SLIDE 1

#TagSpace: Semantic Embeddings from Hashtags

Jason Weston, Sumit Chopra, Keith Adams – 2014

Jack Lanchantin

SLIDE 2

Motivation

  • Word and document embeddings are difficult to learn
  • Most current techniques use unsupervised methods
  • word2vec learns word embeddings by trying to predict each word in a document based on its surrounding text
  • Hashtags are labels of text, such as sentiment (#happy) or topic annotation (#nyc), written by the post author
  • Hashtag prediction provides a better way to learn word and document embeddings than unsupervised learning, because hashtags provide stronger semantic guidance

SLIDE 3

Overview · Model · Hashtag Prediction · Document Recommendation · Conclusion

SLIDE 4

Overview

  • #TagSpace: a convolutional neural network that learns features (embeddings) of short textual posts using hashtags as the supervised signal
  • Train the network to optimally predict hashtags on test posts
  • The learned embedding of the text (ignoring the hashtag labels) is useful for other tasks such as document recommendation

SLIDE 5

Overview · Model · Hashtag Prediction · Document Recommendation · Conclusion

SLIDE 6

Neural Net For Scoring a (doc, hashtag) Pair

(Architecture diagram.) The model pipeline:
  • Assign a d-dimensional vector to each of the l words
  • Hidden network layers
  • Representation of the entire document
  • Assign a d-dimensional vector to the hashtag
  • Scoring function between the document and hashtag embeddings
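The pipeline above can be sketched as a toy forward pass. This is a minimal illustration, not the paper's actual architecture: the embeddings are random, and a single tanh layer stands in for the real convolution-and-pooling stack; all names (`embed_doc`, `score`, `W`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 64          # embedding dimension (the paper reports models up to 256)
vocab = {"crazy": 0, "commute": 1, "this": 2, "am": 3}
n_tags = 5      # toy hashtag vocabulary size

# Lookup table: one d-dimensional vector per word (random stand-in)
word_emb = rng.normal(size=(len(vocab), d))
# One d-dimensional vector per hashtag
tag_emb = rng.normal(size=(n_tags, d))
# A single toy hidden layer standing in for the convolutional stack
W = rng.normal(size=(d, d)) / np.sqrt(d)

def embed_doc(words):
    """Map a post to a single d-dim vector (pool over word positions)."""
    x = word_emb[[vocab[w] for w in words]]   # (l, d) word vectors
    h = np.tanh(x @ W)                        # hidden layer per position
    return np.tanh(h.max(axis=0))             # pool to one document vector

def score(doc_vec, tag_id):
    """Scoring function f(doc, tag): inner product of the two embeddings."""
    return float(doc_vec @ tag_emb[tag_id])

doc = embed_doc(["crazy", "commute", "this", "am"])
ranking = sorted(range(n_tags), key=lambda t: -score(doc, t))
print(ranking)  # hashtags ordered by score for this post
```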

SLIDE 7
Training the Scoring Function

  • Given a document, rank all hashtags by score
  • A loss function is used to approximately optimize the top of the ranked list, which is useful for precision and recall at k
  • More energy is spent on improving the ranking of positive labels near the top of the ranked list
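A loss with this "spend more energy near the top" property can be sketched as a WARP-style sampled ranking loss: keep drawing negative hashtags until one violates the margin, then weight the hinge loss by an estimate of the positive tag's rank. This is an illustrative sketch with hypothetical names (`warp_sample`, `warp_rank_weight`) and random toy embeddings, not the paper's training code.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_tags = 16, 100
tag_emb = rng.normal(size=(n_tags, d))   # toy hashtag embeddings
doc = rng.normal(size=d)                 # toy document embedding

def warp_rank_weight(rank):
    # L(k) = sum_{i=1..k} 1/i: grows with rank, so positives ranked low
    # (i.e. many violating negatives above them) get pushed hardest
    return sum(1.0 / i for i in range(1, rank + 1))

def warp_sample(doc, pos_tag, margin=1.0, max_trials=None):
    """Sample negatives until one violates the margin; weight by rank."""
    max_trials = max_trials or n_tags - 1
    pos_score = doc @ tag_emb[pos_tag]
    for trials in range(1, max_trials + 1):
        neg = int(rng.integers(n_tags))
        if neg == pos_tag:
            continue
        if doc @ tag_emb[neg] + margin > pos_score:
            # Few trials needed => many violators => positive ranks low
            rank_est = (n_tags - 1) // trials
            loss = warp_rank_weight(rank_est) * max(
                0.0, margin + doc @ tag_emb[neg] - pos_score)
            return loss, neg
    return 0.0, None   # no violating negative found: zero loss
```

In practice the gradient of this loss would be applied to both the document and hashtag embeddings; here only the loss value is computed.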

SLIDE 8

Overview · Model · Hashtag Prediction · Document Recommendation · Conclusion

SLIDE 9

Hashtag Prediction

  • Goal: rank a post’s ground-truth hashtags higher than hashtags it does not contain
  • Test using Precision@1, Recall@10, and mean rank over the hashtags of 50,000 test posts
  • Compared to 4 other models:
    – Frequency: always ranks hashtags by training frequency
    – #words: “crazy commute this am” → #crazy, #commute, #this, #am
    – Word2vec (unsupervised)
    – WSABIE (supervised)
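The three evaluation metrics are standard ranking measures and can be sketched directly; the function names and the toy ranked list are illustrative, not from the paper.

```python
def precision_at_1(ranked, truth):
    """1 if the top-ranked hashtag is a ground-truth tag, else 0."""
    return 1.0 if ranked[0] in truth else 0.0

def recall_at_10(ranked, truth):
    """Fraction of the ground-truth hashtags recovered in the top 10."""
    return sum(t in truth for t in ranked[:10]) / len(truth)

def mean_rank(ranked, truth):
    """Average 1-based position of the ground-truth hashtags."""
    return sum(ranked.index(t) + 1 for t in truth) / len(truth)

# Toy example: model's ranked predictions vs. the post's actual hashtags
ranked = ["#nyc", "#happy", "#commute", "#crazy"] + [f"#t{i}" for i in range(20)]
truth = {"#nyc", "#commute"}
print(precision_at_1(ranked, truth))  # 1.0: top prediction is correct
print(recall_at_10(ranked, truth))    # 1.0: both true tags in top 10
print(mean_rank(ranked, truth))       # 2.0: true tags at positions 1 and 3
```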

SLIDE 10

Data

(Table: dataset statistics for posts from pages (Business, Celebrity, Brand, or Product) and from individual users.)

SLIDE 11

#TagSpace Examples (256 dim)

(Table: example posts and their predicted hashtags.)

SLIDE 12

Hashtag Prediction Results

SLIDE 13

Overview · Model · Hashtag Prediction · Document Recommendation · Conclusion

SLIDE 14

Personalized Document Recommendation

  • Goal: extend the representations learned from predicting hashtags to other tasks
  • Document recommendation: recommending documents to users based on their interaction history
  • Used day-long interaction histories for 34,000 people on Facebook: the text of posts that they liked, clicked, or replied to
  • Given the n-1 trailing posts, predict the nth post by ranking it against 10,000 other posts
  • The score of the nth post is the maximum embedding similarity over the n-1 posts
  • Used cosine similarity between post embeddings
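The scoring rule above (max cosine similarity between a candidate and any post in the user's history) can be sketched as follows. The embeddings here are random placeholders for the post embeddings the model would produce, and the function names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_candidate(history_embs, cand_emb):
    """Score = max cosine similarity to any post in the user's history."""
    return max(cosine(h, cand_emb) for h in history_embs)

rng = np.random.default_rng(2)
history = [rng.normal(size=8) for _ in range(4)]     # n-1 trailing posts
candidates = [rng.normal(size=8) for _ in range(5)]  # nth post + distractors

# Recommend the candidate most similar to something the user engaged with
best = max(range(len(candidates)),
           key=lambda i: score_candidate(history, candidates[i]))
```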
SLIDE 15

Document Recommendation Results

  • Baseline: TF-IDF weighted bag of words
  • Best results come from summing the BOW scores with the #TagSpace scores

SLIDE 16

Overview · Model · Hashtag Prediction · Document Recommendation · Conclusion

SLIDE 17

Conclusion

  • Outperformed all comparison models in hashtag prediction
  • The model scales very well to a large number (millions) of hashtags; logistic regression and SVMs do not
  • The semantics of hashtags lead #TagSpace to learn features that capture important aspects of the text
  • The learned embeddings port to the task of personalized document recommendation with better accuracy than the other models

SLIDE 18

#TagSpace: https://research.facebook.com/publications/279494668926031/-tagspace-semantic-embeddings-from-hashtags/
WSABIE: http://www.thespermwhale.com/jaseweston/papers/wsabie-ijcai.pdf
Word2Vec: http://arxiv.org/abs/1301.3781
