Social Media & Text Analysis
lecture 2 - Twitter API
CSE 5539-0010 Ohio State University Instructor: Alan Ritter Website: socialmedia-class.org
Wei Xu ◦ socialmedia-class.org
socialmedia-class.org
(pre-requirements: familiar with Python programming)
engineering, but machine learning algorithm — difficult to debug).
for solutions.
HW#2 (Main Algorithm): Correct 33%, Minor Error 33%, Incorrect 33%
HW#2 (Auxiliary Algorithm): Yes 50%, No 50%
Processing
Techniques
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
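A small worked instance of Bayes' rule may help here. This is only a sketch: the function name `posterior`, the spam scenario, and all the numbers are made up for illustration, and P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' rule.

    P(B) is expanded as P(B|A)P(A) + P(B|not A)P(not A).
    """
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers: 20% of tweets are spam (A); the word "free" (B)
# appears in 60% of spam tweets and in 5% of non-spam tweets.
p_spam_given_free = posterior(0.2, 0.6, 0.05)
print(round(p_spam_given_free, 3))
```

With these assumed numbers, observing the word raises the probability of spam from the 20% prior to 0.12 / 0.16 = 75%.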
$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
softmax(x) = softmax(x + c)
Useful for improving the numerical stability of the computation!
(need to be computationally efficient)
A normalization trick for numerical stability! (highest value in the vector becomes 0)
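The shift invariance softmax(x) = softmax(x + c) can be used with c = -max(x), so that the largest entry becomes 0 before exponentiating. The following is a minimal sketch of that trick using only the standard library; the function name and the input values are illustrative assumptions, not taken from the course materials.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)                           # c = -max(x): highest value becomes 0
    exps = [math.exp(s - shift) for s in scores]  # every exponent is <= 0, so no overflow
    total = sum(exps)
    return [e / total for e in exps]


small = softmax([1.0, 2.0, 3.0])
# Adding a constant to every score leaves the result unchanged;
# math.exp(1003.0) on its own would overflow, but the shifted version is fine.
large = softmax([1001.0, 1002.0, 1003.0])
print(small)
print(large)
```

Both calls print the same probabilities, because after the shift both inputs become [-2, -1, 0].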
scores → exp → normalize (to sum to one)
0.86 → 2.36 → 0.631
0.28 → 1.32 → 0.353
… → 0.058 → 0.016
$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
see also: http://cs231n.github.io/linear-classify/#softmax
if f = g(u) and u = h(x), i.e. f(x) = g(h(x)), then:
$$\frac{df}{dx} = \frac{df}{du}\,\frac{du}{dx} = \frac{dg(u)}{du}\,\frac{dh(x)}{dx}$$
The Derivative of a Sigmoid
We noted earlier that the Sigmoid is a smooth (i.e. differentiable) threshold function:
$$f(x) = \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
We can use the chain rule by putting f(x) = g(h(x)) with g(h) = h^{-1} and h(x) = 1 + e^{-x}, so
$$\frac{\partial g(h)}{\partial h} = -\frac{1}{h^2} \quad\text{and}\quad \frac{\partial h(x)}{\partial x} = -e^{-x}$$
$$\frac{\partial f(x)}{\partial x} = -\frac{1}{(1 + e^{-x})^2} \cdot (-e^{-x}) = \frac{1}{1 + e^{-x}} \cdot \frac{1 + e^{-x} - 1}{1 + e^{-x}}$$
$$f'(x) = \frac{\partial f(x)}{\partial x} = f(x)\,(1 - f(x))$$
This simple relation will make our equations much easier and save a lot of computing time!
Source: John A. Bullinaria
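The relation between the Sigmoid and its derivative can be checked numerically. The sketch below compares the closed form f(x)(1 - f(x)) with a central finite difference; the helper names `sigmoid_grad` and `numerical_grad`, the step size, and the test points are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """f(x) = 1 / (1 + e^{-x})"""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Closed-form derivative: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


for x in (-2.0, 0.0, 3.0):
    print(x, sigmoid_grad(x), numerical_grad(sigmoid, x))
```

The closed form reuses the already computed value f(x), so no extra exponential is needed for the derivative, which is where the saving in computing time comes from.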
Twitter API Tutorial: socialmedia-class.org
(a NYU undergraduate then)
(20k→60k tweets/day)
50m tweets/day in 2010, 400m tweets/day in 2013
unexpected as was the rise of the @ sign for replies
Twitter staff received the festival's Web Award prize with the remark "we'd like to thank you in 140 characters or less. And we just did!"
revenue $435m, net loss $162m in 2015 Q1
July 1st, 2015
a re-posting of someone else’s Tweet
conversations
hashtags are powerful
friend hashtag reply @mention follower retweet
Source: Volkova, Van Durme, Yarowsky, Bachrach “Tutorial on Social Media Predictive Analytics” NAACL 2015
Application Programming Interface (API)
An API is a set of protocols that specify how software programs communicate with each other.
Source: Chris Beach @ Quora
technology company.
September 2006.
Streaming API: a sample of public tweets and events as they are published on Twitter (can specify search terms or users); continuous net connection; no limit
REST API: historical data up to a week; rate limit (varies for different requests)
its API.
OAuth connection
JavaScript Object Notation (JSON)
JSON is a minimal, readable format for structuring data.
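To make the format concrete, here is a minimal sketch of reading a JSON record with Python's standard `json` module. The record is hand-written for illustration: its field names and values are assumptions loosely modeled on a tweet, not actual API output.

```python
import json

# A simplified, hand-written tweet-like record (hypothetical fields and values).
raw = """
{
  "id": 1,
  "text": "Learning the Twitter API #nlp",
  "user": {"screen_name": "example_user"},
  "entities": {"hashtags": [{"text": "nlp"}]}
}
"""

tweet = json.loads(raw)  # JSON objects become dicts, JSON arrays become lists
hashtags = [tag["text"] for tag in tweet["entities"]["hashtags"]]
print(tweet["user"]["screen_name"], tweet["text"], hashtags)
```

Nested objects map onto nested dictionaries, so a field is reached by chaining keys, as in `tweet["user"]["screen_name"]`.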
trending topics are determined by an unpublished algorithm, which finds words, phrases and hashtags that have had a sharp increase in popularity, as
Where On Earth ID
known as the “Chinese Twitter” 120 Million Posts / Day
Female users use Twitter.
years old.
million users), Japan (25.9 million), and Mexico (23.5 million).
who are active on Twitter.
presence.
Source: http://www.ecardshack.com/blog/top-tech-companies-revenue-per-employee
ICWSM, CIKM, ICML …
all NLP conference and journal papers (free!)
Some shifts: e.g. summarization and generation is now in top 5 areas, while in 2014 it didn’t even make top 10
read more: https://chairs-blog.acl2017.org/
William Wang UCSB Computer Science 10/06/2016
and unknown unknowns in the scientific world.
machine learning algorithms;
perform experiments to verify the idea.
and what you can learn from them;
put theories into practice: write some code;
move on to larger datasets.
problem and your solutions;
algorithms;
baselines.
scholarly communications.
conferences will create impacts, get inspirations, and facilitate the exchange of thoughts and good ideas.
researchers in your field.
effectiveness of your research.
work?
datasets/code?
solution!
challenging.
accepted… (acceptance rates typically 10-30%);
papers, and use your code/approaches;