Social Media & Text Analysis lecture 2 - Twitter API



slide-1
SLIDE 1

Social Media & Text Analysis

lecture 2 - Twitter API

CSE 5539-0010 Ohio State University Instructor: Alan Ritter Website: socialmedia-class.org

slide-2
SLIDE 2

Wei Xu ◦ socialmedia-class.org

Course Website

socialmedia-class.org

slide-3
SLIDE 3

Have a Question?

  • Ask in class!
  • Office Hour: Tue 4:15 pm — 5:15 pm, Dreese 495
  • Piazza Q&A Board (a Module within OSU Canvas) 

slide-4
SLIDE 4

This is a Special Topic Class

  • It is about NLP research, not programming (prerequisite: familiarity with Python programming).
  • Homework #2 can be difficult (not about software engineering, but a machine learning algorithm, which is difficult to debug).
  • Students are required to think hard and independently for solutions.

slide-5
SLIDE 5

Homework #2 (last year)

HW#2 (Main Algorithm): Correct 33%, Minor Error 33%, Incorrect 33%
HW#2 (Auxiliary Algorithm): Yes 50%, No 50%

slide-6
SLIDE 6

Alternatives

  • audit the course or take LING 5801 (Computational Linguistics I)
  • more background: CSE 3521, 5521, 3522, Stat 3460, 3470
  • other related courses:
      • CSE 5525 Foundations of Speech and Language Processing
      • CSE 5523 Machine Learning
      • CSE 5522 Survey of Artificial Intelligence II: Advanced Techniques
      • CSE 5526 Introduction to Neural Networks
slide-7
SLIDE 7

Quiz #1

  • For events A and B, prove

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
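One way to approach the proof is to start from the definition of conditional probability. The sketch below assumes P(A) > 0 and P(B) > 0 so that both conditional probabilities are defined; the intersection notation is introduced here and is not on the slide.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} && \text{definition, assuming } P(B) > 0 \\
P(B \mid A) &= \frac{P(A \cap B)}{P(A)} && \text{definition, assuming } P(A) > 0 \\
\Rightarrow\quad P(A \cap B) &= P(B \mid A)\,P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align*}
```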

slide-8
SLIDE 8

Quiz #1

  • What does this regular expression mean?

slide-9
SLIDE 9

Quiz #1

  • Softmax function is defined as

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

  • prove

$$\mathrm{softmax}(x) = \mathrm{softmax}(x + c)$$

Useful for improving the numerical stability of the computation!
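A possible proof sketch, assuming c is a scalar constant added to every component of x: the factor e^c appears in both the numerator and the denominator and cancels.

```latex
\begin{align*}
\mathrm{softmax}(x + c)_i
  &= \frac{e^{x_i + c}}{\sum_j e^{x_j + c}}
   = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}}
   = \frac{e^{x_i}}{\sum_j e^{x_j}}
   = \mathrm{softmax}(x)_i
\end{align*}
```

Choosing c as the negative of the largest component keeps every exponent at or below zero, which is why the identity helps with overflow.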

slide-10
SLIDE 10

Quiz #1

  • implement Softmax function in Python (needs to be computationally efficient)

A normalization trick for numerical stability! (highest value in the vector becomes 0)
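The normalization trick can be sketched as follows. This is a minimal plain-Python version using only the standard library, not the reference solution for the quiz; the function name `softmax` and the list-based interface are assumptions, and a vectorized NumPy version would follow the same three steps.

```python
import math


def softmax(xs):
    """Return the softmax of a non-empty list of numbers as a list of floats."""
    m = max(xs)                            # highest value in the vector
    exps = [math.exp(x - m) for x in xs]   # shifted values are <= 0, so exp cannot overflow
    total = sum(exps)                      # at least 1, because exp(0) = 1 is one of the terms
    return [e / total for e in exps]


if __name__ == "__main__":
    # Without the shift, math.exp(1002.0) would raise OverflowError.
    print(softmax([1000.0, 1001.0, 1002.0]))
```

Because softmax(x) = softmax(x + c), subtracting the maximum changes nothing mathematically; it only keeps the intermediate exponentials in a safe range.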

slide-11
SLIDE 11

Softmax Function

x:                            -2.85    0.86    0.28
exp:                          0.058    2.36    1.32
normalize (to sum to one):    0.016    0.631   0.353

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

slide-12
SLIDE 12

Softmax

see also: http://cs231n.github.io/linear-classify/#softmax

slide-13
SLIDE 13

Softmax Function

  • softmax regression (multinomial logistic regression)
  • often used as the output layer in neural networks
  • We will learn later in the class
slide-14
SLIDE 14

Quiz #2

  • derivative of the Sigmoid function:
  • use the chain rule:

if f = g(u) and u = h(x), i.e. f(x) = g(h(x)), then:

$$\frac{df}{dx} = \frac{df}{du}\,\frac{du}{dx} = \frac{dg(u)}{du}\,\frac{dh(x)}{dx}$$

slide-15
SLIDE 15

The Derivative of a Sigmoid

We noted earlier that the Sigmoid is a smooth (i.e. differentiable) threshold function:

$$f(x) = \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

We can use the chain rule by putting f(x) = g(h(x)) with g(h) = h^{-1} and h(x) = 1 + e^{-x}, so

$$\frac{\partial g(h)}{\partial h} = -\frac{1}{h^2} \quad\text{and}\quad \frac{\partial h(x)}{\partial x} = -e^{-x}$$

$$\frac{\partial f(x)}{\partial x} = -\frac{1}{(1 + e^{-x})^2} \cdot (-e^{-x}) = \left(\frac{1}{1 + e^{-x}}\right)\left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)$$

$$f'(x) = \frac{\partial f(x)}{\partial x} = f(x)\,(1 - f(x))$$

This simple relation will make our equations much easier and save a lot of computing time!

[Figure: plots of Sigmoid(x) and Sigmoid'(x) against x, for x from -8 to 8]

Source: John A. Bullinaria
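The relation f'(x) = f(x)(1 - f(x)) can be checked numerically. The sketch below compares it with a central finite difference; the helper names and the step size `h` are illustrative choices, not part of the slides.

```python
import math


def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Closed-form derivative: f(x) * (1 - f(x)), reusing the forward value."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-4.0, 0.0, 2.5):
    print(x, sigmoid_prime(x), numerical_derivative(sigmoid, x))
```

The closed form needs only the already computed value of f(x), which is where the saving in computing time comes from.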

slide-16
SLIDE 16

Twitter API Tutorial: socialmedia-class.org

slide-17
SLIDE 17

Homework #1 is out Due next Tuesday (Sep 5)

slide-18
SLIDE 18

Reading #1 is out Due Sep 12

slide-19
SLIDE 19

Twitter History

  • Jack Dorsey’s idea (an NYU undergraduate at the time)
  • 1st tweet on March 21, 2006
  • exploded at SXSW 2007 (20k→60k tweets/day)
  • 100m tweets/quarter in 2008, 50m tweets/day in 2010, 400m tweets/day in 2013
  • Huge API usage was unexpected, as was the rise of the @ sign for replies

Twitter staff received the festival's Web Award prize with the remark "we'd like to thank you in 140 characters or less. And we just did!"

slide-20
SLIDE 20

Twitter History

  • IPO in 2013 Q4
  • market value $24b, revenue $435m, net loss $162m in 2015 Q1
  • CEO Dick Costolo resigned July 1st, 2015

slide-21
SLIDE 21

Twitter HQ (since 2012)

slide-22
SLIDE 22

slide-23
SLIDE 23

slide-24
SLIDE 24

Tweets

slide-25
SLIDE 25

ReTweets

a re-posting of someone else’s Tweet

slide-26
SLIDE 26

ReTweets

  • not an official Twitter feature
  • often signifies quoting another user
  • sometimes creates problems for data analytics
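One reason retweets complicate analytics is that the same text is counted many times. The sketch below assumes the informal convention of prefixing a copied tweet with "RT @username:"; the pattern, the function name `split_retweet`, and the sample strings are illustrative assumptions rather than anything defined by Twitter.

```python
import re

# Assumed convention: a manual retweet begins with "RT @username", optionally followed by ":".
MANUAL_RT = re.compile(r"^\s*RT\s+@(\w+):?\s*(.*)$", re.IGNORECASE | re.DOTALL)


def split_retweet(text):
    """Return (original_author, quoted_text) for a manual retweet, else (None, text)."""
    match = MANUAL_RT.match(text)
    if match:
        return match.group(1), match.group(2)
    return None, text


if __name__ == "__main__":
    # Made-up examples
    print(split_retweet("RT @alice: hello world"))
    print(split_retweet("hello world"))
```

Grouping tweets by the second element of the returned pair is one simple way to avoid counting a heavily retweeted message as many independent documents.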
slide-27
SLIDE 27

Embedded Links

  • shortened for display
slide-28
SLIDE 28

Embedded Links

  • can provide extra external information for text processing
slide-29
SLIDE 29

Mentions

  • user’s @username anywhere in the body of the Tweet
slide-30
SLIDE 30

Replies/Conversations

  • Tweet starts with a @username
slide-31
SLIDE 31

Replies/Conversations

  • can have multi-round conversations
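The difference between a mention (an @username anywhere in the body) and a reply (a Tweet that starts with an @username) can be sketched with a regular expression. The username pattern below (1 to 15 letters, digits, or underscores) and the helper names are assumptions made for illustration.

```python
import re

# Assumption: a username is 1-15 letters, digits, or underscores.
# The lookbehind skips "@" inside words, e.g. in e-mail addresses.
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")


def mentions(text):
    """Return every @username that appears anywhere in the text."""
    return MENTION.findall(text)


def reply_target(text):
    """Return the username the text starts with, or None if it is not a reply."""
    match = MENTION.match(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up example
    tweet = "@bob thanks! cc @carol_99"
    print(mentions(tweet), reply_target(tweet))
```

Following the chain of reply targets from one tweet to the next is one way to reconstruct a multi-round conversation from text alone.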

slide-32
SLIDE 32

slide-33
SLIDE 33

slide-34
SLIDE 34

Images

slide-35
SLIDE 35

Hashtags

slide-36
SLIDE 36

hashtags are powerful

slide-37
SLIDE 37

Cashtags

slide-38
SLIDE 38

Twitter’s Social Graph

friend hashtag reply @mention follower retweet

Source: Volkova, Van Durme, Yarowsky, Bachrach
 “Tutorial on Social Media Predictive Analytics” NAACL 2015

slide-39
SLIDE 39

Twitter API

slide-40
SLIDE 40

What is an API?

Application Programming Interface

An API is a set of protocols that specify how software programs communicate with each other.

slide-41
SLIDE 41

What is an API?

Source: Chris Beach @ Quora

slide-42
SLIDE 42

Twitter API

  • Twitter is recognized for having one of the most open and powerful developer APIs of any major technology company.
  • The first version of its public API was released in September 2006.

slide-43
SLIDE 43

Two Most Popular APIs

Streaming API
  • a sample of public tweets and events as they are published on Twitter (can specify search terms or users)
  • only real-time data
  • continuous net connection
  • no rate limit

REST API
  • search
  • trends
  • read author profile and follower data
  • post / modify
  • historical data up to a week
  • one-time request
  • rate limit (varies for different requests)

slide-44
SLIDE 44

OAuth

  • Twitter uses OAuth to provide authorized access to its API.
  • which means that, to start, one needs:
      • a Twitter account
      • OAuth access tokens from apps.twitter.com
slide-45
SLIDE 45

Python Twitter Tools

slide-46
SLIDE 46

Streaming API

OAuth connection
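A rough sketch of what an OAuth connection to the sample stream might look like with the Python Twitter Tools package (`pip install twitter`). It is not the code from the slide: the four credential arguments are placeholders for your own tokens, the helper names are made up, and whether the sample endpoint is still served is not something this sketch can guarantee.

```python
from itertools import islice


def open_sample_stream(consumer_key, consumer_secret, access_token, access_token_secret):
    """Connect to the public sample stream and return an iterator of messages."""
    # Imported here so the rest of the sketch runs without the third-party package.
    from twitter import OAuth, TwitterStream

    auth = OAuth(access_token, access_token_secret, consumer_key, consumer_secret)
    return TwitterStream(auth=auth).statuses.sample()


def first_tweet_texts(stream, n):
    """Return the text of the first n stream messages that are tweets."""
    # A stream also carries non-tweet messages (e.g. deletion notices) with no "text" field.
    tweets = (message for message in stream if "text" in message)
    return [tweet["text"] for tweet in islice(tweets, n)]
```

With real credentials, `first_tweet_texts(open_sample_stream(...), 10)` would read from the live connection; any iterable of dictionaries works for offline experiments.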

slide-47
SLIDE 47

JSON

JavaScript Object Notation

JSON is a minimal, readable format for structuring data.

slide-48
SLIDE 48

A Tweet in JSON
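To give a feel for the structure, here is a heavily trimmed, hand-written tweet-like object parsed with Python's standard `json` module. The field names are assumed to follow the layout of tweet objects returned by the REST API (`text`, `user`, `entities`); the values are invented and a real tweet carries many more fields.

```python
import json

# Hand-written sample for illustration; not a real tweet.
raw = """
{
  "id_str": "1",
  "text": "Learning the #TwitterAPI with @example_user https://t.co/xyz",
  "user": {"screen_name": "student", "followers_count": 42},
  "entities": {
    "hashtags": [{"text": "TwitterAPI"}],
    "user_mentions": [{"screen_name": "example_user"}],
    "urls": [{"url": "https://t.co/xyz", "expanded_url": "http://socialmedia-class.org"}]
  }
}
"""

tweet = json.loads(raw)  # JSON objects become dicts, JSON arrays become lists

author = tweet["user"]["screen_name"]
hashtags = [h["text"] for h in tweet["entities"]["hashtags"]]
mentioned = [m["screen_name"] for m in tweet["entities"]["user_mentions"]]
links = [u["expanded_url"] for u in tweet["entities"]["urls"]]

print(author, hashtags, mentioned, links)
```

Reading hashtags, mentions, and expanded links from `entities` avoids re-parsing the tweet text, which only shows the shortened form of each link.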

slide-49
SLIDE 49

Search

slide-50
SLIDE 50

Search API

slide-51
SLIDE 51

Trends

slide-52
SLIDE 52

Trends

Trending topics are determined by an unpublished algorithm, which finds words, phrases and hashtags that have had a sharp increase in popularity, as opposed to overall volume.
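Since the real algorithm is unpublished, the following is only a toy illustration of the idea "sharp increase rather than overall volume". The scoring function, the smoothing constant, and the counts are all made up for this example and say nothing about how Twitter actually ranks trends.

```python
def spike_score(recent_count, baseline_count, smoothing=1.0):
    """Ratio of recent volume to usual volume; smoothing avoids division by zero."""
    return (recent_count + smoothing) / (baseline_count + smoothing)


# Made-up hourly counts: (mentions in the last hour, typical mentions per hour)
counts = {
    "#monday": (50000, 48000),   # huge volume, but nothing unusual
    "#eclipse": (9000, 300),     # much smaller volume, sharp increase
}

ranked = sorted(counts, key=lambda term: spike_score(*counts[term]), reverse=True)
print(ranked)
```

Ranking by raw volume would put "#monday" first; ranking by the ratio puts "#eclipse" first, which matches the intuition on the slide.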
slide-53
SLIDE 53

Trends API

Where On Earth ID

slide-54
SLIDE 54

slide-55
SLIDE 55

known as the “Chinese Twitter”

120 Million Posts / Day

slide-56
SLIDE 56

Twitter Demographics

  • 24% of all male Internet users use Twitter, whereas 21% of all female Internet users use Twitter.
  • 79% of Twitter accounts are based outside the United States.
  • There are over 67 million Twitter users in the US.
  • The total number of Twitter users in the UK is 13 million.
  • 37% of Twitter users are between the ages of 18 and 29; 25% are 30-49 years old.
  • 54% of Twitter users earn more than $50,000 a year.
  • The top three countries by user count outside the U.S. are Brazil (27.7 million users), Japan (25.9 million), and Mexico (23.5 million).

slide-57
SLIDE 57

Fun Facts about Twitter

  • More than 100 million tweets contained GIFs in 2015.
  • Saudi Arabia has the highest percentage of Internet users who are active on Twitter.
  • The number of Twitter timeline views in 2014 was 200 billion.
  • 83% of the 193 UN member countries have a Twitter presence.
  • Twitter’s revenue per employee is $488,913.
slide-58
SLIDE 58

RPE

Source: http://www.ecardshack.com/blog/top-tech-companies-revenue-per-employee

slide-59
SLIDE 59

Natural Language Processing Conferences

slide-60
SLIDE 60

a.k.a.

  • Natural Language Processing (NLP)
  • Text Analysis
  • Computational Linguistics
slide-61
SLIDE 61

ACL

slide-62
SLIDE 62

NLP Publications

  • top NLP-specific venues:
      • ACL, NAACL, EACL, EMNLP, COLING (conference)
      • TACL (journal+conference model)
      • CL (journal)
  • other venues:
      • NLP: CoNLL, *Sem, WMT, LREC, IJCNLP, Workshops …
      • related CS fields: WWW, KDD, AAAI, WSDM, NIPS, ICWSM, CIKM, ICML …
      • related non-CS fields: psychology, linguistics, …
slide-63
SLIDE 63

Conference Rotation

  • ACL (and/or NAACL, EACL), EMNLP / COLING
slide-64
SLIDE 64

NLP Publications

  • ACL Anthology (http://aclweb.org/anthology/): all NLP conference and journal papers (free!)

slide-65
SLIDE 65

ACL’14 at A Glance

  • The Annual Meeting of the Association for Computational Linguistics
  • Duration:
      • tutorials (1 day)
      • main conference (3 days)
      • workshops (2 days)
  • Attendance of 1300+ people
  • Papers:
      • 1,123 submissions
      • 146 long papers and 129 short papers accepted
      • + 19 TACL papers
      • 159 oral and 145 poster presentations
slide-66
SLIDE 66

ACL’17 at A Glance

  • The Annual Meeting of the Association for Computational Linguistics
  • Duration:
      • tutorials (1 day)
      • main conference (3 days)
      • workshops (2 days)
  • Attendance of 1800+ people
  • Papers:
      • 1,318 submissions
      • 195 long papers and 107 short papers accepted
      • + 21 TACL papers
      • 151 oral and 151 poster presentations
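The submission and acceptance counts on the ACL'14 and ACL'17 slides imply an overall acceptance rate for each year. The small calculation below assumes the rate is (long + short accepted) / submissions and leaves the TACL papers out, since they are listed separately from the submissions; that reading of the numbers is an assumption.

```python
def acceptance_rate(long_accepted, short_accepted, submissions):
    """Fraction of submitted papers that were accepted (long and short combined)."""
    return (long_accepted + short_accepted) / submissions


acl14 = acceptance_rate(146, 129, 1123)
acl17 = acceptance_rate(195, 107, 1318)

print(f"ACL'14: {acl14:.1%}  ACL'17: {acl17:.1%}")
```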
slide-67
SLIDE 67

ACL’14 vs. ACL’17

Some shifts: e.g. summarization and generation is now in the top 5 areas, while in 2014 it didn’t even make the top 10

read more: https://chairs-blog.acl2017.org/

slide-68
SLIDE 68

Research Areas

slide-69
SLIDE 69

How to Do Research

William Wang UCSB Computer Science 10/06/2016

slide-70
SLIDE 70

What is research?

  • Investigate and understand the known unknowns and unknown unknowns in the scientific world.
  • In our lab, we are specifically interested in:
      • designing accurate, robust, and scalable machine learning algorithms;
      • advancing natural language processing models;
      • combining learning and reasoning for better AI.
slide-71
SLIDE 71

How’s research different from taking courses?

  • Taking courses: instructor tells you exactly what to do.
  • Research:
      • define an open research problem with your advisor;
      • you (students) take the initiative;
      • discuss and refine the technical approaches;
      • you (students) implement the approach and perform experiments to verify the idea.

slide-72
SLIDE 72

How to make good progress in research activities

  • Clearly define the problem / task that you want to solve;
  • Understand the literature: what other people have done, and what you can learn from them;
  • Work out the algorithm first, find a suitable dataset, and put theories into practice: write some code;
  • Start with a smaller subset of data for debugging, and move on to larger datasets;
  • Document the results carefully in spreadsheets / docs.
slide-73
SLIDE 73

How to measure the effectiveness of ideas?

  • Use mathematical tools to clearly define the problem and your solutions;
  • Look at the theoretical properties of your algorithms;
  • Define good metric(s), and perform experiments on multiple datasets;
  • Report results and compare with state-of-the-art baselines.

slide-74
SLIDE 74

Why is publication important?

  • Publication is the most important formal method for scholarly communication.
  • Presenting your research and attending leading conferences will create impact, provide inspiration, and facilitate the exchange of thoughts and good ideas.
  • Peer review is a good way to get feedback from top researchers in your field.
  • And it is a relatively objective way to claim the effectiveness of your research.

slide-75
SLIDE 75

What is in good research (a good paper)?

  • Is the problem new?
  • Is your approach new?
  • How good are the results compared to prior work?
  • Can you contribute any new open-source datasets/code?
  • Is this paper well-structured and well-written?
slide-76
SLIDE 76

Research is hard

  • These are open problems for which no one has a perfect solution!
  • Implementing ideas and debugging code can be challenging.
  • Performing good experiments is not easy.
  • Writing papers against deadlines...
slide-77
SLIDE 77

Research is rewarding

  • You helped to advance science!
  • When your first top conference full paper is accepted… (acceptance rates are typically 10-30%);
  • Other people attend your talk, read/cite your papers, and use your code/approaches;

  • You are now the world’s expert in this area.