
Social Media & Text Analysis lecture 2 - Twitter API

  1. Social Media & Text Analysis lecture 2 - Twitter API CSE 5539-0010 Ohio State University Instructor: Alan Ritter Website: socialmedia-class.org

  2. Course Website socialmedia-class.org Wei Xu ◦ socialmedia-class.org

  3. Have a Question? • Ask in class! • Office Hour: Tue 4:15 pm to 5:15 pm, Dreese 495 • Piazza Q&A Board (a Module within OSU Canvas)

  4. This is a Special Topic Class • It is about NLP research, not programming (prerequisite: familiarity with Python programming). • Homework #2 can be difficult: it is not about software engineering but about a machine learning algorithm, which is difficult to debug. • Students are required to think hard and independently for solutions.

  5. Homework #2 (last year) [Two pie charts of HW#2 results] • HW#2 (Main Algorithm): Correct 33%, Incorrect 33%, Minor Error 33% • HW#2 (Auxiliary Algorithm): Yes 50%, No 50%

  6. Alternatives • audit the course or take LING 5801 (Computational Linguistics I) • more background: CSE 3521, 5521, 3522, Stat 3460, 3470 • other related courses: - CSE 5525 Foundations of Speech and Language Processing - CSE 5523 Machine Learning - CSE 5522 Survey of Artificial Intelligence II: Advanced Techniques - CSE 5526 Introduction to Neural Networks

  7. Quiz #1 • For events A and B, prove $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
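
The slide does not show a solution. One possible proof, assuming the standard definition of conditional probability and that both P(A) and P(B) are positive, can be sketched as:

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}
  && \text{definition, } P(B) > 0 \\
P(B \mid A) &= \frac{P(A \cap B)}{P(A)}
  \;\Longrightarrow\; P(A \cap B) = P(B \mid A)\,P(A)
  && \text{definition, } P(A) > 0 \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
  && \text{substitute the second line into the first}
\end{align*}
```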

  8. Quiz #1 • What does this regular expression mean?

  9. Quiz #1 • Softmax function is defined as $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$ • prove $\mathrm{softmax}(x) = \mathrm{softmax}(x + c)$ • Useful for improving the numerical stability of the computation!
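
A short derivation of the shift invariance, assuming c is a scalar constant added to every component of x (the slide does not spell this out):

```latex
\begin{align*}
\mathrm{softmax}(x + c)_i
  &= \frac{e^{x_i + c}}{\sum_j e^{x_j + c}}
   = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}}
   = \frac{e^{x_i}}{\sum_j e^{x_j}}
   = \mathrm{softmax}(x)_i
\end{align*}
```

Choosing $c = -\max_j x_j$ makes the largest exponent equal to 0, so no term can overflow.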

  10. Quiz #1 • implement Softmax function in Python (needs to be computationally efficient) • A normalization trick for numerical stability! (highest value in the vector becomes 0)
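
A minimal sketch of one way to answer this quiz, not the course's reference solution. It uses only the standard library so it runs anywhere; the function name `softmax` and the list-based interface are assumptions made for illustration. A vectorized NumPy version would follow the same three steps (shift, exponentiate, normalize).

```python
import math


def softmax(scores):
    """Return softmax probabilities for a non-empty sequence of real scores."""
    # Subtract the maximum so the highest value becomes 0 and the largest
    # exponent is exp(0) = 1. Because softmax(x) = softmax(x + c), this
    # shift does not change the result, but it prevents overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Example scores -2.85, 0.86, 0.28.
probs = softmax([-2.85, 0.86, 0.28])
print([round(p, 3) for p in probs])

# Scores this large would overflow math.exp without the shift.
print(softmax([1000.0, 1001.0, 1002.0]))
```

Without the shift, `math.exp(1000.0)` raises `OverflowError`; with it, the second call gives the same probabilities as `softmax([0.0, 1.0, 2.0])`.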

  11. Softmax Function $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$ • Worked example (score → exp → normalize to sum to one): -2.85 → 0.058 → 0.016 • 0.86 → 2.36 → 0.631 • 0.28 → 1.32 → 0.353

  12. Softmax see also: http://cs231n.github.io/linear-classify/#softmax

  13. Softmax Function • softmax regression (multinomial logistic regression) • often used as the output layer in neural networks • We will learn more about it later in the class

  14. Quiz #2 • derivative of the Sigmoid function • use the chain rule: if f = g(u) and u = h(x), i.e. f(x) = g(h(x)), then: $\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx} = \frac{dg(u)}{du} \cdot \frac{dh(x)}{dx}$

  15. The Derivative of a Sigmoid We noted earlier that the Sigmoid is a smooth (i.e. differentiable) threshold function: $f(x) = \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$ [Plot: Sigmoid(x) for x from -8 to 8]
 We can use the chain rule by putting f(x) = g(h(x)) with $g(h) = h^{-1}$ and $h(x) = 1 + e^{-x}$, so $\frac{\partial g(h)}{\partial h} = -\frac{1}{h^{2}}$ and $\frac{\partial h(x)}{\partial x} = -e^{-x}$. Then $\frac{\partial f(x)}{\partial x} = -\frac{1}{(1 + e^{-x})^{2}} \cdot (-e^{-x}) = \frac{1}{1 + e^{-x}} \cdot \frac{1 + e^{-x} - 1}{1 + e^{-x}}$, that is, $f'(x) = \frac{\partial f(x)}{\partial x} = f(x)\,(1 - f(x))$ [Plot: Sigmoid'(x) for x from -8 to 8]
 This simple relation will make our equations much easier and save a lot of computing time! Source: John A. Bullinaria
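
The relation f'(x) = f(x)(1 - f(x)) can be checked numerically. The sketch below is an illustration, not part of the original slides: the function names are assumptions, and the central finite difference is just one convenient way to approximate the derivative for comparison.

```python
import math


def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1)
    return e / (1.0 + e)


def sigmoid_prime(x):
    """Derivative from the closed form f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def finite_difference(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Compare the closed form with the numerical estimate at a few points.
for x in (-8.0, -4.0, 0.0, 4.0, 8.0):
    print(x, sigmoid_prime(x), finite_difference(sigmoid, x))
```

The closed form reuses the already computed value of f(x), so no extra exponential is needed, which is the saving in computing time the slide mentions.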

  16. Twitter API Tutorial: socialmedia-class.org

  17. Homework #1 is out Due next Tuesday (Sep 5)

  18. Reading #1 is out Due Sep 12

  19. Twitter History • Jack Dorsey’s idea (an NYU undergraduate then) • 1st tweet on March 21, 2006 • exploded at SXSW 2007 (20k → 60k tweets/day) • 100m tweets/quarter in 2008, 50m tweets/day in 2010, 400m tweets/day in 2013 • Huge API usage was unexpected, as was the rise of the @ sign for replies • Twitter staff received the festival's Web Award prize with the remark "we'd like to thank you in 140 characters or less. And we just did!"

  20. Twitter History • IPO in 2013 Q4 • market value $24b, revenue $435m, net loss $162m in 2015 Q1 • CEO Dick Costolo resigned July 1st, 2015

  21. Twitter HQ (since 2012)

  24. Tweets

  25. ReTweets a re-posting of someone else’s Tweet

  26. ReTweets - not an official Twitter feature - often signifies quoting another user - sometimes creates problems for data analytics

  27. Embedded Links - shortened for display

  28. Embedded Links - can provide extra external information for text processing

  29. Mentions - user’s @username anywhere in the body of the Tweet

  30. Replies/Conversations - Tweet starts with a @username

  31. Replies/Conversations - can have multi-round conversations

  34. Images

  35. Hashtags

  36. hashtags are powerful

  37. Cashtags
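
The tweet elements listed in the slides above (links, mentions, replies, hashtags, cashtags) can be pulled out of raw text with regular expressions. This is a rough sketch only: the patterns are simplified assumptions for illustration, not Twitter's own entity extraction rules, and the function name and sample tweet are made up.

```python
import re

# Simplified, assumed patterns (real tweets contain many more edge cases).
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")            # @username, up to 15 characters
HASHTAG = re.compile(r"(?<!\w)#(\w+)")                 # #topic
CASHTAG = re.compile(r"(?<!\w)\$([A-Za-z]{1,6})\b")    # $TICKER, assumed 1 to 6 letters
URL = re.compile(r"https?://\S+")                      # embedded (shortened) links


def extract_entities(text):
    """Return the mentions, hashtags, cashtags and links found in a tweet's text."""
    return {
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "cashtags": CASHTAG.findall(text),
        "urls": URL.findall(text),
        # Per the slides, a reply is a tweet that starts with an @username.
        "is_reply": text.startswith("@"),
    }


# Made-up example tweet.
tweet = "@alice check $TWTR news via @bob #NLP https://t.co/abc123"
print(extract_entities(tweet))
```

The `(?<!\w)` lookbehind keeps email-like strings such as `a@b.com` from being counted as mentions. When tweets come from the API, the JSON payload already carries parsed entities, so patterns like these are mainly useful for text from other sources.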

  38. Twitter’s Social Graph [Figure: graph with edge types friend, follower, reply, retweet, @ mention and hashtag] Source: Volkova, Van Durme, Yarowsky, Bachrach, “Tutorial on Social Media Predictive Analytics”, NAACL 2015

  39. Twitter API

  40. What is an API? Application Programming Interface • An API is a set of protocols that specify how software programs communicate with each other.

  41. What is an API? Source: Chris Beach @ Quora

  42. Twitter API • Twitter is recognized for having one of the most open and powerful developer APIs of any major technology company. • The first version of its public API was released in September 2006.

  43. Two Most Popular APIs
 Streaming API: • a sample of public tweets and events as they are published on Twitter (can specify search terms or users) • only real-time data • continuous net connection • no limit
 REST API: • search • trends • read author profile and follower data • post / modify • historical data up to a week • one-time request • rate limit (varies for different requests)

  44. OAuth • Twitter uses OAuth to provide authorized access to its API. • To get started, you need: • a Twitter account • OAuth access tokens from apps.twitter.com

  45. Python Twitter Tools

  46. Streaming API OAuth connection

  47. JSON (JavaScript Object Notation) • JSON is a minimal, readable format for structuring data.

  48. A Tweet in JSON
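
The tweet shown on this slide is an image and is not reproduced in the transcript. As a stand-in, the sketch below parses a small hypothetical tweet with Python's standard `json` module. All values are invented, and the field names (`text`, `user.screen_name`, `entities.hashtags`, and so on) are assumed to follow the layout of the REST API tweet object; real payloads contain many more fields.

```python
import json

# Hypothetical tweet payload: every value here is invented for illustration.
SAMPLE_TWEET = """
{
  "created_at": "Tue Aug 29 16:20:00 +0000 2017",
  "id_str": "1234567890",
  "text": "Lecture 2 is about the Twitter API #NLP @example_user https://t.co/abc123",
  "retweet_count": 3,
  "user": {"screen_name": "example_prof", "followers_count": 42},
  "entities": {
    "hashtags": [{"text": "NLP"}],
    "user_mentions": [{"screen_name": "example_user"}],
    "urls": [{"expanded_url": "http://socialmedia-class.org"}]
  }
}
"""


def summarize_tweet(raw_json):
    """Parse one tweet (a JSON string) and keep a few commonly used fields."""
    tweet = json.loads(raw_json)
    entities = tweet.get("entities", {})
    return {
        "author": tweet["user"]["screen_name"],
        "text": tweet["text"],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "links": [u["expanded_url"] for u in entities.get("urls", [])],
    }


summary = summarize_tweet(SAMPLE_TWEET)
print(json.dumps(summary, indent=2))
```

After `json.loads`, a tweet is an ordinary nested Python dictionary, so fields are read with normal indexing and `.get` calls.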

  49. Search

  50. Search API

  51. Trends

  52. Trends • Trending topics are determined by an unpublished algorithm, which finds words, phrases and hashtags that have had a sharp increase in popularity, as opposed to overall volume.

  53. Trends API Where On Earth ID

  55. known as the “Chinese Twitter” 120 Million Posts / Day

  56. Twitter Demographics • 24% of all male Internet users use Twitter, whereas 21% of all female Internet users do. • 79% of Twitter accounts are based outside the United States. • There are over 67 million Twitter users in the US. • The total number of Twitter users in the UK is 13 million. • 37% of Twitter users are between the ages of 18 and 29; 25% are 30-49 years old. • 54% of Twitter users earn more than $50,000 a year. • The top three countries by user count outside the U.S. are Brazil (27.7 million users), Japan (25.9 million), and Mexico (23.5 million).

  57. Fun Facts about Twitter • More than 100 million tweets contained GIFs in 2015. • Saudi Arabia has the highest percentage of Internet users who are active on Twitter. • Twitter timeline views in 2014 totaled 200 billion. • 83% of the 193 UN member countries have a Twitter presence. • Twitter’s revenue per employee is $488,913.

  58. RPE (revenue per employee) Source: http://www.ecardshack.com/blog/top-tech-companies-revenue-per-employee

  59. Natural Language Processing Conferences
