Research frontiers in online social media studies: Understanding content, user behaviors and information diffusion
Emilio Ferrara
Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University Bloomington
Summer Workshop on Algorithms and Cyberinfrastructure for large scale optimization/AI, August 8, 2013

Data Collection
• Twitter Streaming API (10% sample of total traffic)
• August 2010 – present
• ~5 TB compressed
• Real-time access to data from the last 9 months related to 3 themes: US Politics, Social Movements, News

Detecting early signatures of persuasion in information cascades

Scope of the project
• Data acquisition in a streaming scenario from social media (Twitter, FB)
• Extraction of information tokens, so-called memes
• Clustering of memes
• Classification of meme clusters

Architecture

Problem statement
• Goal: cluster a large volume of tweets, in a streaming scenario, into topics based on their similarity.
• Challenges: tweet text is too sparse for classification, so we need to exploit further features:
  • Network structure
  • Temporal signature
  • Meta-data

Meme definition
• @Mention: the user addresses another user by mentioning their username (Twitter syntax: @)
• #Hashtag: the user tags a message with a “concept” (syntax: #)
• URL: a message can include one or more URLs in extended or shortened format
• Phrase: whatever remains after removing mentions, hashtags and URLs, stemming verbs/nouns, and removing stop-words and punctuation
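The four meme types above can be illustrated with a minimal Python sketch. It is not the authors' extraction pipeline: the regular expressions, the tiny stop-word list and the function name `extract_memes` are assumptions made for illustration, and stemming is left out entirely.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "and", "at", "for"}

# Simplified patterns (assumptions): real Twitter entity parsing has more edge cases.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def extract_memes(tweet_text):
    """Split a tweet into mention, hashtag, URL and phrase memes."""
    # URLs are removed first so that '#' or '@' inside a URL is not misread.
    urls = URL_RE.findall(tweet_text)
    remainder = URL_RE.sub(" ", tweet_text)
    mentions = MENTION_RE.findall(remainder)
    remainder = MENTION_RE.sub(" ", remainder)
    hashtags = HASHTAG_RE.findall(remainder)
    remainder = HASHTAG_RE.sub(" ", remainder)
    # Phrase: lower-case, drop punctuation and stop-words (no stemming in this sketch).
    words = re.findall(r"[a-z0-9']+", remainder.lower())
    phrase = " ".join(w for w in words if w not in STOP_WORDS)
    return {
        "mentions": [m.lower() for m in mentions],
        "hashtags": [h.lower() for h in hashtags],
        "urls": urls,
        "phrase": phrase,
    }


memes = extract_memes("@Alice check the new results at http://example.org/p #Science #data!")
print(memes)
```

A single tweet yields several memes at once here, which matches the idea that each tweet is assigned to one or more memes.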

Advantages of using memes
• More granularity: each tweet is assigned to one or more memes
• Efficiency in a real-time scenario: each incoming tweet is directly assigned to its meme(s) without additional overhead
• Memes can be aggregated with each other, forming clusters of topics related by content/structure similarity
• We define a set of similarity measures:
  • Common user similarity
  • Common tweet similarity
  • Common document similarity

Network, memes & content relations
• Social network
• Memes
• Content

Meme similarity measures
• Common user similarity
• Common tweet similarity
• Content similarity

CUS: Common User Similarity
Cosine similarity; the slide's worked example evaluates to 0.77.
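As a rough illustration of a cosine similarity over users, the sketch below assumes each meme is represented by a sparse vector counting how many tweets each user posted with that meme. That representation, the user names and the counts are hypothetical; the slide's own formula and example vectors are not preserved, so this does not reproduce the 0.77 example.

```python
import math
from collections import Counter


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(weight * vec_b.get(key, 0) for key, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


# Hypothetical data: number of tweets per user that contain each meme.
users_meme1 = Counter({"user_a": 2, "user_b": 1})
users_meme2 = Counter({"user_a": 1, "user_c": 1})

cus = cosine_similarity(users_meme1, users_meme2)
print(round(cus, 3))
```

Storing the vectors as counters keeps the computation proportional to the number of users who actually used a meme, which matters when the full user set is large.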

CTS: Common Tweet Similarity
Example:
P_meme1 = {tweet1, tweet2, tweet5}
P_meme2 = {tweet1, tweet2, tweet5, tweet6}
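The CTS formula itself is not preserved in this transcript. As a sketch under a stated assumption, the code below treats CTS as the cosine similarity of binary tweet-membership vectors, which for sets reduces to |A ∩ B| / sqrt(|A| · |B|). If the original measure was defined differently (for example as a Jaccard coefficient), the number would differ.

```python
import math


def common_tweet_similarity(tweets_a, tweets_b):
    """Set-based cosine: shared tweets over the geometric mean of the set sizes.

    Assumption: this is one plausible reading of CTS, not the slide's formula.
    """
    if not tweets_a or not tweets_b:
        return 0.0
    shared = len(tweets_a & tweets_b)
    return shared / math.sqrt(len(tweets_a) * len(tweets_b))


# The two tweet sets from the slide's example.
p_meme1 = {"tweet1", "tweet2", "tweet5"}
p_meme2 = {"tweet1", "tweet2", "tweet5", "tweet6"}

cts = common_tweet_similarity(p_meme1, p_meme2)
print(round(cts, 3))  # 3 / sqrt(3 * 4) -> 0.866
```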

CDS: Common Document Similarity
We again use the cosine similarity, this time computed on the TF-IDF matrix.
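A minimal, dependency-free sketch of cosine similarity over TF-IDF vectors follows. It assumes each meme is represented by one tokenized "document" and uses a plain `tf * log(N / df)` weighting; both choices, as well as the sample tokens, are assumptions, since the slides do not specify the TF-IDF variant.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical tokenized documents, one per meme.
docs = [
    ["election", "debate", "vote"],
    ["election", "vote", "poll"],
    ["football", "match", "goal"],
]
vecs = tfidf_vectors(docs)
cds_related = cosine(vecs[0], vecs[1])    # memes sharing vocabulary
cds_unrelated = cosine(vecs[0], vecs[2])  # memes with no shared terms
print(cds_related > cds_unrelated)
```

Terms that occur in every document receive weight zero under this weighting, so very common words contribute nothing to the similarity.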

Linear Combination
Linear combination similarity score: L(i,j) = w_CTS * CTS(i,j) + w_CUS * CUS(i,j) + w_CDS * CDS(i,j)
Example: CTS(i,j) = 0.8, CUS(i,j) = 0.7, CDS(i,j) = 0.9
Different weighting schemes:
L(i,j) = 0.5 * 0.8 + 0.5 * 0.9 = 0.85
L(i,j) = 0.33 * 0.8 + 0.33 * 0.7 + 0.33 * 0.9 = 0.79
L(i,j) = 1 * 0.9 = 0.9 (MAX)
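The three weighting schemes in the example can be checked with a few lines of Python. The function name `linear_combination` and the dictionary layout are illustrative choices, not part of the original slides; measures that receive no weight are simply treated as having weight 0.

```python
def linear_combination(scores, weights):
    """Weighted sum of similarity scores; measures absent from `weights` get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


# Similarity values from the slide's example.
scores = {"CTS": 0.8, "CUS": 0.7, "CDS": 0.9}

two_measures = linear_combination(scores, {"CTS": 0.5, "CDS": 0.5})
three_measures = linear_combination(scores, {"CTS": 0.33, "CUS": 0.33, "CDS": 0.33})
content_only = linear_combination(scores, {"CDS": 1.0})

print(round(two_measures, 2), round(three_measures, 2), round(content_only, 2))
```

Rounded to two decimals, the three results agree with the values on the slide (0.85, 0.79 and 0.9).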

Evaluation method
Building a ground-truth dataset:
• Thematic, hand-picked keywords (Twitter trends)
• ~2K tweets with keywords classified as trending
• All tweets collected from a given day
• The dataset contains 9 different classes (imbalanced size)

Goals
1. Determining how correct our clustering solution is compared to the ground truth.
2. Determining the best trade-off in the clustering algorithm configuration, considering:
• Quality of the obtained partitioning
• Number of obtained clusters
• Size of obtained clusters

Evaluation metrics
Individual entropies: H(X), H(Y)
Joint entropy: H(X,Y)
Conditional entropies: H(X|Y), H(Y|X)
Mutual information
Normalized mutual information
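The formulas for mutual information and its normalized form are not preserved in this transcript. The sketch below uses the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y) and, as an assumption, the normalization NMI = 2·I(X;Y) / (H(X) + H(Y)); other normalizations are also in common use, and the slides may have used a different one. Taking X as the ground-truth labels and Y as the cluster assignments is likewise an assumption, and the label lists are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with the joint taken over paired labels."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def normalized_mutual_information(x, y):
    """NMI using the 2*I / (H(X) + H(Y)) normalization (one of several in use)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # both partitions are a single cluster, hence identical
    return 2 * mutual_information(x, y) / (hx + hy)


# Hypothetical ground truth and two candidate clusterings.
truth = ["a", "a", "b", "b", "c", "c"]
perfect = [0, 0, 1, 1, 2, 2]   # same partition, different label names
merged = [0, 0, 0, 0, 1, 1]    # classes "a" and "b" collapsed into one cluster

nmi_perfect = normalized_mutual_information(truth, perfect)
nmi_merged = normalized_mutual_information(truth, merged)
print(round(nmi_perfect, 3), round(nmi_merged, 3))
```

Because NMI depends only on how items are grouped and not on the label names, it can compare a clustering against ground-truth classes directly.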

Clustering algorithms
We investigated different clustering algorithms:
• Static hierarchical clustering
• Hierarchical stream-clustering
• Online K-means clustering
Algorithms have been evaluated against each other to determine:
• Tweet clustering vs. meme clustering
• Best clustering algorithm
• Content vs. network features
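The slides name online K-means without giving details. As a generic illustration only, the sketch below implements textbook sequential k-means on plain numeric points: the first k points seed the centroids, and every later point moves its nearest centroid by a 1/count step. The seeding rule, the squared Euclidean distance and the sample stream are assumptions; the system described here works on memes with the similarity measures defined earlier, not on 2-D coordinates.

```python
def online_kmeans(points, k):
    """Sequential k-means over a stream of numeric points.

    Returns the final centroids and the cluster index assigned to each point
    at the moment it arrived.
    """
    centroids, counts, assignments = [], [], []
    for point in points:
        if len(centroids) < k:
            # Seeding phase: each of the first k points starts its own cluster.
            centroids.append(list(point))
            counts.append(1)
            assignments.append(len(centroids) - 1)
            continue
        # Nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
        )
        counts[nearest] += 1
        step = 1.0 / counts[nearest]
        # Incremental mean update: the centroid stays the mean of its members.
        centroids[nearest] = [c + step * (p - c) for c, p in zip(centroids[nearest], point)]
        assignments.append(nearest)
    return centroids, assignments


# Hypothetical stream with two well-separated groups.
stream = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.0), (9.5, 10.0), (0.0, 0.5)]
centroids, assignments = online_kmeans(stream, k=2)
print(assignments)
```

Each point is processed once and then discarded, which is the property that makes this family of algorithms suitable for a streaming scenario.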

Stream-clustering quality

Cluster Number/Size

Summary
• Efficient clustering must exploit meta-data, content and network information
• Introducing memes (hashtags, mentions, URLs, phrases) as a ‘pre-clustering’ step improves performance
• Efficient, scalable, robust clustering algorithm, adaptable to a streaming scenario
• Room for performance improvement by adding further features and tuning the parameters of the similarity measures

Current/Future Work
Design of a distributed architecture supporting:
o Distributed MapReduce-based storage with replication; currently evaluating:
  • Hadoop/HBase
  • Riak
  • Cassandra/Solandra
o Large data storage required:
  • ~50M tweets/day (increasing)
  • ~300 GB/day uncompressed data
  • Data compression (gz, lzw), support for JSON
o Low latency: data storage, access and analysis in a close-to-real-time scenario
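Two of the storage points above can be made concrete with a short sketch. The first part derives the average record size implied by the stated volumes, assuming decimal gigabytes. The second part shows gzip-compressed, newline-delimited JSON using only the Python standard library; the record fields and the helper names `pack` and `unpack` are hypothetical and not taken from the project.

```python
import gzip
import json

# Back-of-the-envelope size per tweet from the figures above (decimal GB assumed).
tweets_per_day = 50_000_000
bytes_per_day = 300 * 10**9
avg_bytes_per_tweet = bytes_per_day / tweets_per_day  # about 6 KB per tweet


def pack(records):
    """Serialize records as newline-delimited JSON and gzip the result."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return gzip.compress(payload)


def unpack(blob):
    """Inverse of pack: decompress and parse one JSON object per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical tweet-like records with a repetitive structure.
records = [
    {"id": i, "user": f"user_{i % 10}", "text": "example tweet text #topic"}
    for i in range(1000)
]
raw_size = len("\n".join(json.dumps(r) for r in records).encode("utf-8"))
blob = pack(records)
print(avg_bytes_per_tweet, raw_size, len(blob))
```

JSON records with repeated field names compress well, which is one reason a line-oriented, compressed format is a common choice for archiving streamed data.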
