Mining personal media thresholds for opinion dynamics and social - PowerPoint PPT Presentation

Mining personal media thresholds for opinion dynamics and social influence Alex Meandzija

Defining the Problem
Dataset:
I is a set of discrete items (i).
T is a set of transactions (t) such that t ⊆ I.

Transaction ID | Items
1 | ABC
2 | DE
3 | AB
4 | CDE

Question: What sets of items are frequently found together in the transactions?

Example Applications
• Market Basket Analysis: determining frequently co-purchased products can inform store layout.
• Survey Data: discrete data from surveys can be mined for trends and profiles.
• Website Logs: pages frequently visited by users in the same session can be hyperlinked to each other.

Support Measure
Support is the frequency at which an item or item set X is found in the transactions T, and is defined as:
$\mathrm{Supp}(X, T) = \dfrac{|\{\, t \in T : X \subseteq t \,\}|}{|T|}$

Example:
Transaction ID | Items
1 | ABC
2 | DE
3 | AB
4 | BCD

Supp(A) = .5, Supp(C) = .5, Supp(AC) = .25
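The following minimal Python sketch makes the definition concrete. It computes relative support over the four example transactions (ABC, DE, AB, BCD) and reproduces the three values listed. The function name `support` is illustrative and does not come from the slides.

```python
from typing import FrozenSet, List

# Example transactions 1-4: ABC, DE, AB, BCD
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset("ABC"),
    frozenset("DE"),
    frozenset("AB"),
    frozenset("BCD"),
]

def support(itemset: str, transactions: List[FrozenSet[str]]) -> float:
    """Relative support: fraction of transactions t with X ⊆ t."""
    x = frozenset(itemset)
    hits = sum(1 for t in transactions if x <= t)  # x <= t is the subset test
    return hits / len(transactions)

print(support("A", TRANSACTIONS))   # 0.5
print(support("C", TRANSACTIONS))   # 0.5
print(support("AC", TRANSACTIONS))  # 0.25
```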

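Apriori Algorithm
Theorem: $\mathrm{Supp}(AB) \le \mathrm{Supp}(A)$ and $\mathrm{Supp}(AB) \le \mathrm{Supp}(B)$
Proof: any transaction t such that AB ⊆ t must also satisfy A ⊆ t and B ⊆ t. Every transaction that supports AB therefore also supports A and B.
Therefore, if $\mathrm{Supp}(X) < \mathrm{Supp}_{min}$, we can eliminate all item sets Y such that X ⊆ Y. The Apriori algorithm uses this approach to build the frequent itemsets.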

Apriori Algorithm (continued)
[Figure and plot from Data Mining and Analysis, pp. 247-248 (Zaki & Meira, 2014)]
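Below is a minimal, level-wise Apriori sketch in Python. It is not the implementation behind the slides. It joins frequent (k-1)-itemsets into k-item candidates and prunes any candidate with an infrequent subset, which is the elimination rule above. Only the survivors are then counted. It assumes relative support and the four example transactions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

def apriori(transactions: List[FrozenSet[str]], min_sup: float) -> Dict[FrozenSet[str], float]:
    """Level-wise frequent itemset mining using the downward-closure property."""
    n = len(transactions)

    def supp(x: FrozenSet[str]) -> float:
        # Relative support: fraction of transactions containing every item of x
        return sum(1 for t in transactions if x <= t) / n

    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items
    level = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_sup}
    frequent = {x: supp(x) for x in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a candidate with any infrequent (k-1)-subset cannot be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count: keep candidates that meet the minimum support
        level = {c for c in candidates if supp(c) >= min_sup}
        frequent.update({c: supp(c) for c in level})
        k += 1
    return frequent

# Example transactions 1-4: ABC, DE, AB, BCD
T = [frozenset(s) for s in ("ABC", "DE", "AB", "BCD")]
result = apriori(T, min_sup=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
# prints A 0.5, B 0.75, C 0.5, D 0.5, AB 0.5, BC 0.5 (one per line)
```

At a minimum support of 0.5, the candidate ABC is never counted. Its subset AC has support .25, so the prune step removes ABC first.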

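ECLAT Optimization
Converting the horizontal database into a vertical format allows for faster computation of support values. In the vertical format, each item is stored with the set of transaction IDs (its tidset) that contain it:
$t(XY) = t(X) \cap t(Y)$ and $\mathrm{Supp}(X) = |t(X)|$
New itemsets can be generated by intersecting the tidsets of the itemsets they extend. Each new itemset's support is simply the cardinality of the resulting tidset (here, an absolute count rather than the fraction defined above).

Horizontal format:
Transaction ID | Items
1 | ABC
2 | DE
3 | AB
4 | BCD

Vertical format:
Item | TIDs
A | 1, 3
B | 1, 3, 4
C | 1, 4
D | 2, 4
E | 2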

ECLAT Optimization (continued)
[Figure and plot from Data Mining and Analysis, pp. 251-252 (Zaki & Meira, 2014)]
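The following Python sketch shows the ECLAT idea under simple assumptions. It converts the example database to vertical tidsets, then searches depth-first. Each extension's support comes from a tidset intersection, never from rescanning transactions. Support is the absolute count |t(X)|. A `min_count` of 2 corresponds to the relative threshold of .5 used in the Apriori sketch.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def eclat(horizontal: Dict[int, str], min_count: int) -> Dict[FrozenSet[str], int]:
    """Depth-first frequent itemset mining over vertical tidsets (absolute support)."""
    # Horizontal -> vertical: item -> set of transaction IDs containing it
    vertical: Dict[str, Set[int]] = {}
    for tid, items in horizontal.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(cls: List[Tuple[FrozenSet[str], Set[int]]]) -> None:
        # cls holds itemsets sharing a common prefix, each paired with its tidset
        for i, (x, tx) in enumerate(cls):
            frequent[x] = len(tx)                 # Supp(X) = |t(X)|
            children = []
            for y, ty in cls[i + 1:]:
                txy = tx & ty                     # t(XY) = t(X) ∩ t(Y)
                if len(txy) >= min_count:
                    children.append((x | y, txy))
            if children:
                extend(children)

    extend([(frozenset([item]), tids)
            for item, tids in sorted(vertical.items()) if len(tids) >= min_count])
    return frequent

result = eclat({1: "ABC", 2: "DE", 3: "AB", 4: "BCD"}, min_count=2)
print({"".join(sorted(k)): v for k, v in result.items()})
# {'A': 2, 'AB': 2, 'B': 3, 'BC': 2, 'C': 2, 'D': 2}
```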

Association Rules
Often, it is more important to find directional associations than simple frequent item sets. We call these directional associations association rules.
Example: {Eggs, Butter, Sugar} → {Flour}
To draw association rules from the frequent item sets, one must use interestingness measures.

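Interestingness Measures
Confidence: $\mathrm{Conf}(X \to Y) = \dfrac{\mathrm{Supp}(XY)}{\mathrm{Supp}(X)}$
Lift: $\mathrm{Lift}(X \to Y) = \dfrac{\mathrm{Supp}(XY)}{\mathrm{Supp}(X) \cdot \mathrm{Supp}(Y)}$
Jaccard: $\mathrm{Jacc}(X \to Y) = \dfrac{\mathrm{Supp}(XY)}{\mathrm{Supp}(X) + \mathrm{Supp}(Y) - \mathrm{Supp}(XY)}$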
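The three measures can be computed directly from support, as in the short Python sketch below. Here XY is read as the union X ∪ Y, and the transactions are the four examples used earlier. The function names are illustrative.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

def supp(x: Itemset, transactions: List[Itemset]) -> float:
    """Relative support of itemset x."""
    return sum(1 for t in transactions if x <= t) / len(transactions)

def confidence(x: Itemset, y: Itemset, T: List[Itemset]) -> float:
    return supp(x | y, T) / supp(x, T)

def lift(x: Itemset, y: Itemset, T: List[Itemset]) -> float:
    return supp(x | y, T) / (supp(x, T) * supp(y, T))

def jaccard(x: Itemset, y: Itemset, T: List[Itemset]) -> float:
    sxy = supp(x | y, T)
    return sxy / (supp(x, T) + supp(y, T) - sxy)

T = [frozenset(s) for s in ("ABC", "DE", "AB", "BCD")]
x, y = frozenset("A"), frozenset("B")
print(confidence(x, y, T))  # 1.0
print(lift(x, y, T))        # about 1.333
print(jaccard(x, y, T))     # about 0.667
```

For the rule A → B, every transaction containing A also contains B, so confidence is 1. Lift is above 1 because Supp(B) = .75 is already high.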

Media Threshold Survey
Participants were asked to self-identify the number of social media items they would need to see before forming or shifting opinions, given:

Media types:
1. Images: for still photos and drawings
2. Videos: for any animations or moving pictures
3. Messages: for text, tweets, and Facebook posts

Controversy levels:
1. Low: minimal (some people would form an opinion)
2. Medium: generally controversial (most would form an opinion)
3. High: very controversial (most or all would form an opinion)
4. None: no reference to controversy

Media sources:
1. Unknown: the individual has no knowledge of the source
2. Like-minded: the source of the media generally thinks similarly to the recipient
3. Different-minded: the source of the media generally thinks differently from the recipient

Binning Survey Data
To mine a dataset, it must be split into a discrete set of items. Continuous or open-ended responses should be binned such that:
1. The bin resolution is broad enough to keep item frequencies above the minimum support.
2. The bin resolution is fine enough that information is not lost in the binning process.

[Figure panels: responses binned by %-deviation from the average; individual responses log2-binned]
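The two binning schemes named in the figure can be sketched in Python as shown below. The slides do not give bin edges or item labels. The 25% bin width, the requirement that log2-binned responses be at least 1, and the label strings are all hypothetical choices made only for illustration.

```python
import math

def log2_bin(response: float) -> str:
    """Map a count >= 1 to its power-of-two bin: [1,2) -> 0, [2,4) -> 1, [4,8) -> 2, ..."""
    if response < 1:
        raise ValueError("this sketch assumes responses of at least 1")
    return f"log2bin{math.floor(math.log2(response))}"

def pct_deviation_bin(response: float, average: float, width: float = 25.0) -> str:
    """Map a response to a bin of its %-deviation from the average (bin width is hypothetical)."""
    pct = 100.0 * (response - average) / average
    lo = math.floor(pct / width) * width
    return f"dev[{lo:+.0f}%,{lo + width:+.0f}%)"

print(log2_bin(5))                # log2bin2
print(pct_deviation_bin(12, 10))  # dev[+0%,+25%)
print(pct_deviation_bin(5, 10))   # dev[-50%,-25%)
```

Log2 bins widen as counts grow. This matches the two criteria above: large, sparse responses are grouped coarsely, while small counts keep their resolution.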

Filtering Through the Rules
One of the major challenges of data mining is the massive quantity of rules it can generate. Interestingness measures, problem considerations, and bloat-reduction measures can greatly reduce the overall number of rules. Examples:
• Interestingness measures: requiring a sizeable minimum lift or confidence.
• Problem considerations: if one is looking for the variables that affect the average response, requiring the average response on the right-hand side (RHS) of each rule.
• Bloat reduction: accepting only maximal frequent item sets (eliminating any frequent item sets (FISs) that have supersets with equivalent support).

Filter (min) | FC rule count | FC % remaining | FS rule count | FS % remaining
Support (.12/.15) | 873,998 | 100.00 | 2,584,330 | 100.00
Confidence (.6) | 360,644 | 41.26 | 1,096,151 | 42.42
Lift (3) | 3,801 | 0.43 | 68,878 | 2.67
Maximal | 784 | 0.09 | 25,329 | 0.98

FC = fixed-context rules; FS = fixed-source rules.
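The first two filter types can be sketched in Python as follows. The defaults reuse the table's thresholds (confidence .6, lift 3). The `Rule` record, its fields, and item names such as `avg:high` are hypothetical stand-ins, because the slides do not show how rules or survey items are encoded.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

@dataclass(frozen=True)
class Rule:
    # Hypothetical rule record: LHS -> RHS with precomputed measures
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    confidence: float
    lift: float

def filter_rules(rules: Iterable[Rule], min_conf: float = 0.6, min_lift: float = 3.0,
                 required_rhs: Optional[str] = None) -> List[Rule]:
    """Keep rules that pass the interestingness thresholds and, optionally, have a target item on the RHS."""
    kept = []
    for r in rules:
        if r.confidence < min_conf or r.lift < min_lift:
            continue  # interestingness filter
        if required_rhs is not None and required_rhs not in r.rhs:
            continue  # problem-specific filter: target item must be on the RHS
        kept.append(r)
    return kept

# Hypothetical rules; the item names are illustrative only
rules = [
    Rule(frozenset({"type:video", "source:unknown"}), frozenset({"avg:high"}), 0.80, 3.5),
    Rule(frozenset({"type:image"}), frozenset({"avg:high"}), 0.50, 4.0),        # low confidence
    Rule(frozenset({"type:message"}), frozenset({"avg:low"}), 0.90, 3.2),       # wrong RHS
    Rule(frozenset({"controversy:high"}), frozenset({"avg:high"}), 0.70, 2.0),  # low lift
]
kept = filter_rules(rules, required_rhs="avg:high")
print(len(kept))  # 1
```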

Filtering – Community Detection?
The prior filtering techniques can help remove uninteresting or unimportant rules, but they do little to help parse the rules that remain. Community detection provides a way to further filter the results by placing rules into communities, which can then be used as the base unit for analysis. Additionally, community detection can find rule clusters with substitutable items (e.g., butter and margarine) and help remove unneeded complexity.

Rules and Items as Bipartite Graph
[Figure panels: Fixed-Source rules as network; Fixed-Context rules as network]
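The slides do not spell out how the networks are built. One common construction, assumed here, links each rule to the items it contains to form a bipartite graph. That graph is then projected onto the rules, weighting each rule pair by the number of items they share. A plain-Python sketch follows. Representing a rule as the union of its LHS and RHS items is also an assumption, and the example rules are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

def rule_item_edges(rules: List[FrozenSet[str]]) -> Dict[str, List[int]]:
    """Bipartite adjacency: each item maps to the indices of the rules that contain it."""
    item_to_rules: Dict[str, List[int]] = defaultdict(list)
    for idx, items in enumerate(rules):
        for item in items:
            item_to_rules[item].append(idx)
    return dict(item_to_rules)

def project_onto_rules(rules: List[FrozenSet[str]]) -> Dict[Tuple[int, int], int]:
    """One-mode projection: rules i < j are linked with weight = number of shared items."""
    weights: Dict[Tuple[int, int], int] = defaultdict(int)
    for rule_ids in rule_item_edges(rules).values():
        for i, j in combinations(rule_ids, 2):   # rule_ids is ascending, so i < j
            weights[(i, j)] += 1
    return dict(weights)

# Each rule represented by the union of its LHS and RHS items (hypothetical rules)
rules = [frozenset("ABX"), frozenset("ACX"), frozenset("DE")]
print(project_onto_rules(rules))  # {(0, 1): 2}
```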

SpeakEasy Community Detection
Influences on choice of algorithm:
• Label-propagation-based community detection that is resilient to graph topology.
• A history-based approach reduces the impact of random initial conditions and prevents cascades.
• Multiple runs are reconciled with the adjusted Rand index (ARI) to find the most representative partition.
• Does not require the user to set the number of partitions.
• Written here at RPI!
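The slide does not say exactly how runs are reconciled. One plausible reading, assumed in the Python sketch below, is to pick the run whose partition has the highest total ARI against all other runs. Since every run is compared with the same number of others, this is equivalent to the highest mean ARI. The ARI itself follows the standard pair-counting formula. `most_representative` is a hypothetical helper, not part of SpeakEasy.

```python
from collections import Counter
from math import comb
from typing import List, Sequence

def adjusted_rand_index(p: Sequence[int], q: Sequence[int]) -> float:
    """Pair-counting adjusted Rand index between two labelings of the same nodes."""
    n = len(p)
    index = sum(comb(c, 2) for c in Counter(zip(p, q)).values())  # pairs together in both
    sum_a = sum(comb(c, 2) for c in Counter(p).values())
    sum_b = sum(comb(c, 2) for c in Counter(q).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put every node in one cluster
    return (index - expected) / (max_index - expected)

def most_representative(partitions: List[Sequence[int]]) -> int:
    """Index of the partition with the highest total ARI against all the others."""
    best, best_score = 0, float("-inf")
    for i, p in enumerate(partitions):
        score = sum(adjusted_rand_index(p, q) for j, q in enumerate(partitions) if j != i)
        if score > best_score:
            best, best_score = i, score
    return best

runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, renamed labels)
print(most_representative(runs))                         # 0
```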

SpeakEasy Community Algorithm
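The SpeakEasy-specific steps (history-based label updates and reconciliation of multiple runs) are not reproduced here. For orientation, the Python sketch below shows plain label propagation, the family of methods SpeakEasy builds on. To keep the output reproducible, it uses a fixed node order and breaks ties toward the largest label. Standard label propagation randomizes both of these choices, so results on other graphs may differ.

```python
from collections import Counter
from typing import Dict, List

def label_propagation(adj: Dict[int, List[int]], max_iter: int = 100) -> Dict[int, int]:
    """Basic asynchronous label propagation (deterministic simplification, not SpeakEasy)."""
    labels = {v: v for v in adj}          # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):             # fixed update order (standard LPA randomizes this)
            counts = Counter(labels[u] for u in adj[v])
            if not counts:
                continue                  # isolated node keeps its own label
            top = max(counts.values())
            winners = [lab for lab, c in counts.items() if c == top]
            if labels[v] in winners:
                continue                  # keep the current label if it is already a majority label
            labels[v] = max(winners)      # deterministic tie-break (standard LPA picks at random)
            changed = True
        if not changed:
            break                         # converged: no node changed its label
    return labels

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))  # {0: 2, 1: 2, 2: 2, 3: 5, 4: 5, 5: 5}
```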

Preliminary Results – Modular Rules
The first major category of communities consisted of sets of rules that differed from one another in only a couple of items.

Preliminary Results – Equivalent Items
The second notable set of communities were those in which mutually exclusive items co-occurred.
