From AIK to AID: Acquiring Social Media Intelligence via ‘Big’ Data


SLIDE 1

AI, Social Media Intelligence, Big Data

Arizona State University Data Mining and Machine Learning Lab

SBP-BRiMS2017, DC

From AIK to AID: Acquiring Social Media Intelligence via ‘Big’ Data

Huan Liu

SLIDE 2

Thanks to Former & Current PhD Students

  • Reza Zafarani, Asst Prof, Syracuse U
  • Xia Hu, Asst Prof, Texas A&M U
  • Magdiel Galan, Intel
  • Shamanth Kumar, Castlight Health
  • Pritam Gundecha, IBM Res Almaden
  • Jiliang Tang, Asst Prof, MSU
  • Huiji Gao, LinkedIn
  • Ali Abbasi, Machine Zone
  • Salem Alelyani, Asst Prof, King Khalid U
  • Xufei Wang, LinkedIn
  • Geoffrey Barbier, AFRL
  • Lei Tang, Clari
  • Zheng Zhao, Google
  • Nitin Agarwal, Chair Prof, UALR
  • Sai Moturu, PostDoc, MIT Media Lab
  • Lei Yu, Assc Prof, Binghamton U, NY
  • Robert Trevino, AFRL
  • Yunzhong Liu, LeEco, US
  • Somnath Shahapurkar, FICO
  • Fred Morstatter, USC ISI
  • Christophe Faucon
  • Isaac Jones
  • Suhas Ranganath
  • Suhang Wang
  • Tahora Nazer
  • Jundong Li
  • Liang Wu
  • Ghazaleh Beigi
  • Kai Shu
  • Justin Sampson

SLIDE 3

A Tortuous but Fortuitous Path to Social Computing

SLIDE 4

From AIK to AID

  • “Knowledge is Power”: AI was then solely about K

– Expert Systems or Rule-based Systems

  • “Intelligence is ten million rules.”

– Knowledge-based Systems (Cyc)

  • “Data is the New Oil”: AI is now hyped up with D

– Big data is ubiquitous
– CS, Statistics, Information Science ⇒ Data Science

  • Recent surge of AI is powered by Data

– Machine Learning (including Deep Learning)
– For any learning algorithm to work, data is key

SLIDE 5

Big Social Media Data

  • Twitter

– 300 million users
– 500 million tweets / day
– 1% (5 million) released for research

  • Facebook

– 2 billion users
– 422 million updates / day
– 196 million photos / day

  • Instagram

– 700 million users
– 80 million photos / day

[Figures: Facebook Degree Distribution; Instagram Users over Time]

SLIDE 6

Discovering Social Media Intelligence

  • Graph Theories
  • Network Measures and Models
  • Data Mining, NLP, and Visual Analytics
  • Community Detection and Analysis
  • Information Diffusion
  • Influence and Homophily
  • Recommender Systems
  • Behavior Analytics

– Sentiment Analysis

SLIDE 7

Some Challenges in Acquiring SM Intelligence

  • Social media data seems really big, but why are we often still short of data?

– How can we make data ‘bigger’?

  • Data is power, so it can produce any result

– Can we algorithmically evaluate the results from big data?

  • We don’t know what we don’t know

– How can we know if our result of social media analysis is of any value?

SLIDE 8

Making Big Data “Bigger”

  • What is big data?

– A conventional answer is 4Vs
– A practitioner’s answer is more nuanced

  • Big data can be actually little or thin
  • For machine learning or data mining to work, the more data, the better

– Make little data bigger
– Make thin data thicker

SLIDE 9

Recent Advances in Feature Selection: A Data Perspective

Arizona State University Michigan State University

KDD2017 Tutorial, Halifax, Canada

  • Sparsity becomes exponentially worse as feature dimensionality increases

– Conventional distance metric becomes ineffective as far and near neighbors have similar distances

Curse of Dimensionality: Required Samples

[Figure panels: 3 samples per unit region; 1 sample per region; 1/3 sample per region]

http://nikhilbuduma.com/2015/03/10/the-curse-of-dimensionality/
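
The three panel labels are consistent with a fixed pool of samples spread over a regular grid whose dimensionality grows from 1 to 3. A minimal Python sketch of that arithmetic follows. The sample count (9), the number of bins per axis (3), and the function name `samples_per_region` are assumptions chosen to reproduce the labels; they are not stated on the slide.

```python
from fractions import Fraction


def samples_per_region(n_samples: int, bins_per_axis: int, n_dims: int) -> Fraction:
    """Average number of samples per cell of a regular grid with n_dims axes."""
    n_regions = bins_per_axis ** n_dims  # the cell count grows exponentially
    return Fraction(n_samples, n_regions)


# Assumed setup: 9 samples, 3 bins per axis, 1 to 3 dimensions.
densities = [samples_per_region(9, 3, d) for d in (1, 2, 3)]
for dims, density in zip((1, 2, 3), densities):
    print(f"{dims}-D: {density} samples per region")
```

Under these assumptions the density falls from 3 to 1 to 1/3, so keeping the original density in higher dimensions would require exponentially more samples.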

SLIDE 10

Relevant, Redundant and Irrelevant Features

  • Feature selection retains relevant features for learning and removes redundant or irrelevant ones
  • For the binary classification task illustrated on the slide, f1 is relevant, f2 is redundant given f1, and f3 is irrelevant
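
The slide's data table did not survive extraction, so the following is only a toy sketch with invented values. It uses mutual information, one common relevance score and not necessarily the criterion the tutorial has in mind, to show how f1, f2, and f3 could be told apart. The columns and the helper name `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Invented toy data for a binary classification task.
label = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0, 0, 0, 0, 1, 1, 1, 1]  # tracks the label: relevant
f2 = [1, 1, 1, 1, 0, 0, 0, 0]  # negation of f1: adds nothing once f1 is known
f3 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of the label: irrelevant

relevance = {
    name: mutual_information(column, label)
    for name, column in (("f1", f1), ("f2", f2), ("f3", f3))
}
redundancy_f2_f1 = mutual_information(f2, f1)
print(relevance, redundancy_f2_f1)
```

In this toy setting f1 and f2 both carry one full bit about the label, but f2 also shares one full bit with f1, so keeping both gains nothing; f3 carries zero bits and can be dropped.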

SLIDE 11

Feature selection selects an ‘optimal’ subset of relevant features from the original high-dimensional data given a certain criterion

Feature Selection

SLIDE 12

Feature Selection and scikit-feature

  • Feature selection can make data ‘bigger’

– Assuming all binary attribute values in our toy example
– Before FS, 5/2^10 = 5/1024; after FS, 5/2^3 = 5/8

  • Does FS always work?

– Yes, for most high-dimensional data

  • Where can we find it?

– scikit-feature, an open-source repository in Python
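
The two fractions can be read as the share of possible attribute-value combinations that the samples cover. A worked version of the arithmetic is sketched below, assuming (as the numbers on the slide imply) 5 samples, 10 binary features before selection, and 3 binary features after it.

```latex
\[
\underbrace{\frac{5}{2^{10}} = \frac{5}{1024} \approx 0.0049}_{\text{before FS: 10 binary features}}
\qquad
\underbrace{\frac{5}{2^{3}} = \frac{5}{8} = 0.625}_{\text{after FS: 3 binary features}}
\qquad
\frac{5/2^{3}}{5/2^{10}} = 2^{7} = 128
\]
```

Under these assumptions the same 5 samples cover 128 times more of the feature space after selection, which is the sense in which the data becomes ‘bigger’.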

SLIDE 13

Making Thin Data Thicker

  • Most people, like many of us, are in the long tail

– Our data is thin or sparse
– With little data, machine learning is powerless

  • Social media data offers new opportunities

– Multiple facets: posts, profile, linked information
– Multiple platforms that offer different functions

  • Two case studies

– Feature selection using social network information
– Connecting users across more than one social media site

SLIDE 14

Making Sense of Big Data

  • For big social-media data, we want to automatically get a sense of what it is

– User needs, sentiment, opinions, behavior, and trends

  • A big part of big data is TEXT
  • NLP and text mining can help extract topics from text
  • If these machine-learned topics are for human consumption, are they actually comprehensible?

– How can comprehensibility be measured?

SLIDE 15

Measuring Topic Interpretability

  • How to measure interpretability of topics generated from machine learning?
  • One common way is to indirectly measure predictive performance of these learned topics

– The higher the performance (say, accuracy), the better
– Does it really measure interpretability?
– Human experts seem to be the best evaluator

  • But involving human experts in evaluation may not be scalable and reproducible
  • Hence, it is a challenging problem

SLIDE 16

Big Text Data

  • Some example corpora:

Source            Size
Wikipedia         36 million articles
World Wide Web    100+ billion static web pages
Social Media      500 million new tweets each day

  • Too much data to read
  • How can we begin to understand all of these large bodies of text data?

SLIDE 17

Topic Models

SLIDE 18

Measuring Interpretability

  • How do we measure the interpretability of statistical topic models?
  • A dilemma

– Experts are credible, but not scalable
– Crowdsourcing needs no experts, so it is scalable, but it has no expertise and thus is not credible

SLIDE 19

A Measure of Topic Interpretability

  • Model Precision
  • It shows a Turker 6 words in random order

– Top 5 words from the topic
– 1 “Intruded” word
– Ask the Turker to identify the “Intruded” word

$\mathrm{MP}_{\text{model},\,\text{topic}} = \dfrac{\#\ \text{Correct Guesses}}{\text{Total}\ \#\ \text{Guesses}}$

Topic i: cat dog bird truck horse snake

Chang, Jonathan, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber, and David M. Blei. "Reading Tea Leaves: How Humans Interpret Topic Models." In Advances in Neural Information Processing Systems, pp. 288-296. 2009.
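
The Model Precision ratio can be sketched in a few lines of Python. The function name `model_precision`, the choice of "truck" as the intruded word, and the four Turker answers are all assumptions made for illustration; the slide shows the six words but does not mark the intruder or give any responses.

```python
def model_precision(guesses, intruder):
    """Fraction of guesses that correctly identify the intruded word."""
    if not guesses:
        raise ValueError("need at least one guess")
    correct = sum(1 for guess in guesses if guess == intruder)
    return correct / len(guesses)


# The six words shown for "Topic i" on the slide.
shown_words = ["cat", "dog", "bird", "truck", "horse", "snake"]
intruder = "truck"  # assumption: the slide does not say which word was intruded

# Hypothetical answers from four Turkers.
guesses = ["truck", "truck", "snake", "truck"]
mp = model_precision(guesses, intruder)
print(mp)  # 0.75
```

Three of the four hypothetical guesses hit the assumed intruder, so MP is 0.75 for this topic.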

SLIDE 20

Observing Model Precision (MP)

What does Model Precision measure?

What doesn’t Model Precision measure?

It seems we need another measure

SLIDE 21

Measuring Coherence – Another Measure

  • Model Precision Choose Two
  • Nearly the same setup as Model Precision:

– Difference: A Turker is asked to choose top two words

  • Intuition: if the topic is coherent, then it would be difficult to consistently choose a second word
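
One way to turn this intuition into a number is the Shannon entropy of the Turkers' second choices: evenly spread choices suggest a coherent topic, while agreement on one word suggests an outlier inside the topic. The sketch below rests on that assumption; the exact MPCT formula in the cited paper may differ, and the function name and both response lists are hypothetical.

```python
from collections import Counter
from math import log2


def second_choice_entropy(second_choices):
    """Shannon entropy (bits) of the distribution of second-choice words."""
    if not second_choices:
        raise ValueError("need at least one second choice")
    counts = Counter(second_choices)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical second choices from ten Turkers for two topics.
spread = ["cat", "dog", "bird", "horse", "snake"] * 2  # no consistent pick
peaked = ["snake"] * 9 + ["cat"]                       # almost everyone agrees

print(second_choice_entropy(spread))
print(second_choice_entropy(peaked))
```

The evenly spread responses reach the maximum entropy for five words, log2(5), while the peaked responses score much lower, flagging a word that most Turkers see as out of place.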

SLIDE 22

A Comparative Example

[Figure: Model Precision vs. Model Precision Choose Two]

SLIDE 23

News Corpus for Experiments

Yahoo! News Dataset

Property     Value
Documents    258,919
Tokens       6,888,693
Types        214,957

SLIDE 24

  • Yahoo! News, Run with K = 10, 25, 50, 100.
  • “Random” Topics

Can MPCT Replace MP?

SLIDE 25

MPCT vs. MP

MPCT Complements MP

  • Both measures are needed with little extra overhead

[Figure: 2×2 grid with cells 0 0 | 1 0 and 0 1 | 1 1]

SLIDE 26

Summary

  • MPCT measures a topic’s within-topic distance
  • MPCT complements Model Precision
  • MPCT provides another dimension of topic quality

– Low correlation with Model Precision (ρ = 0.29)

  • Topics and scripts: http://bit.ly/mpchoose2
  • A recent blog post on the topic: http://www.kdnuggets.com/2016/11/measuring-topic-interpretability-crowdsourcing.html

Fred Morstatter and Huan Liu. “A Novel Measure for Coherence in Statistical Topic Models”, Association for Computational Linguistics (ACL), August 2016. Berlin, Germany.

SLIDE 27

Addressing Don’t-Know-Don’t-Know Problems

  • When collecting data, we often don’t know when we have a sufficient amount

– We don’t know when to stop collecting, though we can’t collect forever

  • A dilemma in studying migration on social media:

– If we know its existence, no need for the study
– If we don’t know, how can we verify the result?

SLIDE 28

Illustrative Examples of DNDN

  • 1. When-to-Stop Dilemma: Collecting data forever vs. having credible patterns

– How much data vs. how credible

  • 2. Is There Migration on Social Media?

– Users are a primary source of revenue

  • Ads, Recommendations, Brand loyalty

– New SM sites need to attract users for expansion
– Existing SM sites need to retain their users
– Competing for attention entails the discovery of migration patterns

SLIDE 29

Migration on Social Media

  • Site Migration

– Users leave a site by profile deletion or profile removal
– Difficult to convince a user who left to return
– Hard to study these users across sites because we need their registration information

  • Attention Migration

– Users become inactive on a site
– A harbinger for site migration
– Can be detected by observing user activities across sites
– Can take action to prevent site migration after understanding migration patterns

[Figure: users moving among Site 1, Site 2, and Site 3 after time t]

SLIDE 30

Patterns from Observation

SLIDE 31

Do We Know What We Didn’t Know?

  • If a pattern is significant, it is valid

– Significant differences observed in StumbleUpon, Twitter, and YouTube

  • When to stop?

– Stop when we are certain, continue otherwise
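
The slide does not say which significance test was used. As a generic stand-in, the sketch below uses a two-sided permutation test on the difference of means, which asks how often a random relabeling of the data produces a gap at least as large as the observed one. The function names, the activity counts, and the 0.05 threshold are assumptions for illustration only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical weekly activity counts for the same users on two sites.
site_a = [10, 11, 12, 13, 14, 15, 16, 17]
site_b = [1, 2, 3, 4, 5, 6, 7, 8]

p_value = permutation_p_value(site_a, site_b)
keep_collecting = p_value >= 0.05  # assumed threshold: stop once below it
print(p_value, keep_collecting)
```

With these invented counts the gap is far larger than almost any random relabeling produces, so the sketch would stop collecting. Checking after every new batch of data raises the chance of a false alarm, so a real stopping rule would need a stricter threshold.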

SLIDE 32

“9 Bizarre and Surprising Insights from Data Science”

A Scientific American Guest Blog

  • 1. Pop-Tarts before a hurricane (Walmart)
  • 2. Higher crime, more Uber rides (Uber)
  • 3. Typing with proper capitalization indicates creditworthiness (A financial services startup)
  • 4. Users of the Chrome and Firefox browsers make better employees (An HR firm over Xerox data)
  • 8. Female-named hurricanes are more deadly (University Researchers)
  • …

Yes, they are bizarre, but are they true?

SLIDE 33

Evaluation without Ground Truth

The CACM article can be found at dl.acm.org

SLIDE 34

More Challenges Ahead

  • Estimating the impact of an event

– E.g., not all misinformation is catastrophic

  • Predicting the future, not the past

– Are they two sides of the same coin?

  • Predicting general election results with Twitter data?
  • Automating measures to replace crowdsourcing evaluation

– Problems with evaluation methods involving AMT

SLIDE 35

Revisit Challenges in Acquiring SM Intelligence

  • Social media data is obviously big, but why are we often still short of data?

– How can we make data ‘bigger’?

  • Data is power, so it can produce any result

– Can we algorithmically evaluate the results from big data?

  • We don’t know what we don’t know

– How can we know if our result of social media analysis is of any value?

SLIDE 36

  • scikit-feature – an open source feature selection repository in Python
  • Social Computing Repository
  • Some books available for free download

Repositories and Recent Books

SLIDE 37

http://dmml.asu.edu/smm/

SLIDE 38

Discovering Social Media Intelligence

  • Graph Theories
  • Network Measures and Models
  • Data Mining, NLP, and Visual Analytics
  • Community Detection and Analysis
  • Information Diffusion
  • Influence and Homophily
  • Recommender Systems
  • Behavior Analytics

– Sentiment analysis

http://dmml.asu.edu/smm/

SLIDE 39

THANK YOU ALL & Conference Organizers

  • for this opportunity to share our research
  • Acknowledgments

– Grants from NSF, ONR, ARO, among others
– DMML members and project leaders
– Collaborators: CMU (Minerva), CRA (IARPA-CAUSE)

More information by searching for “Huan Liu” or at http://www.public.asu.edu/~huanliu

SLIDE 40

Further Readings

  • Jundong Li and Huan Liu. “Challenges of Feature Selection for Big Data Analytics”, Special Issue on Big Data, IEEE Intelligent Systems, 32(2), 9-15, 2017.
  • Fred Morstatter and Huan Liu. “A Novel Measure for Coherence in Statistical Topic Models”, Association for Computational Linguistics (ACL), August 2016, Berlin, Germany.
  • Reza Zafarani and Huan Liu. “Evaluation without Ground Truth in Social Media Research”, Communications of the ACM, Volume 58, Issue 6, June 2015, pages 54-60.
  • Lei Tang and Huan Liu. “Community Detection and Mining in Social Media”, Morgan & Claypool Publishers, September 2010.

SLIDE 41

Making Thin Data Thicker

  • Most people, like many of us, are in the long tail

– Our data is thin or sparse
– With little data, machine learning is powerless

  • Social media data offers new opportunities

– Linked information
– Multiple platforms, as they offer different functions

  • Two case studies

– Feature selection using social network information
– Connecting users across more than one social media site

SLIDE 42

Use Link Information for Data Thickening

  • Where can we find additional information for feature selection?
  • Social media data contains various types of data

– Link information is additional
– Other sources such as sentiment, like, etc.

  • Are there theories to guide us in using link info?

– Social influence
– Homophily

  • Extracting distinctive relations from linked data for feature selection

SLIDE 43

Representation for Social Media Data

Social Context

SLIDE 44

  • 1. CoPost
  • 2. CoFollowing
  • 3. CoFollowed
  • 4. Following

Relation Extraction
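
The slide names the four relations without defining them. The sketch below assumes the following readings: CoPost links two posts by the same user, CoFollowing links two users who follow the same user, CoFollowed links two users who are followed by the same user, and Following is a direct follower-to-followee edge. These definitions, the data layout, and the function names are assumptions for illustration, not the LinkedFS implementation.

```python
from itertools import combinations


def unordered_pairs(items):
    """All unordered pairs of distinct items, as frozensets."""
    return {frozenset(pair) for pair in combinations(sorted(items), 2)}


def extract_relations(author_of, follows):
    """author_of maps post -> user; follows is a set of (follower, followee)."""
    posts_by_user, followees, followers = {}, {}, {}
    for post, user in author_of.items():
        posts_by_user.setdefault(user, set()).add(post)
    for follower, followee in follows:
        followees.setdefault(follower, set()).add(followee)
        followers.setdefault(followee, set()).add(follower)

    relations = {
        "CoPost": set(),
        "CoFollowing": set(),
        "CoFollowed": set(),
        "Following": set(follows),
    }
    for posts in posts_by_user.values():      # posts sharing an author
        relations["CoPost"] |= unordered_pairs(posts)
    for fans in followers.values():           # users following the same user
        relations["CoFollowing"] |= unordered_pairs(fans)
    for idols in followees.values():          # users followed by the same user
        relations["CoFollowed"] |= unordered_pairs(idols)
    return relations


# Hypothetical toy network.
author_of = {"p1": "ann", "p2": "ann", "p3": "bob", "p4": "cat"}
follows = {("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("cat", "eve")}
relations = extract_relations(author_of, follows)
print(relations)
```

In the toy network, p1 and p2 are CoPost, ann and bob are CoFollowing (both follow cat), and dan and eve are CoFollowed (both followed by cat).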

SLIDE 45

Evaluation Results on Digg Data

SLIDE 46

Summary

  • LinkedFS is evaluated under varied circumstances to understand how it works

– Link information can help feature selection for social media data

  • Unlabeled data is more common in social media, so unsupervised learning is more sensible, but also more challenging

Jiliang Tang and Huan Liu. “Unsupervised Feature Selection for Linked Social Media Data”, the Eighteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.

Jiliang Tang and Huan Liu. “Feature Selection with Linked Data in Social Media”, SIAM International Conference on Data Mining, 2012.

SLIDE 47

  • Collectively, social media data is indeed big
  • For an individual, however, the data is little

– How much activity data do we generate daily?
– How many posts did we post this week?
– How many friends do we have?

  • When “big” social media data isn’t big,

– Searching for more data with little data

  • We use different social media services for varied purposes

– LinkedIn, Facebook, Twitter, Instagram, YouTube, …

Gather More Data with Little Data

SLIDE 48

  • Little data about an individual + Many social media sites
  • Partial Information + Complementary Information

> Better User Profiles

An Example

[Figure: an example user, Reza Zafarani, on Twitter and LinkedIn. Attributes: Age, Location, Education. Values shown: N/A; Tempe, AZ; ASU]

Can we connect individuals across sites?

  • Connectivity is not available
  • Consistency in Information Availability

Reza Zafarani and Huan Liu. “Connecting Users across Social Media Sites: A Behavioral-Modeling Approach”, the Nineteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'2013), August 11-14, 2013. Chicago, Illinois.

SLIDE 49

  • Each social media site can have varied amount of user information
  • Which information definitely exists for all sites?

– But, a user’s usernames on different sites can be different

  • Our work is to connect the information of the same user provided across sites

Searching for More Data with Little Data
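
Because usernames for the same person can differ across sites, a matcher needs similarity features instead of exact equality. The sketch below computes a few simple ones with the standard-library `difflib`. The feature set, the function name `username_features`, and the example usernames are hypothetical; they illustrate the idea only and are not the features used in the authors' work.

```python
from difflib import SequenceMatcher


def username_features(name_a, name_b):
    """Simple similarity features for a pair of usernames."""
    a, b = name_a.lower(), name_b.lower()
    chars_a, chars_b = set(a), set(b)
    union = chars_a | chars_b
    return {
        "exact_match": a == b,
        "length_difference": abs(len(a) - len(b)),
        # Share of distinct characters the two names have in common.
        "char_jaccard": len(chars_a & chars_b) / len(union) if union else 1.0,
        # Ratio in [0, 1] based on matching subsequences.
        "sequence_similarity": SequenceMatcher(None, a, b).ratio(),
    }


# Hypothetical usernames: two plausible variants and one unrelated name.
same = username_features("jane.doe88", "jane.doe88")
close = username_features("jane.doe88", "jdoe88")
far = username_features("jane.doe88", "moonwalker")
print(close)
print(far)
```

Feature vectors like these could feed a classifier that decides whether two usernames belong to the same person, with labeled pairs as training data.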

SLIDE 50

MOdeling Behavior for Identifying Users across Sites

  • Information shared across sites provides a behavioral fingerprint

– How to capture and use differentiable attributes

Our Behavior Generates Information Redundancy

MOBIUS

  • Behavioral Modeling
  • Machine Learning

SLIDE 51

Behaviors

  • Human Limitation: Time & Memory Limitation; Knowledge Limitation
  • Exogenous Factors: Typing Patterns; Language Patterns
  • Endogenous Factors: Personal Attributes & Traits; Habits

SLIDE 52

[Diagram: Behavior 1, Behavior 2, …, Behavior n each generate Information Redundancy, which is captured via Feature Set 1, Feature Set 2, …, Feature Set n. The feature sets and Data feed a Learning Framework that produces an Identification Function.]

Behavioral Modeling Approach with Learning

SLIDE 53

  • Gathering more data is often necessary for effective data mining
  • Reducing dimensionality can make data bigger
  • Social media data provides unique opportunities to do so by using different sites and abundant user-generated content
  • Traditionally available data can also be tapped to make thin data “thicker”

Summary – Making Data Bigger

Jundong Li, et al. “Feature Selection: A Data Perspective”, 2016. http://arxiv.org/abs/1601.07996

Reza Zafarani and Huan Liu. “Connecting Users across Social Media Sites: A Behavioral-Modeling Approach”, SIGKDD, 2013.