
Big Social Data: Analyzing and Extracting Knowledge from Social Data in Web Prof. Jonice Oliveira UFRJ Federal University of Rio de Janeiro DCC Computer Science Department CORES - Social Computing and Social Network Analysis Laboratory


SLIDE 1

Big Social Data: Analyzing and Extracting Knowledge from Social Data in Web

  • Prof. Jonice Oliveira

UFRJ – Federal University of Rio de Janeiro
DCC – Computer Science Department
CORES - Social Computing and Social Network Analysis Laboratory

SLIDE 2

Social Networks are NOT…

SLIDE 4

SOCIAL Data

  • From crowd
      • Social Media
          • Events, opinions, social networks, ...
      • Mobile
          • Location, Routes, Interactions, Emotions, Velocity, ...
      • Sensors
          • Movement, Noise, …
      • Web logs
          • Access and updates
      • Public Cameras
          • Images!

SLIDE 5

SOCIAL Data

  • About crowd
      • Official agencies
          • Demography
          • Health
          • Transportation
          • Entertainment/Sports/Public Events
          • Violence
          • …

SLIDE 6

Big Social Data

  • Volume: Data Size
  • Velocity: Speed of Change and Propagation
  • Variety: Data Sources
  • Veracity: Uncertainty of Data

SLIDE 7

What do we research?

  • People interaction
  • People’s role in a group
  • Understanding and prediction of events
  • Recommendation of ‘things’/resources
      • Documents
      • Routes
      • Groups
      • ...

SLIDE 8

What do we research?

  • Science | Academia
  • Urban Centers

SLIDE 9

[Figure: architecture diagram]

  • Levels: Data Level, Mining Level, Analysis Level, User Interface Level
  • Data sources: Social Media; Scientific Sources (Patents, Curricula, Publications, CF Proposal, Projects)
  • Components: ETL (Extraction, Transformation and Load), Historical Information, Opinion Mining, Linking Mining, Behavioral Pattern Identification, Propagation Analysis, Contextual Identification, Trend Prediction, Identification of Reliability Information, Influence and Relevance Detection, Dynamic Analysis, Sociogram Visualization, Reports, Social Scorecard

SLIDE 14

Traffic Conditions Based on Twitter

  • Static analysis
      • Tweets in the last 60 minutes
      • Remove interrogative sentences
      • Sentiment analysis: Positive, Negative or Neutral
          • "Problems in Linha Vermelha"
          • "Without problems in Linha Vermelha"
          • "Fast and Easy Traffic in Linha Vermelha #sqn" (irony)
      • Main streets – Dynamic analysis
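The first two static-analysis steps (keep only tweets from the last 60 minutes, drop interrogative sentences) can be sketched as follows. This is a minimal illustration, not the TweeTraffic implementation: the function name `recent_statements`, the `(posted_at, text)` tuple layout, and the "ends with a question mark" test for interrogative sentences are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def recent_statements(tweets, now, window_minutes=60):
    """Keep tweets posted within the window that are not questions.

    tweets: list of (posted_at, text) tuples (layout assumed for this sketch).
    """
    cutoff = now - timedelta(minutes=window_minutes)
    kept = []
    for posted_at, text in tweets:
        if posted_at < cutoff:
            continue  # older than the analysis window
        if text.strip().endswith("?"):
            continue  # crude interrogative test (assumption, not from the slides)
        kept.append((posted_at, text))
    return kept

now = datetime(2013, 5, 1, 18, 0)  # hypothetical "current" time
tweets = [
    (datetime(2013, 5, 1, 17, 50), "Problems in Linha Vermelha"),
    (datetime(2013, 5, 1, 17, 55), "Is there traffic in Linha Vermelha?"),
    (datetime(2013, 5, 1, 16, 30), "Without problems in Linha Vermelha"),
]
filtered = recent_statements(tweets, now)
print(filtered)
```

The surviving tweets would then go to the sentiment step; irony markers such as #sqn need separate handling that this sketch does not attempt.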

SLIDE 15

Traffic Conditions Based on Twitter

  • Dynamic analysis
      • No tweets in the last 60 minutes
          • “We do not have enough information”
      • Different opinions
          • Interval between the most recent conflicting tweets
              • > 15 minutes – last tweet
              • ≤ 15 minutes – #positive tweets - #negative tweets
                  • #negative > #positive tweets: “Probably you are in a traffic jam”
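One possible reading of the dynamic-analysis rule, written as a small decision function. Several points are assumptions: tweets are taken to be already labelled 'positive' or 'negative', the "interval" is read as the time between the latest tweet and the latest tweet that disagrees with it, ties are treated as no jam, and the `CLEAR` message is hypothetical wording (the slides only give the other two messages).

```python
from datetime import datetime, timedelta

NO_INFO = "We do not have enough information"
JAM = "Probably you are in a traffic jam"
CLEAR = "Traffic is probably flowing"  # hypothetical wording, not from the slides

def traffic_status(tweets, now):
    """tweets: list of (posted_at, label), label in {'positive', 'negative'}."""
    recent = sorted(t for t in tweets if now - t[0] <= timedelta(minutes=60))
    if not recent:
        return NO_INFO
    last_time, last_label = recent[-1]
    conflicting = [ts for ts, label in recent if label != last_label]
    if not conflicting or last_time - max(conflicting) > timedelta(minutes=15):
        # No disagreement, or the disagreement is old: trust the last tweet.
        return JAM if last_label == "negative" else CLEAR
    # Recent disagreement: compare the counts of positive and negative tweets.
    positives = sum(1 for _, label in recent if label == "positive")
    negatives = len(recent) - positives
    # Ties fall through to CLEAR; the slides do not cover that case.
    return JAM if negatives > positives else CLEAR

now = datetime(2013, 5, 1, 18, 0)  # hypothetical "current" time
sample = [
    (datetime(2013, 5, 1, 17, 50), "negative"),
    (datetime(2013, 5, 1, 17, 55), "positive"),
    (datetime(2013, 5, 1, 17, 58), "negative"),
]
print(traffic_status(sample, now))
```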

SLIDE 17

Traffic Conditions Based on Twitter

Average by day   Reliable users   Common users   All users
Precision        0.4175           0.25           0.2925
Recall           0.75             0.375          0.625
Accuracy         0.542            0.225          0.275

LAUAND, B.; OLIVEIRA, J. TweeTraffic: ferramenta de análise das condições de trânsito baseado nas informações do Twitter. In: II Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2013 (in Portuguese).

SLIDE 18

Protests in Brazil (2013)

  • Started in June – rises in bus fares
  • Biggest street demonstrations
      • 20 years ago, citizens took to the streets to demand the impeachment of their president on corruption charges
  • Social media has played an important role:
      • Organization
      • Police brutality

SLIDE 19

Protests in Brazil (2013)

  • Supervised approach
      • Categorized: positive, negative and neutral
      • Naive Bayes classifier
          • 70% - training
          • 30% - test
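A minimal sketch of the supervised setup named above: a 70/30 train/test split and a Naive Bayes classifier over three classes. The slides do not say which Naive Bayes variant, features, or tooling were used, so this assumes a multinomial model on whitespace-split words with add-one smoothing. The ten example tweets are invented for illustration and are not from the study's corpus.

```python
import math
import random
from collections import Counter, defaultdict

def split(samples, train_percent=70, seed=0):
    """Shuffle and split into training and test sets (70/30 by default)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

def train_naive_bayes(samples):
    """Count class frequencies and per-class word frequencies."""
    class_counts, word_counts, vocabulary = Counter(), defaultdict(Counter), set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # unseen words are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, for illustration only.
samples = [
    ("peaceful march great crowd", "positive"),
    ("proud of the people today", "positive"),
    ("beautiful peaceful protest", "positive"),
    ("great day for democracy", "positive"),
    ("police violence again", "negative"),
    ("terrible violence in the streets", "negative"),
    ("vandalism ruined the protest", "negative"),
    ("shameful police brutality", "negative"),
    ("protest starts at five", "neutral"),
    ("march route goes downtown", "neutral"),
]
train, test = split(samples)
model = train_naive_bayes(train)
correct = sum(predict(model, text) == label for text, label in test)
print(f"{correct}/{len(test)} test tweets classified correctly")
```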

SLIDE 20

Protests in Brazil (2013)

Accuracy (A), Variance (V), Standard Deviation (DP), Precision (P%), Recall (R%), Macro-Averaged (Ma-A) and F-score (F%)

Corpus            A(%)   V        DP       P%    R%    Ma-A   F%
Positive Tweets   90%    0.0325   0.1803   79%   87%   1.18   83%
Negative Tweets   72%    0.0325   0.1803   85%   77%   1.05   81%

FRANCA, T.; OLIVEIRA, J. Análise de Sentimento de Tweets Relacionados aos Protestos que ocorreram no Brasil entre Junho e Agosto de 2013. In: III Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2014 (in Portuguese).
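As a quick consistency check on the reported figures, the F-score column can be recomputed from the precision and recall columns. This assumes F% is the balanced F-score (the harmonic mean of precision and recall), which the slides do not state explicitly; under that assumption the rounded values agree with the reported ones.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F) taken from the results reported on this slide.
reported = {
    "positive": (0.79, 0.87, 0.83),
    "negative": (0.85, 0.77, 0.81),
}
for corpus, (p, r, f) in reported.items():
    print(corpus, "recomputed:", round(f_score(p, r), 2), "reported:", f)
```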


SLIDE 22

  • Retweet Network
  • Users with a high number of followers are not necessarily influencers. Ex: Paulo Coelho
  • 20 graphs (timestamp = 2 days)
  • Network evolution = diameter and quantity of nodes

THEODORO, I. et al. Análise dos Influenciadores dos Protestos Brasileiros de 2013 via Twitter. In: III Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2014 (in Portuguese).
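The network-evolution measurement (one retweet graph per 2-day window, tracking number of nodes and diameter) can be sketched as below. The edge layout `(timestamp, retweeter, original_author)`, the undirected treatment of retweets, and the definition of diameter as the longest shortest path among connected pairs are assumptions for this example; the paper may define them differently. The sample edges and user names are invented.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def snapshots(retweets, start, window=timedelta(days=2)):
    """Group retweet edges into consecutive windows, one graph per window."""
    graphs = defaultdict(lambda: defaultdict(set))
    for ts, retweeter, author in retweets:
        index = (ts - start) // window  # 0 for the first window, 1 for the next, ...
        graphs[index][retweeter].add(author)
        graphs[index][author].add(retweeter)  # undirected view (assumption)
    return graphs

def eccentricity(graph, source):
    """Largest BFS distance from source to any node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())

def diameter(graph):
    """Longest shortest path between any two connected nodes."""
    return max(eccentricity(graph, node) for node in list(graph))

start = datetime(2013, 6, 23)
retweets = [  # invented sample edges
    (datetime(2013, 6, 23, 10), "a", "b"),
    (datetime(2013, 6, 24, 9), "c", "b"),
    (datetime(2013, 6, 24, 12), "d", "c"),
    (datetime(2013, 6, 25, 8), "e", "a"),
]
graphs = snapshots(retweets, start)
stats = [(i, len(g), diameter(g)) for i, g in sorted(graphs.items())]
print(stats)  # (window index, number of nodes, diameter)
```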

SLIDE 23

Protests in Brazil (2013)

  • Tweets from June 23 to August 2
  • Hashtags used in the search:

SLIDE 24

Protests in Brazil (2013)

  • ‘Prestige’ by Wasserman and Faust [1994]
  • Degree Prestige – average of out-degree
  • Proximity Prestige – Eigenvector centrality
  • Status or Rank Prestige – (in  out) PageRank
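The slide associates Status or Rank Prestige with PageRank. A minimal power-iteration sketch of PageRank on a retweet graph is given below. The edge direction (retweeter → original author, so that being retweeted raises a user's rank), the damping factor of 0.85, the fixed iteration count, and the user names are assumptions for illustration, not details taken from the study.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node with no outgoing edges spreads its rank over all nodes.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Invented example: u1 and u2 retweet "author", u3 retweets u2.
edges = [("u1", "author"), ("u2", "author"), ("u3", "u2")]
ranks = pagerank(edges)
for user, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(user, round(score, 3))
```

Because rank flows along retweets rather than follower links, a user can score highly here without a large follower count, which matches the observation that many followers do not by themselves make an influencer.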

SLIDE 25

Influence and Relevance Detection

  • VRABL, S. et al. #twintera!: A social matching environment based on microblogging. In: 15th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2011.
  • ZUDIO, P.; MENDONCA, L.; OLIVEIRA, J. Um método para recomendação de relacionamentos em redes sociais científicas heterogêneas. In: XI Simpósio Brasileiro de Sistemas Colaborativos (SBSC), 2014 (in Portuguese).


SLIDE 27

Monitoring the Cancer Research

  • ALBUQUERQUE, R. P. et al. Studying Group Dynamics through Social Networks Analysis in a Medical Community. Social Networking, v. 03, p. 134-141, 2014.
  • https://www.youtube.com/watch?v=E0TlQWOIjoY

SLIDE 28

Thanks

SLIDE 30

https://sites.google.com/site/brasnam/

SLIDE 31

Contact

Jonice Oliveira
UFRJ – Federal University of Rio de Janeiro
CORES - Social Computing and Social Network Analysis Laboratory
jonice@dcc.ufrj.br
http://lattes.cnpq.br/0990344839864230