THE MOOD OF TWITTERERS IN MEXICO (EL ESTADO DE ÁNIMO DE LOS TUITEROS EN MÉXICO) - Gerardo Leyva

slide-1
SLIDE 1

THE MOOD OF TWITTERERS IN MEXICO

(EL ESTADO DE ÁNIMO DE LOS TUITEROS EN MÉXICO)

Gerardo Leyva, October 2018

slide-2
SLIDE 2

The three pillars of official statistics

CENSUSES SURVEYS

ADMINISTRATIVE REGISTERS

slide-3
SLIDE 3

The four pillars of official statistics (previously three)

CENSUSES SURVEYS

ADMINISTRATIVE REGISTERS

BIG DATA

slide-4
SLIDE 4

The Big Data definition evolves

Initially, it was about the 3 V's (later extended to five):

  • Volume
  • Velocity
  • Variety
  • Veracity
  • Value

Instead...

Big Data is a flexible approach to using and reusing the totality of a data set, structured or not, for a variety of purposes, normally different from those for which the data were originally collected.

slide-5
SLIDE 5

BIG DATA

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...” Dan Ariely

slide-6
SLIDE 6

Big Data (Google trends)

https://www.google.com.mx/trends/ @abxda

slide-7
SLIDE 7

PARADIGMS

Small data Big data

slide-8
SLIDE 8

Convergence of two agendas

Big data. Subjective Well Being (Martin Seligman).

slide-9
SLIDE 9

General idea

Goal: automatically measure and report the mood of twitterers in México.

Method: supervised learning

  • Humans tag a training set of tweets.
  • The system learns to automatically tag (classify) tweets as closely as possible to the way humans would have done it.

slide-10
SLIDE 10

Since February 2014

Collecting tweets

slide-11
SLIDE 11

More than 300 million tweets

slide-12
SLIDE 12

Set of tagged tweets

  • 9 330 people from Universidad Tecmilenio and INEGI.
  • Manually tagged 54 131 tweets.
  • Multiple tagging of each tweet.
  • Classification system: https://cienciadedatos.inegi.org.mx/pioanalisis/

slide-13
SLIDE 13

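Estar enamorada es como ir en un Ferrari a 240 kms/h. Se siente CHINGON pero sabes que en cualquier momento viene el putazo (:

(English: "Being in love is like riding in a Ferrari at 240 km/h. It feels AWESOME, but you know the crash can come at any moment (:")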

slide-14
SLIDE 14

Final solution

[Diagram labels: Normalized tweets; Training set; Validation set; SVM, SVM, SVM]

slide-15
SLIDE 15

Optimal results (ensemble of SVMs)

[Diagram labels: Normalized tweets; Training set; Validation set; SVM, SVM, SVM]

slide-16
SLIDE 16

Goal: automatically classifying tweets

Unclassified tweets → Normalization → Vector representation → Classification → Hundreds of millions of tagged tweets

slide-17
SLIDE 17

The process for sentiment classification

  • Cleaning
  • Text normalization
  • Vector representation of text
  • Training of the Machine Learning algorithm
  • Text classification on the fly

slide-18
SLIDE 18

Cleaning of the tagged set

Tagged tweets → Cleaning (contradictions and repetitions) → Cleaning (entropy) → “Clean” tweets
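One possible reading of the entropy step, sketched in Python: since each tweet is tagged by several people, the Shannon entropy of its tags measures how much the taggers disagree, and tweets with high entropy can be dropped. The slides do not give the actual rule or threshold, so `max_entropy`, the label names and the function names below are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def label_entropy(labels):
    """Shannon entropy (in bits) of the tags given to one tweet."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def clean_tagged_set(tags_by_tweet, max_entropy=0.95):
    """Keep tweets whose taggers mostly agree and assign the majority tag.

    tags_by_tweet maps a tweet id to the list of tags it received.
    max_entropy is a hypothetical threshold, not a value from the slides.
    """
    cleaned = {}
    for tweet_id, labels in tags_by_tweet.items():
        if label_entropy(labels) <= max_entropy:
            cleaned[tweet_id] = Counter(labels).most_common(1)[0][0]
    return cleaned


tags = {
    "t1": ["pos", "pos", "pos"],   # full agreement, entropy 0
    "t2": ["pos", "neg"],          # contradiction, entropy 1 bit
    "t3": ["neg", "neg", "pos"],   # mild disagreement, about 0.92 bits
}
print(clean_tagged_set(tags))
```

With this threshold the contradictory tweet `t2` is discarded, while `t1` and `t3` keep their majority tag.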

slide-20
SLIDE 20

Text normalization

  • Polarity of emoticons (polarity tag)
  • Q-grams 3, 4, 5, 7 (q=4)
  • Others

slide-21
SLIDE 21

Example of text normalization

ORIGINAL TEXT:

pésiiiimo auto :( @autoX fallan frenos y sistema de entretenimiento; no lo compren

(English: "terrible car :( @autoX brakes and entertainment system fail; don't buy it")

NORMALIZED TEXT:

pesiiiimo auto _negativo _user fallan frenos y sistema de entretenimiento ; lo no_compren
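The transformation in this example can be approximated with a few rules inferred from the before/after pair: lowercase, strip accents, replace emoticons with a polarity tag, replace user mentions with `_user`, separate punctuation, and attach "no" to the following verb (skipping clitic pronouns). This is a minimal sketch; the emoticon lists, the clitic list and the `_positivo` tag are assumptions, and the real normalizer presumably covers many more cases.

```python
import re
import unicodedata

# Hypothetical, deliberately small lookup tables.
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")
POSITIVE_EMOTICONS = (":)", ":-)", "(:", ":D")
CLITICS = {"lo", "la", "le", "los", "las", "les", "me", "te", "se", "nos"}


def strip_accents(text):
    """Remove combining marks (note: this also turns 'ñ' into 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(tweet):
    text = strip_accents(tweet.lower())
    for emoticon in NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, " _negativo ")
    for emoticon in POSITIVE_EMOTICONS:
        text = text.replace(emoticon.lower(), " _positivo ")
    text = re.sub(r"@\w+", " _user ", text)        # user mentions
    text = re.sub(r"([;,.!?])", r" \1 ", text)     # isolate punctuation
    tokens = text.split()

    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "no" and i + 1 < len(tokens):
            # Move past clitic pronouns, then glue "no_" onto the next word.
            j = i + 1
            while j < len(tokens) - 1 and tokens[j] in CLITICS:
                out.append(tokens[j])
                j += 1
            out.append("no_" + tokens[j])
            i = j + 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


original = ("pésiiiimo auto :( @autoX fallan frenos y sistema de "
            "entretenimiento; no lo compren")
print(normalize(original))
```

On the slide's tweet this sketch reproduces the normalized text shown above, including `lo no_compren`.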

slide-22
SLIDE 22

Example of text normalization with q-grams

{_pes, pesi, esii, siii, iiii, iiim, iimo, imo_, mo_a, o_au, _aut, auto, uto_, to__, o__n, __ne, _neg, nega, egat, gati, ativ, tivo, ivo_, vo__, o__u, __us, _use, user, ser_, er_f, r_fa, _fal, fall, alla, llan, lan_, an_f, n_fr, _fre, fren, reno, enos, nos_, os_y, s_y_, _y_s, y_si, _sis, sist, iste, stem, tema, ema_, ma_d, a_de, _de_, de_e, e_en, _ent, entr, ntre, tret, rete, eten, teni, enim, nimi, imie, mien, ient, ento, nto_, to_;, o_;_, _;_l, ;_lo, _lo_, lo_n, o_no, _no_, no_c, o_co, _com, comp, ompr, mpre, pren, ren_ }

_pesiiiimo_auto__negativo__user_fallan_frenos_y_sistema_de_entretenimiento_;_lo_no_compren

q=4

slide-26
SLIDE 26

Vector representation of the text
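The slide does not say how the vectors are built. One common option, shown here purely as an assumption, is a bag of q-grams: each distinct q-gram seen in the training texts gets a position, and a tweet becomes the vector of its q-gram counts. Other weightings (for example TF-IDF) would fit the same structure.

```python
from collections import Counter


def qgram_counts(text, q=4):
    """Count the character q-grams of an already normalized text."""
    padded = "_" + text.replace(" ", "_") + "_"
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def build_vocabulary(texts, q=4):
    """Assign a vector position to every q-gram seen in the texts."""
    vocabulary = {}
    for text in texts:
        for gram in qgram_counts(text, q):
            vocabulary.setdefault(gram, len(vocabulary))
    return vocabulary


def to_vector(text, vocabulary, q=4):
    """Dense count vector; q-grams outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for gram, count in qgram_counts(text, q).items():
        if gram in vocabulary:
            vector[vocabulary[gram]] = count
    return vector


vocabulary = build_vocabulary(["auto", "moto"])
print(vocabulary)
print(to_vector("auto", vocabulary))
```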

slide-27
SLIDE 27

Machine learning algorithm SVM

slide-28
SLIDE 28

Training the SVM algorithm

Positive tweets    Negative tweets
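To make the training step concrete, here is a toy linear SVM trained by subgradient descent on the regularized hinge loss, using q-gram counts as features. The slides do not state the kernel, features or hyperparameters actually used, so everything here (the learning rate `lr`, the regularization `lam`, the four invented training tweets) is an assumption for illustration, not the production setup.

```python
from collections import Counter


def features(text, q=4):
    """Sparse q-gram counts of a normalized text."""
    padded = "_" + text.replace(" ", "_") + "_"
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def score(weights, bias, x):
    return sum(weights.get(f, 0.0) * v for f, v in x.items()) + bias


def train_linear_svm(examples, epochs=50, lr=0.1, lam=0.01):
    """examples: list of (feature dict, label) with label +1 or -1."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in examples:
            margin = y * score(weights, bias, x)
            for f in weights:                  # L2 shrinkage step
                weights[f] *= 1.0 - lr * lam
            if margin < 1.0:                   # inside the margin: hinge update
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias


def predict(weights, bias, text):
    return 1 if score(weights, bias, features(text)) >= 0 else -1


# Invented toy data: +1 = positive tweet, -1 = negative tweet.
training = [
    ("que dia tan feliz", 1),
    ("me siento feliz hoy", 1),
    ("que dia tan triste", -1),
    ("me siento triste hoy", -1),
]
weights, bias = train_linear_svm([(features(t), y) for t, y in training])
print(predict(weights, bias, "estoy feliz"), predict(weights, bias, "estoy triste"))
```

The q-grams shared by positive and negative examples end up with weights near zero, while those specific to "feliz" or "triste" carry the decision.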

slide-29
SLIDE 29

The task of text classification… in a nutshell:

Training: Tagged tweets → Normalization and vector representation → Training → Decision rule

Production: New tweet → Normalization and vector representation → Decision rule → The mood of twitterers

slide-30
SLIDE 30

Positivity quotient = POSITIVES / NEGATIVES
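As a worked example, the quotient can be computed per day from classified tweets. This is a small sketch: the label strings, the date format and the handling of days without negative tweets are assumptions, not details from the slides.

```python
from collections import defaultdict


def positivity_quotient(labels):
    """Number of positive tweets divided by number of negative tweets."""
    positives = sum(1 for label in labels if label == "positive")
    negatives = sum(1 for label in labels if label == "negative")
    if negatives == 0:
        raise ValueError("quotient undefined without negative tweets")
    return positives / negatives


def daily_quotients(classified):
    """classified: iterable of (date string, label) pairs."""
    by_day = defaultdict(list)
    for day, label in classified:
        by_day[day].append(label)
    return {day: positivity_quotient(labels) for day, labels in by_day.items()}


sample = [
    ("2018-06-17", "positive"), ("2018-06-17", "positive"),
    ("2018-06-17", "negative"),
    ("2018-06-18", "positive"), ("2018-06-18", "negative"),
    ("2018-06-18", "negative"),
]
print(daily_quotients(sample))
```

A value above 1 means more positive than negative tweets on that day; a value below 1 means the opposite.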

slide-31
SLIDE 31

The mood of tweeters in Mexico

Showing 2016/Nov-2018/Sep (daily)

[Chart: daily index from 11/01/16 to 09/01/18, with the following dates annotated]

  • Elections USA (11/08 & 09)
  • Christmas (12/25)
  • New Year (12/31 & 01/01)
  • “Journalist’s day” (01/04)
  • “Gasolinazo” (01/04 & 05)
  • February 14th
  • Children’s day (04/30)
  • Earthquakes (2017: 09/08 & 09/19)
  • Earthquake (02/19)
  • Oscars 2018, The Shape of Water (03/04)
  • Debates 2018 (04/22, 05/20 & 06/12)
  • MTV Awards (05/19)
  • Germany vs Mexico (06/17)
  • South Korea vs Mexico (06/23)
  • Mexico vs Sweden (06/27)
  • Mexico vs Brazil (06/27)
  • Vote 2018 (07/01)

slide-32
SLIDE 32

Link:

http://www.inegi.org.mx/

http://www.beta.inegi.org.mx/app/animotuitero/#/app/multiline

slide-33
SLIDE 33

  • Mood: visualization of the positivity quotient, according to the selected period, state and temporality.
  • Help us to classify tweets: leads people wanting to help to another page.
  • Other menu labels: Reference period, Help, Methodology.

slide-34
SLIDE 34

  • Selection of states: shows, at the upper right corner, the national level and a selection bar for the state of interest.
  • Indicator: shows the temporality of the indicator (daily, weekly, monthly, quarterly or annual).
  • Calendar: shows periods for selection.

slide-35
SLIDE 35

  • Gathering: shows the number of tweets gathered.

slide-36
SLIDE 36

  • Map: shows, on the map, the states coloured according to the positivity quotient.
  • All: shows the tweets of all people in the state or the country.
  • Residents: shows the tweets of people residing and present in the state.
  • Visitors: shows the tweets of people visiting the state.

slide-37
SLIDE 37

Other INEGI projects with Twitter:

  • Domestic tourism.
  • Mental health.
  • Mobility in Mexico City.
  • New agglomerations.
  • Consumer confidence.
  • Insecurity.

slide-38
SLIDE 38

Other INEGI projects with big data:

  • CFE electricity consumption for nowcasting of industrial activity.
  • Use of satellite images for diverse purposes including land cover, agricultural activity and new settlements.
  • Cooperation with Telefonica and BBVA-Bancomer to generate a rapid response system to face natural disasters.
  • Web scraping and scanner data for prices.

slide-39
SLIDE 39

Thank you!

slide-40
SLIDE 40

01 800 111 46 34

www.inegi.org.mx

atencion.usuarios@inegi.org.mx

Conociendo México

@INEGI_INFORMA    INEGI Informa