Harvesting Multiple Sources for User Profile Learning: a Big Data Study
Aleksandr Farseev, Liqiang Nie, Mohammad Akbari, and Tat-Seng Chua

What is a user profile?

What is human mobility?
• Mobility is a contemporary paradigm that explores various types of people movement.
• The movement of people
• The quality or state of being mobile
• (Physiology) the ability to move physically
• (Sociology) movement within or between social classes and occupations
• (Chess) the ability of a chess piece to move around the board

Why human mobility?
• Urban planning: understand the city and optimize services
• Mobile applications and recommendations: study the user and offer services

What if we want to know more? Mobility can describe people.

[Diagram: application areas of user profiles, including assistance, marketing, advertisement, and wellness, with tasks such as trade area analysis, activity and venue recommendation, health group prediction, and demography- and interest-based personalized advertisement.]
Text fragments on the diagram:
• Lifestyle: tends to stay at home, visits local pubs and the shopping mall daily.
• Morning exercise with medium intensity.
• Medium overweight, potential hypertonia and diabetes.
• Advertise new beer brand and new car models.

User profile: Mobility + Demography
• Mobility profile: location preference, movement patterns
• Demographic profile: age, gender, personality, occupation

Multiple sources describe the user from multiple views
More than 50% of online-active adults use more than one social network in their daily life*
*According to the Pew Research Internet Project's Social Media Update 2013 (www.pewinternet.org/fact-sheets/social-networking-fact-sheet/)

Multiple sources describe the user from multiple views

Research problems
Multi-source user profiling:
• Geographical user mobility profiling
• User demographic profiling
• Data incompleteness
• Multi-source multi-modal data integration

Multi-source dataset: NUS-MSS*
*http://lms.comp.nus.edu.sg/research/NUS-MULTISOURCE.htm

NUS-MSS: Data sources

NUS-MSS: Data collection

NUS-MSS: Dataset Description
[Figure: dataset counts 11,732,489; 366,268; 263,530; 7,023 (labels not preserved)]

NUS-MSS: Dataset Statistics in Singapore

Demographic profiling

Data representation
• Linguistic features
• LIWC
• User Topics
• Heuristic features
• Writing behavior
LIWC: a text analysis software that maps dictionary words to word categories. It is an efficient and effective method for studying the various emotional, cognitive, structural, and process components present in individuals' verbal and written speech samples. These features can be highly related to one's demography.
[Chart: percentage (%) per word category: Qmarks, Unique, Dic, Sixltr, funct, pronoun, ppron, i, we, you, shehe, they, ipron, article, verb, auxverb, past, present, future, adverb, preps, conj, negate, quant, number, swear, social, family]
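The dictionary-to-category idea behind these features can be sketched as follows. This is a minimal illustration only: the word lists in `TOY_CATEGORIES` are hypothetical and are not the real LIWC lexicon, and the function name `category_percentages` is an assumption made for this example.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: NOT the real LIWC lexicon.
TOY_CATEGORIES = {
    "i": {"i", "me", "my"},
    "we": {"we", "us", "our"},
    "negate": {"no", "not", "never"},
    "family": {"mom", "dad", "sister", "brother"},
}


def category_percentages(text, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of words that fall into it."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    counts = Counter()
    for word in words:
        for name, vocab in categories.items():
            if word in vocab:
                counts[name] += 1
    return {name: 100.0 * counts[name] / len(words) for name in categories}


features = category_percentages("I love my mom")
```

Each user's text would yield one such percentage vector, which could then serve as the input to a classifier.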

Data representation
• Linguistic features
• LIWC
• User Topics
• Behavioral features
• Writing behavior
User topics: LDA word distribution over 50 topics for the collected Twitter timeline.
Users of similar gender and age may talk about similar topics, e.g. female users about shopping and male users about cars; youth about school and the elderly about health.

Data representation
• Linguistic features
• LIWC
• User Topics
• Heuristic features
• Writing behavior
In our research we observed that users' writing behavior patterns are highly correlated with, e.g., age (individuals from 10 to 20 years old make half as many grammatical errors as individuals from 20 to 30 years old).
Feature name: Description
Number of hash tags: Number of hash tags mentioned in a message
Number of slang words: Number of slang words one uses in tweets. We calculate the number of slang words per tweet and compute the average slang usage
Number of URLs: Number of URLs one usually uses in tweets
Number of user mentions: Number of user mentions; may represent one's social activity
Number of repeated chars: Number of repeated characters in one's tweets (e.g. noooooooo, wahhhhhhh)
Number of emotion words: Number of words marked with a non-neutral emotion score in Sentiment WordNet
Number of emoticons: Number of common emoticons from the Wikipedia article
Average sentiment level: Absolute value of the average sentiment level of a tweet obtained from Sentiment WordNet
Average sentiment score: Average sentiment level of a tweet obtained from Sentiment WordNet
Number of misspellings: Number of misspellings fixed by the Microsoft Word spell checker
Number of mistakes: Number of words that contain a mistake but cannot be fixed by the Microsoft Word spell checker
Number of rejected tweets: Number of tweets where 70% of words are either not in English or cannot be fixed by the Microsoft Word spell checker
Number of terms average: Average number of terms per tweet
Number of Foursquare check-ins: Number of Foursquare check-ins performed by the user
Number of Instagram medias: Number of Instagram medias posted by the user
Number of Foursquare tips: Number of Foursquare tips that the user posted in a venue
Average time between check-ins (min): Average time between two sequential check-ins; represents Foursquare user activity frequency
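A few of the writing-behavior counts listed on this slide can be sketched with regular expressions. This is an illustrative approximation, not the authors' extraction code: the patterns, the choice to count runs of three or more identical characters as one "repeated chars" occurrence, and the function names are all assumptions.

```python
import re

# Assumed patterns; real tweet tokenization is more involved.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
REPEATED = re.compile(r"(\w)\1{2,}")  # same character three or more times in a row


def tweet_features(tweet):
    """Count simple heuristic features for a single tweet."""
    return {
        "hashtags": len(HASHTAG.findall(tweet)),
        "mentions": len(MENTION.findall(tweet)),
        "urls": len(URL.findall(tweet)),
        "repeated_chars": len(REPEATED.findall(tweet)),
        "terms": len(tweet.split()),
    }


def average_features(tweets):
    """Average each per-tweet feature over a user's timeline."""
    if not tweets:
        return {}
    per_tweet = [tweet_features(t) for t in tweets]
    return {key: sum(f[key] for f in per_tweet) / len(per_tweet)
            for key in per_tweet[0]}


example = tweet_features("noooooooo @bob see http://t.co/x #fail")
```

Averaging over the timeline gives one feature vector per user, which matches the per-user granularity of the other features in the table.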

Data representation
• Location features
• Location semantics
• Location topics
We map all Foursquare check-ins to Foursquare categories from the category hierarchy.
Venue semantics such as venue categories can be related to users' demography. E.g. individuals who tend to visit night clubs usually belong to the 10 to 20 or 20 to 30 years old age groups.
For the case when a user performed check-ins in two restaurants and an airport but did not perform check-ins in other venues:

       Category_1   ...   Category_restaurant   ...   Category_airport   ...   Category_n
U_1    0            0     2                     0     1                  0     0
...    *            *     *                     *     *                  *     *
U_n    *            *     *                     *     *                  *     *
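The check-in example on this slide (two restaurants and one airport) can be reproduced as a category count vector. The category names and their order below are hypothetical placeholders, not the actual Foursquare category hierarchy, and `checkin_vector` is a name introduced for this sketch.

```python
from collections import Counter


def checkin_vector(checkin_categories, all_categories):
    """Count a user's check-ins per category, in a fixed category order."""
    counts = Counter(checkin_categories)
    return [counts.get(category, 0) for category in all_categories]


# Hypothetical category list; the real hierarchy is much larger.
categories = ["Arts", "Nightlife", "Restaurant", "Shop", "Airport", "Park", "Gym"]

# One user: check-ins in two restaurants and an airport, nothing else.
user_checkins = ["Restaurant", "Airport", "Restaurant"]
vector = checkin_vector(user_checkins, categories)
print(vector)  # [0, 0, 2, 0, 1, 0, 0]
```

Stacking one such row per user gives the user-by-category matrix shown on the slide.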

Data representation
• Image features
• Image concept learning
Extracted image concepts may represent user interests and be related to one's demography. For example, female users may take pictures of flowers and food, while male users may take pictures of cars or buildings.
*The concept learning tool was provided by the Lab of Media Search (LMS). It was evaluated on the ILSVRC2012 competition dataset and achieved an average accuracy@10 of 0.637.

Ensemble learning

Ensemble learning
[Diagram: each model's output is weighted by d_i × w_i × l_i before scoring.]

Score(l) = \frac{\sum_{i=0}^{k} P(l)_i \times d_i \times w_i \times l_i}{k}

where
• P(l)_i: model prediction confidence
• d_i: normalized data records number
• w_i: model trust weight
• l_i: model "strength", learned by "Hill Climbing" optimization with step 0.05
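The weighted voting on this slide can be sketched as below: each source-specific model contributes its confidence for a label, scaled by its data-availability, trust, and strength weights, and the label with the highest total wins. The function names and the sample numbers are assumptions made for illustration. Any constant normalization of the score is left out, since it does not change which label wins.

```python
def ensemble_scores(confidences, d, w, l):
    """Combine per-model label confidences into one score per label.

    confidences[i][label] is model i's confidence P(label)_i;
    d[i], w[i], l[i] are that model's data, trust, and strength weights.
    """
    labels = set().union(*(c.keys() for c in confidences))
    return {
        label: sum(c.get(label, 0.0) * d[i] * w[i] * l[i]
                   for i, c in enumerate(confidences))
        for label in labels
    }


def predict(confidences, d, w, l):
    """Return the label with the highest combined score."""
    scores = ensemble_scores(confidences, d, w, l)
    return max(scores, key=scores.get)


# Made-up numbers for two models (e.g. one text-based, one location-based).
confidences = [{"male": 0.9, "female": 0.1}, {"male": 0.4, "female": 0.6}]
d = [1.0, 0.5]
w = [0.8, 0.6]
l = [1.0, 0.5]
label = predict(confidences, d, w, l)
```

In this example the first model dominates because it has more data, a higher trust weight, and a higher strength, so its preferred label is returned.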

Ensemble learning details
• According to our evaluation, the bias of estimated ages does not exceed ±2.28 years. It is thus reasonable to use the estimated age for the age group prediction task.
• We adopted SMOTE* oversampling to obtain balanced age-group labeling.
• Using 10-fold cross-validation with an iteration step of 5, we determined the optimal number of random trees for each Random Forest classifier: 45, 25, 35, 40, and 105 trees for the classifiers trained on location, LIWC, heuristic, LDA 50, and image concept features, respectively.
• We jointly learn the l_i model "strength" coefficients by performing "Hill Climbing" optimization** with step 0.05. The randomized "Hill Climbing" approach is able to obtain a local optimum for non-convex problems and, thus, can produce resolvable ensemble weighting.
*N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002.
**An iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution. If the change produces a better solution, an incremental change is made to the new solution, repeating until no further improvements can be found.
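The randomized hill climbing described in the footnote can be sketched as follows, using the step of 0.05 from the slide. Everything else is an assumption for illustration: the starting point of 0.5, clamping weights to [0, 1], stopping after a fixed number of non-improving tries, and the toy objective, which stands in for whatever validation metric is actually optimized.

```python
import random


def hill_climb(objective, n_weights, step=0.05, max_stale=200, seed=0):
    """Maximize objective(weights) by changing one weight at a time."""
    rng = random.Random(seed)
    weights = [0.5] * n_weights          # arbitrary starting solution
    best = objective(weights)
    stale = 0                            # consecutive non-improving tries
    while stale < max_stale:
        i = rng.randrange(n_weights)     # pick a single element to change
        delta = rng.choice((-step, step))
        candidate = list(weights)
        candidate[i] = min(1.0, max(0.0, round(candidate[i] + delta, 2)))
        score = objective(candidate)
        if score > best:                 # keep the change only if it improves
            weights, best, stale = candidate, score, 0
        else:
            stale += 1
    return weights, best


# Toy objective with a single optimum at (0.8, 0.2); in practice this would
# be the ensemble's accuracy on held-out data.
def toy_objective(ws):
    target = (0.8, 0.2)
    return -sum((a - b) ** 2 for a, b in zip(ws, target))


weights, best = hill_climb(toy_objective, n_weights=2)
```

On a non-convex objective this procedure can stop at a local optimum, which is the trade-off the slide mentions.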

Experimental results (Singapore)

Demographic mobility

Geographical user mobility: user movement (city level)

Geographical user mobility: user movement (city level)
• Singapore's population is concentrated in several regions, which represent people's housing (Regions 2 and 3) and working (Region 3) areas.
• There are some regions where the check-in density of male users (blue markers) is much higher than that of female users (pink markers).

Geographical user mobility: user movement (region level)
