Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior


Saha, K., et al. 2019. Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior. In Proceedings of International Conference on Affective Computing and Intelligent Interaction (ACII 2019).


SLIDE 1

Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior

Saha, K., Reddy, M. D., Das Swain, V., Gregg, J. M., Grover, T., Lin, S., Martinez, G. J., Mattingly, S. M., Mirjafari, S., Mulukutla, R., Nies, K., Robles-Granda, P., Sirigiri, A., Yoo, D. W., Audia, P., Campbell, A. T., Chawla, N. V., D’Mello, S. K., Dey, A. K., Jiang, K., Liu, Q., Mark, G., Moskal, E., Striegel, A., & De Choudhury, M.

Koustuv Saha, Georgia Tech

Saha, K., et al. 2019. Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior. In Proceedings of International Conference on Affective Computing and Intelligent Interaction (ACII 2019). http://koustuv.com/papers/ACII19_SM_Imputation.pdf

SLIDE 2

Sensing Human Behavior

Survey Instruments

  • Self-Report Questionnaires

Passive Sensing

  • Smartphones and Wearables
  • Social Media

Active Sensing

  • Ecological Momentary Assessments (EMAs)

SLIDE 3

Social Media as a Passive Sensor

  • Naturalistic setting
  • Unobtrusive access
  • Longitudinal and extended periods (beyond study period)
  • Verbal and behavioral

There are limitations associated with the social media data stream

SLIDE 4

Limitations (Social Media Data Stream)

Retrospective in nature:

So, the availability and quality of data depend on the participant's social media use

SLIDE 5

Limitations (Social Media Data Stream)

Not everybody is on social media

Social Media population skewed towards young adults (Pew, 2018)

SLIDE 6

Limitations (Social Media Data Stream)

Data collection challenges

Changing nature of social media APIs (Facebook, Twitter, Instagram, LinkedIn, etc.)

SLIDE 7

Consequences in Studies of Human Behavior

Multimodal sensing studies have to do one of the following:

  • focus on a very (social media) active participant cohort: hurts generalizability and recruitment
  • disregard those with no social media data: hurts scalability
  • disregard the capability of the social media data stream altogether: hurts multisensor-fusion capabilities

SLIDE 8

Our work concerns…

…how can we leverage the potential of social media data in multimodal sensing studies of human behavior, while navigating the challenges and limitations of acquiring social media data?

SLIDE 9

Our work contributes…

… a statistical framework to impute missing social media features by learning individuals’ observed behaviors from other passive sensing streams (Bluetooth beacons, wearables, and smartphone sensors).

SLIDE 10

Social Ecological Framework

Human behaviors and attributes can be considered to be deeply embedded in the complex interplay between an individual, their relationships, the communities they belong to, and societal factors+.

+Ralph Catalano. 1979. Health, behavior and the community: An ecological perspective. Pergamon Press New York.

SLIDE 11

The Tesserae Project

757 Participants

By leveraging passive sensors, this study aims at proactively identifying changes in an individual that may impact their wellbeing and job performance

Data streams: Wearable, Smartphone, BT Beacon, Social Media, Surveys

SLIDE 12

Data and Problem (Predicting Psychological Attributes)

  • 603 participants with physical sensor (Bluetooth, Smartphone, and Wearable) data
  • 496 participants with social media (Facebook) data (~82% of the dataset)

Therefore,

  • to include all the participants, we can only incorporate the physical sensor features, or,
  • to include all the sensor modalities, we can only include a subset of participants.

Imputing missing social media features helps us use all sensors and all participants’ data

SLIDE 13

Feature Engineering

  • Features known in theory to be predictive of psychological constructs (personality traits, affect)
  • Physical Sensor Features: heart rate, heart rate variability, sleep, stress, step count, physical activity, mobility, phone use, call use, work duration (130 raw features)
  • Social Media Features: psycholinguistic attributes (LIWC), top n-grams, sentiment, social capital (number of check-ins, engagement, activity with friends, etc.) (5,077 raw features)

SLIDE 14

Feature Engineering (Selection & Transformation)

Sensor Derived Features (130): HRV, Stress, Fitness, Sleep, Phy. Activity, Location, Phone Use, Desk Activity

Social Media Derived Features (5,077): LIWC + N-grams + Sentiment + Social Capital

Pipeline (feature counts after each step: sensor / social media):

  • Derived features: 130 / 5,077
  • Feature selection on coefficient of variation (cv): drop features with cv > (mean + 2*stdev.): 124 / 4,806
  • Feature selection on pairwise correlation (r): drop features from those pairs that show |r| > 0.8: 51 / 3,716
  • Feature transformation with Principal Component Analysis (PCA): transform features to n components based on explained variance: 30 / 200

Output: Sensor Transformed Features (30) and Social Media Transformed Features (200)
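The three steps above can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the study's code. The function names are hypothetical, and two details are assumptions: the slide does not say which feature of a correlated pair is dropped (here the later column is dropped), and the number of PCA components is passed in rather than chosen from explained variance.

```python
import numpy as np

def drop_high_cv(X):
    """Drop columns whose coefficient of variation exceeds mean + 2 * stdev of all cvs."""
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))  # assumes no column has zero mean
    keep = cv <= cv.mean() + 2 * cv.std()
    return X[:, keep]

def drop_correlated(X, threshold=0.8):
    """For each pair with |r| > threshold, keep the earlier column (an assumption)."""
    r = np.corrcoef(X, rowvar=False)
    n = X.shape[1]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            if keep[j] and abs(r[i, j]) > threshold:
                keep[j] = False
    return X[:, keep]

def pca_transform(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # explained variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic stand-in for a feature matrix (rows: participants, columns: features).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(100, 6))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=100)  # near-duplicate of column 0

X_sel = drop_correlated(drop_high_cv(X))
Z, explained = pca_transform(X_sel, n_components=3)
```

In practice, n_components would be chosen by inspecting the cumulative sum of the explained variance ratios.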

SLIDE 15

Imputing Social Media Features

[Figure: Pearson's correlation between Sensor Features and Facebook Features; scale from −0.2 to 0.2]

Pearson’s correlation (r) ranges between -0.21 and 0.22, indicating weak correlation.

Can sensor features predict social media features?
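A cross-correlation check of this kind can be sketched as below. This is an illustrative computation on random data, assuming both feature sets are matrices with one row per participant; the function name is hypothetical.

```python
import numpy as np

def cross_correlation(A, B):
    """Pearson's r between every column of A and every column of B."""
    A_z = (A - A.mean(axis=0)) / A.std(axis=0)
    B_z = (B - B.mean(axis=0)) / B.std(axis=0)
    return A_z.T @ B_z / len(A)

rng = np.random.default_rng(0)
sensor = rng.normal(size=(200, 4))   # stand-in for sensor features
social = rng.normal(size=(200, 6))   # stand-in for social media features
r = cross_correlation(sensor, social)
r_min, r_max = r.min(), r.max()      # range of pairwise correlations
```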

SLIDE 16

Imputing Social Media Features: Methods

For each of the 200 social media transformed features, we build a separate model that:

  • uses the sensor transformed features as the independent variables and
  • predicts the corresponding social media transformed feature as the dependent variable.
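The one-model-per-feature setup can be sketched as follows. To stay dependency-light, this sketch swaps in ordinary least squares as the regressor; it is a stand-in, not the model used in the study. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_per_feature_models(X_sensor, X_social):
    """Fit one least-squares model (with intercept) per social media feature."""
    A = np.column_stack([np.ones(len(X_sensor)), X_sensor])
    models = []
    for k in range(X_social.shape[1]):
        w, *_ = np.linalg.lstsq(A, X_social[:, k], rcond=None)
        models.append(w)
    return models

def impute(models, X_sensor):
    """Predict every social media feature from sensor features."""
    A = np.column_stack([np.ones(len(X_sensor)), X_sensor])
    return np.column_stack([A @ w for w in models])

# Synthetic participants who have both data streams.
rng = np.random.default_rng(0)
X_sensor = rng.normal(size=(80, 4))
W_true = rng.normal(size=(4, 3))
X_social = X_sensor @ W_true + rng.normal(scale=0.01, size=(80, 3))

models = fit_per_feature_models(X_sensor, X_social)

# Participants with sensor data only: impute their social media features.
X_sensor_new = rng.normal(size=(20, 4))
X_social_imputed = impute(models, X_sensor_new)
```

Keeping a separate model per target feature lets each one be tuned or swapped independently, at the cost of fitting as many models as there are social media features.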

SLIDE 17

Imputing Social Media Features: Results

k-fold cross-validation and pooled accuracy (Pearson’s correlation (r) between actual and predicted features)

GBR (Gradient Boosted Random Forest Regression) performs the best: mean r = 0.78

[Figure: distribution of correlation between actual and predicted features; axes: Correlation, # Components; mean: 0.78]
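One way to read "pooled accuracy" is to collect the held-out predictions from all folds and compute a single Pearson's r against the actual values. The sketch below follows that reading, which is an assumption, as are k=5 and the least-squares stand-in model; the function name is hypothetical.

```python
import numpy as np

def kfold_pooled_r(X, y, k=5, seed=0):
    """Pearson's r between y and out-of-fold predictions pooled over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y), dtype=float)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # all indices not in the held-out fold
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        w, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(len(fold)), X[fold]])
        pred[fold] = A_test @ w          # predictions only for held-out rows
    return np.corrcoef(y, pred)[0, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
r_pooled = kfold_pooled_r(X, y)
```

Repeating this for each of the 200 target features and averaging the resulting r values would give a summary comparable in form to the reported mean.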

SLIDE 18

Is Imputation Effective?

PREDICTING PSYCHOLOGICAL CONSTRUCTS WITH MULTIMODAL SENSING DATA

SLIDE 19

Predicting Psychological Constructs with Multisensor Data

Notation: X = Sensor Transformed Features, X’ = Social Media Transformed Features, Y = Psychological Constructs. Subscript 1 = Participants Type 1 (who have social media data); subscript 2 = Participants Type 2 (who do not have social media data). "A : B" denotes a model that uses A to predict B.

Actual Feature Set: X1, X’1 (Type 1); X2 (Type 2)

Final Feature Set: X1, X’1 (Type 1); X2, X’2 (Type 2, with X’2 imputed)

Base Models (Who have social media data)

  • S1. X1 : Y1
  • SS1. X1 + X’1 : Y1

Imputation Model

  • Imp. X1 : X’1
  • X’2 : Imp(X2)

Models (Who do not have social media data)

  • S2. X2 : Y2
  • SS2. X2 + X’2 : Y2

Final Models (All participants)

  • S3. (X1 + X2) : (Y1 + Y2)
  • SS3. (X1 + X2) + (X’1 + X’2) : (Y1 + Y2)
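How the feature matrices for the final models S3 and SS3 could be assembled is sketched below. The data are synthetic and the sizes arbitrary; the helper names are hypothetical, and a multi-output least-squares fit stands in for the imputation model.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares fit with intercept; Y may have several columns."""
    A = np.column_stack([np.ones(len(X)), X])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W

def predict_linear(W, X):
    return np.column_stack([np.ones(len(X)), X]) @ W

rng = np.random.default_rng(0)
n1, n2, d_sensor, d_social = 40, 10, 3, 5

# Type 1: sensor (X1) and social media (X1p) features observed.
X1 = rng.normal(size=(n1, d_sensor))
X1p = X1 @ rng.normal(size=(d_sensor, d_social)) + rng.normal(scale=0.1, size=(n1, d_social))
Y1 = rng.normal(size=n1)

# Type 2: sensor features only.
X2 = rng.normal(size=(n2, d_sensor))
Y2 = rng.normal(size=n2)

# Imputation model: learn X1 : X'1 on Type 1, then X'2 = Imp(X2).
imp = fit_linear(X1, X1p)
X2p = predict_linear(imp, X2)

# Final models over all participants.
X_all = np.vstack([X1, X2])
Y_all = np.concatenate([Y1, Y2])
S3_features = X_all                                       # sensors only
SS3_features = np.hstack([X_all, np.vstack([X1p, X2p])])  # sensors + social media
```

Type 1 rows keep their observed social media features; only Type 2 rows carry imputed values.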

SLIDE 20

Predicting Psychological Constructs with Multisensor Data

We evaluate all our prediction models using three kinds of algorithms:

  • Linear Regression
  • Gradient Boosted Regression
  • Neural Network Regression

The above algorithms cover a broad spectrum of algorithm families

SLIDE 21

Effectiveness of Social Media Feature Imputation

SMAPE comparing three models that use physical sensor features vs. those that use sensor and imputed features to predict psychological constructs on the entire dataset

Outcome: Imputed Social Media Features Improve Predictions

[Figure: three panels (Linear Regression, Gradient Boosted Regression, Multilayer Perceptron) showing SMAPE % for extraversion, agreeableness, conscientiousness, neuroticism, openness, pos.affect, neg.affect, and stai.trait; series: Sensors+Imputed FB vs. Sensors]
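SMAPE (symmetric mean absolute percentage error) has several variants in common use. The sketch below uses one of them, with the mean of |actual| and |predicted| in the denominator; the slides do not say which variant was used, so this choice is an assumption. Lower values indicate better predictions.

```python
import numpy as np

def smape(y_true, y_pred):
    """SMAPE in percent: mean of |pred - true| / ((|true| + |pred|) / 2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Treat a term as 0 when both actual and predicted values are 0.
    ratio = np.divide(np.abs(y_pred - y_true), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return 100.0 * ratio.mean()

perfect = smape([1.0, 2.0], [1.0, 2.0])
off_by_ten = smape([100.0], [110.0])  # 100 * 10 / 105
```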

SLIDE 22

Robustness Against Other Imputation Approaches

SMAPE comparing prediction models that use sensor features vs. those that use sensor and mean- / random- imputed features

Outcome: Mean / Random Imputation does not improve (and can even degrade) predictions

[Figure: two panels (Mean Imputation, Random Imputation) showing SMAPE % for extraversion, agreeableness, conscientiousness, neuroticism, openness, pos.affect, neg.affect, and stai.trait; series: Sensors+Imputed FB vs. Sensors]
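The two baselines can be sketched as follows. The slides do not define them, so both definitions here are assumptions: mean imputation fills each missing feature with its mean over participants who have it, and random imputation draws each missing value from that feature's observed values. Function names are hypothetical.

```python
import numpy as np

def mean_impute(X_obs, n_missing):
    """Fill every missing row with the column means of the observed rows."""
    return np.tile(X_obs.mean(axis=0), (n_missing, 1))

def random_impute(X_obs, n_missing, seed=0):
    """Fill each missing cell with a random observed value from the same column."""
    rng = np.random.default_rng(seed)
    n_obs, d = X_obs.shape
    rows = rng.integers(0, n_obs, size=(n_missing, d))
    return X_obs[rows, np.arange(d)]  # picks X_obs[rows[i, j], j]

rng = np.random.default_rng(0)
X_obs = rng.normal(size=(30, 4))      # social media features of observed participants
X_mean = mean_impute(X_obs, n_missing=5)
X_rand = random_impute(X_obs, n_missing=5)
```

Neither baseline uses any information about the individual, which is the contrast with imputing from that person's sensor features.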

SLIDE 23

Discussion

  • Contribution: A framework to impute social media features in longitudinal and large-scale multimodal sensing studies of human behavior
  • Theoretically situated in the Social Ecological Model
  • Similar approach can be applied for other sensors
  • Ethics: Should imputation be done on those individuals who do not want to share their social media data?

SLIDE 24

Ethics

  • Latent dimensions do not necessarily translate to social media activity or behavior
  • Caution against the use as a means to surveil
  • Should imputation be done on those individuals who do not want to share their social media data?

SLIDE 25

This research is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2017-17042800007.

Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior

Thank You @kous2v| koustuv.saha@gatech.edu | koustuv.com

Saha, K., et al. 2019. Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior. In Proceedings of International Conference on Affective Computing and Intelligent Interaction (ACII 2019). http://koustuv.com/papers/ACII19_SM_Imputation.pdf