SLIDE 1

Social and Cognitive Dynamics with Natural Data

Rick Dale and David W. Vinson
Cognitive & Information Sciences
University of California, Merced
cognaction.org

SCHOOL OF SOCIAL SCIENCES HUMANITIES AND ARTS


Plan (15:00-16:20)

  • 10 minutes - theoretical preamble
  • 20 minutes - text as a source of dynamics
  • 20 minutes - bridging variables in rich social data
  • 10 minutes - other tools to share
  • 20 minutes - questions and hands-on play

Install R and RStudio, linked from here:

tinyurl.com/icps-workshop

Dynamics, Complexity

i’m attending icps for the first time and so far it is quite exciting and even as i present this workshop right now i’m missing some other fascinating workshops and talks… but i shouldn’t remind you here of that either!

Dynamics: structure in time
Complexity: interdependent processes

SLIDE 2

“Big Data” Lots of Data

Natural data in large volumes, and sometimes at high velocity and variety, plus issues of validity (the “Four V’s”).

cognaction.org/rick/icps-workshop

Text as Dynamics

Laura Allen, ASU (McNamara lab; Ph.D. student)
Nick Duran, ASU (Assistant Professor)

SLIDE 3

General Strategy

  • Treat text as a source of temporal patterns of behavior
  • Transform the data in a manner that can be subjected to new, emerging dynamic analysis

recurrence quantification analysis (RQA)

Recurrence Plot (RP)

Example text: All I really need is a song in my heart / Food in my belly and love in my family / All I really need is a song in my heart / And love in my family

[Figure: recurrence plot of the example text against itself. Both axes: Time (letter), 20 to 140. Labels on the plot: line of identity; points (19,99), (116,116), (65,6).]
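A recurrence plot of a text can be sketched in a few lines. The workshop's own materials use R; the Python below is only an illustration, and the preprocessing (lowercasing, keeping spaces as symbols) and the function name `recurrence_points` are assumptions, not taken from the slides. A point (i, j) is plotted whenever the character at time i equals the character at time j.

```python
def recurrence_points(sequence):
    """Return all 1-indexed (i, j) pairs where sequence[i] equals sequence[j]."""
    points = []
    for i, a in enumerate(sequence, start=1):
        for j, b in enumerate(sequence, start=1):
            if a == b:
                points.append((i, j))
    return points


lyrics = (
    "All I really need is a song in my heart "
    "Food in my belly and love in my family "
    "All I really need is a song in my heart "
    "And love in my family"
)

# Assumption: treat every lowercase character (spaces included) as one time step.
characters = list(lyrics.lower())
points = set(recurrence_points(characters))

# Every time step recurs with itself, which forms the line of identity.
on_identity = [(i, i) in points for i in range(1, len(characters) + 1)]
```

Because a sequence compared with itself is symmetric, the plot mirrors across the line of identity; repeated phrases show up as diagonal line segments away from it.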

Temporal Patterns

[Figure: recurrence plots for two texts, labelled Lyrics and Quote. Both axes in each panel: Time (letter).]
SLIDE 4

RQA Measures: %REC (or, RR)

Recurrence rate (%REC): Total percentage of the plot occupied by points.

http://www.recurrence-plot.tk

[Figure: recurrence plots for the two texts, both axes Time (letter). %REC values shown: 7.0% and 8.2% (panel labels, in slide order: Quote, Lyrics).]

RQA Measures: %DET (or, DET)

Percent determinism (%DET): Percentage of the points on the plot that fall on diagonal lines (length > 1).

[Figure: recurrence plots for the two texts, both axes Time (letter). %DET values shown: 18.1% and 26.4% (panel labels, in slide order: Quote, Lyrics).]
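The two measures defined above can be computed directly from a symbol sequence. This is a minimal Python sketch, not the crqa implementation used in the workshop (which is in R). Two choices here are assumptions: the line of identity is excluded from both counts, and a "diagonal line" means a run of at least `min_line` adjacent recurrent points along a diagonal. The function name `rqa_measures` is hypothetical.

```python
def rqa_measures(sequence, min_line=2):
    """Return (%REC, %DET) for a categorical sequence compared with itself."""
    n = len(sequence)
    # Recurrence matrix; the line of identity (i == j) is excluded (assumption).
    rec = [[i != j and sequence[i] == sequence[j] for j in range(n)]
           for i in range(n)]
    total_recurrent = sum(sum(row) for row in rec)

    # Walk every off-center diagonal and count points in runs of >= min_line.
    on_lines = 0
    for offset in range(-(n - 1), n):
        if offset == 0:
            continue
        run = 0
        for i in range(max(0, -offset), min(n, n - offset)):
            if rec[i][i + offset]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
        if run >= min_line:
            on_lines += run

    possible = n * n - n
    rec_rate = 100.0 * total_recurrent / possible if possible else 0.0
    det = 100.0 * on_lines / total_recurrent if total_recurrent else 0.0
    return rec_rate, det


# Usage: a strictly alternating sequence is fully "deterministic".
rec_rate, det = rqa_measures("abab")
```

For `"abab"`, 4 of the 12 off-diagonal cells are recurrent and all of them sit on diagonal lines of length 2, so %DET is 100. Different conventions (keeping the line of identity, other minimum line lengths) give different absolute numbers.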

TASA Corpus (Touchstone Applied Science Associates, Inc.)

  • 37,808 paragraphs from books and textbooks
  • Each approximately 300-400 words
  • Language Arts, Science, Social Studies
  • Includes reading difficulty score

Do word-stem dynamics reflect genre and reading difficulty?

Mean Recurrence Rate (RR)

[Figure: Mean RR (%), axis from 1 to 2.1, for Overall, Easy, and Difficult texts; series: language arts, social studies, science. Annotations: ****, R² = 10%.]

SLIDE 5

Local vs. Global Cohesion?

[Figure: Mean DET (%) plotted against window size, and Mean DET (%), axis from 16 to 24, for Overall, Easy, and Difficult texts; series: language arts, social studies, science. Annotation: ****.]

Text as Dynamics


Recurrent patterns in word usage could be predictive of how texts support learning; this interacts with genre…

SLIDE 6

Yelp, Inc. Dataset

Dave Vinson, Ph.D. student, UCM; IBM Ph.D. Fellow

General Strategy

  • Fuse measures of cognition with context variables
  • Sample from the large data set, in a manner that permits exploration of interdependence

information and graph theory

Adam T.’s network

Yelp Data Format (JSON)

{
  'type': 'user',
  'user_id': (encrypted user id),
  'friends': [(friend user_ids)],
  'stars': (star rating, 1-5),
  'text': (review text),
  'name': (first name),
  'review_count': (review count),
  'votes': {(vote type): (count)},
  'fans': (num_fans),
  …
}

Highlighted on the slide: Friends, Text
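To show how the `friends` field can become a network, here is a hedged Python sketch (the workshop's own code is in R). It assumes one JSON object per line with double-quoted keys, as valid JSON requires; the sample records, user ids, and the helper names `friend_edges` and `degree` are all hypothetical and only illustrate the shape of the data.

```python
import json

# Hypothetical sample records, one JSON object per line.
sample_lines = [
    '{"type": "user", "user_id": "u1", "name": "A", "friends": ["u2", "u3"]}',
    '{"type": "user", "user_id": "u2", "name": "B", "friends": ["u1"]}',
    '{"type": "user", "user_id": "u3", "name": "C", "friends": ["u1", "u2"]}',
    '{"type": "review", "user_id": "u1", "stars": 4, "text": "Great lunch."}',
]


def friend_edges(lines):
    """Collect undirected friendship edges from user records."""
    edges = set()
    for line in lines:
        record = json.loads(line)
        if record.get("type") != "user":
            continue  # skip reviews and any other record types
        for friend in record.get("friends", []):
            if friend != record["user_id"]:
                # Sort so (a, b) and (b, a) are stored as the same edge.
                edges.add(tuple(sorted((record["user_id"], friend))))
    return edges


def degree(edges):
    """Count how many edges touch each user."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


edges = friend_edges(sample_lines)
degrees = degree(edges)
```

Edge counts like these are the kind of context variable that can then be related to measures computed from each user's review text.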

SLIDE 7

Lexical Richness

High richness (high ACI): … unprecedented photography, modern and contemporary art conceived before World War II, to American and European fashion designs portraying the 18th century. Fortunately for the flexible museum hours, one may lolly-gag through the corridors on Wednesday ...one of the most captivating subjects of art... juxtaposes racist caricatures such as Malcolm X and Martin Luther King Jr. Interestingly, this mixed media, graffiti style display withheld a deep meaning of America's religious fervor and external cultural adolescence….

Low richness (low ACI): … This is a great place for lunch and dinner. The food is great, the price is good and the service is friendly and quick.

$$\mathrm{ACI}_j = -\frac{1}{N-1} \sum_{i=2}^{N} \log_2 p(w_i \mid w_{i-1})$$
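The ACI formula can be made concrete with a small Python sketch (illustrative only; the workshop's code is in R). The slides do not say how p(w_i | w_{i-1}) is estimated, so the bigram counts from a toy corpus and the add-alpha smoothing below are assumptions, as are the function names `train_bigram_model` and `aci`.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count bigrams and their left-hand contexts over tokenized texts."""
    contexts = Counter()
    bigrams = Counter()
    for words in corpus:
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return contexts, bigrams


def aci(words, contexts, bigrams, vocab_size, alpha=1.0):
    """Average conditional information in bits per word.

    Assumption: add-alpha smoothing keeps unseen bigrams from having p = 0.
    """
    n = len(words)
    if n < 2:
        raise ValueError("ACI needs at least two words")
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        p = (bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * vocab_size)
        total += math.log2(p)
    return -total / (n - 1)


# Toy corpus (hypothetical), already tokenized.
corpus = [["the", "food", "is", "great"], ["the", "food", "is", "good"]]
contexts, bigrams = train_bigram_model(corpus)
vocab_size = len({w for words in corpus for w in words})

predictable = aci(["the", "food", "is", "great"], contexts, bigrams, vocab_size)
surprising = aci(["great", "the", "good", "is"], contexts, bigrams, vocab_size)
```

The familiar word order yields fewer bits per word than the scrambled one, which matches the reading of ACI as higher for less predictable, lexically richer text.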

[Figure: histogram of AUI (bits / word), from 9 to 12 bits; vertical axis: Frequency.]

[Figure: Residual ACI and Edges (z-score), comparing Baseline with True Network; regions labelled community innovation and community alignment.]

Social context may modulate subtle aspects of communication strategy, measurable in large natural data.

2017

SLIDE 8

Other Resources

Focus on libraries designed for larger datasets. Among many, dplyr is a clear example (for managing, filtering, and transforming large data frames). Two from my lab: crqa and cmscu.

crqa

  • Led by Moreno Coco, a version of tools for conducting RQA on categorical and continuous time series.
  • With help of the R community, crqa is almost twice as fast as its standard comparison toolbox, in MATLAB.
  • See paper by Coco & Dale (2014) for summary (linked on workshop website).

cmscu

  • Fast and memory-efficient way of generating frequency tables for n-grams.
  • Vinson, Davis, Sindi & Dale (2016) describe how you can be certain of arbitrarily small error in estimating frequency of (say) unigrams and bigrams in large corpora.
  • Quick deployment on corpora, and can be used to estimate wide variety of statistical and information-theoretic measures.

[Figure: cmscu vs. tm]

http://languagegoldmine.com
http://www.dataonthemind.org
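The idea of memory-efficient approximate n-gram counting can be illustrated with a count-min sketch using conservative update, which is what the name cmscu appears to abbreviate; that reading is an assumption here. The Python class below is a generic sketch of the technique, not the package's R interface or its implementation, and the `width` and `depth` defaults are arbitrary.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: fixed memory, never underestimates a count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One cell per row, chosen by a hash salted with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=str(row).encode()
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item):
        cells = list(self._cells(item))
        # Conservative update: raise only the cells at the current minimum.
        new_value = min(self.table[r][c] for r, c in cells) + 1
        for r, c in cells:
            if self.table[r][c] < new_value:
                self.table[r][c] = new_value

    def query(self, item):
        return min(self.table[r][c] for r, c in self._cells(item))


# Usage: count bigrams from a short tokenized text.
words = "the food is great the price is good the food is good".split()
sketch = CountMinSketch()
true_counts = {}
for pair in zip(words, words[1:]):
    key = " ".join(pair)
    sketch.add(key)
    true_counts[key] = true_counts.get(key, 0) + 1
```

A wider table lowers the chance of collisions, and therefore the size of any overestimate, at the cost of memory; the total memory stays fixed no matter how many distinct n-grams the corpus contains.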

SLIDE 9

Some Code

Part 1, Part 2

tinyurl.com/icps-workshop

Thanks

David Vinson (UC Merced)
Laura Allen (ASU)
Nick Duran (ASU)
Moreno Coco (Edinburgh)
Alexandra Paxton (UC Berkeley)
Danielle McNamara (ASU)

BCS-0826825 BCS-1344279
