Sequential data analysis with TraMineR, Part 1 (Gilbert Ritschard)



slide-1
SLIDE 1

Sequential data analysis with TraMineR, Part 1

Gilbert Ritschard

Department of Econometrics and Laboratory of Demography
University of Geneva
http://mephisto.unige.ch/biomining

APA-ATI Workshop on Exploratory Data Mining
University of Southern California, Los Angeles, CA, July 2009

8/7/2009gr 1/66

slide-2
SLIDE 2

Outline

  1. Introduction
  2. Concepts and definitions
  3. Rendering and summarizing state sequences

slide-4
SLIDE 4

Section outline

  1. Introduction
       • Objectives
       • Overview of what you will learn

slide-5
SLIDE 5

Objectives

  • Concepts and questions about sequential categorical data
  • Types of sequences: with or without time content; states, transitions, events
  • Principles of sequence analysis
       - exploratory approaches
       - more causal and predictive approaches
  • Practice of sequence analysis (TraMineR)

slide-9
SLIDE 9

The research project

Course mainly based on results of the Swiss National Science Foundation (NSF) project
"Mining event histories: Towards new insights on personal Swiss life courses"

  • Project FN 100012-113998 and FN-100015-122230
  • Start: February 1, 2007; end: January 31, 2011
  • Gilbert Ritschard, main applicant
  • Eric Widmer, professor of Sociology, co-applicant
  • Alexis Gabadinho, Demography
  • Nicolas S. Müller, Sociology, Computer science
  • Matthias Studer, Economics, Sociology

slide-11
SLIDE 11

Sequential data analysis - 1 Introduction Overview of what you will learn

Rendering sequences

8/7/2009gr 8/66

slide-12
SLIDE 12

Characterizing a set of sequences

  • Sequence of transversal measures (modal state, between entropy, ...), computed column by column:

      id   t1   t2   t3   ···
      1    B    B    D    ···
      2    A    B    C    ···
      3    B    B    A    ···

  • Summary of longitudinal measures (within entropy, transition rates, mean duration, ...), computed row by row on the same table

  • Other global characteristics: centro-type sequence, diversity of sequences, ...
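The distinction between transversal and longitudinal measures can be sketched on the three toy sequences of this slide. The code below is a minimal base R illustration, not TraMineR's own implementation; the object names (`seqs`, `modal_state`, `time_in_state`) are chosen here for the example only.

```r
# Toy state sequences from the slide (3 cases, positions t1 to t3).
seqs <- rbind(c("B", "B", "D"),
              c("A", "B", "C"),
              c("B", "B", "A"))
dimnames(seqs) <- list(id = 1:3, position = c("t1", "t2", "t3"))

# Transversal measure: modal state at each position (one value per column).
# Ties are resolved in favour of the first state in alphabetical order.
modal_state <- apply(seqs, 2, function(col) names(which.max(table(col))))

# Longitudinal measure: number of positions each case spends in each state
# (one row of counts per sequence).
states <- sort(unique(as.vector(seqs)))
time_in_state <- t(apply(seqs, 1, function(row) {
  as.vector(table(factor(row, levels = states)))
}))
colnames(time_in_state) <- states

print(modal_state)
print(time_in_state)
```

The transversal result has one entry per time position, while the longitudinal result has one row per case, which is the contrast the slide draws.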

slide-15
SLIDE 15

Mean time in each state

[Figure: mean time in years spent in each state, shown for men and women. States: Missing, Full time, Part time, Neg. break, Pos. break, At home, Retired, Education.]

slide-16
SLIDE 16

Transition rates

              [-> 0]  [-> 1]  [-> 2]  [-> 3]  [-> 4]  [-> 5]  [-> 6]  [-> 7]
  Missing      0.969   0.005   0.004   0.001   0.001   0.011   0.000   0.008
  Full time    0.003   0.971   0.009   0.001   0.001   0.013   0.000   0.003
  Part time    0.005   0.026   0.939   0.001   0.001   0.018   0.000   0.010
  Neg. break   0.040   0.047   0.027   0.880   0.000   0.007   0.000   0.000
  Pos. break   0.105   0.316   0.105   0.000   0.404   0.018   0.000   0.053
  At home      0.003   0.007   0.032   0.000   0.000   0.956   0.000   0.002
  Retired      0.000   0.000   0.000   0.000   0.000   0.000   1.000   0.000
  Education    0.044   0.236   0.045   0.001   0.002   0.006   0.000   0.664
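A transition rate table of this kind gives, for each origin state (row), the share of observed moves to each destination state (column) between consecutive positions, so each row sums to 1. The base R sketch below shows the computation on a small made-up data set; the state codes FT, PT and ED and the object names are assumptions for the example. TraMineR provides `seqtrate()` for state sequence objects.

```r
# Toy state sequences (one row per case, one column per time point).
seqs <- rbind(c("FT", "FT", "PT", "PT"),
              c("ED", "FT", "FT", "FT"),
              c("ED", "ED", "FT", "PT"))

states <- sort(unique(as.vector(seqs)))
# Pair each state at position t with the state of the same case at t + 1.
from <- factor(as.vector(seqs[, -ncol(seqs)]), levels = states)
to   <- factor(as.vector(seqs[, -1]), levels = states)

counts <- table(from, to)         # number of observed t -> t + 1 transitions
rates  <- prop.table(counts, 1)   # divide each row by its total
print(round(rates, 3))
```

Here three of the five transitions out of FT stay in FT, so the FT row starts with a rate of 0.6 on the diagonal.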

slide-17
SLIDE 17

Heterogeneity: Sequence of transversal entropies

[Figure: two panels, "Cohabitational Trajectories" and "Occupational Trajectories", each plotting entropy against age (A20 to A44) for the cohorts 1910-1924, 1925-1945 and 1946-1957.]

slide-18
SLIDE 18

Longitudinal entropy

[Figure: two panels, "Men: Occupational Trajectories" and "Women: Occupational Trajectories", each showing longitudinal entropy (scale 0.0 to 0.7) for the cohorts 1910-1924, 1925-1945 and 1946-1957.]

slide-19
SLIDE 19

Dissimilarities between pairs of sequences

  • Distance between sequences
       - Different metrics (LCP, LCS, OM)
  • Once we have pairwise dissimilarities, we can
       - Determine a central sequence (centro-type)
       - Measure the discrepancy between sequences
       - Cluster a set of sequences
       - Represent the sequences in an MDS scatterplot
       - Analyze the heterogeneity of a set of sequences (ANOH)
       - Run a dissimilarity analysis (induction trees)
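To make two of the metrics named above concrete, the following base R sketch computes LCP (longest common prefix) and LCS (longest common subsequence) dissimilarities for a pair of sequences, using the usual form d(x, y) = |x| + |y| - 2 A(x, y), where A is the length of the common prefix or subsequence. The helper names are hypothetical and the code is only an illustration; in TraMineR such distances are obtained with `seqdist()`. Optimal matching (OM) is not covered here.

```r
# Length of the longest common prefix of two state vectors.
lcp_length <- function(x, y) {
  n <- min(length(x), length(y))
  if (n == 0) return(0)
  mismatch <- which(x[seq_len(n)] != y[seq_len(n)])
  if (length(mismatch) == 0) n else mismatch[1] - 1
}

# Length of the longest common subsequence (classic dynamic programming).
lcs_length <- function(x, y) {
  m <- matrix(0, length(x) + 1, length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      if (x[i] == y[j]) {
        m[i + 1, j + 1] <- m[i, j] + 1
      } else {
        m[i + 1, j + 1] <- max(m[i, j + 1], m[i + 1, j])
      }
    }
  }
  m[length(x) + 1, length(y) + 1]
}

# Turn a common-attribute length A(x, y) into a dissimilarity.
as_distance <- function(x, y, len_fun) {
  length(x) + length(y) - 2 * len_fun(x, y)
}

x <- c("S", "S", "M", "MC", "MC")
y <- c("S", "M", "M", "MC", "D")
d_lcp <- as_distance(x, y, lcp_length)  # common prefix "S" has length 1
d_lcs <- as_distance(x, y, lcs_length)  # common subsequence S, M, MC has length 3
print(c(LCP = d_lcp, LCS = d_lcs))
```

The two sequences share only their first state as a prefix but three states as a subsequence, so the LCP dissimilarity (8) is much larger than the LCS one (4).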

slide-23
SLIDE 23

Cluster analysis: determining typologies

[Figure: five panels of state frequencies (0.0 to 1.0) by age (A20 to A44), one per type.]

  Type   Label                                  Share   n
  1      Full Time Trajectories                 53 %    795
  2      Mixed Part Time - Home Trajectories    13 %    155
  3      At Home Trajectories                   16 %    277
  4      Part Time Trajectories                  7 %    101
  5      Missing Data                           11 %    175

Legend: Missing, Full time, Part time, Negative break, Positive break, At home, Retired, Education

slide-24
SLIDE 24

Event sequences: discriminating sub-sequences

[Figure: two panels, "man" and "woman", with one bar per sub-sequence (scale 0.0 to 0.6) and a Pearson residuals legend (-4, -2, neutral, 2, 4).]

Sub-sequences shown in both panels:

  • (Full time>At home)
  • (Full time)-(Full time>At home)
  • (At home>Part time)
  • (Full time>At home)-(At home>Part time)
  • (Full time)-(At home>Part time)
  • (Full time>Part time)
  • (Full time)-(Full time>At home)-(At home>Part time)
  • (Full time)-(Full time>Part time)
  • (Education>Full time)
  • (Education)
  • (Education)-(Education>Full time)
  • (Part time>Full time)
  • (Full time)
  • (Missing)

slide-25
SLIDE 25

What you will not find in this course ...

  • Transition analysis by means of Markovian and other statistical models
       - for Markovian models, see for instance Berchtold and Raftery (2002)
  • Survival analysis
       - e.g. Hosmer and Lemeshow (1999), Hothorn et al. (2006)
  • Determination of association rules between sub-sequences
       - Rarely considered in the literature! (NSM is working hard on it!!)

slide-29
SLIDE 29

Section outline

  2. Concepts and definitions
       • Definitions and types of sequences
       • Some examples
       • Alternative sequence data organizations

slide-30
SLIDE 30

Sequence

Definition:

  • Alphabet A: finite set
  • Sequence of length k: ordered list of k successively chosen elements of A

Examples:

  • Text: A = set of letters, but can also be a set of words, of n-grams, ...
  • Biology: A = set of nucleotides, of proteins, ...
  • On-off signals: A = {0, 1}
  • Buying behaviors: A = set of items
  • Life course: A = set of considered cohabitation states, types of occupation, ...

slide-31
SLIDE 31

Sequences: notations

Sequence $x$ of length $k$:

  • $x = (x_1, x_2, \ldots, x_k)$
  • If no ambiguity: $x = x_1 x_2 \cdots x_k$
  • A separator is necessary when A includes a composite symbol
    (e.g. S single, M married, MC married with child: S-S-M-M-MC-MC-MC)

slide-32
SLIDE 32

Types of sequences

The nature of a sequence depends on

  • the information conveyed by position j in the sequence
       - Temporal dimension?
  • the nature of the elements of the alphabet
       - objects or changes
       - states, transitions or events

                         Temporal dimension
  Alphabet               No                              Yes
  Objects/States         Object sequence                 State sequence
  Transitions/Events     (sequence of object changes)    Event sequence

slide-36
SLIDE 36

Ontology of chronological data (Aristotelian tree)

[Figure: tree diagram classifying longitudinal data by successive yes/not splits. Node labels, in order of appearance: Longitudinal data; States; one state per time unit t; several states at each t; Events; time stamped events; event sequence; spell duration.]

slide-38
SLIDE 38

Alternative views of chronological sequences

Table: Time stamped events, record for Sandra

  ending secondary school   in 1970
  first job                 in 1971
  marriage                  in 1973

Table: State sequence view, Sandra

  year               1969      1970        1971        1972        1973
  marital status     single    single      single      single      married
  education level    primary   secondary   secondary   secondary   secondary
  job                no        no          first       first       first

slide-40
SLIDE 40

Transforming time stamped events into state sequences

Example: the "BioFam" data

  • Data from the retrospective survey conducted in 2002 by the Swiss Household Panel (SHP) (with support of the Federal Statistical Office, the Swiss National Fund for Scientific Research and the University of Neuchâtel)
  • Retrospective survey: 5560 individuals
  • Retained familial life events: Leaving Home, First childbirth, First marriage and First divorce
  • Age 15 to 45 → 2601 remaining individuals, born between 1909 and 1957

slide-44
SLIDE 44

Deriving the states

Associate one state to each combination of events:

  state   LHome    marriage   childbirth   divorce
  0       no       no         no           no
  1       yes      no         no           no
  2       no       yes        yes/no       no
  3       yes      yes        no           no
  4       no       no         yes          no
  5       yes      no         yes          no
  6       yes      yes        yes          no
  7       yes/no   yes        yes/no       yes

slide-45
SLIDE 45

From events to states

Example of transformation:

  events:

    individual   LHome   marriage   childbirth   divorce
    1            1989    1990       1992         NA

  states:

    individual   ...   1988   1989   1990   1991   1992   1993   ...
    1            ...   0      1      3      3      6      6      ...

Can we automate the transformation of

  • events into states?
  • states into events?
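For the first direction, events into states, a transformation can be sketched in a few lines of base R. The code below assumes the state coding of the "Deriving the states" slide (codes 0 to 7) and treats an event as "yes" from the year it occurs onwards; the function names `state_code()` and `events_to_states()` are hypothetical and are not TraMineR functions.

```r
# Years at which each event occurred for one individual (NA = not observed).
events <- c(LHome = 1989, marriage = 1990, childbirth = 1992, divorce = NA)

# State code from the four yes/no indicators, following the coding table
# of the "Deriving the states" slide.
state_code <- function(lhome, married, child, divorced) {
  if (divorced) return(7)
  if (married && !lhome) return(2)   # married, childbirth yes or no
  if (married && lhome) return(if (child) 6 else 3)
  if (child) return(if (lhome) 5 else 4)
  if (lhome) 1 else 0
}

# One state per year: an event counts as "yes" from its year onwards.
events_to_states <- function(events, years) {
  sapply(years, function(y) {
    occurred <- !is.na(events) & events <= y
    state_code(occurred[["LHome"]], occurred[["marriage"]],
               occurred[["childbirth"]], occurred[["divorce"]])
  })
}

years <- 1988:1993
states <- events_to_states(events, years)
names(states) <- years
print(states)
```

Applied to the individual of the example, this yields state 1 in 1989, state 3 in 1990 and 1991, and state 6 from 1992, in line with the states row of the slide. The reverse direction needs a rule mapping each change of state to one or more events, which is not sketched here.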

slide-48
SLIDE 48

State sequences

Formats supported by TraMineR

  Code     Data type                     Several rows for same case   Usage examples
  STS      State-sequence                No                           Markov modeling, OM
  SPS      State-permanence              No                           Markov modeling, OM
  SSS∗     State-start                   No                           Markov modeling, OM
  SRS      Shifted-replicated-sequence   Yes                          Mobility tree
  DSS      Distinct-state-sequence       No                           OM without time reference
  SPELL    Spell                         Yes                          Survival analysis
  PPER∗    Person-period                 Yes                          Discrete survival analysis

slide-49
SLIDE 49

Formats of state sequences: examples - I

STS

  Id    18   19   20   21   22   23   24   25   26   27
  101   S    S    S    M    M    MC   MC   MC   MC   D
  102   S    S    S    MC   MC   MC   MC   MC   MC   MC

SPS

  Id    1       2        3        4
  101   (S,3)   (M,2)    (MC,4)   (D,1)
  102   (S,3)   (MC,7)

SSS∗

  Id    1        2         3         4
  101   (S,18)   (M,21)    (MC,23)   (D,27)
  102   (S,18)   (MC,21)

SRS

  Id    t-9   t-8   t-7   t-6   t-5   t-4   t-3   t-2   t-1   t
  101   S     S     S     M     M     MC    MC    MC    MC    D
  101   .     S     S     S     M     M     MC    MC    MC    MC
  101   .     .     S     S     S     M     M     MC    MC    MC
  ...
  101   .     .     .     .     .     .     .     .     S     S
  102   S     S     S     MC    MC    MC    MC    MC    MC    MC
  102   .     S     S     S     MC    MC    MC    MC    MC    MC
  ...

DSS

  Id    1   2    3    4
  101   S   M    MC   D
  102   S   MC
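The compact formats can all be derived from the STS form by run-length encoding. The base R sketch below reproduces the SPS, SSS and DSS rows for cases 101 and 102; the helper names (`to_sps`, `to_sss`, `to_dss`) and the "-" separator between runs are choices made for this example. TraMineR's own converter is `seqformat()`, which this sketch does not use.

```r
# STS rows for cases 101 and 102 (ages 18 to 27).
sts_101 <- c("S", "S", "S", "M", "M", "MC", "MC", "MC", "MC", "D")
sts_102 <- c("S", "S", "S", "MC", "MC", "MC", "MC", "MC", "MC", "MC")

# SPS: one (state, duration) pair per run of identical states.
to_sps <- function(states) {
  runs <- rle(states)
  paste0("(", runs$values, ",", runs$lengths, ")", collapse = "-")
}

# SSS: one (state, start time) pair per run.
to_sss <- function(states, first_time) {
  runs <- rle(states)
  starts <- first_time + c(0, cumsum(runs$lengths)[-length(runs$lengths)])
  paste0("(", runs$values, ",", starts, ")", collapse = "-")
}

# DSS: the successive distinct states, durations dropped.
to_dss <- function(states) paste(rle(states)$values, collapse = "-")

print(to_sps(sts_101))
print(to_sss(sts_101, first_time = 18))
print(to_dss(sts_101))
print(to_sps(sts_102))
```

Because DSS drops the durations, it is the only one of the three from which the original STS row cannot be rebuilt.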

slide-50
SLIDE 50

Formats of state sequences: examples - II

SPELL

  Id    Index   From   To   State
  101   1       18     20   Single (S)
  101   2       21     22   Married (M)
  101   3       23     26   Married w Children (MC)
  101   4       27     27   Divorced (D)
  102   1       18     20   Single (S)
  102   2       21     27   Married w Children (MC)

PPER∗

  Id    Index   Age   State
  101   1       18    Single (S)
  101   2       19    Single (S)
  101   3       20    Single (S)
  101   4       21    Married (M)
  ...   ...     ...   ...
  101   10      27    Divorced (D)
  102   1       18    Single (S)
  ...   ...     ...   ...

slide-51
SLIDE 51

Event sequences

Formats supported by TraMineR

  Code     Data type                       Several rows for same case   Usage examples
  FCE∗     Fixed-column-event              No                           Survival analysis
  HTSE∗    Horizontal time-stamped-event   No                           Event sequence mining
  TSE      Vertical time-stamped-event     Yes                          Event sequence mining

slide-52
SLIDE 52

Event sequences: examples

FCE∗

  Id    #marr.   1st marr.   2nd marr.   ···   #child.   1st child   2nd child   ···
  101   1        21          .           .     2         23          26          .
  102   1        21          .           .     1         21          .           .

HTSE∗

  Id    1                2                  3                  ···
  101   (marriage, 21)   (childbirth, 23)   (childbirth, 26)   (divorce, 27)
  102   (marriage, 21)   (childbirth, 21)

TSE

  Id    Time   Event
  101   21     Marriage
  101   23     Childbirth
  101   26     Childbirth
  101   27     Divorce
  102   21     Marriage
  102   21     Childbirth
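The vertical (TSE) and horizontal (HTSE) layouts carry the same information, so one can be derived from the other by grouping rows per case. The base R sketch below rebuilds the HTSE rows from the TSE table of this slide; the data frame `tse` and the column names are assumptions for the example, and the code is not a TraMineR function.

```r
# Vertical time-stamped events (TSE), one row per event.
tse <- data.frame(
  id    = c(101, 101, 101, 101, 102, 102),
  time  = c(21, 23, 26, 27, 21, 21),
  event = c("marriage", "childbirth", "childbirth", "divorce",
            "marriage", "childbirth")
)

# Horizontal layout (HTSE): one string of (event, time) pairs per case,
# ordered by time; simultaneous events keep their original row order.
htse <- sapply(split(tse, tse$id), function(d) {
  d <- d[order(d$time), ]
  paste0("(", d$event, ", ", d$time, ")", collapse = " ")
})
print(htse)
```

Case 102 shows why time stamps matter in event sequences: marriage and childbirth share the same time stamp, a situation a one-state-per-position layout cannot express directly.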

slide-53
SLIDE 53

Outline

  1. Introduction
  2. Concepts and definitions
  3. Rendering and summarizing state sequences

slide-54
SLIDE 54

The ‘mvad’ data set

For illustration, we use the mvad data set (McVicar and Anyadike-Danes, 2002)

  • Data about transition from school to employment in Northern Ireland
  • 712 cases
  • 72 monthly activity statuses (July 1993-June 1999)
  • 14 additional variables
  • The follow-up starts when respondents finished compulsory school.

slide-55
SLIDE 55

mvad variables

  id          unique individual identifier
  weight      sample weights
  male        binary dummy for gender, 1=male
  catholic    binary dummy for community, 1=Catholic
  Belfast     binary dummies for location of school, one of five Education and Library Board areas in Northern Ireland
  N.Eastern   ”
  Southern    ”
  S.Eastern   ”
  Western     ”
  Grammar     binary dummy indicating type of secondary education, 1=grammar school
  funemp      binary dummy indicating father's employment status at time of survey, 1=father unemployed
  gcse5eq     binary dummy indicating qualifications gained by the end of compulsory education, 1=5+ GCSEs at grades A-C, or equivalent
  fmpr        binary dummy indicating SOC code of father's current or most recent job, 1=SOC1 (professional, managerial or related)
  livboth     binary dummy indicating living arrangements at time of first sweep of survey (June 1995), 1=living with both parents
  jul93       monthly activity variables, coded 1-6: 1=school, 2=FE, 3=employment, 4=training, 5=joblessness, 6=HE
  ...         ”
  jun99       ”

slide-56
SLIDE 56

Creating the sequence object

Loading the data set and creating the ‘state sequence’ object

R> library(TraMineR)
R> data(mvad)
R> # columns 17 to 86 hold the 70 monthly statuses used in the examples that follow
R> mvad.lab <- seqstatl(mvad[, 17:86])
R> mvad.shortlab <- c("EM", "FE", "HE", "JL", "SC", "TR")
R> mvad.seq <- seqdef(mvad, 17:86, states = mvad.shortlab,
+     labels = mvad.lab)

slide-57
SLIDE 57

Section outline

  3. Rendering and summarizing state sequences
       • Three basic plots
       • Sequences of transversal summaries
       • Other aggregated summaries
       • Longitudinal characteristics of individual sequences

slide-58
SLIDE 58

i-plot: Plot of individual sequences (A)

The plot of individual sequences (i-plot) visualizes each sequence with a horizontal bar (Scherer, 2001; Brzinsky-Fay et al., 2006).

i-plot of the first 10 sequences (mvad data)

R> seqiplot(mvad.seq, cex.legend = 1.3)

[Figure: i-plot of sequences 1 to 10 (n=712) on a monthly time axis starting at Sep.93. Legend: employment, FE, HE, joblessness, school, training.]

slide-59
SLIDE 59

i-plot: Plot of individual sequences (B)

  • The i-plot of the whole set of sequences exhibits the diversity among sequences.
  • It may be useful to sort the sequences according to some factor.
  • Here is how to i-plot data grouped according to the grade obtained at the end of compulsory school (gcse5eq) and sorted by religion:

R> seqiplot(mvad.seq, tlim = 0, space = 0, group = mvad$gcse5eq,
+     sortv = mvad$catholic, border = NA)

slide-61
SLIDE 61

Sequential data analysis - 1 Rendering and summarizing state sequences Three basic plots

i-plots by CS-grade and sorted by religion

8/7/2009gr 42/66

slide-62
SLIDE 62

Sequence frequencies

What are the most frequent sequences? seqtab() computes the frequencies and displays sequences in decreasing frequency order (here the 10 most frequent).

R> seqtab(mvad.seq, tlim = 10)
                   Freq   Percent
(EM,70)              50      7.02
(TR,22)-(EM,48)      18      2.53
(FE,22)-(EM,48)      17      2.39
(SC,24)-(HE,46)      16      2.25
(SC,25)-(HE,45)      13      1.83
(FE,25)-(HE,45)       8      1.12
(FE,34)-(EM,36)       7      0.98
(FE,46)-(EM,24)       7      0.98
(FE,10)-(EM,60)       6      0.84
(FE,24)-(HE,46)       6      0.84
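What such a frequency table counts can be illustrated without TraMineR: each sequence is collapsed to a state-permanence string, identical strings are tallied, and the tally is sorted. The base R sketch below uses four made-up sequences of length 3; the object names are assumptions for the example and the code is not the implementation of seqtab().

```r
# Four toy sequences of length 3 (one row per case).
seqs <- rbind(c("EM", "EM", "EM"),
              c("TR", "EM", "EM"),
              c("EM", "EM", "EM"),
              c("FE", "FE", "EM"))

# Collapse each row to a string of (state, duration) pairs.
sps <- apply(seqs, 1, function(s) {
  runs <- rle(s)
  paste0("(", runs$values, ",", runs$lengths, ")", collapse = "-")
})

# Tally identical sequences and sort by decreasing frequency.
freq <- sort(table(sps), decreasing = TRUE)
result <- data.frame(Freq = as.vector(freq),
                     Percent = round(100 * as.vector(freq) / length(sps), 2),
                     row.names = names(freq))
print(result)
```

Two of the four cases stay in EM throughout, so "(EM,3)" comes first with a frequency of 2 and a share of 50 percent.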

slide-64
SLIDE 64

f-plot: most frequent sequences

seqfplot() visualizes the most frequent sequences (here according to gcse5eq).

R> seqfplot(mvad.seq, group = mvad$gcse5eq, pbarw = TRUE)

[Figure: two panels of the most frequent sequences on a time axis starting at Sep.93: "no" (n=452, cumulated frequency up to 21.2%) and "yes" (n=260, cumulated frequency up to 26.2%). Legend: employment, FE, HE, joblessness, school, training.]

slide-65
SLIDE 65

Sequence of transversal state distributions

Distributions at each (age, calendar, ...) position. seqstatd() computes the distribution for each position (here just for the first 8 positions).

R> seqstatd(mvad.seq[, 1:8])
          Sep.93   Oct.93   Nov.93   Dec.93   Jan.94   Feb.94   Mar.94   Apr.94
EM         0.117    0.124    0.133    0.138    0.140    0.140    0.149    0.157
FE         0.386    0.388    0.382    0.381    0.369    0.364    0.361    0.353
HE         0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
JL         0.024    0.021    0.020    0.021    0.028    0.038    0.034    0.035
SC         0.251    0.246    0.244    0.242    0.240    0.242    0.240    0.240
TR         0.222    0.222    0.221    0.219    0.222    0.216    0.216    0.215
N        712.000  712.000  712.000  712.000  712.000  712.000  712.000  712.000
Entropy    0.775    0.774    0.777    0.780    0.793    0.805    0.803    0.809

slide-67
SLIDE 67

d-plot: Sequences of transversal distributions

seqdplot() renders the sequence of transversal distributions (here according to gcse5eq).

R> seqdplot(mvad.seq, group = mvad$gcse5eq)

[Figure: two panels of state frequencies (0.0 to 1.0) on a time axis starting at Sep.93: "no" (n=452) and "yes" (n=260). Legend: employment, FE, HE, joblessness, school, training.]

slide-68
SLIDE 68

Sequential data analysis - 1 Rendering and summarizing state sequences Sequences of transversal summaries

Section outline

3

Rendering and summarizing state sequences Three basic plots Sequences of transversal summaries Other aggregated summaries Longitudinal characteristics of individual sequences


slide-69
SLIDE 69

Sequential data analysis - 1 Rendering and summarizing state sequences Sequences of transversal summaries

Transversal Entropies

Entropy of each transversal distribution $(p_1, \dots, p_a)$, with $a = |A|$ the size of the alphabet.

Shannon's entropy:

\[ h(p_1, \dots, p_a) = -\sum_{i=1}^{a} p_i \log_2(p_i) \]

  • h is 0 when all cases are in the same state (the state at that position is easy to predict).
  • h is maximal when all states are equally frequent (the worst case for predicting the state at that position).
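As a small check of the formula, the base R sketch below computes the entropy of the Sep.93 distribution printed by seqstatd(). The helper name shannon_entropy is an illustrative choice, not a TraMineR function. The sketch assumes that the Entropy row of the seqstatd() output is the Shannon entropy divided by its maximum, log2 of the alphabet size; under that assumption the rounded proportions give back the printed value 0.775.

```r
# Shannon entropy in bits; zero proportions contribute nothing (0 * log2(0) taken as 0)
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Sep.93 proportions as printed by seqstatd() (rounded to 3 decimals)
sep93 <- c(EM = 0.117, FE = 0.386, HE = 0.000, JL = 0.024, SC = 0.251, TR = 0.222)

h_raw  <- shannon_entropy(sep93)         # entropy in bits
h_norm <- h_raw / log2(length(sep93))    # assumption: normalized by log2(6)
round(h_norm, 3)                         # 0.775
```

The two limiting cases of the slide can be checked the same way: shannon_entropy(c(1, 0, 0)) is 0, and a uniform distribution over six states gives log2(6).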


slide-71
SLIDE 71

Sequential data analysis - 1 Rendering and summarizing state sequences Sequences of transversal summaries

Plotting the series of entropies

R> sd <- seqstatd(mvad.seq)
R> plot(sd$Entropy, main = "Entropy of mvad state distribution by time point",
+     xlab = "Months since Sept 93", ylab = "Entropy", type = "l",
+     lwd = 3.5, col = "blue")

[Figure: line plot "Entropy of mvad state distribution by time point". x-axis: Months since Sept 93, ticks from 10 to 70; y-axis: Entropy, ticks from 0.55 to 0.85.]


slide-73
SLIDE 73

Sequential data analysis - 1 Rendering and summarizing state sequences Other aggregated summaries

Time spent in each state (A)

Time spent in each state by individual sequence

R> mvad.statd <- seqistatd(mvad.seq)
R> mvad.statd[1:5, ]
  EM FE HE JL SC TR
1 68  0  0  0  0  2
2  0 36 34  0  0  0
3 10 34  0  2  0 24
4 14  0  0  9  0 47
5  0 25 45  0  0  0

Computing the mean time by column

R> mt <- apply(mvad.statd, 2, mean)
R> mt
       EM        FE        HE        JL        SC        TR
31.721910 11.426966  8.398876  5.674157  5.723315  7.054775
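The idea behind these two steps can be sketched in base R without TraMineR. The first sequence below is rebuilt from the SPS form (EM,4)-(TR,2)-(EM,64) that the deck prints for the first mvad sequence; the second sequence, named toy, is made up for illustration. The code counts the positions spent in each state per sequence, then averages by column.

```r
alphabet <- c("EM", "FE", "HE", "JL", "SC", "TR")

# One row per sequence, one column per monthly position (70 positions)
seqs <- rbind(
  seq1 = c(rep("EM", 4), rep("TR", 2), rep("EM", 64)),    # first mvad sequence
  toy  = c(rep("SC", 20), rep("FE", 30), rep("EM", 20))   # hypothetical sequence
)

# Time spent in each state, by sequence
times <- t(apply(seqs, 1, function(s) as.vector(table(factor(s, levels = alphabet)))))
colnames(times) <- alphabet
times

# Mean time in each state, by column
colMeans(times)
```

Each row of times sums to the sequence length (70 here), so the column means also sum to 70, as the mean times printed for the mvad data do.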

8/7/2009gr 51/66

slide-74
SLIDE 74

Sequential data analysis - 1 Rendering and summarizing state sequences Other aggregated summaries

Plot of mean times

Plot of mean time by gcse5eq

R> seqmtplot(mvad.seq, group = mvad$gcse5eq)

[Figure: bar plots of the mean time spent in each state (EM, FE, HE, JL, SC, TR) in two panels, gcse5eq = no (n=452) and gcse5eq = yes (n=260). y-axis: Mean time, ticks from 14 to 70. Legend: employment, FE, HE, joblessness, school, training.]


slide-76
SLIDE 76

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Transition rates

Transition rate: estimated probability of being in state i at position t given that the state at the previous position t − 1 was j:

\[ p(x_{it} \mid x_{j(t-1)}) \]

Each row of the matrix below gives the distribution of the next state for a given current state, so each row sums to 1.

R> round(seqtrate(mvad.seq), digits = 4)
        [-> EM] [-> FE] [-> HE] [-> JL] [-> SC] [-> TR]
[EM ->]  0.9864  0.0020  0.0025  0.0065  0.0004  0.0022
[FE ->]  0.0279  0.9514  0.0066  0.0090  0.0010  0.0041
[HE ->]  0.0102  0.0002  0.9872  0.0019  0.0000  0.0005
[JL ->]  0.0418  0.0084  0.0023  0.9387  0.0005  0.0084
[SC ->]  0.0142  0.0081  0.0182  0.0056  0.9509  0.0029
[TR ->]  0.0383  0.0036  0.0000  0.0136  0.0004  0.9442
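To illustrate what such a matrix contains, the base R sketch below estimates transition rates from three made-up sequences over a hypothetical two-state alphabet (A, B). It is not TraMineR's implementation: it simply counts pairs of consecutive states and divides each row of counts by its total.

```r
# Toy data (hypothetical): one row per individual, one column per position
seqs <- rbind(
  c("A", "A", "B", "B"),
  c("A", "B", "B", "B"),
  c("B", "B", "A", "A")
)
states <- c("A", "B")

# State at t - 1 and state at t, paired position by position
from <- factor(seqs[, -ncol(seqs)], levels = states)
to   <- factor(seqs[, -1], levels = states)

counts <- table(from, to)                  # number of observed transitions
rates  <- prop.table(counts, margin = 1)   # each row divided by its total
rates
```

With these data, A is followed by A in 2 of 4 cases (rate 0.5) and B is followed by B in 4 of 5 cases (rate 0.8).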


slide-77
SLIDE 77

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Longitudinal entropy

Entropy computed within each sequence is

  • 0 when the sequence contains a single state (the person stays in the same state during the whole observed period, for example A-A-A-A-A-A-A-A);
  • maximal when each state of the alphabet occurs the same number of times in the sequence (the person spent the same time in each possible state, for example A-A-B-B-C-C-D-D).

By default, TraMineR normalizes the longitudinal entropy by the entropy of the alphabet:

\[ h_{std}(p_1, \dots, p_a) = \frac{-\sum_{i=1}^{a} p_i \log_2(p_i)}{h(A)} \]

with $p_i$ the proportion of positions in state i.
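The two limiting cases can be checked with a short base R sketch. The function name longitudinal_entropy is illustrative and not part of TraMineR. The sketch takes h(A) to be log2(a), the entropy of a uniform distribution over the alphabet. This is an assumption, though it reproduces the value 0.07240966 that seqient() reports for the first mvad sequence, whose SPS form is (EM,4)-(TR,2)-(EM,64).

```r
# Normalized within-sequence entropy (sketch; h(A) assumed equal to log2(a))
longitudinal_entropy <- function(states, alphabet) {
  p <- as.numeric(table(factor(states, levels = alphabet))) / length(states)
  p <- p[p > 0]                          # 0 * log2(0) taken as 0
  -sum(p * log2(p)) / log2(length(alphabet))
}

abcd <- c("A", "B", "C", "D")
h_single <- longitudinal_entropy(rep("A", 8), abcd)                   # 0
h_even   <- longitudinal_entropy(rep(abcd, each = 2), abcd)           # 1

# First mvad sequence, rebuilt from its SPS form
mvad_alphabet <- c("EM", "FE", "HE", "JL", "SC", "TR")
seq1   <- c(rep("EM", 4), rep("TR", 2), rep("EM", 64))
h_seq1 <- longitudinal_entropy(seq1, mvad_alphabet)
round(h_seq1, 4)                                                      # 0.0724
```

Note that the order of the states plays no role here: any permutation of seq1 gives the same entropy.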


slide-79
SLIDE 79

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Computing the longitudinal entropies (B)

seqient() computes the longitudinal entropies (here for the mvad sequences).

R> mvad.ient <- seqient(mvad.seq)
R> mvad.ient[1:6, ]
         1          2          3          4          5          6
0.07240966 0.38662498 0.61243051 0.47611545 0.36375226 0.42259527

We check that the values lie between 0 and 1 (by default the entropy is normalized).

R> min(mvad.ient)
[1] 0
R> max(mvad.ient)
[1] 0.854786


slide-80
SLIDE 80

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Longitudinal entropies - Histogram

Distribution of entropies for mvad data

R> hist(mvad.ient, col = "LightBlue")

[Figure: histogram of mvad.ient. x-axis: mvad.ient, ticks from 0.0 to 0.8; y-axis: Frequency, ticks from 50 to 200.]


slide-81
SLIDE 81

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Turbulence

Entropy does not account for the order in which the states occur.

Turbulence: an alternative measure proposed by Elzinga and Liefbroer (2007) that is sensitive to the sequencing. It is based on

  • the number φ(x) of distinct subsequences that can be extracted from the sequence of distinct consecutive states: x = S-U-M-C (16 subsequences) is more turbulent than y = S-U-S-C (14 subsequences);
  • the variance of the time t_i spent in each distinct consecutive state: S/10-U/2-M/132 is a less turbulent trajectory than S/48-U/48-M/48.


slide-85
SLIDE 85

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Turbulence (continued)

We need the sequence of distinct consecutive states (DSS).

In SPS format, a state sequence is represented by the sequence of distinct consecutive states with their associated durations.

R> print(mvad.seq[1, ], format = "SPS")
    Sequence
[1] (EM,4)-(TR,2)-(EM,64)

The DSS for the previous sequence is

R> seqdss(mvad.seq[1, ])
  Sequence
1 EM-TR-EM

The number of subsequences of the above DSS is

R> seqsubsn(mvad.seq[1, ], DSS = TRUE)
  Subseq.
1       7
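The count of 7 can be reproduced in base R. The sketch below is not how seqsubsn() is implemented; it uses a standard dynamic-programming count of distinct subsequences, with the empty subsequence included (for EM-TR-EM: empty, EM, TR, EM-TR, EM-EM, TR-EM, EM-TR-EM). The helper name count_subseq is illustrative, and rle() stands in for seqdss() to extract the distinct consecutive states.

```r
# Number of distinct subsequences of x, empty subsequence included
count_subseq <- function(x) {
  total <- 1          # only the empty subsequence so far
  last  <- list()     # count reached just before the previous occurrence of each state
  for (s in x) {
    previous <- total
    total <- 2 * total                                  # extend every subsequence with s
    if (!is.null(last[[s]])) total <- total - last[[s]] # drop those counted twice
    last[[s]] <- previous
  }
  total
}

# First mvad sequence, rebuilt from its SPS form (EM,4)-(TR,2)-(EM,64)
seq1 <- c(rep("EM", 4), rep("TR", 2), rep("EM", 64))
dss  <- rle(seq1)$values      # "EM" "TR" "EM"

count_subseq(dss)                       # 7
count_subseq(c("S", "U", "M", "C"))     # 16, all states distinct: 2^4
```

When all states of the DSS differ, every subset of positions gives a different subsequence, hence the 2^4 = 16 for S-U-M-C; repeated states reduce the count.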


slide-88
SLIDE 88

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Turbulence: formula

Formula for a sequence x:

\[ T(x) = \log_2\left( \phi(x)\, \frac{s^2_{t,\max}(x) + 1}{s^2_t(x) + 1} \right) \]

where $s^2_t$ is the variance of the time spent in each distinct consecutive state and $s^2_{t,\max}$ is the maximal value that this variance can reach for the given sequence length. This maximum is

\[ s^2_{t,\max} = (n - 1)(1 - \bar{t})^2 \]

where n is the number of distinct consecutive states and $\bar{t}$ is the mean time spent in each of them:

\[ \bar{t} = \frac{\text{sequence length}}{\text{number of distinct consecutive states}} \]


slide-90
SLIDE 90

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Computing the turbulence

seqST() computes the turbulence of the provided sequences. Turbulence of the first 6 sequences:

R> mvad.turb <- seqST(mvad.seq)
R> mvad.turb[1:6]
[1]  3.076599 11.176173  6.411073  4.807756  5.517962  4.987055

The measure is not normalized:

R> min(mvad.turb)
[1] 1
R> max(mvad.turb)
[1] 12.95858
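The first value, 3.076599, can be retraced by hand for the first mvad sequence, whose SPS form is (EM,4)-(TR,2)-(EM,64). The base R sketch below is not the seqST() source. It rests on two assumptions: the maximal variance is (n − 1)(1 − t̄)² with n the number of distinct consecutive states, and the variance of the durations uses the divisor n rather than n − 1. With these choices the sketch reproduces the reported value. The helper names are illustrative.

```r
# Number of distinct subsequences of x, empty subsequence included
count_subseq <- function(x) {
  total <- 1
  last  <- list()
  for (s in x) {
    previous <- total
    total <- 2 * total
    if (!is.null(last[[s]])) total <- total - last[[s]]
    last[[s]] <- previous
  }
  total
}

# Turbulence of one sequence given as a vector of states (sketch)
turbulence <- function(states) {
  runs   <- rle(states)                 # distinct consecutive states and their durations
  dur    <- runs$lengths
  n      <- length(dur)
  t_bar  <- mean(dur)
  s2     <- mean((dur - t_bar)^2)       # assumption: variance with divisor n
  s2_max <- (n - 1) * (1 - t_bar)^2     # assumption: squared term, n = number of spells
  log2(count_subseq(runs$values) * (s2_max + 1) / (s2 + 1))
}

seq1  <- c(rep("EM", 4), rep("TR", 2), rep("EM", 64))
turb1 <- turbulence(seq1)
round(turb1, 4)                         # 3.0766
```

A sequence that never changes state has φ = 2 and both variances equal to 0, so its turbulence is log2(2) = 1, which matches the minimum reported above.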


slide-91
SLIDE 91

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Turbulence - Histogram

Distribution of turbulence among the mvad sequences

R> hist(mvad.turb, col = attr(mvad.seq, "cpal")[6])

[Figure: histogram of mvad.turb. x-axis: mvad.turb, ticks from 2 to 12; y-axis: Frequency, ticks from 20 to 140.]


slide-92
SLIDE 92

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

Comparing Turbulence and Longitudinal Entropy

R> plot(mvad.turb, mvad.ient, xlab = "Turbulence", ylab = "Entropy")

[Figure: scatter plot of longitudinal entropy against turbulence. x-axis: Turbulence, ticks from 2 to 12; y-axis: Entropy, ticks from 0.0 to 0.8.]


slide-93
SLIDE 93

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

References I

Abbott, A. and A. Tsay (2000). Sequence analysis and optimal matching methods in sociology, Review and prospect. Sociological Methods and Research 29(1), 3–33. (With discussion, pp. 34–76.)

Berchtold, A. and A. E. Raftery (2002). The mixture transition distribution model for high-order Markov chains and non-gaussian time series. Statistical Science 17(3), 328–356.

Billari, F. C. (2001). The analysis of early life courses: Complex description of the transition to adulthood. Journal of Population Research 18(2), 119–142.

Brzinsky-Fay, C., U. Kohler, and M. Luniak (2006). Sequence analysis with Stata. The Stata Journal 6(4), 435–460.

Elzinga, C. H. and A. C. Liefbroer (2007). De-standardization of family-life trajectories of young adults: A cross-national comparison using sequence analysis. European Journal of Population 23, 225–250.


slide-94
SLIDE 94

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

References II

Gabadinho, A., G. Ritschard, M. Studer, and N. S. Müller (2008). Mining sequence data in R with TraMineR: A user's guide. Technical report, Department of Econometrics and Laboratory of Demography, University of Geneva, Geneva. (TraMineR is on CRAN, the Comprehensive R Archive Network.)

Hosmer, D. W. and S. Lemeshow (1999). Applied Survival Analysis, Regression Modeling of Time to Event Data. New York: Wiley.

Hothorn, T., K. Hornik, and A. Zeileis (2006). party: A laboratory for recursive part(y)itioning. User's manual.

McVicar, D. and M. Anyadike-Danes (2002). Predicting successful and unsuccessful transitions from school to work using sequence methods. Journal of the Royal Statistical Society A 165(2), 317–334.

Ritschard, G., A. Gabadinho, N. S. Müller, and M. Studer (2008). Mining event histories: A social science perspective. International Journal of Data Mining, Modelling and Management 1(1), 68–90.


slide-95
SLIDE 95

Sequential data analysis - 1 Rendering and summarizing state sequences Longitudinal characteristics of individual sequences

References III

Ritschard, G., A. Gabadinho, M. Studer, and N. S. Müller (2009). Converting between various sequence representations. In Z. Ras and A. Dardzinska (Eds.), Advances in Data Management, Volume 223 of Studies in Computational Intelligence, pp. 155–175. Berlin: Springer.

Scherer, S. (2001). Early career patterns: A comparison of Great Britain and West Germany. European Sociological Review 17(2), 119–144.
