Assessing Outcomes and Processes of Student Collaboration




slide-1
SLIDE 1

Assessing Outcomes and Processes of Student Collaboration

Peter F. Halpin April 19, 2016

Joint work with: Alina von Davier, Yoav Bergner, Jiangang Hao, Lei Liu (ETS); Jacqueline Gutman (NYU)

1 / 89

slide-4
SLIDE 4

Outline

Part 1: Wherefore assessments involving collaboration?

◮ Set up the current perspective: performance assessments
◮ Selective review of research on small group productivity

Part 2: Outcomes of collaboration

◮ Combining psychometric models with research on small group productivity
◮ Testing models against observed team performance

Part 3: Processes of collaboration

◮ Focus on chat data (for now!)
◮ Modeling engagement among collaborators using temporal point processes¹

¹ Halpin, von Davier, Hao, & Liu (under review). Journal of Educational Measurement.

4 / 89

slide-7
SLIDE 7

Part 1: Why?

◮ 21st-century skills, non-cognitive skills, soft skills, hard-to-measure skills, social skills, ...
◮ Theme: traditional educational tests target a relatively narrow set of constructs
◮ Analyses of US labour markets indicate that such skills are valued by employers (Burrus et al., 2013; Deming, 2015)
◮ There is a salient demand for assessments of a broader range of student competencies

7 / 89

slide-8
SLIDE 8

With apologies to Dr. Duckworth...

upenn.app.box.com/8itemgrit

8 / 89

slide-9
SLIDE 9

Self-reports

◮ Self-report measures often do not require the respondent to exhibit the skills about which we wish to make inferences

→ Unsuitable for supporting consequential decisions in educational settings²

² cf. Duckworth & Yeager (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4), 237-251.

9 / 89

slide-12
SLIDE 12

Educational assessments

Reliability and generalizability in traditional content domains

Current psychometric models don’t seem entirely appropriate to “next generation assessments”

◮ e.g., IRT models don’t use process data

Collateral damage: teaching to the test, test anxiety, bubble-filling, ...

◮ NY opt-out movement: 20% of students (parents) boycotted state test last year³

³ www.wnyc.org/story/new-york-city-students-make-modest-gains-state-tests-opt-out-numbers-triple/

12 / 89

slide-13
SLIDE 13

Performance assessments⁴

⁴ Davey, Ferrara, Holland, Shavelson, Webb, & Wise (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Princeton, NJ. p. 10

13 / 89

slide-15
SLIDE 15

Collaboration as a modality of performance assessment

◮ Small group interactions are a highly-valued educational practice
  ◮ The Jigsaw Classroom (Aronson et al., 1978; jigsaw.org)
  ◮ Group-worthy tasks (Cohen et al., 1999)
◮ The use of information technology to support student collaboration is well established
  ◮ CSCL (e.g., Hmelo-Silver et al., 2013)
◮ The use of group work in assessment contexts has a relatively long-standing history
  ◮ e.g., Webb, 1995; 2015

15 / 89

slide-16
SLIDE 16

Intellective tasks

◮ Defined as having a demonstrably “correct” answer with respect to an agreed upon system of knowledge
◮ Differentiated from decision / judgement tasks on a continuum of demonstrability (Laughlin 2011)
◮ Differentiated from mixed-motive tasks in that the goals and outcomes are the same for all members

[Figure: McGrath’s (1984) group task circumplex]

16 / 89

slide-17
SLIDE 17

Lorge & Solomon 1955⁵

⁵ Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 1955, 20 (2), p. 141

17 / 89

slide-18
SLIDE 18

Lorge & Solomon 1955⁶

⁶ Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 1955, 20 (2), p. 141

18 / 89

slide-19
SLIDE 19

Smoke and Zajonc 1962⁷

If p is the probability that a given individual member is correct, the group has a probability h(p) of being correct, where h(p) is a function of p depending upon the type of decision scheme accepted by the group. We shall call h(p) a decision function. Intuitively, it would seem that a decision scheme is desirable to the extent that it surpasses p.

⁷ On the reliability of group judgements and decisions. In Mathematical methods for small group processes (Eds. Criswell, Solomon, Suppes), p. 322

19 / 89

slide-20
SLIDE 20

Schiflett 1979⁸

⁸ Towards a general model of group productivity. Psychological Bulletin, 86 (1), pp. 67-68

20 / 89

slide-21
SLIDE 21

Summary

◮ Building on research on small groups:
  ◮ Intellective tasks (vs decision tasks)
  ◮ Cooperative group interactions (vs competitive or mixed-motive)
  ◮ Describing group outcomes via decision functions that depend on characteristics of individuals
◮ But with a focus on:
  ◮ Letting probability of success vary over individuals (e.g., via ability)
  ◮ Describing relevant task characteristics (e.g., via difficulty)
  ◮ The performance of individual groups rather than groups in aggregate

21 / 89

slide-22
SLIDE 22

Outcomes of collaboration: A basic scenario

◮ Two students each write a conventional math assessment
◮ Their math abilities are estimated to be $\theta_j$ and $\theta_k$
◮ The two students then work together on a second conventional math assessment
◮ What do we expect about their performance on the second test, based on the first?

22 / 89

slide-23
SLIDE 23

Collaboration as a psychometric question

◮ Traditional psychometric models assume conditional independence of the items

$$p(x_j \mid \theta_j) = \prod_{i}^{N} p(x_{ij} \mid \theta_j) \qquad (1)$$

◮ Traditional psychometric models also assume that the responses of two (or more) persons are independent

$$p(x_j, x_k \mid \theta_j, \theta_k) = p(x_j \mid \theta_j)\, p(x_k \mid \theta_k) \qquad (2)$$

◮ When people work together does equation (2) hold?

23 / 89

slide-24
SLIDE 24

“Working together” in terms of scoring rules⁹

◮ For binary items and pairs of responses, consider:
  ◮ The conjunctive rule

$$x_{ijk} = \begin{cases} 1 & \text{if } x_{ij} = 1 \text{ and } x_{ik} = 1 \\ 0 & \text{otherwise} \end{cases}$$

  ◮ The disjunctive rule

$$x_{ijk} = \begin{cases} 0 & \text{if } x_{ij} = 0 \text{ and } x_{ik} = 0 \\ 1 & \text{otherwise} \end{cases}$$

◮ More possibilities, especially for items with > 2 responses or groups with > 2 collaborators

⁹ cf. Steiner’s 1966 classification of task types

24 / 89
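The two scoring rules above can be sketched in a few lines of Python. This is only an illustration of the definitions on the slide; the function names and the example response vectors are hypothetical and do not come from the talk.

```python
def conjunctive(x_ij: int, x_ik: int) -> int:
    """Team response counts as correct only if both individual responses are correct."""
    return 1 if (x_ij == 1 and x_ik == 1) else 0


def disjunctive(x_ij: int, x_ik: int) -> int:
    """Team response counts as incorrect only if both individual responses are incorrect."""
    return 0 if (x_ij == 0 and x_ik == 0) else 1


def score_pair(responses_j, responses_k, rule):
    """Apply a scoring rule item by item to two equal-length binary response vectors."""
    if len(responses_j) != len(responses_k):
        raise ValueError("response vectors must have the same length")
    return [rule(a, b) for a, b in zip(responses_j, responses_k)]


# Hypothetical responses of students j and k to four binary items.
x_j = [1, 0, 1, 0]
x_k = [1, 1, 0, 0]
conj = score_pair(x_j, x_k, conjunctive)  # [1, 0, 0, 0]
disj = score_pair(x_j, x_k, disjunctive)  # [1, 1, 1, 0]
```

The rule is passed in as a function so that other scoring rules (for example, for more than two collaborators) could be swapped in without changing `score_pair`.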

slide-26
SLIDE 26

Scoring rules vs decision functions

◮ Scoring rules describe what “counts” as a correct group response
  ◮ Under control of the test designer¹⁰
◮ Decision functions describe the strategies adopted by a team
  ◮ Under control of the team
◮ Basic research strategy
  ◮ Assume a certain scoring rule
  ◮ Consider plausible models for team strategies
  ◮ Test the models against data

¹⁰ Maris & van der Maas (2012). Speed-accuracy response models: scoring rules based on response time and accuracy. Psychometrika, 77 (4), 615-633

26 / 89

slide-29
SLIDE 29

Defining successful pairwise collaboration

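◮ The independence model

$$E_{\mathrm{ind}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k]$$

◮ Successful collaboration

$$E[x_{ijk} \mid \theta_j, \theta_k] > E_{\mathrm{ind}}[x_{ijk} \mid \theta_j, \theta_k]$$

◮ Unsuccessful collaboration

$$E[x_{ijk} \mid \theta_j, \theta_k] < E_{\mathrm{ind}}[x_{ijk} \mid \theta_j, \theta_k]$$

◮ Note: these definitions are item- and dyad-specific

29 / 89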

slide-32
SLIDE 32

Some models for successful collaboration

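◮ Minimum individual performance (disruptive team member)

$$E_{\min}[x_{ijk} \mid \theta_j, \theta_k] = \min\{E[x_{ij} \mid \theta_j],\, E[x_{ik} \mid \theta_k]\}$$

◮ Maximum individual performance (cheating / tutor)

$$E_{\max}[x_{ijk} \mid \theta_j, \theta_k] = \max\{E[x_{ij} \mid \theta_j],\, E[x_{ik} \mid \theta_k]\}$$

◮ “True collaboration”

$$E[x_{ijk} \mid \theta_j, \theta_k] \ge \max\{E[x_{ij} \mid \theta_j],\, E[x_{ik} \mid \theta_k]\}$$

32 / 89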

slide-34
SLIDE 34

A model for “true collaboration”

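◮ An additive model

$$E_{\mathrm{add}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j] + E[x_{ik} \mid \theta_k] - E[x_{ijk} \mid \theta_j, \theta_k]$$

◮ Recalling $E[x_{ijk} \mid \theta_j, \theta_k] > E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k]$, define an additive independence (AI) model

$$E_{\mathrm{AI}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j] + E[x_{ik} \mid \theta_k] - E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k] \ge E_{\mathrm{add}}[x_{ijk} \mid \theta_j, \theta_k]$$

◮ AI is an upper bound on any “more interesting” additive model for successful collaboration

34 / 89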

slide-35
SLIDE 35

More on AI model

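◮ Can also be written as:

$$E_{\mathrm{AI}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j]\,(1 - E[x_{ik} \mid \theta_k]) + E[x_{ik} \mid \theta_k]\,(1 - E[x_{ij} \mid \theta_j]) + E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k]$$

◮ Which has an interpretation in terms of three cases

35 / 89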

slide-36
SLIDE 36

More on AI model

◮ And is also equivalent to Lorge & Solomon’s Model A

$$E_{\mathrm{AI}}[x_{ijk} \mid \theta_j, \theta_k] = 1 - (1 - E[x_{ij} \mid \theta_j])(1 - E[x_{ik} \mid \theta_k])$$

◮ Except the “probability an individual can solve the problem” now depends on both the individual and the problem

36 / 89
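A minimal numerical sketch of the four models (Ind, Min, Max, AI) may help make the three equivalent forms of AI concrete. It assumes a 2PL item response function of the form 1 / (1 + exp(-a(θ - b))) for each individual; that parameterisation, the function names, and the ability values are illustrative assumptions rather than details taken from the slides.

```python
import math


def irf_2pl(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Assumed 2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def e_ind(p_j: float, p_k: float) -> float:
    """Independence model: product of the individual expectations."""
    return p_j * p_k


def e_min(p_j: float, p_k: float) -> float:
    """Minimum individual performance."""
    return min(p_j, p_k)


def e_max(p_j: float, p_k: float) -> float:
    """Maximum individual performance."""
    return max(p_j, p_k)


def e_ai(p_j: float, p_k: float) -> float:
    """Additive independence model: P_j + P_k - P_j * P_k."""
    return p_j + p_k - p_j * p_k


def e_ai_three_cases(p_j: float, p_k: float) -> float:
    """AI as: only j correct, only k correct, or both correct."""
    return p_j * (1.0 - p_k) + p_k * (1.0 - p_j) + p_j * p_k


def e_ai_model_a(p_j: float, p_k: float) -> float:
    """AI as one minus the probability that neither individual is correct."""
    return 1.0 - (1.0 - p_j) * (1.0 - p_k)


# Hypothetical abilities for students j and k on one item with a = 1, b = 0.
p_j = irf_2pl(-0.5)
p_k = irf_2pl(1.0)
ordered = [e_ind(p_j, p_k), e_min(p_j, p_k), e_max(p_j, p_k), e_ai(p_j, p_k)]
```

For any pair of probabilities the values in `ordered` are non-decreasing, which matches the reading of Ind as the boundary for successful collaboration and AI as the upper bound.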

slide-38
SLIDE 38

More on AI model¹¹

◮ We probably want some constraints on what counts as a good collaborative IRF
◮ Easy to show that AI satisfies latent monotonicity, if the individual IRFs do (trivial for other models also)

¹¹ Holland & Rosenbaum (1986). Conditional Association and Unidimensionality in Monotone Latent Variable Models. The Annals of Statistics, 14 (4), 1523 – 1543

38 / 89

slide-39
SLIDE 39

AI: latent monotonicity

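Assumptions: $f(x) \ge f(x')$ for $x > x'$ and $0 \le g(y) \le 1$

Show: $f(x) + g(y) - f(x)\,g(y) \ge f(x') + g(y) - f(x')\,g(y)$

Contradiction: suppose instead that $f(x) + g(y) - f(x)\,g(y) < f(x') + g(y) - f(x')\,g(y)$. Then $f(x) - f(x') < g(y)\,(f(x) - f(x'))$, which cannot hold when $f(x) - f(x') \ge 0$ and $g(y) \le 1$.

39 / 89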

slide-40
SLIDE 40

AI: example IRF¹²

[Figure: example AI item response function; axis labels: theta1, prob]

¹² Using 2PL model for individual IRFs with α = 1 and β = 0

40 / 89

slide-41
SLIDE 41

Models abound!

◮ Basic idea: write down IRFs for collaboration based on assumed-to-be-known individual abilities (and item parameters)
◮ But how do we characterize empirical team performance?

41 / 89

slide-43
SLIDE 43

Empirical team performance

◮ We have
  ◮ Observed collaborative responses $x_{jk} = (x_{1jk}, x_{2jk}, \ldots, x_{mjk})$
  ◮ A model for individual performance on the m (conventional) math items
◮ So we can get “team theta,” e.g.,

$$\hat{\theta}_{jk} = \arg\max_{\theta}\, \{ L_0(x_{jk} \mid \theta) \} \qquad (3)$$

◮ Where $L_0$ is the likelihood of the model calibrated on individual performance (reference model)

43 / 89
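Equation (3) can be illustrated with a small sketch that treats the team's response pattern as if it came from a single respondent and maximises the reference likelihood over θ. The 2PL form, the bounded grid search, and the item parameters below are assumptions made for illustration; the talk does not specify the estimator's implementation.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Assumed 2PL item response function for the reference (individual) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_lik(theta: float, responses, items) -> float:
    """Log of L0: Bernoulli log-likelihood of a binary pattern at a given theta."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = irf_2pl(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def team_theta(responses, items, lo: float = -4.0, hi: float = 4.0, steps: int = 801) -> float:
    """Grid-search maximiser of L0(x_jk | theta) on [lo, hi]."""
    grid = [lo + (hi - lo) * s / (steps - 1) for s in range(steps)]
    return max(grid, key=lambda t: log_lik(t, responses, items))


# Hypothetical pre-calibrated items as (discrimination, difficulty) pairs.
items = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
x_jk = [1, 1, 1, 0, 0]  # hypothetical collaborative response pattern
theta_hat = team_theta(x_jk, items)
```

A bounded grid keeps the estimate finite for all-correct or all-incorrect patterns, where the unconstrained maximum does not exist; a production analysis would more likely use a numerical optimiser or a Bayesian estimate.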

slide-44
SLIDE 44

Proposed method for testing models

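◮ Testing of different models against reference model

$$D_{\mathrm{model}} = -2 \ln \frac{L_{\mathrm{model}}(x_{jk} \mid \theta_j, \theta_k)}{L_0(x_{jk} \mid \hat{\theta}_{jk})} \qquad (4)$$

◮ Also a “direct test” of effect of collaboration for each individual

$$D_0 = -2 \ln \frac{L_0(x_{jk} \mid \theta_j)}{L_0(x_{jk} \mid \hat{\theta}_{jk})} \qquad (5)$$

with effect size

$$\delta_{jk} = \frac{\theta_{jk} - \theta_j}{\sigma_\theta}$$

44 / 89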

slide-45
SLIDE 45

Proposed method: reference distribution

◮ Ind and AI models are not nested with reference model → No Wilks’ theorem
◮ Can use Vuong’s (1989)¹³ results for LR with non-nested models, but asymptotic in m
◮ Good news: we can bootstrap a null distribution for (4) and (5) pretty easily

¹³ Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307 – 333.

45 / 89

slide-46
SLIDE 46

Bootstrapping the reference distribution

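Assuming known item parameters and $\theta_j$, $\theta_k$. For $r = 1, \ldots, R$

Step 1 Generate collaborative response patterns $x_{jk}^{(r)}$ from $E_{\mathrm{model}}[x_{ijk} \mid \theta_j, \theta_k]$
Step 2 Compute $L_{\mathrm{model}}(x_{jk}^{(r)} \mid \theta_j, \theta_k)$
Step 3 Estimate $\theta_{jk}^{(r)}$ for each $x_{jk}^{(r)}$; save $L_0(x_{jk}^{(r)} \mid \theta_{jk}^{(r)})$
Step 4 Compute $D_{\mathrm{model}}^{(r)}$ or $D_0^{(r)}$

46 / 89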
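The four bootstrap steps can be sketched as follows for the AI model. Everything specific here is an assumption for illustration: the 2PL form, the grid-based estimate of team theta, the item parameters, the abilities, and the observed pattern are all hypothetical, and the tail proportion at the end is just one simple way to compare an observed statistic with the bootstrap draws.

```python
import math
import random


def irf_2pl(theta, a, b):
    """Assumed 2PL item response function for the individual (reference) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_lik(x, probs):
    """Bernoulli log-likelihood of binary responses x given success probabilities."""
    return sum(math.log(p) if xi == 1 else math.log(1.0 - p) for xi, p in zip(x, probs))


def ai_probs(theta_j, theta_k, items):
    """Additive independence model: P_j + P_k - P_j * P_k for each item."""
    out = []
    for a, b in items:
        p_j, p_k = irf_2pl(theta_j, a, b), irf_2pl(theta_k, a, b)
        out.append(p_j + p_k - p_j * p_k)
    return out


def max_ref_log_lik(x, items, grid):
    """ln L0(x | theta_hat): reference log-likelihood maximised over a bounded grid."""
    return max(log_lik(x, [irf_2pl(t, a, b) for a, b in items]) for t in grid)


def d_model(x, model_probs, items, grid):
    """D_model = -2 ln[ L_model(x | theta_j, theta_k) / L0(x | theta_hat) ]."""
    return -2.0 * (log_lik(x, model_probs) - max_ref_log_lik(x, items, grid))


def bootstrap_d(model_probs, items, grid, n_reps, rng):
    """Draw n_reps values of D_model under the assumed collaboration model."""
    draws = []
    for _ in range(n_reps):
        x_r = [1 if rng.random() < p else 0 for p in model_probs]  # Step 1
        draws.append(d_model(x_r, model_probs, items, grid))       # Steps 2 to 4
    return draws


items = [(1.0, b / 2.0) for b in range(-4, 5)]  # hypothetical item parameters
grid = [-4.0 + 0.05 * s for s in range(161)]    # bounded search grid for theta
probs = ai_probs(-0.3, 0.6, items)              # hypothetical theta_j and theta_k
x_obs = [1, 1, 1, 1, 1, 0, 1, 0, 0]             # hypothetical observed team pattern
d_obs = d_model(x_obs, probs, items, grid)
draws = bootstrap_d(probs, items, grid, 200, random.Random(0))
p_value = sum(d > d_obs for d in draws) / len(draws)
```

The same loop would cover the direct test in (5) by generating patterns from the reference model at θ_j instead of from a collaboration model.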

slide-47
SLIDE 47

Example 1

◮ Design
  ◮ Pool of pre-calibrated math items (grade 12 NAEP, modified to be numeric response)
  ◮ Individual “pre-test” → estimate individual abilities
  ◮ Collaborative “post-test” → evaluate models, estimate $\delta_{jk}$
  ◮ Modality of collaboration: online chat
◮ Limitations:
  ◮ Small calibration sample; crowd workers
  ◮ Individual and collaborative forms were not counterbalanced (neither in order nor content)

47 / 89

slide-48
SLIDE 48

NAEP grade 12 math items, deployed via OpenEdx

48 / 89

slide-49
SLIDE 49

AMT crowdworkers (calibration sample)

Variable            Levels                       n      %   Cum. %
Gender              Female                     155   46.5     46.5
                    Male                       178   53.5    100.0
Age                 18-30                      117   35.2     35.2
                    30-40                      129   38.9     74.1
                    40-55                       71   21.4     95.5
                    55+                         15    4.5    100.0
Education           Some Grade School            3    0.9      0.9
                    High School Diploma         49   14.7     15.6
                    Some College               118   35.4     51.0
                    Bachelor’s Degree          132   39.6     90.7
                    Master’s Degree             22    6.6     97.3
                    Ph.D or Advanced Degree      9    2.7    100.0
Country             United States              313   94.0     94.0
                    India                       16    4.8     98.8
                    Canada                       3    0.9     99.7
                    United Kingdom               1    0.3    100.0
English First Lang  Yes                        321   96.4     96.4
                    No                          12    3.6    100.0

49 / 89

slide-50
SLIDE 50

Deltas

[Figure: “Collaborative vs Individual Performance”; axes: Individual Theta, Collaborative Theta]

50 / 89

slide-51
SLIDE 51

Model tests: Sanity check using individual pre-test

Figure reports P(Dmodel > |obs|) for individual pre-tests scored using conjunctive scoring rule

51 / 89

slide-52
SLIDE 52

Model tests: Collaborative data

Figure reports P(Dmodel > |obs|) for collaborative tests scored using conjunctive scoring rule

52 / 89

slide-53
SLIDE 53

Ind Model

[Figure: “Collaborative vs Individual Performance”; axes: Individual Theta, Collaborative Theta; pairs: 3, 4, 12, 15, 16, 35]

53 / 89

slide-54
SLIDE 54

Min Model

[Figure: “Collaborative vs Individual Performance”; axes: Individual Theta, Collaborative Theta; pairs: 7, 8, 9, 13, 29, 36]

54 / 89

slide-55
SLIDE 55

Max Model

[Figure: “Collaborative vs Individual Performance”; axes: Individual Theta, Collaborative Theta; pairs: 2, 5, 10, 14, 17, 19, 22, 24, 25, 26, 27, 28, 30, 38, 44, 45]

55 / 89

slide-56
SLIDE 56

AI Model

[Figure: “Collaborative vs Individual Performance”; axes: Individual Theta, Collaborative Theta; pairs: 6, 18, 20, 21, 23, 32, 33, 37, 39, 40, 41, 42, 43]

56 / 89

slide-57
SLIDE 57

Not one of our four models

[Figure: “Collaborative vs Individual Performance”; axes: Individual Theta, Collaborative Theta; pairs: 1, 11, 31, 34]

57 / 89

slide-58
SLIDE 58

Summary of collaborative outcomes

◮ Can define, estimate, and test models of collaboration on academic performance using IRT-based methods
◮ But how distinct are these models, really?
◮ Models do not cover all cases

58 / 89

slide-59
SLIDE 59

Possible next step – one model to rule them all!

Let $w_j, w_k \in [0, 1]$ and define the weighted additive independence model

$$E_{\mathrm{WAI}}[X_{ijk} \mid \theta_j, \theta_k] = w_j P_i(\theta_j)\, Q_i(\theta_k) + w_k P_i(\theta_k)\, Q_i(\theta_j) + P_i(\theta_j)\, P_i(\theta_k)$$

◮ Includes original four and everything in between
◮ Includes $(P_i(\theta_j) + P_i(\theta_k))/2$ when $w_j = w_k = .5$
◮ Weights describe how well each individual obtains his/her “optimal collaboration level”

59 / 89
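A short check of the special cases may make the claim "includes original four" concrete. The sketch assumes Q = 1 - P, the usual IRT convention, which the slide does not state explicitly; the function name and the example probabilities are hypothetical.

```python
def e_wai(p_j: float, p_k: float, w_j: float, w_k: float) -> float:
    """Weighted additive independence model, assuming Q = 1 - P."""
    q_j, q_k = 1.0 - p_j, 1.0 - p_k
    return w_j * p_j * q_k + w_k * p_k * q_j + p_j * p_k


# Hypothetical success probabilities, with student j the stronger partner.
p_j, p_k = 0.7, 0.4

special_cases = {
    "ind": e_wai(p_j, p_k, 0.0, 0.0),   # reduces to p_j * p_k
    "ai": e_wai(p_j, p_k, 1.0, 1.0),    # reduces to p_j + p_k - p_j * p_k
    "max": e_wai(p_j, p_k, 1.0, 0.0),   # full weight on the stronger partner
    "min": e_wai(p_j, p_k, 0.0, 1.0),   # full weight on the weaker partner
    "mean": e_wai(p_j, p_k, 0.5, 0.5),  # reduces to (p_j + p_k) / 2
}
```

Which weight pattern gives Max and which gives Min depends on who has the higher success probability, so the labels above hold only because `p_j > p_k` in this example.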

slide-62
SLIDE 62

Part 3: What are process data?¹⁴

◮ Any task-related actions of a respondent performed during the completion of a task
◮ In ed tech context, typically associated with time-stamped user logs (“trace data”)
◮ All the stuff IRT ignores:

$$p(x \mid \theta) = \prod_{i} p(x_i \mid \theta)$$

¹⁴ Halpin & von Davier 2013, Hao, & Liu (under review). Journal of Educational Measurement.

62 / 89

slide-64
SLIDE 64

Part 3: What are collaborative process data?

◮ Ideally a richly detailed recording of the sequence of actions taken by each team member during the completion of a task
  ◮ ATC21S collaborative problem solving prototype items¹⁵
  ◮ CPS frame¹⁶
◮ Focus today: chat messages sent between online collaborators

¹⁵ http://www.atc21s.org/uploads/3/7/0/0/37007163/pd_module_3_nonadmin.pdf
¹⁶ In alpha at Computational Psychometrics lab at ETS

64 / 89

slide-65
SLIDE 65

Two perspectives on the analysis of chat / email / etc.

◮ Text-based analysis of strategy and sentiment
  ◮ e.g., Howley, Mayfield, & Rosé, 2013; Liu, Hao, von Davier, Kyllonen, & Zapata-Rivera, 2015
◮ Time series analysis of sending times
  ◮ e.g., Barabási, 2005; Ebel, Mielsch, & Bornholdt, 2002; Halpin & De Boeck, 2013

65 / 89

slide-66
SLIDE 66

Temporal point process: basic idea (more tomorrow)¹⁷

◮ Data: events that have negligible duration relative to a period of observation
  ◮ Contrast events with states, regimes
◮ Basic idea: model the Bernoulli probability of an event happening in a small window of time [t, t + ∆), conditional on the events that have happened before t ∈ R+.
  ◮ “Instantaneous probability” of an event, denoted p(t)

¹⁷ Daley, D. J., & Vere-Jones, D. (2003). An introduction to the theory of point processes: Elementary theory and methods (2nd ed., Vol. 1). New York: Springer.

66 / 89

slide-67
SLIDE 67

Temporal point process in interpersonal context

◮ Modeling p(t) to describe
  ◮ How the probability of each person’s actions changes in continuous time
  ◮ How this depends on their previous actions
  ◮ Emergent or group-level phenomena like coordination, reciprocity, ...

67 / 89

slide-68
SLIDE 68

Chat engagement via Hawkes processes

◮ Hawkes process provides a means of modeling instantaneous probabilities in a multivariate context
◮ Halpin et al. (under review) suggest the response intensity parameter as a measure of engagement of student j with k

$$\alpha_{jk} > \bar{n}_{jk} / n_k \qquad (6)$$

  ◮ $\bar{n}_{jk}$ is the expected total number of responses made by student j to student k (inferred from model)
  ◮ $n_k$ is the number of actions of student k (observed)
  ◮ Lower bound is tight in practice; not necessary for computations

68 / 89
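One way to build intuition for the bound in (6) is a toy Hawkes-type intensity in which each partner message raises student j's message rate by an exponentially decaying amount. The exponential kernel, the parameter names (`baseline`, `beta`), and all numbers below are assumptions chosen for illustration; the slides do not say which response kernel the paper uses.

```python
import math


def intensity(t, baseline, alpha_jk, beta, partner_times):
    """Assumed intensity: baseline + alpha_jk * sum of beta * exp(-beta * (t - s)) over partner events s < t."""
    excitation = sum(beta * math.exp(-beta * (t - s)) for s in partner_times if s < t)
    return baseline + alpha_jk * excitation


def expected_responses(alpha_jk, beta, partner_times, horizon):
    """Expected number of j's responses to k's events that fall inside [0, horizon]."""
    return alpha_jk * sum(
        1.0 - math.exp(-beta * (horizon - s)) for s in partner_times if s < horizon
    )


# Hypothetical chat times (in seconds) of partner k, and hypothetical parameters for j.
partner_times = [5.0, 12.0, 30.0, 31.0, 58.0]
alpha_jk, beta, horizon = 0.35, 0.5, 60.0

n_bar = expected_responses(alpha_jk, beta, partner_times, horizon)
lower_bound = n_bar / len(partner_times)  # n_bar_jk / n_k
```

Because the kernel integrates to one, each partner message triggers `alpha_jk` responses on average over an unlimited horizon; responses that would arrive after the observation window ends are not counted, so `lower_bound` falls slightly below `alpha_jk`, in line with the slide's remark that the bound is tight in practice.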

slide-69
SLIDE 69

Chat engagement via Hawkes processes

◮ Aggregating to team (dyad) level

$$\alpha \equiv \frac{\alpha_{12}\, n_2 + \alpha_{21}\, n_1}{n_1 + n_2} \qquad (7)$$

◮ Interpretation: the proportion of all group members’ actions, $n_1 + n_2$, that were responded to by any other member during a collaboration
◮ See paper for more details, including initial results on SEs of $\alpha_{jk}$

69 / 89
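Equation (7) is a weighted average of the two individual indices, which a few lines of Python make explicit. The function name and the example values are hypothetical.

```python
def team_alpha(alpha_12: float, alpha_21: float, n_1: int, n_2: int) -> float:
    """Dyad-level engagement: (alpha_12 * n_2 + alpha_21 * n_1) / (n_1 + n_2)."""
    if n_1 + n_2 == 0:
        raise ValueError("at least one action is needed to define the team index")
    return (alpha_12 * n_2 + alpha_21 * n_1) / (n_1 + n_2)


# Hypothetical dyad: student 1 sent 20 chats, student 2 sent 30 chats.
alpha_team = team_alpha(alpha_12=0.30, alpha_21=0.40, n_1=20, n_2=30)
```

Each individual index is weighted by the partner's number of actions, since `alpha_12` describes responses by student 1 to the `n_2` actions of student 2, and vice versa.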

slide-70
SLIDE 70

Example: Tetralogue

◮ A simulation-based science game with an embedded assessment recently developed at ETS (Hao, Liu, von Davier, & Kyllonen, 2015)

1 Dyads work together to learn and make predictions about volcano activity
2 At various points in the simulation, the students are asked to individually submit their responses to an assessment item without discussing the item
3 Following submission of responses from both students, they are invited to discuss the question and their answers
4 Lastly, they are given an opportunity to revise their responses to the item, with the final answers counting towards the team’s score

70 / 89

slide-71
SLIDE 71

Example: Full sample

◮ 286 dyads solicited via AMT and randomly paired (based on arrival in queue)
◮ Median reported age was 31.5 years
◮ 52.5% reported that they were female
◮ 79.2% reported that they were White
◮ Additionally, all participants were required to
  ◮ Have an IP address located in the United States
  ◮ Self-identify as speaking English as their primary language
  ◮ Self-identify as having at least one year of college education

71 / 89

slide-72
SLIDE 72

Example: Estimating chat engagement

[Figure, four panels:
  “Engagement Index” (axes: Alpha, count)
  “Standard Error Against Number of Chats” (axes: Number of Chats of Partner, Standard Error; methods: Hessian, Lower Bound)
  “Relation with Partner's Index” (axes: Alpha, Alpha)
  “Relation with Number of Chats” (axes: Alpha, Difference in Number of Chats)]

Note: Alpha denotes the estimated response intensities from Equation 6. Hessian denotes standard errors obtained via the Hessian of the log-likelihood. See appendix of Halpin et al. for Lower Bound. Difference in Number of Chats was scaled using the log of the absolute value of the difference.

72 / 89

slide-73
SLIDE 73

Example: Relation with revision on embedded assessment

[Figure: “Measures of Chat Engagement vs Item Revisions”; indices: Alpha, Partner Alpha, Team Alpha; vertical axis: Mean Engagement; groups: No Revisions, Revisions]

Note: Comparison of mean levels of engagement indices for individuals who either did or did not revise at least one response after discussion with their partners. Alpha denotes the estimated response intensities from Equation 6; Partner’s Alpha denotes the partner’s response intensity; Team Alpha denotes the team-level index in Equation 7. For the latter, the data are reported for dyads, not individuals, and no revisions means that both individuals on the team made no revisions. Error bars are 95% confidence intervals on the means.

73 / 89

slide-74
SLIDE 74

Example: Relation with revision on embedded assessment

Table 1: Summary of group differences.

Index            Group         Mean    SD    N   Hedges’ g     r
Alpha            No Revisions  0.31  0.13   82       –
Alpha            Revisions     0.36  0.10   66    0.40       .20
Partner’s Alpha  No Revisions  0.31  0.14   82       –
Partner’s Alpha  Revisions     0.37  0.14   66    0.44       .21
Team Alpha       No Revisions  0.27  0.11   26       –
Team Alpha       Revisions     0.37  0.13   48    0.84       .38

Note: Alpha denotes the estimated response intensities from Equation 6; Partner’s Alpha denotes the engagement index of the individual’s partner; Team Alpha denotes the team-level index in Equation 7. Hedges’ g used the correction factor described by Hedges (1981) and r denotes the point-biserial correlation.

74 / 89
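For readers who want to see how an effect size of this kind is computed, here is a minimal sketch of Hedges' g from summary statistics. It assumes the pooled-standard-deviation form of the standardised mean difference and the common approximation 1 - 3 / (4·df - 1) to the Hedges (1981) correction; whether the talk used the approximate or the exact correction is not stated. Because the table's means and SDs are rounded to two decimals, the output should be expected to land near, not exactly on, the tabled values.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Bias-corrected standardised mean difference (group 2 minus group 1)."""
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_2 - mean_1) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # approximate small-sample correction
    return d * correction


# Team Alpha row of Table 1: No Revisions (group 1) vs Revisions (group 2).
g_team = hedges_g(0.27, 0.11, 26, 0.37, 0.13, 48)
```

With the rounded inputs from the Team Alpha rows, `g_team` comes out at roughly 0.8.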

slide-75
SLIDE 75

Summary of collaborative processes

◮ Hawkes processes are a feasible model for process data obtained on collaborative tasks
◮ Resulting measures of chat engagement are meaningfully related to task performance
◮ Future modeling work
  ◮ Random effects models for simultaneous estimation over multiple groups
  ◮ Inclusion of model parameters describing task characteristics
  ◮ Analytic expressions for standard errors of model parameters
  ◮ Methods for improving optimization with relatively small numbers of events
  ◮ Integration with text-based analyses (e.g., using marks / time-varying covariates)

75 / 89

slide-76
SLIDE 76

What’s next

◮ Integration of task design, outcomes, processes, ... and theory!!

76 / 89

slide-77
SLIDE 77

Contact: peter.halpin@nyu.edu

Support: This research was funded by a postdoctoral fellowship from the Spencer Foundation and an Education Technology grant from NYU Steinhardt.

77 / 89

slide-78
SLIDE 78

References not already included in footnotes

Aronson, E., Blaney, N., Stephan, C., Sikes, J., & Snapp, M. (1978). The jigsaw classroom. Beverly Hills, CA: Sage.

Burrus, J., Carlson, J., Bridgeman, B., Golub-Smith, M., & Greenwood, R. (2013). Identifying the Most Important 21st Century Workforce Competencies: An Analysis of the Occupational Information Network (O*NET) (ETS RR-13-21). Princeton, NJ.

Cohen, E. G., Lotan, R. A., Scarloss, B. A., & Arellano, A. R. (1999). Complex instruction: Equity in cooperative learning classrooms. Theory Into Practice, 38, 80-86.

Davey, T., Ferrara, S., Holland, P. W., Shavelson, R. J., Webb, N. M., & Wise, L. L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Princeton, NJ.

Deming, D. J. (2015). The Growing Importance of Social Skills in the Labor Market. National Bureau of Economic Research Working Paper Series, (21473).

Griffin, P., & Care, E. (2015). Assessment and teaching of 21st century skills: Methods and approach. New York: Springer.

Hmelo-Silver, C. E., Chinn, C. A., Chan, C. K., & O’Donnell, A. M. (2013). International handbook of collaborative learning. New York: Taylor and Francis.

McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice-Hall.

Organisation for Economic Co-operation and Development. (2013). PISA 2015 Draft Collaborative Problem Solving Framework. Retrieved from http://www.oecd.org/pisa/pisaproducts/DraftPISA2015CollaborativeProblemSolvingFramework.pdf

Webb, N. M. (1995). Group Collaboration in Assessment: Multiple Objectives, Processes, and Outcomes. Educational Evaluation and Policy Analysis, 17(2), 239-261.

78 / 89

SLIDE 80

Instructions

80 / 89

slide-81
SLIDE 81

Jigsaw / information sharing items

81 / 89

slide-82
SLIDE 82

Jigsaw / information sharing items

82 / 89

slide-83
SLIDE 83

Jigsaw / information sharing items

83 / 89

slide-84
SLIDE 84

Jigsaw / information sharing items

84 / 89

slide-85
SLIDE 85

Hints / information requesting items

85 / 89

slide-86
SLIDE 86

Hints / information requesting items

86 / 89

slide-87
SLIDE 87

Hints / information requesting items

87 / 89

slide-88
SLIDE 88

Multiple answer / negotiation items

88 / 89