How Many Is Too Much? Exploring Costs of Coordination During Outages


How Many Is Too Much? Exploring Costs of Coordination During Outages. Dr. Laura M.D. Maguire, Cognitive Systems Engineering Lab, The Ohio State University. @LauraMDMaguire


SLIDE 1

How Many Is Too Much? Exploring Costs of Coordination During Outages

Dr. Laura M.D. Maguire

Cognitive Systems Engineering Lab

The Ohio State University

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

Zoom

SLIDE 6

SLIDE 7

(c) AleksandarNakic NASA

SLIDE 8

Software is increasingly managing critical societal functions

  • 911 call routing systems
  • Financial markets
  • Electronic health records

SLIDE 9

Overview

  • Changing nature of ‘control rooms’ & implications
  • Cognitive & coordinative work
  • Coordination
  • 4 interesting findings
  • Implications for your work

*Hint: You are probably going to want to rethink the need for an Incident Commander.

SLIDE 10

Cognitive work

Perceiving, Reasoning, Attending, Forming an Action

SLIDE 11

Cognitive work

Planning, Prioritizing, Troubleshooting, Recognizing change, Observing, Inferring, Anticipating, Diagnosing, Correcting, Modifying, Reacting, Reasoning

Perceiving, Reasoning, Attending, Forming an Action

SLIDE 12

Cognitive work

Planning, Prioritizing, Troubleshooting, Recognizing change, Observing, Inferring, Anticipating, Diagnosing, Correcting, Modifying, Reacting, Reasoning

Coordinative work

Synchronizing, Grounding, Updating, Signaling, Taking Initiative, Taking Direction, Delegating, Relaxing goals or constraints, Reciprocity

Perceiving, Reasoning, Attending, Forming an Action

Recruiting

SLIDE 13

Cognitive costs of coordination – additional mental effort, load and delay required to participate in joint activity.

SLIDE 14

Wait… but then why coordinate?

  • 24/7 ops
  • Geographically distributed
  • Dependencies
  • Specialized functions
  • Characterized by continuous change
  • Complex, interactive systems
  • Operating at speed & scale

SLIDE 15

Wait, but then why coordinate?

  • 24/7 ops
  • Geographically distributed
  • Dependencies
  • Specialized functions
  • Characterized by continuous change
  • Complex, interactive systems
  • Operating at speed & scale

“Woods' Theorem: As the complexity of a system increases, the accuracy of any single agent's own model of that system decreases rapidly.”

STELLA report (stella.io)

SLIDE 16

Which people are important…

SLIDE 17

Which people are important… …in what collaborative interplay…

SLIDE 18

Which people are important… …in what collaborative interplay… …in what sequence?

SLIDE 19

The progression of an incident

Cognitive demands Coordinative demands

SLIDE 20

The coordination paradox

  • In complex adaptive systems, everyone’s model is going to be partial and incomplete (Woods, 2017).

SLIDE 21

The coordination paradox

  • In complex adaptive systems, everyone’s model is going to be partial and incomplete (Woods, 2017).
  • Therefore we need multiple, diverse perspectives to handle non-routine or exceptional events (Grayson, 2018; Watts-Perotti & Woods, 2001).

SLIDE 22

The coordination paradox

  • In complex adaptive systems, everyone’s model is going to be partial and incomplete (Woods, 2017).
  • Therefore we need multiple, diverse perspectives to handle non-routine or exceptional events (Grayson, 2018; Watts-Perotti & Woods, 2001).
  • But working with others adds cognitive load (Klein et al., 2005; Maguire, 2019).

SLIDE 23

The coordination paradox

How to reap the benefits of joint activity without the costs of coordination becoming too high?

What strategies do software engineers use to control the costs of coordination?

SLIDE 24

What did I find?

1) Incident response
2) Incident command
3) Adaptation was key
4) Tooling can increase CoC (costs of coordination)

SLIDE 25

SNAFU Catchers Consortium Cycle 2

SLIDE 26

Incident Response – a model

Blogs.cisco.com

SLIDE 27

Incident Response – the hidden stuff

Is this an incident? Anuj would know.

How is the tech debt from this incident going to impact us later?

Not sure why that worked or how long it will hold… I better tell the other devs

You have new mail From: CEO To: Responder Subject: WTF is going on??!?

How is the tech debt from last incident going to impact us now?

I need to get Sarah, she can do this better than I can

I’m not sure it’s actually over. We should make sure we don’t burn out.

I think I need help but I don’t want to wake anyone up until I’m sure

I don’t know what it is yet but we need to take action NOW.

SLIDE 28

Incident Response – the hidden stuff

How is the tech debt from this incident going to impact us later?

Not sure why that worked or how long it will hold… I better tell the other devs

You have new mail From: CEO To: Responder Subject: WTF is going on??!?

How is the tech debt from this incident going to impact us now?

I’m not sure it’s actually over. We should make sure we don’t burn out.

I think I need help but I don’t want to wake anyone up until I’m sure

I don’t know what it is yet but we need to take action NOW.

Is this an incident? Anuj would know.

I need to get Sarah, she can do this better than I can

SLIDE 29

Incident Response – the hidden stuff

Not sure why that worked or how long it will hold… I better tell the other devs

You have new mail From: CEO To: Responder Subject: WTF is going on??!?

I’m not sure it’s actually over. We should make sure we don’t burn out.

I think I need help but I don’t want to wake anyone up until I’m sure

I don’t know what it is yet but we need to take action NOW.

Is this an incident? Anuj would know.

I need to get Sarah, she can do this better than I can

How is the tech debt from this incident going to impact us later?

How is the tech debt from last incident going to impact us now?

SLIDE 30

Incident Response – the hidden stuff

How is the tech debt from this incident going to impact us later?

How is the tech debt from last incident going to impact us now?

You have new mail From: CEO To: Responder Subject: WTF is going on??!?

I’m not sure it’s actually over. We should make sure we don’t burn out.

Is this an incident? Anuj would know.

I need to get Sarah, she can do this better than I can

Not sure why that worked or how long it will hold… I better tell the other devs

I think I need help but I don’t want to wake anyone up until I’m sure

I don’t know what it is yet but we need to take action NOW.

SLIDE 31

Incident Response – the hidden stuff

How is the tech debt from this incident going to impact us later?

How is the tech debt from last incident going to impact us now?

You have new mail From: CEO To: Responder Subject: WTF is going on??!?

Is this an incident? Anuj would know.

I need to get Sarah, she can do this better than I can

Not sure why that worked or how long it will hold… I better tell the other devs

I think I need help but I don’t want to wake anyone up until I’m sure

I don’t know what it is yet but we need to take action NOW.

I’m not sure it is actually over. We should make sure we don’t burn out.

SLIDE 32

SLIDE 33

“The incident commander holds the high-level state about the incident. They structure the incident response task force, assigning responsibilities according to need and priority.

De facto, the commander holds all positions that they have not delegated.”

Beyer et al. (2016)

SLIDE 34

SLIDE 35

t0

SLIDE 36

t0

SLIDE 37

t0

SLIDE 38

t0

SLIDE 39

Responder

SLIDE 40

Responder Responder Responder

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

Adaptive Choreography.

Dynamically reconfiguring how coordination happens.

SLIDE 45

SLIDE 46

Taking Direction, Recruiting others, Being recruitable, Backfilling IC tasks, Model Updating, Investing, Taking Initiative, Updating, Sharing Info, Deciding, Anticipating, Adjusting

SLIDE 47

Taking Direction, Recruiting others, Being recruitable, Backfilling IC tasks, Model Updating, Investing, Taking Initiative, Updating, Sharing Info, Deciding, Anticipating, Adjusting

Adaptive Choreography.

SLIDE 48

Which people/machines are important… …in what collaborative interplay… …in what sequence?

SLIDE 49

Costs of coordination with tooling

  • Lag/delay
  • Reduced functionality
  • Glitches
  • Updating
  • Calibration
  • Difficulty with access
  • Limited observability
  • Investments in:
  • Selecting
  • Testing
  • Piloting
  • Launching
  • Switching
  • Calibration
  • Re-calibrating
  • Working around limitations

SLIDE 50

What did I find?

1) Incident response has technical and coordinative demands
2) Incident command should be A role, not THE role
3) Adaptation was key
4) Tooling can increase costs of coordination

SLIDE 51

Call to Action

SLIDE 52

References

  • Klein, G., Feltovich, P. J., Bradshaw, J. M., & Woods, D. D. (2005). Common ground and coordination in joint activity.
  • Woods, D. D. (Ed.). (2017). STELLA Report from the SNAFU Catchers Workshop on Coping With Complexity.
  • Allspaw, J. (2015). Trade-Offs under Pressure: Heuristics and Observations of Teams Resolving Internet Service Outages.
  • Maguire, L. (2019). Managing the hidden costs of coordination. ACM Queue.
  • Grayson, M. R. (2018). Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems.
  • Patterson, E. S., Watts-Perotti, J., & Woods, D. D. (1999). Voice loops as coordination aids in space shuttle mission control.
  • Patterson, E. S., & Woods, D. D. (2001). Shift changes, updates, and the on-call architecture in space shuttle mission control.
  • Watts-Perotti, J., & Woods, D. D. (2007). How Anomaly Response is Distributed Across Functionally Distinct Teams in Space Shuttle Mission Control.

Interested in chatting further? workshops@jeli.io

SLIDE 53

How Many Is Too Much? Exploring Costs of Coordination During Outages

Dr. Laura M.D. Maguire

Cognitive Systems Engineer & Researcher