Common Execution Infrastructure (CEI) Subsystem OOI CI System


SLIDE 1

R3 Kickoff Meeting

Ocean Observatories Initiative

Common Execution Infrastructure (CEI) Subsystem

OOI CI System Architecture Team:

SLIDE 2

CEI Developers

10/18/12

  • CEI Developer: John Bresnahan, Argonne National Lab (part-time)
  • CEI Developer: Patrick Armstrong, University of Chicago
  • CEI Developer: Pierre Riteau, University of Chicago (part-time)
  • CEI Senior Developer: Pierre Riteau, University of Chicago

SLIDE 3

Subsystem Purpose

  • Allow OOI applications and system to
    – Provide Highly Available (HA) services
    – Scale to demand
  • Enact OOI deployment policies in elastic environment
  • Provide a deployment foundation for OOI CI

SLIDE 4

Core System Structure: Service Layers

SLIDE 5

CEI Scope

  • Elastic Computing Services
    – Implement elastic computing services to provide on-demand scaling and high availability.
  • Execution Engine Catalog & Repository Services
    – Work with operations and ITV to develop and refine tools to upload and sync the different deployable type representations adapted to each site.
  • Process Management Services
    – Provide the management services for policy-based process execution within specified deployable types, intended to support the data distribution services. As such, the processes are sequential and primarily require a process-to-resource match.
  • Process Catalog & Repository Services
    – Maintain process definitions as well as lists of active processes.
  • Integration with the National Computing Infrastructure
    – Provide the capability to deploy OOI processing on the Amazon cloud services as well as academic clouds.

SLIDE 6

High Availability and Scaling

  • High Availability
    – Towards an always-on service model
    – Failures in outsourced resources
    – Providing a pool of replenishable compute resources
  • Autoscaling
    – Provide resources for peaks in demand
    – Ensure good utilization during “valleys” in demand
    – Flexible resource mix

SLIDE 14

Resources for HA and Scaling

– Cloud resources are available on-demand, but any particular resource may fail at any time
– Applications/processes can absorb new resources
– Applications/processes can tolerate failures

EPU Management: monitor and regulate set properties based on system-specific and application-specific metrics
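The regulation idea on this slide can be sketched as one control cycle that sizes a pool of instances to a demand metric and replaces failed members. This is only an illustration under stated assumptions: the `Instance` class, the `plan_actions` function, the queue-length metric and the min/max policy bounds are all hypothetical names chosen for the example, not part of the CEI code base.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    healthy: bool


def plan_actions(instances, queue_length, per_instance_capacity, min_size, max_size):
    """Return (number_to_launch, ids_to_terminate) for one control cycle."""
    # Size the pool to the demand metric, clamped to the policy bounds.
    needed = -(-queue_length // per_instance_capacity)  # ceiling division
    target = max(min_size, min(max_size, needed))

    # Failed resources are replaced rather than repaired in place.
    failed = [i.instance_id for i in instances if not i.healthy]
    healthy = [i.instance_id for i in instances if i.healthy]

    if len(healthy) < target:
        return target - len(healthy), failed
    # Over-provisioned: release the surplus healthy instances as well.
    return 0, failed + healthy[target:]


# Peak in demand with one failed instance: launch two, terminate the failed one.
peak = plan_actions([Instance("a", True), Instance("b", False)],
                    queue_length=25, per_instance_capacity=10,
                    min_size=1, max_size=5)

# Valley in demand: shrink back to the minimum pool size.
valley = plan_actions([Instance("a", True), Instance("b", True), Instance("c", True)],
                      queue_length=0, per_instance_capacity=10,
                      min_size=1, max_size=5)
```

Keeping the decision a pure function of the observed state makes it easy to rerun after any failure, which fits the assumption that any particular resource may fail at any time.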

SLIDE 15

Managing Resources

SLIDE 25

Elastic Processing Unit (EPU) Management

[Diagram. Components: EPU Management (×3); Decision Engine; Provisioner; DTRS; CB; IaaS; execution engines EE ioncore 1.3, EE ioncore 1.2 and EE matlab 6.1; context-agent and u-agent (×3). Arrow label: create instance. Legend: AMQP, Other.]

SLIDE 31

Making the EPU HA

[Diagram. Components: Bootstrap EPU; Dedicated DE; Provisioner/DTRS; IaaS; EPU Worker (×9); u-agent (×3); cloudinit.d. Arrow label: create instance. Legend: AMQP, Other.]

SLIDE 32

Managing Processes

SLIDE 37

Creating a Process I

[Diagram. Components: Process Dispatcher; Process Definition Registry; Process Instance Registry; Decision Engine; EE type A instance with ee-agent. Steps, in build order: request to activate process X; lookup; launch; enter. Legend: AMQP, Other.]

SLIDE 47

Creating a Process II

[Diagram. Components: Process Dispatcher; Process Definition Registry; Process Instance Registry; Decision Engine; EPU Management; Provisioner/DTRS; IaaS; EE type A instance with ee-agent. Steps, in build order: request to activate process X; lookup; request instance; create instance; launch; enter. Legend: AMQP, Other.]
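The two "Creating a Process" cases differ only in whether a suitable execution engine already exists. A minimal sketch of that branching follows, assuming plain dictionaries in place of the registries and a direct function call in place of messaging. The `ProcessDispatcher` class, its method names, the `slots_per_engine` limit and the `fake_epu_management` helper are hypothetical and exist only for this illustration.

```python
import itertools


class ProcessDispatcher:
    """Toy dispatcher: registries are dicts, messaging is direct calls."""

    def __init__(self, definitions, request_instance, slots_per_engine=2):
        self.definitions = definitions            # process definition registry
        self.instances = {}                       # process instance registry
        self.engines = {}                         # engine id -> {"type", "processes"}
        self.request_instance = request_instance  # stand-in for EPU Management
        self.slots_per_engine = slots_per_engine

    def _find_engine(self, engine_type):
        """Return a running engine of the right type with a free slot, if any."""
        for engine_id, engine in self.engines.items():
            if (engine["type"] == engine_type
                    and len(engine["processes"]) < self.slots_per_engine):
                return engine_id
        return None

    def activate(self, process_id, definition_name):
        definition = self.definitions[definition_name]            # lookup
        engine_type = definition["engine_type"]
        engine_id = self._find_engine(engine_type)
        if engine_id is None:
            # Case II: no engine has room, so request a new instance first.
            engine_id = self.request_instance(engine_type)
            self.engines[engine_id] = {"type": engine_type, "processes": []}
        self.engines[engine_id]["processes"].append(process_id)   # launch
        self.instances[process_id] = {"definition": definition_name,
                                      "engine": engine_id}        # enter
        return engine_id


_counter = itertools.count(1)


def fake_epu_management(engine_type):
    """Pretend to provision a new execution engine and return its id."""
    return f"ee-{engine_type}-{next(_counter)}"


dispatcher = ProcessDispatcher({"X": {"engine_type": "A"}}, fake_epu_management,
                               slots_per_engine=1)
first = dispatcher.activate("p1", "X")   # no engine yet: one is requested
second = dispatcher.activate("p2", "X")  # first engine is full: another is requested
```

With a larger `slots_per_engine`, the second activation would reuse the first engine, which corresponds to the shorter "Creating a Process I" path.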

SLIDE 59

Inside an Execution Engine

[Diagram. Components: EE type A instance with context-agent, ee-agent and u-agent; CC instance (×2); supervisord (×3); process (adapter) 1; Matlab script; Process Dispatcher; EPU Management; Package Server. Arrow labels: C, M, CMR, CMK, CMKO; datastream subscription; result. Legend: AMQP, Other.]

Operation codes: C – create, M – monitor, R – restart, K – kill, O – I/O
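The operation codes in the legend map onto a small process-control interface. The sketch below shows create, monitor, restart and kill for child processes using Python's standard `subprocess` module. It is not the supervisord API and not the CEI agent code: `MiniSupervisor` and the process name `adapter-1` are hypothetical, and the I/O operation (O) is left out.

```python
import subprocess
import sys


class MiniSupervisor:
    """Tracks child processes by name (illustrative, not supervisord)."""

    def __init__(self):
        self.commands = {}
        self.children = {}

    def create(self, name, command):           # C
        self.commands[name] = command
        self.children[name] = subprocess.Popen(command)

    def monitor(self, name):                   # M
        """Return 'running' while the child is alive, else 'exited'."""
        return "running" if self.children[name].poll() is None else "exited"

    def kill(self, name):                      # K
        child = self.children[name]
        if child.poll() is None:
            child.kill()
        child.wait()  # reap the child so monitor() reports 'exited'

    def restart(self, name):                   # R
        self.kill(name)
        self.children[name] = subprocess.Popen(self.commands[name])


supervisor = MiniSupervisor()
sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]

supervisor.create("adapter-1", sleeper)
state_after_create = supervisor.monitor("adapter-1")
supervisor.kill("adapter-1")
state_after_kill = supervisor.monitor("adapter-1")
supervisor.restart("adapter-1")
state_after_restart = supervisor.monitor("adapter-1")
supervisor.kill("adapter-1")  # clean up the restarted child
```

An agent that is granted only a subset of these methods corresponds to the narrower labels in the diagram, such as C alone or CMR.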

SLIDE 64

Adventures in Availability

$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$

where MTBF is the mean time between failures and MTTR is the mean time to repair.

  • Time to repair (TTR)
    – Diagnosis
    – Time to scale (TTS)
      • PENDING (request)
      • STARTED (deployment)
      • RUNNING (contextualization)

[Chart: TTS, preliminary results for 2,000 VMs provisioned on AWS EC2]
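A short worked example may help connect the availability formula to the time-to-scale phases. It assumes that time to repair is the sum of diagnosis time and time to scale, and that time to scale is the sum of the request, deployment and contextualization phases. That reading, the function names and all numbers are assumptions made for illustration, not measurements from the talk.

```python
def time_to_repair(diagnosis_s, request_s, deployment_s, contextualization_s):
    """TTR in seconds, read here as diagnosis plus time to scale (TTS)."""
    time_to_scale = request_s + deployment_s + contextualization_s
    return diagnosis_s + time_to_scale


def availability(mtbf, mttr):
    """A = MTBF / (MTBF + MTTR); both arguments must use the same unit."""
    return mtbf / (mtbf + mttr)


# Hypothetical numbers: one failure every 30 days, repaired by replacing the VM.
ttr_s = time_to_repair(diagnosis_s=60, request_s=5,
                       deployment_s=90, contextualization_s=145)
mtbf_s = 30 * 24 * 3600
a = availability(mtbf_s, ttr_s)

print(f"TTR = {ttr_s} s, availability = {a:.6f}")
```

Because MTBF for outsourced resources is largely outside the operator's control, shortening TTS is the practical lever on availability under this model.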

SLIDE 65

R3 Scope

  • Process management
    – Activation and validation
    – New execution site registration
  • Integration with National Infrastructure
    – Framework for integration of academic cloud providers, TeraGrid and OSG
    – Integration with Microsoft cloud

SLIDE 66

R3 Activities

  • Refine/change scope to achieve a complete and maintainable system
  • Decide on specific solutions for R3 scope

SLIDE 67

Questions?