SLIDE 1

CS615 - Aspects of System Administration Slide 1

CS615 - Aspects of System Administration Monitoring, Configuration Management

Department of Computer Science Stevens Institute of Technology Jan Schaumann jschauma@stevens-tech.edu https://stevens.netmeister.org/615/

Monitoring, Configuration Management April 13, 2020

SLIDE 2

Hooray! 5 minute break

SLIDE 3

Problem Report

“Something’s wrong.”

SLIDE 4

Now what?

SLIDE 5

Problem Report

  • “The system feels slow.”
  • “I can’t log in.”
  • “My mail was not delivered.”
  • “The site is down.”

SLIDE 6

Now what?

SLIDE 7

To the logs!

SLIDE 8

Answers

“The system feels slow.”
    up 1318 days, 13:46, 1 user, load averages: 993.81, 272.91, 1012.18

“I can’t log in.”
    Apr 6 09:25:56 <auth.info>hostname sshd[1624]: Failed password for jdoe from 115.239.231.100 port 1047 ssh2

“My mail was not delivered.”
    Apr 11 16:15:40 panix postfix/smtpd[7566]: connect from unknown[122.3.68.122]
    Apr 11 16:15:41 panix postfix/smtpd[7566]: NOQUEUE: reject_warning: RCPT from unknown[122.3.68.122]: 450 4.7.1 Client host rejected: cannot find your hostname, [122.3.68.122]; from=<McneilRomany28@pldt.net> to=<jschauma@stevens.edu> proto=ESMTP helo=<122.3.68.122.pldt.net>

SLIDE 10

Answers

“The site is down.”
    94.242.252.41 - "" [11/Apr/2016:19:18:47 -0400] "GET /secret/ HTTP/1.1" 403 524 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0"

SLIDE 13

Events

“Something’s wrong.” is just an unexpected or undesirable event. Events happen all the time. Being able to identify relevant events allows you to diagnose, predict and even prevent undesirable events.

SLIDE 14

Events

In order to be able to identify an event as unexpected, you have to have expected events.

SLIDE 18

Expected Events

  • Know your applications.
  • Know your users.
  • Know your traffic patterns.
  • Know your systems.

SLIDE 19

Events and Metrics

$ dict event
event
    n 1: something that happens at a given place and time
    2: a special set of circumstances; "in that event, the first possibility is excluded"; "it may rain in which case the picnic will be canceled" [syn: {event}, {case}]

$ dict metric
metric
    3: a system of related measures that facilitates the quantification of some particular characteristic [syn: {system of measurement}, {metric}]

SLIDE 21

Events and Metrics

Events:
  • may occur rarely / frequently / constantly
  • can be collected in logs
  • may be composed of other events
  • may be: something happened
  • may be: nothing (new) happened

Metrics:
  • correlation of related events
  • may help identify outliers
  • may trigger events
  • may help make (automated or interactive) decisions

SLIDE 22

Collecting Data

  • Counters: easy, numeric data tracking individual events. Example: HTTP status codes.
  • Timers: easy, numeric data tracking event duration. Example: time to send all data for a successful HTTP request.
  • Thresholds: easy, numeric trigger for events; may itself trigger events or metrics. Example: more than N HTTP hits in X seconds yield 404.
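The three data types above can be sketched in a few lines of Python. This is a minimal illustration, not a real metrics library: the `Metrics` class and its method names are hypothetical, and the threshold reads the slide's example as "more than N responses with status 404 within X seconds".

```python
import time
from collections import Counter, deque


class Metrics:
    """Toy in-process metrics (hypothetical names): counters, timers, one threshold."""

    def __init__(self, max_hits, window_seconds):
        self.counters = Counter()     # counter: number of requests per status code
        self.timings = []             # timer: observed request durations in seconds
        self.max_hits = max_hits      # threshold: N
        self.window = window_seconds  # threshold: X
        self.recent_404s = deque()    # timestamps of recent 404 responses

    def record_request(self, status, duration, now=None):
        """Record one HTTP request; return True if the 404 threshold is exceeded."""
        now = time.monotonic() if now is None else now
        self.counters[status] += 1
        self.timings.append(duration)
        if status == 404:
            self.recent_404s.append(now)
        # Forget 404s that have fallen out of the sliding window.
        while self.recent_404s and now - self.recent_404s[0] > self.window:
            self.recent_404s.popleft()
        return len(self.recent_404s) > self.max_hits
```

A `True` return value is itself an event: it can be logged, counted, or used to trigger an automated decision.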

SLIDE 23

Know Your Systems

Profile your application:
  • execution time (for example: time(1))
  • data sources and destination affect execution
  • strace(1) and friends for more detailed analysis

Understand your system performance:
  • CPU load, memory (for example: top(1), vmstat(1))
  • disk I/O (for example: iostat(1))
  • user activity (for example: ac(1), lsof(8), sa(8))

SLIDE 24

Know Your Systems

Network statistics:
  • ports and applications (for example: lsof(8), netstat(8))
  • packets in and out
  • connection origin
  • NetFlow etc.

SLIDE 25

Context

Context lets you find relevant events in your haystack of metrics.
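One simple way to give a single measurement context is to compare it against a baseline of earlier samples. The sketch below is an assumption about how that might look, not something from the lecture: `is_outlier` and the three-sigma default are illustrative choices.

```python
from statistics import mean, stdev


def is_outlier(baseline, value, max_sigma=3.0):
    """Return True if value is more than max_sigma standard deviations from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unexpected.
        return value != mu
    return abs(value - mu) / sigma > max_sigma
```

A load average of 9.0 is unremarkable on one host and alarming on another; only the baseline tells you which.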

SLIDE 26

No context.

CPU load - 12 hours

SLIDE 27

No context.

Disk I/O - 12 hours

SLIDE 28

No context.

Load Average - 12 hours

SLIDE 29

No context.

Memory - 12 hours

SLIDE 30

Some context.

12 hours

SLIDE 31

With context.

7 days

SLIDE 32

Know your systems.

30 days

SLIDE 33

Turn events into metrics.

  • Log it!
  • Export counters/timers from within your application.
  • Process logs and produce counters/timers:
        awk '{print $9}' /var/log/httpd/access.log | sort | uniq -c
  • Create a baseline.
  • Graph it. https://is.gd/tDCmQI
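The awk pipeline above can also be written in Python when the counts need to feed further processing. This is a sketch under the same assumption the awk command makes: the HTTP status code is the ninth whitespace-separated field of each access log line. The function name `count_status_codes` is hypothetical.

```python
from collections import Counter


def count_status_codes(lines):
    """Count the 9th whitespace-separated field of each line (the HTTP status code)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Unlike awk, skip lines that are too short instead of counting an empty field.
        if len(fields) >= 9:
            counts[fields[8]] += 1
    return counts
```

Called with an open file, for example `count_status_codes(open("/var/log/httpd/access.log"))`, it returns one counter per status code, which is the raw material for a baseline.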

SLIDE 34

Monitoring/graphing

SNMP based:
  • Cacti: http://www.cacti.net/
  • MRTG: http://oss.oetiker.ch/mrtg/
  • Observium: http://demo.observium.org/
  • ...

Other / complementary:
  • Ganglia: http://ganglia.info/
  • Munin: http://munin-monitoring.org/
  • Nagios: http://nagioscore.demos.nagios.com/
  • Graphite: http://graphite.wikidot.com/

SLIDE 36

Context doesn’t explain everything...

...but it helps you decide what to investigate.

SLIDE 37

To the cloud!

There's a service for that. In the cloud.

Consider:
  • support / convenience vs. do-it-yourself
  • integration with your other services
  • data confidentiality
  • data lock-in (esp. when trending data over years)

SLIDE 42

Monitoring Pitfalls

  • Increasing the size of your haystack does not always help in finding the needle.
  • Email is not a scalable network monitoring solution.
  • Absence of a signal can itself be a signal.
  • Most of the value of your metrics only becomes evident over time.
  • This list is incomplete.
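"Absence of a signal" can be made concrete by tracking when each source last reported and flagging the ones that have gone quiet. A minimal sketch, assuming timestamps in seconds and a hypothetical `stale_sources` helper:

```python
def stale_sources(last_seen, now, max_silence):
    """Return the sorted names of sources whose last event is older than max_silence seconds."""
    return sorted(
        name for name, timestamp in last_seen.items()
        if now - timestamp > max_silence
    )
```

A host that stopped logging errors may be healthy, or it may have stopped logging altogether; a check like this separates the two cases.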

SLIDE 43

Hooray! 5 minute break

SLIDE 44

Team Missions

  • Red Team: https://is.gd/LfrKPi
  • Blue Team: https://is.gd/kkXMQ2

SLIDE 45

Entropy is the Enemy

The entropy of an isolated system never decreases.

SLIDE 46

Entropy is the Enemy

A static system is a useless system. A useful system is being used.
  • data is processed; files are created, modified, removed
  • software is added, upgraded, removed
  • systems are created, copied, decommissioned
  • instances / containers are even more short-lived, coming into existence and disappearing again as needed

SLIDE 47

Single Systems are Fragile

Individual systems created and configured by hand are fragile. Our processes need to be repeatable, automated, reliable.

Recall previous lectures:
  • OS installation
  • package management
  • multi-user basics
  • automation
  • recovery / restores

SLIDE 48

Reproducible

“Never trust a computer you can’t throw out the window.” – Woz

SLIDE 50

Evolution of Configuration Management

“I set up a server over here to do X. Replicate that setup on all the others.”

“I know how to do this! Watch me!”

    $ ssh root@server1
    # rsync -e ssh -avz / server2:/

“/etc? What’s that?”

SLIDE 51

Evolution of Configuration Management

                 shareable content    unshareable content
static data      /usr, /opt           /boot, /etc
variable data    /home, /var/mail     /tmp, /var/run

SLIDE 52

Every Sysadmin ever...

  1. scp(1)
  2. rsync(1)
  3. some sort of parallel ssh(1) of the above
  4. switch to pull
  5. add mutual authentication
  6. but effectively ignore mismatches, because doing things the right way is difficult and inconvenient
  7. switch to push with remote dæmon
  8. write an inventory database
  9. deploy a well-known CM system

Finally: find something it can’t do, goto 1.

SLIDE 53

Base configuration vs. service definition

Your servers have unique, yet predictable properties. E.g.:
  • network configuration
  • critical services: DNS, NTP, Syslog
  • minimum OS / software version
  • user management
  • common service configuration (e.g. sshd(8))
  • ...

SLIDE 54

Base configuration vs. service definition

Different sets of servers have shared properties. For example, consider an HTTP server:
  • minimum server software
  • appropriate TLS specification
  • shared TLS certificate and key
  • database configuration
  • static content (HTML / JS / CSS files)
  • ...

SLIDE 55

Pets vs. Cattle

“Pets”:
  • unique, cheerful hostnames
  • single systems grown over time, lovingly configured by hand
  • when sick, everybody is very concerned
  • slowly nursed back to life

“Cattle”:
  • predictable, boring hostnames
  • almost identical to all others
  • centrally managed, easy to recreate
  • when sick, they get taken out back and shot
  • quickly replaced by another

SLIDE 56

Service definitions

    class syslog {
        include cron
        include logrotate

        package {
            'syslog-ng':
                ensure => latest;
        }

        # The service requires the package (not the other way around);
        # otherwise package -> file -> service forms a dependency cycle.
        service {
            'syslog-ng':
                ensure  => running,
                enable  => true,
                require => Package['syslog-ng'];
        }

        file {
            '/etc/syslog-ng/syslog-ng.conf':
                ensure  => file,
                source  => 'puppet:///syslog/syslogng.conf',
                mode    => '0644',
                owner   => 'root',
                group   => 'root',
                require => Package['syslog-ng'],
                notify  => Service['syslog-ng'];
            '/etc/logrotate.d/syslog-ng':
                ensure  => file,
                source  => 'puppet:///syslog/logrotate-syslogng',
                mode    => '0644',
                owner   => 'root',
                group   => 'root',
                require => Package['logrotate'];
        }
    }

SLIDE 57

Service definitions

    package "ldap-utils" do
      action :upgrade
    end

    template "/etc/ldap.conf" do
      source "ldap.conf.erb"
      mode 00644
      owner "root"
      group "root"
    end

    %w{ account auth password session }.each do |pam|
      cookbook_file "/etc/pam.d/common-#{pam}" do
        source "common-#{pam}"
        mode 00644
        owner "root"
        group "root"
        notifies :restart, resources(:service => "ssh"), :delayed
      end
    end

SLIDE 58

Service definitions

    bundle agent sshd(parameter) {
      files:
        "/tmp/sshd_config.tmpl"
          perms => mog("0600","root","root"),
          copy_from => secure_cp("/templates/etc/ssh/sshd_config", "cf-master.example.com");

        "/etc/ssh/sshd_config"
          perms => mog("0600","root","root"),
          create => true,
          edit_line => expand_template("/tmp/sshd_config.tmpl"),
          classes => if_repaired("restart_sshd");

      commands:
        restart_sshd::
          "/etc/rc.d/sshd restart"
    }

SLIDE 59

Team Missions

  • Black Team: https://is.gd/zQFtGJ and https://is.gd/b1fN36
  • Green Team: https://is.gd/mWKosu and https://is.gd/ejJT1T

SLIDE 66

CM Requirements

  • software installation
  • service management / supervising
  • file permissions / ownership
  • static files
  • host-specific data
  • command-execution
  • data collection

SLIDE 67

One more layer of abstraction...

The objective of a CM system is not to make changes on a system. The objective of a CM system is to assert state.
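The difference between "making changes" and "asserting state" can be illustrated by deriving the changes from a comparison of desired and actual state. The sketch below is a simplified model and not how any particular CM system works: `plan`, `apply_plan`, the action names, and the choice to remove anything not listed as desired are all assumptions.

```python
def plan(desired, actual):
    """Compare desired and actual package versions; return the actions needed to converge."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("install", name, desired[name]))
        elif actual[name] != desired[name]:
            actions.append(("change", name, desired[name]))
    # Assumption: anything present but not desired gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("remove", name, actual[name]))
    return actions


def apply_plan(actions, actual):
    """Return the new state after applying the actions to a copy of actual."""
    new_state = dict(actual)
    for verb, name, version in actions:
        if verb == "remove":
            del new_state[name]
        else:
            new_state[name] = version
    return new_state
```

Once the actual state matches the desired state, `plan` returns an empty list: there is nothing to do, which is the normal case for a converged system.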

SLIDE 68

CM States

SLIDE 69

Circles around things

Group your resources into sets.
  • functional groupings
  • services
  • users
  • hosts
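Once resources are grouped into sets, targeting a change becomes set arithmetic. A small sketch, where the group names and the `select_hosts` helper are hypothetical:

```python
def select_hosts(groups, all_of=(), none_of=()):
    """Return hosts that belong to every group in all_of and to no group in none_of."""
    if not all_of:
        return set()
    selected = set.intersection(*(set(groups[g]) for g in all_of))
    for g in none_of:
        selected -= set(groups[g])
    return selected
```

For example, "all web servers in one location, except the canary" is the intersection of two groups minus a third.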

SLIDE 73

CMs configure complex systems

CM systems are complex themselves.
CM systems are inherently trusted.
CM systems can break everything. To the degree that you can’t unbreak things afterwards.

Consider:
  • staged rollout of change sets
  • automated error detection and rollback
  • self-healing properties
  • authentication and privilege
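A staged rollout with automated error detection and rollback can be sketched as a loop over stages. Everything here is illustrative: `staged_rollout` and the `apply_change`, `healthy`, and `rollback` callables are hypothetical stand-ins for whatever your CM system provides.

```python
def staged_rollout(stages, apply_change, healthy, rollback):
    """Apply a change stage by stage; if a stage is unhealthy, roll back all touched hosts and stop."""
    touched = []
    for stage in stages:
        for host in stage:
            apply_change(host)
            touched.append(host)
        if not all(healthy(host) for host in stage):
            # Undo in reverse order, including hosts from earlier stages.
            for host in reversed(touched):
                rollback(host)
            return False
    return True
```

Starting with a small first stage limits how much can break before the health check has a chance to stop the rollout.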

SLIDE 82

Idempotence

CM systems assert state. For this, all operations must be idempotent.

    f(f(x)) ≡ f(x)
    ||−1|| ≡ |−1|

    $ rm resolv.conf                                  # idempotent
    $ echo "nameserver 192.168.0.1" > resolv.conf     # idempotent
    $ echo "nameserver 192.168.0.2" >> resolv.conf    # not idempotent
    $ chown root:wheel resolv.conf                    # idempotent
    $ chmod 0644 resolv.conf                          # idempotent
    $ yum install frozzle                             # not idempotent
    $ yum install frozzle-1.2.3                       # "it depends"
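The property f(f(x)) ≡ f(x) can be spot-checked in code by modelling resolv.conf as a list of lines. This is a sketch with hypothetical helper names; checking one starting state demonstrates the idea but does not prove idempotence in general.

```python
def is_idempotent(operation, state):
    """Spot check: applying operation twice gives the same result as applying it once."""
    once = operation(state)
    twice = operation(once)
    return once == twice


def overwrite(lines):
    """Like `echo ... > resolv.conf`: the result does not depend on the previous content."""
    return ["nameserver 192.168.0.1"]


def append(lines):
    """Like `echo ... >> resolv.conf`: every run adds another copy of the line."""
    return lines + ["nameserver 192.168.0.2"]


def ensure_line(lines, line="nameserver 192.168.0.2"):
    """Idempotent alternative to append: add the line only if it is missing."""
    return lines if line in lines else lines + [line]
```

The last function shows the usual fix: turn "do X" into "make sure X is the case".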

SLIDE 83

Convergence and Eventual Consistency

Note: while idempotence enables self-healing and may allow you to not keep state, it does not guarantee efficiency!

CM systems should ensure changes are:
  1. idempotent (well, that part’s on you)
  2. only applied if needed
  3. eventually consistent

This often requires complexity (oh no!), coordination with and awareness of other systems. Service Orchestration has developed as a separate, related discipline to help address this.
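"Only applied if needed" usually means: inspect the current state first, change only what differs, and report whether anything changed. The sketch below assumes a POSIX system and a hypothetical `ensure_file` helper; it is not taken from any CM tool.

```python
import os
import tempfile


def ensure_file(path, content, mode=0o644):
    """Make path contain exactly content with the given mode; return True only if something changed."""
    changed = False
    try:
        with open(path) as f:
            current = f.read()
    except FileNotFoundError:
        current = None
    if current != content:
        # Write to a temporary file in the same directory, then rename it into place.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp_path, path)
        changed = True
    if (os.stat(path).st_mode & 0o777) != mode:
        os.chmod(path, mode)
        changed = True
    return changed
```

The return value is what lets a CM system restart a service only when its configuration actually changed, rather than on every run.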

SLIDE 84

Distributed Systems

CM systems are distributed systems. As such, they are subject to the CAP Theorem:
  • Consistency: all systems managed by the CM are consistent within their respective service definition.
  • Availability: the services managed by the CM are kept available, even if no further updates or change sets can be retrieved.
  • Partition tolerance: the CM system can (continue to) operate despite interruptions between its components; e.g. intermediate (coordinated) changes are not required.

SLIDE 85

Configuration Management Overlap

Your configuration management system provides or enables:
  • a remote command execution agent
  • a reporting agent
  • a reporting infrastructure
  • role-based actions and visibility

The same principles enabling reliable configuration management can thus also be used for information security related tasks:
  • detection of deviation from known state
  • integrity checks and intrusion detection
  • patch management
  • automated quarantine

SLIDE 86

Configuration Management Overlap

Configuration Management overlaps with numerous other areas:
  • backup (expendable systems, data classification, ...)
  • software deployment (base OS, application packages, ...)
  • monitoring (central reporting and ad-hoc data collection, ...)
  • revision control and audit logs (CM changes are code changes!)
  • compliance enforcement (e.g., baseline configurations)
  • ...

SLIDE 87

Overlap with other systems

SLIDE 88

More than just servers...

Configuration Management is not just for servers. You also need to manage configurations for:
  • desktops and mobile clients
  • network equipment
  • load balancers
  • containers
  • ...

SLIDE 89

Configuration Management Impact

Think scale!

SLIDE 90

Reading

Additional topics to research:
  • Service Orchestration
  • Continuous Deployment / Continuous Integration
  • Infrastructure as Code
  • Information Technology Infrastructure Library (ITIL)

Relevant links:
  • http://www.infrastructures.org/bootstrap/recovery.shtml
  • https://is.gd/paZ7qu
  • https://www.engineyard.com/blog/pets-vs-cattle
  • http://markburgess.org/blog_cap.html
  • http://markburgess.org/blog_cap2.html
  • https://aws.amazon.com/opsworks/chefautomate/
  • https://puppet.com/product/managed-technology/aws
