An Empirical Study on Configuration Errors in Commercial and Open Source Systems



slide-1
SLIDE 1

An Empirical Study on Configuration Errors in Commercial and Open Source Systems

Zuoning Yin, Xiao Ma, Jing Zheng, Yuanyuan Zhou

University of California, San Diego

Lakshmi N. Bairavasundaram, Shankar Pasupathy

NetApp Inc.

slide-2
SLIDE 2

Configuring computers is not easy

slide-3
SLIDE 3

Configuring server systems is much harder

slide-4
SLIDE 4

Root Causes of Customer Reported Issues

  • Hardware Failure: 20%
  • Bug: 15%
  • Customer Environment: 25%
  • User Knowledge: 9%
  • Configuration: 31%

slide-5
SLIDE 5

Who should take responsibility for configuration errors, users or developers?

slide-6
SLIDE 6
  • Developers do not think carefully when they design the configuration interface
  • Once an issue turns out to be a configuration error, developers move on

slide-7
SLIDE 7
How to Reduce Configuration Errors?

We need to understand real-world configuration errors first:

  • What kinds of configuration errors do users make?
  • Which types of configuration parameters are more error-prone?
  • Which user actions may cause configuration errors?
  • ......

slide-8
SLIDE 8
Objectives and Challenges

  • Objectives
      • Understand the characteristics of real-world configuration errors
      • Reveal their implications for developers
  • Challenges
      • Configuration errors are not recorded rigorously
      • Configuration errors are difficult to understand

slide-9
SLIDE 9

Methodology

  • Randomly sample configuration errors
      • Choose cases resolved in the most recent 2 years
      • Ensure the sample size is big enough
      • Calculate the statistical error
  • Categorize errors on a best-effort basis
      • Cross-validation among co-authors
      • Help from developers
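
The slides do not say how the statistical error was calculated. As a rough illustration only, the sketch below assumes a normal-approximation margin of error for a proportion, with the worst case p = 0.5 and roughly 95% confidence (z = 1.96). The function name and both defaults are assumptions; the sample sizes are the ones listed on the Data Source slide.

```python
import math

# Sample sizes as listed on the Data Source slide.
SAMPLES = {"COMP-A": 309, "CentOS": 60, "MySQL": 55, "Apache": 60, "OpenLDAP": 62}


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    p = 0.5 is the worst case and z = 1.96 is roughly 95% confidence.
    Both defaults are assumptions, not values stated in the slides.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


for system, n in SAMPLES.items():
    points = margin_of_error(n) * 100
    print(f"{system}: n={n}, margin of error = +/-{points:.1f} percentage points")
```

Under these assumptions, the smaller open-source samples carry a noticeably wider margin than the COMP-A sample, which is worth keeping in mind when comparing percentages across systems.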
slide-10
SLIDE 10

Data Source

Category      System     Number of Sampled Errors
Commercial    COMP-A     309
Open Source   CentOS     60
Open Source   MySQL      55
Open Source   Apache     60
Open Source   OpenLDAP   62
Total                    546

slide-11
SLIDE 11
Limitations

  • We study only user-reported errors
      • Configuration errors may be resolved by other means
  • We focus on server-side systems
      • Other types of systems may have different characteristics

slide-12
SLIDE 12

Type Reaction Cause Impact

slide-13
SLIDE 13

Type Reaction Cause Impact

slide-14
SLIDE 14

Software Configuration Errors

  • Parameter Errors
  • Compatibility Errors
  • Other Errors

slide-15
SLIDE 15

[Bar chart: share of parameter, compatibility, and other errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Parameter errors account for 70.0% to 85.5% of the sampled errors in each system.]

Parameter errors dominate

slide-16
SLIDE 16

Parameter errors dominate

  • Systems should expose as few configuration parameters as possible
  • Automatic configuration is preferred
slide-17
SLIDE 17

[Bar chart: share of compatibility and other errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. The labeled values range from 8.3% to 18.3%.]

Other types of configuration errors are not negligible

slide-18
SLIDE 18

Parameter Errors

COMP-A

    InitiatorName: iqn_DEV_domain
    Error! The value must be lower-case only.

MySQL

    AutoCommit = True
    The “True” value may affect performance.

slide-19
SLIDE 19

Parameter Errors

  • Illegal parameters (COMP-A): InitiatorName: iqn_DEV_domain violates the lower-case-only rule
  • Legal parameters (MySQL): AutoCommit = True is a legal value, but “True” may affect performance

slide-20
SLIDE 20

[Bar chart: illegal vs. legal parameters as a share of parameter errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Each category accounts for 38.1% to 61.9% of parameter errors in every system.]

Illegal and legal parameters contribute similarly to parameter errors

slide-21
SLIDE 21

Illegal and legal parameters contribute similarly to parameter errors

  • Illegal parameters are relatively easy to detect
  • About half of parameter errors involve illegal parameters (“good” news!)

slide-22
SLIDE 22

Illegal Parameters

  • Value Errors
      • Inconsistent Value Errors
      • Other Value Errors
  • Format Errors

slide-23
SLIDE 23

[Bar chart: breakdown of illegal parameter errors into inconsistent value errors, format errors, and other value errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Highlighted inconsistent value shares: 27%, 52%, 65%, 75%, 80%.]

Inconsistent values dominate illegal parameter errors for most systems

slide-24
SLIDE 24

[Bar chart: breakdown of illegal parameter errors into inconsistent value errors, format errors, and other value errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Highlighted format error shares: 69%, 26%, 4%, 6%, 11%.]

All systems have format errors; OpenLDAP in particular has 69%

slide-25
SLIDE 25

Inconsistent Value Errors

PHP + MySQL (across multiple systems!)

    MySQL configuration:
        max_connections = 300

    PHP configuration:
        mysql.max_persistent = 400

    The value in PHP should not be bigger than the value in MySQL.

MySQL

    log_output="Table"
    ...
    log=query.log

    Not consistent. They should be:

        log_output="Table"

    or

        log_output="File"
        log=query.log

slide-26
SLIDE 26

Inconsistent Value Errors

  • Value consistency constraints are error-prone; they account for most illegal parameter errors
  • Consistency constraints can span multiple systems, which makes them more difficult for users to follow
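
The two inconsistent-value examples from the slides can be expressed as explicit checks. The following is a minimal sketch, assuming both configurations have already been parsed into plain dictionaries; the function name `check_consistency` and the dictionary representation are hypothetical and do not come from MySQL, PHP, or the study.

```python
def check_consistency(mysql_cfg: dict, php_cfg: dict) -> list[str]:
    """Return one message per violated consistency constraint."""
    problems = []

    # Cross-system constraint from the slide: the PHP value should not be
    # bigger than the MySQL value. Checked only when both are set.
    if "mysql.max_persistent" in php_cfg and "max_connections" in mysql_cfg:
        php_limit = int(php_cfg["mysql.max_persistent"])
        mysql_limit = int(mysql_cfg["max_connections"])
        if php_limit > mysql_limit:
            problems.append(
                f"mysql.max_persistent ({php_limit}) in PHP exceeds "
                f"max_connections ({mysql_limit}) in MySQL"
            )

    # Single-system constraint from the slide: a log file path is inconsistent
    # with a log_output setting that does not include "File".
    if "log" in mysql_cfg and "log_output" in mysql_cfg:
        raw = str(mysql_cfg["log_output"])
        targets = [part.strip().strip('"').upper() for part in raw.split(",")]
        if "FILE" not in targets:
            problems.append(
                f"log={mysql_cfg['log']} is set, but log_output={raw} "
                "does not include File"
            )

    return problems


mysql_cfg = {"max_connections": "300", "log_output": '"Table"', "log": "query.log"}
php_cfg = {"mysql.max_persistent": "400"}
for message in check_consistency(mysql_cfg, php_cfg):
    print(message)
```

The cross-system check is the harder one in practice, because no single system sees both files; a checker like this has to be run by something that can read both.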

slide-27
SLIDE 27

Format Errors

COMP-A

    InitiatorName: iqn_DEV_domain
    Error: the value must be lower-case only.

OpenLDAP

    include schema/ppolicy.schema
    ......
    overlay ppolicy
    Missing

Apache

    extension = mysql.so
    ......
    extension = recode.so
    recode.so must be put before mysql.so.

slide-28
SLIDE 28

Format Errors

  • Format constraints are difficult to follow, especially non-intuitive ones, e.g., upper-case vs. lower-case, or ordering
  • Format errors are relatively easier to detect than value errors
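
Format constraints like the two above (case and ordering) are mechanical enough to check before a configuration is applied. The sketch below is one possible shape for such checks; `check_lower_case` and `check_order` are hypothetical helpers, and the `key = value` line format is an assumption based on the slide's examples.

```python
def check_lower_case(name: str, value: str) -> list[str]:
    """Flag a value that contains upper-case characters."""
    if value != value.lower():
        return [f"{name}: '{value}' must be lower-case only; did you mean '{value.lower()}'?"]
    return []


def check_order(lines: list[str], key: str, first: str, second: str) -> list[str]:
    """Flag `first` appearing after `second` among `key = value` lines."""
    values = []
    for line in lines:
        left, sep, right = line.partition("=")
        if sep and left.strip() == key:
            values.append(right.strip())
    if first in values and second in values and values.index(first) > values.index(second):
        return [f"{key}: '{first}' must be listed before '{second}'"]
    return []


print(check_lower_case("InitiatorName", "iqn_DEV_domain"))
print(check_order(
    ["extension = mysql.so", "extension = recode.so"],
    "extension", "recode.so", "mysql.so",
))
```

Each message names the parameter, the offending value, and the rule, so the user does not have to guess which constraint was violated.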

slide-29
SLIDE 29

Type Reaction Cause Impact

slide-30
SLIDE 30

System Reaction

  • Pinpoint Reaction
  • Indeterminate Reaction
  • Quiet Failure

slide-31
SLIDE 31

Good Reaction #1

  • Symptom: the user cannot create new directories in “/vol/vol1/data/”
  • Reaction: the system prints this message:

    [COMP-A – dir.size.max:warning]: Directory /vol/vol1/data/ reached the maxdirsize Limit. Reduce the number of files or use the vol options command to increase this limit.

slide-32
SLIDE 32

Good Reaction #2

MySQL

    log_output="Table"
    ...
    log=query.log

Patch:

    +if (opt_logname
    +    && !(log_output_options & LOG_FILE)
    +    && !(log_output_options & LOG_NONE))
    +  sql_print_warning("Although a path was specified "
    +                    "for the --log option, log tables are used. To enable "
    +                    "logging to files use the --log-output option.");

slide-34
SLIDE 34

[Bar chart: system reaction to configuration errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Pinpoint reaction: 7.2% to 15.5%. Indeterminate reaction: 45.2% to 54.9%. Quiet failure: 22.6% to 26.7%. Unknown: 6.7% to 21.8%.]

slide-35
SLIDE 35

[Bar chart: system reaction to configuration errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP, highlighting pinpoint reactions at 7.2% to 15.5%.]

Today’s systems do not react to configuration errors in a user-friendly way

slide-36
SLIDE 36

[Bar chart: system reaction to configuration errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP, highlighting quiet failures at 22.6% to 26.7%.]

A large portion of quiet failures makes diagnosis difficult

slide-37
SLIDE 37

A large portion of quiet failures makes diagnosis difficult

  • Systems should avoid “bug-like” symptoms when configuration errors happen, such as quiet failures, crashes, or hangs
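
To make the contrast with a quiet failure concrete, here is a minimal sketch of a pinpoint reaction: a loader that refuses a bad value at startup and names the parameter, the offending value, and the accepted range. `ConfigError` and `load_int` are hypothetical names, and the range used in the usage lines is an illustrative assumption rather than a limit of any real system.

```python
class ConfigError(ValueError):
    """Raised with a message that pinpoints the offending parameter."""


def load_int(cfg: dict, name: str, low: int, high: int) -> int:
    """Read an integer parameter, failing loudly instead of quietly."""
    raw = cfg.get(name)
    if raw is None:
        raise ConfigError(f"{name}: parameter is missing; set it to an integer in [{low}, {high}]")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ConfigError(
            f"{name}: '{raw}' is not an integer; expected a value in [{low}, {high}]"
        ) from None
    if not low <= value <= high:
        raise ConfigError(f"{name}: {value} is out of range; expected a value in [{low}, {high}]")
    return value


# Hypothetical usage: the range 1..100000 is an assumption for illustration.
try:
    load_int({"max_connections": "3oo"}, "max_connections", 1, 100000)
except ConfigError as exc:
    print(f"configuration error: {exc}")
```

The design choice is to stop at load time with a specific message, rather than fall back to a default and let the problem surface later as a crash or hang.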

slide-38
SLIDE 38

Do systems react better to errors with “illegal” parameters?


slide-39
SLIDE 39

[Bar chart: percentage of pinpoint reactions, overall vs. illegal parameter errors, for COMP-A, CentOS, MySQL, Apache, and OpenLDAP.]

slide-40
SLIDE 40

[Bar chart: percentage of pinpoint reactions for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Overall: 7.2% to 15.5%. Illegal parameter errors: 4.3% to 26.9%.]

Illegal parameter errors are handled better, but not well enough

slide-41
SLIDE 41

How does message quality affect diagnosis time?


slide-42
SLIDE 42

Message Quality

  • Explicit Message
  • Ambiguous Message
  • No Message

slide-43
SLIDE 43

[Bar chart: median diagnosis time (normalized) by message quality (explicit message, ambiguous message, no message) for COMP-A, CentOS, MySQL, Apache, and OpenLDAP.]

Explicit messages significantly reduce diagnosis time

slide-44
SLIDE 44

Messages are harmful if they are misleading

[Bar chart: median diagnosis time (normalized) by message quality (explicit message, ambiguous message, no message) for COMP-A, CentOS, MySQL, Apache, and OpenLDAP.]

slide-45
SLIDE 45

Type Reaction Cause Impact

slide-46
SLIDE 46

When does a configuration error happen?

  • Used-to-work
  • First-time Use

slide-47
SLIDE 47

[Bar chart: when configuration errors happen (first-time use, used-to-work, unknown) for COMP-A, CentOS, MySQL, Apache, and OpenLDAP.]

slide-48
SLIDE 48

Complex systems are more likely to have configuration errors in the middle of their lifetime

[Bar chart: when configuration errors happen (first-time use, used-to-work, unknown) for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Highlighted values: 16.7% and 32.4%.]

slide-49
SLIDE 49

Type Reaction Cause Impact

slide-50
SLIDE 50

How do configuration errors affect system availability?

  • Partially Unavailable
  • Fully Unavailable
  • Performance Degradation

slide-51
SLIDE 51

[Bar chart: impact of configuration errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP. Partially unavailable: 52.7% to 83.9%. Fully unavailable: 9.7% to 27.3%. Performance degradation (labeled values): 6.4%, 20.0%, 6.8%.]

slide-52
SLIDE 52

[Bar chart: impact of configuration errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP, highlighting fully unavailable (9.7% to 27.3%) and performance degradation (labeled values: 6.4%, 20.0%, 6.8%).]

Configuration errors can cause full system unavailability and performance degradation

slide-53
SLIDE 53

[Bar chart: impact of configuration errors for COMP-A, CentOS, MySQL, Apache, and OpenLDAP, highlighting performance degradation (labeled values: 6.4%, 20.0%, 6.8%).]

Performance configuration is especially difficult

slide-54
SLIDE 54

Performance configuration is especially difficult

  • Performance parameters are more difficult to understand and set
  • Diagnosing performance configuration issues is troublesome

slide-55
SLIDE 55

Other Characteristics

  • Location and domain of parameter errors
  • Number of involved/fixed parameters
  • Complete categorization of illegal parameters and their distribution
  • Complete analysis of causes of errors
  • Details about compatibility errors and component errors
  • Analysis across multiple directions
  • More examples
slide-56
SLIDE 56

Related Work

  • Prevention (SmartFrog, KarDo, etc.)
  • Detection (PeerPressure, Strider, etc.)
  • Diagnosis (Chronus, AutoBash, ConfAid, etc.)
  • Tolerance (AutoBash, Undo, etc.)
  • Validation (Barricade, etc.)
  • Injection/testing (ConfErr, etc.)
slide-57
SLIDE 57

Summary

  • Think from users’ point of view
      • Users do not know the code
  • Keep things simple
      • Expose fewer “knobs”
      • Use intuitive and simple rules
  • Validate configuration proactively
  • React decently when configuration errors happen
      • Provide good feedback to users
  • Record configuration errors
      • Learn from mistakes
slide-58
SLIDE 58

Thank you!