The Beauty and the Beast: Vulnerabilities in Red Hat's Packages



slide-1
SLIDE 1

The Beauty and the Beast

Vulnerabilities in Red Hat’s Packages

Stephan Neuhaus <Stephan.Neuhaus@disi.unitn.it>
Thomas Zimmermann <tzimmer@microsoft.com>

slide-2
SLIDE 2

Vulnerabilities are important because fixing them costs a lot of money (2005 FBI study: $67 billion). Red Hat offers 3241 packages (or did, as of August 2008). (There are certainly more being offered for Red Hat!)

slide-5
SLIDE 5

Explain colours: white = no vulnerabilities, blue -> red: progressively more

slide-13
SLIDE 13

Distribution of RHSAs

[Figure: histogram. x-axis: Number of RHSAs (0 to 129); y-axis: Number of Packages (logarithmic, 1 to 600). Annotations: "kernel, kernel-doc"; "php-related"; "top not shown"; "2/3 of packages".]

Note the logarithmic y-axis. 3241 packages in total, about 2/3 with no known vulnerabilities.

slide-14
SLIDE 14

Properties of packages, not properties of the software in the package

slide-20
SLIDE 20

  • Are there properties that correlate with vulnerabilities?
  • Are there properties that increase or decrease the risk?
  • Can we predict whether a package contains unknown vulnerabilities?

✔ Dependencies ✔ Machine Learning ✔ Beauties and Beasts

Properties of packages, not properties of the software in the package

slide-21
SLIDE 21

Dependencies

slide-24
SLIDE 24

Dependencies

[Figure: dependency graph around amanda-server. Packages shown: amanda-server, readline, amanda, glibc, xinetd, gnuplot, grep, libtermcap, coreutils, perl.]

slide-25
SLIDE 25

Dependencies and Vulnerabilities

  • Dependency A → B exists because A wants to use the services offered by B
  • Vulnerability exists in A if
      • A is in an insecure domain (domains are characterised by dependencies);
      • B is insecure and fix in B spills over to A; or
      • B is difficult to use securely

Packages in the same domain will tend to have the same dependencies. Domain examples are: compilers, games, office applications,

slide-26
SLIDE 26

Red Hat Dependencies

slide-27
SLIDE 27

Distribution of Package Dependencies

[Figure: histogram. x-axis: Number of Dependencies (4 to 96); y-axis: Number of Packages (100 to 400). Annotations: "kdebase"; "development packages containing headers".]

The distribution is apparently logarithmic with a long tail. This is not the transitive closure. kdebase has 14 RHSAs (but 96 dependencies), while kernel has 129 RHSAs (but 0 dependencies), so the number of dependencies is not a good predictor of the number of RHSAs.
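
Sketch (not from the talk): the remark about the transitive closure can be made concrete by counting a package's direct dependencies and, separately, everything reachable through them. The package names are borrowed from the amanda-server slide, but the edges in `DIRECT` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical direct-dependency map; the edges are illustrative only.
DIRECT: Dict[str, Set[str]] = {
    "amanda-server": {"amanda", "glibc", "perl"},
    "amanda": {"glibc", "readline"},
    "readline": {"glibc", "libtermcap"},
    "perl": {"glibc"},
    "libtermcap": {"glibc"},
    "glibc": set(),
}


def direct_count(pkg: str) -> int:
    """Number of direct dependencies (what the histogram counts)."""
    return len(DIRECT.get(pkg, set()))


def transitive_closure(pkg: str) -> Set[str]:
    """Every package reachable from pkg by following dependencies."""
    seen: Set[str] = set()
    stack = list(DIRECT.get(pkg, set()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DIRECT.get(dep, set()))
    return seen


print(direct_count("amanda-server"), len(transitive_closure("amanda-server")))
```

With these made-up edges the two counts differ, which is why the choice between them matters when reading the histogram.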

slide-32
SLIDE 32

Where does the addition of dependencies significantly increase/decrease the risk?

  • 1. Data structure: concept lattice
  • 2. Compute change in risk
  • 3. Include only statistically significant changes
slide-38
SLIDE 38

Step 1: Data Structure …

  • Block 1: All packages depending on glibc
  • Block 2: All packages depending on glibc, qt
  • Block 3: All packages depending on glibc, qt, xorg-x11-libs

[Figure: concept lattice. Node labels: ∅, glibc, kdelibs, qt, xorg-x11-libs.]

Start with no knowledge about dependencies (top node contains all packages). Add knowledge of glibc (node contains all packages depending on glibc), then qt (node contains all packages depending on qt and glibc), then xorg-x11-libs (node contains all packages
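
Sketch (not from the talk): the blocks above can be read as "all packages whose dependency set contains a given set of dependencies". The snippet below computes such blocks for one chain of the lattice only; it is not a full concept-lattice construction. The package names and their dependency sets in `PACKAGES` are hypothetical toy data.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy data: package -> set of direct dependencies.
PACKAGES: Dict[str, Set[str]] = {
    "pkg-a": {"glibc"},
    "pkg-b": {"glibc", "qt"},
    "pkg-c": {"glibc", "qt", "xorg-x11-libs"},
    "pkg-d": set(),
}


def block(deps: Iterable[str]) -> FrozenSet[str]:
    """All packages that depend on every dependency in deps."""
    wanted = set(deps)
    return frozenset(p for p, d in PACKAGES.items() if wanted <= d)


top = block([])                                   # no knowledge: all packages
block1 = block(["glibc"])                         # depends on glibc
block2 = block(["glibc", "qt"])                   # depends on glibc and qt
block3 = block(["glibc", "qt", "xorg-x11-libs"])  # depends on all three

for name, members in [("top", top), ("block1", block1),
                      ("block2", block2), ("block3", block3)]:
    print(name, sorted(members))
```

Each added dependency can only shrink the block, so the blocks are nested from the top node downwards.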

slide-41
SLIDE 41

Step 2: Compute Risk Change

Node                        Vulnerable                  Change
∅                           32.9% (1065 out of 3241)
glibc                       33.5% (692 out of 2066)     +0.6% (from ∅)
kdelibs                     85.6% (143 out of 167)      +52.7% (from ∅)
glibc, qt                   77.4% (120 out of 155)      +43.9% (from glibc)
glibc, qt, xorg-x11-libs    79.4% (27 out of 34)        +2.0% (from glibc, qt)

Risk change by adding qt only when already dependent on glibc! (glibc is the context)

Question: Is the rise of 43.9 percentage points when going from {glibc} to {glibc, qt} just some random fluctuation? We test this using statistical tests (Chi^2 or Fisher exact) and discard the “random fluctuation” hypothesis when the probability of such an increase happening
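
Sketch (not from the talk): the risk change and a Chi^2 test for the step from {glibc} to {glibc, qt}, using the counts on the slide. How the authors set up the contingency table is not stated here; this sketch assumes a 2x2 table that compares packages in the glibc context that also depend on qt against those that do not, and it uses the plain Pearson statistic without continuity correction. The function names are made up for the example.

```python
import math


def risk(vulnerable: int, total: int) -> float:
    """Fraction of vulnerable packages in a node."""
    return vulnerable / total


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson Chi^2 statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1dof(chi2: float) -> float:
    """Upper-tail probability of a Chi^2 variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts from the slide.
ctx_vuln, ctx_total = 692, 2066   # context {glibc}
new_vuln, new_total = 120, 155    # {glibc, qt}

change = risk(new_vuln, new_total) - risk(ctx_vuln, ctx_total)

# Assumed table: rows = with qt / without qt (both within the glibc context),
# columns = vulnerable / not vulnerable.
a = new_vuln
b = new_total - new_vuln
c = ctx_vuln - new_vuln
d = (ctx_total - new_total) - c

stat = chi2_2x2(a, b, c, d)
p = p_value_1dof(stat)
print(f"change={change:.3f} chi2={stat:.1f} p={p:.3g}")
```

Under this assumed table the p-value is far below the 0.01 threshold used in Step 3, so the rise would not be dismissed as random fluctuation.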

slide-42
SLIDE 42

Step 3: Include Only Significant Changes

  • Risk changes with significance p < 0.01
  • No significant and more general context exists for this dependency
  • Risk goes up: “beast”
  • Risk goes down: “beauty”
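
Sketch (not from the talk): the filter above written as a small function. The reading of the second criterion is an assumption: a change is dropped when a more general context already shows a significant change for the same dependency. The function name, its parameters, and the p-values in the usage lines are hypothetical; only the two risk changes come from the slides.

```python
from typing import Optional


def label(change: float, p_value: float,
          more_general_context_significant: bool,
          alpha: float = 0.01) -> Optional[str]:
    """Return "beast", "beauty", or None if the change is filtered out."""
    if p_value >= alpha or more_general_context_significant:
        return None
    if change > 0:
        return "beast"
    if change < 0:
        return "beauty"
    return None


# Risk changes from the slides; the p-values are placeholders.
print(label(0.439, 1e-6, False))   # qt in context glibc
print(label(-0.320, 1e-6, False))  # xorg-x11-server-Xorg in context glibc
print(label(0.020, 0.5, False))    # not significant, so filtered out
```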

slide-43
SLIDE 43

Selected Beasts

Context   Dependency            Risk before   Risk after   Change
∅         openoffice.org-core   0.329         1.000        0.671
∅         kdelibs               0.329         0.856        0.527
∅         cups-libs             0.329         0.774        0.445
∅         libmng                0.329         0.769        0.440
glibc     qt                    0.335         0.774        0.439
glibc     krb5-libs             0.335         0.769        0.434

The complete list can be found in the paper

Explain packages, don’t just list names

slide-44
SLIDE 44

Selected Beauties

Context                     Dependency             Risk before   Risk after   Change
glibc                       xorg-x11-server-Xorg   0.335         0.015        -0.320
compat-glibc, glibc, zlib   audiofile              0.613         0.359        -0.254
glibc, glibc-debug, zlib    audiofile              0.590         0.351        -0.239
∅                           gnome-keyring          0.329         0.101        -0.228
glibc, zlib                 gnome-libs             0.456         0.281        -0.175
∅                           python                 0.329         0.132        -0.197

The complete list can be found in the paper

Explain possible consequences, e.g. for new applications: choose less risky dependencies

slide-46
SLIDE 46

Is it possible to predict…

  • from the dependencies which packages are vulnerable (classification)?
  • which packages will have the most vulnerabilities (ranking)?

slide-51
SLIDE 51

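Experiment

[Diagram: the data (X = Dependencies, Y = Vulnerabilities) is split into a Train part and a Test part; a Model f is built from the Train part and produces predictions Y’ for the Test part.]

Repeat 50x. This “self-testing” is a standard evaluation technique for machine learning methods.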
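
Sketch (not from the talk): the repeated random-split protocol in code, assuming the split of 2/3 for training and the rest for testing and 50 repetitions. The model here is a deliberately simple stand-in (per-dependency vulnerability rates), not the SVM or decision tree used in the talk, and `TOY` is hypothetical data. Each split yields the counts of true positives, false positives and false negatives.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Package = Tuple[Set[str], bool]  # (dependencies X, vulnerable? Y)


def train(data: Sequence[Package]) -> Dict[str, float]:
    """Stand-in model: fraction of vulnerable packages per dependency."""
    seen: Dict[str, List[int]] = {}
    for deps, vulnerable in data:
        for dep in deps:
            counts = seen.setdefault(dep, [0, 0])
            counts[0] += int(vulnerable)
            counts[1] += 1
    return {dep: v / n for dep, (v, n) in seen.items()}


def predict(model: Dict[str, float], deps: Set[str]) -> bool:
    """Predict vulnerable if the mean rate of the known dependencies exceeds 0.5."""
    rates = [model[d] for d in deps if d in model]
    return bool(rates) and sum(rates) / len(rates) > 0.5


def run_splits(data: Sequence[Package], repeats: int = 50,
               train_fraction: float = 2 / 3,
               seed: int = 0) -> List[Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) per split."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        model = train(shuffled[:cut])
        tp = fp = fn = 0
        for deps, actual in shuffled[cut:]:
            guess = predict(model, deps)
            tp += int(guess and actual)
            fp += int(guess and not actual)
            fn += int(not guess and actual)
        results.append((tp, fp, fn))
    return results


# Hypothetical toy data, for illustration only.
TOY: List[Package] = [
    ({"glibc", "qt"}, True), ({"glibc", "qt", "kdelibs"}, True),
    ({"glibc", "kdelibs"}, True), ({"glibc", "krb5-libs"}, True),
    ({"glibc"}, False), ({"glibc", "zlib"}, False),
    ({"python"}, False), ({"glibc", "python"}, False),
    ({"qt", "kdelibs"}, True), ({"zlib"}, False),
    ({"glibc", "gnome-keyring"}, False), ({"krb5-libs"}, True),
]

results = run_splits(TOY)
print(len(results), results[:3])
```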

slide-60
SLIDE 60

Indicators

Classification (ideal value: 1)

precision = true positives / (true positives + false positives)
recall = true positives / (true positives + false negatives)

Ranking (ideal value: 1)

  • 1 2 3 4 compared with 1 2 3 4: 1
  • 1 2 3 4 compared with 4 3 2 1: -1
  • 1 2 3 4 compared with 2 4 1 3: (no value shown)

Don’t mention -1. We want values near 1.
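
Sketch (not from the talk): the indicators as functions. Precision and recall follow the definitions on the slide. For the ranking indicator the slide gives no formula; this sketch assumes Spearman's rank correlation for rankings without ties, which reproduces the values 1 and -1 of the first two example rankings. The counts passed to `precision` and `recall` are hypothetical.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """True positives / (true positives + false positives); needs tp + fp > 0."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """True positives / (true positives + false negatives); needs tp + fn > 0."""
    return tp / (tp + fn)


def spearman(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman rank correlation for two rankings of equal length, no ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(precision(8, 2), recall(8, 8))         # hypothetical counts
print(spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # identical rankings
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed rankings
print(spearman([1, 2, 3, 4], [2, 4, 1, 3]))  # third example from the slide
```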

slide-64
SLIDE 64

[Figure: Precision versus Recall. Axes: Recall and Precision, each from 0.4 to 0.9. Legend: SVM, Decision Tree.]

Decision Trees worse than SVMs

Results of 50 random splits: train with 2/3 of the packages, predict with the rest, record precision and recall.

slide-67
SLIDE 67

[Figure: Precision versus Recall. Axes: Recall and Precision, each from 0.4 to 0.9. Legend: SVM, Decision Tree.]

  • Predictions are correct 83% of the time (precision)
  • 65% of all vulnerable packages predicted (recall)

Results of 50 random splits: train with 2/3 of the packages, predict with the rest, record precision and recall.

slide-68
SLIDE 68

[Figure: Cumulative Rank Correlation. x-axis: Rank Correlation Coefficient (0.52 to 0.62); y-axis: Fraction of Splits (0.0 to 1.0).]

  • Even though “self-evaluation” is a standard technique, what we really want to know is whether the method is able to predict the future... (next slide)

slide-71
SLIDE 71

[Timeline: January 1, 2008 (predict) to August 31, 2008 (evaluate). Top 25 out of 2181. 73 new vulnerable.]

Patch published 2009-05-12

Package Name: mod_php, php-dbg, php-dbg-server, perl-DBD-Pg, kudzu, irda-utils, hpoj, libbdevid-python, mrtg, evolution28-evolution-data-server, lilo, ckermit, dovecot, kde2-compat, gq, vorbis-tools, k3b, taskjuggler, ddd, tora, libpurple, libwvstreams, pidgin, linuxwacom, policycoreutils-newrole, … 2156 further packages

slide-72
SLIDE 72

Consequences

  • When building new applications, choose less risky dependencies: use GNU-SASL instead of cyrus-sasl, Gnome instead of KDE
  • When maintaining existing applications, prioritise resources: look at krb5-libs, not at gkermit

slide-73
SLIDE 73

Conclusions

  • Vulnerabilities correlate with dependencies
  • Identification of risky dependencies
  • Prediction with high precision, recall, correlation

http://research.microsoft.com/projects/esm/
http://www.artdecode.de/

  • Have we worked with Red Hat: yes, have received positive feedback
  • Usage data: nonexistent
  • Explain correlation: see previous slide: domains