Vulnerability Extrapolation, USENIX WOOT 2011, Fabian “fabs” Yamaguchi



SLIDE 1

Vulnerability Extrapolation USENIX WOOT 2011

Fabian “fabs” Yamaguchi, Recurity Labs GmbH, Germany

SLIDE 2

Agenda

  • Patterns you find when auditing code
  • Exploiting these patterns: Vulnerability Extrapolation
  • Using machine learning to get there
  • A method to assist in manual code audits based on this idea
  • The method in practice
  • A showcase
SLIDE 3

Exploring a new code base

  • Like an area of mathematics you don't yet know.
  • It's not completely different from the mathematics you already know.
  • But there are secrets specific to this area:
      • Vocabulary
      • Recurring patterns in argumentation
      • Weird tricks used in proofs
  • Understanding the specifics of the area makes it a lot easier to reason about it.

SLIDE 4

Another Example: libTIFF CVE-2006-3459 | CVE-2010-2067

```c
static int
TIFFFetchShortPair(TIFF* tif, TIFFDirEntry* dir)
{
    switch (dir->tdir_type) {
    case TIFF_BYTE:
    case TIFF_SBYTE:
    {
        uint8 v[4];
        return TIFFFetchByteArray(tif, dir, v)
            && TIFFSetField(tif, dir->tdir_tag, v[0], v[1]);
    }
    case TIFF_SHORT:
    case TIFF_SSHORT:
    {
        uint16 v[2];
        /* the amount copied into v is taken from dir, not from sizeof v */
        return TIFFFetchShortArray(tif, dir, v)
            && TIFFSetField(tif, dir->tdir_tag, v[0], v[1]);
    }
    default:
        return 0;
    }
}
```

SLIDE 5

Another Example: libTIFF CVE-2006-3459 | CVE-2010-2067


```c
static int
TIFFFetchSubjectDistance(TIFF* tif, TIFFDirEntry* dir)
{
    uint32 l[2];
    float v;
    int ok = 0;

    /* the amount TIFFFetchData copies into l is taken from dir, not from sizeof l */
    if (TIFFFetchData(tif, dir, (char*) l)
        && cvtRational(tif, dir, l[0], l[1], &v)) {
        /*
         * XXX: Numerator 0xFFFFFFFF means that we have infinite
         * distance. Indicate that with a negative floating point
         * SubjectDistance value.
         */
        ok = TIFFSetField(tif, dir->tdir_tag,
            (l[0] != 0xFFFFFFFF) ? v : -v);
    }
    return ok;
}
```

SLIDE 6

LibTIFF: Bug Analysis

  • TIFFFetchShortArray is actually a wrapper around TIFFFetchData.
  • The two are pretty much synonyms.
  • These functions are part of an API local to libTIFF.
  • Badly designed API: the amount of data to be copied into the buffer is passed implicitly, in one of the fields of the dir structure, rather than as an explicit argument!
  • Developers missed this in both cases, and it's hard to blame them.

SLIDE 7

The times of “grep 'memcpy' ./*.c” may be over. But that does not mean patterns of API use that lead to vulnerabilities no longer exist!

SLIDE 8

Vulnerability Extrapolation

  • Given a function known to be vulnerable, determine functions similar to this one in terms of application-specific API usage patterns.
  • Vulnerability Extrapolation exploits the information leak you get every time a vulnerability is disclosed!

SLIDE 9

What needs to be done

  • We need to be able to determine how “similar” functions are in terms of dominant programming patterns.
  • We need to find a way to extract these programming patterns from a code base in the first place.
  • How do we do that?
SLIDE 10

Similarity – A decomposition

Signal processing: decomposition into components of different frequencies. Noise is assumed to be of high frequency, while the signal is of lower frequency.

Decomposition into shape and rotation: if rotation is just a detail, these are pretty similar.

Face recognition: faces are decomposed into weighted sums of commonly found patterns plus a noise term.

SLIDE 11

Think of it as ‘zooming out’

[Figure: TIFFFetchSubjectDistance decomposed into three usage patterns, ordered by decreasing dominance of pattern and increasing level of detail/frequency.]

Linear approximation of each function by the most dominant API usage patterns of the code base it is contained in!
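One way to write this approximation down, as a sketch assuming that each function f is represented by a vector x_f of API-symbol counts and that the k most dominant patterns u_1, …, u_k are orthonormal vectors in the same space:

```latex
\mathbf{x}_f \;\approx\; \sum_{i=1}^{k} \alpha_{f,i}\,\mathbf{u}_i,
\qquad
\alpha_{f,i} = \mathbf{u}_i^{\top}\mathbf{x}_f
```

The weight α_{f,i} says how strongly function f follows pattern i. Whatever the first k patterns leave unexplained, the residual x_f − Σ_i α_{f,i} u_i, is the fine detail that “zooming out” discards.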

SLIDE 12

Extracting dominant patterns

How do we identify the most dominant API usage patterns of a code base? In face recognition, a standard technique is Principal Component Analysis (PCA).

SLIDE 13

Mapping code to the vector space

  • Describe functions by the API-symbols they contain.
  • API-symbols are extracted using a fuzzy parser.
  • Each API-symbol is associated with a dimension.

```c
func1()
{
    int *ptr = malloc(64);
    fetchArray(pb, ptr);
}
```

SLIDE 14

Principal Component Analysis

The decomposition of the data matrix has these parts:

  • Data matrix: contains all function vectors.
  • U: each column of U is a dominant pattern; each row is a representation of an API symbol in terms of the most dominant patterns.
  • Strength of each pattern.
  • Representation of functions in terms of the most dominant patterns.
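These labels fit a truncated singular value decomposition (SVD), the usual way PCA is computed. Written out as a sketch, assuming functions are stored as the columns x_f of the data matrix M and only the k strongest patterns are kept (k is a free parameter here):

```latex
\underbrace{M}_{|S| \times |F|}
\;\approx\;
\underbrace{U_k}_{|S| \times k}\;
\underbrace{\Sigma_k}_{k \times k}\;
\underbrace{V_k^{\top}}_{k \times |F|},
\qquad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
\sigma_1 \ge \dots \ge \sigma_k \ge 0
```

Here S is the set of API symbols and F the set of functions. The columns u_1, …, u_k of U_k are the patterns and σ_i is the strength of pattern i. Because U_k has orthonormal columns, the k-dimensional representation of function f, its column of Σ_k V_kᵀ, equals U_kᵀ x_f.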

SLIDE 15

In summary

SLIDE 16

A toy problem to gain an intuition

Group 1

```c
void guiFunc1(GtkWidget *widget)
{
    int j;
    gui_make_window(widget);
    GtkButton *button;
    button = gui_new_button();
    gui_show_window();
}

void guiFunc2(GtkWidget *widget)
{
    gui_make_window(widget);
    GtkButton *myButton;
    button1 = gui_new_button();
    button2 = gui_new_button();
    button3 = gui_new_button();
    for (int i = 10; i != i; i++)
        do_gui_stuff();
}
```

SLIDE 17

Group 2

```c
void netFunc1()
{
    int fd;
    int i = 0;
    struct sockaddr_in in;
    fd = socket(arguments);
    recv(fd, moreArguments);
    if (condition) {
        i++;
        send(fd, i, arg);
    }
    send(fd, i, arg);
    close(fd);
}

void netFunc2()
{
    int fd;
    struct sockaddr_in in;
    hostent host;
    fd = socket(arguments);
    recv(fd, moreArguments);
    gethostbyname(host);
    if (condition) {
        int i = 0;
        i++;
        send(fd, i, arg);
    }
    close(fd);
}
```

SLIDE 18

Group 3

```c
void listFunc1(int elem)
{
    GList myList;
    if (!list_check(myList)) {
        do_list_error_stuff();
        return;
    }
    list_add(myList, elem);
}

void listFunc2(int elem)
{
    GList myList;
    if (!list_check(myList)) {
        do_list_error_stuff();
        return;
    }
    list_remove(myList, elem);
    list_delete(myList);
}
```

SLIDE 19

Projection onto the first two principal components

[Figure: functions and API symbols plotted on the first two principal components. Legend: core API; occurs in this context but does not constitute the pattern; functions.]

SLIDE 20

Vulnerability Extrapolation

  • Take a function that used to be vulnerable as an input.
  • Measure distances to other functions to determine those that are most similar.
  • Let's try that for FFmpeg.
SLIDE 21

Original bug: CVE-2010-3429

```c
static int flic_decode_frame_8BPP(AVCodecContext *avctx,
                                  void *data, int *data_size,
                                  const uint8_t *buf, int buf_size)
{
    [..]
    pixels = s->frame.data[0];
    [..]
    case FLI_DELTA:
        y_ptr = 0;
        compressed_lines = AV_RL16(&buf[stream_ptr]);
        stream_ptr += 2;
        while (compressed_lines > 0) {
            line_packets = AV_RL16(&buf[stream_ptr]);
            stream_ptr += 2;
            if ((line_packets & 0xC000) == 0xC000) {
                // line skip opcode
                line_packets = -line_packets;
                y_ptr += line_packets * s->frame.linesize[0];
            } else if ((line_packets & 0xC000) == 0x4000) {
                [..]
            } else if ((line_packets & 0xC000) == 0x8000) {
                // "last byte" opcode
                /* y_ptr is not checked against the size of pixels */
                pixels[y_ptr + s->frame.linesize[0] - 1] = line_packets & 0xff;
            } else {
                [..]
                y_ptr += s->frame.linesize[0];
            }
        }
        break;
    [..]
}
```

Unchecked index: write to an arbitrary location in memory.

Decoder pattern:
  • Usually a variable of type AVCodecContext
  • AV_RL* functions used as sources
  • Lots of primitive types with specified width
  • Use of memcpy, memset, etc.

SLIDE 22

Extrapolation

  • The closest match contained the same vulnerability, but it was fixed when the initial function was fixed.

0-Day

SLIDE 23

0-Day

```c
static void vmd_decode(VmdVideoContext *s)
{
    [...]
    int frame_x, frame_y;
    int frame_width, frame_height;
    int dp_size;

    frame_x = AV_RL16(&s->buf[6]);
    frame_y = AV_RL16(&s->buf[8]);
    frame_width = AV_RL16(&s->buf[10]) - frame_x + 1;
    frame_height = AV_RL16(&s->buf[12]) - frame_y + 1;
    [...]
    if (s->size >= 0) {
        /* originally UnpackFrame in VAG's code */
        pb = p;
        meth = *pb++;
        [...]
        /* unchecked index into the pixel buffer */
        dp = &s->frame.data[0][frame_y * s->frame.linesize[0] + frame_x];
        dp_size = s->frame.linesize[0] * s->avctx->height;
        pp = &s->prev_frame.data[0][frame_y * s->prev_frame.linesize[0] + frame_x];
        switch (meth) {
        [...]
        case 2:
            for (i = 0; i < frame_height; i++) {
                memcpy(dp, pb, frame_width);
                pb += frame_width;
                dp += s->frame.linesize[0];
                pp += s->prev_frame.linesize[0];
            }
            break;
        [...]
        }
    }
}
```

Decoder pattern:
  • Usually a variable of type AVCodecContext
  • AV_RL* functions used as sources
  • Lots of primitive types with specified width
  • Use of memcpy, memset, etc.

Again an unchecked index into the pixel buffer!

SLIDE 24

Summary

  • Often an inherent link between vulnerabilities and API usage patterns
  • Application of machine learning for automatic identification of these patterns
  • Extrapolation of known vulnerabilities using dominant API usage patterns
  • Discovery of a 0-day vulnerability in a widely used application

SLIDE 25

Questions?

Recurity Labs GmbH, Berlin, Germany http://www.recurity-labs.com

Fabian Yamaguchi

Vulnerability Researcher fabs@recurity-labs.com