Vulnerability Extrapolation, USENIX WOOT 2011, Fabian "fabs" Yamaguchi

  1. Vulnerability Extrapolation. USENIX WOOT 2011. Fabian "fabs" Yamaguchi, Recurity Labs GmbH, Germany

  2. Agenda
     - Patterns you find when auditing code
     - Exploiting these patterns: Vulnerability Extrapolation
     - Using machine learning to get there
     - A method to assist in manual code audits based on this idea
     - The method in practice
     - A showcase

  3. Exploring a new code base
     - Like an area of mathematics you don't yet know.
     - It's not completely different from the mathematics you already know.
     - But there are secrets specific to this area:
         - Vocabulary
         - Recurring patterns in argumentation
         - Weird tricks used in proofs
     - Understanding the specifics of the area makes it a lot easier to reason about it.

  4. Another Example: libTIFF (CVE-2006-3459 | CVE-2010-2067)

        static int
        TIFFFetchShortPair(TIFF* tif, TIFFDirEntry* dir)
        {
            switch (dir->tdir_type) {
            case TIFF_BYTE:
            case TIFF_SBYTE:
                {
                    uint8 v[4];
                    return TIFFFetchByteArray(tif, dir, v)
                        && TIFFSetField(tif, dir->tdir_tag, v[0], v[1]);
                }
            case TIFF_SHORT:
            case TIFF_SSHORT:
                {
                    uint16 v[2];
                    return TIFFFetchShortArray(tif, dir, v)
                        && TIFFSetField(tif, dir->tdir_tag, v[0], v[1]);
                }
            default:
                return 0;
            }
        }

  5. Another Example: libTIFF (CVE-2006-3459 | CVE-2010-2067)
     (The slide overlays the following function on TIFFFetchShortPair from slide 4.)

        static int
        TIFFFetchSubjectDistance(TIFF* tif, TIFFDirEntry* dir)
        {
            uint32 l[2];
            float v;
            int ok = 0;

            if (TIFFFetchData(tif, dir, (char *)l)
                && cvtRational(tif, dir, l[0], l[1], &v)) {
                /*
                 * XXX: Numerator 0xFFFFFFFF means that we have infinite
                 * distance. Indicate that with a negative floating point
                 * SubjectDistance value.
                 */
                ok = TIFFSetField(tif, dir->tdir_tag,
                                  (l[0] != 0xFFFFFFFF) ? v : -v);
            }

            return ok;
        }

  6. LibTIFF: Bug Analysis
     - TIFFFetchShortArray is actually a wrapper around TIFFFetchData.
     - The two are pretty much synonyms.
     - These functions are part of an API local to libTIFF.
     - Badly designed API: the amount of data to be copied into the buffer is passed in one of the fields of the dir structure, not explicitly!
     - Developers missed this in both cases, and it's hard to blame them.

  7. The times of "grep 'memcpy' ./*.c" may be over. But that does not mean patterns of API use that lead to vulnerabilities no longer exist!

  8. Vulnerability Extrapolation
     - Given a function known to be vulnerable, determine functions similar to this one in terms of application-specific API usage patterns.
     - Vulnerability Extrapolation exploits the information leak you get every time a vulnerability is disclosed!

  9. What needs to be done
     - We need to be able to determine how "similar" functions are in terms of dominant programming patterns.
     - We need to find a way to extract these programming patterns from a code base in the first place.
     - How do we do that?

  10. Similarity – A decomposition
      - Decomposition into shape and rotation: if rotation is just a detail, these are pretty similar.
      - In face recognition, faces are decomposed into weighted sums of commonly found patterns + a noise term.
      - Signal processing: decomposition into components of different frequencies. Noise is suspected to be of high frequency while the signal is of lower frequency.

  11. Think of it as 'zooming out'
      [Figure: the TIFFFetchSubjectDistance function from slide 5 next to three "Usage Pattern" components, annotated "Increasing level of detail/frequency" and "Decreasing dominance of pattern".]
      Linear approximation of each function by the most dominant API usage patterns of the code base it is contained in!

  12. Extracting dominant patterns
      - How do we identify the most dominant API usage patterns of a code base?
      - In face recognition, a standard technique is Principal Component Analysis.

  13. Mapping code to the vector space
      - Describe functions by the API symbols they contain.
      - API symbols are extracted using a fuzzy parser.
      - Each API symbol is associated with a dimension.

        func1(){
            int *ptr = malloc(64);
            fetchArray(pb, ptr);
        }
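
The mapping described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool from the talk: a regular expression stands in for the fuzzy parser, only call names are treated as API symbols (type names are ignored), and each dimension records mere presence (1) or absence (0). The names `api_symbols` and `vectorize` are hypothetical.

```python
import re

# C keywords that look like calls ("if (...)") but are not API symbols.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

# Assumption: a regex replaces the fuzzy parser. An identifier directly
# followed by "(" is treated as a called function.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def api_symbols(body):
    """Return the set of call names found in a function body."""
    return {name for name in CALL.findall(body) if name not in C_KEYWORDS}


def vectorize(functions):
    """Map {function name: body} to (vocabulary, {function name: vector}).

    Every API symbol in the code base gets one dimension; a function's
    vector holds 1 where the function uses that symbol and 0 elsewhere.
    """
    symbols = {name: api_symbols(body) for name, body in functions.items()}
    vocabulary = sorted(set().union(*symbols.values()))
    vectors = {
        name: [1 if sym in used else 0 for sym in vocabulary]
        for name, used in symbols.items()
    }
    return vocabulary, vectors


# func1 is the body from the slide; func2 is a made-up second function.
vocabulary, vectors = vectorize({
    "func1": "int *ptr = malloc(64); fetchArray(pb, ptr);",
    "func2": "if (x) free(p);",
})
```

Here `func1` ends up with a 1 in the `malloc` and `fetchArray` dimensions. A weighting scheme other than plain presence could be substituted without changing the overall shape of the sketch.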

  14. Principal Component Analysis
      [Figure: factorization of the data matrix, with the following labels.]
      - Data matrix (contains all function vectors).
      - Strength of pattern.
      - Each column of U is a dominant pattern. Each row is a representation of an API symbol in terms of the most dominant patterns.
      - Representation of functions in terms of the most dominant patterns.
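
The formula behind these labels did not survive in the transcript. The labels are consistent with a truncated singular value decomposition, so the following notation is an assumption rather than a quote from the slide: M holds one function vector per column (n API symbols, m functions), and d is the number of patterns kept. Whether M is mean-centered first is not stated on the slide.

```latex
M \approx U \,\Sigma\, V^{T},
\qquad
M \in \mathbb{R}^{n \times m},\quad
U \in \mathbb{R}^{n \times d},\quad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_d),\quad
V^{T} \in \mathbb{R}^{d \times m}
```

Read against the labels: the columns of U are the dominant patterns, the diagonal entries of Σ give the strength of each pattern, and column j of V^T (scaled by Σ) represents function j in terms of those patterns.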

  15. In summary

  16. A toy problem to gain an intuition: Group 1

        void guiFunc1(GtkWidget *widget)
        {
            int j;
            gui_make_window(widget);
            GtkButton *button;
            button = gui_new_button();
            gui_show_window();
            for(int i = 10; i != i; i++)
                do_gui_stuff();
        }

        void guiFunc2(GtkWidget *widget)
        {
            gui_make_window(widget);
            GtkButton *myButton;
            button1 = gui_new_button();
            button2 = gui_new_button();
            button3 = gui_new_button();
        }

  17. Group 2

        void netFunc1()
        {
            int fd;
            int i = 0;
            struct sockaddr_in in;
            fd = socket(arguments);
            recv(fd, moreArguments);
            if(condition){
                i++;
                send(fd, i, arg);
            }
            send(fd, i, arg);
            close(fd);
        }

        void netFunc2()
        {
            int fd;
            struct sockaddr_in in;
            hostent host;
            fd = socket(arguments);
            recv(fd, moreArguments);
            gethostbyname(host);
            if(condition){
                int i = 0;
                i++;
                send(fd, i, arg);
            }
            close(fd);
        }

  18. Group 3

        void listFunc1(int elem)
        {
            GList myList;
            if(!list_check(myList)){
                do_list_error_stuff();
                return;
            }
            list_add(myList, elem);
        }

        void listFunc2(int elem)
        {
            GList myList;
            if(!list_check(myList)){
                do_list_error_stuff();
                return;
            }
            list_remove(myList, elem);
            list_delete(myList);
        }

  19. Projection onto the first two principal components
      [Figure: scatter plot of the toy functions and API symbols, with two annotations.]
      - Core API functions.
      - Occurs in this context but does not constitute the pattern.
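
To get a feel for this projection without the figure, the toy functions from slides 16 to 18 can be pushed through a small PCA. The sketch below makes several assumptions: the call names per function are read off the flattened slide text, each function is a binary vector over those names, and the principal directions are found by power iteration with deflation, a simple standard-library substitute for a linear-algebra routine. The helper names are hypothetical.

```python
import math

# Call names per toy function, as read from slides 16-18 (assumption).
FUNCTIONS = {
    "guiFunc1": {"gui_make_window", "gui_new_button", "gui_show_window",
                 "do_gui_stuff"},
    "guiFunc2": {"gui_make_window", "gui_new_button"},
    "netFunc1": {"socket", "recv", "send", "close"},
    "netFunc2": {"socket", "recv", "gethostbyname", "send", "close"},
    "listFunc1": {"list_check", "do_list_error_stuff", "list_add"},
    "listFunc2": {"list_check", "do_list_error_stuff", "list_remove",
                  "list_delete"},
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def project(rows, k=2, iterations=500):
    """Project mean-centered rows onto their k leading principal directions."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Scatter matrix X^T X of the centered data (dim x dim).
    scatter = [[sum(r[a] * r[b] for r in centered) for b in range(dim)]
               for a in range(dim)]
    components = []
    for _component in range(k):
        v = [1.0 + 0.1 * j for j in range(dim)]  # deterministic start vector
        for _step in range(iterations):          # power iteration
            w = mat_vec(scatter, v)
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        eigenvalue = dot(v, mat_vec(scatter, v))
        components.append(v)
        # Deflation: remove the direction just found before the next round.
        scatter = [[scatter[a][b] - eigenvalue * v[a] * v[b]
                    for b in range(dim)] for a in range(dim)]
    return [[dot(row, c) for c in components] for row in centered]


vocabulary = sorted(set().union(*FUNCTIONS.values()))
names = sorted(FUNCTIONS)
rows = [[1.0 if s in FUNCTIONS[n] else 0.0 for s in vocabulary] for n in names]
coords = dict(zip(names, project(rows)))


def distance(a, b):
    """Euclidean distance between two functions in the projected plane."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(coords[a], coords[b])))
```

In the resulting plane, functions from the same group should land close together while the three groups stay apart, which is the intuition the toy problem is meant to convey. The signs of the coordinates are arbitrary, so only distances are meaningful.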

  20. Vulnerability Extrapolation
      - Take a function that used to be vulnerable as an input.
      - Measure distances to other functions to determine which are most similar.
      - Let's try that for FFmpeg.
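
The two steps above amount to a nearest-neighbour query. A minimal sketch follows, assuming each function is already represented by a vector in the reduced pattern space. The slide does not name the distance measure, so cosine distance is an assumption here, and the function names and coordinates are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def extrapolate(known_vulnerable, vectors, top=3):
    """Rank all other functions by distance to the known-vulnerable one."""
    query = vectors[known_vulnerable]
    ranked = sorted(
        (cosine_distance(query, vec), name)
        for name, vec in vectors.items()
        if name != known_vulnerable
    )
    return [name for _, name in ranked[:top]]


# Hypothetical 2-D pattern-space coordinates, for illustration only.
example_vectors = {
    "guiFunc1": [0.9, 0.1],
    "guiFunc2": [0.8, 0.2],
    "netFunc1": [0.1, 0.9],
    "netFunc2": [0.2, 0.8],
}
candidates = extrapolate("netFunc1", example_vectors, top=2)
```

The ranked list is a set of audit candidates for a human reviewer, not a verdict: a close match shares API usage patterns with the vulnerable function, which may or may not include the flaw.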

  21. Original bug: CVE-2010-3429

        static int flic_decode_frame_8BPP(AVCodecContext *avctx,
                                          void *data, int *data_size,
                                          const uint8_t *buf, int buf_size)
        {
            [..]
            pixels = s->frame.data[0];
            [..]
                case FLI_DELTA:
                    y_ptr = 0;
                    compressed_lines = AV_RL16(&buf[stream_ptr]);
                    stream_ptr += 2;
                    while (compressed_lines > 0) {
                        line_packets = AV_RL16(&buf[stream_ptr]);
                        stream_ptr += 2;
                        if ((line_packets & 0xC000) == 0xC000) {
                            // line skip opcode
                            line_packets = -line_packets;
                            y_ptr += line_packets * s->frame.linesize[0];
                        } else if ((line_packets & 0xC000) == 0x4000) {
                            [..]
                        } else if ((line_packets & 0xC000) == 0x8000) {
                            // "last byte" opcode
                            pixels[y_ptr + s->frame.linesize[0] - 1] = line_packets & 0xff;
                        } else {
                            [..]
                            y_ptr += s->frame.linesize[0];
                        }
                    }
                    break;
            [..]
        }

      Decoder pattern:
      - Usually a variable of type AVCodecContext.
      - AV_RL* functions used as sources.
      - Lots of primitive types with specified width used.
      - Use of memcpy, memset, etc.

      Annotation on the pixels[...] assignment: unchecked index, write to arbitrary location in memory.

  22. Extrapolation
      - The closest match contained the same vulnerability, but it was fixed when the initial function was fixed.
      - [Figure label: 0-Day]
