
Vulnerability Prediction Models: A Case Study on the Linux Kernel



  1. Vulnerability Prediction Models: A Case Study on the Linux Kernel. Matthieu Jimenez, Mike Papadakis, Yves Le Traon. Jimenez et al., “Vulnerability Prediction Models: A Case Study on the Linux Kernel”, SCAM’16. Slides: Matthieu Jimenez. Theme: Sébastien Mosser

  2. Vulnerabilities?

  3. A vulnerability: “An information security ‘vulnerability’ is a mistake in software that can be directly used by a hacker to gain access to a system or network.” ~ CVE website ~

  4. Vulnerabilities are special: more important (critical); there are more bugs than vulnerabilities; they are uncovered differently: defects can be easily noticed, while vulnerabilities cannot.

  5. Vulnerabilities are … a web server used to remotely control the glassware-cleaning machine; there is a CVE for that…

  6. Prediction Model?

  7. Prediction Models: models analysing current and historical events to make predictions about future and/or unknown events!

  8. Vulnerability Prediction Model?

  9. Vulnerability Prediction: take advantage of the knowledge of some parts of a software system and/or of previous releases

  10. Vulnerability Prediction: … to automatically classify software entities as vulnerable or not!

  11. Software Entities?

  12. Granularity: possibility to work at module level, file level, function level, …

  13. In this work, we stay at the file level!* (*Morrison et al., “Challenges with applying vulnerability prediction models,” HotSoS’15.)

  14. GOAL

  15. Replicating and comparing the main VPM approaches on the same software system.

  16. Replication…

  17. Exact and independent replication

  18. Exact replication: the procedures of an experiment are followed as closely as possible, e.g. here we replicate using the same machine learning settings

  19. Independent replication: deliberately vary one or more major aspects of the conditions of the experiment, e.g. we use our own dataset

  20. Approaches…

  21. #include and function calls

  22. Include & Function calls: introduced by Neuhaus et al. at CCS’07

  23. Include & Function calls: introduced by Neuhaus et al. at CCS’07. Intuition: vulnerable files share similar sets of imports and function calls

  24. Include & Function calls: introduced by Neuhaus et al. at CCS’07. Intuition: vulnerable files share similar sets of imports and function calls. Build a model based on either the includes or the function calls of a file.

  25. Overview. Preprocessing: retrieve all includes and function calls of a file. Features: includes & function calls. Learning: SVM with a linear kernel; two models are built (one on includes, one on function calls).
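
For concreteness, here is a minimal sketch of this kind of model in Python with scikit-learn, assuming each file has already been reduced to the list of headers it includes. The toy data and the use of scikit-learn are illustrative assumptions, not the authors' actual tooling; an analogous second model would be trained on function calls instead of includes.

```python
# Sketch (assumption): one binary feature per #include, fed to an SVM with a linear kernel.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data: the includes of each file, plus a vulnerable / not-vulnerable label.
files_includes = [
    "linux/slab.h linux/net.h net/sock.h",
    "linux/fs.h linux/slab.h",
    "linux/module.h linux/kernel.h",
    "linux/net.h net/sock.h linux/uaccess.h",
]
labels = [1, 0, 0, 1]  # 1 = vulnerable, 0 = not vulnerable

model = make_pipeline(
    CountVectorizer(binary=True, token_pattern=r"\S+"),  # presence/absence of each include
    LinearSVC(),                                          # linear-kernel SVM
)
model.fit(files_includes, labels)
print(model.predict(["linux/slab.h net/sock.h"]))
```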

  26. Software Metrics

  27. Software Metrics: several works on using metrics to predict vulnerabilities, mostly by Shin et al.

  28. Software Metrics: several works on using metrics to predict vulnerabilities, mostly by Shin et al. Software metrics are used in defect prediction. Build a model based on software metrics (complexity, code churn, …)

  29. Overview. Preprocessing: compute complexity metrics of each function (keeping sum, avg and max), plus code churn and the number of authors of every file. Learning: logistic regression.
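
As an illustration of this setup, a small sketch in Python: the metric columns and their values are invented placeholders, and scikit-learn is an assumption; the slide only specifies logistic regression over per-file metrics.

```python
# Sketch (assumption): logistic regression over per-file metric vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows: [sum complexity, avg complexity, max complexity, code churn, number of authors]
X = np.array([
    [120, 4.0, 25, 300, 12],
    [ 15, 1.5,  3,  10,  1],
    [ 80, 3.2, 18, 150,  7],
    [  9, 1.1,  2,   5,  1],
])
y = np.array([1, 0, 1, 0])  # 1 = vulnerable file, 0 = not

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
print(clf.predict_proba([[60, 2.5, 10, 90, 4]]))  # class probabilities for a new file
```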

  30. Text Mining

  31. Text Mining: suggested by Scandariato et al. in 2014.

  32. Text Mining: suggested by Scandariato et al. in 2014. Aim: building a model requiring no human intuition for feature selection

  33. Text Mining: suggested by Scandariato et al. in 2014. Aim: building a model requiring no human intuition for feature selection. Build a model based on a bag of words extracted from a file

  34. Overview. Preprocessing: create a bag of words for every file (splitting the code according to the language grammar); discretise the features (making them boolean); remove all features considered useless. Learning: Random Forest with 100 trees.
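
A minimal sketch of this pipeline, under the assumption that each file's code has already been split into tokens; the tokenisation and the variance-based filter for "useless" features are stand-ins for whatever the original study used.

```python
# Sketch (assumption): boolean bag-of-tokens -> feature filter -> Random Forest with 100 trees.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline

# Toy data: code tokens per file, plus a vulnerable / not-vulnerable label.
files_tokens = [
    "static int copy_from_user buf len if return",
    "static void module_init printk",
    "long ioctl cmd arg switch case default",
    "int copy_to_user user_ptr len memcpy",
]
labels = [1, 0, 0, 1]  # 1 = vulnerable, 0 = not vulnerable

model = make_pipeline(
    CountVectorizer(binary=True, token_pattern=r"\S+"),  # boolean feature per token
    VarianceThreshold(0.0),                              # drop constant ("useless") features
    RandomForestClassifier(n_estimators=100, random_state=0),  # 100 trees, as on the slide
)
model.fit(files_tokens, labels)
print(model.predict(["long ioctl copy_from_user buf"]))
```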

  35. Dataset

  36. Introducing the dataset: based on commits and not releases

  37. Introducing the dataset: the CVE-NVD database as a source of vulnerabilities; Bugzilla as a source of bugs

  38. Introducing the dataset: built automatically, with the latest data available, on the Linux Kernel

  39. Overall dataset statistics (2006 to June 2016): 1,640 vulnerable files, accounting for 743 vulnerabilities; 4,900 buggy files related to 3,400 bug reports; more than 50,000 files in total

  40. Research Questions: RQ1. Can we distinguish between buggy and vulnerable files?

  41. Research Questions: RQ1. Can we distinguish between buggy and vulnerable files? RQ2. Can we distinguish between vulnerable and non-vulnerable files?

  42. Research Questions: RQ1. Can we distinguish between buggy and vulnerable files? RQ2. Can we distinguish between vulnerable and non-vulnerable files? RQ3. Can we predict future vulnerable files when using past data?

  43. Research Questions: RQ1. Can we distinguish between buggy and vulnerable files? RQ2. Can we distinguish between vulnerable and non-vulnerable files? RQ3. Can we predict future vulnerable files when using past data? ✦ Distinguish between buggy and vulnerable files ✦ Distinguish between vulnerable and non-vulnerable files

  44. Experimental Dataset: Buggy vs Vulnerable files

  45. Experimental dataset: can we distinguish between buggy and vulnerable files? Files related to bug report patches vs files from vulnerability patches; ratio 3.3 : 1

  46. Realistic Dataset: Vulnerable vs Non-Vulnerable files

  47. Realistic dataset: can we distinguish between vulnerable and non-vulnerable files? Reproduce the observed ratio between different categories of files: 3% of (likely) vulnerable files, 47% of (likely) buggy files, 50% of clear files
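
To make that construction concrete, a hypothetical sketch of drawing such a mix; the function name, pool variables, and target size are invented for illustration and do not come from the paper.

```python
# Sketch (hypothetical): sample files so that ~3% are vulnerable, ~47% buggy, ~50% clear.
import random

def sample_realistic(vulnerable_files, buggy_files, clear_files, total=10_000, seed=0):
    """Draw a file sample that mirrors the observed 3% / 47% / 50% ratio."""
    rng = random.Random(seed)
    picked = (
        rng.sample(vulnerable_files, int(0.03 * total)) +
        rng.sample(buggy_files, int(0.47 * total)) +
        rng.sample(clear_files, int(0.50 * total))
    )
    rng.shuffle(picked)
    return picked
```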

  48. Evaluation

  49. RQ1 - Bugs vs Vulnerabilities: [boxplot of MCC, 0.0 to 1.0, for Function Calls, Includes, Software Metrics and Text Mining]
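
The boxplots in this and the following slides report the Matthews correlation coefficient (MCC). As a reminder of what is plotted, a tiny sketch with toy labels (scikit-learn assumed):

```python
# Sketch: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
# 1.0 is a perfect prediction, 0.0 is no better than chance.
from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # toy ground truth (1 = vulnerable)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # toy model predictions
print(matthews_corrcoef(y_true, y_pred))
```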

  50. RQ2 - Vulnerable vs Non-Vulnerable: [boxplot of MCC, 0.0 to 1.0, for Function Calls, Includes, Software Metrics and Text Mining]

  51. RQ3 over time - Bugs vs Vulnerabilities: [precision and recall per release]

  52. RQ3 over time - Bugs vs Vulnerabilities: [MCC per release (5 to 20) for Function Calls, Includes, Software Metrics and Text Mining]

  53. RQ3 over time - Vulnerable vs Non-Vulnerable: [precision and recall per release (5 to 20)]

  54. RQ3 over time - Vulnerable vs Non-Vulnerable: [MCC per release (5 to 20) for Function Calls, Includes, Software Metrics and Text Mining]

  55. Discussion - Findings

  56. 1. VPMs work well with historical data

  57. 2. Good precision observed even with unbalanced data

  58. 3. In the practical case, the best trade-off is in favour of includes and function calls

  59. 4. In the general case, or when favouring precision, the best one is text mining.

  60. Previous studies, Include and Function calls: there is no comparison with Metrics or Text Mining and no results related to time. Reported: Precision 70%, Recall 45%. We found (in the context of Linux): Precision 70%, Recall 64%; we have similar results… (Neuhaus et al., “Predicting vulnerable software components”, CCS’07.)

  61. Previous studies, Software Metrics. Reported with 10-fold cross validation: Precision 3-5%, 9%, 2-52%; Recall 87-90%, 91%, 66-79%. We found (in the context of Linux): Precision 65%, Recall 22%; there are significant differences… Reported results based on time: Precision 3%, Recall 79-85%. We found: Precision 42% : 39%, Recall 16% : 24%. (Shin et al., “Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities”, TSE’11. Shin et al., “Can traditional fault prediction models be used for vulnerability prediction?”, ESE’13. Walden et al., “Predicting Vulnerable Components: Software Metrics vs Text Mining”, ISSRE’14.)

  62. Previous studies, Text Mining. Reported with 10-fold cross validation: Precision 90%, 2-57%; Recall 77%, 74-81%. We found (in the context of Linux): Precision 76%, Recall 58%; there are again significant differences. Reported results based on time: Precision 86%, Recall 77%. We found: Precision 74% : 93%, Recall 37% : 27%. (Scandariato et al., “Predicting Vulnerable Software Components via Text Mining”, TSE’14. Walden et al., “Predicting Vulnerable Components: Software Metrics vs Text Mining”, ISSRE’14.)

  63. Dataset, replication package and additional results will be available soon… Please contact Matthieu Jimenez (Matthieu.Jimenez@uni.lu)

  64. Thank you for your attention!
