Mining and Understanding Software Enclaves (MUSE)


  1. Mining and Understanding Software Enclaves (MUSE)
     Suresh Jagannathan, Information Innovation Office, DARPA
     http://www.darpa.mil/Our_Work/I2O/Programs/Mining_and_Understanding_Software_Enclaves_(MUSE).aspx
     Distribution Statement A - Approved for Public Release, Distribution Unlimited

  2. What is it?
     • "Next for DARPA: 'Autocomplete' for programmers" (Source: Phys.org)
     • "Do We Really Need to Learn to Code?" (Source: The New Yorker)
     • "Computer Programming Is a Dying Art" (Source: Newsweek)
     • "Pentagon seeks 'big code' for 'big data'" (Source: USA Today)

  3. Trends
     • > 21M repositories
     • 24M
     • > 10M LoC
     • > 4M code snippets (open source)
     • Navy’s newest warship (USS Zumwalt) runs on Linux
     • The US government is the largest consumer of OSS

  4. Why should the government care?
     • Navy’s newest warship (USS Zumwalt) runs on Linux
     • The US government is the largest consumer of OSS in the world
     • 24M

  5. Topic Modeling Open-Source Software
     • Generic Program Properties
     • Specialized Domain Properties
     Source: ohloh.net

  6. System Architecture (diagram)
     • Inputs: Source or Binary
     • Components: Graph Database and Analytics; Mining Engine; Inspection; Artifact Generation; Property Checking and Repair; Program Analysis, Theorem Proving, Testing; Discovery; Learning and Synthesis
     • Mined artifacts: α 1, α 2, α 3; β 1, β 2, β 3; λ 1, λ 2, λ 3
     • Query: “Synthesize a program that does X”
     • Program that satisfies X: f( α 1 ) ◦ g( β 2 ) ◦ h( λ 3 )

  7. Enclaves
     Redundancies in the corpus are exposed as dense components (enclaves) in the mined network.
     • Nodes represent properties: facts, claims, and evidence
     • Edges connect related properties
     Anomalous properties have a small number of connections.
     Likely invariants have a large number of connections.
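The connection-count heuristic on this slide can be illustrated with a small sketch. It is only a toy under stated assumptions: the property names, the edge list, and the two thresholds (`anomalous_max`, `invariant_min`) are hypothetical, and it ranks nodes by plain degree rather than detecting dense components the way a real mining engine presumably would.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def degree_by_property(edges: Iterable[Edge]) -> Dict[str, int]:
    """Count how many edges touch each property node."""
    degree: Dict[str, int] = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def classify(
    edges: Iterable[Edge],
    anomalous_max: int = 1,   # hypothetical threshold
    invariant_min: int = 3,   # hypothetical threshold
) -> Tuple[List[str], List[str]]:
    """Split properties into (anomalous, likely invariants) by connection count."""
    degree = degree_by_property(edges)
    anomalous = sorted(p for p, d in degree.items() if d <= anomalous_max)
    likely = sorted(p for p, d in degree.items() if d >= invariant_min)
    return anomalous, likely


# Hypothetical mined network: four mutually related properties form a dense
# cluster, and one property hangs off it by a single edge.
EDGES: List[Edge] = [
    ("len >= 0", "idx < len"),
    ("len >= 0", "idx >= 0"),
    ("len >= 0", "buf != NULL"),
    ("idx < len", "idx >= 0"),
    ("idx < len", "buf != NULL"),
    ("idx >= 0", "buf != NULL"),
    ("buf != NULL", "copy without length check"),
]

anomalous, likely_invariants = classify(EDGES)
print("anomalous:", anomalous)
print("likely invariants:", likely_invariants)
```

In this toy network the sparsely connected property is flagged as anomalous, while the four densely connected ones are reported as likely invariants.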

  8. Big Code Front-End
     • Collector
     • Classifier
     • Diverse, representative, high-fidelity corpus
     • Ontological structure
     • Types and proofs
     • Binary decompilation
     • Static and dynamic analyses
     • Theorem proving
     • Environment and platform dependencies
     • Tests and runtime verification
     • Program models (memory, execution, …)
     • Executable specifications
     • Analyses
     • Model checking
     • Abstract interpretation
     • Contracts and assertions
     • Documentation extraction
     • Canonical and persistent representation of analysis outputs
     • Database construction

  9. Big Code Back-End (diagram)
     • Mining
     • Inference Engine
     • Distributed Graph Database
     • Property Checking
     • Navigation and Search
     • Query Specification
     • DSLs
     • Queries
     • Query Language
     • Learning and Synthesis Framework
     • Protocol
     • Model Generation
     • Discovery

  10. Dependencies (diagram; labels in extraction order) Infrastructure Widget Synthesis & Repair Analytics Sketch Mining Engine Based Synthesis Artifact Store Trace Analysis Graph Artifact Ontic Types Visualization Generators & Clichés Datalog Invariant Detection Evaluator Probabilistic Type Inference Systems Ontology / Datalog Analyses Protocol Repair Collection Classification (static, dynamic, concolic) & Patch Synthesis Repair Bayesian Demo Specification Queries Workshops Abstract Draft-based Interpretation Cloud Synthesis Infrastructure Static Challenge Abductive Dependently Analysis Deep Learning Problems Inference Typed IR & Hypothesis Specification Binary LLVM Generation Extraction Convex Fault Localization Optimization & Repair Multi-Layered Database Design Pattern Flaw Detection Synthesis from & Repair Specifications

  11. Corpus
     • Currently ~6TB
     • Java and C, C++

  12. Draper Labs: The DeepCode Architecture (Source: Draper Labs)

  13. Artifact Generation
     Clang and Draper’s open-source Fracture decompiler together support both compiling source down and lifting binaries up to LLVM Intermediate Representation (IR).
     Source: Draper Labs

  14. Deep Learning Analytics (Source: Draper Labs)

  15. Finding Heartbleed using Big Code (Draper)
     • Buggy Program: Heartbleed bug
     • Corpus: 170K C/C++ projects, ~400GB, ~20M artifacts (call graphs, CFGs, etc.)
     • Pipeline components: Artifact Generator (LLVM, ANTLR4, Fracture), Metadata Extractor, Graph Layer, Math Layer, Deep Learning
     • Identify and classify design patterns (flaws and repairs); Blue-Good, Red-Bad
     • Code fragment:
           if (1+2+16 > s->s3->rrec.length) return 0;
     • Repaired Program (added bounds checks):
           if (1+2+payload+16 > s->s3->rrec.length) return 0;
           if (write_length > SSL3_RT_MAX_PLAIN_LENGTH) return 0;
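To make the bounds checks on this slide concrete, here is a minimal Python model of a heartbeat handler. It is not OpenSSL code: the function name `echo_heartbeat_payload` and the byte-string record are assumptions for illustration, and the record layout (1-byte type, 2-byte payload length, payload, at least 16 bytes of padding) follows the TLS heartbeat message format. The two guards mirror the `1+2+16` and `1+2+payload+16` comparisons against the record length.

```python
import struct
from typing import Optional

HEADER_LEN = 1 + 2    # message type byte + 2-byte claimed payload length
MIN_PADDING = 16      # minimum padding that follows the payload


def echo_heartbeat_payload(record: bytes) -> Optional[bytes]:
    """Return the payload to echo back, or None to silently discard.

    Hypothetical model of the repaired logic: the claimed payload length
    is trusted only after checking it against the real record length.
    """
    # Check 1: the record must at least hold the header and the padding.
    if HEADER_LEN + MIN_PADDING > len(record):
        return None

    # The sender controls this 16-bit big-endian length field.
    claimed = struct.unpack(">H", record[1:3])[0]

    # Check 2: the claimed payload must actually fit inside the record.
    if HEADER_LEN + claimed + MIN_PADDING > len(record):
        return None

    return record[HEADER_LEN:HEADER_LEN + claimed]


honest = b"\x01" + struct.pack(">H", 4) + b"ping" + bytes(16)
lying = b"\x01" + struct.pack(">H", 0x4000) + b"ping" + bytes(16)

print(echo_heartbeat_payload(honest))
print(echo_heartbeat_payload(lying))
```

Without the second check, a C implementation that copies `claimed` bytes from the record buffer would read past the end of the received data, which is the over-read behind Heartbleed. Python slicing cannot reproduce the over-read itself, so the model only shows where the guards sit.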

  16. Kestrel Institute: Synthesis using Big Code (Source: Kestrel Institute)

  17. Kestrel Institute: Proof-Directed Synthesis Using Big-Code (Source: Kestrel Institute)

  18. Artifact Generation Process (Source: Kestrel Institute)

  19. Features (Source: Kestrel Institute)

  20. AES Synthesis using Big Code (Kestrel)
     • Specification:
           (defthm bytep-of-xtime
             (implies (bytep b)
                      (bytep (xtime b)))
             :hints (("Goal" :in-theory (enable acl2::shl))))
     • Machine Learning: 130K Java projects; 180 out of 130K projects relevant to AES
     • Analysis & Specification Extraction: ~2.3B methods; Control Flow Graphs; Types; API sequences; ~200B facts; 422 Features
     • Synthesis + Proof Refinement; Proofs
     • Program (Implementation + Proof of Correctness):
           public static int lookup (int[][] arr, int hex) {
               int row = hex >> 4;
               int column = hex & 0xF;
               return arr[row][column];
           }
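The specification and the synthesized method on this slide can be mirrored in a short Python sketch. Two things here are assumptions: `xtime` is taken to be the standard AES multiply-by-x in GF(2^8) (the slide names the function but does not define it), and `is_byte` stands in for ACL2's `bytep`. The exhaustive loop is only a test over all 256 inputs, not a proof like the ACL2 theorem.

```python
from typing import List


def is_byte(b: int) -> bool:
    """Stand-in for the slide's bytep predicate: an integer in 0..255."""
    return isinstance(b, int) and 0 <= b <= 0xFF


def xtime(b: int) -> int:
    """Multiply a byte by x in GF(2^8), reducing by the AES polynomial.

    Assumed definition: shift left by one, and XOR with 0x1B when the
    high bit of the input was set.
    """
    shifted = (b << 1) & 0xFF
    return shifted ^ 0x1B if b & 0x80 else shifted


def lookup(arr: List[List[int]], hex_value: int) -> int:
    """Python port of the slide's Java lookup: high nibble selects the row,
    low nibble selects the column."""
    row = hex_value >> 4
    column = hex_value & 0xF
    return arr[row][column]


# Exhaustive check of the property stated by bytep-of-xtime.
assert all(is_byte(xtime(b)) for b in range(256))

# Hypothetical 16x16 table whose entry at (row, column) is 16*row + column,
# so looking up any byte returns the byte itself.
IDENTITY_TABLE = [[16 * r + c for c in range(16)] for r in range(16)]
print(hex(xtime(0x57)), lookup(IDENTITY_TABLE, 0x53))
```

A real AES implementation would pass the S-box as `arr`; the identity table is used here only so the row and column arithmetic is easy to check by eye.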

  21. Challenge Problems – Phase 1
     • Problem: Synthesis from demonstrations in Swing/Eclipse. Approach: Dynamic tracing analysis
     • Problem: Synthesis of AES. Approach: Specification-driven (synthesis-by-construction)
     • Problem: Automated repair of incorrect API usage in Android. Approach: Code transfer
     • Problem: Repair of incorrect invariants (off-by-one errors) in C/C++ code. Approach: Deep learning
     • Problem: Synthesize a communication module for a drone. Approach: User-directed cliché discovery
     • Problem: Complete a partial implementation of binary search tree. Approach: Sketch-based synthesis
     • Problem: Repair incorrect graph implementations from specifications. Approach: Graph classification and repair
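As a rough illustration of the sketch-based synthesis idea named for the binary search tree problem, the toy below leaves one hole in a BST insert routine (the comparison that decides whether a key goes left) and fills it by enumerating candidates against example inputs. Everything in it is an assumption for illustration: the candidate list, the example inputs, and the "in-order traversal must be sorted" specification. Real sketch-based synthesizers typically search with constraint solvers rather than brute-force enumeration.

```python
import operator
from typing import Callable, List, Optional, Tuple

Comparison = Callable[[int, int], bool]


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, goes_left: Comparison) -> Node:
    """Partial implementation: `goes_left` is the hole to be synthesized."""
    if root is None:
        return Node(key)
    if goes_left(key, root.key):
        root.left = insert(root.left, key, goes_left)
    else:
        root.right = insert(root.right, key, goes_left)
    return root


def in_order(root: Optional[Node]) -> List[int]:
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


# Hypothetical candidate space for the hole, tried in this order.
CANDIDATES: List[Tuple[str, Comparison]] = [
    (">", operator.gt),
    ("==", operator.eq),
    ("<", operator.lt),
]


def fill_hole(examples: List[List[int]]) -> Optional[str]:
    """Return the first candidate whose trees traverse in sorted order."""
    for name, candidate in CANDIDATES:
        satisfied = True
        for keys in examples:
            root: Optional[Node] = None
            for key in keys:
                root = insert(root, key, candidate)
            if in_order(root) != sorted(keys):
                satisfied = False
                break
        if satisfied:
            return name
    return None


print(fill_hole([[3, 1, 2], [5, 4, 6, 1]]))
```

The examples act as the specification: a candidate is accepted only if every example tree yields its keys in ascending order.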

  22. www.darpa.mil
