SAFE: Self Attentive Function Embedding for Binary Similarity - PowerPoint PPT Presentation


  1. SAFE: Self Attentive Function Embedding for Binary Similarity Luca Massarelli

  2. Who am I? • PhD Student @ Sapienza University of Rome • Exploring how to leverage Artificial Intelligence to improve security!

  3. Reverse Engineering is painful … Image Credit: G. A. Di Luna

  4. Binary Similarity Problem

  5. Applications • Vulnerability Detection • Library Function Identification • Malware Hunting

  6. Existing Commercial Solutions • IDA F.L.I.R.T. • DIAPHORA

  7. Main Limitations • Not scalable (BinDiff, Diaphora) • Require an exact copy of the function (IDA F.L.I.R.T., YARA) • Analysts have to write rules (YARA)

  8. A few words about recompilation • Easy to do! • Effective

  9. How to create new efficient and effective solutions?

  10. EMBEDDINGS!! • Representation of words, sentences or documents using vectors! • BINARY = v1 = [0.17, 0.19, …, 0.21] • BINARIES = v2 = [0.16, 0.23, …, 0.20] • SIM(BINARY, BINARIES) = <v1, v2> = 0.9 • IDEA BORROWED FROM Natural Language Processing
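
The similarity score on this slide is an inner product of two embedding vectors. A minimal sketch of that comparison follows, assuming cosine similarity (the inner product of length-normalized vectors). The 4-dimensional vectors are made-up values for illustration, not the elided ones from the slide.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors (assumed values; real embeddings have many more dimensions).
v1 = [0.17, 0.19, 0.40, 0.21]
v2 = [0.16, 0.23, 0.38, 0.20]    # points in nearly the same direction as v1
v3 = [-0.30, 0.05, -0.10, 0.45]  # points in a different direction

print(cosine_similarity(v1, v2))  # close to 1
print(cosine_similarity(v1, v3))  # much lower
```

Two related items should score close to 1, while unrelated items should score noticeably lower.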

  11. Word2Vec Model • The embedding of each word is computed with an unsupervised algorithm that considers the context of the word.

  12. Word2Vec Model • Word relationships can be retrieved from the embeddings: man : woman = king : ??? • w2v(king) − w2v(man) + w2v(woman) ≈ w2v(queen)
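
The analogy arithmetic can be sketched with a few hand-made vectors. The 2-D values below are assumptions chosen so the geometry is easy to follow (axis 0 loosely "royalty", axis 1 loosely "gender"); they are not learned by Word2Vec, and `analogy` is a hypothetical helper name.

```python
import math

# Hand-made 2-D vectors (assumed for illustration, not trained embeddings).
vectors = {
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c, table):
    """Solve a : b = c : ? by finding the word nearest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(table[a], table[b], table[c])]
    candidates = [w for w in table if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(table[w], target))

print(analogy("man", "woman", "king", vectors))  # queen
```

The three query words are excluded from the candidates, as is common practice, because the nearest vector to the target is often one of the inputs.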

  13. Word2Vec Model For ASM • We can do the same with assembly code! • push rbp : pop rbp = push rax : ??? • Answer: pop rax
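
To apply a Word2Vec-style model to assembly, each instruction can be treated as a "word" and each function as a "sentence". The sketch below shows only the data-preparation step: building (target, context) pairs from an instruction sequence, as a skip-gram variant would consume them. The slides do not say which Word2Vec variant or window size was used, so both are assumptions here, and the instruction strings are illustrative.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs

# One instruction = one token (illustrative function body).
function_body = ["push rbp", "mov rbp, rsp", "pop rbp", "ret"]
pairs = skipgram_pairs(function_body, window=1)

for target, context in pairs:
    print(f"{target!r} -> {context!r}")
```

A model trained on such pairs places instructions that appear in similar contexts close to each other in the embedding space.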

  14. How do we aggregate instruction embeddings into function embeddings?

  15. Structured Self Attentive Model
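
The slide only names the model, so the following is a rough sketch of the attention pooling step as formulated in the structured self-attentive sentence embedding literature: A = softmax(W_s2 · tanh(W_s1 · Hᵀ)) and M = A · H. Here H holds one vector per instruction. The weights are random and untrained, the dimensions are arbitrary, and any recurrent or dense layers around this step are omitted; all names are assumptions.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attentive_pool(H, W_s1, W_s2):
    """H is n x d (one row per instruction). Returns A (r x n) and M (r x d)."""
    hidden = [[math.tanh(x) for x in row] for row in matmul(W_s1, transpose(H))]  # d_a x n
    A = [softmax(row) for row in matmul(W_s2, hidden)]  # each row sums to 1
    M = matmul(A, H)  # r weighted averages of the instruction vectors
    return A, M

rng = random.Random(0)
n, d, d_a, r = 5, 4, 3, 2  # instructions, vector size, attention size, attention hops
H = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
W_s1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d_a)]
W_s2 = [[rng.uniform(-1, 1) for _ in range(d_a)] for _ in range(r)]

A, M = self_attentive_pool(H, W_s1, W_s2)
function_embedding = [x for row in M for x in row]  # flatten r x d into one vector
print(len(function_embedding))
```

Each row of A is a distribution over the instructions, so the function vector is built from weighted averages in which some instructions count more than others.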

  16. The Full Pipeline

  17. Creating the dataset • This is easy!!! • We compile 11 different projects with different compilers and optimization levels! • … and we disassemble everything!

  18. It works!! • AUC: • SAFE: 0.99 • I2v_attention: 0.96 • Gemini (MFE): 0.95 • We tested SAFE on different tasks!
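
AUC here is the area under the ROC curve for the similar / not-similar decision. As a sketch, it can be computed as the fraction of (similar pair, dissimilar pair) combinations in which the similar pair receives the higher score, with ties counted as half. The scores and labels below are invented for illustration and are unrelated to the reported results.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive example outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented similarity scores; label 1 = same source function, 0 = different.
scores = [0.95, 0.90, 0.60, 0.70, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]

print(roc_auc(scores, labels))  # 8 of 9 pairs ranked correctly
```

An AUC of 1.0 means every similar pair scores above every dissimilar pair; 0.5 corresponds to random ranking.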

  19. Function Search Engine! • We tested SAFE as a function search engine! • We try to retrieve from a knowledge base the functions most similar to the query!
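
Once every function has an embedding, a search engine reduces to a nearest-neighbour lookup. The sketch below does a brute-force scan with cosine similarity. The knowledge-base entries, their names and their 3-D vectors are invented for illustration, and a real deployment would likely use an indexed vector store rather than a linear scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def search(query, knowledge_base, k=2):
    """Return the k (name, score) entries most similar to the query embedding."""
    scored = [(name, cosine(query, emb)) for name, emb in knowledge_base.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Invented entries: the same source function built with different compilers
# should end up with nearby embeddings.
knowledge_base = {
    "memcpy@gcc-O2":   [0.90, 0.10, 0.05],
    "memcpy@clang-O0": [0.85, 0.20, 0.10],
    "qsort@gcc-O2":    [0.05, 0.95, 0.10],
    "aes_enc@gcc-O3":  [0.10, 0.05, 0.90],
}

query = [0.88, 0.15, 0.08]  # embedding of an unknown function
results = search(query, knowledge_base, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```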

  20. Semantic Classification • We try to classify functions into 4 different semantic classes using embeddings! • Math • String • Encryption • Sorting
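
The slide does not name the classifier, so the sketch below swaps in a nearest-centroid classifier purely to illustrate the idea: average the embeddings of each class, then assign a new function to the class with the closest centroid. The 2-D training vectors are invented.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(samples):
    """samples: list of (embedding, label). Returns {label: centroid}."""
    by_label = {}
    for embedding, label in samples:
        by_label.setdefault(label, []).append(embedding)
    return {label: centroid(embs) for label, embs in by_label.items()}

def classify(embedding, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))

# Invented 2-D embeddings, two labelled functions per class.
training = [
    ([0.9, 0.1], "Math"),        ([0.8, 0.2], "Math"),
    ([0.1, 0.9], "String"),      ([0.2, 0.8], "String"),
    ([-0.9, -0.1], "Encryption"), ([-0.8, -0.2], "Encryption"),
    ([0.1, -0.9], "Sorting"),    ([0.2, -0.8], "Sorting"),
]

centroids = train(training)
print(classify([-0.7, -0.3], centroids))  # Encryption
```

This only works when embeddings of the same semantic class cluster together, which is the property the classification task relies on.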

  21. Semantic Classification: Visualization • (S) Sorting • (E) Encryption • (SM) String Manipulation • (M) Math • Embeddings are clustered in the space according to their semantics!

  22. Applications • IDENTIFICATION OF AN ENCRYPTION FUNCTION INSIDE A MALWARE! • IDENTIFICATION OF VULNERABLE FUNCTIONS INSIDE A FIRMWARE! • YARASAFE: USING SAFE INSIDE YARA

  23. TeslaCrypt Ransomware • We disassembled the sample with IDA and used our semantic classifier to analyze every function! • The classifier found seven functions that have encryption semantics! • 6 of them were actually performing encryption!! • Sample: 3372c1edab46837f1e973164fa2d726c5c5e17bcb888828ccd7c4dfcc234a370 • Detected Functions: 0x41e900, 0x420ec0, 0x4210a0, 0x4212c0, 0x421665, 0x421900, 0x4219c0

  24. Function Detected At 0x41E900 SHA1 Constant

  25. Possible improvement: Detecting suspicious functionality inside a firmware

  26. Spotting Vulnerabilities in COTS software • We developed a tool, YARASAFE, to simplify this process!

  27. YARA-SAFE

  28. YARA-SAFE Rule

          import "safe"

          rule Heartbleed {
              condition:
                  safe.similarity("[0.094, …. , 0.0597]") > 0.97
          }
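
A rule of this shape embeds a function's vector as a string literal and compares the similarity against a threshold. As a rough sketch, such rule text could be generated from an embedding with a small helper. `render_rule` is a hypothetical name, not part of YARA or YARASAFE, the number formatting is an assumption, and the two-element vector stands in for a full embedding.

```python
def render_rule(rule_name, embedding, threshold=0.97):
    """Build rule text shaped like the slide's example (hypothetical helper)."""
    vector_text = ", ".join(f"{x:.4f}" for x in embedding)
    return (
        'import "safe"\n'
        f"rule {rule_name} {{\n"
        "    condition:\n"
        f'        safe.similarity("[{vector_text}]") > {threshold}\n'
        "}\n"
    )

# Stand-in vector; a real function embedding is much longer.
rule_text = render_rule("ExampleFunction", [0.094, 0.0597])
print(rule_text)
```

Only the textual layout is taken from the slide; how the tool parses the vector string or evaluates the similarity is not shown there.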

  29. Rule - Creation

  30. DEMO!!

  31. Paper • GitHub
