Reverse Engineering Binary Messages through Design Patterns (LangSec 2020)



SLIDE 1

Reverse Engineering Binary Messages through Design Patterns

Jared Chandler (Tufts University), Kathleen Fisher (Tufts University)

LangSec 2020

SLIDE 2

Automatic Reverse Engineering of Binary Messages

Who does this:

  • Researchers
  • Security Analysts
  • Reverse Engineers

Why:

  • Malware Communication Analysis
  • Protocol Validation
  • Old Gear with Lost Specification

Related Problems: Tags, Delimited Data, Long Distance Dependencies

Our Focus: Binary messages with variable length
SLIDE 3

Example of reverse engineering problem

Msg 1:  2 3 67 65 84 4 66 73 82 68
Msg 2:  1 5 77 79 85 83 69
Msg 3:  3 2 79 88 3 68 79 71 3 66 85 71

  1. The analyst starts with messages.
SLIDE 4

Example of reverse engineering problem

Msg 1:  2 3 67 65 84 4 66 73 82 68
Msg 2:  1 5 77 79 85 83 69
Msg 3:  3 2 79 88 3 68 79 71 3 66 85 71

  1. The analyst starts with messages.
  2. Infers some pattern in the data.
SLIDE 5

Example of reverse engineering problem

Msg 1:  2 3 67 65 84 4 66 73 82 68
Msg 2:  1 5 77 79 85 83 69
Msg 3:  3 2 79 88 3 68 79 71 3 66 85 71

Read as ASCII: 67 65 84 = C A T, 66 73 82 68 = B I R D

  1. The analyst starts with messages.
  2. Infers some pattern in the data.
  3. Develops a hypothesis.

SLIDE 6

Example of reverse engineering problem

  1. The analyst starts with messages.
  2. Infers some pattern in the data.
  3. Develops a hypothesis.
  4. Validates it on all messages.

Raw messages:

Msg 1:  2 3 67 65 84 4 66 73 82 68
Msg 2:  1 5 77 79 85 83 69
Msg 3:  3 2 79 88 3 68 79 71 3 66 85 71

Decoded under the hypothesis:

Msg 1:  2 3 C A T 4 B I R D
Msg 2:  1 5 M O U S E
Msg 3:  3 2 O X 3 D O G 3 B U G
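Step 4 can be sketched in a few lines of Python. This is only an illustration of the analyst's manual check, not the tool from the talk: the function name `parse_counted_strings` is hypothetical, and the hypothesis it encodes (first byte is an item count, each item is a length byte followed by that many ASCII bytes) is the one suggested by the decoded messages.

```python
# Example messages from the slide, as lists of byte values.
MESSAGES = [
    [2, 3, 67, 65, 84, 4, 66, 73, 82, 68],
    [1, 5, 77, 79, 85, 83, 69],
    [3, 2, 79, 88, 3, 68, 79, 71, 3, 66, 85, 71],
]


def parse_counted_strings(msg):
    """Hypothesis: byte 0 is an item count; each item is a length byte
    followed by that many ASCII bytes. Returns the decoded strings, or
    None if the message does not fit the hypothesis exactly."""
    if not msg:
        return None
    count, pos, items = msg[0], 1, []
    for _ in range(count):
        if pos >= len(msg):
            return None  # ran out of bytes before reading a length
        length = msg[pos]
        pos += 1
        if pos + length > len(msg):
            return None  # item would run off the end of the message
        items.append(bytes(msg[pos:pos + length]).decode("ascii"))
        pos += length
    # The hypothesis only holds if no bytes are left over.
    return items if pos == len(msg) else None


decoded = [parse_counted_strings(m) for m in MESSAGES]
for i, words in enumerate(decoded, start=1):
    print(f"Msg {i}: {words}")
```

A message that is cut short, such as `[2, 3, 67]`, makes the function return `None`, which is the signal to discard the hypothesis.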

SLIDE 7

What makes this problem hard for a human?

  • Can take a long time to find a pattern in the data.
  • Bytes at the same offset don’t always come from the same field or type.
  • Messages can be hundreds or thousands of bytes in size.

40D513C4221EF3E2EEB96F37D3EB1C10805124771BCB9C146746E2A26CC30EB9E97BBB44821416CEF424837EEBBE8138D2B222B7D B07DE3FBFD791AABB867E876E2D699A0CC2A58299AB227A5822EC480A8C5F9FD7678036093DDA2575C3A762A4EA2F17D18BCC15 385D7973B03128EFCB15CB317A5226B1B6654B01B116A56738B4B5B779F8D68334328C018C64C07A930DCD548F7C6B7A1952E26F2 CA05340EC63BFEF513F3C1E8EB6AF00E14DC5000FE0A9CE5F876B56D7DA73352527329B60B66C552D469F3A2B12A4573B2C111557 4FC4D30F8372A52D868DCC38D7739E94D2C0815000D3B692DCA6D82693AD93D102222D349E9EC4D101F67FC9E702B5430AFB73AB 5361120902A82E4A6FDFF252809B36106B3C3FEC2FC8A98AFC642F1926BD4B3E72C39272004F2B8F731F8145A43D7B4D78BC

Above: a 311-byte message.

Msg 1:  2 3 C A T 4 B I R D
Msg 2:  1 5 M O U S E
Msg 3:  3 2 O X 3 D O G 3 B U G

Bytes 4 to 7 of each message fall in different fields:

Msg 1:  A T 4 B
Msg 2:  O U S E
Msg 3:  X 3 D O

SLIDE 8

Our Automated Approach

SLIDE 9

Common Serialization Design Patterns

Variable Quantity Fixed Length (VQFL)

Q ⋅ ( BYTE[K] )[Q]

Examples: 2 IP ADDR IP ADDR
          3 IP ADDR IP ADDR IP ADDR

Variable Quantity Variable Length (VQVL)

Q ⋅ ( L ⋅ BYTE[L] )[Q]

Example: 2 5 Z E B R A 3 C A T

Length Value (LV)

L ⋅ BYTE[L]

Examples: 5 Z E B R A
          3 C A T

Type Length Value / TLV

Q ⋅ ( T ⋅ L ⋅ BYTE[L] )[Q]

Example: 2 T1 5 Z E B R A T2 3 C A T

Legend: Quantity (Q), Length (L), Type (T), Fixed Length (K). Examples are laid out by byte index.
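To make the four patterns concrete, here is a minimal Python sketch of a parser for each one. The function names and the `(value, next_position)` return convention are assumptions made for illustration, as is the choice of one byte for every Q, L, and T field; the slides do not specify field widths.

```python
def parse_lv(data, pos):
    """L . BYTE[L]: one length byte, then that many value bytes."""
    if pos >= len(data):
        return None
    end = pos + 1 + data[pos]
    if end > len(data):
        return None  # value would run off the end of the message
    return data[pos + 1:end], end


def parse_vqfl(data, pos, k):
    """Q . (BYTE[K])[Q]: a count, then Q items of fixed length k."""
    if k < 1:
        raise ValueError("fixed length k must be at least 1")
    if pos >= len(data):
        return None
    end = pos + 1 + data[pos] * k
    if end > len(data):
        return None
    body = data[pos + 1:end]
    return [body[i:i + k] for i in range(0, len(body), k)], end


def parse_vqvl(data, pos):
    """Q . (L . BYTE[L])[Q]: a count, then Q length-prefixed items."""
    if pos >= len(data):
        return None
    q, pos, items = data[pos], pos + 1, []
    for _ in range(q):
        result = parse_lv(data, pos)
        if result is None:
            return None
        value, pos = result
        items.append(value)
    return items, pos


def parse_tlv(data, pos):
    """Q . (T . L . BYTE[L])[Q]: a count, then Q (type, length, value) items."""
    if pos >= len(data):
        return None
    q, pos, items = data[pos], pos + 1, []
    for _ in range(q):
        if pos >= len(data):
            return None  # no room for the type byte
        tag = data[pos]
        result = parse_lv(data, pos + 1)
        if result is None:
            return None
        value, pos = result
        items.append((tag, value))
    return items, pos


# The slide's examples, with T1 and T2 encoded as the bytes 1 and 2.
print(parse_lv(b"\x05ZEBRA", 0))
print(parse_vqvl(b"\x02\x05ZEBRA\x03CAT", 0))
print(parse_tlv(b"\x02\x01\x05ZEBRA\x02\x03CAT", 0))
print(parse_vqfl(bytes([2, 10, 0, 0, 1, 10, 0, 0, 2]), 0, k=4))
```

Each parser returns `None` instead of raising when it would index past the end of the message, so a caller can treat `None` as "this pattern does not fit here".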

SLIDE 10

Approach for fitting design patterns to data

  • Apply a single design pattern to all messages.
  • If we index off the end of any message, try a different pattern.
  • If it fits… Success! Recurse on leftovers.

Figure: three messages that begin with 2, 3, and 1, followed by unexplored bytes; item lengths |4|, |3|, and |5| are tried in turn.
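The fitting loop can be sketched as a small recursive search. This is a simplified illustration, not the authors' implementation: the pattern set is limited to BYTE, LV, and VQVL, all length and quantity fields are assumed to be one byte, and the names `take_byte`, `take_lv`, `take_vqvl`, and `fit` are hypothetical. The order in which patterns are tried is an arbitrary choice and determines which consistent hypothesis is found first.

```python
MESSAGES = [
    [2, 3, 67, 65, 84, 4, 66, 73, 82, 68],
    [1, 5, 77, 79, 85, 83, 69],
    [3, 2, 79, 88, 3, 68, 79, 71, 3, 66, 85, 71],
]


def take_byte(msg, pos):
    """Consume a single byte; None if we are already at the end."""
    return pos + 1 if pos < len(msg) else None


def take_lv(msg, pos):
    """Consume L . BYTE[L]; None if it indexes off the end."""
    if pos >= len(msg):
        return None
    end = pos + 1 + msg[pos]
    return end if end <= len(msg) else None


def take_vqvl(msg, pos):
    """Consume Q . (L . BYTE[L])[Q]; None if it indexes off the end."""
    if pos >= len(msg):
        return None
    count, pos = msg[pos], pos + 1
    for _ in range(count):
        pos = take_lv(msg, pos)
        if pos is None:
            return None
    return pos


PATTERNS = [("VQVL", take_vqvl), ("LV", take_lv), ("BYTE", take_byte)]


def fit(messages, positions=None, max_depth=4):
    """Return pattern names that consume every message exactly, or None."""
    if positions is None:
        positions = [0] * len(messages)
    if all(p == len(m) for p, m in zip(positions, messages)):
        return []  # no leftovers in any message: success
    if max_depth == 0:
        return None
    for name, take in PATTERNS:
        # Apply one pattern to all messages at their current positions.
        nxt = [take(m, p) for m, p in zip(messages, positions)]
        if any(n is None for n in nxt):
            continue  # indexed off the end of a message: try another pattern
        rest = fit(messages, nxt, max_depth - 1)  # recurse on leftovers
        if rest is not None:
            return [name] + rest
    return None


print(fit(MESSAGES))
print(fit([[9] + m for m in MESSAGES]))  # same messages with a leading byte
```

On the example messages a single VQVL consumes every message. With an extra leading byte, VQVL and LV both index off the end of some message, so the search falls back to BYTE and then fits VQVL to the leftovers.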

SLIDE 11

Hypothesis Space Exploration

Iterative Deepening

Figure: a tree of hypotheses (LV; LV ⋅ TLV; LV ⋅ TLV ⋅ BYTE) showing the bounded hypothesis search space inside the unexplored hypothesis space. Marked nodes are hypotheses consistent with the message samples.
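A bounded search of this kind can be sketched as follows. The sketch enumerates every pattern sequence of length 1, then 2, and so on, and stops at the first depth where some sequence is consistent with all samples. That is a simplification: it visits the same hypotheses an iterative-deepening search would, but without pruning. The function names, the one-byte field widths, and the two sample messages are assumptions made for illustration.

```python
from itertools import product


def take_byte(msg, pos):
    return pos + 1 if pos < len(msg) else None


def take_lv(msg, pos):
    """L . BYTE[L]; None if it indexes off the end."""
    if pos >= len(msg):
        return None
    end = pos + 1 + msg[pos]
    return end if end <= len(msg) else None


def take_tlv(msg, pos):
    """Q . (T . L . BYTE[L])[Q]; None if it indexes off the end."""
    if pos >= len(msg):
        return None
    count, pos = msg[pos], pos + 1
    for _ in range(count):
        pos = take_lv(msg, pos + 1)  # skip the type byte, then read L and value
        if pos is None:
            return None
    return pos


PATTERNS = {"BYTE": take_byte, "LV": take_lv, "TLV": take_tlv}


def consistent(hypothesis, messages):
    """True if the pattern sequence consumes every message exactly."""
    for msg in messages:
        pos = 0
        for name in hypothesis:
            pos = PATTERNS[name](msg, pos)
            if pos is None:
                return False
        if pos != len(msg):
            return False
    return True


def iterative_deepening(messages, max_depth=4):
    """Return all consistent hypotheses at the shallowest depth that has any."""
    for depth in range(1, max_depth + 1):
        found = [h for h in product(PATTERNS, repeat=depth)
                 if consistent(h, messages)]
        if found:
            return found
    return []


# Two hypothetical messages serialized as LV . TLV . BYTE.
SAMPLES = [
    [2, 72, 73, 1, 7, 3, 67, 65, 84, 9],
    [1, 65, 2, 1, 2, 79, 88, 2, 0, 5],
]
print(iterative_deepening(SAMPLES))
```

The depth bound keeps the search finite: the number of candidate sequences grows exponentially with depth, so shallow hypotheses are exhausted before deeper ones are considered.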

SLIDE 12

How does our approach perform?

  1. Generated permutations of design patterns
     (Example: LV ⋅ VQVL ⋅ VQFW4 )
  2. Used each permutation to serialize values, creating 100 messages.
  3. Ran our inference algorithm on each collection of 100 messages.
  4. Compared inferred patterns with those used to serialize.

Experimental Condition                            Test Cases   Accuracy
Patterns with random values                       16500        99.9%
Patterns with values from real network traffic    1434         99.37%
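Steps 1 and 2 can be sketched as a small message generator. Everything here is an assumption made for illustration: the serializer names, the one-byte quantity and length fields, the value ranges, and the reading of VQFW4 as "variable quantity, fixed width of 4 bytes". The actual test harness from the talk is not shown in the slides.

```python
import random


def ser_lv(rng):
    """L . BYTE[L] with a random value of 1 to 7 bytes."""
    value = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 8)))
    return bytes([len(value)]) + value


def ser_vqvl(rng):
    """Q . (L . BYTE[L])[Q] with 1 to 4 random items."""
    items = [ser_lv(rng) for _ in range(rng.randrange(1, 5))]
    return bytes([len(items)]) + b"".join(items)


def ser_vqfw4(rng):
    """Q . (BYTE[4])[Q] with 1 to 4 random 4-byte items (assumed meaning)."""
    count = rng.randrange(1, 5)
    return bytes([count]) + bytes(rng.randrange(256) for _ in range(4 * count))


SERIALIZERS = {"LV": ser_lv, "VQVL": ser_vqvl, "VQFW4": ser_vqfw4}


def make_messages(pattern, n=100, seed=0):
    """Serialize n random messages that all follow the same pattern sequence."""
    rng = random.Random(seed)  # seeded so a test case can be reproduced
    return [b"".join(SERIALIZERS[name](rng) for name in pattern)
            for _ in range(n)]


messages = make_messages(["LV", "VQVL", "VQFW4"])
print(len(messages), "messages; first one:", messages[0].hex())
```

Feeding such a collection to an inference routine and comparing its output with the generating pattern sequence gives one test case of the kind counted in the table.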

SLIDE 13

Further Evaluation: Botnet CnC Attack Commands

Figure: three botnet CnC messages (Msg 1, Msg 2, Msg 3) laid out by byte index (1 to 23), with fields labeled Type, Length, Value Bytes, Quantity, Msg Length, and Constant.

Inferred Format 1: BYTE, BYTE, BYTE, VQFL, TLV

Inferred Format 2: BYTE, BYTE, BYTE, VQFL, BYTE, BYTE, LV, BYTE, LV

SLIDE 14

Next Steps

  • Expand our tool box of design patterns through protocol taxonomy
  • Parallelization
  • Guided Search Heuristics
SLIDE 15

Thank You

Jared Chandler jared.chandler@tufts.edu

Acknowledgements: This material is based upon work partly supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-C-0073. This project was sponsored in part by the Air Force Research Laboratory (AFRL) under contract number FA8750-19-C-0039. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force.