
Hack The CPython, Batuhan Taskaya (@isidentical)



  1. Hack The CPython Batuhan Taskaya @isidentical

  2. What is hacking?

  3. Why do we hack?

  4. Yes, we want FREEDOM! We want to use PEP 313 (Roman numeral literals)!

  5. Before we hack, learn the internals

  6. Lexing (tokenization)
     - Read
     - Split
     - Set the first token
     Token IDs from CPython's C headers:
         #define NEWLINE 4
         #define INDENT 5
         #define DEDENT 6
         #define LPAR 7
         #define RPAR 8
         #define LSQB 9
         #define RSQB 10
         #define COLON 11
         #define COMMA 12
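
The pure-Python `tokenize` module lets you watch this stage from Python. The sketch below prints the exact token-type name of every token in a small call expression. The `token` module exposes the same numeric IDs as the defines above (for example, `token.NEWLINE == 4`). `token_names` is a helper name made up for this example.

```python
import io
import token
import tokenize


def token_names(source: str) -> list:
    """Return the exact token-type name of every token produced for `source`."""
    readline = io.StringIO(source).readline
    return [token.tok_name[tok.exact_type] for tok in tokenize.generate_tokens(readline)]


print(token_names("f(a, b)\n"))
print(token.NEWLINE, token.LPAR, token.COMMA)
```
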

  7. Parsing (parser)
     - Generated by PGen2
     - Keeps a record of the grammar's structures in arcs, DFAs, etc.
     - Keeps things that carry no meaning (like whitespace)
     - Constructs a CST (concrete syntax tree)

  8. AST (where the actual hack begins)
     - Node types are generated from an ASDL description
     - A highly relational tree constructed from the CST
     - Doesn't keep anything it doesn't need (like whitespace)
     - Can be manipulated easily
         import ast

         class RewriteName(ast.NodeTransformer):
             def visit_Name(self, node):
                 # prefix every identifier with "a", keeping its Load/Store context
                 return ast.Name("a" + node.id, node.ctx)
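
This is a runnable sketch of the slide's transformer applied to a tiny module. It uses `ast.copy_location` and `ast.fix_missing_locations` so the rewritten tree can be compiled. `ast.unparse` needs Python 3.9 or newer.

```python
import ast


class RewriteName(ast.NodeTransformer):
    """Prefix every identifier with "a" (the transformer from the slide)."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        new = ast.Name(id="a" + node.id, ctx=node.ctx)
        return ast.copy_location(new, node)


tree = ast.parse("x = y + 1")
tree = ast.fix_missing_locations(RewriteName().visit(tree))
print(ast.unparse(tree))  # ax = ay + 1

# the rewritten tree still compiles and runs
namespace = {"ay": 41}
exec(compile(tree, "<rewritten>", "exec"), namespace)
print(namespace["ax"])  # 42
```
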

  9. Bytecode generation
     - CFG construction
     - Compiling to a code object
     - Peephole optimization
         >>> dis.dis("a.xyz(3)")    # Python 3.7 output
           1           0 LOAD_NAME                0 (a)
                       2 LOAD_METHOD              1 (xyz)
                       4 LOAD_CONST               0 (3)
                       6 CALL_METHOD              1
                       8 RETURN_VALUE
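
The code object that `dis` shows can also be inspected directly. The sketch below compiles the same expression and lists its instructions. Opcode names change between CPython releases (LOAD_METHOD/CALL_METHOD are the 3.7 to 3.10 spelling), so only the names table is treated as stable here.

```python
import dis

# dis.dis first tries to compile a string in "eval" mode, so do the same here
code = compile("a.xyz(3)", "<demo>", "eval")

print(type(code).__name__)  # code
print(code.co_names)        # ('a', 'xyz')

# the exact opcodes printed depend on the running CPython version
for instr in dis.get_instructions(code):
    print(instr.offset, instr.opname, instr.argrepr)
```
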

  10. Evaluation
      [Figure: frame graph]
      - A biiig for loop (with labeled gotos when compiled with GCC)
      - Tons of structs try to track everything
      - Executes frame by frame, on top of a value stack
      - Global & local namespaces
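
To make the "big loop over a value stack" idea concrete, here is a toy evaluation loop. It is a teaching model, not CPython's ceval. The instruction format (`(opname, arg)` pairs) and the `run` function are assumptions made for this example. Name lookup goes locals first, then globals; CPython's LOAD_NAME would also fall back to builtins.

```python
from typing import Any, List, Tuple


def run(instructions: List[Tuple[str, Any]], f_locals: dict, f_globals: dict) -> Any:
    """Toy evaluation loop: dispatch on the opcode name, operate on a value stack."""
    stack: List[Any] = []
    pc = 0
    while True:
        opname, arg = instructions[pc]
        pc += 1
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":
            # locals first, then globals (no builtins fallback in this toy)
            stack.append(f_locals[arg] if arg in f_locals else f_globals[arg])
        elif opname == "STORE_NAME":
            f_locals[arg] = stack.pop()
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise NotImplementedError(opname)


program = [("LOAD_NAME", "x"), ("LOAD_CONST", 2), ("BINARY_ADD", None), ("RETURN_VALUE", None)]
print(run(program, {}, {"x": 40}))  # 42
```
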

  11. Let’s Hack

  12. Walrus on Python 3.7: a project that lets you use the walrus operator on Python 3.7 by using a new source encoding

  13. The Strategy For Hacking
      - Should run before tokenization happens
      - Needs a new tokenizer, or a modification to Python's tokenize module
      - The source should be tokenized with that tokenizer
      - Needs an untokenizer that consumes the sequence of tokens and constructs the source back
      - Should stream that source to the real tokenizer
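
The tokenize, untokenize, re-tokenize cycle can be tried with the standard `tokenize` module. `untokenize` is documented to produce source that tokenizes back to the same token types and strings. That guarantee is what makes it safe to stream the rebuilt source on to the real compiler.

```python
import io
import tokenize

source = "total = price * (1 + rate)\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
rebuilt = tokenize.untokenize(tokens)

# round trip: same (type, string) pairs after re-tokenizing the rebuilt source
retokenized = list(tokenize.generate_tokens(io.StringIO(rebuilt).readline))
same_tokens = [(t.type, t.string) for t in tokens] == [(t.type, t.string) for t in retokenized]
print(same_tokens)  # True

# hand the rebuilt source to the real compiler
namespace = {"price": 10, "rate": 0.5}
exec(compile(rebuilt, "<rebuilt>", "exec"), namespace)
print(namespace["total"])  # 15.0
```
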

  14. Modifying the Tokens
      - Add a new token to the `token` module (where Python keeps token names and IDs)
      - Add a new key to `tokenize.EXACT_TOKEN_TYPES`, for getting the token name when that token is streamed
      - Update the tokenization rule (otherwise Python throws an error, because it can't understand :=)
          import token as tokens  # assumption: `tokens` on the slide is the `token` module
          import tokenize         # the pure-Python tokenizer of Python 3.7

          tokens.COLONEQUAL = 0xFF
          tokens.tok_name[0xFF] = "COLONEQUAL"
          tokenize.EXACT_TOKEN_TYPES[":="] = tokens.COLONEQUAL
          tokenize.PseudoToken = tokenize.Whitespace + tokenize.group(
              r":=",
              tokenize.PseudoExtras,
              tokenize.Number,
              tokenize.Funny,
              tokenize.ContStr,
              tokenize.Name,
          )
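
For comparison, Python 3.8 and later ship the COLONEQUAL token natively, which is what the patch above retrofits onto 3.7. Since 3.12, `tokenize` delegates to the C tokenizer, so patching `PseudoToken` should not be expected to have any effect there. The sketch below only checks the native token on a modern interpreter.

```python
import io
import token
import tokenize

# Python 3.8+: the walrus is tokenized as a single COLONEQUAL operator
source = "if (n := 10) > 5:\n    pass\n"
toks = tokenize.generate_tokens(io.StringIO(source).readline)
walrus = [t for t in toks if t.exact_type == token.COLONEQUAL]
print(walrus[0].string)  # :=
```
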

  15. Modifying The Source
      - A function that reads walrused source and returns the source adapted for 3.7
      - Tokenizes the walrused source with the new modifications
      - Creates a copy of those tokens
      - Uses the original list for detection and the copy for modification
          def generate_walrused_source(readline):
              source_tokens = list(tokenize(readline))
              modified_source_tokens = source_tokens.copy()
              for index, token in enumerate(source_tokens):
                  if token.exact_type == tokens.COLONEQUAL:
                      <code for replacing that token>
              return untokenize(modified_source_tokens)
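
The slide elides the actual walrus replacement, so this sketch only illustrates the pattern: detect on the original token list, modify a copy, untokenize. It uses a hypothetical stand-in rewrite, turning the identifier `λ` into the keyword `lambda`. The tokens are reduced to `(type, string)` pairs, which puts `untokenize` into its compatibility mode. That way a replacement of a different length does not clash with the original column positions. `generate_rewritten_source` is a name made up for this example.

```python
import io
import token
import tokenize


def generate_rewritten_source(readline) -> bytes:
    """Tokenize, detect on the original list, modify a copy, untokenize."""
    source_tokens = list(tokenize.tokenize(readline))
    # (type, string) pairs: untokenize ignores positions in this mode
    modified = [(tok.type, tok.string) for tok in source_tokens]
    for index, tok in enumerate(source_tokens):
        if tok.type == token.NAME and tok.string == "λ":  # hypothetical rewrite
            modified[index] = (token.NAME, "lambda")
    # the leading ENCODING token makes untokenize return bytes
    return tokenize.untokenize(modified)


src = "square = λ x: x * x\n".encode("utf-8")
new_source = generate_rewritten_source(io.BytesIO(src).readline)
namespace = {}
exec(new_source.decode("utf-8"), namespace)
print(namespace["square"](7))  # 49
```
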

  16. Creating the decode Function for the Encoding
      - Reads the source
      - Decodes it with the actual encoding
      - Streams it into `generate_walrused_source`
      - Returns the clean source back
          def decode(input, errors="strict", encoding=None):
              # bytes.decode() hands codecs a memoryview, so only re-encode real text
              if isinstance(input, str):
                  input, _ = encoding.encode(input, errors)
              buffer = io.BytesIO(input)
              result = generate_walrused_source(buffer.readline)
              # CodecInfo.decode returns (text, length), the tuple a codec decoder must return
              return encoding.decode(result)

  17. Adding a Search Function
      - `codecs.register` takes a search function that returns a `codecs.CodecInfo` if the given name is the codec's name, and `None` otherwise
      - To use walrus37 with encodings other than utf8, let the user specify an encoding and bind that encoding into the `decode` function
          from codecs import CodecInfo, lookup

          def search(name):
              if "walrus37" in name:
                  # str.strip removes characters, not a prefix; Python 3.9+ also passes "-" as "_"
                  encoding = name.replace("walrus37", "").strip("-_") or "utf8"
                  encoding = lookup(encoding)
                  decoder = <partial decoder with given encoding>
                  walrus_codec = CodecInfo(...)
                  return walrus_codec
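
Here is a self-contained sketch of the whole codec, covering the decode function and the search function. It makes several assumptions:

- The codec name `lambda37` and the `λ` to `lambda` rewrite are hypothetical stand-ins for walrus37 and the elided walrus replacement.
- It decodes with the underlying codec *before* tokenizing. Tokenizing bytes could otherwise read a `coding: lambda37` cookie and re-enter this very codec.
- It only wires up `decode`. Reading real source files through a coding cookie would likely also need the incremental and stream decoders that the slide's `CodecInfo(...)` elides.

```python
import codecs
import functools
import io
import token
import tokenize

CODEC_PREFIX = "lambda37"  # hypothetical codec name used only in this sketch


def rewrite_text(text: str) -> str:
    """Hypothetical stand-in for generate_walrused_source: NAME "λ" -> "lambda"."""
    toks = list(tokenize.generate_tokens(io.StringIO(text).readline))
    modified = [(tok.type, tok.string) for tok in toks]
    for index, tok in enumerate(toks):
        if tok.type == token.NAME and tok.string == "λ":
            modified[index] = (token.NAME, "lambda")
    return tokenize.untokenize(modified)


def decode(input, errors="strict", encoding=None):
    """Decode with the underlying codec, then rewrite.
    Codec decoders must return (text, number_of_bytes_consumed)."""
    text, consumed = encoding.decode(bytes(input), errors)
    return rewrite_text(text), consumed


def search(name):
    # search functions receive a normalized name; 3.9+ turns "-" into "_"
    if not name.startswith(CODEC_PREFIX):
        return None
    base = name[len(CODEC_PREFIX):].lstrip("-_") or "utf8"
    encoding = codecs.lookup(base)
    return codecs.CodecInfo(
        name=CODEC_PREFIX,
        encode=encoding.encode,  # encoding does not undo the rewrite
        decode=functools.partial(decode, encoding=encoding),
    )


codecs.register(search)

src = "square = λ x: x * x\n".encode("utf-8")
namespace = {}
exec(src.decode(CODEC_PREFIX), namespace)
print(namespace["square"](7))  # 49
```

A suffix such as `lambda37-latin1` selects a different underlying encoding, mirroring the slide's "bind the given encoding" step.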

  18. Implementing Rejected PEPs: a project that lets you use features of rejected PEPs

  19. The Strategy For Hacking
      - Should run when imported
      - Should be effective only within the `Allow(<pep num>)` scope
      - If the syntax is used outside the scope, it should raise the proper error (for example, using a Roman numeral outside the PEP 313 scope should raise NameError)

  20. Implementing PEPs (Example: PEP 313)
      - Should go through all names (a, x, obtainer, I, IV, test)
      - If the name is a valid Roman numeral literal, get the value of that literal and replace the name with that number
          class PEP313(HandledTransformer):
              def visit_Name(self, node):
                  number = roman(node.id)
                  if number:
                      return ast.Num(number)
                  return node
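
The slide's `roman` helper and `HandledTransformer` base class are not shown, so this sketch substitutes its own strict Roman-numeral parser and a plain `ast.NodeTransformer`. It emits `ast.Constant`, the node type Python 3.8+ uses for literals. It only rewrites names that are being *loaded*, since turning an assignment target like `IV = 3` into a constant would produce an invalid tree.

```python
import ast
import re
from typing import Optional

# strict Roman numerals for 1..3999 (an assumption; the project's rules are not shown)
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman(name: str) -> Optional[int]:
    """Return the value of a Roman numeral, or None if `name` is not one."""
    if not name or ROMAN_RE.fullmatch(name) is None:
        return None
    total = 0
    for char, nxt in zip(name, name[1:] + " "):
        value = VALUES[char]
        # subtractive notation: a smaller digit before a larger one counts negative
        if nxt != " " and VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total


class PEP313(ast.NodeTransformer):
    def visit_Name(self, node: ast.Name) -> ast.AST:
        number = roman(node.id)
        if number is not None and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=number), node)
        return node


tree = PEP313().visit(ast.parse("result = IV + x"))
print(ast.unparse(tree))  # result = 4 + x
```
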

  21. Scoping
      - Should go through all with statements
      - Find the with's name and check whether it is `Allow`
      - Get the args of `Allow` (the PEP number)
      - Dispatch the elements of that with to the proper PEP handler
          class PEPTransformer(Transformer):
              def visit_With(self, node):
                  if <name check>:
                      pep = <get first arg>
                      new_node = <get node>
                      copyloc(new_node, node)
                      fix_missing(new_node)
                  return node
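
The slide leaves the node construction as placeholders, so this is only one possible filling. The sketch makes these choices:

- It finds `with Allow(<number>):` blocks and runs their bodies through a handler looked up in a hypothetical `HANDLERS` table.
- It replaces the `with` statement by the transformed body, so `Allow` never has to exist at runtime.
- The PEP 313 handler uses a simplified Roman converter that does not validate numerals.

Code outside the scope is left alone, so it fails with NameError, as slide 19 asks.

```python
import ast
from typing import Optional

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman(name: str) -> Optional[int]:
    # simplified converter with no validity check (accepts e.g. "IIII")
    if not name or any(char not in VALUES for char in name):
        return None
    vals = [VALUES[char] for char in name]
    return sum(-v if v < nxt else v for v, nxt in zip(vals, vals[1:] + [0]))


class PEP313(ast.NodeTransformer):
    def visit_Name(self, node: ast.Name) -> ast.AST:
        number = roman(node.id)
        if number is not None and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=number), node)
        return node


HANDLERS = {313: PEP313}  # hypothetical PEP-number -> handler table


class PEPTransformer(ast.NodeTransformer):
    def visit_With(self, node: ast.With):
        self.generic_visit(node)  # handle nested with-blocks first
        if len(node.items) != 1:
            return node
        expr = node.items[0].context_expr
        is_allow = (
            isinstance(expr, ast.Call)
            and isinstance(expr.func, ast.Name)
            and expr.func.id == "Allow"
            and len(expr.args) == 1
            and isinstance(expr.args[0], ast.Constant)
        )
        if not is_allow:
            return node
        handler = HANDLERS[expr.args[0].value]()  # KeyError for unknown PEPs
        # returning a list splices the transformed body in place of the with
        return [handler.visit(stmt) for stmt in node.body]


source = "with Allow(313):\n    total = IV + X\nafter = IV\n"
tree = ast.fix_missing_locations(PEPTransformer().visit(ast.parse(source)))
print(ast.unparse(tree))
```
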

  22. Runtime
      - Run when imported
      - Get the source code of the `__main__` file that imports it
      - Transform that source into an AST
      - Dispatch the AST to the scoping handler
      - Get back the AST
      - Compile the AST to bytecode
      - Run the bytecode
          def allow():
              main = __import__("__main__")
              tf = PEPTransformer()
              f = main.__file__
              main_ast = ast.parse(<open>)
              main_ast = tf.visit(main_ast)
              fix_missing_locations(main_ast)
              bc = compile(main_ast, f, "exec")
              exec(bc, main.__dict__)

          allow()
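
The parse, transform, fix locations, compile, exec steps can be factored into one function and tested on a string instead of `main.__file__`. The slide's `allow()` would pass the file's source and `main.__dict__`; `tokenize.open` is one option for reading the file, since it honours coding cookies. Both `run_transformed` and the `DoubleInts` stand-in transformer are hypothetical.

```python
import ast


def run_transformed(source: str, filename: str, transformer: ast.NodeTransformer, namespace: dict) -> None:
    """Parse, transform, fix locations, compile and execute `source` in `namespace`."""
    tree = ast.parse(source, filename)
    tree = transformer.visit(tree)
    ast.fix_missing_locations(tree)
    bytecode = compile(tree, filename, "exec")
    exec(bytecode, namespace)


class DoubleInts(ast.NodeTransformer):
    """Hypothetical stand-in transformer: doubles every int literal (bools excluded)."""

    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        if type(node.value) is int:
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


ns = {}
run_transformed("x = 20 + 1\n", "<demo>", DoubleInts(), ns)
print(ns["x"])  # 42
```
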

  23. Rusty Return: implicitly return the last expression (like Rust)

  24. The Strategy For Hacking
      - Should run when the function is decorated
      - Should return the last expression
      - Should support arbitrarily nested branching

  25. Transforming the AST (1)
      - Visit the function definition
      - Remove @rlr from the decorator list (to prevent infinite recursion)
          class RLR(ast.NodeTransformer):
              def visit_FunctionDef(self, fn):
                  self._adjust(fn)
                  # keep every decorator except a plain `rlr` (only ast.Name has .id)
                  ds = filter(lambda d: not (isinstance(d, ast.Name) and d.id == "rlr"), fn.decorator_list)
                  fn.decorator_list = list(ds)
                  return fn

  26. Transforming the AST (2)
      - If the last node is an expression, replace it with `ast.Return`
      - Call itself again while the last statement is an `ast.If`
          def _adjust(self, container: ast.AST, items: "str | None" = "body") -> None:
              items = getattr(container, items) if items is not None else container
              last_stmt = items[-1]
              if isinstance(last_stmt, ast.Expr):
                  items.append(ast.Return(value=items.pop().value))
              elif isinstance(last_stmt, ast.If):
                  self._adjust(last_stmt)
                  if len(last_stmt.orelse) > 0:
                      self._adjust(last_stmt.orelse, None)
              else:
                  return None
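
This sketch combines slides 25 and 26 into a runnable transformer and applies it to an `if`/`elif`/`else` chain. An `elif` is an `If` nested in `orelse`, which is why the recursion handles arbitrarily deep branching. A real `@rlr` decorator would probably fetch the source with `inspect.getsource(func)`, but that fails for functions defined in a REPL or via `exec`. The hypothetical `compile_rusty` helper therefore takes the source string directly.

```python
import ast
import textwrap
from typing import Optional


class RLR(ast.NodeTransformer):
    """Turn the last expression of each branch into a return statement."""

    def visit_FunctionDef(self, fn: ast.FunctionDef) -> ast.FunctionDef:
        self._adjust(fn)
        # drop a plain @rlr so re-executing the definition doesn't recurse
        fn.decorator_list = [
            d for d in fn.decorator_list
            if not (isinstance(d, ast.Name) and d.id == "rlr")
        ]
        return fn

    def _adjust(self, container, items: Optional[str] = "body") -> None:
        stmts = getattr(container, items) if items is not None else container
        if not stmts:
            return
        last = stmts[-1]
        if isinstance(last, ast.Expr):
            stmts.append(ast.Return(value=stmts.pop().value))
        elif isinstance(last, ast.If):
            self._adjust(last)  # the if-body
            if last.orelse:
                self._adjust(last.orelse, None)  # the elif/else chain


def compile_rusty(source: str, name: str, namespace: dict):
    """Hypothetical helper: transform a function's source and return the new function."""
    tree = RLR().visit(ast.parse(textwrap.dedent(source)))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<rlr>", "exec"), namespace)
    return namespace[name]


sign = compile_rusty(
    """
    def sign(n):
        if n > 0:
            "positive"
        elif n < 0:
            "negative"
        else:
            "zero"
    """,
    "sign",
    {},
)
print(sign(5), sign(-2), sign(0))  # positive negative zero
```
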

  27. Poophole Optimizer: an extra bytecode optimizer for Python

  28. The Strategy For Hacking
      - Should run when the function is decorated
      - Should go through the bytecode and apply only the optimizations the user specified
      - Should set the optimized bytecode back on the function

  29. Optimize Function
      - A decorator that takes a set of options
      - Creates a `dis.Bytecode` from the function
      - Calls optimizers by checking the given options
      - Re-sets the bytecode
      - Returns the function
          @classmethod
          def optimize(cls, el):
              def wrapper(func):
                  buffer = Bytecode(func)
                  if el:
                      buffer = elem(buffer)
                  reset_bytecode(func, buffer)
                  return func
              return wrapper

  30. Optimizers 1 (Example: Eliminating Local Vars)
      - Go over the bytecode buffer
      - Keep a dict of variables whose value is a constant (like an int or a string)
      - Find unused variables
          def _elem_locals(self, buffer, function):
              constant_loaded = False
              stack, symbols = [], {}
              for instr in buffer:
                  <create a list of symbols>
              unuseds = [(unused[0], unused[1]) for unused in symbols.values() if unused[2] == 0]
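
Here is a simplified version of the "find unused variables" step. It reports local names that are stored into but never loaded, and it does not check whether the stored value is a constant, as the slide's optimizer does. Newer CPython releases fuse instructions (for example `STORE_FAST_LOAD_FAST`) and report a tuple of names. The sketch conservatively treats every name on a fused instruction containing `LOAD_FAST` as loaded, so it may miss an unused variable but should not flag a used one.

```python
import dis
from typing import Callable, List


def find_unused_locals(func: Callable) -> List[str]:
    """Names stored into but never loaded inside `func` (simplified analysis)."""
    stored, loaded = set(), set()
    for instr in dis.get_instructions(func):
        names = instr.argval if isinstance(instr.argval, tuple) else (instr.argval,)
        if "LOAD_FAST" in instr.opname:
            loaded.update(names)
        elif "STORE_FAST" in instr.opname:
            stored.update(names)
    return sorted(stored - loaded)


def sample():
    unused = 10
    used = 20
    return used


print(find_unused_locals(sample))  # ['unused']
```
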

  31. Optimizers 2 (Example: Eliminating Local Vars)
      - Remove the unused parts from the bytecode
      - Remove unnecessary constants
      - Remove unnecessary symbols
          unused_consts, unused_varnames = [], []
          offset = 0
          for value, unused in unuseds:
              <replace code>
              <remove consts>
              <remove names>

  32. Catlizor v1-extended: assign hooks to Python functions without mutating the functions

  33. The Strategy For Hacking
      - Should not mutate the function itself
      - Should notify before a function call
      - Should notify during a function call (result = notify(call(x)))
      - Should notify after a function call
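
The same goal (observe calls without touching the function) can be approximated in pure Python with `sys.setprofile`. Note this is a different technique from the C-level hook on the next slides. The profile hook receives a `"call"` event with the frame's arguments and a `"return"` event carrying the result. `watch` and its callbacks are hypothetical names. `sys.setprofile` only affects the current thread.

```python
import sys
from typing import Callable


def watch(target: Callable, on_call: Callable, on_return: Callable) -> Callable:
    """Report calls/returns of `target` without modifying it; returns an uninstaller."""
    code = target.__code__

    def profiler(frame, event, arg):
        if frame.f_code is not code:
            return
        if event == "call":
            on_call(dict(frame.f_locals))  # arguments, before the body runs
        elif event == "return":
            on_return(arg)  # the value being returned

    sys.setprofile(profiler)
    return lambda: sys.setprofile(None)


def add(a, b):
    return a + b


events = []
stop = watch(add, lambda args: events.append(("pre", args)), lambda result: events.append(("post", result)))
add(1, 2)
stop()
print(events)  # [('pre', {'a': 1, 'b': 2}), ('post', 3)]
```
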

  34. Hooking
      - Write onto the memory address of the default function-call function
      - Written by @dutc
          #pragma pack(push, 1)
          jumper = {
              .push = 0x50,          /* push rax */
              .mov  = {0x48, 0xb8},  /* mov rax, imm64 */
              .jmp  = {0xff, 0xe0}   /* jmp rax */
          };
          #pragma pack(pop)

          lpyhook(_PyFunction_FastCallKeywords, &hookify_PyFunction_FastCallKeywords);
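
To see what the jumper bytes amount to, this sketch packs them for a given target address. The slide does not show where the address goes. The sketch assumes an 8-byte little-endian immediate between the `mov` opcode and the `jmp`, which is where x86-64 `mov rax, imm64` expects it. `build_jumper` is a hypothetical name, and nothing here writes to memory.

```python
import struct


def build_jumper(target_address: int) -> bytes:
    """Pack push/mov/jmp trampoline bytes for a 64-bit target address."""
    push = b"\x50"                                # push rax
    mov = b"\x48\xb8"                             # mov rax, imm64 (REX.W + B8)
    address = struct.pack("<Q", target_address)   # little-endian 64-bit immediate
    jmp = b"\xff\xe0"                             # jmp rax
    return push + mov + address + jmp


code = build_jumper(0x00007F00DEADBEEF)
print(len(code), code.hex(" "))
```
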

  35. Modifying
      - Adding hooks for pre-call, on-call and post-call actions
      - Calling the Catlizor interface when these hooks are activated
          PyObject *
          hookify_PyFunction_FastCallKeywords(PyObject *func, PyObject *const *stack,
                                              Py_ssize_t nargs, PyObject *kwnames)
          {
              <code>
              <code>
          }

  36. Thanks @isidentical
