Changelog Changes made in this version not seen in fjrst lecture: 7 - PowerPoint PPT Presentation

secondary opcodes: OP q B j CC Dest 7 cc call Dest 8 0 ret 9 0 pushq rA A 0 rA F popq rA 0 rA F 6 V D D Dest Dest 0 add 1 sub 2 and 3 xor fn rA rB OP q rA, rB byte: 0 0 1 2 3 4 5 6 7 8 9 halt 0 nop 0 rA rB 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 4 0 rA rB mrmovq D(rB), rA 5 26

Registers: rA, rB 8 %rdx 2 %r9 9 %rcx 1 %r8 %rax %r10 0 Dest Dest D D V 0 rA F B A 3 0 rA F %r13 none F %rdi 7 %r14 E %rsi 6 D %rbx %rbp 5 %r12 C %rsp 4 %r11 B popq rA A byte: 9 rrmovq / cmovCC rA, rB 0 1 nop 0 0 halt 8 irmovq V, rB 7 6 5 4 3 2 1 0 2 cc rA rB 3 pushq rA fn rA rB 0 9 ret 0 8 call Dest 7 cc j CC Dest 6 0 OP q rA, rB 0 rA rB 5 mrmovq D(rB), rA 0 rA rB 4 rmmovq rA, D(rB) F rB 27

Immediates: V, D, Dest 9 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 0 mrmovq D(rB), rA pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 5 0 rA rB byte: halt 0 1 2 3 4 5 6 7 8 9 0 4 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 28

Y86-64 encoding (1) long addOne( long x) { return x + 1; } x86-64: movq %rdi, %rax addq $1, %rax ret Y86-64: irmovq $1, %rax addq %rdi, %rax ret 29

Y86-64 encoding (2) addOne: irmovq $1, %rax addq %rdi, %rax ret 3 0 F %rax 01 00 00 00 00 00 00 00 30 F0 01 00 00 00 00 00 00 00 60 70 90 30 ⋆

Y86-64 encoding (2) addOne: irmovq $1, %rax addq %rdi, %rax ret 3 0 F 0 01 00 00 00 00 00 00 00 30 F0 01 00 00 00 00 00 00 00 60 70 90 30 ⋆

Y86-64 encoding (2) addOne: 30 F0 01 00 00 00 00 00 00 00 60 70 90 %rax %rdi add 6 01 00 00 00 00 00 00 00 0 F 0 3 ret %rdi, %rax addq %rax $1, irmovq 30 ⋆

Y86-64 encoding (2) addOne: 30 F0 01 00 00 00 00 00 00 00 60 70 90 0 7 0 6 01 00 00 00 00 00 00 00 0 F 0 3 ret %rdi, %rax addq %rax $1, irmovq 30 ⋆

Y86-64 encoding (2) addOne: 30 F0 01 00 00 00 00 00 00 00 60 70 90 0 9 0 7 0 6 01 00 00 00 00 00 00 00 0 F 0 3 ret %rdi, %rax addq %rax $1, irmovq 30 ⋆

Y86-64 encoding (2) 0 30 F0 01 00 00 00 00 00 00 00 60 70 90 0 9 0 7 0 6 01 00 00 00 00 00 00 00 F addOne: 0 3 ret %rdi, %rax addq %rax $1, irmovq 30

Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 add %rax %rax 31

Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 add %rax %rax 31 ⋆

Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 31 ⋆

Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 7 ge 23 01 00 00 00 00 00 00 31 ⋆

Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 7 5 23 01 00 00 00 00 00 00 31 ⋆

Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 7 5 23 01 00 00 00 00 00 00 31

Y86-64 decoding ret mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 4 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 0 rA rB rmmovq rA, D(rB) 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 1 F rB 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 32

Y86-64 decoding 0 4 0 rA rB mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 ret F rB 9 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest rmmovq rA, D(rB) 0 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 byte: 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 3 0x68 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 32 ◮ 0 as cc: always ◮ 1 as reg: %rcx ◮ 0 as reg: %rax

Y86-64 decoding ret 0 rA rB mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 rmmovq rA, D(rB) 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 4 F rB 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 32 ◮ 0 as fn: add ◮ 1 as fn: sub

Y86-64 decoding 0 4 0 rA rB mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 ret F rB 9 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest rmmovq rA, D(rB) 0 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 3 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 32 ◮ 2 as cc: l (less than) ◮ hex 84 00 … as little endian Dest:

Y86-64 decoding ret mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 4 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 0 rA rB rmmovq rA, D(rB) 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 1 F rB 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 32

Y86-64: convenience for hardware 9 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 0 mrmovq D(rB), rA pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 5 0 rA rB 4 bits to decode instruction halt size/layout (mostly) uniform placement of operands (“uniform decode”) jumping to zeroes (uninitialized?) by accident halts no attempt to fjt (parts of) multiple instructions in a byte byte: 0 1 2 3 4 5 6 7 8 9 0 4 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 33

Y86-64 Y86-64: simplifjed, more RISC-y version of X86-64 minimal set of arithmetic only movs touch memory simple variable-length encoding later: implementing with circuits 34 only jumps, calls, and movs take immediates

extracting opcodes (1) A 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 9 0 pushq rA 0 rA F 0 rA rB popq rA B 0 rA F V D D Dest Dest typedef unsigned char byte ; int get_opcode ( byte * instr ) { return ???; } OP q rA, rB 5 byte: 0 0 1 2 3 4 5 6 7 8 9 halt 0 mrmovq D(rB), rA nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 4 0 rA rB 35

extracing opcodes (2) typedef unsigned char byte ; int get_opcode_and_function ( byte * instr ) { return instr [0]; } /* first byte = opcode * 16 + fn/cc code */ int get_opcode ( byte * instr ) { return instr [0] / 16; } 36

aside: division division is really slow Intel “Skylake” microarchitecture: …and much worse for eight-byte division but this case: it’s just extracting ‘top wires’ — simpler? 37 about six cycles per division versus: four additions per cycle

extracting opcode in hardware 0111 0010 = 0x72 (fjrst byte of jl) 2 38 0 0 1 0 0 0 0 0

exposing wire selection 1 0 0 0 0 0 0 0 0 ? ? ? ? 1 1 1 1 1 0 x86 instruction: shr — shift right 0 shr $ amount , %reg (or variable: shr %cl, %reg ) %reg (initial value) %reg (fjnal value) 0 0 0 0 0 0 … … … … 39 0 0 1 0

shift right x86 instruction: shr — shift right shr $ amount , %reg (or variable: shr %cl, %reg ) get_opcode: // intel syntax: movzx eax, byte ptr [rdi] movzbl (%rdi), %eax shrl $4, %eax ret 40 // eax ← byte at memory[rdi] with zero padding

right shift in C get_opcode: // %rdi -- instruction address // intel syntax: movzx eax, byte ptr [rdi] movzbl (%rdi), %eax shrl $4, %eax ret typedef unsigned char byte ; int get_opcode ( byte * instr ) { return instr [0] >> 4; } 41 // eax ← one byte of memory[rdi] with zero padding

right shift in C typedef unsigned char byte ; int get_opcode1 ( byte * instr ) { return instr [0] >> 4; } int get_opcode2 ( byte * instr ) { return instr [0] / 16; } example output from optimizing compiler: get_opcode1: movzbl (%rdi), %eax shrl $4, %eax ret get_opcode2: movb (%rdi), %al shrb $4, %al movzbl %al, %eax ret 42

right shift in math 1 >> 0 == 1 0000 0001 1 >> 1 == 0 0000 0000 1 >> 2 == 0 0000 0000 10 >> 0 == 10 0000 1010 10 >> 1 == 5 0000 0101 10 >> 2 == 2 0000 0010 43 � x × 2 − y � x >> y =

constructing instructions typedef unsigned char byte ; byte make_simple_opcode ( byte icode ) { // function code is fixed as 0 for now return opcode * 16; } 44

constructing instructions in hardware icode 0 0 0 0 opcode 45

shift left 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 opcode >> (-4) shr $-4, %reg 46 ✭✭✭✭✭✭✭✭✭✭✭✭ ❤❤❤❤❤❤❤❤❤❤❤❤ instead: shl $4, %reg (“ sh ift l eft”) ❤❤❤❤❤❤❤❤❤❤❤❤ ✭ ✭✭✭✭✭✭✭✭✭✭✭✭ ❤ instead: opcode << 4 1 0 1 1 0

shift left 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 x86 instruction: shl — shift left 0 … … … … 1 1 0 0 0 %reg (fjnal value) %reg (initial value) shl $ amount , %reg (or variable: shr %cl, %reg ) 47 1 0 1 1 0

left shift in math 1 << 0 == 1 0000 0001 1 << 1 == 2 0000 0010 1 << 2 == 4 0000 0100 10 << 0 == 10 0000 1010 10 << 1 == 20 0001 0100 10 << 2 == 40 0010 1000 << 48

left shift in math 1 << 0 == 1 0000 0001 1 << 1 == 2 0000 0010 1 << 2 == 4 0000 0100 10 << 0 == 10 0000 1010 10 << 1 == 20 0001 0100 10 << 2 == 40 0010 1000 48 x << y = x × 2 y

extracting icode from more rB } return ( value % 256) / 16; unsigned extract_opcode2 (unsigned value ) { } return ( value / 16) % 16; unsigned extract_opcode1 (unsigned value ) { // % -- remainder rA ifun 1 icode 0 0 0 1 1 1 1 49 0 0 1 0 0 0 0 0

manipulating bits? easy to manipulate individual bits in HW how do we expose that to software? 50

interlude: a truth table AND 0 1 0 0 0 1 0 1 AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove 51

bitwise AND — & 0 … 1 0 1 0 & … 1 0 1 1 … 0 0 1 0 0 0 Treat value as array of bits 0 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2 … 0 1 0 0 & … 0 1 0 0 … 52

bitwise AND — C/assembly x86: and %reg, %reg 53 C: foo & bar

bitwise hardware ( 10 & 7 == 2 ) 1 0 0 1 0 0 1 1 1 10 0 1 0 . . . 7 54

extract opcode from larger unsigned extract_opcode1_bitwise (unsigned value ) { return ( value >> 4) & 0xF; // 0xF: 00001111 // like (value / 16) % 16 } unsigned extract_opcode2_bitwise (unsigned value ) { return ( value & 0xF0) >> 4; // 0xF0: 11110000 // like (value % 256) / 16; } 55

extract opcode from larger extract_opcode1_bitwise: movl %edi, %eax shrl $4, %eax andl $0xF, %eax ret extract_opcode2_bitwise: movl %edi, %eax andl $0xF0, %eax shrl $4, %eax ret 56

more truth tables 0 1 0 0 1 1 1 & XOR conditionally clear bit conditionally keep bit | conditionally set bit ^ conditionally fmip bit 0 1 AND 0 0 1 0 0 0 1 1 1 OR 0 1 0 0 1 1 57

bitwise OR — | 0 … 1 0 1 0 | … 1 1 1 1 … 1 1 1 1 0 1 1 | 1 == 1 1 1 | 0 == 1 0 | 0 == 0 2 | 4 == 6 10 | 7 == 15 … 0 0 0 0 | … 0 1 0 0 … 58

Changelog Changes made in this version not seen in fjrst lecture: 7 - PowerPoint PPT Presentation

Changelog Changes made in this version not seen in fjrst lecture: 7 September 2017: slide 37: correct text about division speed: four-byte division is weirdly not much slower than 1-byte division on Skylake (but 64-bit division is much slower)

FAT cont / HDDs/ SSDs / inodes 1 Changelog Changes made in this version not seen in fjrst

Scheduling 1 1 Changelog Changes made in this version not seen in fjrst lecture: 10 September:

Optimization part 1 1 Changelog Changes made in this version not seen in fjrst lecture: 29 Feb

Threads / Synchronization 1 1 Changelog Changes made in this version not seen in fjrst lecture:

RPC (fjnish) / two-phase commit 1 Changelog Changes made in this version not seen in fjrst

Changelog Changes made in this version not seen in fjrst lecture: 15 January: logistics

thread creation / POSIX / process management 1 Changelog 21 January 2020 (between 12:30pm and

injection vulnerabilities 1 Changelog Corrections made in this version not in fjrst posting: 17

self-replicating malware 1 Changelog Corrections made in this version not in fjrst posting: 1

Viruses cont 1 Changelog Corrections made in this version not in fjrst posting: 6 Feb 2017:

1 Changelog Changes made in this version not seen in fjrst lecture: 25 September: add back

Changelog Changes made in this version not seen in fjrst lecture: sz is one past the end of the

Changelog Changes made in this version not seen in fjrst lecture: 30 August: juggling stacks: add

Changelog Changes made in this version not seen in fjrst lecture: 6 September: fjx stray @s on

Changelog Changes made in this version not seen in fjrst lecture: 18 Feb 2019: counting to binary

capabilities / virtual machines 1 Changelog Changes not seen in fjrst lecture: 23 April 2020:

B.d) AES W. Schindler: Cryptography, B-IT, winter 2006 / 2007 2 B.96 AES (Advanced Encryption

Database Systems IIB: DBMS-Implementation Chapter 8: Row Format Prof. Dr. Stefan Brass

Algorithms and Architecture 1 Representing and Manipulating Numbers Alexandre David 1 Outline

PROGRAMMAZIONE PROCEDURALE A.A. 2020/2021 LITERALS LITERALS A literal is a token that denotes a

Rotational Cryptanalysis in the Presence of Constants Tomer Ashur Yunwen Liu ESAT/COSIC, KU

Beginning C Programming for Engineers Lecture 6: Bit Operations R. Lindsay Todd Bit Operations

Numbers Genome 373 Genomic Informatics Elhanan Borenstein Numbers Python defines various

Fundamentals of Programming Lecture 2 Hamed Rasifard Outline Numerical Systems