changelog
play

Changelog Changes made in this version not seen in fjrst lecture: 7 - PowerPoint PPT Presentation

Changelog Changes made in this version not seen in fjrst lecture: 7 September 2017: slide 37: correct text about division speed: four-byte division is weirdly not much slower than 1-byte division on Skylake (but 64-bit division is much slower)


  1. secondary opcodes: OP q B j CC Dest 7 cc call Dest 8 0 ret 9 0 pushq rA A 0 rA F popq rA 0 rA F 6 V D D Dest Dest 0 add 1 sub 2 and 3 xor fn rA rB OP q rA, rB byte: 0 0 1 2 3 4 5 6 7 8 9 halt 0 nop 0 rA rB 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 4 0 rA rB mrmovq D(rB), rA 5 26

  2. Registers: rA, rB 8 %rdx 2 %r9 9 %rcx 1 %r8 %rax %r10 0 Dest Dest D D V 0 rA F B A 3 0 rA F %r13 none F %rdi 7 %r14 E %rsi 6 D %rbx %rbp 5 %r12 C %rsp 4 %r11 B popq rA A byte: 9 rrmovq / cmovCC rA, rB 0 1 nop 0 0 halt 8 irmovq V, rB 7 6 5 4 3 2 1 0 2 cc rA rB 3 pushq rA fn rA rB 0 9 ret 0 8 call Dest 7 cc j CC Dest 6 0 OP q rA, rB 0 rA rB 5 mrmovq D(rB), rA 0 rA rB 4 rmmovq rA, D(rB) F rB 27

  3. Registers: rA, rB 8 %rdx 2 %r9 9 %rcx 1 %r8 %rax %r10 0 Dest Dest D D V 0 rA F B A 3 0 rA F %r13 none F %rdi 7 %r14 E %rsi 6 D %rbx %rbp 5 %r12 C %rsp 4 %r11 B popq rA A byte: 9 rrmovq / cmovCC rA, rB 0 1 nop 0 0 halt 8 irmovq V, rB 7 6 5 4 3 2 1 0 2 cc rA rB 3 pushq rA fn rA rB 0 9 ret 0 8 call Dest 7 cc j CC Dest 6 0 OP q rA, rB 0 rA rB 5 mrmovq D(rB), rA 0 rA rB 4 rmmovq rA, D(rB) F rB 27

  4. Immediates: V, D, Dest 9 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 0 mrmovq D(rB), rA pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 5 0 rA rB byte: halt 0 1 2 3 4 5 6 7 8 9 0 4 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 28

  5. Immediates: V, D, Dest 9 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 0 mrmovq D(rB), rA pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 5 0 rA rB byte: halt 0 1 2 3 4 5 6 7 8 9 0 4 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 28

  6. Y86-64 encoding (1) long addOne( long x) { return x + 1; } x86-64: movq %rdi, %rax addq $1, %rax ret Y86-64: irmovq $1, %rax addq %rdi, %rax ret 29

  7. Y86-64 encoding (1) long addOne( long x) { return x + 1; } x86-64: movq %rdi, %rax addq $1, %rax ret Y86-64: irmovq $1, %rax addq %rdi, %rax ret 29

  8. Y86-64 encoding (2) addOne: irmovq $1, %rax addq %rdi, %rax ret 3 0 F %rax 01 00 00 00 00 00 00 00 30 F0 01 00 00 00 00 00 00 00 60 70 90 30 ⋆

  9. Y86-64 encoding (2) addOne: irmovq $1, %rax addq %rdi, %rax ret 3 0 F 0 01 00 00 00 00 00 00 00 30 F0 01 00 00 00 00 00 00 00 60 70 90 30 ⋆

  10. Y86-64 encoding (2) addOne: 30 F0 01 00 00 00 00 00 00 00 60 70 90 %rax %rdi add 6 01 00 00 00 00 00 00 00 0 F 0 3 ret %rdi, %rax addq %rax $1, irmovq 30 ⋆

  11. Y86-64 encoding (2) addOne: 30 F0 01 00 00 00 00 00 00 00 60 70 90 0 7 0 6 01 00 00 00 00 00 00 00 0 F 0 3 ret %rdi, %rax addq %rax $1, irmovq 30 ⋆

  12. Y86-64 encoding (2) addOne: 30 F0 01 00 00 00 00 00 00 00 60 70 90 0 9 0 7 0 6 01 00 00 00 00 00 00 00 0 F 0 3 ret %rdi, %rax addq %rax $1, irmovq 30 ⋆

  13. Y86-64 encoding (2) 0 30 F0 01 00 00 00 00 00 00 00 60 70 90 0 9 0 7 0 6 01 00 00 00 00 00 00 00 F addOne: 0 3 ret %rdi, %rax addq %rax $1, irmovq 30

  14. Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 add %rax %rax 31

  15. Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 add %rax %rax 31 ⋆

  16. Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 31 ⋆

  17. Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 7 ge 23 01 00 00 00 00 00 00 31 ⋆

  18. Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 7 5 23 01 00 00 00 00 00 00 31 ⋆

  19. Y86-64 encoding (3) doubleTillNegative: /* suppose at address 0x123 */ addq %rax, %rax jge doubleTillNegative 6 0 0 0 7 5 23 01 00 00 00 00 00 00 31

  20. Y86-64 decoding ret mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 4 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 0 rA rB rmmovq rA, D(rB) 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 1 F rB 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 32

  21. Y86-64 decoding ret mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 4 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 0 rA rB rmmovq rA, D(rB) 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 1 F rB 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 32

  22. Y86-64 decoding 0 4 0 rA rB mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 ret F rB 9 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest rmmovq rA, D(rB) 0 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 byte: 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 3 0x68 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 32 ◮ 0 as cc: always ◮ 1 as reg: %rcx ◮ 0 as reg: %rax

  23. Y86-64 decoding ret 0 rA rB mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 rmmovq rA, D(rB) 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 4 F rB 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 32 ◮ 0 as fn: add ◮ 1 as fn: sub

  24. Y86-64 decoding 0 4 0 rA rB mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 ret F rB 9 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest rmmovq rA, D(rB) 0 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 3 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 32 ◮ 2 as cc: l (less than) ◮ hex 84 00 … as little endian Dest:

  25. Y86-64 decoding ret mrmovq D(rB), rA 5 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 9 4 0 pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 0 rA rB rmmovq rA, D(rB) 20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 0 20 12 20 01 70 68 00 00 00 00 00 00 00 rrmovq %rcx, %rax addq %rdx, %rax subq %rbx, %rdi jl 0x84 rrmovq %rax, %rcx jmp 0x68 byte: 1 F rB 2 3 4 5 6 7 8 9 halt 0 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 32

  26. Y86-64: convenience for hardware 9 0 rA rB OP q rA, rB 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 0 mrmovq D(rB), rA pushq rA A 0 rA F popq rA B 0 rA F V D D Dest Dest 5 0 rA rB 4 bits to decode instruction halt size/layout (mostly) uniform placement of operands (“uniform decode”) jumping to zeroes (uninitialized?) by accident halts no attempt to fjt (parts of) multiple instructions in a byte byte: 0 1 2 3 4 5 6 7 8 9 0 4 0 nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 33

  27. Y86-64 Y86-64: simplifjed, more RISC-y version of X86-64 minimal set of arithmetic only movs touch memory simple variable-length encoding later: implementing with circuits 34 only jumps, calls, and movs take immediates

  28. extracting opcodes (1) A 6 fn rA rB j CC Dest 7 cc call Dest 8 0 ret 9 0 pushq rA 0 rA F 0 rA rB popq rA B 0 rA F V D D Dest Dest typedef unsigned char byte ; int get_opcode ( byte * instr ) { return ???; } OP q rA, rB 5 byte: 0 0 1 2 3 4 5 6 7 8 9 halt 0 mrmovq D(rB), rA nop 1 0 rrmovq / cmovCC rA, rB 2 cc rA rB irmovq V, rB 3 0 F rB rmmovq rA, D(rB) 4 0 rA rB 35

  29. extracing opcodes (2) typedef unsigned char byte ; int get_opcode_and_function ( byte * instr ) { return instr [0]; } /* first byte = opcode * 16 + fn/cc code */ int get_opcode ( byte * instr ) { return instr [0] / 16; } 36

  30. aside: division division is really slow Intel “Skylake” microarchitecture: …and much worse for eight-byte division but this case: it’s just extracting ‘top wires’ — simpler? 37 about six cycles per division versus: four additions per cycle

  31. aside: division division is really slow Intel “Skylake” microarchitecture: …and much worse for eight-byte division but this case: it’s just extracting ‘top wires’ — simpler? 37 about six cycles per division versus: four additions per cycle

  32. extracting opcode in hardware 0111 0010 = 0x72 (fjrst byte of jl) 2 38 0 0 1 0 0 0 0 0

  33. exposing wire selection 1 0 0 0 0 0 0 0 0 ? ? ? ? 1 1 1 1 1 0 x86 instruction: shr — shift right 0 shr $ amount , %reg (or variable: shr %cl, %reg ) %reg (initial value) %reg (fjnal value) 0 0 0 0 0 0 … … … … 39 0 0 1 0

  34. exposing wire selection 1 0 0 0 0 0 0 0 0 ? ? ? ? 1 1 1 1 1 0 x86 instruction: shr — shift right 0 shr $ amount , %reg (or variable: shr %cl, %reg ) %reg (initial value) %reg (fjnal value) 0 0 0 0 0 0 … … … … 39 0 0 1 0

  35. exposing wire selection 1 0 0 0 0 0 0 0 0 ? ? ? ? 1 1 1 1 1 0 x86 instruction: shr — shift right 0 shr $ amount , %reg (or variable: shr %cl, %reg ) %reg (initial value) %reg (fjnal value) 0 0 0 0 0 0 … … … … 39 0 0 1 0

  36. shift right x86 instruction: shr — shift right shr $ amount , %reg (or variable: shr %cl, %reg ) get_opcode: // intel syntax: movzx eax, byte ptr [rdi] movzbl (%rdi), %eax shrl $4, %eax ret 40 // eax ← byte at memory[rdi] with zero padding

  37. shift right x86 instruction: shr — shift right shr $ amount , %reg (or variable: shr %cl, %reg ) get_opcode: // intel syntax: movzx eax, byte ptr [rdi] movzbl (%rdi), %eax shrl $4, %eax ret 40 // eax ← byte at memory[rdi] with zero padding

  38. right shift in C get_opcode: // %rdi -- instruction address // intel syntax: movzx eax, byte ptr [rdi] movzbl (%rdi), %eax shrl $4, %eax ret typedef unsigned char byte ; int get_opcode ( byte * instr ) { return instr [0] >> 4; } 41 // eax ← one byte of memory[rdi] with zero padding

  39. right shift in C typedef unsigned char byte ; int get_opcode1 ( byte * instr ) { return instr [0] >> 4; } int get_opcode2 ( byte * instr ) { return instr [0] / 16; } example output from optimizing compiler: get_opcode1: movzbl (%rdi), %eax shrl $4, %eax ret get_opcode2: movb (%rdi), %al shrb $4, %al movzbl %al, %eax ret 42

  40. right shift in C typedef unsigned char byte ; int get_opcode1 ( byte * instr ) { return instr [0] >> 4; } int get_opcode2 ( byte * instr ) { return instr [0] / 16; } example output from optimizing compiler: get_opcode1: movzbl (%rdi), %eax shrl $4, %eax ret get_opcode2: movb (%rdi), %al shrb $4, %al movzbl %al, %eax ret 42

  41. right shift in math 1 >> 0 == 1 0000 0001 1 >> 1 == 0 0000 0000 1 >> 2 == 0 0000 0000 10 >> 0 == 10 0000 1010 10 >> 1 == 5 0000 0101 10 >> 2 == 2 0000 0010 43 � x × 2 − y � x >> y =

  42. constructing instructions typedef unsigned char byte ; byte make_simple_opcode ( byte icode ) { // function code is fixed as 0 for now return opcode * 16; } 44

  43. constructing instructions in hardware icode 0 0 0 0 opcode 45

  44. shift left 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 opcode >> (-4) shr $-4, %reg 46 ✭✭✭✭✭✭✭✭✭✭✭✭ ❤❤❤❤❤❤❤❤❤❤❤❤ instead: shl $4, %reg (“ sh ift l eft”) ❤❤❤❤❤❤❤❤❤❤❤❤ ✭ ✭✭✭✭✭✭✭✭✭✭✭✭ ❤ instead: opcode << 4 1 0 1 1 0

  45. shift left 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 opcode >> (-4) shr $-4, %reg 46 ✭✭✭✭✭✭✭✭✭✭✭✭ ❤❤❤❤❤❤❤❤❤❤❤❤ instead: shl $4, %reg (“ sh ift l eft”) ❤❤❤❤❤❤❤❤❤❤❤❤ ✭ ✭✭✭✭✭✭✭✭✭✭✭✭ ❤ instead: opcode << 4 1 0 1 1 0

  46. shift left 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 x86 instruction: shl — shift left 0 … … … … 1 1 0 0 0 %reg (fjnal value) %reg (initial value) shl $ amount , %reg (or variable: shr %cl, %reg ) 47 1 0 1 1 0

  47. shift left 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 x86 instruction: shl — shift left 0 … … … … 1 1 0 0 0 %reg (fjnal value) %reg (initial value) shl $ amount , %reg (or variable: shr %cl, %reg ) 47 1 0 1 1 0

  48. left shift in math 1 << 0 == 1 0000 0001 1 << 1 == 2 0000 0010 1 << 2 == 4 0000 0100 10 << 0 == 10 0000 1010 10 << 1 == 20 0001 0100 10 << 2 == 40 0010 1000 << 48

  49. left shift in math 1 << 0 == 1 0000 0001 1 << 1 == 2 0000 0010 1 << 2 == 4 0000 0100 10 << 0 == 10 0000 1010 10 << 1 == 20 0001 0100 10 << 2 == 40 0010 1000 48 x << y = x × 2 y

  50. extracting icode from more rB } return ( value % 256) / 16; unsigned extract_opcode2 (unsigned value ) { } return ( value / 16) % 16; unsigned extract_opcode1 (unsigned value ) { // % -- remainder rA ifun 1 icode 0 0 0 1 1 1 1 49 0 0 1 0 0 0 0 0

  51. extracting icode from more rB } return ( value % 256) / 16; unsigned extract_opcode2 (unsigned value ) { } return ( value / 16) % 16; unsigned extract_opcode1 (unsigned value ) { // % -- remainder rA ifun 1 icode 0 0 0 1 1 1 1 49 0 0 1 0 0 0 0 0

  52. manipulating bits? easy to manipulate individual bits in HW how do we expose that to software? 50

  53. interlude: a truth table AND 0 1 0 0 0 1 0 1 AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove 51

  54. interlude: a truth table AND 0 1 0 0 0 1 0 1 AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove 51

  55. interlude: a truth table AND 0 1 0 0 0 1 0 1 AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove 51

  56. interlude: a truth table AND 0 1 0 0 0 1 0 1 AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove 51

  57. bitwise AND — & 0 … 1 0 1 0 & … 1 0 1 1 … 0 0 1 0 0 0 Treat value as array of bits 0 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2 … 0 1 0 0 & … 0 1 0 0 … 52

  58. bitwise AND — & 0 … 1 0 1 0 & … 1 0 1 1 … 0 0 1 0 0 0 Treat value as array of bits 0 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2 … 0 1 0 0 & … 0 1 0 0 … 52

  59. bitwise AND — & 0 … 1 0 1 0 & … 1 0 1 1 … 0 0 1 0 0 0 Treat value as array of bits 0 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2 … 0 1 0 0 & … 0 1 0 0 … 52

  60. bitwise AND — C/assembly x86: and %reg, %reg 53 C: foo & bar

  61. bitwise hardware ( 10 & 7 == 2 ) 1 0 0 1 0 0 1 1 1 10 0 1 0 . . . 7 54

  62. extract opcode from larger unsigned extract_opcode1_bitwise (unsigned value ) { return ( value >> 4) & 0xF; // 0xF: 00001111 // like (value / 16) % 16 } unsigned extract_opcode2_bitwise (unsigned value ) { return ( value & 0xF0) >> 4; // 0xF0: 11110000 // like (value % 256) / 16; } 55

  63. extract opcode from larger extract_opcode1_bitwise: movl %edi, %eax shrl $4, %eax andl $0xF, %eax ret extract_opcode2_bitwise: movl %edi, %eax andl $0xF0, %eax shrl $4, %eax ret 56

  64. more truth tables 0 1 0 0 1 1 1 & XOR conditionally clear bit conditionally keep bit | conditionally set bit ^ conditionally fmip bit 0 1 AND 0 0 1 0 0 0 1 1 1 OR 0 1 0 0 1 1 57

  65. bitwise OR — | 0 … 1 0 1 0 | … 1 1 1 1 … 1 1 1 1 0 1 1 | 1 == 1 1 1 | 0 == 1 0 | 0 == 0 2 | 4 == 6 10 | 7 == 15 … 0 0 0 0 | … 0 1 0 0 … 58

  66. bitwise OR — | 0 … 1 0 1 0 | … 1 1 1 1 … 1 1 1 1 0 1 1 | 1 == 1 1 1 | 0 == 1 0 | 0 == 0 2 | 4 == 6 10 | 7 == 15 … 0 0 0 0 | … 0 1 0 0 … 58

  67. bitwise OR — | 0 … 1 0 1 0 | … 1 1 1 1 … 1 1 1 1 0 1 1 | 1 == 1 1 1 | 0 == 1 0 | 0 == 0 2 | 4 == 6 10 | 7 == 15 … 0 0 0 0 | … 0 1 0 0 … 58

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend