1 Changelog Changes made in this version not seen in fjrst lecture: - - PowerPoint PPT Presentation

1 changelog
SMART_READER_LITE
LIVE PREVIEW

1 Changelog Changes made in this version not seen in fjrst lecture: - - PowerPoint PPT Presentation

1 Changelog Changes made in this version not seen in fjrst lecture: 12 September 2017: slide 28, 33: quote solution that uses z correctly 1 last time Y86 choices, encoding and decoding shift operators right shift = dividing by power of


slide-1
SLIDE 1

1

slide-2
SLIDE 2

Changelog

Changes made in this version not seen in fjrst lecture:

12 September 2017: slide 28, 33: quote solution that uses z correctly

1

slide-3
SLIDE 3

last time

Y86 — choices, encoding and decoding shift operators

shr assembly, >> in C right shift = towards least signifjcant bit right shift = dividing by power of two

2

slide-4
SLIDE 4
  • n the quizzes in general

yes, I know quizzes are hard intention: quiz questions from slides/etc. + some serious thought

(and sometimes I miss the mark)

main purpose: review material other than before exams, warning sign for me

why graded? because otherwise…

3

slide-5
SLIDE 5
  • n the quiz (1)

RISC versus CISC: about simplifying hardware variable-length encoding is less simple for HW instructions chosen more based on what’s simple for HW

(e.g. push/pop not simple for HW)

more registers — simpler than adding more instructions

compensates for seperate memory instructions

4

slide-6
SLIDE 6
  • n the quiz (2)

instruction set — what the instructions do size of memory address — operands to rmmovq, etc. mean what? fmoating point support — do such instructions exist?

5

slide-7
SLIDE 7

constructing instructions

typedef unsigned char byte; byte make_simple_opcode(byte icode) { // function code is fixed as 0 for now return opcode * 16; // 16 = 1 0000 in binary }

6

slide-8
SLIDE 8

constructing instructions in hardware

icode 0 0 0 0

  • pcode

7

slide-9
SLIDE 9

reversed shift right?

✭✭✭✭✭✭✭✭✭✭✭✭ ❤❤❤❤❤❤❤❤❤❤❤❤

shr $-4, %reg

✭✭✭✭✭✭✭✭✭✭✭✭ ✭ ❤❤❤❤❤❤❤❤❤❤❤❤ ❤

  • pcode >> (-4)

8

slide-10
SLIDE 10

shift left

x86 instruction: shl — shift left C: value << amount shl $amount, %reg (or variable: shr %cl, %reg)

%reg (initial value) %reg (fjnal value) 1 0 1 1 0 1 1 … … … … 1 1

9

slide-11
SLIDE 11

shift left

x86 instruction: shl — shift left C: value << amount shl $amount, %reg (or variable: shr %cl, %reg)

%reg (initial value) %reg (fjnal value) 1 0 1 1 0 1 1 … … … … 1 1

9

slide-12
SLIDE 12

left shift in math

1 << 0 == 1 0000 0001 1 << 1 == 2 0000 0010 1 << 2 == 4 0000 0100 10 << 0 == 10 0000 1010 10 << 1 == 20 0001 0100 10 << 2 == 40 0010 1000

<<

10

slide-13
SLIDE 13

left shift in math

1 << 0 == 1 0000 0001 1 << 1 == 2 0000 0010 1 << 2 == 4 0000 0100 10 << 0 == 10 0000 1010 10 << 1 == 20 0001 0100 10 << 2 == 40 0010 1000

x << y = x × 2y

10

slide-14
SLIDE 14

extracting icode from more

1 1 1 1 1 0 0 1 0 0 0 0 0 icode ifun rB rA // % -- remainder unsigned extract_opcode1(unsigned value) { return (value / 16) % 16; } unsigned extract_opcode2(unsigned value) { return (value % 256) / 16; }

11

slide-15
SLIDE 15

extracting icode from more

1 1 1 1 1 0 0 1 0 0 0 0 0 icode ifun rB rA // % -- remainder unsigned extract_opcode1(unsigned value) { return (value / 16) % 16; } unsigned extract_opcode2(unsigned value) { return (value % 256) / 16; }

11

slide-16
SLIDE 16

manipulating bits?

easy to manipulate individual bits in HW how do we expose that to software?

12

slide-17
SLIDE 17

interlude: a truth table

AND 1 1 1

AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove

13

slide-18
SLIDE 18

interlude: a truth table

AND 1 1 1

AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove

13

slide-19
SLIDE 19

interlude: a truth table

AND 1 1 1

AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove

13

slide-20
SLIDE 20

interlude: a truth table

AND 1 1 1

AND with 1: keep a bit the same AND with 0: clear a bit method: construct “mask” of what to keep/remove

13

slide-21
SLIDE 21

bitwise AND — &

Treat value as array of bits 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2

… 1 & … 1 … … 1 1 & … 1 1 1 … 1

14

slide-22
SLIDE 22

bitwise AND — &

Treat value as array of bits 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2

… 1 & … 1 … … 1 1 & … 1 1 1 … 1

14

slide-23
SLIDE 23

bitwise AND — &

Treat value as array of bits 1 & 1 == 1 1 & 0 == 0 0 & 0 == 0 2 & 4 == 0 10 & 7 == 2

… 1 & … 1 … … 1 1 & … 1 1 1 … 1

14

slide-24
SLIDE 24

bitwise AND — C/assembly

x86: and %reg, %reg C: foo & bar

15

slide-25
SLIDE 25

bitwise hardware (10 & 7 == 2)

10 7 . . .

1 1 1 1 1 1

16

slide-26
SLIDE 26

extract opcode from larger

unsigned extract_opcode1_bitwise(unsigned value) { return (value >> 4) & 0xF; // 0xF: 00001111 // like (value / 16) % 16 } unsigned extract_opcode2_bitwise(unsigned value) { return (value & 0xF0) >> 4; // 0xF0: 11110000 // like (value % 256) / 16; }

17

slide-27
SLIDE 27

extract opcode from larger

extract_opcode1_bitwise: movl %edi, %eax shrl $4, %eax andl $0xF, %eax ret extract_opcode2_bitwise: movl %edi, %eax andl $0xF0, %eax shrl $4, %eax ret

18

slide-28
SLIDE 28

more truth tables

AND 1 1 1 OR 1 1 1 1 1 XOR 1 1 1 1 & conditionally clear bit conditionally keep bit | conditionally set bit ^ conditionally fmip bit

19

slide-29
SLIDE 29

bitwise OR — |

1 | 1 == 1 1 | 0 == 1 0 | 0 == 0 2 | 4 == 6 10 | 7 == 15

… 1 1 | … 1 1 1 … 1 1 1 1

20

slide-30
SLIDE 30

bitwise xor — ̂

1 ^ 1 == 0 1 ^ 0 == 1 0 ^ 0 == 0 2 ^ 4 == 6 10 ^ 7 == 13

… 1 1 ^ … 1 1 1 … 1 1 1

21

slide-31
SLIDE 31

negation / not — ~

~ (‘complement’) is bitwise version of !:

!0 == 1 !notZero == 0 ~0 == (int) 0xFFFFFFFF (aka −1) ~2 == (int) 0xFFFFFFFD (aka

3)

~((unsigned) 2) == 0xFFFFFFFD

~ … 1 1 … 1 1 1 1 32 bits

22

slide-32
SLIDE 32

negation / not — ~

~ (‘complement’) is bitwise version of !:

!0 == 1 !notZero == 0 ~0 == (int) 0xFFFFFFFF (aka −1) ~2 == (int) 0xFFFFFFFD (aka −3) ~((unsigned) 2) == 0xFFFFFFFD

~ … 1 1 … 1 1 1 1 32 bits

22

slide-33
SLIDE 33

negation / not — ~

~ (‘complement’) is bitwise version of !:

!0 == 1 !notZero == 0 ~0 == (int) 0xFFFFFFFF (aka −1) ~2 == (int) 0xFFFFFFFD (aka −3) ~((unsigned) 2) == 0xFFFFFFFD

~ … 1 1 … 1 1 1 1 32 bits

22

slide-34
SLIDE 34

note: ternary operator

w = (x ? y : z) if (x) { w = y; } else { w = z; }

23

slide-35
SLIDE 35
  • ne-bit ternary

(x ? y : z) constraint: everything is 0 or 1 now: reimplement in C without if/else/||/etc.

(assembly: no jumps probably)

divide-and-conquer:

(x ? y : 0) (x ? 0 : z)

24

slide-36
SLIDE 36
  • ne-bit ternary

(x ? y : z) constraint: everything is 0 or 1 now: reimplement in C without if/else/||/etc.

(assembly: no jumps probably)

divide-and-conquer:

(x ? y : 0) (x ? 0 : z)

24

slide-37
SLIDE 37
  • ne-bit ternary parts (1)

constraint: everything is 0 or 1 (x ? y : 0) y=0 y=1 x=0 0 x=1 1 (x & y)

25

slide-38
SLIDE 38
  • ne-bit ternary parts (1)

constraint: everything is 0 or 1 (x ? y : 0) y=0 y=1 x=0 x=1 1 → (x & y)

25

slide-39
SLIDE 39
  • ne-bit ternary parts (2)

(x ? y : 0) = (x & y) (x ? 0 : z)

  • pposite x: ~x

((~x) & z)

26

slide-40
SLIDE 40
  • ne-bit ternary parts (2)

(x ? y : 0) = (x & y) (x ? 0 : z)

  • pposite x: ~x

((~x) & z)

26

slide-41
SLIDE 41
  • ne-bit ternary

constraint: everything is 0 or 1 — but y, z is any integer (x ? y : z) (x ? y : 0) | (x ? 0 : z) (x & y) | ((~x) & z)

27

slide-42
SLIDE 42

multibit ternary

constraint: x is 0 or 1

  • ld solution ((x & y) | (~x) & 1)
  • nly gets least sig. bit

(x ? y : z) (x ? y : 0) | (x ? 0 : z) (( x) & y) | (( (x ^ 1)) & z)

28

slide-43
SLIDE 43

multibit ternary

constraint: x is 0 or 1

  • ld solution ((x & y) | (~x) & 1)
  • nly gets least sig. bit

(x ? y : z) (x ? y : 0) | (x ? 0 : z) (( x) & y) | (( (x ^ 1)) & z)

28

slide-44
SLIDE 44

constructing masks

constraint: x is 0 or 1 (x ? y : 0) if x = 1: want 1111111111…1 (keep y) if x = 0: want 0000000000…0 (want 0)

  • ne idea: x | (x << 1) | (x << 2) | ...

a trick: x ((-x) & y)

29

slide-45
SLIDE 45

constructing masks

constraint: x is 0 or 1 (x ? y : 0) if x = 1: want 1111111111…1 (keep y) if x = 0: want 0000000000…0 (want 0)

  • ne idea: x | (x << 1) | (x << 2) | ...

a trick: −x ((-x) & y)

29

slide-46
SLIDE 46

two’s complement refresher

1

−231

1

+230

1

+229

… 1

+22

1

+21

1

+20

−1 =

0111 1111… 1111 1000 0000… 0000 1111 1111… 1111

30

slide-47
SLIDE 47

two’s complement refresher

1

−231

1

+230

1

+229

… 1

+22

1

+21

1

+20

−1 =

−1 1 231 − 1 −231 −231 + 1

0111 1111… 1111 1000 0000… 0000 1111 1111… 1111

30

slide-48
SLIDE 48

two’s complement refresher

1

−231

1

+230

1

+229

… 1

+22

1

+21

1

+20

−1 =

−1 1 231 − 1 −231 −231 + 1

0111 1111… 1111 1000 0000… 0000 1111 1111… 1111

30

slide-49
SLIDE 49

constructing masks

constraint: x is 0 or 1 (x ? y : 0) if x = 1: want 1111111111…1 (keep y) if x = 0: want 0000000000…0 (want 0)

  • ne idea: x | (x << 1) | (x << 2) | ...

a trick: −x ((-x) & y)

31

slide-50
SLIDE 50

constructing other masks

constraint: x is 0 or 1 (x ? 0 : z) if x = ✓

✓ ❙ ❙

1 0: want 1111111111…1 if x = ✁

✁ ❆ ❆

0 1: want 0000000000…0 mask: ✟✟

❍❍

  • x

(x^1)

32

slide-51
SLIDE 51

constructing other masks

constraint: x is 0 or 1 (x ? 0 : z) if x = ✓

✓ ❙ ❙

1 0: want 1111111111…1 if x = ✁

✁ ❆ ❆

0 1: want 0000000000…0 mask: ✟✟

❍❍

  • x −(x^1)

32

slide-52
SLIDE 52

multibit ternary

constraint: x is 0 or 1

  • ld solution ((x & y) | (~x) & 1)
  • nly gets least sig. bit

(x ? y : z) (x ? y : 0) | (x ? 0 : z) ((−x) & y) | ((−(x ^ 1)) & z)

33

slide-53
SLIDE 53

fully multibit

✭✭✭✭✭✭✭✭✭✭✭✭✭ ✭ ❤❤❤❤❤❤❤❤❤❤❤❤❤ ❤

constraint: x is 0 or 1 (x ? y : z) easy C way: !x = 0 or 1, !!x = 0 or 1

x86 assembly: testq %rax, %rax then sete/setne (copy from ZF)

(x ? y : 0) | (x ? 0 : z) (( !!x) & y) | (( !x) & z)

34

slide-54
SLIDE 54

fully multibit

✭✭✭✭✭✭✭✭✭✭✭✭✭ ✭ ❤❤❤❤❤❤❤❤❤❤❤❤❤ ❤

constraint: x is 0 or 1 (x ? y : z) easy C way: !x = 0 or 1, !!x = 0 or 1

x86 assembly: testq %rax, %rax then sete/setne (copy from ZF)

(x ? y : 0) | (x ? 0 : z) (( !!x) & y) | (( !x) & z)

34

slide-55
SLIDE 55

fully multibit

✭✭✭✭✭✭✭✭✭✭✭✭✭ ✭ ❤❤❤❤❤❤❤❤❤❤❤❤❤ ❤

constraint: x is 0 or 1 (x ? y : z) easy C way: !x = 0 or 1, !!x = 0 or 1

x86 assembly: testq %rax, %rax then sete/setne (copy from ZF)

(x ? y : 0) | (x ? 0 : z) ((−!!x) & y) | ((−!x) & z)

34

slide-56
SLIDE 56

simple operation performance

typical modern desktop processor:

bitwise and/or/xor, shift, add, subtract, compare — ∼ 1 cycle integer multiply — ∼ 1-3 cycles integer divide — ∼ 10-150 cycles

(smaller/simpler/lower-power processors are difgerent) add/subtract/compare are more complicated in hardware! but much more important for typical applications

35

slide-57
SLIDE 57

simple operation performance

typical modern desktop processor:

bitwise and/or/xor, shift, add, subtract, compare — ∼ 1 cycle integer multiply — ∼ 1-3 cycles integer divide — ∼ 10-150 cycles

(smaller/simpler/lower-power processors are difgerent) add/subtract/compare are more complicated in hardware! but much more important for typical applications

35

slide-58
SLIDE 58

problem: any-bit

is any bit of x set? goal: turn 0 into 0, not zero into 1 easy C solution: !(!(x))

another easy solution if you have − or + (lab exercise)

what if we don’t have ! or − or + how do we solve is x is two bits? four bits?

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

36

slide-59
SLIDE 59

problem: any-bit

is any bit of x set? goal: turn 0 into 0, not zero into 1 easy C solution: !(!(x))

another easy solution if you have − or + (lab exercise)

what if we don’t have ! or − or + how do we solve is x is two bits? four bits?

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

36

slide-60
SLIDE 60

problem: any-bit

is any bit of x set? goal: turn 0 into 0, not zero into 1 easy C solution: !(!(x))

another easy solution if you have − or + (lab exercise)

what if we don’t have ! or − or + how do we solve is x is two bits? four bits?

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

36

slide-61
SLIDE 61

wasted work (1)

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

in general: (x & 1) | (y & 1) == (x | y) & 1

(x | (x >> 1) | (x >> 2) | (x >> 3)) & 1

37

slide-62
SLIDE 62

wasted work (1)

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

in general: (x & 1) | (y & 1) == (x | y) & 1

(x | (x >> 1) | (x >> 2) | (x >> 3)) & 1

37

slide-63
SLIDE 63

wasted work (2)

4-bit any set: (x | (x >> 1) | (x >> 2) | (x >> 3)) & 1 performing 3 bitwise ors …each bitwise or does 4 OR operations 2/3 of bitwise ORs useless — don’t use upper bits

. . .

38

slide-64
SLIDE 64

wasted work (2)

4-bit any set: (x | (x >> 1) | (x >> 2) | (x >> 3)) & 1 performing 3 bitwise ors …each bitwise or does 4 OR operations 2/3 of bitwise ORs useless — don’t use upper bits

. . .

38

slide-65
SLIDE 65

any-bit: divide and conquer

four-bit input x = x1x2x3x4 (x >> 1) | x = (x1|0)(x2|x1)(x3|x2)(x4|x3) = y1y2y3y4 = any-of( ) = , = any-of( ) = unsigned int any_of_four(unsigned int x) { int part_bits = (x >> 1) | x; return ((part_bits >> 2) | part_bits) & 1; }

39

slide-66
SLIDE 66

any-bit: divide and conquer

four-bit input x = x1x2x3x4 (x >> 1) | x = (x1|0)(x2|x1)(x3|x2)(x4|x3) = y1y2y3y4 y2 = any-of(x1x2) = x1|x2, y4 = any-of(x3x4) = x3|x4 unsigned int any_of_four(unsigned int x) { int part_bits = (x >> 1) | x; return ((part_bits >> 2) | part_bits) & 1; }

39

slide-67
SLIDE 67

any-bit: divide and conquer

four-bit input x = x1x2x3x4 (x >> 1) | x = (x1|0)(x2|x1)(x3|x2)(x4|x3) = y1y2y3y4 y2 = any-of(x1x2) = x1|x2, y4 = any-of(x3x4) = x3|x4 unsigned int any_of_four(unsigned int x) { int part_bits = (x >> 1) | x; return ((part_bits >> 2) | part_bits) & 1; }

39

slide-68
SLIDE 68

parallelism

bitwise operations — each bit is seperate same idea can apply to more interesting operations ; sometimes specifjc HW support

e.g. x86-64 has a “multiply four pairs of fmoats” instruction

40

slide-69
SLIDE 69

parallelism

bitwise operations — each bit is seperate same idea can apply to more interesting operations 010 + 011 = 101; 001 + 010 = 011 → 01000001 + 01100010 = 10100011 sometimes specifjc HW support

e.g. x86-64 has a “multiply four pairs of fmoats” instruction

40

slide-70
SLIDE 70

parallelism

bitwise operations — each bit is seperate same idea can apply to more interesting operations 010 + 011 = 101; 001 + 010 = 011 → 01000001 + 01100010 = 10100011 sometimes specifjc HW support

e.g. x86-64 has a “multiply four pairs of fmoats” instruction

40

slide-71
SLIDE 71

any-bit-set: 32 bits

unsigned int any(unsigned int x) { x = (x >> 1) | x; x = (x >> 2) | x; x = (x >> 4) | x; x = (x >> 8) | x; x = (x >> 16) | x; return x & 1; }

41

slide-72
SLIDE 72

bitwise strategies

use paper, fjnd subproblems, etc. mask and shift

(x & 0xF0) >> 4

factor/distribute

(x & 1) | (y & 1) == (x | y) & 1

divide and conquer common subexpression elimination

return ((−!!x) & y) | ((−!x) & z) becomes d = !x; return ((−!d) & y) | ((−d) & z)

42

slide-73
SLIDE 73

exercise

Which of these will swap last and second-to-last bit of an unsigned int x? (abcdef becomes abcd fe)

/* version A */ return ((x >> 1) & 1) | (x & (~1)); /* version B */ return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3)); /* version C */ return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1); /* version D */ return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;

43

slide-74
SLIDE 74

version A

/* version A */ return ((x >> 1) & 1) | (x & (~1)); // ^^^^^^^^^^^^^^ // abcdef --> 0abcde -> 00000e // ^^^^^^^^^^ // abcdef --> abcde0 // ^^^^^^^^^^^^^^^^^^^^^^^^^^^ // 00000e | abcde0 = abcdee

44

slide-75
SLIDE 75

version B

/* version B */ return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3)); // ^^^^^^^^^^^^^^ // abcdef --> 0abcde --> 00000e // ^^^^^^^^^^^^^^^ // abcdef --> bcdef0 --> bcde00 // ^^^^^^^^^ // abcdef --> abcd00

45

slide-76
SLIDE 76

version C

/* version C */ return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1); // ^^^^^^^^^^ // abcdef --> bcde00 // ^^^^^^^^^^^^^^ // abcdef --> 00000f --> 0000f0 // ^^^^^^^^^^^^^ // abcdef --> 0abcde --> 00000e

46

slide-77
SLIDE 77

version D

/* version D */ return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x; // ^^^^^^^^^^^^^^^ // abcdef --> 00000f --> 0000f0 // ^^^^^^^^^^^^^^ // abcdef --> 0000ef --> 00000e // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // 0000fe ^ abcdef --> abcd(f XOR e)(e XOR f)

47

slide-78
SLIDE 78

int lastBit = x & 1; int secondToLastBit = x & 2; int rest = x & ~3; int lastBitInPlace = lastBit << 1; int secondToLastBitInPlace = secondToLastBit >> 1; return rest | lastBitInPlace | secondToLastBitInPlace;

48

slide-79
SLIDE 79

49

slide-80
SLIDE 80

backup slides

50

slide-81
SLIDE 81

right shift in math

1 >> 0 == 1 0000 0001 1 >> 1 == 0 0000 0000 1 >> 2 == 0 0000 0000 10 >> 0 == 10 0000 1010 10 >> 1 == 5 0000 0101 10 >> 2 == 2 0000 0010

x >> y =

x × 2−y 51

slide-82
SLIDE 82

non-power of two arithmetic

unsigned times130(unsigned x) { return x * 130; } unsigned times130(unsigned x) { return (x << 7) + (x << 1); // x * 128 + x * 2 } times130: movl %edi, %eax shll $7, %eax leal (%rax, %rdi, 2), %eax ret

52

slide-83
SLIDE 83

non-power of two arithmetic

unsigned times130(unsigned x) { return x * 130; } unsigned times130(unsigned x) { return (x << 7) + (x << 1); // x * 128 + x * 2 } times130: movl %edi, %eax shll $7, %eax leal (%rax, %rdi, 2), %eax ret

52

slide-84
SLIDE 84

non-power of two arithmetic

unsigned times130(unsigned x) { return x * 130; } unsigned times130(unsigned x) { return (x << 7) + (x << 1); // x * 128 + x * 2 } times130: movl %edi, %eax shll $7, %eax leal (%rax, %rdi, 2), %eax ret

52

slide-85
SLIDE 85

53