bitwise operators 1 Changelog Changes made in this version not - - PowerPoint PPT Presentation

bitwise operators


slide-1
SLIDE 1

bitwise operators

1

slide-2
SLIDE 2

Changelog

Changes made in this version not seen in first lecture:

6 Feb 2018: arithmetic right shift: x86 arith. shift instruction is sar, not sra
6 Feb 2018: logical left shift: use shl consistently
6 Feb 2018: exercise C explanation: correct bcde00 typo for abcd00
6 Feb

1

slide-3
SLIDE 3

extracting opcodes (1)

instruction            byte 0   byte 1   bytes 2-9
halt                   0 0
nop                    1 0
rrmovq/cmovCC rA, rB   2 cc     rA rB
irmovq V, rB           3 0      F rB     V
rmmovq rA, D(rB)       4 0      rA rB    D
mrmovq D(rB), rA       5 0      rA rB    D
OPq rA, rB             6 fn     rA rB
jCC Dest               7 cc     Dest (bytes 1-8)
call Dest              8 0      Dest (bytes 1-8)
ret                    9 0
pushq rA               A 0      rA F
popq rA                B 0      rA F

typedef unsigned char byte;
int get_opcode(byte *instr) {
    return ???;
}

2

slide-4
SLIDE 4

extracting opcodes (2)

typedef unsigned char byte;
int get_opcode_and_function(byte *instr) {
    return instr[0];
}
/* first byte = opcode * 16 + fn/cc code */
int get_opcode(byte *instr) {
    return instr[0] / 16;
}

3

slide-5
SLIDE 5

aside: division

division is really slow
Intel “Skylake” microarchitecture:

about six cycles per division
…and much worse for eight-byte division
versus: four additions per cycle

but this case: it’s just extracting ‘top wires’, so could it be simpler?

4

slide-7
SLIDE 7

circuits: wires

[figure: a wire carrying the value 1]
binary value (actually a voltage)
value propagates to rest of wire (small delay)

5

slide-10
SLIDE 10

circuits: wire bundles

[figure: five wires carrying 1 1 0 1 0, drawn equivalently as a single bundle labeled 26]
11010 = 26

6

slide-13
SLIDE 13

extracting opcode in hardware

wires: 0 1 1 1 0 0 1 0
0111 0010 = 0x72 (first byte of jl)
top four wires: 0111 = 7

7

slide-14
SLIDE 14

exposing wire selection

x86 instruction: shr (shift right)
shr $amount, %reg (or variable: shr %cl, %reg)

[figure: the bits of %reg (initial value) move right by the shift amount into %reg (final value); the vacated high bits are marked ?]

8

slide-17
SLIDE 17

shift right

x86 instruction: shr (shift right)
shr $amount, %reg (or variable: shr %cl, %reg)

get_opcode:
    // eax ← byte at memory[rdi] with zero padding
    // intel syntax: movzx eax, byte ptr [rdi]
    movzbl (%rdi), %eax
    shrl $4, %eax
    ret

9

slide-19
SLIDE 19

right shift in C

get_opcode:
    // %rdi -- instruction address
    // eax ← one byte of memory[rdi] with zero padding
    // intel syntax: movzx eax, byte ptr [rdi]
    movzbl (%rdi), %eax
    shrl $4, %eax
    ret

typedef unsigned char byte;
int get_opcode(byte *instr) {
    return instr[0] >> 4;
}

10

slide-20
SLIDE 20

right shift in C

typedef unsigned char byte;
int get_opcode1(byte *instr) {
    return instr[0] >> 4;
}
int get_opcode2(byte *instr) {
    return instr[0] / 16;
}

example output from optimizing compiler:

get_opcode1:
    movzbl (%rdi), %eax
    shrl $4, %eax
    ret
get_opcode2:
    movb (%rdi), %al
    shrb $4, %al
    movzbl %al, %eax
    ret
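
One way to convince yourself that the shift and the division agree is to try every possible first byte. The sketch below reuses the two functions from this slide; `opcodes_agree` is a hypothetical helper added only for illustration.

```c
typedef unsigned char byte;

/* opcode via shift: keep the top four bits of the first byte */
int get_opcode1(byte *instr) { return instr[0] >> 4; }

/* opcode via division: same result for an unsigned byte */
int get_opcode2(byte *instr) { return instr[0] / 16; }

/* hypothetical helper: returns 1 if both versions agree on all 256 byte values */
int opcodes_agree(void) {
    for (int v = 0; v < 256; v++) {
        byte instr[1] = { (byte) v };
        if (get_opcode1(instr) != get_opcode2(instr)) return 0;
    }
    return 1;
}
```

The agreement depends on the byte being unsigned; for negative signed values, division and right shift can round differently.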

11

slide-22
SLIDE 22

right shift in math

1 >> 0 == 1      0000 0001
1 >> 1 == 0      0000 0000
1 >> 2 == 0      0000 0000
10 >> 0 == 10    0000 1010
10 >> 1 == 5     0000 0101
10 >> 2 == 2     0000 0010

x >> y = $\lfloor x \times 2^{-y} \rfloor$

12

slide-23
SLIDE 23

arithmetic right shift

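x86 instruction: sar (arithmetic shift right)
sar $amount, %reg (or variable: sar %cl, %reg)

[figure: the bits of %reg (initial value) move right into %reg (final value); the vacated high bits are filled with copies of the sign bit, so 0s for a value starting with 0 and 1s for a value starting with 1]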

13

slide-25
SLIDE 25

dividing negative by two

start with −x
flip all bits and add one to get x
right shift by one to get x/2
flip all bits and add one to get −x/2
same as right shift by one, adding 1s instead of 0s (except for rounding)

14

slide-27
SLIDE 27

right shift in C

int shift_signed(int x) {
    return x >> 5;
}
unsigned shift_unsigned(unsigned x) {
    return x >> 5;
}

shift_signed:
    movl %edi, %eax
    sarl $5, %eax
    ret
shift_unsigned:
    movl %edi, %eax
    shrl $5, %eax
    ret

15

slide-28
SLIDE 28

standards and shifts in C

signed right shift is implementation-defined

standard lets compilers choose which type of shift to do
all x86 compilers I know of: arithmetic

shift amount ≥ width of type: undefined

x86 assembly: only uses lower bits of shift amount
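
Because shifting by the full width of the type (or more) is undefined in C, code that takes a variable shift amount often guards it explicitly. A minimal sketch of such a guard; `shr_checked` and `UNSIGNED_BITS` are made-up names for illustration, not something from the lecture:

```c
#include <limits.h>

/* number of bits in unsigned (32 on typical x86-64 compilers) */
#define UNSIGNED_BITS (sizeof(unsigned) * CHAR_BIT)

/* hypothetical helper: logical right shift that is defined for any amount;
   shifting out every bit yields 0 instead of undefined behavior */
unsigned shr_checked(unsigned x, unsigned amount) {
    if (amount >= UNSIGNED_BITS) return 0;
    return x >> amount;
}
```

Returning 0 is a design choice that matches the mathematical view of the shift; the x86 shr instruction itself would instead look only at the low bits of the count.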

16

slide-29
SLIDE 29

constructing instructions in hardware

[figure: wire bundles labeled icode and opcode, together with four constant 0 wires]

17

slide-30
SLIDE 30

shift left

shr $-4, %reg (crossed out)
instead: shl $4, %reg (“shift left”)

opcode >> (-4) (crossed out)
instead: opcode << 4

[figure: the bits 1 0 1 1 0 1 1 shifted left]

18

slide-32
SLIDE 32

shift left

x86 instruction: shl (shift left)
shl $amount, %reg (or variable: shl %cl, %reg)

[figure: the bits of %reg (initial value) move left into %reg (final value); the vacated low bits are filled with 0s]

19

slide-35
SLIDE 35

left shift in math

1 << 0 == 1      0000 0001
1 << 1 == 2      0000 0010
1 << 2 == 4      0000 0100
10 << 0 == 10    0000 1010
10 << 1 == 20    0001 0100
10 << 2 == 40    0010 1000

x << y = $x \times 2^{y}$
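
Left shift undoes the opcode extraction from earlier: since the first byte is opcode * 16 + fn/cc code, shifting the opcode left by four puts it back in the high half of the byte. A small sketch, assuming both fields are already in the range 0 to 15; `make_first_byte` is a hypothetical name:

```c
typedef unsigned char byte;

/* hypothetical helper: build the first instruction byte from its two halves;
   icode << 4 is icode * 16, so the function code lands in the low four bits */
byte make_first_byte(unsigned icode, unsigned ifun) {
    return (byte) ((icode << 4) + ifun);
}
```

For example, opcode 7 with condition code 2 gives 0x72, the first byte of jl.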

20

slide-36
SLIDE 36

extracting icode from more

[figure: the bits of a multi-byte value, with fields labeled icode, ifun, rB, rA]

// % -- remainder
unsigned extract_opcode1(unsigned value) {
    return (value / 16) % 16;
}
unsigned extract_opcode2(unsigned value) {
    return (value % 256) / 16;
}

21

slide-38
SLIDE 38

manipulating bits?

easy to manipulate individual bits in HW
how do we expose that to software?

22

slide-39
SLIDE 39

circuits: gates

[figure: logic gates with example inputs and outputs]

23

slide-40
SLIDE 40

interlude: a truth table

AND | 0 1
 0  | 0 0
 1  | 0 1

AND with 1: keep a bit the same
AND with 0: clear a bit
method: construct “mask” of what to keep/remove

24

slide-44
SLIDE 44

bitwise AND — &

Treat value as array of bits
1 & 1 == 1
1 & 0 == 0
0 & 0 == 0
2 & 4 == 0
10 & 7 == 2

    …0010        …1010
  & …0100      & …0111
    …0000        …0010

25

slide-47
SLIDE 47

bitwise AND — C/assembly

x86: and %reg, %reg
C: foo & bar

26

slide-48
SLIDE 48

bitwise hardware (10 & 7 == 2)

[figure: 10 (1010) and 7 (0111) enter bit by bit; one AND gate per bit position produces 0010 = 2]

27

slide-49
SLIDE 49

extract opcode from larger

unsigned extract_opcode1_bitwise(unsigned value) {
    return (value >> 4) & 0xF;  // 0xF: 00001111
    // like (value / 16) % 16
}
unsigned extract_opcode2_bitwise(unsigned value) {
    return (value & 0xF0) >> 4;  // 0xF0: 11110000
    // like (value % 256) / 16;
}

28

slide-50
SLIDE 50

extract opcode from larger

extract_opcode1_bitwise:
    movl %edi, %eax
    shrl $4, %eax
    andl $0xF, %eax
    ret
extract_opcode2_bitwise:
    movl %edi, %eax
    andl $0xF0, %eax
    shrl $4, %eax
    ret
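
The claim that the bitwise versions behave "like" the division and remainder versions can be checked by brute force. A sketch that compares all four functions from these slides over every 16-bit input; `extractors_agree` is a hypothetical helper, and the 16-bit range is an arbitrary choice to keep the loop short:

```c
/* arithmetic versions from the earlier slide */
unsigned extract_opcode1(unsigned value) { return (value / 16) % 16; }
unsigned extract_opcode2(unsigned value) { return (value % 256) / 16; }

/* bitwise versions: shift then mask, or mask then shift */
unsigned extract_opcode1_bitwise(unsigned value) { return (value >> 4) & 0xF; }
unsigned extract_opcode2_bitwise(unsigned value) { return (value & 0xF0) >> 4; }

/* hypothetical check: returns 1 if all four agree for every 16-bit input */
int extractors_agree(void) {
    for (unsigned v = 0; v <= 0xFFFF; v++) {
        unsigned expected = extract_opcode1(v);
        if (extract_opcode2(v) != expected ||
            extract_opcode1_bitwise(v) != expected ||
            extract_opcode2_bitwise(v) != expected) {
            return 0;
        }
    }
    return 1;
}
```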

29

slide-51
SLIDE 51

more truth tables

AND | 0 1      OR | 0 1      XOR | 0 1
 0  | 0 0       0 | 0 1        0 | 0 1
 1  | 0 1       1 | 1 1        1 | 1 0

&   conditionally clear bit / conditionally keep bit
|   conditionally set bit
^   conditionally flip bit
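
The three roles in the table map directly onto small helpers that act on a single bit position. A sketch, assuming the bit index n is less than the width of unsigned; the function names are illustrative, and clear_bit uses the complement operator ~ (covered on a later slide) to build a mask that is 0 only at position n:

```c
/* clear bit n: AND with a mask that is 1 everywhere except position n */
unsigned clear_bit(unsigned x, unsigned n) { return x & ~(1u << n); }

/* set bit n: OR with a mask that is 1 only at position n */
unsigned set_bit(unsigned x, unsigned n) { return x | (1u << n); }

/* flip bit n: XOR with a mask that is 1 only at position n */
unsigned flip_bit(unsigned x, unsigned n) { return x ^ (1u << n); }
```

In each case the mask decides which bit is affected and the operator decides what happens to it; all other bits pass through unchanged.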

30

slide-52
SLIDE 52

bitwise OR — |

1 | 1 == 1
1 | 0 == 1
0 | 0 == 0
2 | 4 == 6
10 | 7 == 15

    …1010
  | …0111
    …1111

31

slide-53
SLIDE 53

bitwise xor: ^

1 ^ 1 == 0
1 ^ 0 == 1
0 ^ 0 == 0
2 ^ 4 == 6
10 ^ 7 == 13

    …1010
  ^ …0111
    …1101

32

slide-55
SLIDE 55

negation / not — ~

~ (‘complement’) is bitwise version of !:

!0 == 1
!notZero == 0
~0 == (int) 0xFFFFFFFF (aka −1)
~2 == (int) 0xFFFFFFFD (aka −3)
~((unsigned) 2) == 0xFFFFFFFD

[figure: ~ applied to a 32-bit value flips every one of the 32 bits]

33

slide-57
SLIDE 57

note: ternary operator

w = (x ? y : z)

is equivalent to

if (x) { w = y; } else { w = z; }

34

slide-58
SLIDE 58
one-bit ternary

(x ? y : z)
constraint: x, y, and z are 0 or 1
now: reimplement in C without if/else/||/etc.

(assembly: no jumps probably)

divide-and-conquer:

(x ? y : 0)
(x ? 0 : z)

35

slide-61
SLIDE 61
one-bit ternary parts (1)

constraint: x, y, and z are 0 or 1

(x ? y : 0)   y=0   y=1
x=0            0     0
x=1            0     1

→ (x & y)

36

slide-62
SLIDE 62
one-bit ternary parts (2)

(x ? y : 0) = (x & y)
(x ? 0 : z)

opposite x: ~x

((~x) & z)

37

slide-64
SLIDE 64
one-bit ternary

constraint: x, y, and z are 0 or 1
(x ? y : z)
(x ? y : 0) | (x ? 0 : z)
(x & y) | ((~x) & z)
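
With only eight possible inputs, the one-bit formula is easy to check exhaustively against the real ternary operator. A sketch; `ternary_bit` and `ternary_bit_ok` are hypothetical names:

```c
/* one-bit select without branches; valid only when x, y, z are each 0 or 1 */
int ternary_bit(int x, int y, int z) {
    return (x & y) | ((~x) & z);
}

/* hypothetical check: compare against ?: on all eight input combinations */
int ternary_bit_ok(void) {
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            for (int z = 0; z <= 1; z++)
                if (ternary_bit(x, y, z) != (x ? y : z)) return 0;
    return 1;
}
```

Note that ~x sets many high bits when x is 0, but ANDing with a z that is 0 or 1 discards them, which is why the constraint on z matters.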

38

slide-65
SLIDE 65

multibit ternary

constraint: x is 0 or 1

old solution ((x & y) | ((~x) & z))
only gets least sig. bit

(x ? y : z)
(x ? y : 0) | (x ? 0 : z)
((  x) & y) | ((  (x ^ 1)) & z)

39

slide-68
SLIDE 68

constructing masks

constraint: x is 0 or 1
(x ? y : 0)
if x = 1: want 1111111111…1 (keep y)
if x = 0: want 0000000000…0 (want 0)
a trick: -x (-1 is 1111…1)
((-x) & y)

40

slide-71
SLIDE 71

constructing other masks

constraint: x is 0 or 1
(x ? 0 : z)
if x = 0 (not 1): want 1111111111…1
if x = 1 (not 0): want 0000000000…0
mask: -(x^1) (not -x)

42

slide-72
SLIDE 72

multibit ternary

constraint: x is 0 or 1

old solution ((x & y) | ((~x) & z))
only gets least sig. bit

(x ? y : z)
(x ? y : 0) | (x ? 0 : z)
((-x) & y) | ((-(x ^ 1)) & z)

43

slide-75
SLIDE 75

fully multibit

constraint: x is 0 or 1 (crossed out: x may now be any value)
(x ? y : z)
easy C way: !x = 0 or 1, !!x = 0 or 1

x86 assembly: testq %rax, %rax then sete/setne (copy from ZF)

(x ? y : 0) | (x ? 0 : z)
((-!!x) & y) | ((-!x) & z)
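
Written out as a function, the fully multibit version might look like the sketch below. It uses unsigned arithmetic so that negating 1 is well defined and produces an all-ones mask; `select_bits` and the local names are assumptions made for illustration.

```c
/* branch-free (x ? y : z) for any x:
   !!x is 1 when x is nonzero, and !x is 1 when x is zero */
unsigned select_bits(unsigned x, unsigned y, unsigned z) {
    unsigned keep_y = -(unsigned) !!x;  /* all ones if x != 0, else 0 */
    unsigned keep_z = -(unsigned) !x;   /* all ones if x == 0, else 0 */
    return (keep_y & y) | (keep_z & z);
}
```

Exactly one of the two masks is all ones for any x, so the OR always passes through exactly one of y and z.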

44

slide-76
SLIDE 76

simple operation performance

typical modern desktop processor:

bitwise and/or/xor, shift, add, subtract, compare: ∼1 cycle
integer multiply: ∼1-3 cycles
integer divide: ∼10-150 cycles

(smaller/simpler/lower-power processors are different)

add/subtract/compare are more complicated in hardware!
but much more important for typical applications

45

slide-78
SLIDE 78

problem: any-bit

is any bit of x set?
goal: turn 0 into 0, not zero into 1
easy C solution: !(!(x))

another easy solution if you have − or + (lab exercise)

what if we don’t have ! or − or +?
how do we solve it if x is two bits? four bits?

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

46

slide-81
SLIDE 81

wasted work (1)

((x & 1) | ((x >> 1) & 1) | ((x >> 2) & 1) | ((x >> 3) & 1))

in general: (x & 1) | (y & 1) == (x | y) & 1

(x | (x >> 1) | (x >> 2) | (x >> 3)) & 1

47

slide-83
SLIDE 83

wasted work (2)

4-bit any set: (x | (x >> 1) | (x >> 2) | (x >> 3)) & 1
performing 3 bitwise ors
…each bitwise or does 4 OR operations
but we only use the result of one of the 4!

[figure: the bits of (x) and (x >> 1) lined up for the OR]

48

slide-86
SLIDE 86

any-bit: divide and conquer

four-bit input x = x1x2x3x4
x | (x >> 1) = (x1|0)(x2|x1)(x3|x2)(x4|x3) = y1y2y3y4
y | (y >> 2) = (y1|0)(y2|0)(y3|y1)(y4|y2) = z1z2z3z4
z4 = (y4|y2) = ((x2|x1)|(x4|x3)) = x4|x3|x2|x1 “is any bit set?”

unsigned int any_of_four(unsigned int x) {
    unsigned int part_bits = (x >> 1) | x;
    return ((part_bits >> 2) | part_bits) & 1;
}
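
Since a four-bit input has only sixteen values, the derivation above can be confirmed by comparing against the easy solution !!x. A sketch; `any_of_four_ok` is a hypothetical helper, and the function is written with unsigned temporaries:

```c
/* OR adjacent pairs, then OR the two pair results; bit 0 ends up
   holding x4|x3|x2|x1 in the slide's notation */
unsigned int any_of_four(unsigned int x) {
    unsigned int part_bits = (x >> 1) | x;
    return ((part_bits >> 2) | part_bits) & 1;
}

/* hypothetical check: compare with !!x on every four-bit input */
int any_of_four_ok(void) {
    for (unsigned int x = 0; x < 16; x++)
        if (any_of_four(x) != (unsigned int) !!x) return 0;
    return 1;
}
```

Two shift-and-OR steps cover four bits; each further step doubles the width covered, which is where the five steps for 32 bits come from.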

49

slide-88
SLIDE 88

any-bit-set: 32 bits

unsigned int any(unsigned int x) {
    x = (x >> 1) | x;
    x = (x >> 2) | x;
    x = (x >> 4) | x;
    x = (x >> 8) | x;
    x = (x >> 16) | x;
    return x & 1;
}

50

slide-89
SLIDE 89

bitwise strategies

use paper, find subproblems, etc.

mask and shift
    (x & 0xF0) >> 4

factor/distribute
    (x & 1) | (y & 1) == (x | y) & 1

divide and conquer

common subexpression elimination
    return ((-!!x) & y) | ((-!x) & z)
    becomes
    d = !x; return ((-!d) & y) | ((-d) & z)

51

slide-90
SLIDE 90

exercise

Which of these will swap last and second-to-last bit of an unsigned int x? (abcdef becomes abcdfe)

/* version A */
return ((x >> 1) & 1) | (x & (~1));
/* version B */
return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3));
/* version C */
return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1);
/* version D */
return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;

52

slide-91
SLIDE 91

version A

/* version A */
return ((x >> 1) & 1) | (x & (~1));
// (x >> 1) & 1:       abcdef --> 0abcde --> 00000e
// x & (~1):           abcdef --> abcde0
// whole expression:   00000e | abcde0 = abcdee

53

slide-92
SLIDE 92

version B

/* version B */
return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3));
// (x >> 1) & 1:       abcdef --> 0abcde --> 00000e
// (x << 1) & (~2):    abcdef --> bcdef0 --> bcde00
// x & (~3):           abcdef --> abcd00

54

slide-93
SLIDE 93

version C

/* version C */
return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1);
// x & (~3):           abcdef --> abcd00
// (x & 1) << 1:       abcdef --> 00000f --> 0000f0
// (x >> 1) & 1:       abcdef --> 0abcde --> 00000e

55

slide-94
SLIDE 94

version D

/* version D */
return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x;
// (x & 1) << 1:       abcdef --> 00000f --> 0000f0
// (x & 3) >> 1:       abcdef --> 0000ef --> 00000e
// whole expression:   0000fe ^ abcdef --> abcd(f XOR e)(e XOR f)

56

slide-95
SLIDE 95

expanded code

int lastBit = x & 1;
int secondToLastBit = x & 2;
int rest = x & ~3;
int lastBitInPlace = lastBit << 1;
int secondToLastBitInPlace = secondToLastBit >> 1;
return rest | lastBitInPlace | secondToLastBitInPlace;
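
The four candidate answers can also be compared mechanically against the expanded code. The sketch below wraps each version in a function and tests all six-bit inputs; the function names and the helper `swaps_correctly` are assumptions for illustration, and the expanded code is written with unsigned temporaries.

```c
unsigned version_a(unsigned x) { return ((x >> 1) & 1) | (x & (~1)); }
unsigned version_b(unsigned x) { return ((x >> 1) & 1) | ((x << 1) & (~2)) | (x & (~3)); }
unsigned version_c(unsigned x) { return (x & (~3)) | ((x & 1) << 1) | ((x >> 1) & 1); }
unsigned version_d(unsigned x) { return (((x & 1) << 1) | ((x & 3) >> 1)) ^ x; }

/* reference: the expanded code, one step per line */
unsigned swap_reference(unsigned x) {
    unsigned lastBit = x & 1;
    unsigned secondToLastBit = x & 2;
    unsigned rest = x & ~3u;
    return rest | (lastBit << 1) | (secondToLastBit >> 1);
}

/* hypothetical helper: 1 if f matches the reference on all six-bit inputs */
int swaps_correctly(unsigned (*f)(unsigned)) {
    for (unsigned x = 0; x < 64; x++)
        if (f(x) != swap_reference(x)) return 0;
    return 1;
}
```

Under this check only version C matches the reference, which agrees with the bit-by-bit traces on the preceding slides.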

57

slide-96
SLIDE 96

backup slides

58

slide-99
SLIDE 99

divide with proper rounding

C division: rounds towards zero (truncate)
arithmetic shift: rounds towards negative infinity
solution: “bias” adjustments (described in textbook)

divideBy8: // GCC generated code
    leal 7(%rdi), %eax    // eax ← edi + 7
    testl %edi, %edi      // set cond. codes based on %edi
    cmovns %edi, %eax     // if (edi sign bit = 0) eax ← edi
    sarl $3, %eax         // arithmetic shift
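
The same bias adjustment can be written in C. This is a sketch rather than the compiler's actual source: it assumes that >> on a negative int is an arithmetic shift, which the C standard leaves implementation-defined, and `divide_by_8` is a made-up name.

```c
/* divide by 8 with C's round-toward-zero behavior, using a shift;
   assumes >> on negative int is arithmetic (implementation-defined in C) */
int divide_by_8(int x) {
    int bias = (x < 0) ? 7 : 0;  /* add 2^3 - 1 only for negative inputs */
    return (x + bias) >> 3;
}
```

Adding 7 before shifting pushes negative values that are not multiples of 8 up past the next multiple, so rounding toward negative infinity lands on the same result as truncation.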

60

slide-100
SLIDE 100

miscellaneous bit manipulation

common bit manipulation instructions are not in C:
rotate (x86: ror, rol): like shift, but wrap around
first/last bit set (x86: bsf, bsr)
population count (some x86: popcnt): number of bits set
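
Even without dedicated operators, these operations can be expressed with the shifts and bitwise operators covered so far. A portable sketch for two of them on 32-bit values; the function names are illustrative, and a compiler may or may not turn them into rol or popcnt:

```c
#include <stdint.h>

/* rotate left by n: bits shifted out of the top re-enter at the bottom;
   masking keeps both shift amounts below 32, so n = 0 is safe too */
uint32_t rotate_left(uint32_t x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* population count: x & (x - 1) clears the lowest set bit,
   so the loop runs once per bit that is set */
unsigned population_count(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}
```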

61

slide-102
SLIDE 102

parallelism

bitwise operations: each bit is separate
same idea can apply to more interesting operations
010 + 011 = 101; 001 + 010 = 011 → 01000001 + 01100010 = 10100011
sometimes specific HW support

e.g. x86-64 has a “multiply four pairs of floats” instruction
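
The addition example works with one ordinary add because neither small sum carries into its neighbor. When a carry could cross over, a little masking keeps the fields independent. The sketch below treats a byte as two 4-bit fields; `packed_add4` and its masks are assumptions for illustration, not something from the lecture:

```c
/* add two pairs of 4-bit values packed into one byte each, wrapping each
   field modulo 16, with no carry from the low field into the high field */
unsigned char packed_add4(unsigned char a, unsigned char b) {
    unsigned low3 = (a & 0x77u) + (b & 0x77u);  /* add low 3 bits of each field */
    unsigned top = (a ^ b) & 0x88u;             /* top bit of each field, no carry out */
    return (unsigned char) (low3 ^ top);
}
```

For the slide's inputs, 01000001 and 01100010, this gives 10100011, the same as a plain add; the masking only changes the result when a field overflows.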

62

slide-105
SLIDE 105

two’s complement refresher

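bit weights, most significant to least significant:
$-2^{31}$, $+2^{30}$, $+2^{29}$, …, $+2^{2}$, $+2^{1}$, $+2^{0}$

all 32 bits set to 1: −1

[figure: number circle labeled −1, 0, 1, $2^{31} - 1$, $-2^{31}$, $-2^{31} + 1$]

0111 1111… 1111 = $2^{31} - 1$
1000 0000… 0000 = $-2^{31}$
1111 1111… 1111 = −1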

63
