Channels: Ryan Eberhardt and Armin Namavari, May 14, 2020 (PowerPoint presentation)


SLIDE 1

Channels

Ryan Eberhardt and Armin Namavari May 14, 2020

SLIDE 2

Logistics

  • Congrats on making it through week 6!
  • Week 5 exercises due Saturday
  • Project 1 due Tuesday
  • Let us know if you have questions! We have OH after class
SLIDE 3

Reconsidering multithreading

SLIDE 4

Characteristics of multithreading

  • Why do we like multithreading?
    ○ It’s fast (lower context-switching overhead than multiprocessing)
    ○ It’s easy (sharing data is straightforward when you share memory)
  • Why do we not like multithreading?
    ○ It’s easy to mess up: data races
SLIDE 5

Radical proposition

  • What if we didn’t share memory?
    ○ Could we come up with a way to do multithreading that is just as fast and just as easy?
  • If threads don’t share memory, how are they supposed to work together when data is involved?
  • Golang concurrency slogan: “Do not communicate by sharing memory; instead, share memory by communicating.” (Effective Go)
  • Message passing: independent threads/processes collaborate by exchanging messages with each other
    ○ Can’t have data races because there is no shared memory
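The message-passing idea can be made concrete with a minimal sketch using Rust’s standard mpsc channel (the names tx/rx are ours, not from the slides): two threads cooperate with no shared mutable state at all.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Communicate by sending messages; there is no shared mutable
    // state, so there is nothing to race on.
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The spawned thread owns the sending end.
        for i in 0..3 {
            tx.send(i).unwrap();
        }
        // tx is dropped here, which closes the channel.
    });
    // iter() yields values until the channel is closed.
    let received: Vec<i32> = rx.iter().collect();
    handle.join().unwrap();
    println!("{:?}", received); // prints [0, 1, 2]
}
```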

SLIDE 6

Communicating Sequential Processes

  • Theoretical model introduced in 1978: sequential processes communicate by sending messages over “channels”
    ○ Sequential processes: easy peasy
    ○ No shared state -> no data races!
  • Serves as the basis for newer systems languages such as Go and Erlang
  • Also served as an early model for Rust!
    ○ Channels used to be the only communication/synchronization primitive
  • Channels are available in other languages as well (e.g. Boost includes an implementation for C++)

SLIDE 7

Channels: like semaphores

SLIDES 8–33

Semaphores

(animation: two threads coordinate access to a shared Buffer: SomeStruct { … } guarded by a mutex, using a semaphore to signal when data is available)

  • thread1 calls semaphore.wait(); the count is positive, so it proceeds.
  • thread1 calls mutex.lock() (Mutex: Locked), accesses the Buffer, then mutex.unlock() (Mutex: Unlocked).
  • thread1 calls semaphore.wait() again and blocks: nothing new has been signaled.
  • thread2 arrives with its own SomeStruct { … }, calls mutex.lock(), copies the struct into the Buffer, then calls mutex.unlock().
  • thread2 calls semaphore.signal(), unblocking thread1.
  • thread1 calls mutex.lock(), reads the Buffer, and calls mutex.unlock().

SLIDES 34–44

Channels

(animation: the same hand-off, now using a channel instead of a mutex + semaphore; the original slides write `let struct = …`, but `struct` is a reserved keyword in Rust, so a name like `some_struct` is used below)

  • thread1 calls let some_struct = receive_end.recv().unwrap() and receives the SomeStruct { … } waiting in the channel.
  • thread1 calls let struct2 = receive_end.recv().unwrap() again and blocks: the channel is empty.
  • thread2 calls send_end.send(some_struct).unwrap(), moving its SomeStruct { … } into the channel.
  • thread1 unblocks and recv() returns the struct; no explicit locking or signaling is needed.

SLIDE 45

Channels: like strongly-typed pipes

SLIDE 46

Chrome architecture diagram

https://www.chromium.org/developers/design-documents/multi-process-architecture (slightly out of date)

Inter-Process Communication channels: Pipes, but with an extra layer of abstraction to serialize/deserialize objects

SLIDE 47

Using channels

SLIDE 48

Isn’t message passing bad for performance?

  • If you don’t share memory, then you need to copy data into/out of messages. That seems expensive. What gives?
  • Theory != practice
    ○ We share some memory (the heap) and only make shallow copies into channels

SLIDES 49–53

Partly-shared memory (shallow copies only)

(animation: thread1 sends a Vec to thread2; only the stack header moves)

thread1’s stack holds Vec { len: 6, alloc_len: 16, data: Box<>, } while the heap holds [3, 4, 5, 6, 7, 8]. Sending the Vec through a channel shallow-copies only this small header; the heap buffer is never copied, and after the send it belongs to thread2.

SLIDE 54

Isn’t message passing bad for performance?

  • If you don’t share memory, then you need to copy data into/out of messages. That seems expensive. What gives?
  • Theory != practice
    ○ We share some memory (the heap) and only make shallow copies into channels
  • In Go, passing pointers is potentially dangerous! Channels make data races less likely but don’t preclude races if you use them wrong
  • In Rust, passing pointers (e.g. Box) is always safe despite sharing memory
    ○ When you send to a channel, ownership of the value is transferred to the channel
    ○ The compiler will ensure you don’t use a pointer after it has been moved into the channel
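A small illustration of this ownership rule, using the stdlib mpsc channel (the slides’ send_end/receive_end names correspond to mpsc’s Sender/Receiver here):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (send_end, receive_end) = mpsc::channel();
    let boxed = Box::new(vec![1, 2, 3]); // pointer to heap-allocated data
    send_end.send(boxed).unwrap(); // ownership moves into the channel...
    // println!("{:?}", boxed); // ...so this would be a compile error: use after move

    let handle = thread::spawn(move || {
        let received = receive_end.recv().unwrap();
        // Only the Box pointer crossed between threads; the Vec's heap
        // buffer was never duplicated.
        received.iter().sum::<i32>()
    });
    let sum = handle.join().unwrap();
    println!("sum = {}", sum); // prints sum = 6
}
```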

SLIDE 55

Channel APIs and implementations

  • The ideal channel is an MPMC (multi-producer, multi-consumer) channel
    ○ We implemented one of these on Tuesday! A simple Mutex<VecDeque<>> with a Condvar
    ○ However, that approach is much slower than we’d like. (Why?)
  • It’s really, really hard to implement a fast and safe MPMC channel!
    ○ Go’s channels are known for being slow
      ■ They essentially implement Mutex<VecDeque<>>, but using a “fast userspace mutex” (futex)
    ○ A fast implementation needs to use lock-free programming techniques to avoid lock contention and reduce latency
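The Tuesday implementation isn’t reproduced in these slides, so here is a minimal sketch of what a Mutex<VecDeque<>> + Condvar channel can look like; the single lock that every send and recv contends on is exactly why this approach is slow.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// A naive MPMC channel: one queue, one lock, one condition variable.
struct Channel<T> {
    queue: Mutex<VecDeque<T>>,
    cond: Condvar,
}

impl<T> Channel<T> {
    fn new() -> Self {
        Channel {
            queue: Mutex::new(VecDeque::new()),
            cond: Condvar::new(),
        }
    }

    fn send(&self, value: T) {
        self.queue.lock().unwrap().push_back(value);
        self.cond.notify_one(); // wake one waiting receiver
    }

    fn recv(&self) -> T {
        let mut queue = self.queue.lock().unwrap();
        // Re-check the condition after every wakeup (spurious wakeups happen).
        while queue.is_empty() {
            queue = self.cond.wait(queue).unwrap();
        }
        queue.pop_front().unwrap()
    }
}

fn main() {
    let chan = Arc::new(Channel::new());
    let sender = Arc::clone(&chan);
    let t = thread::spawn(move || {
        for i in 0..5 {
            sender.send(i);
        }
    });
    let mut sum = 0;
    for _ in 0..5 {
        sum += chan.recv();
    }
    t.join().unwrap();
    println!("sum = {}", sum); // 0+1+2+3+4 = 10
}
```

(This sketch omits channel closing; real channels also track when all senders are gone.)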

SLIDE 56

Channel APIs and implementations

  • The Rust standard library includes an MPSC (multi-producer, single-consumer) channel, but it’s not ideal (one of the oldest APIs in the Rust stdlib)
    ○ Great if you want multiple threads to send to one thread (e.g. aggregating results of an operation)
    ○ Also great for thread-to-thread communication (superset of SPSC)
    ○ Not so great if you want to distribute data/work (e.g. a work queue)
    ○ Additionally, the API has some oddities (great article)
    ○ There’s a good chance this channel implementation will be replaced within the next year or two (discussion)
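A quick sketch of the fan-in pattern the stdlib channel is good at: several producer threads, one consumer aggregating their results.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for id in 0..4 {
        let tx = tx.clone(); // Sender is Clone (multi-producer)...
        handles.push(thread::spawn(move || {
            tx.send(id * 10).unwrap();
        }));
    }
    drop(tx); // drop the original so the channel closes once producers finish
    // ...but Receiver is not Clone (single-consumer).
    let total: i32 = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    println!("total = {}", total); // 0+10+20+30 = 60
}
```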

SLIDE 57

Channel APIs and implementations

  • The crossbeam crate recently (2018) added an excellent MPMC implementation
    ○ “If we were to redo Rust channels from scratch, how should they look?” Much improved API
    ○ Mostly lock-free
    ○ Even faster than the existing MPSC channels
    ○ Great read here
    ○ Likely to replace the stdlib channels in some capacity

SLIDES 58–73

Implementing farm v3.0

(animation: the program is built up line by line; the accompanying heap diagram shows a channel { senders: …, receivers: …, … } allocation, with a Sender handle and cloned Receiver handles on each thread’s stack, and the counts updating as handles are cloned and dropped)

fn main() {
    let (sender, receiver) = crossbeam::channel::unbounded();
    let mut threads = Vec::new();
    for _ in 0..num_cpus::get() {
        let receiver = receiver.clone();
        threads.push(thread::spawn(move || {
            while let Ok(next_num) = receiver.recv() {
                factor_number(next_num);
            }
        }));
    }
    let stdin = std::io::stdin();
    for line in stdin.lock().lines() {
        let num = line.unwrap().parse::<u32>().unwrap();
        sender
            .send(num)
            .expect("Tried writing to channel, but there are no receivers!");
    }
    drop(sender);
    for thread in threads {
        thread.join().expect("Panic occurred in thread");
    }
}

Annotations from the animation:
  • Each worker reads until recv() returns Err (i.e. until the channel is closed).
  • drop(sender) closes the channel (the senders count drops to 0); the workers break out of the while loop, so the join() calls return.
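crossbeam and num_cpus are external crates; with only the standard library, the same work-distribution pattern can be approximated by sharing the single mpsc Receiver behind a Mutex. A sketch under those assumptions (factor_number here is a stand-in trial-division implementation, since the lecture’s version isn’t shown; the worker count 4 replaces num_cpus::get()):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Stand-in for the lecture's factor_number: trial-division factorization.
fn factor_number(mut n: u32) -> Vec<u32> {
    let mut factors = Vec::new();
    let mut d = 2;
    while d * d <= n {
        while n % d == 0 {
            factors.push(d);
            n /= d;
        }
        d += 1;
    }
    if n > 1 {
        factors.push(n);
    }
    factors
}

fn main() {
    let (sender, receiver) = mpsc::channel::<u32>();
    // std's Receiver is not Clone, so share the single receiving end
    // behind a Mutex to fan work out to several workers.
    let receiver = Arc::new(Mutex::new(receiver));
    let mut threads = Vec::new();
    for _ in 0..4 {
        let receiver = Arc::clone(&receiver);
        threads.push(thread::spawn(move || loop {
            // Note: this holds the lock while blocked in recv(), which
            // serializes the workers; one reason real MPMC channels exist.
            let next = receiver.lock().unwrap().recv();
            match next {
                Ok(num) => println!("{} = {:?}", num, factor_number(num)),
                Err(_) => break, // channel closed: no senders left
            }
        }));
    }
    for num in [12u32, 97, 360].iter() {
        sender.send(*num).expect("no receivers");
    }
    drop(sender); // close the channel so the workers exit
    for t in threads {
        t.join().expect("worker panicked");
    }
}
```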

SLIDE 74

Pick the right tool for the job

  • Using channels is often much simpler and safer than using mutexes + CVs
    ○ Even in Rust, mutexes can still cause problems if you lock/unlock at the wrong times
    ○ E.g. a semaphore will break if you unlock after cv.wait() and then re-lock before decrementing the counter. You hold the lock while touching the counter, so the compiler doesn’t complain, but there is still a race condition
  • However, channels aren’t always the best choice
    ○ Not very well suited for global values (e.g. caches or global counters)
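To make the cv.wait() pitfall concrete, here is a sketch of a counting semaphore that does it correctly: the lock is held from the wait all the way through the decrement. Releasing the lock after cv.wait() and re-acquiring it before decrementing would still compile, but two waiters could both observe count == 1 and both decrement.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// A counting semaphore built from a Mutex and a Condvar.
struct Semaphore {
    count: Mutex<usize>,
    cond: Condvar,
}

impl Semaphore {
    fn new(count: usize) -> Self {
        Semaphore {
            count: Mutex::new(count),
            cond: Condvar::new(),
        }
    }

    fn wait(&self) {
        let mut count = self.count.lock().unwrap();
        while *count == 0 {
            count = self.cond.wait(count).unwrap();
        }
        *count -= 1; // decrement while still holding the lock
    }

    fn signal(&self) {
        *self.count.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

fn main() {
    let sem = Arc::new(Semaphore::new(0));
    let sem2 = Arc::clone(&sem);
    let t = thread::spawn(move || sem2.signal());
    sem.wait(); // blocks until the spawned thread signals
    t.join().unwrap();
    println!("acquired");
}
```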