CS555: Distributed Systems [Fall 2019]

  • Dept. Of Computer Science, Colorado State University


CS 555: DISTRIBUTED SYSTEMS

[THREADS]

Shrideep Pallickara Computer Science Colorado State University

October 15, 2019


Frequently asked questions from the previous class survey


Topics covered in this lecture

• Threads
  - Contrasting with processes
  - Threads in Distributed Systems
  - An example of performance improvements with Threads
  - Threading architectures for Servers
  - State


THREADS


Threads execute their own piece of code independently of other threads, but …


• No attempt is made to achieve a high degree of concurrency transparency
  - Especially not at the cost of performance
• A thread system maintains only the information needed to allow a CPU to be shared among several threads
• Thread context
  - CPU context + thread management info
    * List of blocked threads


Information not strictly necessary to manage multiple threads is ignored

• Protecting data against inappropriate accesses by multiple threads in a process?
  - Developers must deal with this
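
One way a developer might "deal with this" is to guard shared data with a lock. The sketch below is illustrative only: `SharedCounter` and `run_workers` are hypothetical names, and a mutex is just one of several possible protection mechanisms.

```python
import threading


class SharedCounter:
    """A counter shared by several threads; the lock serializes updates."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic, so it is guarded by the lock.
        with self._lock:
            self.value += 1


def run_workers(num_threads, increments_per_thread):
    """Start num_threads threads that all update the same counter."""
    counter = SharedCounter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Without the `with self._lock:` line, concurrent increments could interleave and some updates could be lost; the thread package itself does nothing to prevent that.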


THREADS VS. MULTIPLE PROCESSES


Why prefer multiple threads over multiple processes?


• Threads are cheaper to create and manage than processes
• Resource sharing can be achieved more efficiently between threads than between processes
  - Threads within a process share the address space of the process
• Switching between threads is cheaper than switching between processes
• BUT … threads within a process are not protected from one another


Other costs for processes


• When a new process is created to perform a task, there are other costs
  - In a kernel supporting virtual memory, the new process will incur page faults
    * Due to data and instructions being referenced for the first time
• Hardware caches must acquire new cache entries for that particular process


Contrasting the costs for threads [1/2]


• With threads these overheads may also occur, but they are likely to be smaller
• When a thread accesses code & data that was accessed recently by other threads in the process?
  - It automatically takes advantage of any hardware or main memory caching


Contrasting the costs for threads [2/2]


• Switching between threads is much faster than switching between processes
• This is a cost that is incurred many times throughout the lifecycle of the thread or process


A process with multiple threads of control can perform more than 1 task at a time


[Figure: a traditional heavyweight process (code, data, files, one set of registers, one stack) beside a multithreaded process (shared code, data, and files; a separate set of registers and a separate stack per thread)]


Implications?

• Performance of a multithreaded application is seldom worse than that of a single-threaded one
  - In many cases it actually leads to performance gains
• Development requires additional effort
  - Threads have no automatic protection against each other


Thread use in non-distributed settings

• Interactive multithreaded application
  - Parts of program may be blocked or slow
  - Remainder of program may still chug along
• A single threaded process can ONLY run on 1 processor
  - Regardless of how many are available
  - Underutilization of computational resources


Another drawback of processes is the overheads for IPC (Inter Process Communications)

[Figure: IPC between Process A and Process B through the operating system: (1) switch from user space to kernel space, (2) switch context from process A to B, (3) switch from kernel space to user space]


Applications can be constructed using separate threads

• Communications handled entirely through shared data
  - Performance is much better
• Software engineering
  - Collection of several (generally independent) tasks
  - Word Processor
    * Input handling, spell check, layout, index generation …


THREADS IN DISTRIBUTED SYSTEMS


Threads in distributed systems: Multithreaded clients

• Hide communication latencies
  - Initiate communications
  - Immediately do something else
• Web browsers
  - As soon as main HTML page is fetched
    * Display it
  - Activate threads to retrieve other data types
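
The latency-hiding idea can be sketched as follows. This is a simplified illustration, not a browser implementation: `fetch` is a hypothetical stand-in that sleeps instead of doing network I/O, and `fetch_all` is likewise an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(resource, latency_s=0.05):
    """Stand-in for a blocking network fetch (hypothetical; sleeps instead of doing I/O)."""
    time.sleep(latency_s)
    return f"contents of {resource}"


def fetch_all(resources, max_threads=4):
    """Fetch every resource on its own thread so the waits overlap."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns results in the order of the inputs, not completion order.
        return list(pool.map(fetch, resources))
```

With four resources and four threads, the total wait is close to one fetch latency rather than the sum of all four, because each thread blocks independently.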

Interleave Identical Code


Several connections can be opened simultaneously

• To the same server
  - If the server is overloaded, things get even slower
• To replicated servers
  - Data transfer in parallel
  - Much faster rendering of content


Multithreaded Servers


• Simplifies server code
• Easier to develop servers that exploit parallelism
• E.g.: Handling concurrent connections
  - Each connection managed by a different thread
  - Multiple connections handled by a pool of threads


AN EXAMPLE OF PERFORMANCE IMPROVEMENTS WITH THREADS


Client and Server with Threads

[Figure: a client sends requests to the server; the server holds them in a request queue, may have up to N threads, and performs disk I/O]


Server side processing


• Server has a queue of requests received from clients
• Server also has a pool of one or more threads
  - Each thread repeatedly removes a request from the queue & processes it
• Each thread applies the same methods to process the requests
  - Each request takes 2 ms of processing PLUS 8 ms of I/O (when the server reads from disk, i.e., no caching)


Maximum server throughput with 1 thread


• The turnaround time for handling any request is 2 + 8 = 10 ms
• The server can handle 1000/10 = 100 requests per second
• Any new requests that arrive while the thread is handling a request?
  - These will be queued


Server throughput with 2 threads


• We assume that the threads are independently schedulable
  - One thread can be scheduled while the other is blocked for I/O
• Thread T2 can process a second request when thread T1 is blocked, and vice versa
• This increases throughput … but both threads may be blocked for I/O on the single disk drive
• If all I/O requests are serialized and take 8 ms each?
  - Maximum throughput is 1000/8 = 125 requests/second
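
The one-thread and two-thread figures can be written out as a short calculation. The symbols $t_{\text{cpu}}$, $t_{\text{io}}$, and $X_n$ (throughput with $n$ threads) are notation introduced here, and the model is a simplification: each thread handles requests one after another, and the single disk serves one request at a time.

```latex
\begin{align*}
t_{\text{cpu}} &= 2\ \text{ms}, \qquad t_{\text{io}} = 8\ \text{ms} \\
X_1 &= \frac{1000}{t_{\text{cpu}} + t_{\text{io}}} = \frac{1000}{10} = 100\ \text{requests/s} \\
X_2 &= \min\!\left(2 \cdot \frac{1000}{t_{\text{cpu}} + t_{\text{io}}},\ \frac{1000}{t_{\text{io}}}\right)
     = \min(200,\ 125) = 125\ \text{requests/s}
\end{align*}
```

Two threads could in principle complete 200 requests per second, but the serialized disk caps the rate at 125.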


Server throughput with disk block caching


• Server keeps data that it reads in buffers
• When a server thread tries to retrieve data
  - It first examines the cache and avoids disk accesses if it finds the data element there
• If the hit rate is 75%?
  - The mean I/O time per request reduces to (0.75 × 0 + 0.25 × 8) = 2 milliseconds
• Maximum theoretical throughput?
  - Becomes 1000/2 = 500 requests per second


But there are costs associated with caching


• Average processor time for a request increases
  - This is because it takes time to search for cached data for every operation
  - Let us assume that this is now 2.5 milliseconds
• The server can now handle 1000/2.5 requests per second, i.e., 400


Let’s look at caching plus multiple threads


• Each request takes about 2.5 ms (processing) + 2 ms (I/O)
  - Total time per request is now 4.5 ms when disk accesses are serialized
  - Each thread can do 1000/4.5 requests per second, i.e., 222 requests/second
• With two threads?
  - 444 requests/second
• With three threads?
  - 500 requests/second (bound by the I/O time)
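
The throughput figures in this example can be reproduced with a small bottleneck model. This is a sketch under stated assumptions, not a formula from the slides: `max_throughput` and `mean_io_ms` are hypothetical helper names, and the `processors` parameter is an added assumption. The 444 and 500 requests/second figures exceed the 400 requests/second processor bound computed for caching, so the sketch treats them as the case where each thread has its own processor.

```python
def mean_io_ms(hit_rate, disk_ms):
    """Expected I/O time per request when cache hits cost no disk time."""
    return hit_rate * 0.0 + (1.0 - hit_rate) * disk_ms


def max_throughput(cpu_ms, io_ms, threads, processors):
    """Upper bound on requests/second under a simple bottleneck model.

    Assumptions (not from the slides): each request needs cpu_ms of processor
    time followed by io_ms on a single disk that serves one request at a time.
    """
    per_thread_bound = threads * 1000.0 / (cpu_ms + io_ms)  # each thread works serially
    disk_bound = 1000.0 / io_ms                              # one disk, serialized I/O
    cpu_bound = processors * 1000.0 / cpu_ms                 # processor time available
    return min(per_thread_bound, disk_bound, cpu_bound)
```

For example, `max_throughput(2, 8, 1, 1)` gives the single-thread, no-cache case, and `max_throughput(2.5, mean_io_ms(0.75, 8), 3, 3)` gives the three-thread case with caching, which is limited by the disk.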


THREADING ARCHITECTURES FOR SERVERS


Worker pool architecture

• Server creates a fixed pool of worker threads to process requests
  - Pool is initialized when server starts up
• Incoming requests are placed into a queue
  - Workers retrieve requests (work units) from the queue and process them
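
A minimal sketch of the worker pool architecture is shown below. The class name `WorkerPool`, its methods, and the sentinel-based shutdown are illustrative choices made here, not something the slides prescribe.

```python
import queue
import threading


class WorkerPool:
    """Fixed pool of worker threads fed from one shared request queue."""

    _STOP = object()  # sentinel telling a worker to exit

    def __init__(self, num_workers, handler):
        self._requests = queue.Queue()  # thread-safe, so workers need no extra locking
        self._handler = handler
        self.results = queue.Queue()
        # The pool is created once, at startup.
        self._workers = [threading.Thread(target=self._run) for _ in range(num_workers)]
        for w in self._workers:
            w.start()

    def submit(self, request):
        """Place an incoming request (work unit) on the shared queue."""
        self._requests.put(request)

    def _run(self):
        while True:
            request = self._requests.get()
            if request is WorkerPool._STOP:
                break
            self.results.put(self._handler(request))

    def shutdown(self):
        """Let queued requests drain, then stop every worker."""
        for _ in self._workers:
            self._requests.put(WorkerPool._STOP)
        for w in self._workers:
            w.join()
```

`queue.Queue` does the coordination on the shared queue internally; a hand-rolled queue would need its own lock and condition variable.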


Managing priorities in the worker pool?


• Introduce multiple queues
• Worker threads scan queues in the order of descending priority
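
The scan over multiple queues might look like the following sketch. `next_request` is a hypothetical helper, and it polls without blocking, which is a simplification; a real worker would wait when every queue is empty.

```python
import queue


def next_request(queues_by_priority):
    """Return the next request, scanning queues from highest to lowest priority.

    queues_by_priority is a list of queue.Queue objects, highest priority first.
    Returns None when every queue is empty.
    """
    for q in queues_by_priority:
        try:
            return q.get_nowait()
        except queue.Empty:
            continue  # nothing at this priority; try the next one down
    return None
```

One consequence of strict descending-priority scanning is that low-priority requests can starve while higher-priority queues stay busy.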


Disadvantages of the worker pool model

• Number of worker threads is fixed
  - So, threads in the pool may be too few to adequately cope with the rate of requests
• Need to account for coordinated accesses to the shared queue


Thread-per-request architecture


• Worker thread is spawned for each incoming request
  - Worker thread destroys itself after processing request
• Advantages:
  - Threads do not contend for the shared work-queue
  - Throughput is potentially maximized
• Disadvantage:
  - Overhead for thread creation and destruction operations
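
As a rough sketch of thread-per-request, the function below spawns one short-lived thread for each request. `serve_thread_per_request` is a hypothetical name, and taking the requests as a list (rather than from a network listener) is a simplification.

```python
import threading


def serve_thread_per_request(requests, handler):
    """Spawn one short-lived thread per request and collect the responses."""
    responses = [None] * len(requests)

    def handle(index, request):
        # Each thread writes only its own slot, so no lock is needed here.
        responses[index] = handler(request)
        # Returning ends the thread: it is gone after serving one request.

    threads = []
    for i, request in enumerate(requests):
        t = threading.Thread(target=handle, args=(i, request))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return responses
```

There is no shared work queue to contend for, but every request pays for creating and tearing down a thread.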


Thread-per-connection architecture


• Associates a thread with each connection
• New worker thread created when a client makes a connection
  - Destroyed when client closes the connection
• Client may make many requests over the connection
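
A minimal sketch of thread-per-connection follows. To stay self-contained it uses `socket.socketpair()` in place of a listening TCP socket and accept loop; that substitution, the line-based request format, and the upper-casing "service" are all assumptions made for illustration.

```python
import socket
import threading


def handle_connection(conn):
    """Serve every request on one connection until the client closes it."""
    with conn, conn.makefile("rb") as reader, conn.makefile("wb") as writer:
        for line in reader:             # one request per line
            writer.write(line.upper())  # hypothetical service: upper-case echo
            writer.flush()
    # Leaving the function ends the thread once the client has closed.


def connect_client():
    """Create a connected socket pair and start a thread for the server side."""
    client, server_side = socket.socketpair()
    worker = threading.Thread(target=handle_connection, args=(server_side,))
    worker.start()
    return client, worker
```

The same thread serves every request the client sends on that connection, so thread creation is paid once per connection rather than once per request.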


Thread-per-object architecture

• Associate a thread with each remote object
• A separate thread receives requests and queues them
  - But there is a queue per-object
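
The per-object queues can be sketched as below. `ObjectWorker` and `dispatch` are hypothetical names; `dispatch` plays the role of the separate receiving thread's routing step, and the `(None, (), None)` shutdown marker is a convention invented for this example.

```python
import queue
import threading


class ObjectWorker:
    """One remote object with its own request queue and its own thread."""

    def __init__(self, obj):
        self.obj = obj
        self.requests = queue.Queue()  # the per-object queue
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            method_name, args, reply = self.requests.get()
            if method_name is None:    # shutdown marker (hypothetical convention)
                break
            reply.put(getattr(self.obj, method_name)(*args))


def dispatch(workers, object_id, method_name, *args):
    """Route a request to its target object's queue; returns a queue for the reply."""
    reply = queue.Queue(maxsize=1)
    workers[object_id].requests.put((method_name, args, reply))
    return reply
```

Because only one thread ever touches a given object, invocations on that object are serialized without explicit locking.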


Thread-per-connection & Thread-per-object

• Advantages
  - Server benefits from lower thread management overheads compared to thread-per-request
• Disadvantages
  - Clients may be delayed when a worker thread has several outstanding requests, but another thread has no work to perform


SERVER DESIGN ISSUES


Server Design Issues

• Iterative Servers
  - Handles request
  - Returns response to requesting client
• Concurrent Servers
  - Pass request to a separate thread/process
    * Multithreaded server
  - Await new incoming request


The endpoint issue

• Clients send their requests to an endpoint
  - Port on which a server listens
• But how do clients know about a port?
  - Globally assign endpoints for well-known services
    * Internet Assigned Numbers Authority (IANA)
    * FTP {TCP, 21}, HTTP {TCP, 80}


Implementing each service with a separate server could waste resources

• Instead of having multiple servers awaiting client requests
  - Have a single super-server
• INETD daemon on Unix
  - Listens to several ports for Internet services
    * POP3 {110}, FTP {21}, Telnet {23}
  - When request comes in:
    ① Fork process to handle it
    ② Process exits once done


Designing Servers: Support interruption


• Terminate client session
  - Server will eventually detect connection loss (TCP)
• Send out-of-band data
  - Data to be processed before any other client data
• But how can we send this out-of-band data?
  ① Send to a different port
  ② Reuse same connection
    * TCP urgent data, e.g., socket.sendUrgentData(int data)


STATE


Tracking State in Servers

• Stateless servers
• Stateful servers


Stateless servers

• No state information about clients
  - E.g. Web Servers
• Usually some state is maintained
  - Log of documents accessed by client
  - But if this is lost, there should be no disruption of service
• Soft state: track state for a limited time
  - When timer elapses, revert to default behavior
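
Soft state can be illustrated with a small table whose entries expire. The class `SoftState`, its method names, and the injectable `clock` parameter are assumptions made for this sketch.

```python
import time


class SoftState:
    """Per-client state that silently expires after ttl_seconds."""

    def __init__(self, ttl_seconds, default, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._default = default
        self._clock = clock   # injectable so expiry can be exercised without sleeping
        self._entries = {}    # client_id -> (value, expiry_time)

    def remember(self, client_id, value):
        """Record state for a client and (re)start its timer."""
        self._entries[client_id] = (value, self._clock() + self._ttl)

    def lookup(self, client_id):
        entry = self._entries.get(client_id)
        if entry is None or self._clock() >= entry[1]:
            # Timer elapsed (or never set): drop the entry and fall back to the default.
            self._entries.pop(client_id, None)
            return self._default
        return entry[0]
```

Losing the whole table, for example in a crash, only means every client gets the default behavior until its state is recorded again, so service is not disrupted.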


Stateful servers

• Maintain persistent information on clients
• Use this to improve performance
  - Real and perceived
• Special measures needed to recover from failures


Stateful servers: A file server example


• Allows client to maintain local copy of file
  - Even for updates to the file
  - Maintain {client,file} tuples to track file state
• Identify who has most recent version of file
• If server crashes it must recover the {client,file} entries
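
The {client,file} bookkeeping might be sketched as follows. `FileStateTable` and its methods are hypothetical, and the table lives only in memory here, which is exactly why a real stateful server needs extra recovery measures.

```python
class FileStateTable:
    """Tracks which clients hold a local copy of each file, and who wrote last."""

    def __init__(self):
        self._holders = {}        # file name -> set of client ids with a local copy
        self._latest_writer = {}  # file name -> client id holding the newest version

    def checkout(self, client, file):
        """Record that a client now keeps a local copy of the file."""
        self._holders.setdefault(file, set()).add(client)

    def record_update(self, client, file):
        """Record that a client updated its local copy."""
        self.checkout(client, file)
        self._latest_writer[file] = client

    def most_recent_holder(self, file):
        # None means no client has updated the file; the server copy is current.
        return self._latest_writer.get(file)

    def entries(self):
        """The {client, file} tuples that would need recovery after a crash."""
        return sorted((c, f) for f, clients in self._holders.items() for c in clients)
```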


A hybrid approach: Have the client send its state to the server


• Cookies serve this purpose for Web pages
• Tells a site about the pages accessed by a user
  - Use this to decide how to manage client
  - Sent back to browser every time state info changes
• Cookies don’t stay where they are baked!
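
The hybrid approach can be sketched with Python's standard `http.cookies` module. The handler name `handle_request`, the cookie name `visited`, and the `|`-separated encoding are all assumptions chosen for illustration.

```python
from http.cookies import SimpleCookie


def handle_request(path, cookie_header=""):
    """Stateless handler: all per-client state arrives in, and leaves by, the cookie."""
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    visited = incoming["visited"].value.split("|") if "visited" in incoming else []
    visited.append(path)

    outgoing = SimpleCookie()
    outgoing["visited"] = "|".join(visited)
    # Value for the Set-Cookie response header, sent back because the state changed.
    set_cookie = outgoing["visited"].OutputString()
    return visited, set_cookie
```

The server keeps nothing between calls; the browser is expected to return the latest cookie with its next request, which is how the site learns which pages the user has accessed.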


The contents of this slide-set are based on the following references

• Distributed Systems: Principles and Paradigms. Andrew S. Tanenbaum and Maarten Van der Steen. 2nd Edition. Prentice Hall. ISBN: 0132392275/978-0132392273. [Chapter 6, 2]
• Distributed Systems: Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair. 5th Edition. Addison Wesley. ISBN: 978-0132143011. [Chapter 7, 14]
• Operating System Concepts. Avi Silberschatz, Peter Galvin, Greg Gagne. 8th Edition. John Wiley & Sons, Inc. ISBN-13: 978-0-470-12872-5. [Chapter 4]
• Unix Systems Programming. Kay Robbins & Steve Robbins. 2nd Edition. Prentice Hall. ISBN: 978-0-13-042411-2. [Chapter 2]
