Memory Management, Marco Serafini, COMPSCI 532, Lecture 12 (PowerPoint presentation)



SLIDE 1

Memory Management

Marco Serafini

COMPSCI 532 Lecture 12

SLIDE 2

Announcements

  • Project 2 published
  • Info on website
  • GitHub repo on Piazza
  • Deadline: November 7
  • Next class: hands-on session
  • Threads, processes, sockets, …
  • Bring your laptop with Java SDK installed!
  • AWS credits: invitation published on Piazza
SLIDE 3

Virtual Memory

SLIDE 4

Virtual Memory in Everyday Life

  • Tabs store webpages in main memory (RAM)
  • Many tabs → data written to disk
  • Switch to old tabs → disk reads
  • The whole system becomes slow

SLIDE 5

Programmer’s View of Memory

[Diagram: address space from 0000…0000 to FFFF…FFFF: global vars, stack, heap, free space]

  void foo(int i) {
    int x = i;          // x in stack
    int *z = new int;   // *z in heap
    *z = i;
  }

SLIDE 6

Many Abstractions…

  • Our code has dedicated memory
  • As much memory as addressable
  • 2^64 - 1 = 1.8 * 10^19 addresses
  • One address, one 64-bit word
  • ~ 1.47 * 10^20 bytes
  • > 10^5 petabytes (or 100 exabytes)
  • Neither assumption holds!

Virtual memory implements these abstractions

[Diagram: address space from 0000…0000 to FFFF…FFFF: global vars, stack, heap, free space]
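The arithmetic above can be checked directly; a quick sketch (BigInteger avoids 64-bit overflow when multiplying out the byte count):

```java
import java.math.BigInteger;

public class AddressSpace {
    public static void main(String[] args) {
        // 2^64 addressable locations, each naming a 64-bit (8-byte) word
        BigInteger addresses = BigInteger.valueOf(2).pow(64);
        BigInteger bytes = addresses.multiply(BigInteger.valueOf(8));
        System.out.println(addresses); // 18446744073709551616 ≈ 1.8 * 10^19
        System.out.println(bytes);     // 147573952589676412928 ≈ 1.47 * 10^20
    }
}
```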

SLIDE 7

Memory Hierarchy and Caching

Fast memory (small, expensive)
Slow memory (large, cheap)

Cache recently read data in fast memory; evict inactive data to slow memory

General pattern

SLIDE 8

Memory Hierarchy and Caching

Fast memory (small, expensive)
Slow memory (large, cheap)

Cache recently read data in fast memory; evict inactive data to slow memory

Virtual memory is an instance of the general pattern: fast memory = physical memory (RAM), slow memory = disk

SLIDE 9

Virtual Memory

[Diagram: the CPU executes a store/load on a virtual address in the range 1 … 2^64 - 1; address translation (virtual → physical) either (1) accesses the physical address in main memory and returns the actual data, or (2) accesses the disk location, caches the page in main memory, and then returns the actual data]

Virtual Address Space (one separate per process): each byte has a 64-bit VIRTUAL address. The data layout as seen by the program is virtual.

SLIDE 10

Extended Memory Hierarchy

  • Hierarchy is getting deeper
  • Non-volatile memory
  • Solid State Drives (SSD)
  • We will consider a simplified view, as most OSs do
  • Swap partition on disk
  • File or separate logical drive

[Diagram: hierarchy of caches, main memory, disk, and swap]

SLIDE 11

Advantages of Virtual Memory

  • For programmers
  • Can use more space than main memory…
  • … with limited performance overhead
  • ... without having to explicitly code disk spilling
  • For the Operating System
  • Isolation: Each process has its own Virtual Address Space
  • Easier to enforce memory protection
SLIDE 12

Virtual Memory v1

Basic functionalities: Address translation + caching

SLIDE 13

Granularity

  • Question: Which level of granularity is better?

1. Group contiguous virtual addresses in a block…

  • … and move the block to and from disk at once

2. Store each virtual address separately…

  • … and move single bytes back and forth from disk
  • Correct answer: 1
  • A virtual memory block is called a page
  • Block of contiguous memory addresses
  • Treated as an “atom” by virtual memory
  • Usually a few KB (e.g., 4 KB on my laptop)
SLIDE 14

Address Translation

[Diagram: virtual address V = page number P (52 bits) + page offset O (12 bits). Step 1: read the page table entry at page table address + (P * size(row)); the entry holds the page address A plus control bits (dirty? on disk? …). There is one page table per process. Step 2: access the byte at A + O within memory page P. Pages are 2^12 = 4096 bytes = 4 KB]

2 main memory accesses (if page in main memory)
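The split of a virtual address into page number and offset is plain bit arithmetic. A minimal sketch, assuming the 4 KB pages (12-bit offset) from the slide; the class and method names are illustrative:

```java
public class AddressTranslation {
    static final int OFFSET_BITS = 12;                 // 2^12 = 4096-byte pages
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    // High 52 bits: index into the page table
    static long pageNumber(long virtualAddress) {
        return virtualAddress >>> OFFSET_BITS;
    }

    // Low 12 bits: offset within the page
    static long offset(long virtualAddress) {
        return virtualAddress & OFFSET_MASK;
    }

    public static void main(String[] args) {
        long v = 0x12345678L;                          // example virtual address
        System.out.println(pageNumber(v));             // 74565 (0x12345)
        System.out.println(offset(v));                 // 1656 (0x678)
    }
}
```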

SLIDE 15

Granularity: Advantages of Paging

  • A byte-granularity translation table would be huge
  • The page table is smaller
  • Sequential disk I/O is faster than random access
  • We read one full page at a time
  • Access locality
  • A process accessing an address is likely to access a nearby address soon thereafter

SLIDE 16

Caching: Fetching Pages from Disk

  • “Residency” control bit says if page is in memory
  • If page is not in memory: page fault
  • How virtual memory deals with page faults:

  1. Get the disk position from the page table
  2. Invoke the OS to fetch the page from disk (swap file)
  3. Store the page in memory, update the page table
  4. Return the memory address
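The steps above can be sketched as follows; `PageTableEntry`, `Pager`, and `fetchFromSwap` are hypothetical names for illustration, not a real OS interface:

```java
import java.util.HashMap;
import java.util.Map;

class PageTableEntry {
    boolean resident;      // "residency" control bit
    long diskPosition;     // where the page lives in the swap file
    long frameAddress;     // physical frame, valid only if resident
}

class Pager {
    Map<Long, PageTableEntry> pageTable = new HashMap<>();

    long access(long pageNumber) {
        PageTableEntry e = pageTable.get(pageNumber);
        if (!e.resident) {                                  // page fault
            long frame = fetchFromSwap(e.diskPosition);     // steps 1-2: read page from swap
            e.frameAddress = frame;                         // step 3: update page table
            e.resident = true;
        }
        return e.frameAddress;                              // step 4: return memory address
    }

    long fetchFromSwap(long diskPosition) {
        // Stand-in for the disk read; returns a fake frame address
        return diskPosition * 4096;
    }
}
```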

SLIDE 17

Caching: Evicting Pages to Disk

  • If main memory is full, remove some page (eviction)
  • Update the residency bit and address in page table
  • “Modified” control bit: if modified, write evicted page to disk
  • Which pages to evict? Least Recently Used (LRU)
  • A simple implementation

  1. Keep an “accessed” control bit
  2. Periodically reset it
  3. Pages that were not accessed can be removed

  • In practice more complex than this (requires hardware support)
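The accessed-bit approximation of LRU above can be sketched like this (illustrative names; real kernels use hardware-set bits and variants such as the clock algorithm):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClockishEviction {
    Map<Long, Boolean> accessed = new LinkedHashMap<>(); // page -> accessed bit

    void touch(long page) { accessed.put(page, true); }  // set the bit on access

    // Periodically: pages whose bit is still false are eviction candidates,
    // then all bits are reset for the next interval.
    List<Long> sweep() {
        List<Long> candidates = new ArrayList<>();
        for (Map.Entry<Long, Boolean> e : accessed.entrySet()) {
            if (!e.getValue()) candidates.add(e.getKey());
            e.setValue(false);                           // reset for next period
        }
        return candidates;
    }
}
```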
SLIDE 18

Questions

  • Address translation occurs infrequently
  • True / false?
  • Address translation is better done in software
  • True / false?
  • The page table should be stored inside the CPU
  • True / false?
SLIDE 19

Virtual Memory v2

Speeding up translation

SLIDE 20

Hardware Support

  • Software implementation (OS only) is too slow
  • Solution: some VM logic implemented in hardware
  • Memory Management Unit (MMU)
  • Translation Lookaside Buffer (TLB)

[Diagram: the CPU package contains the CPU, the MMU, and the TLB (a page table cache), connected by the bus to main memory and the disk controller. The CPU issues virtual addresses; the MMU implements address translation and accesses physical addresses directly, or disk addresses with the help of the OS]

SLIDE 21

Translation Lookaside Buffer

  • Cache of page table entries
  • Entries: <Page number, Page address, Control bits>
  • Question: Can we access it using offsets like the page table?
  • No: the TLB is a fully associative cache
  • Need to compare all entries to find the matching one
  • This is done in parallel in hardware
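A sequential software sketch of the fully associative lookup; in hardware all comparisons happen at once, and the names here are illustrative:

```java
class TlbEntry {
    long pageNumber;   // tag compared against the virtual page number
    long pageAddress;  // cached physical page address
}

class Tlb {
    TlbEntry[] entries;
    Tlb(TlbEntry[] entries) { this.entries = entries; }

    // Fully associative: any entry may hold any page, so every entry
    // must be checked. Hardware does these comparisons in parallel.
    Long lookup(long pageNumber) {
        for (TlbEntry e : entries) {
            if (e.pageNumber == pageNumber) return e.pageAddress; // TLB hit
        }
        return null;                                              // TLB miss: walk the page table
    }
}
```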
SLIDE 22

Question: Page Table in Memory?

[Diagram: the same address translation picture as Slide 14: virtual address V = 52-bit page number P + 12-bit offset O; the per-process page table holds page address A and control bits (dirty? on disk? …); step 1 reads the entry at page table address + (P * size(row)), step 2 accesses byte A + O in the 4 KB page]

SLIDE 23

Virtual Memory v3

Dealing with large page tables

SLIDE 24

Page Table in Memory

  • Page tables can be very large
  • 32-bit architecture: a few (~4) megabytes
  • 64-bit architecture: (2^52 = 4.5 * 10^15 entries) * a few bytes each
  • How to avoid keeping everything in memory?
  • Hierarchical page tables
  • Indirection with intermediate page tables
  • Partition page tables in chunks
  • Not all page tables chunks in memory
  • Inverted page tables
  • Hash map
  • Need additional data structures to track pages on disk
SLIDE 25

Hierarchical (2-Level) Page Table

[Diagram: virtual address V = page number (52 bits, split into P1 and P2) + page offset O (12 bits). Step 1: read the page directory (one per process) at page directory address + (P1 * size(row)), obtaining the chunk address PC. Step 2: read the page table chunk (which could itself be on disk) at PC + (P2 * size(row)), obtaining page address A and control bits (dirty? on disk? …). Step 3: access A + O (the page could be in memory or on disk)]
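The two-level split can be sketched with bit arithmetic; the even 26/26 division of the 52-bit page number is an assumption for illustration (real architectures use different splits and more levels):

```java
public class TwoLevel {
    static final int OFFSET_BITS = 12;  // 4 KB pages
    static final int P2_BITS = 26;      // assumed chunk-index width

    // P1: index into the page directory (top bits)
    static long p1(long v) { return v >>> (OFFSET_BITS + P2_BITS); }

    // P2: index into the selected page table chunk (middle bits)
    static long p2(long v) { return (v >>> OFFSET_BITS) & ((1L << P2_BITS) - 1); }

    // O: offset within the page (low bits)
    static long offset(long v) { return v & ((1L << OFFSET_BITS) - 1); }
}
```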

SLIDE 26

Inverted Page Table

[Diagram: virtual address V = 52-bit page number P + 12-bit offset O. The inverted page table (one per process) keeps only N elements. Step 1: read the entry at page table address + (hash(P) mod N), which must consider hash collisions, obtaining page address A and control bits (dirty? on disk? …). Step 2: access byte A + O within memory page P (2^12 = 4096 bytes = 4 KB). Additional data structures (not depicted) track pages on disk]
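A sketch of the hashed lookup with collision handling; linear probing is one possible scheme, chosen here for illustration, and the names are not a real OS interface:

```java
class InvertedPageTable {
    long[] pageNumbers;    // which virtual page occupies slot i (-1 = empty)
    long[] pageAddresses;  // physical page address for that slot
    int n;                 // table keeps only N elements

    InvertedPageTable(int n) {
        this.n = n;
        pageNumbers = new long[n];
        pageAddresses = new long[n];
        java.util.Arrays.fill(pageNumbers, -1);
    }

    int slot(long p) { return Math.floorMod(Long.hashCode(p), n); }

    // Assumes the table is not full (a sketch, not a production structure)
    void put(long p, long addr) {
        int i = slot(p);
        while (pageNumbers[i] != -1 && pageNumbers[i] != p)
            i = (i + 1) % n;              // probe past collisions
        pageNumbers[i] = p;
        pageAddresses[i] = addr;
    }

    Long get(long p) {
        int i = slot(p);
        while (pageNumbers[i] != -1) {
            if (pageNumbers[i] == p) return pageAddresses[i];
            i = (i + 1) % n;              // collision: keep probing
        }
        return null;                      // not resident: consult the on-disk structures
    }
}
```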

SLIDE 27

Dynamic Page Sizing

  • Some architectures support multiple sizes
  • Examples: 4 KB, 2 MB, 1 GB, …
  • Linux: Huge Pages
  • Windows: Large Pages
  • FreeBSD: SuperPages
SLIDE 28

Virtual Memory v4

Memory Protection

SLIDE 29

Memory Protection

  • Page table / TLB store access rights
  • Read-Write, Read-only, Execute only
  • Which processes can access page (for sharing)
  • Only OS can change tables
  • I.e. only kernel-mode operations
SLIDE 30

Memory-Mapped Files

SLIDE 31

Memory-Mapped Files (mmap)

  • Goal: Use VM to cache your application file to memory
  • OS lets you map a file to a virtual memory space
  • Same VM mechanisms, but on your file (not swap file)
  • You access the file as if it were in memory
  • The OS transfers data between the file and memory for you
  • Problem: you never know when data is written to disk…
  • Reads by multiple threads → excessive disk I/O (potentially)
  • Upon a crash, data might be lost (if a page has not yet been written to disk)
  • ... unless you flush a page, which forces a write to disk
  • So use with care!
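Java exposes memory-mapped files through `java.nio` (`FileChannel.map`); `force()` is the explicit flush mentioned above. A minimal sketch:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("mmap-demo", ".bin");
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map one 4 KB page of the file into the virtual address space
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putInt(0, 42);   // looks like a memory write...
            buf.force();         // ...but only force() guarantees it reaches disk
            System.out.println(buf.getInt(0)); // 42
        }
        Files.delete(path);
    }
}
```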
SLIDE 32

Languages with Managed Memory

SLIDE 33

Traditional Languages (C/C++)

  • Basic heap management
  • Must manually allocate/deallocate regions (malloc/free)
  • Must manage pointers
  • Problem: difficult to tell if a region is still in use
  • Memory leaks: memory is never deallocated
  • Dangling pointers: memory incorrectly reclaimed for objects that are still in use
  • Hard to debug: memory access may return “junk”
  • Advantage: control
SLIDE 34

Memory-Managed Languages (Java)

  • “No pointers”: object variables are pointers
  • Garbage collection deallocates memory for you
  • Advantage: never read “junk”
  • Disadvantage: performance bottlenecks
  • Overhead can be mitigated by understanding how GC works!
  • Goals of GC
  • Maximize memory utilization
  • Minimize memory fragmentation
  • Minimize GC overhead
SLIDE 35

Generational Garbage Collection

  • Standard approach used by Java and other languages
  • Assumptions
  • Most instantiated objects are short-lived
  • Few pointers between long-lived and short-lived objects
  • Approach: keep two generations
  • Young generation: recently instantiated objects
  • Old generation: non-recently instantiated objects
  • The following discussion targets Java, but the concepts are general

SLIDE 36

Determining “Live” Objects

  • Algorithm called “marking”
  • Start from a set of root objects
  • Traverse all references (pointers)
  • Objects that are never reached are no longer referenced
SLIDE 37

Young Generation

  • Divided into three sections
  • Eden
  • Two survivor spaces, S0 and S1
  • New objects are instantiated in Eden
  • Survivor spaces are labeled “From” and “To”
  • From contains objects, To is empty
  • When Eden fills up, a minor garbage collection runs
SLIDE 38

Minor Garbage Collection

  • “Stop-the-world”: the JVM stops while GCing
  • Steps
  • Scan Eden and the From survivor space
  • Copy all live objects to the To area
  • Logically clear the Eden and From areas
  • Switch the From and To labels
  • After n copies, objects are promoted to the Old generation
  • Logic
  • The Eden and From areas are always compact
  • When scanning, few objects will be live and copied
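A toy sketch of the copying step, purely illustrative: a real collector discovers live objects by marking from the roots, rather than receiving them as an explicit set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MinorGc {
    List<Object> eden = new ArrayList<>();
    List<Object> from = new ArrayList<>();
    List<Object> to = new ArrayList<>();

    void collect(Set<Object> live) {
        // Copy live objects from Eden and From into To
        for (Object o : eden) if (live.contains(o)) to.add(o);
        for (Object o : from) if (live.contains(o)) to.add(o);
        eden.clear();                 // logically clear Eden and From
        from.clear();
        List<Object> tmp = from;      // swap the From and To labels
        from = to;
        to = tmp;
    }
}
```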
SLIDE 39

Major Garbage Collection

  • Run less frequently
  • “Stop-the-world”
  • Steps
  • Scan Old generation
  • Remove objects that are not referenced
  • Compact
SLIDE 40

Permanent Generation

  • Used by JVM to store metadata (classes etc.)
  • Limited control over it
SLIDE 41

GC and Data Processing

  • Large datasets, potentially LOTS of objects
  • Eden can fill up very quickly
  • Very frequent minor GCs
  • GC at one node can slow down the whole system
  • GC adds further load to CPU
SLIDE 42

Workarounds

  • Reuse objects
  • Say that you are scanning a table of objects
  • Instantiate a single cursor object and reassign its fields
  • Eventually this object is promoted to the Old generation and is no longer copied by minor GCs
  • Use primitive data types
  • Boxing: Integer vs. int
  • int[] vs. ArrayList<Integer>
  • Beware: Strings, Integers etc. are immutable
  • Specialized libraries (fastutil, koloboke, …)
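Both workarounds above can be illustrated in a few lines; the `Cursor` class and the table contents are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class GcWorkarounds {
    static class Cursor { int id; long value; } // reused, never reallocated

    public static void main(String[] args) {
        int[] table = {10, 20, 30};             // int[] instead of ArrayList<Integer>

        // Reuse: one Cursor for the whole scan instead of one object per row,
        // so Eden does not fill with short-lived objects.
        Cursor c = new Cursor();
        long sum = 0;
        for (int i = 0; i < table.length; i++) {
            c.id = i;                           // reassign fields instead of "new Cursor()"
            c.value = table[i];
            sum += c.value;
        }
        System.out.println(sum);                // 60

        // Boxing: each add() here allocates an Integer object on the heap,
        // which the int[] above avoids entirely.
        List<Integer> boxed = new ArrayList<>();
        for (int v : table) boxed.add(v);
    }
}
```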