Dynamic Large Pages, Dave Hansen, IBM Linux Technology Center

SLIDE 1

Dynamic Large Pages

Dave Hansen

IBM Linux Technology Center

SLIDE 2

The Linux Foundation Confidential 2

Why Large Pages?

  • Fewer objects to manage
  • Fit more objects in CPU caches
  • Per-page operations become cheaper
  • Per-page structures become “smaller”
  • Any cache miss is increasingly expensive

  • They are “special” on Linux
SLIDE 3

“Old” Workloads

  • Performance is critical
  • Grew out of HPC/DB space
  • Large Memory Footprint
  • Page-level handling (faults, etc...)
  • Willing to work around usability
  • mlock() tolerance
  • “Custom” Applications
SLIDE 4

State of the Art

  • Interfaces: fs / SHM / libhugetlbfs
  • Faulting replaces preallocation
  • COW support for private at fork()
  • Reservations
  • Quota Support
  • NUMA Policy Awareness
  • Lumpy Reclaim
  • Gigantic Pages
SLIDE 5

Linux VM Support

  • NUMA
      • Delayed allocation: better locality
      • Round-robin pool population
  • Deterministic COW support
      • Needed for MAP_PRIVATE support
      • MAP_PRIVATE needed for transparent replacement
  • Lumpy Reclaim
SLIDE 6

Admin Interfaces

  • Display
      • /proc/meminfo (incomplete)
      • /sys/kernel/mm/hugepages
  • Configuration
      • hugepages=
      • /proc/sys/vm/nr_hugepages (r/w)
          • Not 100% dependable
      • kernelcore=
  • hugeadm – wrapper for all of these
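As a rough illustration of the display side, the huge page counters in /proc/meminfo can be pulled out with a few lines of Python. This is only a sketch: the sample text and its values are made up, and the helper names are hypothetical. The field names (HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize) are the ones the kernel prints. These fields describe only the default huge page size, which is presumably what "incomplete" refers to.

```python
import re

# Hypothetical sample of /proc/meminfo content; the numbers are invented.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
HugePages_Total:      64
HugePages_Free:       48
HugePages_Rsvd:        8
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

def parse_hugepage_fields(meminfo_text):
    """Return the HugePages_* counters and Hugepagesize (in kB) as ints."""
    fields = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def pool_summary(fields):
    """Derive the pool size and the free, unreserved space, both in kB."""
    size_kb = fields["Hugepagesize"]
    available = fields["HugePages_Free"] - fields["HugePages_Rsvd"]
    return {
        "pool_kb": fields["HugePages_Total"] * size_kb,
        "available_kb": available * size_kb,
    }

fields = parse_hugepage_fields(SAMPLE_MEMINFO)
summary = pool_summary(fields)
print(fields)
print(summary)
```

On a real system the same parser could be fed the contents of /proc/meminfo instead of the sample string.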
SLIDE 7

Multiple HW Sizes

  • ppc64: 4k, 64k, 16M, 16GB*
  • x86: 4k, 2M/4M, 1GB*
  • ia64: everything
  • parisc: 4M
  • s390: 4k, 1M
  • sh: 64k, 256k, 1M, 4M, 64M, 512M
  • sparc: 64k, 512k, 4M

* Gigantic page size
Base page size option

SLIDE 8

Gigantic Pages

  • amd64: 1GB
  • powerpc: 16GB
  • Early allocation is required
      • before power on – ppc
      • boot-time – x86
  • Separate pools from regular page allocation and other huge pages

SLIDE 9

Multi-size support

  • Compile time selection?
  • Gigantic pages mean no one-size-fits-all approach can possibly work

  • sysfs interfaces
  • enumerate/allocate
  • Permit multiple mounts
  • Separate allocation pools
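The "enumerate" half of the sysfs interface can be sketched as follows. The kernel exposes one directory per supported size under /sys/kernel/mm/hugepages, named like hugepages-2048kB, each with its own nr_hugepages file. The function names here are hypothetical, and the code simply returns an empty result on systems without that directory.

```python
import os
import re

SYSFS_HUGEPAGES = "/sys/kernel/mm/hugepages"

def parse_pool_dir(name):
    """Turn a directory name such as 'hugepages-2048kB' into a size in bytes."""
    match = re.fullmatch(r"hugepages-(\d+)kB", name)
    return int(match.group(1)) * 1024 if match else None

def enumerate_pools(root=SYSFS_HUGEPAGES):
    """Map each supported huge page size (bytes) to its nr_hugepages count."""
    pools = {}
    if not os.path.isdir(root):
        return pools  # no hugetlb support here, or not a Linux system
    for name in os.listdir(root):
        size = parse_pool_dir(name)
        if size is None:
            continue
        try:
            with open(os.path.join(root, name, "nr_hugepages")) as f:
                pools[size] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
    return pools

print(parse_pool_dir("hugepages-2048kB"))
print(enumerate_pools())
```

Allocation goes through the same files: writing a count to a size's nr_hugepages asks the kernel to resize that pool, which requires root and, as the Admin Interfaces slide warns, is not guaranteed to succeed in full.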
SLIDE 10

Virtualization - KVM

  • Large memory use
  • mlock()
  • Custom app, willing to modify
  • Performance concerns...
  • TLB miss 5x cost, with new h/w
  • Perfect huge page application!
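One way to see why TLB misses get more expensive under hardware-assisted virtualization is to count page-walk memory references. The sketch below uses the usual worst-case count for a two-dimensional (nested) page walk, assuming no caching of intermediate translations. The function name and the level counts (4 levels for 4k pages, 3 levels for 2M pages, as on x86-64) are assumptions for illustration. It counts references only and does not reproduce the slide's 5x figure.

```python
def walk_references(guest_levels, host_levels=0):
    """Worst-case memory references needed to resolve one TLB miss.

    host_levels == 0 models a native (non-virtualized) walk.
    With nested paging, every guest page-table pointer, plus the final
    guest-physical address, must itself be translated by a host walk.
    """
    if host_levels == 0:
        return guest_levels
    return (guest_levels + 1) * (host_levels + 1) - 1

native_4k = walk_references(4)          # native walk, 4k pages
nested_4k = walk_references(4, 4)       # guest 4k on host 4k
nested_2m_host = walk_references(4, 3)  # guest 4k on host 2M
nested_2m_both = walk_references(3, 3)  # guest 2M on host 2M

print(native_4k, nested_4k, nested_2m_host, nested_2m_both)
```

Each level removed by a larger page size shortens both dimensions of the walk, which is one reason KVM guests are such a good fit for huge pages.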
SLIDE 11

Caveats

  • Fragmentation
  • Locked into memory – no reclaim
  • Hardware must be dedicated
  • Separate, discrete interfaces
  • Permissions
  • Amplification of bad NUMA placement decisions

  • Architecture TLB weakness
SLIDE 12

Candidate Users

  • Large, contiguous memory users
  • Poor temporal or spatial locality
  • Bottlenecks on fault speed
  • Pagetable size overhead
  • Large shared mapping
  • Pagetable cache footprint
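The pagetable size overhead is easy to estimate. The sketch below counts only the lowest-level entries needed to map a region and assumes 8-byte entries as on x86-64. The mapping size and process count are made-up inputs, and the helper name is hypothetical.

```python
PTE_BYTES = 8  # assumed size of one page-table entry (x86-64)

def leaf_table_bytes(mapping_bytes, page_bytes):
    """Bytes of lowest-level page-table entries needed to map a region."""
    entries = -(-mapping_bytes // page_bytes)  # ceiling division
    return entries * PTE_BYTES

GIB = 1 << 30
mapping = 8 * GIB   # hypothetical shared mapping
processes = 100     # hypothetical number of processes attaching to it

small = leaf_table_bytes(mapping, 4 << 10)  # 4k pages
large = leaf_table_bytes(mapping, 2 << 20)  # 2M pages

print(small, large)
print(small * processes, large * processes)
```

Because each process normally has its own page tables, the cost of a large shared mapping scales with the number of processes that map it, which is why the bullets pair shared mappings with pagetable size and cache footprint.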
SLIDE 13

Application Work

  • Using SHM? Add SHM_HUGETLB
  • libhugetlbfs
  • Drop-in replacement for malloc()/shmget()
  • Works for complex apps like firefox!
  • Link normally or use LD_PRELOAD
  • Executables in huge pages
  • Administration with hugeadm
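A minimal sketch of the LD_PRELOAD route, written as a helper that prepares the environment for a child process. HUGETLB_MORECORE=yes is the libhugetlbfs setting that backs malloc() with huge pages. The helper name and the library path passed to LD_PRELOAD are assumptions that depend on how libhugetlbfs is installed.

```python
import os

def hugetlb_preload_env(base_env=None, library="libhugetlbfs.so"):
    """Return a copy of the environment that preloads libhugetlbfs.

    'library' is an assumed name; adjust it to the installed location.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD accepts a space-separated list; keep anything already set.
    env["LD_PRELOAD"] = f"{library} {existing}".strip()
    env["HUGETLB_MORECORE"] = "yes"
    return env

env = hugetlb_preload_env({"PATH": "/usr/bin"})
print(env["LD_PRELOAD"], env["HUGETLB_MORECORE"])
```

The resulting dictionary can be passed as the env argument of subprocess.run() when launching an unmodified application. A huge page pool must already be configured for the preload to have any effect.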
SLIDE 14

Future Work

  • User Stacks
  • Transparent promotion/demotion
  • Continuing improvements in page reclamation
  • Power management / Memory Hotplug

  • libhugetlbfs
  • documentation/usability
SLIDE 15

Further Reading

  • Cost of Pagetable lookups in virtual machines: http://www.amd64.org/fileadmin/user_upload/pub/p26-bhargava.pdf
  • http://sourceforge.net/projects/libhugetlbfs/
  • http://www.ibm.com/developerworks/wikis/display/LinuxP/libhugetlbfs+FAQs
SLIDE 16

04/09/09

Dynamic Large Pages

Dave Hansen

IBM Linux Technology Center

SLIDE 17

Why Large Pages?

It costs more or less the same number of CPU cycles to handle a large-page minor fault as a small-page one, but the benefits of a large-page fault are much higher.

Per-page structures become “smaller” in terms of percentage: a fixed N-byte object becomes relatively much smaller as the M-byte page it represents gets larger.

Cache misses are “expensive” in terms of performance: CPUs are bottlenecked on memory bandwidth, and caches continue to increase in importance.
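The "smaller in terms of percentage" point can be checked with a little arithmetic. The sketch below assumes a 64-byte per-page structure as the fixed N-byte object. That size, like the function name, is an illustrative assumption rather than a figure from the talk.

```python
STRUCT_BYTES = 64  # assumed size N of the fixed per-page object

def overhead_percent(struct_bytes, page_bytes):
    """Per-page structure size as a percentage of the page it describes."""
    return 100.0 * struct_bytes / page_bytes

small = overhead_percent(STRUCT_BYTES, 4 * 1024)         # 4k page
large = overhead_percent(STRUCT_BYTES, 2 * 1024 * 1024)  # 2M page

print(small, large, small / large)
```

The structure itself does not shrink. What changes is the ratio M/N, which grows by the same factor as the page size.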

SLIDE 18

“Old” Workloads

These are the classic workloads that have used large pages, though they are not necessarily the ones where large pages fit best.

SLIDE 20

Linux VM Support

COW usage used to give random application behavior. Now we can at least guarantee that parents will keep their huge pages, and children have an opportunity to get their own copies, too.
SLIDE 22

Multiple HW Sizes

The spread of page sizes across architectures is just an indicator of why we need hstates (the kernel's per-size huge page state) so badly.

SLIDE 23

Gigantic Pages

Gigantic pages are just another indicator of why we need hstates so badly.

SLIDE 27

Candidate Users

  • Temporal locality
      – Tendency to re-reference memory
      – Sparse accesses imply low temporal locality
      – Use-once (e.g. STREAM) has low locality
      – Tree elimination solves have higher locality
  • Spatial locality
      – Tendency to reference nearby memory
      – Random access has low locality
      – Cache blocking gives higher spatial locality
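A small model makes the spatial locality point concrete: count how many distinct pages, and therefore roughly how many TLB entries, an access pattern touches at each page size. The two access patterns and the helper name below are made up for illustration. Both patterns issue the same number of accesses.

```python
KIB, MIB = 1 << 10, 1 << 20

def pages_touched(addresses, page_bytes):
    """Number of distinct pages covered by a sequence of byte addresses."""
    return len({addr // page_bytes for addr in addresses})

# Dense sweep: 64-byte steps across 1 MiB (high spatial locality).
dense = range(0, 1 * MIB, 64)
# Sparse sweep: one access per 4 KiB across 64 MiB (low spatial locality).
sparse = range(0, 64 * MIB, 4 * KIB)

dense_4k = pages_touched(dense, 4 * KIB)
dense_2m = pages_touched(dense, 2 * MIB)
sparse_4k = pages_touched(sparse, 4 * KIB)
sparse_2m = pages_touched(sparse, 2 * MIB)

print(len(dense), dense_4k, dense_2m)
print(len(sparse), sparse_4k, sparse_2m)
```

With small pages the sparse pattern needs a new translation for every access, while with 2M pages the same accesses fit in a few dozen entries. That is the sense in which workloads with poor locality are candidate users.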
