pinned_vector: A Contiguous Container without Pointer Invalidation



slide-1
SLIDE 1

pinned_vector

A Contiguous Container without Pointer Invalidation

Meeting C++ 2018

slide-2
SLIDE 2

std::vector

2

  • contiguous layout
  • cache locality
  • fastest iteration
  • O(1) lookup
  • random access
  • amortized O(1) growth

slide-3
SLIDE 3

std::vector

3

  • contiguous layout
  • cache locality
  • fastest iteration
  • O(1) lookup
  • random access
  • amortized O(1) growth

POINTER INVALIDATION

slide-4
SLIDE 4

capacity=6

std::vector Invalidation

4

slide-5
SLIDE 5

capacity=6

std::vector Invalidation

5

slide-6
SLIDE 6

capacity=12 capacity=6

std::vector Invalidation

6

slide-7
SLIDE 7

capacity=12 capacity=6

std::vector Invalidation

7

slide-8
SLIDE 8

capacity=12 capacity=6

std::vector Invalidation

8

slide-9
SLIDE 9

std::vector Invalidation

9

may invalidate all: push_back, emplace_back, insert, emplace, reserve, resize, shrink_to_fit
always invalidates all: clear, assign
may invalidate other: insert, erase
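The reallocation shown on the preceding slides can be observed directly. The following is a minimal sketch, not code from the talk; the function name growth_moves_storage is made up for illustration. It fills a std::vector exactly to its capacity, records where the storage lives, and then appends one more element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing the vector past its capacity moved its storage.
// Addresses are compared as integers so no invalidated pointer is ever used.
inline bool growth_moves_storage(std::size_t initial_capacity)
{
    std::vector<int> v;
    v.reserve(initial_capacity);
    assert(v.capacity() >= initial_capacity);

    // Fill exactly to capacity: no reallocation happens here.
    while (v.size() < v.capacity())
        v.push_back(0);
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());

    // One more element exceeds the capacity and forces a reallocation.
    v.push_back(0);
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());

    return before != after;
}
```

Calling growth_moves_storage(6) mirrors the capacity=6 picture: the old block is still alive while the new one is allocated, so the elements necessarily end up at a different address.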

slide-10
SLIDE 10

10

Contiguous Storage Invariant

slide-11
SLIDE 11

Contiguous Storage Invariant

11

erase( )

slide-12
SLIDE 12

Contiguous Storage Invariant

12

erase( )

slide-13
SLIDE 13

Contiguous Storage Invariant

13

erase( )

slide-14
SLIDE 14

Contiguous Storage Invariant

14

insert( )

slide-15
SLIDE 15

Contiguous Storage Invariant

15

insert( )

slide-16
SLIDE 16

Contiguous Storage Invariant

16

insert( )
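Why erase( ) and insert( ) in the middle must invalidate pointers, even without reallocation, can be sketched as follows. This is an illustrative example under assumed names (erase_observation, erase_and_observe), not code from the talk: the storage block stays in place, but the elements behind the erased position are shifted to close the gap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct erase_observation {
    bool buffer_unmoved;    // the storage block stayed where it was
    bool follower_shifted;  // the element after the erased one moved down a slot
};

// Erases v[pos] and reports what happened to the storage and to v[pos + 1].
inline erase_observation erase_and_observe(std::vector<int>& v, std::size_t pos)
{
    assert(pos > 0 && pos + 1 < v.size());
    const auto base_before = reinterpret_cast<std::uintptr_t>(v.data());
    const auto slot_before = reinterpret_cast<std::uintptr_t>(&v[pos]);
    const int follower = v[pos + 1];

    v.erase(v.begin() + static_cast<std::ptrdiff_t>(pos));

    const auto base_after = reinterpret_cast<std::uintptr_t>(v.data());
    const auto slot_after = reinterpret_cast<std::uintptr_t>(&v[pos]);
    // The same slot now holds the former follower: a pointer to the erased
    // element would silently refer to a different value.
    return { base_before == base_after,
             slot_before == slot_after && v[pos] == follower };
}
```

Any contiguous container has to shift elements like this, which is why this kind of invalidation cannot be designed away.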

slide-17
SLIDE 17

Alternatives with Truly Stable Pointers

17

https://en.cppreference.com/w/cpp/container

slide-18
SLIDE 18

Alternatives with Truly Stable Pointers

18

boost::stable_vector<T>

  • Not a “vector”
  • Not contiguous
  • Equivalent to vector<unique_ptr<T>>

[Diagram: a row of T* slots]
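The vector<unique_ptr<T>> comparison can be tried out with the standard library alone. The sketch below is an illustration under an assumed function name (first_element_stays_put), and it does not use boost::stable_vector itself. Pointers to the elements stay valid because only the pointer array is reallocated, at the price of one heap allocation per element and a non-contiguous layout.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Appends `extra` more elements and reports whether the first element stayed
// at the same address throughout.
inline bool first_element_stays_put(std::size_t extra)
{
    std::vector<std::unique_ptr<int>> v;
    v.push_back(std::make_unique<int>(42));
    int* const first = v.front().get();  // points at the heap object, not into v

    for (std::size_t i = 0; i < extra; ++i)
        v.push_back(std::make_unique<int>(static_cast<int>(i)));

    // The array of pointers may have been reallocated many times, but the
    // pointed-to int was never moved.
    assert(*first == 42);
    return v.front().get() == first;
}
```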

slide-19
SLIDE 19

Alternatives with Truly Stable Pointers

19

plf::colony

slide-20
SLIDE 20

Alternatives with Truly Stable Pointers

20

plf::colony

  • Manages elements in disjoint memory chunks
  • Contiguous layout not guaranteed
  • Iteration performance comparable to std::deque
  • Primary use case is storage, not iteration

slide-21
SLIDE 21

“A Contiguous Container without Pointer Invalidation”

21

slide-22
SLIDE 22

“A Contiguous Container without Pointer Invalidation”

22

Not quite… the contiguous layout invariant must still be maintained

slide-23
SLIDE 23

“A Contiguous Container with Essential Pointer Invalidation”

23

The minimum amount of pointer invalidation absolutely necessary to maintain the contiguous layout invariant.

If insertion or erasure occurs only at the end of the container, then pointers to all other elements shall remain valid.

Idealized std::vector with infinite capacity.
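The "idealized std::vector with infinite capacity" can be approximated with a plain std::vector whose capacity is reserved up front. This is a minimal sketch with an assumed function name (end_insertion_keeps_pointers), not the pinned_vector implementation. As long as the size never exceeds the reserved capacity, insertion and erasure at the end leave pointers to all other elements valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates "infinite capacity" by reserving everything up front, then checks
// that end-only insertion and erasure leave the first element untouched.
inline bool end_insertion_keeps_pointers(std::size_t count)
{
    assert(count >= 2);
    std::vector<int> v;
    v.reserve(count);

    v.push_back(1);
    const int* const first = &v.front();

    // No reallocation can happen while size() <= capacity().
    for (std::size_t i = 1; i < count; ++i)
        v.push_back(static_cast<int>(i));
    v.pop_back();

    return first == &v.front() && *first == 1;
}
```

The drawback of this emulation is that reserve() pays for the full capacity immediately.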

slide-24
SLIDE 24

std::vector Invalidation

24

may invalidate all: push_back, emplace_back, insert, emplace, reserve, resize, shrink_to_fit
always invalidates all: clear, assign
may invalidate other: insert, erase

slide-25
SLIDE 25

pinned_vector Invalidation

25

[Same three-column table as on the std::vector Invalidation slide; the visual marking that distinguishes the pinned_vector version is not recoverable from the text.]

slide-26
SLIDE 26

Virtual Memory History

  • Pioneered on the Manchester Atlas (1962), popularized by DEC’s VAX-11/780 (“Virtual Address eXtension”, 1977)
  • First consumer CPU with an integrated MMU: Intel 80286 (1982)

26

slide-27
SLIDE 27

Virtual Memory

  • Illusion of huge memory
  • Abstraction of Hardware Storage and Resources

○ Physical Memory
○ Filesystem
○ Memory mapped I/O
○ Inter-Process Communication

27

slide-28
SLIDE 28

Virtual Memory vs Physical Memory

28

#include <memory>
#include <iostream>

int main()
{
    auto foo = std::make_unique<int>(42);
    // Prints a virtual address, not a physical one.
    std::cout << foo.get() << std::endl;
    return 0;
}

slide-29
SLIDE 29

Virtual Memory

29

[Diagram: virtual memory mapped onto main memory, the filesystem, GPU memory, and other processes]

slide-30
SLIDE 30

Virtual Memory

  • Process isolation

○ Separate address space

  • More space than is physically available

○ e.g. 128 TiB on x86-64

30

slide-31
SLIDE 31

Page

  • Fixed-size block of virtual memory
  • Most CPUs have a minimum page size of 4 KiB

○ Memory mappings are aligned to the page size

  • Huge Pages

○ x86-64 also supports 2 MiB and 1 GiB pages
○ Performance: fewer TLB misses
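Because the page size differs between systems, code that reserves or commits memory usually queries it at runtime and rounds sizes up to whole pages. The following sketch shows one way to do that; the helper names page_size and round_up_to_pages are assumptions for illustration, while sysconf(_SC_PAGESIZE) and GetSystemInfo are the actual POSIX and Windows calls.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <unistd.h>
#endif

// Queries the regular (non-huge) page size of the running system.
inline std::size_t page_size()
{
#if defined(_WIN32)
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return static_cast<std::size_t>(info.dwPageSize);
#else
    const long size = sysconf(_SC_PAGESIZE);
    assert(size > 0);
    return static_cast<std::size_t>(size);
#endif
}

// Rounds a byte count up to a whole number of pages.
inline std::size_t round_up_to_pages(std::size_t bytes, std::size_t page)
{
    assert(page != 0);
    return (bytes + page - 1) / page * page;
}
```

With 4 KiB pages, a request for 4097 bytes therefore occupies two pages (8192 bytes).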

31

slide-32
SLIDE 32

Memory Management Unit

  • Everyone here has seen it in action already

○ terminated by signal SIGSEGV (Address boundary error)
○ Access Violation

  • Separate part of the CPU that maps virtual memory addresses to physical memory addresses
  • Page protection

○ Checks the read, write, and execute bits

32

slide-33
SLIDE 33

Translation Lookaside Buffer

  • Part of the MMU
  • Caches virtual-to-physical address mappings
  • Hardware accelerated
  • Typically has 4096 entries

33

slide-34
SLIDE 34

Page Table

  • Backing store for the TLB (the TLB caches its entries)
  • Stored in memory
  • Page walk

○ Done in hardware or software
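What a page walk looks up can be illustrated with the x86-64 4-level layout for 4 KiB pages: a 48-bit virtual address is split into four 9-bit table indices and a 12-bit offset into the page. The struct and function names below are made up for this sketch, and it only handles addresses in the lower (user) half of the address space.

```cpp
#include <cassert>
#include <cstdint>

// Indices used at each level of an x86-64 4-level page walk (4 KiB pages).
struct page_walk_indices {
    unsigned pml4;    // bits 47..39
    unsigned pdpt;    // bits 38..30
    unsigned pd;      // bits 29..21
    unsigned pt;      // bits 20..12
    unsigned offset;  // bits 11..0, byte offset inside the page
};

inline page_walk_indices split_virtual_address(std::uint64_t va)
{
    assert((va >> 47) == 0);  // lower half of the address space only
    page_walk_indices r;
    r.pml4   = static_cast<unsigned>((va >> 39) & 0x1FF);
    r.pdpt   = static_cast<unsigned>((va >> 30) & 0x1FF);
    r.pd     = static_cast<unsigned>((va >> 21) & 0x1FF);
    r.pt     = static_cast<unsigned>((va >> 12) & 0x1FF);
    r.offset = static_cast<unsigned>(va & 0xFFF);
    return r;
}
```

Each index selects one entry in the table found at the previous level, so an uncached translation costs up to four memory reads; this is the work a TLB hit avoids.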

34

slide-35
SLIDE 35

35

slide-36
SLIDE 36

Swap Space

  • File / Partition
  • Unused Pages are saved on disk to free physical memory
  • Controlled by the OS

36

slide-37
SLIDE 37

Page-Faults

37

[Diagram: pages moving between virtual memory, physical memory, and the swap file]

slide-42
SLIDE 42

Page-Faults

42

  • Triggered by access to a page that is not loaded in physical memory
  • The OS swaps pages in from, or out to, the swap file
  • Very expensive when disk access is involved
slide-43
SLIDE 43

TLB Miss

43

[Diagram: on a TLB miss, the MMU walks the page table in memory to translate the virtual address, then returns the physical address]

slide-44
SLIDE 44

Thrashing

44

  • Constant swapping of pages
  • Unresponsive system

○ Filesystem Access

slide-45
SLIDE 45

Mapping Memory

Reserve

  • Prevents other allocations within the reserved area
  • Does not consume memory or swap space

Commit

  • Gets physical memory space
  • Consumes memory or swap space

45
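The two steps can be sketched on POSIX systems with mmap and mprotect. This is a minimal illustration, not the talk's implementation: the helper names (reserve_range, commit_range, release_range, reserve_then_commit_demo) and the 64 MiB figure are assumptions, and Windows would use VirtualAlloc with MEM_RESERVE and MEM_COMMIT instead.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Reserve: claim address space only. The pages are inaccessible (PROT_NONE)
// and not yet backed by physical memory.
inline void* reserve_range(std::size_t bytes)
{
    void* p = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Commit: make the first `bytes` of a reserved range readable and writable.
inline bool commit_range(void* base, std::size_t bytes)
{
    assert(base != nullptr);
    return mprotect(base, bytes, PROT_READ | PROT_WRITE) == 0;
}

inline void release_range(void* base, std::size_t bytes)
{
    munmap(base, bytes);
}

// Reserves 64 MiB, commits only the first page, and stores a value in it.
inline bool reserve_then_commit_demo()
{
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t reserved = std::size_t(64) << 20;

    void* base = reserve_range(reserved);
    if (base == nullptr)
        return false;

    bool ok = commit_range(base, page);
    if (ok) {
        int* ints = static_cast<int*>(base);
        ints[0] = 279;  // touching any page beyond the first would fault
        ok = (ints[0] == 279);
    }
    release_range(base, reserved);
    return ok;
}
```

Only the committed page can ever consume physical memory; the rest of the range merely keeps other allocations out.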

slide-46
SLIDE 46

Virtual Memory Address Space

fff...

pinned_vector Internals

46

auto v = pinned_vector<int>(max_elements(1'000'000'000));

Windows: VirtualAlloc(..., MEM_RESERVE);
POSIX: mmap(..., PROT_NONE, MAP_ANON | MAP_PRIVATE);

v.max_size();

max_pages max_bytes

slide-47
SLIDE 47

Virtual Memory Address Space

fff...

pinned_vector Internals

47

auto v = pinned_vector<int>(max_elements(1'000'000'000));
v.push_back(279);
v.push_back(188);
...

Windows: VirtualAlloc(..., MEM_COMMIT);
POSIX: mprotect(..., PROT_READ | PROT_WRITE);

slide-48
SLIDE 48

Virtual Memory Address Space

fff...

pinned_vector Internals

48

auto v = pinned_vector<int>(max_elements(1'000'000'000));
v.pop_back();
...

Windows: VirtualFree(..., MEM_DECOMMIT);
POSIX: mprotect(..., PROT_NONE); madvise(..., MADV_DONTNEED);

v.shrink_to_fit();
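Put together, the three internals slides suggest a container along the following lines. This is a deliberately small, int-only, POSIX-only sketch under an assumed name (pinned_ints); it is not the authors' pinned_vector, whose actual implementation may differ in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical int-only stand-in for pinned_vector (POSIX only).
class pinned_ints {
public:
    explicit pinned_ints(std::size_t max_elements)
        : page_(static_cast<std::size_t>(sysconf(_SC_PAGESIZE))),
          reserved_(round_up(max_elements * sizeof(int)))
    {
        assert(max_elements > 0);
        // Reserve the whole range once; nothing is committed yet.
        void* p = mmap(nullptr, reserved_, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        data_ = static_cast<int*>(p);
    }
    ~pinned_ints() { munmap(data_, reserved_); }
    pinned_ints(const pinned_ints&) = delete;
    pinned_ints& operator=(const pinned_ints&) = delete;

    void push_back(int value)
    {
        const std::size_t needed = round_up((size_ + 1) * sizeof(int));
        if (needed > reserved_) throw std::bad_alloc();  // fixed upper bound
        if (needed > committed_) {
            // Commit the next page(s) in place; existing elements never move.
            char* start = reinterpret_cast<char*>(data_) + committed_;
            if (mprotect(start, needed - committed_, PROT_READ | PROT_WRITE) != 0)
                throw std::bad_alloc();
            committed_ = needed;
        }
        data_[size_++] = value;
    }

    void pop_back() { assert(size_ > 0); --size_; }

    void shrink_to_fit()
    {
        const std::size_t keep = round_up(size_ * sizeof(int));
        if (keep < committed_) {
            // Hand the unused pages back and make them inaccessible again.
            char* start = reinterpret_cast<char*>(data_) + keep;
            madvise(start, committed_ - keep, MADV_DONTNEED);
            mprotect(start, committed_ - keep, PROT_NONE);
            committed_ = keep;
        }
    }

    int* data() { return data_; }
    std::size_t size() const { return size_; }
    std::size_t committed_bytes() const { return committed_; }

private:
    std::size_t round_up(std::size_t bytes) const
    {
        return (bytes + page_ - 1) / page_ * page_;
    }

    std::size_t page_;
    std::size_t reserved_;
    int* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t committed_ = 0;
};
```

The design choice to note is that growth never copies: data() returns the same address for the lifetime of the object, so pointers to existing elements survive any number of push_back calls.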

slide-49
SLIDE 49

But Is It Any Good?

49

std::vector pinned_vector

auto v = Container<T>();
v.reserve(n);
⏱ fill_n(back_inserter(v), n, x); ⏱

Round 1: establish a common baseline
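A measurement like Round 1 could be set up as follows. This is a sketch of one possible harness, not the benchmark code used for the charts; the function name time_reserved_fill and the choice of std::chrono::steady_clock are assumptions. Only the fill sits between the two clock reads, matching the stopwatch markers on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <vector>

// Times only the fill; the reserve() call happens before the clock starts.
template <class Container>
std::chrono::nanoseconds time_reserved_fill(Container& v, std::size_t n,
                                            const typename Container::value_type& x)
{
    assert(v.empty());
    v.reserve(n);

    const auto start = std::chrono::steady_clock::now();
    std::fill_n(std::back_inserter(v), n, x);
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

A single run like this is noisy; a real comparison would repeat it many times per container and element type.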

slide-50
SLIDE 50

50

Baseline for int

slide-51
SLIDE 51

51

Baseline for bigval

struct bigval { double data[10]; };

slide-52
SLIDE 52

52

Baseline for std::string

slide-53
SLIDE 53

53

Baseline All

slide-54
SLIDE 54

So Is It Any Good?

54

std::vector pinned_vector

auto v = Container<T>();
v.reserve(n);
⏱ fill_n(back_inserter(v), n, x); ⏱

Round 2: size not known upfront

slide-55
SLIDE 55

55

Total Time for int

slide-56
SLIDE 56

56

Total Time for bigval

struct bigval { double data[10]; };

slide-57
SLIDE 57

57

Total Time for std::string

slide-58
SLIDE 58

58

Total Time

slide-59
SLIDE 59

Yes It Is Good

59

std::vector pinned_vector

Round 3: so how much faster is it?

  • Normalize the runtimes:
  • Treat vector<T> time as 1.0
  • Rescale pinned_vector<T> time based on that
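The normalization amounts to a single division. The helper below is a sketch with an assumed name (normalized_time); the slides do not state whether the charts plot this ratio or its reciprocal, so treat the direction as an assumption.

```cpp
#include <cassert>

// Runtime of pinned_vector expressed in units of the std::vector runtime
// (std::vector == 1.0). Values below 1.0 mean pinned_vector was faster.
inline double normalized_time(double vector_seconds, double pinned_seconds)
{
    assert(vector_seconds > 0.0);
    return pinned_seconds / vector_seconds;
}
```

For example, if std::vector needs 2.0 s and pinned_vector 1.0 s, the normalized pinned_vector time is 0.5.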
slide-60
SLIDE 60

60

Total Speedup

Windows 10 build 17134 (x64), Intel Core i7-7700HQ @ 2.80 GHz, Clang-7.0.0 (VS 15.8.4 stdlib)

slide-61
SLIDE 61

61

Total Speedup

macOS 10.14.1 (x64), Intel Core i7-7820HQ @ 2.90 GHz, Apple LLVM 10.0.0 (clang-1000.11.45.5)

slide-62
SLIDE 62

62

Total Speedup

Windows 10 build 17134 (x64), Intel Core i7-7820HQ @ 2.90 GHz, Clang-7.0.0 (VS 15.8.4 stdlib)

slide-63
SLIDE 63

But Why Is It Good?

63

std::vector pinned_vector

Round 4: where does a vector’s time go?

auto v = vector<T, bump_alloc>();
⏱ fill_n(back_inserter(v), n, x); ⏱

≡ total time - allocations
≡ insertion + copying
≡ baseline + copying

Times for: insertion + allocation + copying
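The slides do not show bump_alloc itself. A minimal allocator in that spirit might look like the sketch below; the fixed 1 MiB arena, the global_arena helper, and every implementation detail are assumptions for illustration. Allocation only advances an offset and deallocation does nothing, so a vector using it still copies its elements on growth but spends almost no time in the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size arena shared by all bump_alloc instances.
struct arena {
    alignas(std::max_align_t) unsigned char buf[1 << 20];
    std::size_t used = 0;
};

inline arena& global_arena()
{
    static arena a;
    return a;
}

template <class T>
struct bump_alloc {
    using value_type = T;

    bump_alloc() = default;
    template <class U>
    bump_alloc(const bump_alloc<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        arena& a = global_arena();
        // Align the current offset for T, then bump it by the request size.
        const std::size_t start = (a.used + alignof(T) - 1) / alignof(T) * alignof(T);
        assert(start % alignof(T) == 0);
        const std::size_t bytes = n * sizeof(T);
        if (start + bytes > sizeof(a.buf)) throw std::bad_alloc();
        a.used = start + bytes;
        return reinterpret_cast<T*>(a.buf + start);
    }

    // Memory is never handed back; old blocks are simply abandoned.
    void deallocate(T*, std::size_t) noexcept {}
};

template <class T, class U>
bool operator==(const bump_alloc<T>&, const bump_alloc<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const bump_alloc<T>&, const bump_alloc<U>&) noexcept { return false; }
```

Timing fill_n on a std::vector<T, bump_alloc<T>> then approximates "insertion + copying", and the difference to the plain std::vector time approximates the allocation cost.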

slide-64
SLIDE 64

64

Breakdown of push_back

slide-65
SLIDE 65

65

Breakdown of push_back

slide-66
SLIDE 66

66

Breakdown of push_back

slide-67
SLIDE 67

Benchmark Conclusions

67

  • push_back with preceding reserve(): roughly equivalent
  • Slower than std::vector for small sizes
  • Faster than std::vector after a break-even point

○ achieved by not copying values around

  • Exact numbers vary significantly by system and value_type

slide-68
SLIDE 68

Availability

  • Virtual Memory Support
  • Desktop

○ Linux
○ macOS
○ Windows

  • Mobile

○ Android
○ iOS (reserve limited by physical memory)

68

slide-69
SLIDE 69

Use Case ECS

  • ECS: Entity Component System
  • Entity: ID
  • Component: data-only storage
  • System: logic that operates on components
  • Data-Oriented Design

○ Data oriented design in C++ by Mike Acton
○ Data-oriented design in practice by Stoyan Nikolov

  • Mostly used in games

69

slide-70
SLIDE 70

ECS with std::vector

70

[Diagram: a std::vector storage block; the handle is a raw pointer to a component inside it]

slide-71
SLIDE 71

ECS with std::vector

71

[Diagram: the old std::vector storage block, a new std::vector storage block, and the handle as a raw pointer to a component]

slide-72
SLIDE 72

ECS with std::vector

72

Handle:

  • Index

[Diagram: entity, system logic, and component data; the handle is an index into std::vector storage]

slide-73
SLIDE 73

ECS with std::vector

73

Pro:

  • Dynamic Storage

○ grow/shrink dynamically during runtime

Con:

  • Use of Handles

○ e.g. index
○ Indirection
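The index-handle approach can be sketched as follows. All names here (position, position_handle, position_storage) are hypothetical and only illustrate the trade-off from the slide: the handle survives any reallocation of the underlying std::vector, but every access goes through the storage object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct position { float x; float y; };          // hypothetical component

struct position_handle { std::size_t index; };  // handle = index, not pointer

class position_storage {
public:
    position_handle create(float x, float y)
    {
        components_.push_back(position{x, y});
        return position_handle{components_.size() - 1};
    }

    // Every access pays for the indirection: storage base plus index.
    position& resolve(position_handle h)
    {
        assert(h.index < components_.size());
        return components_[h.index];
    }

private:
    std::vector<position> components_;
};
```

A raw position* obtained before further create() calls would dangle after the vector grows, whereas the index keeps working.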

slide-74
SLIDE 74

ECS with std::array

74

Pro:

  • No Indirection

Con:

  • Preallocate memory => waste of memory
  • Need max size
  • No dynamic resizing
slide-75
SLIDE 75

ECS with pinned_vector

75

Component Handle:

  • Pointer

[Diagram: entity, system logic, and component data; the handle is a pointer into the pinned_vector component storage]

slide-76
SLIDE 76

Future Work

  • pinned_stack
  • Shared memory
  • Page-fault avoiding hash table

76

slide-77
SLIDE 77

Thank you

77

Implementation will be released at https://github.com/mknejp/vmcontainer once all the finishing touches are done.

Jakob Schweisshelm @jakouf
Miro Knejp @mknejp