Overlayfs And Containers Miklos Szeredi, Red Hat Vivek Goyal, Red - - PowerPoint PPT Presentation

SLIDE 1

Overlayfs And Containers

Miklos Szeredi, Red Hat Vivek Goyal, Red Hat

SLIDE 2

Introduction to overlayfs

SLIDE 3

Union or…?

  • Union: all layers made equal
  • How do you take the union of two files?
  • Or a file and a directory?
  • NO! Layers can’t be treated equal

SLIDE 4

...overlay!

  • Layer upon layer upon layer…
  • Only upper layer can be modified

○ copy-up (exception: directory contents)

  • Objects in one layer cover up objects with the same name in layer(s) below
  • Exception: directories, which are merged
  • Exception for the exception: opaque directories
  • One more exception: whiteout

○ covers up anything and makes it look like nothing
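The cover-up, merge, opaque and whiteout rules above can be sketched as a toy lookup over in-memory layers. This is a model, not overlayfs code: `Dir`, `WHITEOUT`, `lookup()` and `readdir()` are hypothetical names, a plain object stands in for the whiteout device and a boolean for the opaque-directory xattr.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-in for a whiteout (on disk: a char device with number 0/0).
WHITEOUT = object()


@dataclass
class Dir:
    """One layer's copy of a directory (toy model, not overlayfs structures)."""
    entries: Dict[str, object] = field(default_factory=dict)
    opaque: bool = False  # stands in for the opaque-directory xattr


def lookup_one(stack: List[Dir], name: str) -> Optional[List[object]]:
    """Look `name` up in a merged dir made of `stack` (top layer first).

    Returns [file] for a non-directory, the list of Dirs to merge for a
    directory, or None if the name is absent or whited out.
    """
    found: List[object] = []
    for d in stack:
        node = d.entries.get(name)
        if node is None:
            continue                  # not in this layer: look further down
        if not isinstance(node, Dir):
            if not found and node is not WHITEOUT:
                found.append(node)    # topmost non-dir covers up everything below
            break                     # a whiteout or non-dir ends the search
        found.append(node)            # directories are merged...
        if node.opaque:
            break                     # ...unless this one is opaque
    return found or None


def lookup(roots: List[Dir], path: str) -> Optional[List[object]]:
    """Resolve a path one component at a time."""
    stack: List[object] = list(roots)
    for name in filter(None, path.split("/")):
        if not isinstance(stack[0], Dir):
            return None               # a file cannot have children
        result = lookup_one(stack, name)
        if result is None:
            return None
        stack = result
    return stack


def readdir(stack: List[Dir]) -> List[str]:
    """List a merged dir: the topmost entry wins and whiteouts hide names."""
    visible: Dict[str, bool] = {}
    for d in stack:
        for name, node in d.entries.items():
            visible.setdefault(name, node is not WHITEOUT)
    return sorted(name for name, shown in visible.items() if shown)


lower = Dir({"etc": Dir({"passwd": "lower passwd", "hosts": "lower hosts",
                         "motd": "hello"}),
             "var": Dir({"old": "x"})})
upper = Dir({"etc": Dir({"hosts": "upper hosts", "motd": WHITEOUT}),
             "var": Dir({"new": "y"}, opaque=True)})
roots = [upper, lower]  # index 0 is the upper (writable) layer
```

Here `/etc/hosts` resolves to the upper copy, `/etc/motd` is hidden by the whiteout, `/etc` lists entries from both layers, and the opaque `/var` shows only the upper entry.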

SLIDE 5

Design

  • Userspace API (most important!)

○ No new object types
    ■ Whiteout -> char dev with 0/0 device number
    ■ Opaque dir -> xattr

  • Make it as simple as possible (and not a bit simpler)

○ Most of the logic is in a separate filesystem module
○ Some VFS impact but not much; some FS impact but not much

  • Upstream early

○ It doesn’t have to do everything right; features can be added later...
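Because whiteouts and opaque dirs are ordinary objects, they can be recognised in an upper dir with plain system calls. A minimal sketch, assuming Linux (`os.getxattr` is Linux-only) and the `trusted.overlay.opaque` xattr with value `y` used by upstream overlayfs; the helper names are hypothetical.

```python
import os
import stat

# xattr name overlayfs uses to mark an upper directory as opaque
OPAQUE_XATTR = "trusted.overlay.opaque"


def is_whiteout(mode: int, rdev: int) -> bool:
    """A whiteout is a character device whose device number is 0/0."""
    return stat.S_ISCHR(mode) and rdev == os.makedev(0, 0)


def is_opaque_dir(path: str) -> bool:
    """True if `path` is a directory whose opaque xattr is set to "y"."""
    if not os.path.isdir(path):
        return False
    try:
        return os.getxattr(path, OPAQUE_XATTR, follow_symlinks=False) == b"y"
    except OSError:
        # xattr absent, unsupported, or trusted.* not readable (needs CAP_SYS_ADMIN)
        return False


def classify(upper_dir: str) -> dict:
    """Label each entry of an upper dir as 'whiteout', 'opaque dir' or 'plain'."""
    kinds = {}
    for entry in os.scandir(upper_dir):
        st = entry.stat(follow_symlinks=False)
        if is_whiteout(st.st_mode, st.st_rdev):
            kinds[entry.name] = "whiteout"
        elif is_opaque_dir(entry.path):
            kinds[entry.name] = "opaque dir"
        else:
            kinds[entry.name] = "plain"
    return kinds
```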

SLIDE 6

Implementation

  • Separate cache for the overlay directory tree

○ Allows less impact on VFS/FS
○ BUT bad for memory use

  • Shared cache for the file contents

○ Copy-up when opened for write (may be too early)
○ Ugliness when copy-up happens while file is already open read-only
○ BUT great for performance and memory use

  • Limitations

○ Modifying lower layer -> don’t care (behavior is undefined)
○ Not (yet) a “POSIX” filesystem (st_dev/ino quirks, directory rename, hard link copy-up, etc.)
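The copy-up-on-open rule can be illustrated with two ordinary directories standing in for the layers. This is a toy (`toy_ovl_open()` and `needs_copy_up()` are hypothetical names); it approximates the rule that opening for write, or with O_TRUNC, triggers copy-up, and skips the workdir, xattrs and locking that real overlayfs handles.

```python
import os
import shutil
import tempfile

# Access-mode bits of open() flags (O_RDONLY=0, O_WRONLY=1, O_RDWR=2 on Linux).
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def needs_copy_up(flags: int) -> bool:
    """Opening for write, or with O_TRUNC, requires a private upper copy."""
    return (flags & ACCMODE) != os.O_RDONLY or bool(flags & os.O_TRUNC)


def toy_ovl_open(lower: str, upper: str, relpath: str, flags: int) -> int:
    """Toy open() through an overlay of two plain directories."""
    upper_path = os.path.join(upper, relpath)
    lower_path = os.path.join(lower, relpath)
    if os.path.exists(upper_path):
        return os.open(upper_path, flags)      # already copied up
    if not needs_copy_up(flags):
        return os.open(lower_path, flags)      # read-only: use the shared lower file
    # Copy-up: recreate parent dirs, copy data and metadata, then open the copy.
    # (overlayfs itself prepares the copy in workdir and renames it into place.)
    os.makedirs(os.path.dirname(upper_path), exist_ok=True)
    shutil.copy2(lower_path, upper_path)
    return os.open(upper_path, flags)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        lower, upper = os.path.join(root, "lower"), os.path.join(root, "upper")
        os.makedirs(os.path.join(lower, "etc"))
        os.makedirs(upper)
        with open(os.path.join(lower, "etc", "hosts"), "w") as f:
            f.write("lower\n")
        os.close(toy_ovl_open(lower, upper, "etc/hosts", os.O_RDONLY))
        copied_on_read = os.path.exists(os.path.join(upper, "etc", "hosts"))
        fd = toy_ovl_open(lower, upper, "etc/hosts", os.O_WRONLY | os.O_TRUNC)
        os.write(fd, b"upper\n")
        os.close(fd)
        with open(os.path.join(lower, "etc", "hosts")) as f:
            lower_text = f.read()
        with open(os.path.join(upper, "etc", "hosts")) as f:
            upper_text = f.read()
        return copied_on_read, lower_text, upper_text
```

In this toy, a descriptor opened read-only before the copy-up keeps pointing at the lower file, which is the "already open read-only" case listed above.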

SLIDE 7

Features added later

  • Multiple lower layers
  • Renaming directories
  • SELinux
  • POSIX ACL
  • File locking

SLIDE 8

Features (work in progress)

  • RW-RO file consistency after copy-up

○ Just need to fix this case up in VFS

  • Fix st_dev, constant st_ino/d_ino

○ Store inode number for copied up files
○ Finding a common ino space for different underlying filesystems

  • Hard link copy up

○ Should be very rare
○ Can use a global database for storing inode numbers of copied up hard links

SLIDE 9

Overlayfs usage in docker

SLIDE 10

Overlay graph driver

[Diagram (Container 1): Image Layer 1..N are lower dir 1..N, connected by hard links; the container writable dir is the upper dir; the merged dir is the rootfs of the confined process]

  • docker daemon option --storage-driver=overlay
  • Overlay supported a single lower directory
  • Hard links created between image layers
  • Higher inode utilization
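A rough sketch of how a single-lower-dir driver can share data between image layers with hard links, which also shows where the extra inodes come from: every layer needs its own copy of every directory. This illustrates the idea only and is not Docker's code; `hardlink_layer()` is a hypothetical helper that ignores symlinks, special files and the later application of the layer's own changes.

```python
import os
import tempfile


def hardlink_layer(parent_root: str, child_root: str) -> None:
    """Start a new layer as a copy of the parent layer's tree.

    Every directory is created again (directories cannot be hard-linked),
    while regular files become hard links to the parent's inodes, so file
    data is shared but each layer costs one new inode per directory.
    Symlinks and special files are not handled in this sketch.
    """
    for dirpath, _dirnames, filenames in os.walk(parent_root):
        rel = os.path.relpath(dirpath, parent_root)
        target = os.path.normpath(os.path.join(child_root, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target, name))


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        layer1 = os.path.join(root, "layer1")
        os.makedirs(os.path.join(layer1, "usr", "bin"))
        with open(os.path.join(layer1, "usr", "bin", "tool"), "w") as f:
            f.write("v1")
        layer2 = os.path.join(root, "layer2")
        hardlink_layer(layer1, layer2)
        # The next layer's own changes would now be applied inside layer2.
        f1 = os.stat(os.path.join(layer1, "usr", "bin", "tool"))
        f2 = os.stat(os.path.join(layer2, "usr", "bin", "tool"))
        d1 = os.stat(os.path.join(layer1, "usr"))
        d2 = os.stat(os.path.join(layer2, "usr"))
        return f1.st_ino == f2.st_ino, f2.st_nlink, d1.st_ino == d2.st_ino
```

The file is shared (same inode, link count 2) while the directory is a new inode in each layer.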

SLIDE 11

Overlay2 graph driver

[Diagram (Container 1): Image Layer 1..N are lower dir 1..N; the container writable dir is the upper dir; the merged dir is the rootfs of the confined process]

  • docker daemon option --storage-driver=overlay2
  • Overlayfs should support multiple lower dirs
  • No hard links and no dir creation in every layer
  • Better inode utilization
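With multiple lower dirs the image layers can be handed to a single mount directly. A minimal sketch, assuming the upstream option syntax in which `lowerdir=` is a `:`-separated list whose leftmost entry is the top-most lower layer, and `workdir=` is an empty dir on the same filesystem as `upperdir=`; the function name and paths are hypothetical.

```python
from typing import List


def overlay2_options(layers_bottom_up: List[str], upper: str, work: str) -> str:
    """Build an overlay mount option string for one container.

    lowerdir= lists lower layers top-most first, so the image layers
    (given base layer first) are reversed. ':' separates lower dirs and
    ',' separates options, so this sketch rejects paths containing either.
    """
    if not layers_bottom_up:
        raise ValueError("at least one lower layer is required")
    for p in [*layers_bottom_up, upper, work]:
        if ":" in p or "," in p:
            raise ValueError(f"unsupported character in path {p!r}")
    lowerdir = ":".join(reversed(layers_bottom_up))
    return f"lowerdir={lowerdir},upperdir={upper},workdir={work}"
```

The resulting string would be passed as `mount -t overlay overlay -o <options> merged/`.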

SLIDE 12

Container security and overlayfs

SLIDE 13

How do we handle access permissions?

  • DAC (Ownership/Permissions)
  • MAC (SELinux)

SLIDE 14

An example setup

[Diagram: Container 1 and Container 2 share one lower dir (the image layer); each has its own upper dir (upper dir 1, upper dir 2) and its own merged dir, used by Confined Process 1 and Confined Process 2]

Two containers sharing lower dir with separate upper dir

SLIDE 15

Escaped container process writes to image dir/files

[Diagram: two containers share the lower dir (image layer), each with its own upper dir and merged dir; Escaped Process 1 has broken out of Container 1 and writes directly to the lower dir]

DAC allows writing to /etc/passwd

-rw-r--r--. root root /etc/passwd

SLIDE 16

Security goal 1

[Diagram: two containers share the lower dir (image layer), each with its own upper dir and merged dir; Escaped Process 1 from Container 1 targets the lower dir]

Do not allow writing to image dir/files

-rw-r--r--. root root /etc/passwd

SLIDE 17

Allow access through overlay mount point

Overlay mount: lower/foo.txt, upper/

open(foo.txt, O_RDWR)

SLIDE 18

Deny write access on underlying file

Overlay mount: lower/foo.txt, upper/

open(foo.txt, O_RDWR)

SLIDE 19

DAC allows access through both paths

(When root inside container is root outside)

Overlay mount: lower/foo.txt, upper/

open(foo.txt, O_RDWR)

lower/foo.txt: -rw-r--r--. root root

SLIDE 20

Read only label on lower files

Overlay mount: lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0

SLIDE 21

Use context mount option for overlay

Overlay mount (context=label): lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0
context= label: system_u:object_r:container_file_t:s0:c16,c585

mount -t overlay -o context="system_u:object_r:container_file_t:s0:c16,c585".... merged/
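The quotes around the label matter: mount options are separated by commas, and an MCS label such as `s0:c16,c585` contains one. The sketch below builds the option and compares a quote-aware split with a naive one; `with_context()` and `split_mount_options()` are hypothetical helpers that illustrate the parsing rule, not the kernel's parser.

```python
from typing import List


def with_context(options: str, label: str) -> str:
    """Prepend context=, double-quoting the label because MCS labels contain commas."""
    if '"' in label:
        raise ValueError("label must not contain double quotes")
    opt = f'context="{label}"'
    return f"{opt},{options}" if options else opt


def split_mount_options(opts: str) -> List[str]:
    """Split an option string on commas that are not inside double quotes."""
    parts: List[str] = []
    current: List[str] = []
    quoted = False
    for ch in opts:
        if ch == '"':
            quoted = not quoted
        if ch == "," and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


LABEL = "system_u:object_r:container_file_t:s0:c16,c585"
opts = with_context("lowerdir=/l,upperdir=/u,workdir=/w", LABEL)
```

A naive `opts.split(",")` cuts the label in two at `c16,c585`. When typing the option in a shell, the whole `-o` argument has to be single-quoted so the double quotes reach mount, in the style of the mount(8) man page, e.g. `-o 'context="system_u:object_r:container_file_t:s0:c16,c585",lowerdir=lower,upperdir=upper,workdir=work'` (directory names are placeholders).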

SLIDE 22

That did not work

SLIDE 23

Access permission checks in overlay

Overlay inode (context label): MAC
Real inode (real label): DAC + MAC
Both checks are made through inode_permission()

SLIDE 24

Read only label on lower file

Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt label (context=): system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0

lower/foo.txt: -rw-r--r--. root root

SLIDE 25

Process Overlay inode check

Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt label (context=): system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0

Check: Process MAY_WRITE on merged/foo.txt

lower/foo.txt: -rw-r--r--. root root

SLIDE 26

Process lower inode DAC check

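Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt label (context=): system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0

Check: Process MAY_WRITE on merged/foo.txt
Check: Process MAY_WRITE on lower/foo.txt

lower/foo.txt: -rw-r--r--. root root

SLIDE 27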

Process lower inode MAC check

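Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt label (context=): system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0

Check: Process MAY_WRITE on merged/foo.txt
Check: Process MAY_WRITE on lower/foo.txt

lower/foo.txt: -rw-r--r--. root root

SLIDE 28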

What if we don’t do WRITE checks on lower inode?

SLIDE 29

But that will break DAC

DAC checks happen only at real inode

Layout: merged/, lower/foo.txt, upper/foo.txt

open(foo.txt, O_RDWR)

lower/foo.txt: -r--r--r--. root root
upper/foo.txt: -r--r--r--. root root (created by copy-up)

SLIDE 30

Why not do DAC checks on both inodes?

Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

merged/foo.txt: -r--r--r--. root root
lower/foo.txt: -r--r--r--. root root
Copy up: lower/foo.txt -> upper/

SLIDE 31

That kind of worked but...

SLIDE 32

Certain overlayfs operations failed MAC checks

SLIDE 33

Certain overlayfs operations fail MAC checks

  • File creation over whiteout

SLIDE 34

Certain overlayfs operations fail MAC checks

  • File creation over whiteout
  • Use mounter’s creds for privileged operations

SLIDE 35

Two Levels of Permission Checks

  • Overlay inode is checked with creds of task
  • Underlying inode is checked with creds of mounter
  • Certain privileged operations are done with the creds of mounter

Overlay inode (context label): DAC + MAC with caller creds
Real inode (real label): DAC + MAC with mounter creds
Both checks are made through inode_permission()
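Doing "certain privileged operations with the creds of mounter" is a scoped credential switch, roughly what overlayfs does with `ovl_override_creds()`. A toy model, assuming a context variable plays the role of the current task's creds and `mounter_t` is a made-up mounter domain; replacing the whiteout is reduced to a single log entry.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Creds:
    uid: int
    domain: str  # SELinux domain of the acting subject


# Toy stand-in for the kernel's "current credentials".
_current: contextvars.ContextVar = contextvars.ContextVar("current_creds")


def current_creds() -> Creds:
    return _current.get()


@contextmanager
def override_creds(new: Creds) -> Iterator[None]:
    """Act with `new` creds inside the block, then restore the caller's."""
    token = _current.set(new)
    try:
        yield
    finally:
        _current.reset(token)


def create_over_whiteout(mounter: Creds, log: List[Tuple[str, str]]) -> None:
    # The caller's right to create a file in the merged dir is checked first...
    log.append(("permission check on overlay dir", current_creds().domain))
    # ...then the upper-dir work (replacing the whiteout) runs as the mounter.
    with override_creds(mounter):
        log.append(("replace whiteout in upper dir", current_creds().domain))
    log.append(("back in caller context", current_creds().domain))


def demo() -> List[Tuple[str, str]]:
    task = Creds(uid=0, domain="container_t")
    mounter = Creds(uid=0, domain="mounter_t")   # hypothetical mounter domain
    log: List[Tuple[str, str]] = []
    with override_creds(task):                    # pretend we are the container task
        create_over_whiteout(mounter, log)
    return log
```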

SLIDE 36

Two levels of checks

Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt: -rw-r--r--. root root, system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt: -rw-r--r--. root root, system_u:object_r:container_share_t:s0

SLIDE 37

Process Overlay inode check

Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt: -rw-r--r--. root root, system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt: -rw-r--r--. root root, system_u:object_r:container_share_t:s0

Check: Process MAY_WRITE on merged/foo.txt

SLIDE 38

Mounter real lower inode check

Layout: merged/foo.txt, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
merged/foo.txt: -rw-r--r--. root root, system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt: -rw-r--r--. root root, system_u:object_r:container_share_t:s0

Check: Process MAY_WRITE on merged/foo.txt
Check: Mounter MAY_READ on lower/foo.txt
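The walk-through above can be condensed into a toy model that compares the first attempt (every check with the task's creds) with the two-level scheme. Everything here is an assumption for illustration: `POLICY` is not the real container-selinux policy, `mounter_t` is a made-up mounter domain, DAC ignores groups, and MCS is reduced to a category subset test. The MAY_WRITE-to-MAY_READ rewrite for lower files mirrors the "Mounter MAY_READ" check; the kernel's `ovl_permission()` also drops MAY_APPEND and exempts special files.

```python
from dataclasses import dataclass
from typing import FrozenSet

MAY_WRITE, MAY_READ = 2, 4   # same values as the kernel's MAY_* bits


@dataclass(frozen=True)
class Cred:
    uid: int
    domain: str                 # e.g. "container_t"
    cats: FrozenSet[str]        # MCS categories, e.g. {"c16", "c585"}


@dataclass(frozen=True)
class Inode:
    uid: int
    mode: int                   # permission bits, e.g. 0o644
    ftype: str                  # e.g. "container_file_t"
    cats: FrozenSet[str]


# Toy policy, NOT the real container-selinux rules: allowed MAY_* bits per pair.
POLICY = {
    ("container_t", "container_file_t"): MAY_READ | MAY_WRITE,
    ("container_t", "container_share_t"): MAY_READ,
    ("mounter_t", "container_file_t"): MAY_READ | MAY_WRITE,
    ("mounter_t", "container_share_t"): MAY_READ | MAY_WRITE,
}


def dac_ok(cred: Cred, inode: Inode, mask: int) -> bool:
    if cred.uid == 0:
        return True             # simplification: root overrides DAC
    bits = inode.mode >> 6 if cred.uid == inode.uid else inode.mode  # groups ignored
    return (bits & 7 & mask) == mask


def mac_ok(cred: Cred, inode: Inode, mask: int) -> bool:
    allowed = POLICY.get((cred.domain, inode.ftype), 0)
    # Simplified MCS rule: subject must hold all of the object's categories.
    return (allowed & mask) == mask and inode.cats <= cred.cats


def check(cred: Cred, inode: Inode, mask: int) -> bool:
    return dac_ok(cred, inode, mask) and mac_ok(cred, inode, mask)


def one_level(task: Cred, ovl: Inode, real: Inode, mask: int) -> bool:
    """First attempt: MAC on the overlay inode, DAC+MAC on the real inode, all as the task."""
    return mac_ok(task, ovl, mask) and check(task, real, mask)


def two_level(task: Cred, mounter: Cred, ovl: Inode, real: Inode,
              real_is_lower: bool, mask: int) -> bool:
    """Caller creds on the overlay inode, mounter creds on the real inode."""
    if not check(task, ovl, mask):
        return False
    if real_is_lower and mask & MAY_WRITE:
        # A lower file is never written, only copied up: the mounter just reads it.
        mask = (mask & ~MAY_WRITE) | MAY_READ
    return check(mounter, real, mask)


CATS = frozenset({"c16", "c585"})
task = Cred(0, "container_t", CATS)
mounter = Cred(0, "mounter_t", frozenset({"c16", "c585", "c548", "c591"}))
merged = Inode(0, 0o644, "container_file_t", CATS)       # label from context=
lower = Inode(0, 0o644, "container_share_t", frozenset())
```

`one_level()` fails on the MAC check of the lower `container_share_t` file, while `two_level()` lets the container write through merged/ and a direct write to the lower file is still refused.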

SLIDE 39

First requirement met

SLIDE 40

Escaped process accesses other container’s data

[Diagram: two containers share the lower dir (image layer), each with its own upper dir and merged dir; Escaped Process 1 from Container 1 reaches Container 2’s data]

Container 1 accesses Container 2’s data

-rw-r--r--. root root /etc/data.txt

SLIDE 41

Security goal 2

[Diagram: two containers share the lower dir (image layer), each with its own upper dir and merged dir; Escaped Process 1 from Container 1 targets Container 2’s data]

One container should not be able to access other container’s data

-rw-r--r--. root root /etc/data.txt

SLIDE 42

Label upper files for container access only

Layout: merged/, lower/foo.txt, upper/

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
Overlay (context=) label: system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt label: system_u:object_r:container_share_t:s0
foo.txt: -rw-r--r--. root root

SLIDE 43

Label upper files for container access only

Layout: merged/, lower/foo.txt, upper/foo.txt

open(foo.txt, O_RDWR)

Process label: system_u:system_r:container_t:s0:c16,c585
Overlay (context=) label: system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt: -rw-r--r--. root root, system_u:object_r:container_share_t:s0
upper/foo.txt (after copy-up): -rw-r--r--. root root, system_u:object_r:container_file_t:s0:c16,c585

SLIDE 44

One container can’t access data of another container

Layout: merged/, lower/foo.txt, upper/foo.txt

Container 1 process (system_u:system_r:container_t:s0:c16,c585): open(foo.txt, O_RDWR)
Container 2 process (system_u:system_r:container_t:s0:c548,c591): open(foo.txt, O_RDWR)

Overlay (context=) label: system_u:object_r:container_file_t:s0:c16,c585
lower/foo.txt: -rw-r--r--. root root, system_u:object_r:container_share_t:s0
upper/foo.txt: -rw-r--r--. root root, system_u:object_r:container_file_t:s0:c16,c585
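Whether the c548,c591 process can reach the c16,c585 file comes down to MCS level dominance. A simplified sketch, assuming single levels (no `low-high` ranges) and treating access as "subject level dominates object level"; real SELinux expresses this through MCS constraints in the policy, and the helper names are hypothetical.

```python
from typing import FrozenSet, Tuple

Level = Tuple[int, FrozenSet[int]]  # (sensitivity, categories)


def level_of(context: str) -> str:
    """Level part of user:role:type:level; the level itself may contain ':'."""
    return context.split(":", 3)[3]


def parse_level(level: str) -> Level:
    """Parse a single MCS level such as "s0", "s0:c16,c585" or "s0:c0.c1023"."""
    sens, _, cats = level.partition(":")
    if not sens.startswith("s"):
        raise ValueError(f"bad sensitivity in {level!r}")
    result = set()
    for item in filter(None, cats.split(",")):
        lo, _, hi = item.partition(".")           # "c0.c1023" is a range
        start = int(lo.lstrip("c"))
        end = int(hi.lstrip("c")) if hi else start
        result.update(range(start, end + 1))
    return int(sens[1:]), frozenset(result)


def dominates(a: Level, b: Level) -> bool:
    """a dominates b: sensitivity at least as high and a superset of categories."""
    return a[0] >= b[0] and a[1] >= b[1]


proc1 = parse_level(level_of("system_u:system_r:container_t:s0:c16,c585"))
proc2 = parse_level(level_of("system_u:system_r:container_t:s0:c548,c591"))
upper_file = parse_level(level_of("system_u:object_r:container_file_t:s0:c16,c585"))
shared = parse_level(level_of("system_u:object_r:container_share_t:s0"))
```

Both processes dominate the category-less `s0` of the shared image, but only the c16,c585 process dominates the copied-up file.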

SLIDE 45

New LSM Hooks

  • inode_copy_up()

○ Called during copy up. Returns new set of creds for file creation.
○ For context mounts, file is created with label specified in context= option.

  • inode_copy_up_xattr()

○ Called during copy up of xattrs. SELinux blocks copying up of SELinux xattr.

  • dentry_create_files_as()

○ Called during creation of new file. Returns new set of creds for file creation.
○ For context mounts, file is created with label specified in context= option.
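The interplay of `inode_copy_up()` and `inode_copy_up_xattr()` can be modelled as: create the upper file with the context= label, then copy the lower xattrs unless the LSM asks to drop one. A toy sketch: the dict-based xattrs, `toy_selinux_copy_up_xattr()` and its `True`/`None` return convention are assumptions, not the kernel's hook signature.

```python
from typing import Callable, Dict, Optional

XATTR_NAME_SELINUX = "security.selinux"

# Toy hook signature: return True to drop an xattr, None for "no opinion".
CopyUpXattrHook = Callable[[str], Optional[bool]]


def toy_selinux_copy_up_xattr(name: str) -> Optional[bool]:
    """Mimics the idea of SELinux's inode_copy_up_xattr(): drop the SELinux label."""
    return True if name == XATTR_NAME_SELINUX else None


def copy_up_xattrs(lower: Dict[str, bytes], new_label: bytes,
                   hook: CopyUpXattrHook) -> Dict[str, bytes]:
    """Xattrs the copied-up file ends up with.

    The upper file was created with `new_label` (chosen via inode_copy_up()
    from the context= option); lower xattrs are then copied unless the hook
    drops them.
    """
    upper = {XATTR_NAME_SELINUX: new_label}
    for name, value in lower.items():
        if hook(name):
            continue  # the LSM asked for this xattr to be skipped
        upper[name] = value
    return upper


lower_xattrs = {
    "security.selinux": b"system_u:object_r:container_share_t:s0",
    "user.note": b"from image",
}
ctx = b"system_u:object_r:container_file_t:s0:c16,c585"
```

Without the hook, the lower `container_share_t` label would overwrite the context= label during copy-up.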

SLIDE 46

Overlayfs vs. devicemapper

  • In general, faster than devicemapper

○ Page cache sharing

  • Not fully POSIX compliant, yet

○ So some workloads might experience issues

  • Fedora 26 will have overlay2 as default graph driver

○ Switch back to devicemapper if you face issues

SLIDE 47

DAC and container security

  • DAC will solve these issues if containers run in user namespaces with different mappings
  • Needing to do a chown on image continues to be an issue
  • shiftfs or something else?
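With user namespaces, DAC isolation comes from each container having its own uid mapping. A minimal sketch of the translation, assuming the `/proc/<pid>/uid_map` line format `<inside> <outside> <count>` and the default overflow uid 65534; the two mappings and helper names are hypothetical. It also shows why the image needs a chown: files owned by host root are unmapped inside the container and show up as nobody.

```python
from typing import List, Optional, Tuple

OVERFLOW_UID = 65534  # default /proc/sys/kernel/overflowuid ("nobody")

Extent = Tuple[int, int, int]  # (first uid inside, first uid outside, count)


def parse_uid_map(text: str) -> List[Extent]:
    """Parse uid_map text: one "<inside> <outside> <count>" extent per line."""
    extents: List[Extent] = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(x) for x in line.split())
            extents.append((inside, outside, count))
    return extents


def host_to_ns(uid: int, extents: List[Extent]) -> int:
    """Owner as seen inside the namespace for a file owned by host `uid`."""
    for inside, outside, count in extents:
        if outside <= uid < outside + count:
            return inside + (uid - outside)
    return OVERFLOW_UID  # unmapped owners show up as nobody


def ns_to_host(uid: int, extents: List[Extent]) -> Optional[int]:
    """Host uid that owns files created by namespace uid `uid`."""
    for inside, outside, count in extents:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None


c1 = parse_uid_map("0 100000 65536\n")  # hypothetical mapping for container 1
c2 = parse_uid_map("0 200000 65536\n")  # hypothetical mapping for container 2
```

Files written by root in container 1 are owned by host uid 100000, which container 2 sees as nobody, so plain DAC keeps the containers apart.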
SLIDE 48

Thank You