Improve ARM guest performance with 64KB pages - Julien Grall



SLIDE 1

Improve ARM guest performance with 64KB pages

Julien Grall
julien.grall@citrix.com
Xen Developer Summit 2015

SLIDE 2

Outline: Kézaco, Why?, Constraints, Implementation, Improvements, Conclusion

Kézaco (What is it?)

◮ Each page is 64KB
◮ Removes one level of page table compared to 4KB pages
  ◮ Faster TLB lookup (shorter page-table walk on a miss)
◮ Introduced for AArch64 in ARMv8
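
The "one level fewer" claim can be checked with a little arithmetic. The sketch below assumes a 48-bit virtual address space and 8-byte page-table descriptors, neither of which is stated on the slide; `page_table_levels` is a hypothetical helper name used only for illustration.

```python
def page_table_levels(va_bits: int, page_shift: int) -> int:
    """Translation levels needed to resolve va_bits of virtual address.

    Assumption: each table occupies one page and holds 8-byte descriptors,
    so one level resolves (page_shift - 3) bits of the address.
    """
    bits_per_level = page_shift - 3
    index_bits = va_bits - page_shift          # bits left after the page offset
    return -(-index_bits // bits_per_level)    # ceiling division


VA_BITS = 48  # assumed address-space size, not taken from the slides

levels_4k = page_table_levels(VA_BITS, 12)     # 4KB pages: 12-bit offset
levels_64k = page_table_levels(VA_BITS, 16)    # 64KB pages: 16-bit offset
print(f"{VA_BITS}-bit VA: 4KB -> {levels_4k} levels, 64KB -> {levels_64k} levels")
```

Under these assumptions the 4KB granule needs four levels and the 64KB granule three, which matches the slide.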


SLIDE 3

4KB page granularity


SLIDE 4

64KB page granularity


SLIDE 5

Why?

◮ In Linux, the page granularity is chosen at configuration (build) time
◮ Some major distributions will ship only Linux kernels built with 64KB pages

SLIDE 6

Xen and hypercalls

◮ Based on 4KB page granularity
◮ Must be able to run guests with different page granularities
  ◮ Modifying the interface too much might not be possible

SLIDE 7

PV drivers

◮ Grants are currently only 4KB
  ◮ Based on the hypercall page granularity
◮ Must be able to talk with the current backends/frontends

SLIDE 8

Goal

◮ First implementation
◮ Allow 64KB guests to run on current Xen
  ◮ No modification to the hypercalls or the PV protocols
◮ Get something upstreamed quickly
  ◮ Linux with 64KB pages is crashing at the moment

SLIDE 9

Changes in Xen

◮ Hypervisor: none
◮ Tools: 3 minor patches to use the correct size for the rings
  ◮ Present in Xen 4.6
  ◮ Backport requested for Xen 4.5

SLIDE 10

Changes in Linux

◮ Linux assumes that Xen uses the same page granularity
  ◮ Need to introduce XEN_PAGE_* helpers
◮ 1 foreign grant = 1 Linux page
  ◮ Easier implementation
  ◮ 60KB of memory wasted per grant
  ◮ Affects only the backend domain
◮ A Linux page may be split between multiple grants
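
To make the granularity mismatch concrete, here is a minimal sketch of the arithmetic that XEN_PAGE_*-style helpers would encapsulate. The constant names follow the XEN_PAGE_* pattern named on the slide; the function names are hypothetical, and Python is used only for illustration since the real helpers are C code in the Linux kernel.

```python
XEN_PAGE_SHIFT = 12                  # Xen interfaces use 4KB frames
XEN_PAGE_SIZE = 1 << XEN_PAGE_SHIFT


def xen_frames_per_page(page_shift: int) -> int:
    """How many 4KB Xen frames fit in one Linux page (hypothetical helper)."""
    return (1 << page_shift) // XEN_PAGE_SIZE


def linux_pfn_to_xen_pfn(pfn: int, page_shift: int) -> int:
    """First 4KB Xen frame number covered by a Linux page frame number."""
    return pfn << (page_shift - XEN_PAGE_SHIFT)


def waste_per_foreign_grant(page_shift: int) -> int:
    """Bytes left unused when one 4KB grant is mapped into a whole Linux page."""
    return (1 << page_shift) - XEN_PAGE_SIZE


LINUX_PAGE_SHIFT = 16                # 64KB guest pages
print(xen_frames_per_page(LINUX_PAGE_SHIFT))             # frames per Linux page
print(waste_per_foreign_grant(LINUX_PAGE_SHIFT) // 1024, "KB wasted per grant")
```

With 64KB Linux pages, one page spans 16 Xen frames, and mapping a single 4KB foreign grant into it leaves the 60KB quoted on the slide unused.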


SLIDE 11

Example of handling a request with 4KB pages

SLIDE 12

Example of handling a request with 64KB pages

SLIDE 13

Changes in Linux - 2

◮ Introduce helpers to deal with the splitting
  ◮ Avoids exposing the page granularity to PV drivers
  ◮ Makes it easier to spot changes which don't handle 64KB/4KB granularity
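
The splitting can be pictured as a loop that walks a byte range inside one Linux page and hands each 4KB piece to a callback. This is a minimal sketch assuming 64KB Linux pages and 4KB grants; `foreach_grant_in_range` and `split` are hypothetical names for illustration, not the kernel's API.

```python
from typing import Callable, List, Tuple

XEN_PAGE_SIZE = 4096  # grant granularity


def foreach_grant_in_range(offset: int, length: int,
                           fn: Callable[[int, int, int], None],
                           page_size: int = 65536) -> None:
    """Call fn(grant_index, offset_in_grant, chunk_len) for every 4KB chunk
    touched by [offset, offset + length) inside a single Linux page."""
    if offset < 0 or length < 0 or offset + length > page_size:
        raise ValueError("range must lie within one Linux page")
    while length > 0:
        index = offset // XEN_PAGE_SIZE
        off = offset % XEN_PAGE_SIZE
        chunk = min(XEN_PAGE_SIZE - off, length)   # stop at the grant boundary
        fn(index, off, chunk)
        offset += chunk
        length -= chunk


def split(offset: int, length: int, page_size: int = 65536) -> List[Tuple[int, int, int]]:
    """Collect the chunks instead of acting on them (convenience wrapper)."""
    chunks: List[Tuple[int, int, int]] = []
    foreach_grant_in_range(offset, length,
                           lambda i, o, n: chunks.append((i, o, n)), page_size)
    return chunks


print(split(4000, 200))  # a small buffer that straddles a 4KB boundary
```

A driver written against such a helper never does the divide-by-4KB arithmetic itself, which is the point of hiding the granularity from PV drivers.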

SLIDE 14

Improvement - 1


SLIDE 15

Improvement - 1


SLIDE 16

Support of 64KB grant - 1


SLIDE 17

Support of 64KB grant - 2

◮ PV drivers can take advantage of it
  ◮ No need to split pages
  ◮ Fewer grants to set up
◮ Need to find agreement on where the grant size is decided:
  ◮ during the protocol negotiation
  ◮ can change for each request
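
As a rough illustration of "fewer grants to set up", the sketch below counts how many grants are needed to cover a buffer at each grant size. It assumes the buffer is aligned to the grant size; `grants_needed` is a hypothetical helper, and the buffer sizes are arbitrary examples rather than figures from the talk.

```python
def grants_needed(nbytes: int, grant_size: int) -> int:
    """Grants required to cover nbytes, assuming a grant-aligned buffer."""
    if nbytes < 0 or grant_size <= 0:
        raise ValueError("nbytes must be >= 0 and grant_size > 0")
    return -(-nbytes // grant_size)  # ceiling division


for nbytes in (64 * 1024, 100_000):
    small = grants_needed(nbytes, 4 * 1024)
    large = grants_needed(nbytes, 64 * 1024)
    print(f"{nbytes} bytes: {small} grants at 4KB, {large} at 64KB")
```

For a full 64KB page the count drops from 16 grants to 1.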

SLIDE 18

Improvement 2 - Memory Usage

◮ Share a Linux page between multiple foreign grants
  ◮ Needs some care with swiotlb
◮ Make Xen drivers fully use the Linux page
  ◮ Event channel
  ◮ PV ring
  ◮ ...
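
The potential saving from sharing can be estimated with a back-of-the-envelope model. The sketch below compares backend memory consumed when each foreign grant takes a whole 64KB Linux page against an idealized packing of 16 grants per page. `backend_memory` is a hypothetical function, the grant count is an arbitrary example, and the perfect packing is an assumption (the swiotlb caveat above may prevent it in practice).

```python
XEN_PAGE_SIZE = 4096


def backend_memory(n_grants: int, page_size: int = 65536, shared: bool = False) -> int:
    """Bytes of backend memory used to map n_grants foreign 4KB grants.

    shared=False: one grant per Linux page (the first implementation).
    shared=True:  grants packed page_size // 4096 per Linux page (idealized).
    """
    per_page = page_size // XEN_PAGE_SIZE if shared else 1
    pages = -(-n_grants // per_page)   # ceiling division
    return pages * page_size


n = 32  # arbitrary example, not a figure from the talk
print(backend_memory(n) // 1024, "KB with one grant per page")
print(backend_memory(n, shared=True) // 1024, "KB with grants sharing pages")
```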

SLIDE 19

Status

◮ Where are we?
  ◮ First implementation done
  ◮ Only the net and block PV drivers are supported
  ◮ On the way to version 4
◮ Future
  ◮ Write a design doc for the grant improvement
  ◮ Fix memory usage with 64KB page granularity
  ◮ Convert the remaining PV drivers and QEMU

SLIDE 20

The End