Go GC: Prioritizing Low Latency and Simplicity



SLIDE 1

Go GC: Prioritizing Low Latency and Simplicity

Rick Hudson Google Engineer QCon San Francisco Nov 16, 2015

SLIDE 2

Google Confidential and Proprietary

My Codefendants: The Cambridge Runtime Gang

https://upload.wikimedia.org/wikipedia/commons/thumb/2/2f/Sato_Tadanobu_with_a_goban.jpeg/500px-Sato_Tadanobu_with_a_goban.jpeg

SLIDE 3

Go: A Language for Scalable Concurrency

- Lightweight threads (goroutines)
- Channels for communication
- GC for scalable APIs
- Simple foreign function interface

Simplicity: The Key to Success

SLIDE 4

Go: A Language for Scalable Open Source Projects

Do Less, Enable More:
- Learning
- Implementation
- Tooling
- Reading
- Understanding
- Sharing

SLIDE 5

Go: A Runtime for Scalable Applications

This is the story of Go’s garbage collector

Image by Renee French

SLIDE 6

Making Go Go: Establish A Virtuous Cycle

- News flash: 2X transistors != 2X frequency
- More transistors == more cores
- But only if software uses more cores
- Long term: establish a virtuous cycle
- Short term: increase Go adoption

[Diagram: a virtuous cycle alternating Software++ and Hardware++]

#1 Barrier: GC Latency

SLIDE 7

When is the best time to do a GC?

When nobody is looking: use a camera to track eye movement, and when the subject looks away, do a GC.

Recovering

https://upload.wikimedia.org/wikipedia/commons/3/35/Computer_Workstation_Variables.jpg

SLIDE 8

Waiting: pop up a network wait icon.

https://commons.wikimedia.org/wiki/File:WIFI_icon.svg#globalusage

SLIDE 9

Or Trade Throughput for Reduced GC Latency

[Diagram: give up a little throughput in exchange for reduced GC latency]

SLIDE 10

Latency

- 1 nanosecond: Grace Hopper's nanosecond, 11.8 inches of wire
- 5.4 microseconds: time light travels 1 mile in a vacuum
- 1 millisecond: read 1 MB sequentially from SSD
- 20 milliseconds: read 1 MB from disk
- 50 milliseconds: perceptual causality (cursor response threshold)
- 50+ milliseconds: various network delays

SLIDE 11

- Saccades: 30 ms
- Reading: 200 ms
- Involuntary eye blink: 300 ms

SLIDE 12

GC 101: Root Scan Phase

[Diagram: roots in stacks/registers and globals point into the heap]

SLIDE 13

Mark Phase

[Diagram: marking reachable heap objects starting from stack/register and global roots]

Righteous Concurrent GC struggles with Evil Application changing pointers

SLIDE 14

Sweep Phase

[Diagram: unmarked (unreachable) heap objects are reclaimed; the roots are unchanged]

SLIDE 15

Go isn’t Java: GC Related Go Differences

Java:
- Tens of Java threads
- Synchronization via objects/locks
- Runtime written in C
- Objects linked with pointers

Go:
- Thousands of goroutines
- Synchronization via channels
- Runtime written in Go; leverages Go the same as users do
- Control of spatial locality: objects can be embedded
- Interior pointers (&foo.field)
- Simpler foreign function interface

Let’s Build a GC for Go

SLIDE 16

Go 1.4: Stop the World

[Timeline: the application is fully paused while the GC runs]

SLIDE 17

Go 1.5: Concurrent GC

[Timeline: the GC runs concurrently with the application; application goroutines assist the GC; short stop-the-world pauses of roughly 1 ms and 3 ms]

SLIDE 18

GC Algorithm Phases

Phase sequence: Off → Stack scan → Mark → Mark termination (STW) → Sweep → Off. The write barrier is on from stack scan through mark termination. Correctness proofs are in the literature (see me).

- Off: GC disabled; pointer writes are just memory writes: *slot = ptr
- Stack scan: collect pointers from globals and goroutine stacks; stacks are scanned at preemption points
- Mark: mark objects and follow pointers until the pointer queue is empty; a write barrier tracks pointer changes by the mutator
- Mark termination (STW): rescan globals and changed stacks, finish marking, shrink stacks, … The literature contains non-STW algorithms: keeping it simple for now
- Sweep: reclaim unmarked objects as needed; adjust GC pacing for the next cycle
- Off: rinse and repeat

SLIDE 19

Garbage Benchmark

[Chart: GC pause in seconds (lower is better) vs. heap size in gigabytes]

SLIDE 20

Garbage Benchmark

[Chart annotation: heap sized at 2x the live heap]

SLIDE 21

GOGC knob: a space-time trade-off. More heap space means less GC time, and vice versa.

Implementing a one-knob GC is a challenge.

SLIDE 22

Splay: Increasing Heap Size == Better Performance

[Chart: execution time (lower is better) vs. heap size in megabytes; GOGC=200; live heap kept constant]

SLIDE 23

JSON: Increasing Heap Size == Better Performance

[Chart: execution time (lower is better) vs. heap size in megabytes; GOGC=200]

SLIDE 24

Onward: We're not done yet…

- Tell people that GC latency is not a barrier to Go's adoption
- Tune for even lower latency, higher throughput, more predictability
- Tune for users' applications
- Fight devils reported by users

Increase Go Adoption

Establish Virtuous Cycle

SLIDE 25

Questions