Cache Me If You Can: Effects of DNS Time-to-Live


slide-1
SLIDE 1

Cache Me If You Can: Effects of DNS Time-to-Live

Giovane C. M. Moura (1,2), John Heidemann (3), Wes Hardaker (3), Ricardo de O. Schmidt (4)

RIPE 79, Rotterdam, The Netherlands, 2019-10-15

(1) SIDN Labs, (2) TU Delft, (3) USC/ISI, (4) UPF

slide-2
SLIDE 2

Outline

  • Introduction
  • Parent vs Child
  • Zone configurations and Effective TTL
  • TTLs Use in the Wild
  • Operators Notification
  • Caching (Longer TTL) vs Anycast
  • Shorter vs Longer TTLs
  • Recommendation and Conclusions

slide-3
SLIDE 3

Our research on DNS over the last years

Our research on DNS security/stability:

  • Anycast and DDoS: IMC 2016 [2]
  • Resolvers: IMC 2017 [5]
  • Anycast Engineering: IMC 2017 [1]
  • Caching and DDoS: IMC 2018 [4]
  • Caching, TTL, and performance: IMC 2019 [3] (this paper)
  • IMC will be next week in Amsterdam

slide-4
SLIDE 4

Introduction

slide-14
SLIDE 14

The role of TTL

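user → resolver (ISP) → authoritative server (GOOGLE)

  • The user asks the resolver: Q: google.com?
  • The resolver asks the authoritative server: Q: google.com?
  • The authoritative server answers A: 10.10.10.10; the resolver caches the answer and passes it on to the user
  • A later Q: google.com? is answered from the resolver's cache with A: 10.10.10.10, without contacting the authoritative server: cache hit! FASTER

BUT caching FOR HOW LONG??? TTL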

  • TTL controls caching
  • auth servers SIGNAL to resolvers how long (TTL)
  • Caching is VERY important for performance
  • improves user experience
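
The caching behaviour on this slide can be sketched as a toy simulation. This is a minimal illustration and not a real resolver: the class `TtlCache`, the helper `authoritative`, and the 300s TTL are assumptions made for the example; only the name and the address come from the slide.

```python
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy resolver cache: keeps an answer until its TTL runs out."""

    def __init__(self) -> None:
        # name -> (answer, absolute expiry time in seconds)
        self._store: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str, now: float,
               ask_authoritative: Callable[[str], Tuple[str, int]]
               ) -> Tuple[str, bool]:
        """Return (answer, cache_hit) for `name` at time `now`."""
        entry = self._store.get(name)
        if entry is not None and now < entry[1]:
            return entry[0], True              # cache hit: no upstream query
        answer, ttl = ask_authoritative(name)  # cache miss: ask upstream
        self._store[name] = (answer, now + ttl)
        return answer, False


def authoritative(name: str) -> Tuple[str, int]:
    # Hypothetical authoritative server: fixed answer with a 300 s TTL.
    return "10.10.10.10", 300


cache = TtlCache()
first = cache.lookup("google.com", now=0, ask_authoritative=authoritative)
second = cache.lookup("google.com", now=100, ask_authoritative=authoritative)
third = cache.lookup("google.com", now=400, ask_authoritative=authoritative)
```

The first and third lookups reach the authoritative server (empty cache, then expired entry); the second is served from the cache because it arrives within the TTL.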


slide-15
SLIDE 15

And you must set TTLs

  • Say you register cachetest.net


slide-16
SLIDE 16

What TTL values are good?

Today it is unclear what an operator should do

  • DNS OPs folks on TTLs: “if it ain’t broke, don’t fix it”

We think we can help

Figure 1: DNS ops changing TTLs. src: trainworld.be

slide-17
SLIDE 17

Our contribution

Because of conflicting and under-explained TTL advice, we show:

  • 1. The effective TTL comes from multiple places
    • Parent and child authoritative servers
    • NS and A records (sometimes)
  • 2. TTLs are unnecessarily short
    • because with multiple places, the shorter one sometimes wins
    • or operators don’t realize the cost
  • 3. Longer TTLs are MUCH faster
  • 4. Our results were adopted by 3 ccTLDs, for ∼20ms median latency improvement; 171ms at the 75%ile

slide-18
SLIDE 18

The rest of this talk

  • 1. Parent vs Child: who really sets the TTL?
  • 2. NS and A records: are they limited? And bailiwick?
  • 3. Real-world variation exists
  • 4. Longer TTLs are MUCH better
  • 5. Our recommendations


slide-19
SLIDE 19

Parent vs Child

slide-20
SLIDE 20

Duplicate info: which one is chosen?

  • Parent and child TTLs may vary: dig NS cachetest.net

Delegation tree: ROOT (.) → TLDs (.org, .nl, ..., .net) → cachetest.net

  • Parent (.net): NS cachetest.net: ns1.cachetest.net, TTL: 172800s
  • Child (cachetest.net): NS cachetest.net: ns1.cachetest.net, TTL: 3600s

Resolver: which TTL will Rembrandt use? Parent (172800s) or child (3600s)?

slide-21
SLIDE 21

Are resolvers parent- or child-centric?

Parent vs Child experiment

  • Test with experiment on .uy (2019-02-14)
    • Parent: NS/A TTL: 172800s
    • Child: NS TTL: 300s; A TTL: 120s
  • We query with 15k VPs (RIPE Atlas) multiple times, every 10min
  • We analyze TTL values received at VPs

slide-22
SLIDE 22

Most Atlas VP resolvers are child-centric

Figure 2: Observed TTLs from Atlas VPs for .uy-NS and a.nic.uy-A queries.

[Plot: CDF of answers vs. answer TTL (s), one curve for NS queries and one for A queries]

  • Spike at child A TTL (120s): most resolvers are child-centric
  • Spike at child NS TTL (300s): child-centric
  • Remember: the parent TTLs are 2 days (172800s)
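
One way to read such measurements is to bucket each observed TTL by which side it is consistent with. The sketch below is an illustration only: the function names, the bucket labels, and the sample values are hypothetical, and the rule is a simplification (a parent-centric cache that has aged for a long time could also show a small remaining TTL).

```python
from collections import Counter
from typing import Iterable


def classify(observed_ttl: int, child_ttl: int, parent_ttl: int) -> str:
    """Bucket one observed TTL by the configuration it is consistent with."""
    if observed_ttl <= child_ttl:
        return "child-consistent"    # at or below the child TTL
    if observed_ttl <= parent_ttl:
        return "parent-consistent"   # too long to have come from the child
    return "other"                   # e.g. a resolver rewriting TTLs


def summarize(ttls: Iterable[int], child_ttl: int, parent_ttl: int) -> Counter:
    """Count how many observations fall in each bucket."""
    return Counter(classify(t, child_ttl, parent_ttl) for t in ttls)


# Hypothetical TTLs for A queries (child 120 s, parent 172800 s).
sample = [120, 118, 57, 120, 172000, 3, 200000]
counts = summarize(sample, child_ttl=120, parent_ttl=172800)
```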

slide-25
SLIDE 25

Is centricity true for TLDs and SLDs?

  • Test with .nl TLD A records (ns*.dns.nl)
  • TTLs are 3600s (child) vs. 172800s (parent)

Figure 3: Minimum interarrival time of A queries for TLD

[Plot: CDF vs. interarrival time (h), with TTL 3600s and TTL 172800s marked]

  • Spike at child A TTL (3600s): confirms child-centric for TLD
  • We confirmed this with a second-level domain (see paper)

slide-28
SLIDE 28

Most resolvers will use child TTLs

  • Rembrandt (and users) mostly use child TTLs
  • Child TTL controls caching (most times)

Delegation tree: ROOT (.) → TLDs (.org, .nl, ..., .net) → cachetest.net

  • Parent (.net): NS cachetest.net: ns1.cachetest.net, TTL: 172800s
  • Child (cachetest.net): NS cachetest.net: ns1.cachetest.net, TTL: 3600s

Resolver: which TTL will Rembrandt use? Parent (172800s) or child (3600s)?
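
What centricity means for an operator can be shown with a few lines of arithmetic. This is a sketch under a simplifying assumption: a resolver is treated as purely child-centric or purely parent-centric, and the function `cached_ttl` is a hypothetical helper; the TTL values are the ones on the slide.

```python
def cached_ttl(parent_ttl: int, child_ttl: int, child_centric: bool = True) -> int:
    """TTL a resolver stores for a record that exists at both parent and child."""
    return child_ttl if child_centric else parent_ttl


PARENT_TTL = 172800  # NS TTL at the .net parent (from the slide)
CHILD_TTL = 3600     # NS TTL in the cachetest.net child zone (from the slide)

child_view = cached_ttl(PARENT_TTL, CHILD_TTL, child_centric=True)
parent_view = cached_ttl(PARENT_TTL, CHILD_TTL, child_centric=False)

# Number of cache lifetimes a child-centric resolver goes through in the
# two days that a single parent-centric cache entry would have lasted.
lifetimes_in_two_days = PARENT_TTL // child_view
```

Since most resolvers behave like the child-centric case, the TTL set in the child zone is the one that determines how often they come back.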

slide-30
SLIDE 30

Zone configurations and Effective TTL

slide-31
SLIDE 31

Are there dependencies between A and NS TTLs?

sub.cachetest.net

                In zone                   Out of zone
NS (TTL 3600s)  ns1.sub.cachetest.net     ns1.zurrundeddu.com
A (TTL 7200s)   10.10.10.10               10.10.10.10

To resolve *.sub.cachetest.net, you need both NS and A. Are NS and A cached independently?

  • 1. t=0: all Atlas VPs query (fills cache with NS and A)
  • 2. t=4800: what happens? NS is expired; A is still in cache: do resolvers use the “cached A” or refresh it again?
  • Trick: at t=540, we renumber A to 10.10.10.2 (different answer)

Will Marcus Aurelius receive the cached or the new answer?

slide-46
SLIDE 46

Are they dependent? Yes, for in zone

[Two plots: number of answers (original vs. new) over minutes after start. Annotated phases: cache warms; original NS and A valid; original NS expires, original A still valid; both original NS and A expired; DNS redirect: new A]

NS expires, A valid (3600 < t < 7200):

  • in zone: A refreshed (new server): dependent caching
  • out-of-zone: cached A (old server): independent caching

Why? Glue records cause cache refresh
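
The observed difference can be captured in a simplified model. The sketch below is an assumption-laden illustration, not the measured behaviour of every resolver: it supposes the cache is filled at t=0, that the renumbering has already happened by the time anything expires, and that an in-bailiwick A record is replaced whenever the NS record is re-fetched. The function `answer_seen` is hypothetical; addresses and TTLs are from the slides.

```python
OLD_A = "10.10.10.10"  # address served before the renumbering (from the slides)
NEW_A = "10.10.10.2"   # address served after the renumbering (from the slides)


def answer_seen(t: int, in_bailiwick: bool,
                ns_ttl: int = 3600, a_ttl: int = 7200) -> str:
    """Address a client gets t seconds after the cache was filled at t=0.

    In bailiwick: re-fetching the expired NS brings fresh glue, so the
    cached A effectively lives min(ns_ttl, a_ttl) seconds.
    Out of bailiwick: the A record is cached on its own for a_ttl seconds.
    """
    a_lifetime = min(ns_ttl, a_ttl) if in_bailiwick else a_ttl
    return OLD_A if t < a_lifetime else NEW_A


in_zone = answer_seen(4800, in_bailiwick=True)       # NS expired, A refreshed
out_of_zone = answer_seen(4800, in_bailiwick=False)  # A still served from cache
```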

slide-53
SLIDE 53

Are there dependencies between A and NS TTLs?

  • Marcus Aurelius will notice an “early” refreshed A for in-zone (in-bailiwick) servers
  • The way you configure your zone impacts caching, not only TTLs

src: https://en.wikipedia.org/wiki/Marcus_Aurelius CC BY-SA 3.0

slide-55
SLIDE 55

TTLs Use in the Wild

slide-56
SLIDE 56

How are TTLs used in the wild?

  • There is no consensus on how to choose TTLs
  • But folks have to choose them anyway
  • We use 5 lists:
    • Alexa
    • Majestic
    • Umbrella
    • .nl
    • Root (TLDs)
  • We probe several record types
  • We analyze child TTL values
  • And discuss results with some operators

slide-57
SLIDE 57

Most domains are out-of-bailiwick

                 Alexa    Majestic  Umbrella  .nl      Root
responsive       988654   928299    783343    5454833  1535
CNAME            50981    7017      452711    9436     -
SOA              12741    8352      59083     12268    -
responsive NS    924932   912930    271549    5433129  1535
Out only         878402   873447    244656    5417599  748
ratio out only   95.0%    95.7%     90.1%     99.7%    48.7%
In only          37552    28577     20070     12586    654
Mixed            8978     10906     6823      2941     133

  • Out of bailiwick (out-of-zone): records are cached independently (no glue)
  • Your chosen TTL values for different records will be respected
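
The "ratio out only" row follows directly from the counts: it is the share of responsive NS answers whose name servers are all out of bailiwick. A short check, using only the numbers from the table (the variable names are illustrative):

```python
# Counts copied from the table above.
responsive_ns = {"Alexa": 924932, "Majestic": 912930, "Umbrella": 271549,
                 ".nl": 5433129, "Root": 1535}
out_only = {"Alexa": 878402, "Majestic": 873447, "Umbrella": 244656,
            ".nl": 5417599, "Root": 748}

# Percentage of domains whose NS records are all out of bailiwick.
ratio_out_only = {
    name: round(100 * out_only[name] / responsive_ns[name], 1)
    for name in responsive_ns
}
```

Only the root zone, where roughly half of the TLDs use out-of-bailiwick name servers exclusively, departs from the 90%+ pattern of the other lists.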

slide-58
SLIDE 58

NS records have longer TTLs (>24h)

[Plot: CDF of answers vs. TTL (h) for Alexa, Majestic, Umbrella, .nl, and root; spike at 24h]

  • > 60% of NS records are long (good for caching and performance)
  • But 40% are one hour or less (not so good)

slide-60
SLIDE 60

A record TTLs are far shorter than NS TTLs

[Plots: CDF of answers vs. TTL (h) for Alexa, Majestic, Umbrella, .nl, and root]

Shorter A record TTLs lead to poor caching

slide-63
SLIDE 63

Operators Notification: 3 changed their TTLs

  • We found 34 TLDs with short TTLs for NS records (<=30min)
  • We notified 8 ccTLDs
  • 3 TLDs increased their TTL to 1 day after our notification:
    • .uy
    • another in Africa
    • and another in the Middle East

slide-66
SLIDE 66

.uy latency reduced a lot!

  • .uy NS TTL from 300s to 86400s: lower latency for clients

[Plot: CDF vs. RTT (ms), TTL 300s vs. TTL 86400s. Median RTT: from 28 to 8ms; 75%ile: from 173 to 21ms]

Figure 5: RTT from RIPE Atlas VPs for .uy NS queries

Median RTT improves by 20ms; 75%ile by 152ms

slide-68
SLIDE 68

.uy latency reduced for all regions

Check for Atlas location bias

[Plot: median RTT (ms) per continent code (# of VPs), TTL 300s vs. TTL 86400s: AF (327), AS (846), EU (9691), NA (2307), OC (267), SA (293), ALL (13731)]

Figure 7: Median RTT as seen by RIPE Atlas VPs per region

  • Longer TTL → longer caching → faster answers
  • Up to 150ms median latency reduction (AF)
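
The chain "longer TTL → longer caching → faster answers" can be made concrete with a standard expected-value model: the average latency is a mix of cache hits and cache misses, weighted by the hit ratio. This is only a sketch; the function `expected_rtt_ms` and the numbers fed to it are hypothetical and are not measurements from the talk.

```python
def expected_rtt_ms(hit_ratio: float, hit_rtt_ms: float, miss_rtt_ms: float) -> float:
    """Average latency seen by clients for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_rtt_ms + (1.0 - hit_ratio) * miss_rtt_ms


# Hypothetical values: 10 ms to a nearby resolver cache, 110 ms when the
# resolver has to go to a distant authoritative server.
no_cache = expected_rtt_ms(0.0, hit_rtt_ms=10.0, miss_rtt_ms=110.0)
half_cached = expected_rtt_ms(0.5, hit_rtt_ms=10.0, miss_rtt_ms=110.0)
fully_cached = expected_rtt_ms(1.0, hit_rtt_ms=10.0, miss_rtt_ms=110.0)
```

A longer TTL raises the hit ratio, and the benefit is largest where the miss penalty is largest, which is consistent with the biggest reduction appearing in the region farthest from the servers.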

slide-69
SLIDE 69

We are no Luis Suárez... but

  • We still helped Uruguayan .uy users
  • And two other countries:
    • One in East Africa
    • Another one in the Middle East
  • Experiment proving how important TTLs are for performance

src: https://commons.wikimedia.org/wiki/File:Luis_Su%C3%A1rez_2018.jpg CC BY-SA 3.0

slide-70
SLIDE 70

Longer TTLs are like the old Turbo button

  • Some DNS OPs spend 1000s to reduce latency
  • Longer TTLs improve latency at zero cost

src: wikipedia.org

slide-72
SLIDE 72

Caching (Longer TTL) vs Anycast

slide-73
SLIDE 73

Caching vs Anycast

  • People and CDNs spend lots on huge anycast deployments
  • OPs could say: “I’ll have short TTLs since I use anycast”, because anycast can make up for it
  • Does anycast really beat caching?

slide-74
SLIDE 74

Caching vs Anycast: experiment

[Diagram: Probes + Resolver query two authoritative setups; site label: FRA]

  • Unicast (EC2), TTL 86400s (good caching)
  • Anycast (Route53), TTL 60s (no caching)

Which one is faster?

slide-80
SLIDE 80

TTLs (caching) matter more than anycast

[Plot: ECDF vs. RTT (ms), unicast with caching vs. TTL 60s anycast; annotation: 22ms difference, unicast+cache vs. anycast without cache]

  • Caching near the client beats even great server infrastructure!
    • Anycast TTL60 (no cache): 29.96ms (median)
    • Unicast TTL86400 (cache): 7.38ms (median)
    • 22ms median latency reduction
  • Query load: 77% down with caching
  • So TTLs matter more for performance
  • (Anycast is great for many things too, DDoS for example [2])
  • We strongly recommend anycast [5]
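
Why a 60s TTL amounts to "no caching" while 86400s caches well can be seen by counting cache misses for a single resolver. The sketch below is idealized: the function `authoritative_queries` and the 600s query interval are assumptions for the example, and one always-warm resolver cache is not a model of the real experiment, where many resolvers share the load and the measured reduction was 77%.

```python
def authoritative_queries(ttl: int, interval: int, duration: int) -> int:
    """Count cache misses for one resolver queried every `interval` seconds."""
    misses = 0
    expires_at = 0  # the cache starts empty
    for now in range(0, duration, interval):
        if now >= expires_at:         # nothing cached, or the entry expired
            misses += 1               # one query reaches the authoritative
            expires_at = now + ttl    # cache the fresh answer
    return misses


DAY = 86400
short_ttl_load = authoritative_queries(ttl=60, interval=600, duration=DAY)
long_ttl_load = authoritative_queries(ttl=DAY, interval=600, duration=DAY)
```

With clients returning less often than every 60s, every query is a miss under the short TTL, whereas the one-day TTL needs a single authoritative query for the whole day.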

slide-85
SLIDE 85

Reasons for Longer or shorter TTLs

  • Longer caching:
    • faster responses
    • lower DNS traffic
    • more robust to DDoS attacks [4]
  • Shorter caching:
    • faster operational changes
    • useful for DNS-redirect-based DDoS mitigation
    • DNS-based load balancing

Organizations must weigh these trade-offs to find a good balance

slide-86
SLIDE 86

Recommendation and Conclusions

slide-87
SLIDE 87

Conclusions

  • Recommendation: longer TTLs (1 day) if you can
    • unless using CDN load-balancing or DNS-redir DDoS
  • Why? Because it can save you 50ms or more
  • But keep on using anycast too [2, 5]
  • People have designed caches; use them wisely
  • Should you reconsider your TTLs as well?
  • Paper: https://www.isi.edu/~johnh/PAPERS/Moura19b.html
  • IETF draft: draft-moura-dnsop-authoritative-recommendations

slide-88
SLIDE 88

References

[1] DE VRIES, W. B., DE O. SCHMIDT, R., HARDAKER, W., HEIDEMANN, J., DE BOER, P.-T., AND PRAS, A. Verfploeter: Broad and load-aware anycast mapping. In Proceedings of the ACM Internet Measurement Conference (London, UK, 2017).

[2] MOURA, G. C. M., DE O. SCHMIDT, R., HEIDEMANN, J., DE VRIES, W. B., MÜLLER, M., WEI, L., AND HESSELMAN, C. Anycast vs. DDoS: Evaluating the November 2015 root DNS event. In Proceedings of the ACM Internet Measurement Conference (Santa Monica, California, USA, Nov. 2016), ACM, pp. 255–270.

[3] MOURA, G. C. M., HEIDEMANN, J., DE O. SCHMIDT, R., AND HARDAKER, W. Cache me if you can: Effects of DNS Time-to-Live (extended). In Proceedings of the ACM Internet Measurement Conference (Amsterdam, the Netherlands, Oct. 2019), ACM, to appear.

[4] MOURA, G. C. M., HEIDEMANN, J., MÜLLER, M., DE O. SCHMIDT, R., AND DAVIDS, M. When the dike breaks: Dissecting DNS defenses during DDoS. In Proceedings of the ACM Internet Measurement Conference (Boston, MA, USA, Oct. 2018), pp. 8–21.

[5] MÜLLER, M., MOURA, G. C. M., DE O. SCHMIDT, R., AND HEIDEMANN, J. Recursives in the wild: Engineering authoritative DNS servers. In Proceedings of the ACM Internet Measurement Conference (London, UK, 2017), ACM, pp. 489–495.