CS 557 Domain Name System



slide-1
SLIDE 1

CS 557 Domain Name System

Development of the Domain Name System

  • Mockapetris and Dunlap, 1988

Impact of Configuration Errors on DNS Robustness

  • V. Pappas, Z. Xu, S. Lu, D. Massey, A. Terzis, and L. Zhang, 2004

Spring 2013

slide-2
SLIDE 2

The Story So Far…

  • Data Layer: richly connected network (many paths) with many types of unreliable links
  • Network layer: Addressing, Fragmentation, Dynamic Routing, Best Effort Forwarding
  • Transport layer: End to End communication, Multiplexing, Reliability, Congestion control, Flow control
  • Some Essential Apps: DNS (naming) and NTP (time)

slide-3
SLIDE 3

Slides Adapted From SIGCOMM 2004 Presentation

slide-4
SLIDE 4

Motivation

  • DNS: part of the Internet core infrastructure

– Applications: web, e-mail, e164, CDNs …

  • DNS: considered as a very reliable system

– Works almost always

  • Question: is DNS a robust system?

– User-perceived robustness
– System robustness

Are they the same?

slide-5
SLIDE 5

Motivation

Short Answer:

“Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration”

  • Wired News, Jan 2001

– Thousands or even millions of users affected
– All due to a single DNS configuration error

slide-6
SLIDE 6

Related Work

  • Traffic & implementation error studies:

– Danzig et al. [SIGCOMM92]: bugs
– CAIDA: traffic & bugs

  • Performance studies:

– Jung et al. [IMW01]: caching
– Cohen et al. [SAINT01]: proactive caching
– Liston et al. [IMW02]: diversity

  • Server availability:

– To appear [OSDI04, IMC04]

slide-7
SLIDE 7

Our Work: Study DNS Robustness

  • Classify DNS operational errors:

– Study known errors
– Identify new types of errors

  • Measure their pervasiveness
  • Quantify their impact on DNS

– availability
– performance

slide-8
SLIDE 8

Outline

  • DNS Overview
  • Measurement Methodology
  • DNS Configuration Errors

– Example Cases
– Measurement Results

  • Discussion & Summary
slide-9
SLIDE 9

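Background

[Figure: DNS name tree with TLDs net, com, uk, ca, jp and lower-level nodes foo, buz, bar, bar1, bar2, bar3; the bar zone (bar.foo.com) is highlighted with its name servers and resource records]

Zone:

  • Occupies a continuous subspace
  • Served by the same nameservers

Name servers and resource records of the bar zone:

bar.foo.com. NS ns1.bar.foo.com.
bar.foo.com. NS ns2.bar.foo.com.
bar.foo.com. NS ns3.bar.foo.com.
bar.foo.com. MX mail.bar.foo.com.
www.bar.foo.com. A 10.10.10.10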
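As a minimal sketch of how the bar zone's records identify its name servers, the Python below parses the records from this slide as plain text. The `parse_rrs` helper and the three-field "owner type data" format are assumptions for illustration; real zone files also carry TTL and class fields.

```python
from collections import defaultdict

# Records from the slide, in a simplified "owner type data" form.
ZONE_TEXT = """
bar.foo.com. NS ns1.bar.foo.com.
bar.foo.com. NS ns2.bar.foo.com.
bar.foo.com. NS ns3.bar.foo.com.
bar.foo.com. MX mail.bar.foo.com.
www.bar.foo.com. A 10.10.10.10
"""


def parse_rrs(text):
    """Group records as {(owner, type): [data, ...]} (hypothetical helper)."""
    rrs = defaultdict(list)
    for line in text.strip().splitlines():
        owner, rtype, data = line.split()
        rrs[(owner, rtype)].append(data)
    return dict(rrs)


rrs = parse_rrs(ZONE_TEXT)
# The NS records at the zone apex name the servers that serve the whole zone.
name_servers = rrs[("bar.foo.com.", "NS")]
print(name_servers)
```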

slide-10
SLIDE 10

[Figure: a client asks a caching server for www.bar.foo.com; the caching server queries the root, com, foo, and bar zones in turn]

  • root zone: referral (com NS RRs, com A RRs)
  • com zone: referral (foo NS RRs, foo A RRs)
  • foo zone: referral (bar NS RRs, bar A RRs)
  • bar zone: answer (www.bar.foo.com A 10.10.10.10)
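The referral chain on this slide can be sketched as a toy simulation. This is not a real resolver: the `ZONES` dictionary, its layout, and the `resolve` function are assumptions made for illustration, and no network queries are sent.

```python
# Toy model: each zone lists the child zones it delegates and the
# address records it can answer for directly.
ZONES = {
    ".": {"delegations": ["com."], "records": {}},
    "com.": {"delegations": ["foo.com."], "records": {}},
    "foo.com.": {"delegations": ["bar.foo.com."], "records": {}},
    "bar.foo.com.": {
        "delegations": [],
        "records": {"www.bar.foo.com.": "10.10.10.10"},
    },
}


def resolve(name, zones=ZONES):
    """Follow referrals from the root; return (answer, zones visited)."""
    zone, path = ".", []
    while True:
        path.append(zone)
        data = zones[zone]
        if name in data["records"]:
            return data["records"][name], path  # final answer
        # A referral points at the delegated child zone containing the name.
        child = next(
            (d for d in data["delegations"]
             if name == d or name.endswith("." + d)),
            None,
        )
        if child is None:
            return None, path  # no answer and no referral
        zone = child


answer, path = resolve("www.bar.foo.com.")
print(answer, path)
```

In this model the caching server's role is played by the loop itself: each iteration corresponds to one query and one referral or answer.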

slide-11
SLIDE 11

Infrastructure RRs

[Figure: the same NS and A records for foo.com shown at both the com (parent) zone and the foo.com (child) zone]

foo.com. NS ns1.foo.com.
foo.com. NS ns2.foo.com.
foo.com. NS ns3.foo.com.

ns1.foo.com. A 1.1.1.1
ns2.foo.com. A 2.2.2.2
ns3.foo.com. A 3.3.3.3

  • NS Resource Record:

– Provides the names of a zone’s authoritative servers
– Stored both at the parent and at the child zone

  • A Resource Record:

– Associated with an NS resource record
– Stored at the parent zone (glue A record)
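Because NS records live at both the parent and the child, and glue A records live at the parent, the two copies can drift apart. A minimal sketch of a consistency check follows; the `PARENT` and `CHILD_NS` structures and the `check_delegation` function are hypothetical and simply mirror the records on this slide.

```python
# Records held by the com zone (parent) for the foo.com delegation.
PARENT = {
    "NS": {"ns1.foo.com.", "ns2.foo.com.", "ns3.foo.com."},
    "A": {
        "ns1.foo.com.": "1.1.1.1",
        "ns2.foo.com.": "2.2.2.2",
        "ns3.foo.com.": "3.3.3.3",
    },
}
# NS set published by the foo.com zone itself (child).
CHILD_NS = {"ns1.foo.com.", "ns2.foo.com.", "ns3.foo.com."}


def check_delegation(zone, parent, child_ns):
    """Return (NS only at parent, NS only at child, in-zone NS lacking glue)."""
    only_parent = parent["NS"] - child_ns
    only_child = child_ns - parent["NS"]
    # A name server inside the delegated zone needs a glue A record at the parent.
    missing_glue = {
        ns for ns in parent["NS"]
        if ns.endswith("." + zone) and ns not in parent["A"]
    }
    return only_parent, only_child, missing_glue


result = check_delegation("foo.com.", PARENT, CHILD_NS)
print(result)
```

Three empty sets mean the parent and child agree and every in-zone server has glue.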

slide-12
SLIDE 12

What Affects DNS Availability

  • Name Servers:

– Software failures
– Network failures
– Scheduled maintenance tasks

  • Infrastructure Resource Records:

– Availability of these records
– Configuration errors (focus of our work)
slide-13
SLIDE 13

Classification of Measured Errors

  • Inconsistency: the configuration of infrastructure RRs does not correspond to the actual authoritative name-servers

– Lame Delegation
– Delegation Inconsistency

  • Dependency: more than one name-server shares a common point of failure

– Diminished Redundancy
– Cyclic Dependency

slide-14
SLIDE 14

What is Measured?

  • Frequency of configuration errors:

– System parameters: TLDs, DNS level, zone size (i.e. the number of delegations)

  • Impact on availability:

– Number of servers: lost due to these errors
– Zone’s availability: probability of resolving a name

  • Impact on performance:

– Total time to resolve a query

    • Starting from the query issuing time
    • Finishing at the query final answer time
slide-15
SLIDE 15

Measurement Methodology

  • Error frequency and availability impact:

– 3 sets of active measurements:

    • Random set of 50K zones
    • 20K zones that allow zone transfers
    • 500 popular zones

  • Performance impact:

– 2 sets of passive measurements: 1-week DNS packet traces

slide-16
SLIDE 16

Lame Delegation

[Figure: the com zone delegates foo.com to two servers, A.foo.com and B.foo.com]

foo.com. NS A.foo.com.
foo.com. NS B.foo.com.

A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2

1) Non-existing server: 3 seconds perf. penalty
2) DNS error code: 1 RTT perf. penalty
3) Useless referral: 1 RTT perf. penalty
4) Non-authoritative answer (cached)
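The four cases on this slide can be sketched as a classifier over a simplified response. The response dictionary and its keys (`rcode`, `aa`, `answer`) are assumptions loosely modelled on DNS header fields, not a real library API, and the sketch assumes the query asks for a name the server is supposed to be authoritative for.

```python
from enum import Enum


class Outcome(Enum):
    NO_RESPONSE = "non-existing server"
    ERROR_CODE = "DNS error code"
    REFERRAL = "useless referral"
    NON_AUTH_ANSWER = "non-authoritative answer (cached)"
    AUTHORITATIVE = "authoritative answer"


# Performance penalties as listed on the slide (case 4 lists none).
PENALTY = {
    Outcome.NO_RESPONSE: "3 seconds",
    Outcome.ERROR_CODE: "1 RTT",
    Outcome.REFERRAL: "1 RTT",
}


def classify(response):
    """Classify a simplified response dict; None means the query timed out."""
    if response is None:
        return Outcome.NO_RESPONSE
    if response["rcode"] in ("SERVFAIL", "REFUSED"):
        return Outcome.ERROR_CODE
    if response["aa"]:  # authoritative-answer flag set
        return Outcome.AUTHORITATIVE
    return Outcome.NON_AUTH_ANSWER if response["answer"] else Outcome.REFERRAL


def is_lame(response):
    """A listed server is lame if it does not answer authoritatively."""
    return classify(response) is not Outcome.AUTHORITATIVE


print(classify(None), PENALTY[classify(None)])
```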

slide-17
SLIDE 17

Lame Delegation Results

slide-18
SLIDE 18

Lame Delegation Results

[Figure: response-time plot; annotated values 0.06 sec, 0.4 sec, 3 sec, and 50%]

slide-19
SLIDE 19

Lame Delegation Results

  • Error Frequency:

– 15% of the zones
– 8% for the 500 most popular zones
– independent of the zone’s size, varies a lot per TLD

  • Impact:

– 70% of the zones with errors lose half or more of the authoritative servers
– 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation

slide-20
SLIDE 20

Diminished Server Redundancy

[Figure: the com zone delegates foo.com to two servers, A.foo.com and B.foo.com]

foo.com. NS A.foo.com.
foo.com. NS B.foo.com.

A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2

A) Network level:

  • belong to the same subnet

B) Autonomous system level:

  • belong to the same AS

C) Geographic location level:

  • belong to the same city
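The network-level check (A) can be sketched with Python's standard `ipaddress` module. The `same_subnet` function name and the /24 default are assumptions for illustration; the AS-level and city-level checks would need external routing and geolocation data and are not shown.

```python
import ipaddress


def same_subnet(addresses, prefix_len=24):
    """True if all IPv4 addresses fall in one subnet of the given prefix length."""
    nets = {
        # strict=False masks off the host bits, e.g. 1.1.1.1/24 -> 1.1.1.0/24
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    return len(nets) == 1


# Name server addresses from the slide: different /24 subnets.
print(same_subnet(["1.1.1.1", "2.2.2.2"]))
# Hypothetical zone whose two servers sit in the same /24.
print(same_subnet(["10.0.0.1", "10.0.0.200"]))
```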
slide-21
SLIDE 21

Diminished Server Redundancy Results

  • Error Frequency:

– 45% of all zones have all servers in the same /24 subnet
– 75% of all zones have servers in the same AS
– large & popular zones: better AS and geo diversity

  • Impact:

– less than 99.9% availability: all servers in the same /24 subnet
– more than 99.99% availability: 3 servers at different ASes or different cities
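The availability figures on this slide are measured values. As rough intuition only for why placement matters, the sketch below computes zone availability under the simplifying assumption that servers fail independently. The 0.97 per-server availability is a hypothetical number, not a measurement from the study.

```python
def zone_availability(server_availabilities):
    """Probability that at least one server is up, assuming independent failures."""
    p_all_down = 1.0
    for availability in server_availabilities:
        p_all_down *= 1.0 - availability
    return 1.0 - p_all_down


# Three servers that fail independently (e.g. different ASes or cities).
independent = zone_availability([0.97, 0.97, 0.97])
# Three servers behind one shared point of failure behave like a single server.
shared = zone_availability([0.97])
print(independent, shared)
```

Under these assumptions the independent case exceeds 99.99% while the shared case stays at the single-server level, which matches the direction of the measured results, not their exact values.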

slide-22
SLIDE 22

Cyclic Zone Dependency (1)

[Figure: the com zone delegates foo.com to two servers, A.foo.com and B.foo.com]

At the parent zone (com):

foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1

  • The A glue RR for B.foo.com is missing

At the child zone (foo.com):

B.foo.com. A 2.2.2.2

  • B.foo.com depends on A.foo.com
  • If A.foo.com is unavailable then B.foo.com is too

slide-23
SLIDE 23

Cyclic Zone Dependency (2)

[Figure: the com zone delegates foo.com (servers A.foo.com and B.bar.com) and bar.com (servers A.bar.com and B.foo.com)]

foo.com. NS A.foo.com.
foo.com. NS B.bar.com.
A.foo.com. A 1.1.1.1

bar.com. NS A.bar.com.
bar.com. NS B.foo.com.
A.bar.com. A 2.2.2.2

  • The foo.com zone seems correctly configured
  • The combination of foo.com and bar.com zones is wrongly configured
  • The B servers depend on the A servers
  • If A.foo.com and A.bar.com are unavailable, the B addresses are unresolvable
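One way to detect this kind of error is to build a zone dependency graph and search it for cycles. The sketch below is an assumption about how that could be done, not the authors' tool: `NS_RECORDS`, `GLUE`, and the helper functions are hypothetical, and a zone is taken to depend on whichever zone must be queried to resolve a name server that has no glue record.

```python
# Delegations and glue from the slide.
NS_RECORDS = {
    "foo.com.": ["A.foo.com.", "B.bar.com."],
    "bar.com.": ["A.bar.com.", "B.foo.com."],
}
GLUE = {"A.foo.com.": "1.1.1.1", "A.bar.com.": "2.2.2.2"}


def zone_of(name, zones):
    """Longest zone in `zones` that `name` falls under, or None."""
    matches = [z for z in zones if name == z or name.endswith("." + z)]
    return max(matches, key=len) if matches else None


def dependencies(ns_records, glue):
    """Map each zone to the zones needed to resolve its glue-less name servers."""
    deps = {zone: set() for zone in ns_records}
    for zone, servers in ns_records.items():
        for ns in servers:
            if ns in glue:
                continue  # address comes with the referral; no extra lookup
            host_zone = zone_of(ns, ns_records)
            if host_zone is not None:
                deps[zone].add(host_zone)
    return deps


def find_cycle(deps):
    """Return one dependency cycle as a list of zones, or None (depth-first)."""
    def visit(zone, path):
        if zone in path:
            return path[path.index(zone):] + [zone]
        for nxt in sorted(deps.get(zone, ())):
            cycle = visit(nxt, path + [zone])
            if cycle:
                return cycle
        return None

    for start in sorted(deps):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


cycle = find_cycle(dependencies(NS_RECORDS, GLUE))
print(cycle)
```

In this model the single-zone case (a missing glue record for an in-zone server) shows up as a zone that depends on itself.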

slide-24
SLIDE 24

Cyclic Zone Dependency Results

  • Error Frequency:

– 2% of the zones
– None of the 500 most popular zones

  • Impact:

– 90% of the zones with cyclic dependency errors lose 25% (or even more) of their servers
– 2 or 4 zones are involved in most errors

slide-25
SLIDE 25

Discussion: User-Perceived != System Robustness

  • User-perceived robustness:

– Data replication: only one server is needed
– Data caching: temporarily masks infrastructure failures
– Popular zones: fewer configuration errors

  • System robustness:

– Fewer available servers: due to inconsistency errors
– Fewer redundant servers: due to dependency errors

slide-26
SLIDE 26

Discussion: Why so many errors?

  • Superficially: are due to operators:

– Unaware of these errors
– Lack of coordination

    • parent-child zone, secondary servers hosting

  • Fundamentally: are due to protocol design:

– Lack of mechanisms to handle these errors

    • proactively or reactively

– Design choices that embrace some of them:

    • Name-servers are recognized with names
    • Glue NS & A records necessary to set up the DNS tree
slide-27
SLIDE 27

Summary

  • DNS operational errors are widespread
  • DNS operational errors affect availability:

– 50% of the servers lost
– less than 99.9% availability

  • DNS operational errors affect performance:

– 1 or even 2 orders of magnitude

  • DNS system robustness lower than user perception

– Due to protocol design, not just due to operator errors

slide-28
SLIDE 28

Ongoing Work

  • Reactive mechanisms:

– DNS Troubleshooting [NetTs 04]

  • Proactive mechanisms:

– Enhancing DNS replication & caching

slide-29
SLIDE 29

Thank You!!!