SLIDE 1
SLIDE 2 Resilience of the Domain Name System: A case study for .nl
Lars Bade | Radboud University Nijmegen, SIDN
21 March 2016, ICT.Open
SLIDE 3 Agenda
- 1. Research question
- 2. Domain Name System
- 3. Methodology & Results
- 4. Conclusion
- 5. Time for asking questions
SLIDE 4 Research question
What is the impact on the availability of the .nl-zone and underlying second-level domains for the users of the Internet when parts of the DNS infrastructure become unavailable?
Goal:
- 1. Develop a method to measure the resilience of a DNS zone
- 2. Apply this method to the .nl-zone
N.B.: Research performed at autonomous system (AS) level
SLIDE 5 Domain Name System (DNS)
Initially used to translate domain names into IP addresses
- IP addresses are hard to remember
- IP addresses do change from time to time
Crucial part of the Internet’s infrastructure
Nowadays DNS can hold arbitrary data
Distributed database to query for information on a name
SLIDE 6 DNS: Resolution
Domain name is resolved from root zone to authoritative name server
Name servers of the root zone are hard-coded in the resolver
Caching at resolver reduces traffic on name servers
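The resolution walk and the effect of caching can be illustrated with a toy model. This is a minimal sketch, not a real resolver: the `ZONES` table, the `Resolver` class, and the `queries_sent` counter are hypothetical names, no network traffic is sent, and TTL expiry is ignored.

```python
# Toy delegation data (hypothetical): each zone lists the child zones it
# delegates to and the names it answers authoritatively.
ZONES = {
    ".": {"delegations": {"nl."}, "records": {}},
    "nl.": {"delegations": {"example.nl."}, "records": {}},
    "example.nl.": {
        "delegations": set(),
        "records": {"www.example.nl.": "192.0.2.10"},
    },
}


class Resolver:
    """Walks from the root zone down to the authoritative zone, with a cache."""

    def __init__(self, zones):
        self.zones = zones
        self.cache = {}        # name -> answer (TTL handling omitted)
        self.queries_sent = 0  # number of simulated queries to name servers

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]  # answered locally, no queries sent
        zone = "."  # start at the root zone, which the resolver knows up front
        while True:
            self.queries_sent += 1
            data = self.zones[zone]
            if name in data["records"]:
                self.cache[name] = data["records"][name]
                return self.cache[name]
            # Follow the delegation whose zone name is a suffix of the query.
            child = next(
                (d for d in data["delegations"] if name.endswith("." + d)),
                None,
            )
            if child is None:
                return None  # no delegation covers this name
            zone = child
```

In this toy setup the first lookup of `www.example.nl.` costs three simulated queries (root, nl, example.nl); a repeated lookup is served from the cache and costs none.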
SLIDE 7 Research Methodology
- 1. Map domain names onto the ASs their name servers are located in
- 2. Obtain graph representing network topology
- 3. Identify locations of most commonly used resolvers
- 4. Generate reachability baseline for domain names from the viewpoint of the identified resolver locations
- 5. Simulate failure of ASs and connections in the topology
- 6. Analyze impact of the simulated failures on reachability of domains
N.B.: All used data was generated in the first week of June 2016
SLIDE 8 Step 1: Map domain names onto ASs
Obtain a list of domain names
- Full .nl-zone provided by SIDN
Obtain IP addresses of name servers per domain
- Resolve NS records of domain names
- Resolve A records of the obtained name server names
Map IP addresses of name servers onto ASs
- Public data set provided by MaxMind (and other parties)
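The last mapping step can be sketched as a longest-prefix match of each name server IP address against a prefix-to-ASN table. This is a sketch under assumptions: the `PREFIX_TO_ASN` table below uses documentation prefixes and private-use ASNs as placeholders, and the helper names are hypothetical; the actual data set format used in the research is not shown here.

```python
import ipaddress

# Hypothetical prefix-to-ASN table (documentation prefixes, private-use ASNs).
PREFIX_TO_ASN = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,
}


def build_table(prefix_to_asn):
    """Parse prefixes and sort them so that more specific prefixes come first."""
    entries = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_asn.items()]
    return sorted(entries, key=lambda e: e[0].prefixlen, reverse=True)


def ip_to_asn(ip, table):
    """Return the ASN of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for network, asn in table:
        if addr in network:
            return asn
    return None


def domains_to_asns(domain_ns_ips, table):
    """Map each domain to the set of ASs its name server addresses fall in."""
    result = {}
    for domain, ips in domain_ns_ips.items():
        asns = (ip_to_asn(ip, table) for ip in ips)
        result[domain] = {asn for asn in asns if asn is not None}
    return result


table = build_table(PREFIX_TO_ASN)
mapping = domains_to_asns(
    {"example.nl": ["192.0.2.53", "198.51.100.200"]}, table
)
```

A linear scan is enough for a sketch; a full-zone run would more likely use a radix tree or a similar prefix index.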
SLIDE 9 Step 2: Obtain graph representing network topology
Lot of research done in this field, lots of problems:
- No Internet authority
- Inference of connections depends on local knowledge
- Backup routes may be invisible
Eventually used data set provided by CAIDA:
- Most complete dataset known, ~45,000 ASs, ~500,000 connections
- Aggregated information from various sources: RIPE RIS, CAIDA Ark monitors, Routeviews project
SLIDE 10 Step 3: Identify most common resolver locations
Necessary to define a clear scope and keep calculations feasible
Investigation of incoming DNS requests at the authoritative name servers of .nl (powered by ENTRADA)
Two approaches:
- 1. Analysis of human-generated requests (top 30)
- Excluding resolvers sending disproportionately many requests
- Classification not very accurate
- 2. Analysis of all requests (top 30)
SLIDE 11 Step 3: Combining both lists
Concatenate both lists to cover both the resolvers sending the most queries and those sending the most human-generated queries
Remove duplicates
Result:
- List of 39 most commonly used resolver locations
- Contains ISPs, hosting providers, search engines, content providers, ...
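The combination step amounts to an order-preserving union of the two top-30 lists. With 30 + 30 = 60 entries and 39 unique locations, 21 locations must have appeared in both lists, assuming neither list contained internal duplicates. A minimal sketch (the function name and the sample ASNs are placeholders):

```python
def combine_resolver_lists(top_human, top_all):
    """Concatenate both lists and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(top_human + top_all))


# Placeholder ASNs from the private-use range, not the measured resolver ASs.
combined = combine_resolver_lists([64500, 64501, 64502], [64502, 64503, 64500])
```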
SLIDE 12 Step 4: Baseline generation
Even in full topology, not all authoritative name servers can be reached from all resolvers
Find valid paths between resolver locations and name server locations using breadth-first search
Per resolver location calculate:
- #Unreachable ASs
- #Unreachable domains
- Mean length of shortest path between resolver and name server
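One way to sketch the baseline computation is a breadth-first search over (AS, phase) states, where the phase records whether the path may still go "uphill". This is an illustration under assumptions, not the implementation used in the research: the relation labels `c2p` and `p2p`, the function names, and the example ASNs are hypothetical, sibling links are ignored, and the mean is taken over name server ASs rather than weighted by domain.

```python
from collections import deque


def build_adjacency(relations):
    """relations: (a, b, rel) with rel 'c2p' (a is customer of b) or 'p2p'."""
    adj = {}
    for a, b, rel in relations:
        if rel == "c2p":
            adj.setdefault(a, []).append((b, "c2p"))
            adj.setdefault(b, []).append((a, "p2c"))
        elif rel == "p2p":
            adj.setdefault(a, []).append((b, "p2p"))
            adj.setdefault(b, []).append((a, "p2p"))
    return adj


def valid_path_lengths(adj, source):
    """Shortest valid path length from source to every reachable AS."""
    UP, DOWN = 0, 1  # UP: only customer-to-provider links used so far
    dist = {(source, UP): 0}
    queue = deque([(source, UP)])
    while queue:
        node, phase = queue.popleft()
        for nbr, rel in adj.get(node, []):
            if rel in ("c2p", "p2p") and phase != UP:
                continue  # going up or sideways after the peak is not valid
            nxt = (nbr, UP if rel == "c2p" else DOWN)
            if nxt not in dist:
                dist[nxt] = dist[(node, phase)] + 1
                queue.append(nxt)
    best = {}
    for (node, _), d in dist.items():
        best[node] = min(d, best.get(node, d))
    return best


def baseline(adj, resolver_as, ns_ases):
    """Return (unreachable name server ASs, mean shortest path length)."""
    dist = valid_path_lengths(adj, resolver_as)
    unreachable = sorted(a for a in ns_ases if a not in dist)
    reached = [dist[a] for a in ns_ases if a in dist]
    mean_len = sum(reached) / len(reached) if reached else None
    return unreachable, mean_len


adj = build_adjacency([
    (64501, 64502, "c2p"), (64503, 64502, "c2p"), (64502, 64504, "p2p"),
    (64505, 64504, "c2p"), (64506, 64505, "c2p"), (64505, 64507, "c2p"),
])
```

In this example AS64507 is connected in the graph but has no valid path from AS64501, which mirrors the observation that not every name server is reachable even in the full topology.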
SLIDE 13 Step 5: Simulation of failure scenarios
Scenarios selected with maximal (negative) impact in mind
Failure of ASs (ISPs):
- Simulated by removing the AS from the topology
- Selection based on ASs hosting the most domain names (top 20)
- Selection based on ASs providing most transit traffic on baseline (top 20)
Failure of connections:
- Simulated by removing the connection from the topology
- Selection based on connections used for most transit traffic on baseline (top 20)
Failure of IXP (AMS-IX):
- Simulated by removing all peering connections between members
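The failure scenarios above can be sketched as edits to the topology graph followed by a fresh reachability check. This is a simplified sketch: it uses a plain undirected graph and ignores the valid-path rules for BGP, counts a domain as unreachable only when none of its name server ASs can be reached, and all names and ASNs are placeholders.

```python
from collections import deque


def reachable(adj, source):
    """Set of ASs reachable from source in an undirected adjacency dict."""
    if source not in adj:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def remove_as(adj, failed):
    """Copy of the topology without the failed AS and its connections."""
    return {n: {m for m in nbrs if m != failed}
            for n, nbrs in adj.items() if n != failed}


def remove_link(adj, a, b):
    """Copy of the topology without the connection between a and b."""
    return {n: {m for m in nbrs if {n, m} != {a, b}}
            for n, nbrs in adj.items()}


def unreachable_domains(adj, resolver_as, domain_to_ases):
    """Domains for which no name server AS is reachable from the resolver."""
    seen = reachable(adj, resolver_as)
    return sorted(d for d, ases in domain_to_ases.items() if not (ases & seen))


# Placeholder topology and domain-to-AS mapping.
topology = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2, 5}, 5: {4}}
domains = {"a.example": {3}, "b.example": {5}, "c.example": {3, 5}}
```

An IXP failure could be modelled in the same way by calling `remove_link` for every peering connection between members. In the example, `c.example` stays reachable in both single-failure cases because its name servers sit in two different ASs.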
SLIDE 14 Step 6: Analysis of simulation results
Find (valid) paths between resolver locations and name server locations using breadth-first search with some optimizations
Per resolver location calculate:
- #Unreachable ASs
- #Unreachable domains
- Mean length of shortest path to domains
SLIDE 15 Results: Failure of ASs
Failure of hosting ASs
- Mostly only that AS unavailable
- Example shown for AS60781 (LeaseWeb)
Failure of transit ASs
- Also other ASs unavailable
- Example shown for AS174 (Cogent Communications)
SLIDE 16 Results: Failure of connections
Impact limited to single resolver or name server locations
Example shown for the connection between AS2914 (NTT) and AS5432 (Proximus)
Mostly no impact on availability
Mean shortest path length increases for single resolver location
SLIDE 17 Conclusion
Impact of failure scenario highly dependent on resolver location
Most problems solved by using name servers located in multiple ASs
Some network bottlenecks, e.g. ASs with single connections
- Most of them only influence connectivity of some ASs
SLIDE 18
Are there any questions?
SLIDE 19
Thank you for your attention!
SLIDE 20 Research question (elaborated)
What is the impact on the availability of the .nl-zone and underlying second-level domains for the users of the Internet when parts of the DNS infrastructure become unavailable?
Subquestions:
- How are domain names within the .nl-zone distributed over autonomous systems?
- How are autonomous systems serving parts of DNS data within the .nl-zone interconnected?
- Where in the network topology do the majority of DNS requests originate?
- How does the reachability of autonomous systems change when connections or even whole ASs become unavailable? And what is the impact of this on the reachability of authoritative name servers?
SLIDE 21 Border Gateway Protocol (BGP)
Protocol for exchanging routing information between autonomous systems (ASs)
AS:
- Collection of IP address ranges under the same administrative boundary
- Identified by Autonomous System Number (ASN)
Basic working:
- Router announces its IP addresses to connected routers of neighboring ASs
- After receiving an announcement, a router stores it and possibly propagates it to neighboring routers
SLIDE 22 Border Gateway Protocol (BGP) (2)
Global (constantly changing) routing table
Relationships between ASs classified according to contractual agreements:
- customer-to-provider (provider-to-customer)
- peer-to-peer
- sibling-to-sibling
SLIDE 23 BGP: valid paths for traffic flow
Valid paths between ASs follow a fixed pattern:
- 1. 0+ customer-to-provider links
- 2. 0-1 peer-to-peer links
- 3. 0+ provider-to-customer links
General rule: an AS does not carry transit traffic unless at least one of its neighbors on the path pays for it
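The three-part pattern can be checked with a small state machine over the link types along a path. A minimal sketch, assuming each link is labelled `c2p`, `p2p`, or `p2c` in the direction of travel; the labels and the function name are illustrative and sibling links are left out.

```python
def is_valid_path(link_types):
    """Check the pattern: 0+ c2p links, then 0-1 p2p link, then 0+ p2c links."""
    climbing = True  # True while only customer-to-provider links were seen
    for rel in link_types:
        if rel == "c2p":
            if not climbing:
                return False  # going up again after the peak
        elif rel == "p2p":
            if not climbing:
                return False  # second peer link, or peer link after descent
            climbing = False
        elif rel == "p2c":
            climbing = False
        else:
            raise ValueError(f"unknown link type: {rel!r}")
    return True
```

For example, `["c2p", "p2p", "p2c"]` is accepted, while `["p2c", "c2p"]` is rejected because the AS in the middle would carry traffic between two of its providers without being paid by either.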
SLIDE 24 BGP: valid paths for traffic flow (2)
SLIDE 25 DNS: Hierarchy
Domain namespace is built up in a tree structure
Domain names are formed by concatenating labels in the tree from leaf to root
Multiple (redundant) name servers per zone
- Root zone: 13 name servers
- .nl-zone: 7 name servers
SLIDE 26 DNS: Resource Records
Name servers store information in resource records that consist of multiple fields:
- owner: The domain name
- type: The type of the stored data
- A (IPv4 address)
- AAAA (IPv6 address)
- NS (name server name)
- MX (mail server name)
- CNAME (name pointer, alias)
- TXT (arbitrary text)
- …
- class: mostly IN (Internet)
- TTL: Time to live, time to cache record
- RDATA: The data
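The fields listed above can be pictured as a simple record type. This is only an illustrative sketch: the class and attribute names are hypothetical, the default TTL of 3600 seconds is an arbitrary choice, and the example values use a documentation address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    owner: str          # the domain name
    rtype: str          # type of the stored data, e.g. "A", "NS", "MX"
    rdata: str          # the data itself
    ttl: int = 3600     # time to live in seconds (arbitrary default)
    rclass: str = "IN"  # class, mostly IN (Internet)

    def to_text(self):
        """Render the record in the usual owner/TTL/class/type/RDATA order."""
        return f"{self.owner} {self.ttl} {self.rclass} {self.rtype} {self.rdata}"


record = ResourceRecord("example.nl.", "A", "192.0.2.1")
```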
SLIDE 27 BGP: Internet Exchanges (IXPs)
Network facility that enables exchange of traffic between more than two independent ASs
Imagine this as a giant switch where members can connect to peer with each other
Largest IXPs handle the same amount of traffic as the largest Tier-1 ISPs
SLIDE 28 Obtain ASN of name servers per domain
- 1. Resolve NS records of domain names
- 2. Resolve A records of the obtained name server names
- 3. Map IP addresses of name servers onto ASN
SLIDE 29 Obtain ASN of name servers per domain (2)
5,262,381 domain names in zonefile
- 259,364 NS record unresolvable
- 2,229 A record of name server unresolvable
5,364,788 domain names investigated
Configuration errors:
- 16,394 NS sets in the zonefile do not match the NS set received when querying an authoritative name server of the domain
SLIDE 30 ENTRADA
Developed by SIDN Labs
Database containing all incoming DNS requests at 4 of the authoritative name servers of .nl
100 billion queries in the last 2 years, 800 million new queries per day
SQL interface to investigate these requests:
- Time of receipt
- Resolver IP address
- Domain name
- Query type
- IP version
- ...
SLIDE 31 Step 3.1: Analysis of human-generated requests
Motivation:
- Some resolvers send more queries than others (increase performance, absence of caching, DNSSEC validation, ...)
- Some resolvers used for scanning the zone (research, profit)
Such resolvers should be excluded from the analysis
Measures:
- Percentage of NxDomain responses per resolver
- Percentage of distinct domain names queried
SLIDE 32 Percentage of NxDomain responses per resolver
- Human users query mainly existing domain names
- Typos only occur occasionally
- Two categories of resolver usage:
- High percentage: scan (brute-force), botnets
- Low percentage: human users
Normal behaviour: <10% NxDomain responses
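This measure can be sketched as a per-resolver ratio over a query log. The sketch assumes a log of `(resolver, rcode)` pairs and uses the 10% figure from the slide as the cut-off; the row format and function names are hypothetical and do not reflect the ENTRADA schema.

```python
from collections import Counter


def nxdomain_percentage(rows):
    """rows: iterable of (resolver, rcode). Returns resolver -> % NXDOMAIN."""
    total = Counter()
    nxdomain = Counter()
    for resolver, rcode in rows:
        total[resolver] += 1
        if rcode == "NXDOMAIN":
            nxdomain[resolver] += 1
    return {r: 100.0 * nxdomain[r] / total[r] for r in total}


def human_like(percentages, threshold=10.0):
    """Resolvers whose NXDOMAIN share is below the threshold."""
    return sorted(r for r, p in percentages.items() if p < threshold)


# Placeholder log: one resolver with 5% NXDOMAIN, one with 75%.
log = (
    [("192.0.2.1", "NOERROR")] * 19 + [("192.0.2.1", "NXDOMAIN")]
    + [("192.0.2.2", "NOERROR")] + [("192.0.2.2", "NXDOMAIN")] * 3
)
shares = nxdomain_percentage(log)
```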
SLIDE 33 Percentage of distinct domain names queried per resolver
Human users query well-known domain names more often
Scanning resolvers query every domain name just once
Three categories of resolver usage
- High percentage: scan
- Medium percentage: human users
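The second measure is the share of distinct names among all names a resolver queried. A minimal sketch, assuming a log of `(resolver, domain name)` pairs; the row format and function name are hypothetical, and no category thresholds are applied because the slide does not state them.

```python
def distinct_percentage(rows):
    """rows: iterable of (resolver, domain name). Returns resolver -> % distinct."""
    queried = {}
    for resolver, name in rows:
        # Normalise case and a trailing dot so equal names compare equal.
        queried.setdefault(resolver, []).append(name.lower().rstrip("."))
    return {r: 100.0 * len(set(names)) / len(names)
            for r, names in queried.items()}


# Placeholder log: the first resolver repeats a name, the second never does.
log = [
    ("192.0.2.1", "example.nl"), ("192.0.2.1", "example.nl."),
    ("192.0.2.1", "Example.NL"), ("192.0.2.1", "sidn.nl"),
    ("192.0.2.2", "a.nl"), ("192.0.2.2", "b.nl"),
    ("192.0.2.2", "c.nl"), ("192.0.2.2", "d.nl"),
]
shares = distinct_percentage(log)
```

A resolver that queries every name exactly once scores 100%, which matches the scanning behaviour described above.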
SLIDE 34 Step 3.2: Analysis of all requests received
Motivation:
- Classification of human-generated requests not very accurate
- Crawling and monitoring are legitimate uses of DNS
Simply counting all queries received
SLIDE 35 Results: Baseline
Reachability (both unreachable ASs and shortest path length) highly dependent on resolver location
Just a few availability issues on the baseline: some resolvers cannot reach the name servers of a single domain
Some resolver locations have shorter paths to authoritative name servers than others
- AS1103 (SURFnet): 1.182
- AS8737 (KPN): 2.886
SLIDE 36 Results: Failure of connections (2)
Some ASs only have a single connection in the topology; in these cases the AS becomes disconnected
Might not be the case in a real-world situation
SLIDE 37 Results: Failure of AMS-IX
Only small impact on availability
Mean shortest path length increases
Realistic impact even smaller:
- Not all members peer with each other
- Other peering connections at other IXPs
SLIDE 38 Future work
Simulate other events
Investigate more resolver locations
Use better (more informed) datasets, e.g. information about physical layer
Investigate other TLDs for comparison