SLIDE 1

Creating a long-term “memory” for the global DNS

Mattijs Jonker

SLIDE 2

2019-12-09 OpenINTEL WIE-KISMET 2019 2/11

Introduction

  • Almost five years ago, we started with an idea:

“Can we measure (large parts) of the global DNS on a daily basis?”

  • In this talk, I will discuss:

– The data we gather (nowadays)
– How we perform our measurements
– Which data we share
– Planned improvements / additional data

SLIDE 3

How we perform our measurements

  • OpenINTEL performs an active measurement, sending a fixed set of queries for all covered names once every 24 hours

  • We do this at scale, covering over 227 million domains per day:

– gTLDs: .com, .net, .org, .info, .mobi, .aero, .asia, .name, .biz, .gov, plus almost 1200 "new" gTLDs (.xxx, .amsterdam, .berlin, …)
– ccTLDs: .nl, .se, .nu, .ca, .fi, .at, .dk, .ru, .рф, .us, .na, .gt, .co
– Various other sources: Alexa top 1M, Cisco Umbrella, diverse blacklists
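
A "fixed set of queries for all covered names" can be pictured as the cross product of names and record types. The sketch below is only an illustration: the record types in `QUERY_TYPES` and the helper `build_query_plan` are assumptions, since the slides do not list the actual query set.

```python
from itertools import product

# Hypothetical record types; the slides do not say which types are queried.
QUERY_TYPES = ("A", "AAAA", "NS", "MX")


def build_query_plan(names, qtypes=QUERY_TYPES):
    """Return one (name, qtype) pair for every name and every record type."""
    return list(product(names, qtypes))


plan = build_query_plan(["example.com", "example.nl"])
for name, qtype in plan:
    print(name, qtype)
```

Because the set of record types is fixed, the daily workload grows linearly with the number of covered names.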

SLIDE 4

How we perform our measurements

  • The measurement process involves three stages
  • 1. Extraction of names
  • 2. Active measurement
  • 3. Streaming and persisting data

SLIDE 5

Stage I: collecting names

  • Extraction of names from zone files and other sources (at least once daily)
  • Store state of covered namespace in “names to measure” DB
  • Convert zone files to Avro
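
As a rough illustration of name extraction, the sketch below collects delegated names (owners of NS records) from zone file text. It is not OpenINTEL's implementation: `extract_names` is a hypothetical helper, and it assumes fully qualified owner names with one record per line (no relative names, no multi-line records).

```python
def extract_names(zone_text, tld):
    """Collect names directly under `tld` that own at least one NS record."""
    suffix = "." + tld.strip(".").lower() + "."
    names = set()
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("$"):  # skip blanks and directives
            continue
        fields = line.split()
        owner = fields[0].lower()
        rest = [field.upper() for field in fields[1:]]
        # Skip the optional TTL and class fields to reach the record type.
        while rest and (rest[0].isdigit() or rest[0] == "IN"):
            rest.pop(0)
        if not rest or rest[0] != "NS":
            continue
        # Keep only names exactly one label below the TLD.
        if owner.endswith(suffix) and owner.count(".") == suffix.count("."):
            names.add(owner.rstrip("."))
    return sorted(names)


SAMPLE = """\
$TTL 900
com. 900 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. 1 1800 900 604800 86400
example.com. 172800 IN NS ns1.example.net.
example.com. 172800 IN NS ns2.example.net.
Another.com. 172800 IN NS ns1.another.com. ; mixed case
ns1.another.com. 172800 IN A 192.0.2.53
"""

names = extract_names(SAMPLE, "com")
print(names)
```

The deduplicated result is what would be stored as the state of the covered namespace for that day.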

SLIDE 6

Stage II: main measurement

  • Actively sending queries for all collected names (daily)
  • Workers write results to files, chunked per 100k names
  • Also track measurement performance (meta-data)

[Figure: Stage II (measurements / querying). Labels: coordinator (per source); set of workers per source (scalable); Internet; DNS queries & answers; domain names DB; measurement data (Avro); measurement meta-data.]
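
Chunking results per 100k names can be sketched with a small generator. Only the chunk size comes from the slide; the helper `chunked` and the synthetic names are assumptions for illustration.

```python
from itertools import islice

CHUNK_SIZE = 100_000  # chunk size stated on the slide


def chunked(names, size=CHUNK_SIZE):
    """Yield consecutive lists of at most `size` names from any iterable."""
    iterator = iter(names)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Synthetic input: 250,000 placeholder names, generated lazily.
names = (f"domain{i}.example" for i in range(250_000))
sizes = [len(chunk) for chunk in chunked(names)]
print(sizes)
```

Consuming the input lazily means a worker never has to hold the full name list in memory, only the chunk it is currently writing.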

SLIDE 7

Stage III: storage and persistence

  • We stream the data (measurement, meta, zone files) to a Kafka cluster
– Allows near real-time stream-based analysis (WIP)
  • Data is persisted in HDFS
– Allows batch-based, longitudinal analyses (many successes)
  • Clone data off-site (archive on tape & CAIDA clone)
  • We are adding further data to our streaming system (e.g., CTLs, RPKI data, ...)

[Figure: Stage III (data streaming, enrichment & persistence). Labels: measurement data & meta (Avro); zone files data (Avro); Kafka cluster; Hadoop cluster; persist (HDFS); off-site archival (tape); SDSC (Swift); CAIDA clone (com/net/org); other data sources (RV, pfx2as, geolocation, <add more>); “stream” additional data.]

SLIDE 8

What do we have, in simple numbers

  • Started measuring in February 2015
  • We collect over 2.4 × 10^9 DNS records each day
  • So far, we collected over 3.6 × 10^12 results (3.6 trillion)
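
A quick sanity check of these numbers, as a sketch under stated assumptions: the start day within February 2015 is not given, so 1 February is assumed; the end date is the talk date (2019-12-09); and "records" and "results" are treated as the same unit. Both figures are lower bounds ("over"), so the outcome is only indicative.

```python
from datetime import date

START = date(2015, 2, 1)   # assumption: exact start day is not on the slide
TALK = date(2019, 12, 9)   # date of the talk
RECORDS_PER_DAY_NOW = 2.4e9
TOTAL_RESULTS = 3.6e12

days_measured = (TALK - START).days
average_per_day = TOTAL_RESULTS / days_measured
days_at_current_rate = TOTAL_RESULTS / RECORDS_PER_DAY_NOW

print(f"days measured:          {days_measured}")
print(f"average per day:        {average_per_day:.2e}")
print(f"days at current volume: {days_at_current_rate:.0f}")
```

The long-run average comes out below the current daily volume, which would be consistent with coverage having grown since 2015.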

SLIDE 9

Which data do we share

  • We share open data publicly

– Open sources (e.g., .se, .nu, Alexa)
– As Avro files on openintel.nl, with “light” docs

  • We share closed data with other researchers

– Typically require them to have registry operator contracts

  • We share closed data with the respective registry operators

SLIDE 10

Ongoing and planned improvements

  • More and improved data sharing

– Aggregate datasets
– Public Kafka broker
– Rolling stats & insights (openintel.nl)
– Jupyter containers (Dockerfile) with example analyses (also for education purposes)

  • Fusing more data in streaming system

– e.g., certificate transparency logs, BGP events, outages, DoS attacks, …

  • Reverse address space measurements (in-addr.arpa)
  • Targeting additional authoritative(?) name servers
  • Support distributed (multi-VP) measurement
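
For the planned reverse address space measurements, the names to query live under in-addr.arpa. The sketch below shows how such names can be derived from an IPv4 prefix with Python's standard `ipaddress` module; the helper `reverse_names` and the documentation prefix 192.0.2.0/30 are illustrative choices, not part of the talk.

```python
import ipaddress


def reverse_names(prefix):
    """Yield the PTR query name for every address in an IPv4 prefix."""
    network = ipaddress.ip_network(prefix)
    for address in network:
        yield address.reverse_pointer


names = list(reverse_names("192.0.2.0/30"))
for name in names:
    print(name)
```

Enumerating every address scales with prefix size, so a full in-addr.arpa sweep would need the same kind of chunking and worker scaling as the forward measurement.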

SLIDE 11

Questions?