DNSql Processing Massive DNS Collections Stephen Herwig, Dave - - PowerPoint PPT Presentation




SLIDE 1

DNSql

Processing Massive DNS Collections

Stephen Herwig, Dave Levin, Bobby Bhattacharjee, Neil Spring
University of Maryland, College Park

SLIDE 2

D-root

Operated by UMD
Anycast with 109 replicas
Hourly sampled collection by replica

[Figure: D-root replicas, labeled global or local]

SLIDE 3

Problem

Lots of data: ~140 GiB / day

Diverse analyses: short-term and long-term; aggregation by source, replica, geography, topology

Serial processing is slow: ~8h to read a month’s worth of collection for the CPMD replica

SLIDE 4

Approach

pcap.gz sqlite3 MapReduce

CREATE TABLE queryresp (
    id INTEGER PRIMARY KEY,
    sec INTEGER,
    usec INTEGER,
    src BLOB,
    sport INTEGER,
    opcode INTEGER,
    qclass INTEGER,
    qtype INTEGER,
    rcode INTEGER,
    qname TEXT
);
CREATE INDEX qname_index ON queryresp(qname);
CREATE INDEX src_index ON queryresp(src);
CREATE TABLE qps (sec INTEGER, n INTEGER);

dnsqlite3c
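
A minimal sketch of how a schema like this could be exercised from Python's standard `sqlite3` module. It creates the tables in an in-memory database, inserts a few made-up rows, and derives per-second counts for `qps`. The `opcode` column name, the sample values, and the `GROUP BY` used to fill `qps` are assumptions for illustration; the slides do not show how dnsqlite3c populates either table.

```python
import sqlite3

# Schema as on the slide; "opcode" is an assumed column name.
SCHEMA = """
CREATE TABLE queryresp (
    id INTEGER PRIMARY KEY,
    sec INTEGER, usec INTEGER,
    src BLOB, sport INTEGER,
    opcode INTEGER,
    qclass INTEGER, qtype INTEGER, rcode INTEGER,
    qname TEXT
);
CREATE INDEX qname_index ON queryresp(qname);
CREATE INDEX src_index ON queryresp(src);
CREATE TABLE qps (sec INTEGER, n INTEGER);
"""

def build_example_db():
    """Create an in-memory database with a few made-up responses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # (sec, usec, src, sport, opcode, qclass, qtype, rcode, qname)
    # src is assumed to hold the packed address bytes.
    rows = [
        (1425168000, 10, bytes([192, 0, 2, 1]), 5353, 0, 1, 1, 0, "example.com."),
        (1425168000, 20, bytes([192, 0, 2, 2]), 5354, 0, 1, 28, 3, "nonexistent.invalid."),
        (1425168001, 5, bytes([192, 0, 2, 1]), 5355, 0, 1, 1, 0, "example.org."),
    ]
    conn.executemany(
        "INSERT INTO queryresp "
        "(sec, usec, src, sport, opcode, qclass, qtype, rcode, qname) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    # One possible way to fill qps: count responses per second.
    conn.execute(
        "INSERT INTO qps SELECT sec, COUNT(*) FROM queryresp GROUP BY sec"
    )
    conn.commit()
    return conn

conn = build_example_db()
qps = conn.execute("SELECT sec, n FROM qps ORDER BY sec").fetchall()
print(qps)
```

Precomputing `qps` keeps the per-second rate query from scanning `queryresp` each time, which is one plausible reason for the separate table.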

SLIDE 5

Processing Speed

[Figure, CPMD March 2015: processing speed in resp (K) / sec. Groups: single pcap.gz, month of pcap.gzs. Series: zcat | tcpdump, dnsqlite3c, aggregate.db, parallel dnsqlite3c.]

SLIDE 6

Database Size

[Figure, CPMD March 2015: size in GiB, normal vs. gzip'd, for a month of pcaps, a month of SQLite3 shards, and aggregate.db.]

SLIDE 7

Query Speed

[Figure, CPMD March 2015: query time in minutes, aggregate.db vs. mapreduce, for four queries: QPS, distinct source IPs, source IP frequency count, distinct hashed qnames.]
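
A rough sketch of what the MapReduce variant of the "distinct source IPs" query might look like over per-file SQLite shards. The map step runs `SELECT DISTINCT src` in each shard and the reduce step unions the partial sets. The shard file names, the helper names, and the use of a thread pool are all assumptions; the slides do not describe the actual MapReduce framework, and a real deployment would more likely spread shards over processes or machines.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_shard(path, sources):
    """Create a tiny shard holding only the column this query needs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE queryresp (id INTEGER PRIMARY KEY, src BLOB)")
    conn.executemany(
        "INSERT INTO queryresp (src) VALUES (?)", [(s,) for s in sources]
    )
    conn.commit()
    conn.close()

def map_distinct_sources(path):
    """Map step: the set of distinct source addresses in one shard."""
    conn = sqlite3.connect(path)  # each worker opens its own connection
    try:
        cursor = conn.execute("SELECT DISTINCT src FROM queryresp")
        return {row[0] for row in cursor}
    finally:
        conn.close()

def count_distinct_sources(shard_paths, workers=4):
    """Reduce step: union the per-shard sets and count the result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_distinct_sources, shard_paths)
        return len(set().union(*partials))

# Made-up addresses from a documentation prefix.
a, b, c = (bytes([192, 0, 2, n]) for n in (1, 2, 3))

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, "hour0.db"), os.path.join(tmp, "hour1.db")]
    make_shard(paths[0], [a, b, a])
    make_shard(paths[1], [b, c])
    total = count_distinct_sources(paths)

print(total)
```

The union in the reduce step matters: summing per-shard distinct counts would double-count any source that appears in more than one shard.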

SLIDE 8

Additional Data Sources

MaxMind GeoLite database
7m query time

[Figure: Percent of Queries to CPMD By Country (March 2015)]
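
A sketch of how per-country percentages could be derived from per-source counts. The slides do not show how the GeoLite data is joined in, so the MaxMind lookup is replaced here by a hypothetical in-memory prefix table (`PREFIX_TO_COUNTRY`), and the `src` values are assumed to be packed address bytes. The input pairs stand in for the result of a query such as `SELECT src, COUNT(*) FROM queryresp GROUP BY src`.

```python
import ipaddress
from collections import Counter

# Hypothetical stand-in for a GeoLite lookup: prefix -> country code.
# The prefixes are documentation ranges and the countries are made up.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}

def country_of(src_blob):
    """Map packed address bytes to a country code, or '??' if unknown."""
    addr = ipaddress.ip_address(src_blob)
    for net, country in PREFIX_TO_COUNTRY.items():
        if addr.version == net.version and addr in net:
            return country
    return "??"

def percent_by_country(src_counts):
    """Turn (src, count) pairs into a {country: percent of queries} dict."""
    per_country = Counter()
    for src, n in src_counts:
        per_country[country_of(src)] += n
    total = sum(per_country.values())
    if total == 0:
        return {}
    return {c: 100.0 * n / total for c, n in per_country.items()}

# Made-up per-source counts.
src_counts = [
    (bytes([192, 0, 2, 1]), 3),
    (bytes([198, 51, 100, 7]), 1),
]
shares = percent_by_country(src_counts)
print(shares)
```

Grouping by source first and looking up each distinct address once keeps the number of geolocation lookups proportional to the number of sources, not the number of queries.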

SLIDE 9

Per-Source Metrics

466,021 unique sources
1h 10m query time

SLIDE 10

Discussion

Additional queries?
Optimizations?
Extension to non-root servers?