DNSql
Processing Massive DNS Collections
Stephen Herwig, Dave Levin, Bobby Bhattacharjee, Neil Spring University of Maryland, College Park
D-root
Operated by UMD
Anycast with 109 replicas
Hourly sampled collection by replica
[Figure legend: global, local]
~140 GiB / day
Lots of data
Diverse analyses
Short-term, long-term
Aggregation by source, replica, geography, topology
Serial processing is slow
~8h to read a month's worth of collected data for the CPMD replica
pcap.gz -> sqlite3 -> MapReduce
CREATE TABLE queryresp (
    id INTEGER PRIMARY KEY,
    sec INTEGER,
    usec INTEGER,
    src BLOB,
    sport INTEGER,
    qclass INTEGER,
    qtype INTEGER,
    rcode INTEGER,
    qname TEXT
);
CREATE INDEX qname_index ON queryresp(qname);
CREATE INDEX src_index ON queryresp(src);
CREATE TABLE qps (sec INTEGER, n INTEGER);
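The schema above can be tried out with Python's built-in sqlite3 module. What follows is a minimal sketch, not the dnsqlite3c implementation: the sample rows are invented, the function names are hypothetical, storing src as a packed 4-byte address is an assumption, and filling qps with one row per second holding a response count is only a guess based on the column names.

```python
import sqlite3

# Schema as given on the slide.
SCHEMA = """
CREATE TABLE queryresp (
    id INTEGER PRIMARY KEY,
    sec INTEGER, usec INTEGER,
    src BLOB, sport INTEGER,
    qclass INTEGER, qtype INTEGER, rcode INTEGER,
    qname TEXT
);
CREATE INDEX qname_index ON queryresp(qname);
CREATE INDEX src_index ON queryresp(src);
CREATE TABLE qps (sec INTEGER, n INTEGER);
"""


def build_demo_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        # sec, usec, src (assumed packed IPv4), sport, qclass, qtype, rcode, qname
        (1425168000, 10, bytes([192, 0, 2, 1]), 5353, 1, 1, 0, "example.com."),
        (1425168000, 20, bytes([192, 0, 2, 1]), 5354, 1, 28, 0, "example.org."),
        (1425168001, 30, bytes([198, 51, 100, 7]), 4000, 1, 1, 3, "nonexistent.invalid."),
    ]
    conn.executemany(
        "INSERT INTO queryresp (sec, usec, src, sport, qclass, qtype, rcode, qname) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    # Assumption: qps holds the number of responses seen in each second.
    conn.execute(
        "INSERT INTO qps (sec, n) SELECT sec, COUNT(*) FROM queryresp GROUP BY sec"
    )
    conn.commit()
    return conn


def distinct_sources(conn):
    """Number of distinct source addresses."""
    return conn.execute("SELECT COUNT(DISTINCT src) FROM queryresp").fetchone()[0]


def source_frequency(conn):
    """(src, count) pairs, most frequent source first."""
    return conn.execute(
        "SELECT src, COUNT(*) AS n FROM queryresp GROUP BY src ORDER BY n DESC, src"
    ).fetchall()


def qps(conn):
    """(sec, n) pairs in time order."""
    return conn.execute("SELECT sec, n FROM qps ORDER BY sec").fetchall()
```

Calling build_demo_db() and passing the connection to the three query helpers exercises the kinds of queries a per-source or per-second analysis would need; the src_index index is what keeps the per-source grouping cheap on a real shard.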
dnsqlite3c
[Figure: processing rate in resp (K) / sec, CPMD March 2015. Labels: single pcap.gz, month of pcap.gzs; zcat | tcpdump, dnsqlite3c, aggregate.db, parallel dnsqlite3c]
[Figure: storage size in GiB, CPMD March 2015. Labels: normal, gzip'd; month of pcaps, month of SQLite3 shards, aggregate.db]
[Figure: query time in minutes, CPMD March 2015. Queries: QPS, distinct source IPs, source IP frequency count, distinct hashed qnames. Series: aggregate.db, mapreduce]
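The map/reduce style of query over SQLite3 shards can be sketched as follows. This is an illustration under stated assumptions, not the DNSql implementation: the shard layout, the function names, and the use of a thread pool on one machine are all hypothetical, and the demo shards carry only the columns the query needs.

```python
import os
import sqlite3
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_shard(path, rows):
    """Create a tiny demo shard holding (sec, src) rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE queryresp (id INTEGER PRIMARY KEY, sec INTEGER, src BLOB)"
    )
    conn.executemany("INSERT INTO queryresp (sec, src) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def map_shard(path):
    """Map step: per-source response counts within a single shard."""
    conn = sqlite3.connect(path)
    try:
        cursor = conn.execute("SELECT src, COUNT(*) FROM queryresp GROUP BY src")
        return Counter(dict(cursor))
    finally:
        conn.close()


def reduce_counts(partials):
    """Reduce step: sum the per-shard counters into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def source_frequency(shard_paths, workers=4):
    """Run the map step on every shard concurrently, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return reduce_counts(pool.map(map_shard, shard_paths))


def demo():
    """Build two invented shards and count responses per source."""
    a, b = bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7])
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"shard{i}.db") for i in range(2)]
        make_shard(paths[0], [(1, a), (1, b), (2, a)])
        make_shard(paths[1], [(3, a)])
        return source_frequency(paths)
```

The number of keys in the reduced counter gives the distinct source count, so one pass answers both the "distinct source IPs" and the "source IP frequency count" questions. A process pool or a cluster scheduler could replace the thread pool without changing the map and reduce functions.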
MaxMind GeoLite database
7m query time
[Figure: Percent of Queries to CPMD By Country (March 2015)]
466,021 unique sources
1h 10m query time
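The per-country percentages can be derived from per-source counts roughly as sketched below. Everything named here is an assumption for illustration: toy_lookup is a hypothetical stand-in for a real GeoLite lookup (it does not use the MaxMind API), the country codes "AA" and "BB" are placeholders, and src is assumed to be a packed 4-byte IPv4 address.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-country table standing in for a geolocation database.
TOY_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "AA",
    ipaddress.ip_network("198.51.100.0/24"): "BB",
}


def toy_lookup(src):
    """Map a packed IPv4 address to a placeholder country code."""
    ip = ipaddress.ip_address(src)
    for network, country in TOY_TABLE.items():
        if ip in network:
            return country
    return "unknown"


def percent_by_country(source_counts, lookup_country):
    """Turn {src: query count} into {country: percent of all queries}."""
    per_country = Counter()
    for src, n in source_counts.items():
        per_country[lookup_country(src)] += n
    total = sum(per_country.values())
    if total == 0:
        return {}
    return {country: 100.0 * n / total for country, n in per_country.items()}
```

Because the lookup runs once per distinct source rather than once per query, the cost of the geolocation step scales with the number of unique sources, with the per-source counts supplying the weights.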
Additional queries?
Optimizations?
Extension to non-root servers?