dnstap: high speed DNS logging without packet capture



SLIDE 1

Unifying the Global Response to Cybercrime

dnstap: high speed DNS logging without packet capture

Jeroen Massar

Farsight Security, Inc.

SLIDE 2

Design & Implementation: Robert Edmonds <edmonds@fsi.io>
Website: http://dnstap.info
Documentation / Presentations / Tutorials / Mailing list / Downloads / Code repos

Credits & More Info

SLIDE 3

Simplified DNS Overview

SLIDE 4

Query Logging

SLIDE 5

§ Log information about DNS queries:
  • Client IP address
  • Question name
  • Question type
§ Other related information?
  • EDNS options
  • DNSSEC status
  • Cache miss or cache hit?
§ May have to look at both queries and responses.

Query Logging: Details Logged

SLIDE 6

§ DNS server generates log messages in the normal course of processing requests.
  • Reputed to impact performance significantly.
§ Typical implementation:
  • Parse the request.
  • Format it into a text string.
  • Send to syslog or write to a log file.

Query Logging: How

SLIDE 7

§ Implementation issues that affect performance:
  • Transforming the query into a text string takes time.
      • Memory copies, format string parsing, etc.
  • Writing the log message using synchronous I/O in the worker thread.
  • Using syslog instead of writing log files directly.
      • syslog() takes out a process-wide lock and does a blocking, unbuffered write for every log message.
  • Using stdio to write log files.
      • printf(), fwrite(), etc. take out a lock on the output

Query Logging: Issues

SLIDE 8

§ Do it with packet capture instead:
  • Eliminates the performance issues.
  • But, can't replicate state that doesn't appear directly in the packet.
  • E.g., whether the request was served from the cache.
§ What if the performance issues in the server software were fixed?

Query Logging: Improving

SLIDE 9

Passive DNS

SLIDE 10

  • Deployment options:
  • (1) “Below the recursive”
  • (2) “Above the recursive”

Passive DNS: Setup

SLIDE 11

§ Log information about zone content:

  • Record name
  • Record type
  • Record data
  • Nameserver IP address

Passive DNS: Details Logged

SLIDE 12

§ Typical implementation:
  • Capture the DNS response packets at the recursive DNS server.
  • Reassemble the DNS response messages from the packets.
  • Extract the DNS resource records contained in the response messages.
  • Low to no performance impact

Passive DNS: Implementations

SLIDE 13

§ Discard out-of-bailiwick records.
§ Discard spoofed UDP responses.
§ UDP fragment, TCP stream reassembly.
§ UDP checksum verification.

But, the DNS server and its networking stack are already doing these things...

Passive DNS: Issues
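
As an illustration of the first item, a capture-based collector has to decide for itself whether a record name falls inside the zone a nameserver is responsible for. The following is a minimal sketch of that comparison only. The function names are hypothetical, and it assumes the collector already knows the zone name, which is exactly the state that is hard to reconstruct from outside the server.

```python
def normalize(name: str) -> tuple:
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return tuple(label for label in name.lower().rstrip(".").split(".") if label)


def in_bailiwick(record_name: str, zone: str) -> bool:
    """Return True if record_name is the zone apex or a name beneath it."""
    rec = normalize(record_name)
    z = normalize(zone)
    # Compare whole labels from the right, so "badexample.com" is not
    # mistaken for a child of "example.com".
    return len(rec) >= len(z) and rec[len(rec) - len(z):] == z


if __name__ == "__main__":
    print(in_bailiwick("www.example.com.", "example.com"))
    print(in_bailiwick("badexample.com", "example.com"))
```

Comparing label tuples rather than raw strings avoids the suffix-matching mistake where a name merely ends with the same characters as the zone.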

SLIDE 14

§ Query logging:
  • Make it faster by eliminating bottlenecks like text formatting and synchronous I/O.
§ Passive DNS replication:
  • Avoid complicated state reconstruction issues by capturing messages instead of packets.
§ Support both use cases with the same generic mechanism.

Insights

SLIDE 15

§ Add a lightweight message duplication facility directly into the DNS server.
  • Verbatim wire-format DNS messages with context.
§ Use a fast logging implementation that doesn't degrade performance.
  • Circular queues.
  • Asynchronous, buffered I/O.
  • Prefer to drop log payloads instead of blocking the server under load.

dnstap
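
The "drop instead of block" policy can be sketched with a bounded queue and a separate writer thread. This is an illustration of the policy, not the dnstap implementation: the class name `DropLogger` and its parameters are hypothetical, and Python's `queue.Queue` takes a lock internally, so it does not reproduce the performance properties of a purpose-built circular queue.

```python
import queue
import threading


class DropLogger:
    """Hand payloads to a writer thread; drop them when the queue is full."""

    def __init__(self, sink, capacity=1024):
        self._queue = queue.Queue(maxsize=capacity)  # bounded buffer
        self._sink = sink                            # callable doing the slow I/O
        self.dropped = 0                             # simple counter, single producer assumed
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, payload: bytes) -> bool:
        """Called from the worker thread; returns immediately in all cases."""
        try:
            self._queue.put_nowait(payload)
            return True
        except queue.Full:
            self.dropped += 1  # shed the log payload rather than stall the server
            return False

    def _drain(self):
        """Writer thread: the only place where blocking I/O happens."""
        while True:
            payload = self._queue.get()
            if payload is None:  # shutdown marker
                break
            self._sink(payload)

    def close(self):
        """Flush what is queued, then stop the writer thread."""
        self._queue.put(None)
        self._thread.join()


if __name__ == "__main__":
    received = []
    logger = DropLogger(received.append, capacity=8)
    logger.log(b"example payload")
    logger.close()
    print(received, logger.dropped)
```

The worker thread only ever pays for an enqueue attempt. When the writer falls behind, payloads are counted as dropped and query processing continues.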

SLIDE 16

§ DNS server has internal message buffers:
  • Receiving a query.
  • Sending a query.
  • Receiving a response.
  • Sending a response.
§ Instrument the call sites in the server implementation so that message buffers can be duplicated and exported outside of the server process.
§ Be able to enable/disable each logging site independently.

dnstap: Message Duplication
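
A rough sketch of what an instrumented call site could look like, with each of the four buffer events switchable on its own. The names `TapSite` and `MessageTap` are hypothetical and chosen for illustration; they are not taken from any dnstap-enabled server.

```python
from enum import Enum


class TapSite(Enum):
    """The four points where a server holds a complete message buffer."""
    QUERY_RECEIVED = "query_received"
    QUERY_SENT = "query_sent"
    RESPONSE_RECEIVED = "response_received"
    RESPONSE_SENT = "response_sent"


class MessageTap:
    """Copy message buffers to an exporter, but only for enabled sites."""

    def __init__(self, export, enabled=()):
        self._export = export          # callable(site, message_bytes)
        self.enabled = set(enabled)    # sites can be toggled independently

    def duplicate(self, site, buffer) -> bool:
        if site not in self.enabled:
            return False               # disabled site: no copy, no export
        # bytes() takes a snapshot, so later reuse of the server's
        # buffer cannot change what was exported.
        self._export(site, bytes(buffer))
        return True


if __name__ == "__main__":
    exported = []
    tap = MessageTap(lambda s, m: exported.append((s, m)),
                     enabled=[TapSite.QUERY_RECEIVED])
    tap.duplicate(TapSite.QUERY_RECEIVED, bytearray(b"\x12\x34"))
    tap.duplicate(TapSite.RESPONSE_SENT, bytearray(b"\x56\x78"))
    print(exported)
```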

SLIDE 17

Currently 10 defined subtypes of dnstap “Message”:

§ AUTH_QUERY
§ AUTH_RESPONSE
§ RESOLVER_QUERY
§ RESOLVER_RESPONSE
§ CLIENT_QUERY
§ CLIENT_RESPONSE
§ FORWARDER_QUERY
§ FORWARDER_RESPONSE
§ STUB_QUERY
§ STUB_RESPONSE

dnstap: “Message” Log Format

SLIDE 18

Dnstap: Overview

SLIDE 19

SLIDE 20

§ Turn on AUTH_QUERY and/or CLIENT_QUERY message duplication.
  • Optionally turn on AUTH_RESPONSE and/or CLIENT_RESPONSE.
§ Connect a dnstap receiver to the DNS server.
§ Performance impact should be minimal.
§ Full verbatim message content is available without text log parsing.

dnstap: Query Logging
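
Because the receiver gets the verbatim wire-format message, fields such as the question name and type can be read straight from the bytes. Below is a minimal sketch of extracting the first question from a DNS message using the standard header and question layout. The function name is hypothetical, error handling is deliberately thin, and a real receiver would more likely hand the message to a full DNS parsing library.

```python
import struct


def parse_question(message: bytes):
    """Return (qname, qtype, qclass) for the first question in a DNS message."""
    if len(message) < 12:
        raise ValueError("shorter than the 12-byte DNS header")
    # QDCOUNT is the 16-bit field at offset 4 of the header.
    qdcount = struct.unpack_from("!H", message, 4)[0]
    if qdcount < 1:
        raise ValueError("message has no question section")

    labels = []
    offset = 12  # the question section starts right after the header
    while True:
        length = message[offset]
        offset += 1
        if length == 0:  # zero-length root label ends the name
            break
        if length > 63:
            raise ValueError("compressed or invalid label in question name")
        labels.append(message[offset:offset + length].decode("ascii", "replace"))
        offset += length

    # QTYPE and QCLASS follow the name as two 16-bit big-endian integers.
    qtype, qclass = struct.unpack_from("!HH", message, offset)
    return ".".join(labels) + ".", qtype, qclass


if __name__ == "__main__":
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    question = b"\x03www\x07example\x03com\x00" + struct.pack("!HH", 1, 1)
    print(parse_question(header + question))
```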

SLIDE 21

§ Turn on RESOLVER_RESPONSE message duplication.
§ Connect a dnstap receiver to the DNS server.

dnstap: Passive DNS

SLIDE 22

§ Once inside the DNS server, the issues caused by being outside disappear.
  • Out-of-bailiwick records: the DNS server already knows which servers are responsible for which zones.
  • Spoofing: the DNS server already has its state table. Unsuccessful spoofs are excluded.
  • TCP/UDP packet issues: already handled by the kernel and the DNS server.

dnstap: Passive DNS advantages

SLIDE 23

§ Flexible, structured log format for DNS software.
§ Helper libraries for adding support to DNS software.
§ Patch sets that integrate dnstap support into existing DNS software.
§ Capture tools for receiving dnstap messages from dnstap-enabled software.

dnstap: Components

SLIDE 24

§ Encoded using Protocol Buffers.
  • Compact
  • Binary clean
  • Backwards, forwards compatibility
  • Implementations for numerous programming languages available

dnstap: Log Format

SLIDE 25

§ fstrm: “Frame Streams” library.

  • Encoding-agnostic transport.
  • Adds ~1.5K LOC to the DNS server.
  • https://github.com/farsightsec/fstrm

§ protobuf-c: “Protocol Buffers” library.

  • Transport-agnostic encoding.
  • Adds ~2.5K LOC to the DNS server.
  • https://github.com/protobuf-c/protobuf-c

Dnstap: Helper Libraries
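
To make "encoding-agnostic transport" concrete, here is a simplified sketch of length-prefixed framing: each payload is preceded by a 32-bit big-endian length, and the transport never looks inside the payload. This assumes the data-frame layout used by Frame Streams and omits its control frames entirely. The function names are hypothetical and the code does not use the fstrm library.

```python
import struct


def encode_frame(payload: bytes) -> bytes:
    """Prefix an opaque payload with its length as a 32-bit big-endian integer."""
    if not payload:
        # Assumption: a zero length is reserved to introduce a control frame.
        raise ValueError("empty payload cannot be sent as a data frame")
    return struct.pack("!I", len(payload)) + payload


def decode_frames(stream: bytes):
    """Yield each payload from a byte string of concatenated data frames."""
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        if length == 0:
            raise ValueError("control frame encountered; not handled in this sketch")
        if offset + length > len(stream):
            raise ValueError("truncated frame")
        yield stream[offset:offset + length]
        offset += length


if __name__ == "__main__":
    stream = encode_frame(b"first payload") + encode_frame(b"second")
    print(list(decode_frames(stream)))
```

The payloads here are arbitrary bytes. In a dnstap deployment they would be the Protocol Buffers encoded log messages, which is the sense in which the encoding is transport-agnostic and the transport is encoding-agnostic.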

SLIDE 26

Plans to add dnstap support to software that handles DNS messages:

§ DNS servers: BIND, Unbound, Knot DNS, etc.
§ Analysis tools: Wireshark, etc.
§ Utilities: dig, kdig, drill, dnsperf, resperf
§ More?

Dnstap: Integration

SLIDE 27

Unbound DNS server with dnstap support.

§ Supports the relevant dnstap “Message” types for a recursive DNS server:
  • {CLIENT,RESOLVER,FORWARDER}_{QUERY,RESPONSE}
§ Adds <1K LOC to the DNS server.

dnstap: Unbound Integration
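
Reading the brace shorthand on this slide as three roles combined with two directions, the six message types it stands for can be enumerated as follows. The constant and function names are hypothetical.

```python
from itertools import product

ROLES = ("CLIENT", "RESOLVER", "FORWARDER")
DIRECTIONS = ("QUERY", "RESPONSE")


def recursive_message_types():
    """Expand the shorthand into the full list of message type names."""
    return [f"{role}_{direction}" for role, direction in product(ROLES, DIRECTIONS)]


if __name__ == "__main__":
    for name in recursive_message_types():
        print(name)
```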

SLIDE 28

§ Command-line tool/daemon for collecting dnstap log payloads.

  • Print payloads.
  • Save to log file.
  • Retransmit over the network.

§ Similar role to tcpdump, syslogd, or flow-tools.

Dnstap: Capture Tool

SLIDE 29

§ More of a “microbenchmark”.
§ Meant to validate the architectural approach.
§ Not meant to accurately characterize the performance of a dnstap-enabled DNS server under “realistic” load.

Benchmark

SLIDE 30

§ One receiver:
  • Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
  • No HyperThreading, no SpeedStep, no Turbo Boost.
§ One sender:
  • Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz
§ Intel Corporation I350 Gigabit Network Connection
§ Sender and receiver directly connected via crossover cable. No switch, RX/TX flow control disabled.

Benchmark setup

SLIDE 31

§ Linux 3.11/3.12.
§ Defaults, no attempt to tune networking stack.
§ trafgen used to generate identical UDP DNS questions with random UDP ports / DNS IDs.
§ tc token bucket filter used to precisely vary the query load offered by the sender.
§ mpstat used to measure receiver’s system load.
§ ifpps used to measure packet RX/TX rates on the receiver.
§ perf used for whole-system profiling.

Benchmark host setup

SLIDE 32

§ Offer particular DNS query loads in 25 Mbps steps:
  • 25 Mbps, 50 Mbps, …, 725 Mbps, 750 Mbps.
§ Measure system load and responses/second at the receiver, where the DNS server is running.
  • Most DNS benchmarks plot queries/second against response rate to characterize drop rates.
  • Plotting responses/second can still reveal bottlenecks.

Benchmark tests
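
For reference, the load steps above work out to 30 test points. The sketch below lists them and shows how an offered bit rate could be converted to queries per second. The slides do not state the query size, so `bytes_per_query` is a hypothetical parameter, and Mbps is taken to mean 10^6 bits per second.

```python
STEP_MBPS = 25
MAX_MBPS = 750


def offered_loads_mbps():
    """Return the offered loads: 25, 50, ..., 750 Mbps."""
    return list(range(STEP_MBPS, MAX_MBPS + 1, STEP_MBPS))


def queries_per_second(load_mbps, bytes_per_query):
    """Convert an offered bit rate into a query rate for an assumed query size."""
    bits_per_query = bytes_per_query * 8
    return load_mbps * 1_000_000 / bits_per_query


if __name__ == "__main__":
    loads = offered_loads_mbps()
    print(len(loads), loads[0], loads[-1])
    # 100 bytes per query is an illustrative figure, not a value from the benchmark.
    print(queries_per_second(loads[0], bytes_per_query=100))
```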

SLIDE 33

SLIDE 34

SLIDE 35

SLIDE 36

SLIDE 37

SLIDE 38

Three recursive DNS servers were tested:

§ BIND 9.9.4, with and without query logging.
§ Unbound 1.4.21, with and without query logging.
§ Unbound with a dnstap patch logging incoming queries.

Results:

§ Unbound generally scaled better than BIND 9.
§ Both DNS servers implement query logging in a way that significantly impacts performance.
§ dnstap added some overhead, but scaled well.

Benchmark summary

SLIDE 39

§ Additional dnstap logging payload types:
  • DNS cache events: insertions, expirations, overwrites of individual resource records
§ Patches to add dnstap support to more DNS software
  • Not just DNS servers!
§ More documentation & specifications
§ More tools that can consume dnstap formatted data
§ More benchmarking

Future Work

SLIDE 40

§ Examined query logging and passive DNS replication.
§ Introduced new dnstap technology that can support both use cases with an in-process message duplication facility.

Summary