Query Log Analysis: Detecting Anomalies in DNS Traffic at a TLD Resolver



slide-1
SLIDE 1

Query Log Analysis

Detecting Anomalies in DNS Traffic at a TLD Resolver

Pieter Robberechts, Maarten Bosteels, Jesse Davis and Wannes Meert

slide-2
SLIDE 2

Goal and Context

Outline: Goal and Context · The QLAD System · Results · Conclusion

slide-3
SLIDE 3

DNS


The Domain Name System

[Diagram: the browser asks DNS for www.cs.kuleuven.be and receives the address 14.154.78.252.]

slide-4
SLIDE 4

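DNS


The Domain Name System

[Diagram: Browser, Recursive Resolver (ISP), Root Servers, TLD Name Server, Authoritative Name Server; a cache is drawn at two points in the chain.]

  • Browser → recursive resolver: ? .cs.kuleuven.be
  • Resolver → root servers: ? .be → 174.34.28.193
  • Resolver → TLD name server: ? .kuleuven.be → 54.186.35.8
  • Resolver → authoritative name server: ? .cs.kuleuven.be → 14.154.78.252
  • Resolver → browser: 14.154.78.252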
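The referral chain above can be sketched as a small offline simulation. This is only a toy model: the addresses are the ones shown on the slide, while the `REFERRALS` and `ANSWERS` tables, the `resolve` function and the single resolver-side cache are illustrative assumptions, and no real DNS traffic is generated.

```python
# Toy, offline model of iterative DNS resolution (no network access).
# Addresses mirror the slide; the table layout is purely illustrative.
ROOT = "root"
REFERRALS = {
    # server -> {zone suffix: address of the next server to ask}
    "root": {"be": "174.34.28.193"},
    "174.34.28.193": {"kuleuven.be": "54.186.35.8"},
}
ANSWERS = {
    # authoritative server -> {full name: final address}
    "54.186.35.8": {"www.cs.kuleuven.be": "14.154.78.252"},
}


def resolve(name, cache):
    """Follow referrals from the root until an authoritative answer is found.

    Returns (address, list of servers contacted)."""
    if name in cache:
        return cache[name], []  # answered from cache, nobody contacted
    server, path = ROOT, []
    while True:
        path.append(server)
        if name in ANSWERS.get(server, {}):
            cache[name] = ANSWERS[server][name]
            return cache[name], path
        # Pick the referral whose zone is a suffix of the queried name.
        nxt = [addr for zone, addr in REFERRALS.get(server, {}).items()
               if name == zone or name.endswith("." + zone)]
        if not nxt:
            raise LookupError(f"no referral for {name} at {server}")
        server = nxt[0]


cache = {}
address, path = resolve("www.cs.kuleuven.be", cache)
cached_address, cached_path = resolve("www.cs.kuleuven.be", cache)
```

The second call contacts no server at all, which is why a TLD name server only sees the part of the traffic that misses resolver caches.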

slide-5
SLIDE 5

DNS Belgium


The .be ccTLD resolver

Domain name registry for .be/.vlaanderen/.brussels

  • Manage registration of domains
  • Provide infrastructure to answer queries

  • 1.5 million domains
  • 4 nameservers
  • 350 million queries / day

Source: Highlights uit 2016. DNS Belgium. 
 URL: https://www.dnsbelgium.be/sites/default/files/generated/files/documents/cijfers%20deel%201%20-%20980px_v04_NL.pdf
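To put the daily figure in perspective, a quick back-of-the-envelope calculation gives the average query rate. The even split over the four nameservers is an assumption made only for illustration; real traffic has peaks and is not necessarily balanced.

```python
QUERIES_PER_DAY = 350_000_000  # figure from the slide
NAMESERVERS = 4                # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate over a whole day; peaks will be higher.
avg_qps = QUERIES_PER_DAY / SECONDS_PER_DAY
# Assumes an even split across nameservers, which real traffic need not follow.
avg_qps_per_server = avg_qps / NAMESERVERS

print(f"{avg_qps:.0f} queries/s overall, {avg_qps_per_server:.0f} per nameserver")
```

That is roughly four thousand queries per second on average, before accounting for any peaks.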

slide-6
SLIDE 6

DNS Belgium


Current Situation

PCAP files

  • stored for 10 days
  • analysed post-mortem
slide-7
SLIDE 7

DNS Belgium


Current Situation

We believe that proactive and real-time analysis of this data could contribute to the resilience and security of DNS Belgium’s service.

slide-8
SLIDE 8

4 Challenges

  1. Huge data volume
      • efficiency and scalability!
      • anomalies can easily stay under the radar
  2. No labelled or clean training data
  3. Wide range of attacks, under constant evolution
  4. Specific nature of DNS traffic
      • periodicity and trends
      • few (typically two) packets per flow

slide-9
SLIDE 9

The QLAD System

slide-10
SLIDE 10

QLAD


System Overview

[Diagram: three pipeline stages]

  • Data transformation: ENTRADA or DSC
  • Anomaly detection: QLAD-flow, QLAD-global
  • Presentation: QLAD-UI

slide-11
SLIDE 11

Data Transformation


ENTRADA vs DSC

[Diagram: ENTRADA path: convert → archive → SQL. DSC path: aggregate → archive → MongoDB API.]
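The difference between keeping every query and keeping only aggregates can be sketched in a few lines. The record fields (`client`, `qname`, `qtype`) and the `aggregate` helper are hypothetical and do not reflect ENTRADA's schema or DSC's actual output format; only the `val` / `count` shape is borrowed from the excerpt shown on the next slide.

```python
from collections import Counter

# Hypothetical log-level records, as a store of all traffic would keep them.
# Field names are illustrative, not ENTRADA's schema.
records = [
    {"client": "195.238.24.111", "qname": "example.be", "qtype": "A"},
    {"client": "195.238.24.111", "qname": "example.be", "qtype": "AAAA"},
    {"client": "195.238.25.53", "qname": "kuleuven.be", "qtype": "A"},
    {"client": "195.238.24.111", "qname": "dnsbelgium.be", "qtype": "MX"},
]


def aggregate(records, field):
    """Reduce records to per-value counts, sorted by decreasing count."""
    counts = Counter(r[field] for r in records)
    return [{"val": val, "count": n} for val, n in counts.most_common()]


# The aggregate is compact, but the individual queries can no longer be recovered.
summary = {"ClientAddr": aggregate(records, "client")}
```

With the full records a question such as "which names did this client ask for?" can still be answered; with the summary alone it cannot.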

slide-12
SLIDE 12

Data Transformation


ENTRADA vs DSC

ENTRADA

  • Stores all traffic
  • Allows a detailed analysis
  • SQL interface
  • Storage cost and infrastructure

DSC

  • Lightweight
  • No additional infrastructure
  • No detailed (log level) analysis

"ClientAddr": [
  { "val": "195.238.24.111", "count": 1014 },
  { "val": "195.238.25.53", "count": 70 },
  { "val": "195.238.25.99", "count": 63 },
  { "val": "195.238.24.117", "count": 61 },
  { "val": "194.78.30.189", "count": 59 },
  { "val": "42.236.23.92", "count": 55 },
  { "val": "195.238.25.108", "count": 55 },
  { "val": "42.236.23.91", "count": 54 },
  { "val": "193.58.1.131", "count": 52 },

slide-13
SLIDE 13

QLAD-flow


Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.

Hash packets (hash function h₁)
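The hashing step can be sketched as follows. This is a minimal illustration, not QLAD's implementation: the `bucket` and `sketch` helpers, the use of a salted BLAKE2b hash as h₁, the choice of the source IP as the hash key and the packet tuples are all assumptions.

```python
import hashlib


def bucket(key, seed, n_buckets):
    """Map a flow key (e.g. a source IP) to one of n_buckets groups.

    Different seeds give different, independent-looking hash functions."""
    digest = hashlib.blake2b(key.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % n_buckets


def sketch(packets, seed, n_buckets):
    """Split (timestamp, key) packets into groups; each group is a list of timestamps."""
    groups = [[] for _ in range(n_buckets)]
    for timestamp, key in packets:
        groups[bucket(key, seed, n_buckets)].append(timestamp)
    return groups


# Hypothetical packets: (arrival time in seconds, source IP).
packets = [
    (0.1, "195.238.24.111"),
    (0.4, "42.236.23.92"),
    (0.9, "195.238.24.111"),
    (1.3, "193.58.1.131"),
]
groups = sketch(packets, seed=1, n_buckets=4)
```

All packets with the same key always end up in the same group, so a misbehaving source shifts the statistics of exactly one group per hash function.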

slide-14
SLIDE 14

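[Figure: packet counts per time bin for one group at three aggregation levels, each level summarized by a parameter pair. Level 1 (α₁, β₁): 2 1 3 2 1 3 1 2 1. Level 2 (α₂, β₂): 3 3 3 4 2. Level 3 (α₃, β₃): 6 7 2.]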

QLAD-flow


Algorithm

Count packets at different aggregation levels
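A minimal sketch of this step, assuming that α and β denote the shape and scale of a Gamma distribution as in the cited Dewaele et al. paper. The slide does not say how the parameters are estimated; the method-of-moments estimate, the aggregation factor of two and the example counts below are assumptions chosen for brevity.

```python
from statistics import mean, pvariance


def aggregate(counts, factor=2):
    """Sum adjacent bins to obtain the next, coarser aggregation level."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def gamma_params(counts):
    """Method-of-moments estimate of Gamma shape (alpha) and scale (beta)."""
    m, v = mean(counts), pvariance(counts)
    if v == 0:
        return float("inf"), 0.0  # degenerate case: constant series
    return m * m / v, v / m


def multiresolution(counts, levels=3):
    """Return one (alpha, beta) pair per aggregation level."""
    params = []
    for _ in range(levels):
        params.append(gamma_params(counts))
        counts = aggregate(counts)
    return params


level1 = [2, 1, 3, 0, 2, 1, 3, 1]  # packets per time bin (illustrative values)
level2 = aggregate(level1)
level3 = aggregate(level2)
params = multiresolution(level1)
```

Looking at several levels at once lets short bursts and slow, sustained changes both leave a trace in the parameter vector.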

slide-15
SLIDE 15

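[Figure: the (α, β) parameters of every group at Levels 1 to 3 are compared with the average over all groups; the distance to the average is computed per group.]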

QLAD-flow


Algorithm

  • Compare groups at each aggregation level
  • Identify groups that differ from the average (anomalous groups)
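The comparison can be sketched with a simple normalized distance. Note the assumptions: the cited paper uses its own distance and threshold, whereas this sketch uses the root mean square of per-parameter z-scores, a hypothetical threshold of 1.5 and made-up parameter values.

```python
from statistics import mean, pstdev


def distances_from_average(group_params):
    """group_params[g] is a flat list of parameters (all levels) for group g.

    Returns one distance per group: the RMS of its per-parameter z-scores."""
    columns = list(zip(*group_params))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return [
        (sum(((x - c) / s) ** 2
             for x, c, s in zip(row, centers, spreads)) / len(row)) ** 0.5
        for row in group_params
    ]


def anomalous_groups(group_params, threshold=1.5):
    """Indices of groups whose distance from the average exceeds the threshold."""
    dists = distances_from_average(group_params)
    return [g for g, d in enumerate(dists) if d > threshold]


# Illustrative (alpha, beta) values at two levels for five groups.
group_params = [
    [2.0, 1.0, 4.1, 1.0],
    [2.1, 0.9, 4.0, 1.1],
    [1.9, 1.1, 3.9, 0.9],
    [2.0, 1.0, 4.0, 1.0],
    [0.4, 6.0, 0.9, 9.0],  # behaves differently from the rest
]
flagged = anomalous_groups(group_params)
```

Because the reference is the average over the groups themselves, no labelled or clean training data is needed.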

slide-16
SLIDE 16

QLAD-flow


Algorithm

h1 h2 h3

Repeat with different hash functions
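Repeating the procedure with several hash functions makes it possible to narrow an anomalous group down to individual keys: a key is suspicious only if it lands in a flagged group every time. The sketch below is an assumption-laden illustration. The `busiest_bucket` detector is a placeholder for the per-level comparison, the time dimension is ignored, and the seeds, bucket count and addresses are made up.

```python
import hashlib
from collections import Counter


def bucket(key, seed, n_buckets):
    """Seeded hash of a key into one of n_buckets groups."""
    digest = hashlib.blake2b(key.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % n_buckets


def suspicious_keys(keys, seeds, n_buckets, flag):
    """Keep the keys that land in a flagged bucket under every hash function.

    `flag(bucket_counts)` returns the set of bucket indices considered
    anomalous; it stands in for the comparison against the average."""
    distinct = set(keys)
    candidates = set(distinct)
    for seed in seeds:
        counts = Counter(bucket(k, seed, n_buckets) for k in keys)
        flagged = flag(counts)
        candidates &= {k for k in distinct if bucket(k, seed, n_buckets) in flagged}
    return candidates


def busiest_bucket(counts):
    """Placeholder detector: flag only the bucket holding the most packets."""
    return {counts.most_common(1)[0][0]}


# One heavy source among 100 light ones (one entry per packet).
keys = ["10.0.0.1"] * 500 + [f"192.0.2.{i}" for i in range(1, 101)]
result = suspicious_keys(keys, seeds=range(8), n_buckets=16, flag=busiest_bucket)
```

A harmless key may share a group with the heavy source under one hash function, but it is unlikely to do so under all of them.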

slide-17
SLIDE 17

QLAD-flow


Shortcomings

Some attacks span a lot of flows, e.g. a DoS attack with spoofed IP addresses.

QLAD-flow is unable to detect these.

slide-18
SLIDE 18

QLAD-global


Algorithm

Observation: each traffic anomaly causes changes in the distribution of one or more traffic features

Look at entropy!
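As a reminder of what entropy measures here, the sketch below computes the Shannon entropy of a count distribution. The helper names and the example distributions are illustrative; the last list reuses the client counts from the excerpt shown earlier in the deck.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def normalized_entropy(counts):
    """Entropy scaled to [0, 1] by the maximum for the number of observed values."""
    n = sum(1 for c in counts if c > 0)
    return entropy(counts) / log2(n) if n > 1 else 0.0


uniform = [25, 25, 25, 25]   # e.g. four query types seen equally often
skewed = [97, 1, 1, 1]       # one value dominates, as during a flood
client_counts = [1014, 70, 63, 61, 59, 55, 55, 54, 52]

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)
h_clients = normalized_entropy(client_counts)
```

A single number per feature and time window is cheap to compute and store, and a sudden drop or rise signals that the distribution has become more concentrated or more dispersed.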

slide-19
SLIDE 19

QLAD-global


Algorithm

[Diagram: QLAD-global processing loop with four numbered steps]

  • Get new entropies (from ENTRADA or DSC)
  • Update models
  • Run detector
  • Report anomalies: timestamp, features with anomaly

Methods listed: EMA, Kalman, ...

Features: TLD, SLD, qtype, rcode, client, ASN, country, response size
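One possible shape of this loop with an exponential moving average (EMA) is sketched below. It is not QLAD-global's code: the `EmaDetector` class, its smoothing factor, threshold and warm-up length, the order in which the model is checked and updated, and the synthetic entropy series are all assumptions.

```python
class EmaDetector:
    """Flags a value that deviates strongly from an exponential moving average."""

    def __init__(self, alpha=0.2, threshold=3.0, warmup=5):
        self.alpha, self.threshold, self.warmup = alpha, threshold, warmup
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, value):
        """Return True if `value` is anomalous; otherwise fold it into the model."""
        if self.mean is None:
            self.mean, self.seen = value, 1
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        anomalous = (self.seen >= self.warmup and std > 0
                     and abs(deviation) > self.threshold * std)
        if not anomalous:  # keep outliers from contaminating the model
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.seen += 1
        return anomalous


def detect(entropy_series):
    """entropy_series: {feature: [(timestamp, entropy), ...]} -> list of reports."""
    reports = []
    for feature, series in entropy_series.items():
        detector = EmaDetector()
        for timestamp, value in series:
            if detector.update(value):
                reports.append({"timestamp": timestamp, "feature": feature})
    return reports


# Synthetic entropy of the qtype feature: stable, then a sharp drop at t = 20.
series = {"qtype": [(t, 2.0 + 0.01 * (-1) ** t) for t in range(20)] + [(20, 0.5)]}
reports = detect(series)
```

Each report carries the timestamp and the feature whose entropy deviated, matching the two fields listed on the slide.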

slide-20
SLIDE 20

Results

slide-21
SLIDE 21

Data


Description of the evaluation dataset

  • Sunday 12 to Monday 13 February 2017
  • 58,345,819 queries
  • 42 GB
  • 1 server

slide-22
SLIDE 22

Results


Detected anomalies

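Columns: QLAD-flow (source IP) | QLAD-flow (query name) | QLAD-global | Total (unique)

Benign
  Caching resolver:    12  2  12
  Email marketing:     8  2  9
  Other:               1  2  3
Malicious
  Spam sender:         3  3
  Domain enumeration:  5  2  5
  Reflection attack:   1  1  2
  Phishing:            1  1
  DoS attack:          3  2  1  4
Unknown:               1  1  1
TOTAL:                 35  4  9  39

Empty cells were not preserved in the slide text, so rows with fewer than four values cannot be aligned to the columns unambiguously.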

slide-23
SLIDE 23

Results


Detected anomalies

  • No ground truth


→ Impossible to use standard evaluation
 → Manual inspection of detected anomalies

  • Only tip of the iceberg?
slide-24
SLIDE 24

Conclusion

slide-25
SLIDE 25

Conclusion

QLAD

  • ENTRADA / DSC
  • QLAD-flow
  • QLAD-global
  • QLAD-UI

is a combination that works!

However, anomaly ≠ attack / abuse ➡ filtering needed

Can this be automated?

slide-26
SLIDE 26

Thanks!

Any questions?

Interested? All software is open source!

  • QLAD: https://github.com/DNSBelgium/qlad
  • ENTRADA: https://github.com/SIDN/entrada
  • DSC: https://github.com/DNS-OARC/dsc