Query Log Analysis: Detecting Anomalies in DNS Traffic at a TLD Resolver (PowerPoint presentation)


slide-1
SLIDE 1

Query Log Analysis

Detecting Anomalies in DNS Traffic at a TLD Resolver

Thesis Defence, Jun 30, 2017
Pieter Robberechts
Promotor: Prof. Hendrik Blockeel
Co-promoter: Ronald Geens

slide-2
SLIDE 2

Goal and Context

Outline: Goal and Context, The QLAD System, Results, Conclusion

slide-3
SLIDE 3

DNS Belgium


The .be ccTLD resolver

Domain name registry for .be/.vlaanderen/.brussels

  • Manage registration of domains
  • Provide infrastructure to answer queries

  • 1.5 million domains
  • 4 nameservers
  • 350 million queries / day

Highlights uit 2016 (Highlights from 2016). DNS Belgium.
URL: https://www.dnsbelgium.be/sites/default/files/generated/files/documents/cijfers%20deel%201%20-%20980px_v04_NL.pdf

slide-4
SLIDE 4

DNS Belgium


Current Situation

pcap files

  • stored for 10 days
  • analysed post-mortem
slide-5
SLIDE 5

DNS Belgium


Current Situation

We believe that proactive and real-time analysis of this data could contribute to the resilience and security of DNS Belgium’s service.

slide-6
SLIDE 6

"Design and build a working query log analysis platform using available components and custom development, able to predict, detect and report on common attack and abuse patterns in an open architecture, allowing for future growth and improvement."

slide-7
SLIDE 7

Anomaly Detection


Challenges

  • Huge data volume
    • efficiency and scalability!
    • easy for anomalies to stay under the radar
  • No labelled or clean training data
  • Wide range of attacks, under constant evolution
  • Specific nature of DNS traffic
    • periodicity and trends
    • few (typically two) packets per flow

slide-8
SLIDE 8

Anomaly Detection


Goal

Design and implement a query log analysis platform that:

  • is able to detect suspicious behaviour and a wide range of attacks
  • is efficient enough to scan high volume traffic
  • can detect low volume anomalies
  • does not need any initial knowledge about the analysed traffic
  • is tuned to the unique nature of DNS traffic
  • allows for future growth and improvement 

slide-9
SLIDE 9

The QLAD System

Outline: Goal and Context, The QLAD System, Results, Conclusion

slide-10
SLIDE 10

QLAD


Query Log Anomaly Detection

  • Focus on anomalies (≠ attacks/abuses)
  • Statistical techniques
  • Inspiration from network anomaly detection
slide-11
SLIDE 11

QLAD


System Overview

  • Data transformation: ENTRADA or DSC
  • Anomaly detection: QLAD-flow and QLAD-global
  • Presentation: QLAD-UI

slide-12
SLIDE 12

Data Transformation


ENTRADA vs DSC

  • ENTRADA: convert, archive, SQL
  • DSC: aggregate, archive, MongoDB API

slide-13
SLIDE 13

Data Transformation


ENTRADA vs DSC

ENTRADA
  • Stores all traffic
  • Allows a detailed analysis
  • SQL interface
  • Storage cost and infrastructure

DSC
  • Lightweight
  • No additional infrastructure
  • No detailed (log-level) analysis

Example DSC output (excerpt):

"ClientAddr": [
  { "val": "195.238.24.111", "count": 1014 },
  { "val": "195.238.25.53", "count": 70 },
  { "val": "195.238.25.99", "count": 63 },
  { "val": "195.238.24.117", "count": 61 },
  { "val": "194.78.30.189", "count": 59 },
  { "val": "42.236.23.92", "count": 55 },
  { "val": "195.238.25.108", "count": 55 },
  { "val": "42.236.23.91", "count": 54 },
  { "val": "193.58.1.131", "count": 52 },
  ...

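DSC keeps aggregated counts such as the ClientAddr excerpt rather than individual queries. A minimal Python sketch, assuming only the nine listed entries are loaded (the shares are relative to this excerpt, not to all traffic; the wrapping JSON object and the function name `client_shares` are hypothetical), ranks clients by their share of queries:

```python
import json

# The nine ClientAddr entries from the excerpt, wrapped in a hypothetical JSON object.
DSC_EXCERPT = """
{"ClientAddr": [
  {"val": "195.238.24.111", "count": 1014},
  {"val": "195.238.25.53", "count": 70},
  {"val": "195.238.25.99", "count": 63},
  {"val": "195.238.24.117", "count": 61},
  {"val": "194.78.30.189", "count": 59},
  {"val": "42.236.23.92", "count": 55},
  {"val": "195.238.25.108", "count": 55},
  {"val": "42.236.23.91", "count": 54},
  {"val": "193.58.1.131", "count": 52}
]}
"""

def client_shares(doc: dict) -> list[tuple[str, float]]:
    """Return (client address, share of listed queries) pairs, largest first."""
    entries = doc["ClientAddr"]
    total = sum(e["count"] for e in entries)
    ranked = sorted(entries, key=lambda e: e["count"], reverse=True)
    return [(e["val"], e["count"] / total) for e in ranked]

shares = client_shares(json.loads(DSC_EXCERPT))
top_client, top_share = shares[0]
print(top_client, round(top_share, 3))  # prints "195.238.24.111 0.684"
```

Within this excerpt, a single client accounts for roughly two thirds of the listed queries, which is the kind of concentration the detectors look for.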
slide-14
SLIDE 14

QLAD-flow


Algorithm

[Figure: traffic is split into sketches by hash function h₁]

Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.
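The hashing step can be sketched as follows: a hash function spreads flow keys over a fixed number of sketches, so that a small anomaly is not drowned out by the total volume. This is a minimal sketch under stated assumptions: SHA-256 with a seed prefix stands in for h₁, and `N_SKETCHES`, `sketch_index` and `split_into_sketches` are hypothetical names, not taken from QLAD.

```python
import hashlib
from collections import Counter

N_SKETCHES = 32  # hypothetical number of sketches per hash function

def sketch_index(key: str, seed: int, n_sketches: int = N_SKETCHES) -> int:
    """Map a flow key (e.g. a source IP) to one of n_sketches buckets.

    Each seed plays the role of a different hash function h_i (assumption: SHA-256).
    """
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_sketches

def split_into_sketches(keys, seed: int, n_sketches: int = N_SKETCHES) -> list[int]:
    """Count the packets that fall into each sketch during one time bin."""
    counts = Counter(sketch_index(k, seed, n_sketches) for k in keys)
    return [counts[i] for i in range(n_sketches)]

# One time bin of hypothetical source addresses; a repeated key is a repeated packet.
packets = ["192.0.2.1", "192.0.2.1", "198.51.100.7", "203.0.113.9", "192.0.2.1"]
per_sketch = split_into_sketches(packets, seed=1)
```

Repeating this per time bin yields one packet-count series per sketch, which is what the later statistical steps operate on.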

slide-15
SLIDE 15

QLAD-flow


Algorithm

[Figure: the traffic of a sketch aggregated at Level 1, Level 2 and Level 3, with Gamma parameters (α₁, β₁), (α₂, β₂) and (α₃, β₃) estimated at each level]
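The multiresolution step can be sketched as follows for the packet counts of one sketch: aggregate the series at several levels and fit a Gamma distribution at each. Assumptions: each level doubles the bin width (dyadic aggregation), and (α, β) are estimated with the method of moments (mean = αβ, variance = αβ²); the original work may use a different estimator. Function names and the example counts are hypothetical.

```python
from statistics import fmean, pvariance

def aggregate(series: list[int], level: int) -> list[int]:
    """Level 1 keeps the original bins; level j sums blocks of 2**(j-1) consecutive bins.

    A trailing incomplete block is dropped.
    """
    width = 2 ** (level - 1)
    return [sum(series[i:i + width]) for i in range(0, len(series) - width + 1, width)]

def gamma_moments(values: list[int]) -> tuple[float, float]:
    """Method-of-moments Gamma fit using mean = alpha*beta and variance = alpha*beta**2."""
    mean = fmean(values)
    var = pvariance(values)
    if mean <= 0 or var == 0:
        raise ValueError("need a positive, non-constant series")
    return mean * mean / var, var / mean

def multiresolution_fit(series: list[int], levels: int = 3) -> list[tuple[float, float]]:
    """(alpha_j, beta_j) for j = 1..levels, for the packet counts of one sketch."""
    return [gamma_moments(aggregate(series, j)) for j in range(1, levels + 1)]

counts = [2, 1, 3, 2, 1, 3, 1, 2]  # hypothetical per-bin packet counts of one sketch
fits = multiresolution_fit(counts)
```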

slide-16
SLIDE 16

QLAD-flow


Algorithm

[Figure: the distances of (α, β) at Level 1, Level 2 and Level 3 are combined into an average distance, which marks an anomalous sketch]
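The scoring step can be sketched as follows: compare each sketch's per-level parameters to a reference, average over the levels, and flag sketches above a threshold. Assumptions: the reference is the median over all sketches, the distance is a squared relative difference, and the threshold is hypothetical; Dewaele et al. define their own distance to the average behaviour, so this is an illustration rather than their exact formula.

```python
from statistics import median

def sketch_scores(fits: list[list[tuple[float, float]]]) -> list[float]:
    """fits[s][j] is (alpha, beta) for sketch s at level j.

    Score = squared relative distance to the median sketch, averaged over levels.
    """
    n_levels = len(fits[0])
    scores = [0.0] * len(fits)
    for j in range(n_levels):
        ref_alpha = median(f[j][0] for f in fits)
        ref_beta = median(f[j][1] for f in fits)
        for s, f in enumerate(fits):
            alpha, beta = f[j]
            scores[s] += ((alpha - ref_alpha) / ref_alpha) ** 2 + ((beta - ref_beta) / ref_beta) ** 2
    return [score / n_levels for score in scores]

def anomalous_sketches(fits, threshold: float) -> list[int]:
    """Indices of sketches whose averaged distance exceeds a (hypothetical) threshold."""
    return [s for s, score in enumerate(sketch_scores(fits)) if score > threshold]

# Three hypothetical sketches over two levels; sketch 2 deviates at level 1.
fits = [
    [(1.0, 1.0), (2.0, 1.0)],
    [(1.0, 1.0), (2.0, 1.0)],
    [(5.0, 1.0), (2.0, 1.0)],
]
```

Using the median as reference keeps one extreme sketch from dragging the reference towards itself.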

slide-17
SLIDE 17

QLAD-flow


Algorithm

[Figure: the procedure repeated with hash functions h1, h2 and h3]
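Repeating the procedure with several hash functions makes it possible to go from anomalous sketches back to individual keys: a key that is truly anomalous lands in an anomalous sketch under every hash function, while an innocent key that shared a sketch by chance under one hash usually does not. A minimal sketch, assuming seeded SHA-256 hashes as stand-ins for h1, h2, h3; `sketch_index` and `suspicious_keys` are hypothetical names.

```python
import hashlib

def sketch_index(key: str, seed: int, n_sketches: int) -> int:
    """Seeded hash standing in for hash function h_seed (hypothetical choice: SHA-256)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_sketches

def suspicious_keys(keys, anomalous: dict[int, set[int]], n_sketches: int) -> set[str]:
    """Keys that fall into an anomalous sketch under every hash function.

    anomalous maps a hash seed (h1, h2, h3, ...) to the sketch indices flagged for it.
    """
    return {
        k for k in keys
        if all(sketch_index(k, seed, n_sketches) in flagged for seed, flagged in anomalous.items())
    }

# Hypothetical example: for each of three hash functions, flag the sketch holding 192.0.2.1.
n = 16
keys = ["192.0.2.1", "198.51.100.7", "203.0.113.9"]
anomalous = {seed: {sketch_index("192.0.2.1", seed, n)} for seed in (1, 2, 3)}
result = suspicious_keys(keys, anomalous, n)
```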

slide-18
SLIDE 18
QLAD-flow


Algorithm

  • Designed to analyse all TCP/IP traffic [1].
  • Works with TCP/IP connection identifiers (source/destination port and address).
  • CZ.NIC extended it to meet the specifics of DNS traffic [2].
  • Hash keys:
    • IP address
      • Uses the source IP address.
      • Helps find suspicious traffic sources.
    • Query name
      • The first domain name of the query is extracted.
      • Helps find suspicious traffic from legitimate sources.
    • ASN
      • Uses the network identifier (autonomous system number).
      • Helps find traffic from suspicious networks.

[1] Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.
[2] Mikle, O., Slany, K., Vesely, J., Janousek, T., & Survy, O. (2011). Detecting Hidden Anomalies in DNS Communication.
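The three hash keys can be sketched as a single key-extraction function. Assumptions: "first domain name" is read as the label just below the TLD (e.g. `example.be`), and the ASN comes from a hypothetical prefix-to-ASN table (documentation prefixes and private-use AS numbers); a real deployment would use a routing dataset. All names here are hypothetical.

```python
import ipaddress

# Hypothetical prefix-to-ASN table; in practice this would come from routing data.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

def qname_key(qname: str) -> str:
    """Assumed reading of 'first domain name': the domain just below the TLD."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])

def asn_key(src_ip: str) -> str:
    """Longest-prefix match of the source address against the table."""
    addr = ipaddress.ip_address(src_ip)
    matches = [(net.prefixlen, asn) for net, asn in PREFIX_TO_ASN.items() if addr in net]
    return f"AS{max(matches)[1]}" if matches else "AS-unknown"

def flow_key(kind: str, src_ip: str, qname: str) -> str:
    """Return the hash key for one query, for kind in {"ip", "qname", "asn"}."""
    if kind == "ip":
        return src_ip
    if kind == "qname":
        return qname_key(qname)
    if kind == "asn":
        return asn_key(src_ip)
    raise ValueError(f"unknown key kind: {kind}")
```

Running the same sketch-based detector once per key kind gives the three views listed above.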

slide-19
SLIDE 19

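QLAD-flow


Shortcomings

Some attacks span a lot of flows, e.g. a DoS attack with spoofed IP addresses. Each spoofed source sends only a few queries and the sources spread evenly over all sketches, so no single sketch deviates: QLAD-flow is unable to detect these.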

slide-20
SLIDE 20

QLAD-global


Algorithm

Observation: each traffic anomaly causes changes in the distribution of one or more traffic features
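The observation can be made concrete with the entropy of a feature's distribution per time bin: random spoofed sources spread the client distribution (entropy rises), while a flood for one name concentrates the SLD distribution (entropy drops). A minimal sketch, assuming normalization by log2 of the number of distinct values (one choice among several); `normalized_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def normalized_entropy(values) -> float:
    """Shannon entropy (bits) of a feature's empirical distribution in one time bin,
    divided by log2 of the number of distinct values so that it lies in [0, 1]."""
    counts = Counter(values)
    if len(counts) <= 1:
        return 0.0
    n = sum(counts.values())
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts))

uniform_qtypes = ["A", "AAAA", "MX", "TXT"]  # hypothetical bin: evenly spread
skewed_qtypes = ["A"] * 99 + ["ANY"]         # hypothetical bin: dominated by one value
```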

slide-21
SLIDE 21

QLAD-global


Algorithm

Pipeline (input from ENTRADA or DSC):
  • Get new entropies for each traffic feature: TLD, SLD, qtype, rcode, client, ASN, country, response size
  • Update models
    • EMA
    • Kalman
    • ...
  • Run detector
  • Report anomalies
    • timestamp
    • features with anomaly
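The model and detector steps can be sketched with the EMA model named above (a Kalman filter is not shown): each feature's entropy series gets an exponentially weighted mean and variance, and a bin is reported when the new value deviates by more than k standard deviations. The parameters `alpha`, `k`, `warmup` and the names `EmaDetector` and `detect_anomalies` are hypothetical.

```python
class EmaDetector:
    """Flags a value that deviates from an exponentially weighted mean by more than
    k exponentially weighted standard deviations."""

    def __init__(self, alpha: float = 0.1, k: float = 3.0, warmup: int = 10):
        self.alpha, self.k, self.warmup = alpha, k, warmup
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x: float) -> bool:
        """Check x against the current model, then update the model; True if anomalous."""
        if self.mean is None:
            self.mean, self.n = x, 1
            return False
        diff = x - self.mean
        std = self.var ** 0.5
        # Detect before updating, so a value is judged against the model of earlier traffic.
        is_anomaly = self.n >= self.warmup and abs(diff) > self.k * max(std, 1e-9)
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)  # exponentially weighted variance
        self.n += 1
        return is_anomaly

def detect_anomalies(rows: list[dict[str, float]], features: list[str]):
    """rows[t][f] is the entropy of feature f in time bin t.
    Returns (time bin, features with anomaly) pairs."""
    detectors = {f: EmaDetector() for f in features}
    report = []
    for t, row in enumerate(rows):
        flagged = [f for f in features if detectors[f].update(row[f])]
        if flagged:
            report.append((t, flagged))
    return report

# Hypothetical entropies: 20 quiet bins, then the client entropy jumps.
rows = [{"client": 0.50 if t % 2 == 0 else 0.52, "qtype": 0.30 if t % 2 == 0 else 0.31}
        for t in range(20)]
rows.append({"client": 0.95, "qtype": 0.30})
report = detect_anomalies(rows, ["client", "qtype"])
```

Checking before updating keeps a sudden jump from being partly absorbed into its own reference; the model still adapts afterwards, which lets slow trends pass.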

slide-22
SLIDE 22

QLAD-UI


Rationale

  • Automatic classification is challenging
    • wide range of anomalies
    • subtle differences
  • Rely on the user
  • Visualise anomalies with relevant traffic

slide-23
SLIDE 23

QLAD-UI


Implementation

  • Database: HDFS (staging, warehouse) and MongoDB
  • Data API: Node.js API (Thrift API, Mongoose)
  • User interface: React + Flux + Grommet + D3.js

slide-24
SLIDE 24

Results

Outline: Goal and Context, The QLAD System, Results, Conclusion

slide-25
SLIDE 25

Data


Description of the evaluation dataset

  • Sunday 12 to Monday 13 February 2017
  • 58,345,819 queries (42 GB)
  • 1 server

slide-26
SLIDE 26

Results


Detected anomalies

Columns: QLAD-flow (source IP), QLAD-flow (query name), QLAD-global, Total (unique)

  • Caching resolver: 12 2 12
  • Benign anomaly: 1 2 3
  • Email marketing: 8 2 8
  • Spam sender: 3 3
  • Domain enumeration: 5 2 5
  • Reflection attack: 1 1 1
  • Broken resolver or script: 1 1
  • DoS attack: 3 2 1 3
  • Unknown: 1 1 1
  • False positive: 11
  • TOTAL: 35 15 9 36

slide-27
SLIDE 27

Conclusion

Outline: Goal and Context, The QLAD System, Results, Conclusion

slide-28
SLIDE 28

Conclusion


Achieved results

QLAD

  • ENTRADA / DSC
  • QLAD-flow
  • QLAD-global
  • QLAD-UI

is a winning combination!

slide-29
SLIDE 29

Conclusion


Future (ongoing) work

  • Anomaly ≠ attack / abuse => filtering needed
    • Can this be automated?
  • Additional / alternative algorithms
    • rule-based
    • clustering
  • Student job
slide-30
SLIDE 30

Thanks!

Any questions?

slide-31
SLIDE 31

Appendix


Gamma Distribution

The shape parameter α controls the evolution of $\Gamma_{\alpha,\beta}$ from a highly asymmetric stretched exponential shape ($\alpha \to 0$) to a Gaussian shape ($\alpha \to +\infty$), approaching $\mathcal{N}(\alpha\beta, \alpha\beta^{2})$.

  • $1/\alpha$ can therefore be read as a measure of the departure of $\Gamma_{\alpha,\beta}$ from the normal distribution.

The scale parameter β mostly acts as a multiplicative factor: if $X \sim \Gamma_{\alpha,\beta}$, then $\gamma X \sim \Gamma_{\alpha,\gamma\beta}$.
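For reference, the shape-scale parametrization of the Gamma density that matches the normal limit N(αβ, αβ²) and the scaling property can be written as below. The method-of-moments estimators (sample mean $\hat\mu$, sample variance $\hat\sigma^2$) are one standard way to obtain (α, β) from counts and are not necessarily the estimator used in QLAD-flow.

```latex
\begin{aligned}
\Gamma_{\alpha,\beta}(x) &= \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x > 0,\\
\mathbb{E}[X] &= \alpha\beta, \qquad \operatorname{Var}[X] = \alpha\beta^{2},\\
\hat{\alpha} &= \frac{\hat{\mu}^{2}}{\hat{\sigma}^{2}}, \qquad \hat{\beta} = \frac{\hat{\sigma}^{2}}{\hat{\mu}},\\
\gamma X &\sim \Gamma_{\alpha,\gamma\beta}: \quad \mathbb{E}[\gamma X] = \alpha(\gamma\beta), \quad \operatorname{Var}[\gamma X] = \alpha(\gamma\beta)^{2}.
\end{aligned}
```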