Integrating Human and Synthetic Reasoning Via Model-Based Analysis



slide-1
SLIDE 1

Integrating Human and Synthetic Reasoning Via Model-Based Analysis

slide-2
SLIDE 2

Introduction and Explanation

  • This is an experimental idea and very rough

– Glue together very tame AI and user interface through some fault trees

  • To capture knowledge
  • Improve efficiency
  • Overview of work
  • (My) Questions!
slide-3
SLIDE 3

If you haven’t seen this slide, you haven’t attended any of my talks

slide-4
SLIDE 4

How much do we know about network traffic?

[Figure: two time-series panels over d06h00 to d07h00, one showing Activity (GB/5min) and one showing Activity (Flows/5min); series: All TCP activity, Identifiable File Transfers, Identifiable Control, Scanning]

slide-5
SLIDE 5

Basic problem

  • We don’t know what we know, and we don’t know what we don’t know
  • Most valuable resource available is analyst head time
  • Lots of repetitive mindless attacks
  • Lots of low‐risk, high‐threat attacks
  • Have to automate
  • Also have to ensure automation isn’t self-defeating

slide-6
SLIDE 6

A Metric For Knowledge

  • Every day we receive k‐billion flows

– We can understand and accurately tag x% of them
– The closer x gets to 100%, the better

  • We improve x by:

– Hiring more analysts
– Reducing traffic into the network
– Automating the process
– Describing multiple flows at one time
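
As a rough illustration of the metric, x can be computed as the share of flows that carry a tag. This is a minimal sketch, not the prototype's code: the flow records as dictionaries and the `tag` field are assumptions made for the example.

```python
def coverage(flows):
    """Return x: the percentage of flows that carry a tag.

    Each flow is a dict; a missing or None "tag" means "not understood".
    The record layout is hypothetical, chosen only for illustration.
    """
    if not flows:
        return 0.0
    tagged = sum(1 for f in flows if f.get("tag") is not None)
    return 100.0 * tagged / len(flows)


flows = [
    {"sip": "192.0.2.1", "dip": "198.51.100.7", "tag": "scan"},
    {"sip": "192.0.2.1", "dip": "198.51.100.8", "tag": "scan"},
    {"sip": "192.0.2.9", "dip": "198.51.100.7", "tag": "file transfer"},
    {"sip": "192.0.2.4", "dip": "198.51.100.2", "tag": None},
]
x = coverage(flows)
```

Tagging many flows with one model (for example, one "scan" covering thousands of flows) raises x without adding analyst time.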

slide-7
SLIDE 7

Prototype System Diagram

[Diagram labels: Flow, DPI, AI (three instances), Models, Admission Of Ignorance, Maps, Reports, Alerts, Historical Maps, Conflict]

slide-8
SLIDE 8

What is an AI?

  • An AI is a system that reads in network data and outputs:

– A domain
– Some models
– Alerts
– Inventory data

[Diagram labels: AI, DPI, Maps, Scans, Important Scans, Systems Scanned]

slide-9
SLIDE 9

Complementary AIs

  • Accurate
  • Predictable
  • Unambiguous human/machine communication
  • Humans serve as the final judge
  • Don’t overwhelm with trivia

slide-10
SLIDE 10

Accuracy

  • Control ambiguity

– ROC curves provide us with a measure of accuracy
– But we’ve generally been unsure about what TP to use

  • AIs will not guess, in order to avoid a bad guess

[Figure: ROC curves plotting true positive rate (percentage) against false positive rate (percentage); two curves labeled “=2” and “=6”, parameter symbol lost in extraction]
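
The two ideas on this slide can be sketched together: a point on a ROC curve for a given threshold, and an AI that abstains when its score is not decisive. This is only an illustration; the score scale, the thresholds and the function names are assumptions, not the prototype's values.

```python
def roc_point(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at one threshold.

    Assumes at least one positive and one negative label are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def classify(score, accept=0.9, reject=0.1):
    """Label only decisive scores; return None to admit ignorance.

    The accept/reject cutoffs are hypothetical.
    """
    if score >= accept:
        return "A"
    if score <= reject:
        return "!A"
    return None  # no guess, so no bad guess


scores = [0.95, 0.8, 0.6, 0.3, 0.05]
labels = [True, True, False, True, False]
tpr, fpr = roc_point(scores, labels, threshold=0.5)
```

Widening the gap between `accept` and `reject` trades coverage for accuracy: more flows are left untagged, but fewer are tagged wrongly.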

slide-11
SLIDE 11

Predictable

  • There isn’t much we can do…

– Reports: periodic and predictable information on the state of the system (e.g., scanning)
– Alerts: when an actionable event occurs, a notice of the event and a recommended strategy (alter fw rules, take down machine, send people with guns)
– Internal Intelligence: maps of the inside of the network
– External Intelligence: maps of the outside world

  • Inventory is central
slide-12
SLIDE 12

Conflict Resolution

  • We know that something is something

– By fiat (“It’s my webserver”)
– By published reference (port 80 is http)
– By deep packet inspection (HTTP/1.0…)
– Behaviorally (short requests, big transfers)

  • Hierarchy of certainty

– DPI >> Fiat >> Behavioral >> Published reference

slide-13
SLIDE 13

Managing Conflict

DPI   Fiat   Behavioral   Published   Result
A     -      -            -           Map as A, alert on lack of published info
A     !A     X            X           Map as A, alert on conflict
A     A      X            X           Map as A
A     -      -            !A          Map as A, alert on masquerade
-     A      -            A           Map as A
-     A      !A           A           Report anomaly, Map as A
-     A      -            -           Map as A
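
The table above can be read as an ordered rule list. The sketch below encodes it directly; it assumes that a dash means the source is silent (None here) and that X means "any value", neither of which the slide states explicitly. The `resolve` function and `RULES` name are hypothetical.

```python
# Each source reports "A" (agrees), "!A" (contradicts) or None (silent).
# In a rule pattern, "X" is treated as "matches anything" (an assumption).
RULES = [
    # (DPI, Fiat, Behavioral, Published), result
    (("A", None, None, None), "Map as A, alert on lack of published info"),
    (("A", "!A", "X", "X"), "Map as A, alert on conflict"),
    (("A", "A", "X", "X"), "Map as A"),
    (("A", None, None, "!A"), "Map as A, alert on masquerade"),
    ((None, "A", None, "A"), "Map as A"),
    ((None, "A", "!A", "A"), "Report anomaly, Map as A"),
    ((None, "A", None, None), "Map as A"),
]


def resolve(dpi, fiat, behavioral, published):
    """Return the table's result for this evidence, or None if no row applies."""
    evidence = (dpi, fiat, behavioral, published)
    for pattern, result in RULES:
        if all(p == "X" or p == e for p, e in zip(pattern, evidence)):
            return result
    return None  # combination not covered by the table


conflict = resolve("A", "!A", "A", None)
masquerade = resolve("A", None, None, "!A")
```

Combinations the table does not list fall through to None, which fits the "admission of ignorance" idea: the system reports that it has no rule rather than inventing one.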

slide-14
SLIDE 14

Human/Machine Communication

  • AIs don’t raise alerts on normal behavior

– Reports are for that

  • AIs raise alerts on actionable anomalies

– Provide diagnostics, inventory and history

  • AIs raise alerts on conflicts

– Rely on the user to resolve the conflict and move on
slide-15
SLIDE 15

User Controls

  • Everyone controls domains: sip, dip, sport, dport, time and protocol value

– Domains have wildcards

  • Agents mark or subscribe to domains:

– Mark: this happened in the past and I can infer what happened

  • For AIs, Mark indicates “I recognize this”

– Subscribe: I will control and worry about this from now on

  • For users, subscription says “This is my territory”
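
A domain with wildcards, plus mark and subscribe, might look like the following. This is a sketch under assumptions: None stands for a wildcard, the time field is left out for brevity, and the `Domain` and `Registry` classes are invented for the example rather than taken from the prototype.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """A region of flow space; None in a field acts as a wildcard."""
    sip: Optional[str] = None
    dip: Optional[str] = None
    sport: Optional[int] = None
    dport: Optional[int] = None
    protocol: Optional[int] = None

    def matches(self, flow):
        for name in ("sip", "dip", "sport", "dport", "protocol"):
            wanted = getattr(self, name)
            if wanted is not None and flow.get(name) != wanted:
                return False
        return True


class Registry:
    """Tracks which agents marked (past) or subscribed to (future) a domain."""

    def __init__(self):
        self.marks = []
        self.subscriptions = []

    def mark(self, agent, domain):
        self.marks.append((agent, domain))

    def subscribe(self, agent, domain):
        self.subscriptions.append((agent, domain))

    def owners(self, flow):
        """Agents whose subscriptions cover this flow."""
        return [a for a, d in self.subscriptions if d.matches(flow)]


reg = Registry()
reg.subscribe("alice", Domain(dip="198.51.100.7", dport=80, protocol=6))
reg.mark("scan-ai", Domain(sip="192.0.2.1"))
flow = {"sip": "192.0.2.1", "dip": "198.51.100.7",
        "sport": 40000, "dport": 80, "protocol": 6}
```

Here `reg.owners(flow)` names the user who claimed the web server as territory, while the scan AI's mark records that it recognizes traffic from that source.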
slide-16
SLIDE 16

Models

  • AIs don’t output flow data

– They mark off some segment of flows and group them together as a separate structure

  • For example:

– A “scan”
– A “BitTorrent Network”
– A “Surfing session”

  • These models, in turn, have questions and structures that are more relevant to analysis

– Who did the scan hit?
– How much traffic was transferred in BT?
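
To make the idea concrete, a "scan" model can replace a pile of flows with the scanner, the addresses it hit, and the addresses that answered. This is a minimal sketch: it assumes the scanner has already been identified by some detector, and the `ScanModel` class and flow layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScanModel:
    """Compact description of one scan, built from many flows."""
    scanner: str
    targets: set = field(default_factory=set)
    responders: set = field(default_factory=set)


def build_scan_model(scanner, flows):
    """Group the flows involving a known scanner into one model."""
    model = ScanModel(scanner)
    for f in flows:
        if f["sip"] == scanner:
            model.targets.add(f["dip"])
        elif f["dip"] == scanner:
            model.responders.add(f["sip"])
    # Only hosts that were actually scanned count as responders.
    model.responders &= model.targets
    return model


flows = [
    {"sip": "192.0.2.1", "dip": "198.51.100.1"},
    {"sip": "192.0.2.1", "dip": "198.51.100.2"},
    {"sip": "192.0.2.1", "dip": "198.51.100.3"},
    {"sip": "198.51.100.2", "dip": "192.0.2.1"},
]
scan = build_scan_model("192.0.2.1", flows)
```

"Who did the scan hit?" is then `scan.targets`, and `scan.responders` answers the follow-up question of which systems replied.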

slide-17
SLIDE 17

A Really Ugly UI

slide-18
SLIDE 18

What that is

  • Certainly not a testament to my visualization skills
  • Prototype using two systems

– Simple scan detection
– BitTorrent detection

  • The black is what’s left
slide-19
SLIDE 19

Problems

  • As I said, this is all very rough right now
  • Problems remaining:

– Application/Knowledge Layering
– Model Taxonomy
– User experience
– Backtracking
– Metadata

slide-20
SLIDE 20

Application Layering

  • Make judgments at different levels of the stack
  • Different inferential resolution:

– Does this IP exist?
– Does this IP communicate?
– What does this IP communicate?
– Is this IP significant in its network?

slide-21
SLIDE 21

Model Taxonomy

  • Models replace flows with more compact descriptions of phenomena

– E.g., “A Scan” is a list of the scanned IPs, and anything that responded

  • Trying to begin with broad behavioral descriptions and move down from there

[Taxonomy diagram labels: Flood, Service, Routing, Scanning, Backscatter, Worm, DDoS, Xfer, Chatty]

slide-22
SLIDE 22

Unsolved Problems

  • Weirdometer metrics

– Flows/IPs/bytes/IP pairs?

  • Backtracking

– How much do we want to see flow vs. model vs. map?

  • Response Mechanism

– What can a CSIRT do?

  • Meta metrics

– How much of the traffic do we understand?