Integrating Human and Synthetic Reasoning Via Model-Based Analysis

May 29, 2023

SLIDE 1

Integrating Human and Synthetic Reasoning Via Model-Based Analysis

SLIDE 2

Introduction and Explanation

This is an experimental idea and very rough

– Glue together very tame AI and user interface through some fault trees

To capture knowledge
Improve efficiency
Overview of work
(My) Questions!

SLIDE 3

If you haven’t seen this slide, you haven’t attended any of my talks

SLIDE 4

How much do we know about network traffic?

[Figure: two time-series panels covering d06h00 to d07h00. Top: Activity (GB/5min) against Time. Bottom: Activity (Flows/5min) against Time. Legend: All TCP activity, Identifiable File Transfers, Identifiable Control, Scanning.]

SLIDE 5

Basic problem

We don’t know what we know, and we don’t know what we don’t know
Most valuable resource available is analyst head time
Lots of repetitive mindless attacks
Lots of low-risk, high-threat attacks
Have to automate
Also have to ensure automation isn’t self defeating

SLIDE 6

A Metric For Knowledge

Every day we receive k-billion flows

– We can understand and accurately tag x% of them
– The closer x gets to 100%, the better

We improve x by:

– Hiring more analysts
– Reducing traffic into the network
– Automating the process
– Describing multiple flows at one time
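
The metric above can be sketched in a few lines. This is only an illustration, assuming a flow is a dictionary and a "tagger" is any function that returns a label or `None`; the field names and both port-based taggers are hypothetical and not part of the prototype.

```python
def tagged_fraction(flows, taggers):
    """Return x: the percentage of flows that at least one tagger can label."""
    if not flows:
        return 0.0
    tagged = sum(
        1 for flow in flows
        if any(tagger(flow) is not None for tagger in taggers)
    )
    return 100.0 * tagged / len(flows)


# Hypothetical flow records; only the fields used below are present.
flows = [
    {"dport": 80, "bytes": 1200},
    {"dport": 6881, "bytes": 900000},
    {"dport": 4444, "bytes": 60},
    {"dport": 80, "bytes": 300},
]


def tag_web(flow):
    # Port-only tagging is used here purely to keep the example short.
    return "web" if flow["dport"] == 80 else None


def tag_bittorrent(flow):
    return "bittorrent" if flow["dport"] == 6881 else None


x = tagged_fraction(flows, [tag_web, tag_bittorrent])
print(f"x = {x:.0f}%")
```

Adding a tagger that recognizes a new class of traffic raises x without adding analysts, which is the automation route in the list above.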

SLIDE 7

Prototype System Diagram

[Diagram labels: Flow, DPI, AI (three instances), Models, Admission Of Ignorance, Maps, Reports, Alerts, Historical Maps, Conflict]

SLIDE 8

What is an AI?

An AI is a system that reads in network data and outputs:

– A domain
– Some models
– Alerts
– Inventory data

[Diagram labels: AI, DPI, Maps, Scans, Important Scans, Systems Scanned]

SLIDE 9

Complementary AIs

Accurate
Predictable
Unambiguous human/machine communication
Humans serve as the final judge
Don’t overwhelm with trivia

SLIDE 10

Accuracy

Control ambiguity

– ROC curves provide us with a measure of accuracy
– But we’ve generally been unsure about what TP to use

AIs will not guess in order to avoid a bad guess

[Figure: ROC curves, True positive rate (percentage) against False positive rate (percentage). Two curves are labeled "=2" and "=6"; the parameter symbol was lost in extraction.]
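
One way to read "AIs will not guess" is as a classifier with an explicit abstain option. The sketch below assumes a detector that emits a score between 0 and 1; the function name, the labels, and both thresholds are hypothetical and would in practice be chosen from the ROC curve.

```python
from typing import Optional


def classify_or_abstain(score: float, low: float = 0.2, high: float = 0.8) -> Optional[str]:
    """Label a flow only when the score is clearly on one side.

    Returns None (an admission of ignorance) for scores in the
    ambiguous band between `low` and `high`.
    """
    if score >= high:
        return "scan"
    if score <= low:
        return "not-scan"
    return None


for score in (0.95, 0.5, 0.05):
    print(score, classify_or_abstain(score))
```

Anything that comes back as `None` stays untagged and is left for another AI or for the analyst.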

SLIDE 11

Predictable

There isn’t much we can do…

– Reports: periodic and predictable information on the state of the system (e.g., scanning)
– Alerts: When an actionable event occurs, a notice of the event and a recommended strategy (alter fw rules, take down machine, send people with guns)
– Internal Intelligence: maps of the inside of the network
– External Intelligence: maps of the outside world

Inventory is central

SLIDE 12

Conflict Resolution

We know that something is something

– By fiat (“It’s my webserver”)
– By published reference (port 80 is http)
– Deep packet inspection (HTTP/1.0…)
– Behaviorally (short requests, big transfers)

Hierarchy of certainty

– DPI >> Fiat >> Behavioral >> Published reference
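
The hierarchy can be sketched as an ordered lookup: take the label from the most certain source that has an opinion. The dictionary layout and the function name are assumptions made for illustration.

```python
# Sources in descending order of certainty, as listed above.
CERTAINTY_ORDER = ["dpi", "fiat", "behavioral", "published"]


def most_certain_label(evidence):
    """Return (source, label) from the highest-ranked source that has a label.

    `evidence` maps a source name to a label, or to None when that
    source has nothing to say. Returns None if no source has a label.
    """
    for source in CERTAINTY_ORDER:
        label = evidence.get(source)
        if label is not None:
            return source, label
    return None


# Port 80 suggests http, but the behavior looks like BitTorrent.
print(most_certain_label({"published": "http", "behavioral": "bittorrent"}))
```

This picks a winner but hides the disagreement between the sources, so it does not replace raising an alert on the conflict.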

SLIDE 13

Managing Conflict

DPI   Fiat   Behavioral   Published   Result
A     -      -            -           Map as A, alert on lack of published info
A     !A     X            X           Map as A, alert on conflict
A     A      X            X           Map as A
A     -      -            !A          Map as A, alert on masquerade
-     A      -            A           Map as A
-     A      !A           A           Report anomaly, Map as A
-     A      -            -           Map as A
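
The table can be encoded directly as a rule list. In this sketch "A" means the source says A, "!A" means it says something else, "-" is read as "no information", and "X" is read as "don't care"; the last two readings are assumptions, since the slide gives no legend. The function name is hypothetical.

```python
# One tuple per table row: (dpi, fiat, behavioral, published, result).
RULES = [
    ("A", "-",  "-",  "-",  "Map as A, alert on lack of published info"),
    ("A", "!A", "X",  "X",  "Map as A, alert on conflict"),
    ("A", "A",  "X",  "X",  "Map as A"),
    ("A", "-",  "-",  "!A", "Map as A, alert on masquerade"),
    ("-", "A",  "-",  "A",  "Map as A"),
    ("-", "A",  "!A", "A",  "Report anomaly, Map as A"),
    ("-", "A",  "-",  "-",  "Map as A"),
]


def resolve(dpi, fiat, behavioral, published):
    """Return the action for the first matching row, or None if no row matches."""
    observed = (dpi, fiat, behavioral, published)
    for *pattern, result in RULES:
        # "X" in a rule matches any observed value (assumed wildcard).
        if all(p == "X" or p == o for p, o in zip(pattern, observed)):
            return result
    return None


print(resolve("A", "!A", "-", "A"))
print(resolve("A", "-", "-", "!A"))
```

Combinations the table does not list return `None` here; how the prototype handles them is not stated in the slides.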

SLIDE 14

Human/Machine Communication

AIs don’t raise alerts on normal behavior

– Reports are for that

AIs raise alerts on actionable anomalies

– Provide diagnostics, inventory and history

AIs raise alerts on conflicts

– Rely on the user to resolve the conflict and move on

SLIDE 15

User Controls

Everyone controls domains: sip, dip, sport, dport, time and protocol value

– Domains have wildcards

Agents mark or subscribe to domains:

– Mark: this happened in the past and I can infer what happened

For AIs, Mark indicates “I recognize this”

– Subscribe: I will control and worry about this from now on

For users, subscription says “This is my territory”
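
A domain with wildcards, plus the mark and subscribe operations, might look like the sketch below. It assumes `None` stands for a wildcard and leaves out the time field to stay short; the class names, method names, and sample addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """A region of flow space; a field left as None is a wildcard."""
    sip: Optional[str] = None
    dip: Optional[str] = None
    sport: Optional[int] = None
    dport: Optional[int] = None
    protocol: Optional[int] = None  # time window omitted in this sketch

    def matches(self, flow: dict) -> bool:
        for name in ("sip", "dip", "sport", "dport", "protocol"):
            wanted = getattr(self, name)
            if wanted is not None and flow.get(name) != wanted:
                return False
        return True


class Registry:
    """Tracks which agent marked or subscribed to which domain."""

    def __init__(self):
        self.marks = []          # (agent, domain): "I recognize this"
        self.subscriptions = []  # (agent, domain): "This is my territory"

    def mark(self, agent, domain):
        self.marks.append((agent, domain))

    def subscribe(self, agent, domain):
        self.subscriptions.append((agent, domain))

    def recognized_by(self, flow):
        return [agent for agent, domain in self.marks if domain.matches(flow)]

    def owners(self, flow):
        return [agent for agent, domain in self.subscriptions if domain.matches(flow)]


registry = Registry()
registry.subscribe("alice", Domain(dip="10.0.0.5", dport=80))
registry.mark("scan-ai", Domain(sip="192.0.2.7"))

flow = {"sip": "192.0.2.7", "dip": "10.0.0.5", "sport": 40000, "dport": 80, "protocol": 6}
print(registry.owners(flow), registry.recognized_by(flow))
```

A flow that is both owned and marked, as in this example, is exactly the case where an alert to the subscriber could be routed.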

SLIDE 16

Models

AIs don’t output flow data

– They mark off some segment of flows and group them together as a separate structure

For example:

– A “scan”
– A “BitTorrent Network”
– A “Surfing session”

These models, in turn, have questions and structures that are more relevant to analysis

– Who did the scan hit?
– How much traffic was transferred in BT?
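
As a rough illustration of a model replacing raw flows, the sketch below folds flows into a scan structure that can answer "who did the scan hit?". It assumes the scanner's address is already known and treats any flow sent back to the scanner as a response; the class, the function, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScanModel:
    """Compact description of one scan, replacing its individual flows."""
    scanner: str
    targets: set = field(default_factory=set)     # who the scan hit
    responders: set = field(default_factory=set)  # who answered
    flow_count: int = 0                           # flows absorbed by the model


def build_scan_model(scanner, flows):
    model = ScanModel(scanner)
    for flow in flows:
        if flow["sip"] == scanner:
            model.targets.add(flow["dip"])
            model.flow_count += 1
        elif flow["dip"] == scanner:
            # Assumption: any traffic back to the scanner counts as a response.
            model.responders.add(flow["sip"])
            model.flow_count += 1
    return model


flows = [
    {"sip": "192.0.2.7", "dip": "10.0.0.1"},
    {"sip": "192.0.2.7", "dip": "10.0.0.2"},
    {"sip": "192.0.2.7", "dip": "10.0.0.3"},
    {"sip": "10.0.0.2", "dip": "192.0.2.7"},
    {"sip": "10.0.0.9", "dip": "10.0.0.1"},  # unrelated flow, left unmarked
]

model = build_scan_model("192.0.2.7", flows)
print(sorted(model.targets), sorted(model.responders), model.flow_count)
```

Four of the five flows collapse into one structure, and the remaining flow stays available for another AI or the analyst.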

SLIDE 17

A Really Ugly UI

SLIDE 18

What that is

Certainly not a testament to my visualization skills

Prototype using two systems

– Simple scan detection
– BitTorrent detection

The black is what’s left

SLIDE 19

Problems

As I said, this is all very rough right now
Problems remaining:

– Application/Knowledge Layering
– Model Taxonomy
– User experience
– Backtracking
– Metadata

SLIDE 20

Application Layering

Make judgments at different levels of the stack
Different inferential resolution:

– Does this IP exist?
– Does this IP communicate?
– What does this IP communicate?
– Is this IP significant in its network?

SLIDE 21

Model Taxonomy

Models replace flows with more compact descriptions of phenomena

– E.g., “A Scan” is a list of the scanned IPs, and anything that responded

Trying to begin with broad behavioral descriptions and move down from there

[Taxonomy diagram labels: Flood, Service, Routing, Scanning, Backscatter, Worm, DDoS, Xfer, Chatty]

SLIDE 22

Unsolved Problems

Weirdometer metrics

– Flows/IPs/bytes/IP pairs?

Backtracking

– How much do we want to see flow vs. model vs. map?

Response Mechanism

– What can a CSIRT do?

Meta metrics

– How much of the traffic do we understand?