

SLIDE 1

Semantic Flow Augmentation for the Automated Discovery of Organizational Relationships

Chris Strasburg*, Harris T Lin, Nikolas Kinkel The Ames Laboratory {cstras,htlin,nskinkel}@ameslab.gov * - Presenting

SLIDE 2

Relationship Discovery – Why does it matter?

  • What is the impact of disrupting communication associated with flow set ‘F’?
SLIDE 3

Relationship Discovery – Why does it matter?

  • What is the impact of disrupting communication associated with flow set ‘F’?
SLIDE 4

Relationship Discovery – Why does it matter?

  • Which alarms are most critical to manually investigate?
SLIDE 5

What is Semantic Flow Augmentation?

SLIDE 6

What is Semantic Flow Augmentation?

SLIDE 7

What is Semantic Flow Augmentation?

  • Semantic – Of or relating to meaning…
SLIDE 8

Why Semantic Augmentation

SLIDE 9

Why Semantic Augmentation

Is it mission related?

SLIDE 10

Statistical Features

  • Flow Statistics
    – # of Flows
    – # of Bytes
    – Peer count
  • Timeseries Analysis
    – First seen
    – Last seen
    – Fourier Transform Coefficient
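As a rough sketch, the per-IP statistical features above can be computed from a list of flow records. The record field names (`src`, `dst`, `bytes`, `ts`) and the hourly-binned DFT are illustrative assumptions, not the deck's actual SiLK pipeline.

```python
import cmath

def flow_features(flows, ip):
    """Aggregate simple per-IP NetFlow statistics: # of flows, # of bytes,
    peer count, first/last seen, and a daily-periodicity Fourier term."""
    mine = [f for f in flows if ip in (f["src"], f["dst"])]
    peers = {f["dst"] if f["src"] == ip else f["src"] for f in mine}
    times = [f["ts"] for f in mine]
    return {
        "n_flows": len(mine),
        "n_bytes": sum(f["bytes"] for f in mine),
        "peer_count": len(peers),
        "first_seen": min(times),
        "last_seen": max(times),
        "fourier_daily": daily_fourier(times),
    }

def daily_fourier(timestamps, k=1):
    """Magnitude of the k-th DFT coefficient over a 24-hour activity
    histogram; large values indicate strong daily periodicity."""
    bins = [0] * 24
    for ts in timestamps:                 # ts: seconds since epoch
        bins[int(ts // 3600) % 24] += 1
    coeff = sum(b * cmath.exp(-2j * cmath.pi * k * h / 24)
                for h, b in enumerate(bins))
    return abs(coeff)
```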

SLIDE 11

Semantic Features

  • Lexical Analysis (Mallet)
    – Cluster according to web page contents from:
      • Reverse DNS Lookups
      • WHOIS Org Searches
      • Session Metadata – Requested URLs
  • Service Distribution
    – Interactive / Authenticated (SSH, IMAP, POP)
    – Interactive / Non-Authenticated (SMTP, HTTP/S)
    – Non-Interactive (NTP, DNS)
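The service-distribution feature amounts to bucketing observed destination ports into the three interactivity classes above. A minimal sketch, using standard IANA port numbers and hypothetical class names:

```python
# Port-to-class table for the three interactivity classes named on the
# slide; ports are standard IANA assignments.
SERVICE_CLASS = {
    22:  "interactive_auth",    # SSH
    143: "interactive_auth",    # IMAP
    110: "interactive_auth",    # POP
    25:  "interactive_noauth",  # SMTP
    80:  "interactive_noauth",  # HTTP
    443: "interactive_noauth",  # HTTPS
    123: "non_interactive",     # NTP
    53:  "non_interactive",     # DNS
}

def service_distribution(ports):
    """Fraction of a host's flows falling in each interactivity class."""
    counts = {"interactive_auth": 0, "interactive_noauth": 0,
              "non_interactive": 0, "other": 0}
    for p in ports:
        counts[SERVICE_CLASS.get(p, "other")] += 1
    total = max(len(ports), 1)
    return {k: v / total for k, v in counts.items()}
```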

SLIDE 12

Semantic Features (2)

  • Bi-clique Grouping
    – Red = Internal
    – Green = External
    – Edges pruned
    – LP & BRIM Algorithm**

*Gephi http://gephi.org/
**Liu, Xin, and Tsuyoshi Murata. "Community detection in large-scale bipartite networks." Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT'09. IEEE/WIC/ACM International Joint Conferences on. Vol. 1. IET, 2009.
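The deck cites the LP&BRIM algorithm of Liu & Murata for grouping the internal/external bipartite graph. As a rough illustration of only the label-propagation half, simplified, a plain LP pass over the edge list might look like:

```python
# Simplified label propagation on a bipartite graph: each node repeatedly
# adopts the majority label among its neighbors until no label changes.
# This is an illustrative sketch, not the full LP&BRIM method.
from collections import Counter

def label_propagation(edges, rounds=10):
    """edges: list of (internal_ip, external_ip) pairs."""
    nodes = {n for e in edges for n in e}
    label = {n: n for n in nodes}          # start with unique labels
    neigh = {n: [] for n in nodes}
    for a, b in edges:
        neigh[a].append(b)
        neigh[b].append(a)
    for _ in range(rounds):
        changed = False
        for n in sorted(nodes):            # deterministic update order
            best = Counter(label[m] for m in neigh[n]).most_common(1)[0][0]
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:
            break
    return label
```

Nodes sharing a label after convergence form one candidate community; BRIM would then refine these groups by maximizing bipartite modularity.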
SLIDE 13

Architecture Overview

SLIDE 14

How to Label / Train

Anecdotal human process – time consuming!

SLIDE 15

Kick Start Labeling

[Diagram: iterative kick-start labeling loop]
  • Feature labels → initial rank over all IPs → assign labels
  • Train classifier → new rank → assign labels (Iteration 1)
  • Train classifier → new rank (Iteration 2)
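The kick-start loop above (rank, label the top IPs, train, re-rank) can be sketched as follows; `initial_score`, `train`, and `ask_analyst` are hypothetical stand-ins for the deck's actual components.

```python
# Illustrative sketch of the kick-start labeling loop: the analyst labels
# the top of the current ranking, a classifier is retrained on all labels
# gathered so far, and the remaining IPs are re-ranked for the next pass.
def kickstart(ips, initial_score, train, ask_analyst,
              iterations=2, batch=10):
    labels = {}
    rank = sorted(ips, key=initial_score, reverse=True)
    for _ in range(iterations):
        for ip in rank[:batch]:               # analyst labels top IPs
            if ip not in labels:
                labels[ip] = ask_analyst(ip)
        clf = train(labels)                   # retrain on labels so far
        rank = sorted((ip for ip in ips if ip not in labels),
                      key=clf, reverse=True)  # new rank from classifier
    return labels, rank
```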

SLIDE 16

Anecdotal Validation – Ames Data

  • Gathering Data
    – One month of NetFlow data at Ames Lab
  • Preprocessing
    – 4 sets of features: simple NetFlow statistics, time series features, lexical analysis features (document topic distributions), biclique community features
  • Labeling
    – 4242 IPs (801 white / 3441 black)
  • Testing / verifying classifier
    – Weka (Logistic Regression, SVM, Bayesian Network, Decision Tree)
    – 10-fold cross-validation
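For reference, the 10-fold cross-validation protocol amounts to the following split scheme (Weka's own stratified splitter was presumably used; this is a plain, unstratified sketch):

```python
# Minimal k-fold cross-validation: each item appears in exactly one test
# fold, and the model trains on the remaining k-1 folds.
def kfold(items, k=10):
    """Yield (train, test) splits covering each item exactly once as test."""
    for i in range(k):
        test = items[i::k]                    # every k-th item, offset i
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test
```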

SLIDE 17

Performance Results

[Bar charts, 10–100 scale: Precision, Recall, and AUC for Logistic Regression vs. Decision Tree (C4.5), and by feature set (Lexical; CC, Service, Biclique; Netflow)]

SLIDE 18

Info Gain by Features

[Bar chart, 0.05–0.2 scale: information gain per feature, in chart order – Fourier Daily, Fourier Weekly, Community Size, Peer Count, Earliest Starttime, Access Days, Service, Workhour Ratio, Access Hours, Latest Endtime, Community Ext/Int Size, Community Focus, Total Source Port, Total Dest Port, Total Records, Total Bytes, Lexical Topic Conf, Country Code, Lexical Topic]
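The ranking behind this chart is the standard information-gain score: entropy of the white/black label minus the label entropy conditioned on a (discretized) feature. A minimal sketch:

```python
# Information gain of a discrete feature with respect to a label:
# IG = H(label) - H(label | feature).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    pairs = list(zip(feature_values, labels))
    n = len(pairs)
    cond = 0.0
    for v in set(feature_values):          # weight entropy per value
        subset = [lab for fv, lab in pairs if fv == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond
```

A perfectly predictive feature scores the full label entropy; an uninformative one scores zero.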

SLIDE 19

[Decision tree: root split on Lexical = Science?; further splits on Lexical Conf, Country = US?, Service = ssh?, Total Bytes, Service = pop/imap?, and Lexical = Reference?]

SLIDE 20

Implementation at Ames Laboratory

SLIDE 21

Challenges / Future Work

  • Majority of IPs don’t have a web page
    – Automated query for WHOIS Organization
    – Use of AMP data; actual HTTP resources

  • Speed / Streaming

– Slow to gather features; currently batched daily.

  • Searching

– Search engines w/ free API (Faroo?)

  • Production ‘burn-in’

– Feedback from analysts into a growing set of labels

  • Integration with other systems

– BroIDS Module?

  • Mining of graphical data

– Second derivative clusters (clusters of clusters)

– Internal resource categorization

SLIDE 22

Summary

  • Flow provides ‘how much’; a bit of semantics is required for mission relevance.

  • Public tools:
    – SiLK – Flow Statistics
    – Crawler4J + Mallet – Lexical Analysis
    – Weka – Machine Learning SAK
    – Apache Commons Math – (Timeseries transforms)
    – A sprinkle of Java and a dash of Python