Correlating Low-Level Events To Identify High-Level Bot Behaviors

Liz Stinson, Matt Fredrikson, Somesh Jha, John Mitchell (University of Wisconsin; Stanford University)
Lorenzo Martignoni (University of Milan)

Our anti-inspirations
• “Personal firewalls”: identify when an app is connecting to the network
  – Too ambiguous
• Host-level methods that inundate us with information (all registry accesses/changes, file accesses/changes) without providing a higher-level assessment of what’s going on
  – Too noisy; devoid of meaning

Problem Statement
• >5M “distinct, active” bot-infected machines detected between January and June 2007
  – “active”: carried out at least one attack
  – Source: Symantec Threat Report, Volume XII
• The *best* anti-virus signature scanners fail to detect anywhere from 30% to 50% of malware samples seen in the wild
  – NB: The best AV scanners may not be who you think they are…

Problematic Asymmetry
• work(create_sig) >> work(create_variant)
  – Malware writers know they have the advantage here and they exploit it.
• AV companies decide which undetected malware to create sigs for using triage; a sample must exceed some prevalence threshold
  – Tens of thousands of novel malware variants created annually

Existing behavior-based detection
• Identify simple, mostly stateless “features” (process execution characteristics); e.g.
  – Which dir(s) does app live in? write to? App = shadow?
  – App survives reboot? Spawns/terminates other processes? Is orphan? Hides? Its image has changed?
  – These are traits malware have adapted to evade AV detection
  – May identify incidental, rather than fundamental, behaviors
  – For ML-based approaches, there may be other ways to achieve the same end (i.e. ways not included in the model)
• Statefully scan network packet contents
• More general characterizations
  – Abstract: spyware monitors/reports user actions
  – Concrete: rootkits that load kernel modules

Broad spectrum. How to evaluate?
• How effectively does this method distinguish malicious behavior from benign?
• How thoroughly is target behavior captured?
• How complex is the identified behavior?
• How fundamental is the behavior to the malware’s purpose?

Goals
• We want to identify high-level behaviors
  – “downloading and executing a program”
  – “acting like a TCP server”
  – “acting like a proxy”
  – “leaking sensitive data”
• Bot-command-level actions
• Via monitoring process execution
• Distinguish malicious from benign instances of above by identifying if remotely initiated

Sample bot commands
  – http.execute <URL> <local_path>
  – harvest.registry <reg_key>
  – redirect <lport> <rhost> <rport>
  – startkeylogger

Example: Acting like a proxy
[Figure: proxy diagram; both edges are labeled “tcp connection”]

Identifies ordering dependencies
• Not shown here:
  – edge constraints
  – die operations
  – socket duplication
  – intervening irrelevant ops

Including parameters and constraints
• Constraints can be pre-conditions or post-conditions
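The combination of ordering dependencies and per-operation constraints can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `Event` class, the `PROXY_SPEC` layout, the `matches` function, the choice to list `tcp_server` before `tcp_client`, and the greedy matching strategy are all hypothetical. Pre-conditions are checked against parameters recorded by earlier steps, and unrelated operations in between are skipped.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    op: str                                   # primitive name, e.g. "net_recv"
    args: dict = field(default_factory=dict)  # parameters, e.g. {"sd": 1}

# Hypothetical specification: ordered steps, each with a pre-condition that
# may refer to the parameters bound by earlier steps.
PROXY_SPEC = [
    ("tcp_server", lambda ev, b: True),
    ("tcp_client", lambda ev, b: True),
    ("net_recv", lambda ev, b: ev.args["sd"] in
        (b["tcp_server"]["sd"], b["tcp_client"]["sd"])),
    ("net_send", lambda ev, b:
        ev.args["sd"] in (b["tcp_server"]["sd"], b["tcp_client"]["sd"])
        and ev.args["sd"] != b["net_recv"]["sd"]
        and ev.args["buf"] == b["net_recv"]["buf"]),
]

def matches(spec, trace):
    """Greedily match the ordered steps of spec against an event trace."""
    bindings = {}
    step = 0
    for ev in trace:
        if step == len(spec):
            break
        op, pre = spec[step]
        # Events that are not the next expected step are ignored
        # (intervening irrelevant ops).
        if ev.op == op and pre(ev, bindings):
            bindings[op] = ev.args  # post-condition: record the parameters
            step += 1
    return step == len(spec)

proxy_trace = [
    Event("tcp_server", {"sd": 1}),
    Event("file_read", {"path": "unrelated"}),
    Event("tcp_client", {"sd": 2}),
    Event("net_recv", {"sd": 1, "buf": b"payload"}),
    Event("net_send", {"sd": 2, "buf": b"payload"}),
]
other_trace = proxy_trace[:4] + [Event("net_send", {"sd": 2, "buf": b"different"})]
```

Here `matches(PROXY_SPEC, proxy_trace)` succeeds, while `other_trace` fails the buffer-equality constraint on the final step.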

tcp_client

Refining
• We’ll focus on this

(send_buf == recv_buf)
• Too constrained; really want to express: the buffer that is sent is derived from a buffer that is received
• Augment (add action to): on_match of net_recv
  – set_tainted( recv_buf, sd2 /*taint label*/ )
• Change condition to:
  – tainted( send_buf, sd2 /*taint label*/ )
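The difference between the equality test and the taint test can be sketched as follows. This is a toy model under stated assumptions: buffers are identified by name, `TaintTracker` and its `propagate` method are hypothetical helpers introduced here, and only `set_tainted` and `tainted` mirror names from the slide. A real system would track taint through the instructions that copy or transform data.

```python
from collections import defaultdict

class TaintTracker:
    """Toy taint store: maps a buffer identifier to a set of taint labels."""

    def __init__(self):
        self._labels = defaultdict(set)

    def set_tainted(self, buf, label):
        # Action run on_match of net_recv: mark the received buffer.
        self._labels[buf].add(label)

    def propagate(self, src, dst):
        # Hypothetical hook: dst was derived from src (copy, decode, reformat).
        self._labels[dst] |= self._labels.get(src, set())

    def tainted(self, buf, label):
        # Condition checked on net_send.
        return label in self._labels.get(buf, set())

tracker = TaintTracker()
recv_bytes = b"GET /index"
send_bytes = recv_bytes.upper()            # transformed, so no longer equal

tracker.set_tainted("recv_buf", "sd2")     # on_match of net_recv
tracker.propagate("recv_buf", "send_buf")  # send_buf is derived from recv_buf

equal_check = send_bytes == recv_bytes              # the overly strict condition
taint_check = tracker.tainted("send_buf", "sd2")    # the relaxed condition
```

In this sketch the equality check fails because the data was transformed before being sent, while the taint check still records that `send_buf` derives from what was received.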

Modified graph

Add constraints
• .redirect <loc_port> <rem_host> <rem_port>

“Language” our system exports
• Set of high-level primitives that can be combined to describe interesting behaviors
  – tcp_client, tcp_server, net_send, net_recv, create_exec_file, …
• Using these, we can detect:
  – Leak private data (reg key values, file contents, system info, …)
  – Download and execute a program
  – Send email
  – Proxy
  – Keystroke logging
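One way to picture how a few primitives combine into several behavior descriptions is the sketch below. It is illustrative only: the `BEHAVIORS` table, the particular primitive sequences assigned to each behavior, and the `detect` function are assumptions, and the sketch checks ordering alone, without the parameters, constraints, or taint conditions the slides describe.

```python
# Hypothetical behavior table: each behavior is an ordered list of primitives.
BEHAVIORS = {
    "download_and_execute": ["tcp_client", "net_recv", "create_exec_file"],
    "proxy": ["tcp_server", "tcp_client", "net_recv", "net_send"],
}

def is_subsequence(needle, trace):
    """True if the ops in needle occur in trace in order, gaps allowed."""
    it = iter(trace)
    # `op in it` advances the iterator until it finds op, so order is enforced.
    return all(op in it for op in needle)

def detect(trace):
    """Return the names of all behaviors whose primitives appear in order."""
    return sorted(name for name, ops in BEHAVIORS.items()
                  if is_subsequence(ops, trace))

download_trace = ["tcp_client", "net_recv", "reg_read", "create_exec_file"]
proxy_trace = ["tcp_server", "tcp_client", "net_recv", "net_send"]
```

Because the behaviors share primitives such as `tcp_client` and `net_recv`, a single stream of observed primitives can be tested against every description at once.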

Challenges
• Posed by proprietary-OS environment
  – Opacity; identifying operations & constraints
  – Replicating OS semantics
• Posed by syscall interposition generally
• Posed by hypothetical attempts to evade
  – Split behavior across processes or across runs of the same application
  – Expropriate kernel functionality, e.g. raw sockets

Summary
• Target the behaviors that make bots useful
• Identify the essential ops in those behaviors
• Use data-flow analysis info variously
• Good initial results against bots
  – Including: rbot, agobot, dsnxbot, spybot, ...
  – Use bot commands as inspiration
  – Resilient to encryption of bot communications
• Good initial results against benign progs
  – When testing against specifications that encode remote-control requirement
  – Performing user-input tracking
