BotGraph: Large Scale Spamming Botnet Detection

BotGraph: Large Scale Spamming Botnet Detection
Yao Zhao, Yinglian Xie*, Fang Yu*, Qifa Ke*, Yuan Yu*, Yan Chen and Eliot Gillum‡
EECS Department, Northwestern University
* Microsoft Research Silicon Valley
‡ Microsoft Corporation

Web-Account Abuse Attack
[Figure: attack diagram. Labels: spammer's server, zombie (compromised host), user/password, CAPTCHA solver, sample CAPTCHA "RDSXXTD3".]

Problems and Challenges
• Detect web-account abuse with Hotmail logs
  – Input: user activity traces (signup, login, email-sending records)
  – Goal: stop aggressive account signup, limit outgoing spam
• Algorithmic challenge:
  – Attack is stealthy: individual account detection is difficult
  – Attack is large scale: finding correlated activities
  – Low false positive and false negative rate
• Engineering challenge:
  – Large user population: >500 million accounts
  – Large data volume: 300GB-400GB data per month

The BotGraph System
• A graph-based approach to attack detection
  – A large user-user graph to capture bot-account correlations
  – Identify 26M bot-accounts with a low false positive rate in two months
• Efficient implementation using Dryad/DryadLINQ
  – Graph construction/analysis is not easily parallelizable
  – Hundreds of millions of nodes, hundreds of billions of edges
  – Process 200GB-300GB data in 1.5 hours with a 240-machine cluster
The first to provide a systematic solution to the new attack

System Architecture
[Figure: pipeline diagram. Inputs: signup data (ID, IP, time), login data (ID, IP, time), sendmail data (ID, time, # of recipients).]
1. History-based algorithm to detect aggressive signups
   Signup data → EWMA-based change detection → aggressive signups → verification & prune → signup botnets
2. Graph-based algorithm to find correlations
   Login data → graph generation → login graph → random-graph-based clustering → suspicious clusters → verification & prune → spamming botnets
3. Parallel algorithm on DryadLINQ clusters

Detect Aggressive Signups
[Figure: number of signup accounts per day, 1-Jul to 9-Jul, plotting the signup count against the EWMA prediction. Annotations: "Large prediction error", "Back to normal".]
• Simple and efficient
• Detect 20 million malicious accounts in 2 months
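
The EWMA-based change detection on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the smoothing factor `alpha`, the `threshold`, and the choice to skip anomalous points when updating the prediction are all assumptions, since the slides do not state them.

```python
def ewma_anomalies(counts, alpha=0.5, threshold=10.0):
    """Return the indices where the signup count exceeds the EWMA
    prediction by more than `threshold`.

    `alpha` and `threshold` are hypothetical values chosen for the example.
    """
    anomalies = []
    prediction = counts[0]  # seed the prediction with the first observation
    for t in range(1, len(counts)):
        error = counts[t] - prediction
        if error > threshold:
            # Large prediction error: flag the point. Assumption: the
            # prediction is not updated, so a burst does not raise the baseline.
            anomalies.append(t)
        else:
            prediction = alpha * counts[t] + (1 - alpha) * prediction
    return anomalies


# Daily signup counts for one IP (made-up numbers shaped like the plot).
daily_signups = [2, 3, 2, 3, 22, 20, 3, 2, 3]
flagged_days = ewma_anomalies(daily_signups)
```

Freezing the prediction during a burst is one possible design; it lets the detector report "back to normal" as soon as the counts drop again.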

System Architecture
[Figure: same pipeline diagram as on the first System Architecture slide.]
1. History-based algorithm on signup detection
2. Graph-based algorithm on login detection
3. Parallel algorithm on DryadLINQ clusters

Detect Stealthy Accounts by Graphs
• Observation: bot-accounts work collaboratively
A user-user graph to model behavior similarities
• Normal users
  – Share IP addresses in one AS with DHCP assignment
• Bot-users
  – Likely to share different IPs across ASes

User-user Graph
• Node: Hotmail account
• Edge weight: # of ASes of the shared IP addresses
  – Consider edges with weight > 1
• Key observations
  – Bot-users form a giant connected component while normal users do not
  – Interpreted by random graph theory
[Figure: example graph of User1 to User6 with edge weights ranging from 1 AS to 5 ASes.]

Random Graph Theory
• Random graph G(n, p)
  – n nodes; each pair of nodes has an edge with probability p; average degree d = (n-1)·p
• Theorem
  – If d < 1, then with high probability the largest component in the graph has size less than O(log n)
    No large connected subgraph
  – If d > 1, then with high probability the graph contains a giant component with size on the order of O(n)
    Most nodes are in one connected subgraph
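
The threshold behavior in the theorem can be checked empirically with a small simulation. The sketch below samples G(n, p) with p = d/(n-1) and measures the largest component using union-find; the function name, the seed, and the sample sizes are assumptions made for illustration.

```python
import random


def largest_component_size(n, d, seed=0):
    """Sample G(n, p) with average degree d and return the size of its
    largest connected component."""
    rng = random.Random(seed)
    p = d / (n - 1)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # edge (u, v) exists with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru  # union by size
                    size[ru] += size[rv]
    return max(size[find(x)] for x in range(n))


n = 1000
subcritical = largest_component_size(n, d=0.5)   # expected to stay small
supercritical = largest_component_size(n, d=2.0)  # expected to span most nodes
```

With d below 1 the largest component should be a small fraction of n, while with d above 1 a single component should contain a constant fraction of all nodes.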

Graph-based Bot-user Detection
• Step 1: detect giant connected components from the user-user graph
• Step 2: hierarchical algorithm to identify the correct groupings
  – Different bot-user groups may be mixed
  – Difficult to choose a fixed edge threshold
  – Easier validation with correct group statistics
• Step 3: prune normal-user groups
  – Due to national proxies, cell phone users, Facebook applications, etc.

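Hierarchical Bot-Group Extraction
[Figure: hierarchy of components extracted from graph G as the edge-weight threshold T rises from 2 to 4. Labels: components A to E; 1st, 2nd, and 3rd group.]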
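
One way to picture the hierarchical extraction is as a recursive split: find connected components at threshold T, then re-split each component at T+1. The sketch below builds such a tree. It is an assumption-laden illustration, not the authors' algorithm: the names, the `max_threshold` and `min_size` parameters, and the tree representation are hypothetical, and the later validation and pruning steps are not modeled.

```python
from collections import defaultdict


def connected_components(nodes, edges, threshold):
    """Components of `nodes` using only edges with weight >= threshold."""
    adj = defaultdict(set)
    for (u, v), w in edges.items():
        if w >= threshold and u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(nodes):  # sorted for a deterministic order
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def extract_groups(nodes, edges, threshold=2, max_threshold=4, min_size=2):
    """Return a tree of components, re-splitting each one at threshold + 1."""
    tree = []
    for comp in connected_components(nodes, edges, threshold):
        if len(comp) < min_size:
            continue  # users with no strong enough edge at this threshold
        children = []
        if threshold < max_threshold:
            children = extract_groups(comp, edges, threshold + 1,
                                      max_threshold, min_size)
        tree.append({"threshold": threshold, "members": comp,
                     "children": children})
    return tree


# Toy graph: edge weight = number of ASes of the shared IP addresses.
edges = {("a", "b"): 4, ("b", "c"): 4, ("c", "d"): 2,
         ("d", "e"): 3, ("e", "f"): 3, ("x", "y"): 1}
users = {"a", "b", "c", "d", "e", "f", "x", "y"}
tree = extract_groups(users, edges)
```

In the toy graph, the six users a to f form one component at T=2, which splits into two groups at T=3; only one of them survives at T=4. Which level of the tree is reported as a bot group is left to the validation step.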

System Architecture
[Figure: same pipeline diagram as on the first System Architecture slide.]
1. History-based algorithm on signup detection
2. Graph-based algorithm on login detection
3. Parallel algorithm on DryadLINQ clusters

Parallel Implementation on DryadLINQ
• EWMA-based signup abuse detection
  – Partition data by IP
  – Can achieve real-time detection
• User-user graph construction
  – Two algorithms and optimizations
  – Process 200GB-300GB data in 1.5 hours with 240 machines
• Connected component extraction
  – Divide and conquer
  – Process a graph of 8.6 billion edges in 7 minutes
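
The slide only says "divide and conquer" for connected component extraction, so the following is a guess at the general shape rather than the DryadLINQ implementation. Each edge partition is summarized independently with union-find into (node, root) pairs, and the summaries are then merged by running the same procedure on their union. Everything runs in one process here; the function names are hypothetical.

```python
def local_components(edges):
    """Union-find over one partition. Returns (node, root) pairs, which
    form a reduced edge list with the same connectivity."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru
    return [(x, find(x)) for x in list(parent)]


def merge_components(partitions):
    """Divide: summarize each partition on its own (in a cluster, this
    step could run on separate machines). Conquer: merge the summaries."""
    summaries = [local_components(part) for part in partitions]
    merged = local_components([e for s in summaries for e in s])
    groups = {}
    for node, root in merged:
        groups.setdefault(root, set()).add(node)
    return list(groups.values())


# Two edge partitions; nodes 2 and 3 link the pieces across partitions.
partitions = [[(1, 2), (3, 4)], [(2, 3), (5, 6)]]
components = merge_components(partitions)
```

The benefit of the summary step is that each partition ships at most one pair per node to the merge stage, rather than every edge.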

Graph Construction 1: Simple Data Parallelism
• Potential edges
  – Select ID group by IP (Map)
  – Generate potential edges (ID_i, ID_j, IP_k) (Reduce)
• Edge weights
  – Select IP group by ID pair (Map)
  – Calculate edge weight (Reduce)
• Problem
  – Weight-1 edges are two orders of magnitude more numerous than the others
  – Their computation/communication is unnecessary
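
The two map/reduce stages above can be sketched in a single process as follows. This is only an illustration of the data flow: the function name, the `ip_to_as` lookup table, and the sample records are assumptions, login times are ignored, and no DryadLINQ partitioning is modeled.

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(logins, ip_to_as):
    """logins: iterable of (user_id, ip, time) records.
    ip_to_as: hypothetical lookup from IP address to AS number.
    Returns {(id_i, id_j): weight}, where weight is the number of
    distinct ASes among the IPs the two users share."""
    # Stage 1, map: group user IDs by IP.
    ids_by_ip = defaultdict(set)
    for user_id, ip, _time in logins:
        ids_by_ip[ip].add(user_id)

    # Stage 1, reduce: emit potential edges (ID_i, ID_j, IP_k).
    potential_edges = []
    for ip, ids in ids_by_ip.items():
        for id_i, id_j in combinations(sorted(ids), 2):
            potential_edges.append((id_i, id_j, ip))

    # Stage 2, map: group IPs by ID pair.
    ips_by_pair = defaultdict(set)
    for id_i, id_j, ip in potential_edges:
        ips_by_pair[(id_i, id_j)].add(ip)

    # Stage 2, reduce: edge weight = number of distinct ASes.
    return {pair: len({ip_to_as[ip] for ip in ips})
            for pair, ips in ips_by_pair.items()}


# Made-up records using documentation IP ranges and reserved AS numbers.
logins = [("u1", "192.0.2.1", 0), ("u2", "192.0.2.1", 1),
          ("u1", "198.51.100.1", 2), ("u2", "198.51.100.1", 3),
          ("u3", "198.51.100.1", 4)]
ip_to_as = {"192.0.2.1": 64500, "198.51.100.1": 64501}
graph = build_user_graph(logins, ip_to_as)
```

In this toy input, only the pair (u1, u2) shares IPs in two ASes; the other two pairs get weight 1, which is exactly the kind of edge the slide calls unnecessary to compute.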

Graph Construction 2: Selective Filtering

Comparison of Two Algorithms
• Method 1
  – Simple and scalable
• Method 2
  – Optimized to filter out weight-1 edges
  – Utilizes Join functionality, data compression, and broadcast optimization

Detection Results
• Data description
  – Two datasets
    • Jun 2007 and Jan 2008
  – Three types of data
    • Signup log (IP, ID, time)
    • Login log (IP, ID, time)
    • Sendmail log (ID, time, # of recipients)
  – 500M users and 200~300GB data per month

Detection of Signup Abuse

Detection by User-user Graph

Validations
• Manual check
  – Sampled groups verified by the Hotmail team
  – Almost no false positives
• Comparison with known spamming users
  – Detect 86% of the accounts that received complaints
  – Up to 54% of detected accounts are our new findings
• Email sending sizes per group
  – Most groups have a sharp peak
  – The remaining groups contain several peaks
• False positive estimation
  – Naming pattern (0.44%)
  – Signup time (0.13%)

Possible to Evade BotGraph?
• Evade signup detection: be stealthy
• Evade graph-based detection
  – Fixed IP/AS binding
    • Low utilization rate
    • Bot-accounts bound to one host are easy to group
  – Be stealthy (send as few emails as a normal user)
Either way, evasion severely limits attackers' spam throughput

Conclusions
• A graph-based approach to attack detection
  – Identify 26M bot-accounts with a low false positive rate in two months
• Efficient implementation using Dryad/DryadLINQ
  – Process 200GB-300GB data in 1.5 hours with a 240-machine cluster
Large-scale data mining for network security is effective and practical

Q & A? Thanks!
