Auto-learning of SMTP TCP Transport-Layer Features for Spam and - PowerPoint PPT Presentation

Auto-learning of SMTP TCP Transport-Layer Features for Spam and Abusive Message Detection
Georgios Kakavelakis, Robert Beverly, Joel Young
Center for Measurement and Analysis of Network Data
Naval Postgraduate School, Dept. Computer Science
{gkakavel,rbeverly,jdyoung}@cmand.org
December 8, 2011
USENIX LISA 2011
Kakavelakis, Beverly, Young (NPS) Auto-learning SMTP TCP Features for Spam LISA 2011 1 / 39

Outline
1. Motivation
2. Detecting Bot-Generated Spam
3. SpamFlow Architecture
4. SpamFlow Results
5. Conclusions

Background
- 2011Q3 MAAWG email metrics: 89% of email is abusive.
- Spam arrives in huge volumes, and spammers quickly adapt to defenses.
- Whether you are a user, provider, or vendor, spam is still a problem!

Our prior SpamFlow work asked:
- What is the transport-level (TCP/IP packet stream) character of spam?
- Are there differences between spam and ham flows?
- How can those differences be exploited in a way that spammers cannot easily evade?

Understanding SpamFlow
SpamFlow works at a different layer from the two established defenses:
- SMTP layer, content filtering: SpamFlow does not look at the message data (content).
- TCP layer, SpamFlow: the TCP stream, including timing: FINs, RSTs, duplicates, out-of-order (OOO) packets, three-way handshake (3WHS) timing, packet jitter, receive window, maximum idle time, etc. (20 features in total).
- IP layer, reputation analysis: SpamFlow does not look at IP header (reputation) data.
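To make the feature list concrete, here is a minimal Python sketch that computes a few of the named features (FINs, RSTs, duplicates, out-of-order packets, maximum idle time, and the smallest advertised receive window) from one flow's packets. The `Packet` record, its field names, and the exact counting rules are assumptions for illustration, not SpamFlow's implementation; it also ignores sequence-number wraparound.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    """One TCP segment sent by the remote MTA (hypothetical record layout)."""
    ts: float      # capture timestamp in seconds
    seq: int       # TCP sequence number
    length: int    # payload length in bytes
    flags: str     # TCP flags as letters, e.g. "S", "PA", "FA", "R"
    window: int    # advertised receive window


def flow_features(pkts: List[Packet]) -> Dict[str, float]:
    """Compute an illustrative subset of transport features for one flow."""
    pkts = sorted(pkts, key=lambda p: p.ts)
    fins = sum("F" in p.flags for p in pkts)
    rsts = sum("R" in p.flags for p in pkts)

    seen = set()
    dups = ooo = 0
    highest_end = None  # one past the highest data byte seen so far
    for p in pkts:
        if p.length == 0:
            continue  # only data-carrying segments count here
        if p.seq in seen:
            dups += 1  # same data again: a retransmission
            continue
        seen.add(p.seq)
        if highest_end is not None and p.seq < highest_end:
            ooo += 1  # new data below data already seen: reordered (or a late gap fill)
        end = p.seq + p.length
        highest_end = end if highest_end is None else max(highest_end, end)

    gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]
    return {
        "fins": fins,
        "rsts": rsts,
        "dups": dups,
        "ooo": ooo,
        "max_idle": max(gaps, default=0.0),
        "min_window": min((p.window for p in pkts), default=0),
    }
```

A passive observer cannot always tell a reordered segment from a late retransmission that fills a gap, so this sketch counts both as out-of-order.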

SpamFlow, previous work
“Exploiting Transport-Level Characteristics of Spam” [BS08]:
- Uses statistical machine learning methods.
- Offline analysis.
- Demonstrates > 90% accuracy, precision, and recall (without content or reputation!).
- Correctly identifies ≃ 78% of the false negatives left by content filtering alone.
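The headline numbers are standard classification metrics. A short sketch of how they, and the "fraction of content-filter false negatives recovered" figure, would be computed, treating spam as the positive class. The function names are hypothetical, and any counts plugged in are illustrative, not [BS08]'s data.

```python
from typing import Dict, Set


def _ratio(num: int, den: int) -> float:
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, and recall with spam as the positive class."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),  # flagged messages that really are spam
        "recall": _ratio(tp, tp + fn),     # spam messages that got flagged
    }


def fn_recovery(content_fns: Set[str], flagged: Set[str]) -> float:
    """Fraction of the content filter's missed spam that a second classifier flags."""
    return _ratio(len(content_fns & flagged), len(content_fns))
```

For example, with made-up counts `metrics(90, 10, 80, 20)` gives accuracy 0.85, precision 0.9, and recall of about 0.82.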

Obstacles to Deployment
But... there are obstacles to deployment:
- Lots of “plumbing”, i.e., exposing transport features to higher layers.
- Must be real-time.
- Must be online.
- Requires training a supervised learner.

USENIX LISA 2011 contributions:
- Tackle these deployment issues (we did the “hard” work).
- Built an open-source SpamFlow plugin for SpamAssassin.
- (And show performance numbers: it really works!)

2. Detecting Bot-Generated Spam

Transport-Level Characteristics of Spam
Why does SpamFlow work? Two observations on spam:
1. Low penetration: due to existing filters and user ambivalence → huge volumes of spam.
2. Sending method: botnets, dial-up hosts, etc. → low, asymmetric bandwidth; widely distributed.

Combining the observations: low penetration + sending methods.
Volume + methods + economics → link/host resource contention.
[Figure: one bot on an aDSL link sending to many MX servers at once, with congestion, loss, and reordering on the bot's link.]
Contention manifests as TCP/IP loss, retransmission, reordering, jitter, flow control, etc., particularly given the large buffers in consumer cable/DSL modems.
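A back-of-the-envelope sketch of why contention shows up at the transport layer: a bot's uplink is split across many simultaneous SMTP sessions, and a full modem buffer adds latency equal to its size divided by the uplink rate. The link speed, session count, and buffer size below are hypothetical round numbers, not measurements.

```python
def per_flow_kbps(uplink_kbps: float, concurrent_flows: int) -> float:
    """Equal share of a bot's uplink across simultaneous SMTP sessions."""
    return uplink_kbps / concurrent_flows


def queueing_delay_s(buffer_bytes: int, uplink_bps: float) -> float:
    """Worst-case delay added by a full modem buffer draining at the uplink rate."""
    return buffer_bytes * 8 / uplink_bps


if __name__ == "__main__":
    # Hypothetical numbers: 512 kbit/s uplink, 50 sessions, 256 KiB buffer.
    print(per_flow_kbps(512, 50))                  # 10.24 kbit/s per session
    print(queueing_delay_s(256 * 1024, 512_000))   # 4.096 s of queueing delay
```

Even with these modest assumptions, each session gets roughly 10 kbit/s and can see seconds of added delay, which is the kind of effect the TCP features are meant to pick up.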

SMTP and TCP
Example exchange between two Mail Transfer Agents, mx.alice.com and mx.bob.com:
  mx.alice.com → EHLO mx.alice.com
  mx.bob.com   → 250 Hello Alice
  mx.alice.com → MAIL FROM:<alice@alice.com>
  mx.bob.com   → 250 OK
  mx.alice.com → DATA
- Simple Mail Transfer Protocol (SMTP) uses the Transmission Control Protocol (TCP) for transport.
- A sequence of SMTP commands is exchanged between Mail Transfer Agents (MTAs).
- Mail contents are packetized.
How do spam connections behave?
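The replies in this exchange follow the RFC 5321 reply format: a three-digit code, then a space on the final line or a hyphen on continuation lines (as in a multiline EHLO response). A minimal parser sketch; the function names are illustrative.

```python
from typing import List, Tuple


def parse_reply(lines: List[str]) -> Tuple[int, List[str]]:
    """Parse one SMTP reply (single- or multiline) into (code, text lines).

    Per RFC 5321, every line starts with the same 3-digit code; a '-' right
    after the code means more lines follow, a space (or nothing) ends the reply.
    """
    code = None
    texts: List[str] = []
    for i, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        this_code, sep, text = int(line[:3]), line[3:4], line[4:]
        if code is None:
            code = this_code
        elif this_code != code:
            raise ValueError(f"mixed reply codes {code} and {this_code}")
        texts.append(text)
        if sep != "-":  # final line of the reply
            if i != len(lines) - 1:
                raise ValueError("extra lines after the final reply line")
            return code, texts
    raise ValueError("reply ended without a final line")


def is_positive_completion(code: int) -> bool:
    """2yz codes (e.g. 250) mean the command was accepted."""
    return 200 <= code < 300
```

A 354 reply to DATA, by contrast, is an intermediate (3yz) reply: the server is waiting for the message body.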

How do Spam Connections Behave?
...or, a quick look at netstat:
RcvQ SndQ Local  Foreign Addr               State
0    0    srv:25 92.47.129.89:49014         SYN_RECV
0    0    srv:25 ppp83-237-106-114.:29081   SYN_RECV
0    0    srv:25 88.200.227.123:25068       SYN_RECV
0    0    srv:25 92.47.129.89:49014         SYN_RECV
0    0    srv:25 ppp83-237-106-114.:29084   SYN_RECV
0    0    srv:25 88.200.227.123:25068       SYN_RECV
0    0    srv:25 88.200.227.123:25069       SYN_RECV
0    0    srv:25 88.200.227.123:25070       SYN_RECV
0    0    srv:25 88.200.227.123:25074       SYN_RECV
0    0    srv:25 84.255.150.15:4232         SYN_RECV
0    25   srv:25 222.123.147.41:50282       LAST_ACK
0    28   srv:25 adsl-pool-222.123.:1720    LAST_ACK
0    31   srv:25 222.123.147.41:50152       LAST_ACK
0    15   srv:25 222.123.147.41:50889       LAST_ACK
0    9    srv:25 88.245.3.19:venus          LAST_ACK
0    25   srv:25 78.184.155.70:1854         FIN_WAIT1
0    23   srv:25 190-48-30-225.spe:50920    FIN_WAIT1
0    23   srv:25 dsl.dynamic812132:48154    FIN_WAIT1
0    23   srv:25 ip-85-160-91-16.e:48093    FIN_WAIT1
0    23   srv:25 88.234.141.158:48389       FIN_WAIT1
0    23   srv:25 p5B0FBB5D.dip.t-d:11965    FIN_WAIT1
...

Connections get stuck in these TCP states and stay there for minutes:
- SYN_RECV: half-open connections.
- LAST_ACK and FIN_WAIT1: remote MTAs that “disappear” mid-connection, or that send a FIN and then disappear.
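One way an administrator might quantify what the netstat listing shows is to tally port-25 sockets by TCP state. This sketch assumes the five whitespace-separated columns used on the slide (Linux `netstat -tn` prints an extra Proto column first, so the parsing would need adjusting); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Server-side states in which the socket is waiting on the remote peer.
WAITING_ON_PEER = {"SYN_RECV", "LAST_ACK", "FIN_WAIT1"}


def count_states(lines: Iterable[str], port: int = 25) -> Dict[str, int]:
    """Count sockets on a local port by TCP state.

    Assumes the slide's layout: RcvQ SndQ Local Foreign State, whitespace
    separated. Header lines and '...' are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        cols = line.split()
        if len(cols) != 5 or not cols[0].isdigit():
            continue
        local, state = cols[2], cols[4]
        if local.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return dict(counts)


def waiting_fraction(counts: Dict[str, int]) -> float:
    """Share of sockets currently waiting on the remote peer."""
    total = sum(counts.values())
    waiting = sum(n for state, n in counts.items() if state in WAITING_ON_PEER)
    return waiting / total if total else 0.0
```

A single snapshot only shows where sockets are at that instant; sampling repeatedly is what reveals connections that stay in these states for minutes.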

What about RTT?
...building more intuition. Two example messages:

Received: from vms044pub.verizon.net
From: "Dr. Beverly, MD" <b@ex.com>
Subject: thoughts
Dear Robert,
I hope you have had a great week!

Received: from unknown (59.9.86.75)
From: Erich Shoemaker <ried@ex.com>
Subject: Repl1ca for you
A T4g Heuer w4tch is a luxury statement on its own. In Prest1ge Repl1cas, any T4g Heuer...
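RTT can be estimated passively on the receiving MTA from the three-way handshake: the gap between sending the SYN-ACK and receiving the remote's final ACK. A minimal sketch under that assumption; the event-tuple format is hypothetical, and following Karn's rule it declines to produce a sample when the SYN-ACK was retransmitted.

```python
from typing import Optional, Sequence, Tuple

# (timestamp in seconds, direction, TCP flags). Direction is "in" for packets
# from the remote MTA and "out" for packets sent by our server.
Event = Tuple[float, str, str]


def handshake_rtt(events: Sequence[Event]) -> Optional[float]:
    """Server-side RTT estimate: handshake ACK arrival minus our SYN-ACK send time.

    Returns None if the handshake never completed (e.g. a half-open connection
    left in SYN_RECV) or if the SYN-ACK was retransmitted, since it is then
    ambiguous which copy was acknowledged (Karn's rule).
    """
    synack_times = []
    for ts, direction, flags in sorted(events):
        if direction == "out" and "S" in flags and "A" in flags:
            synack_times.append(ts)
        elif direction == "in" and synack_times and flags == "A":
            if len(synack_times) > 1:
                return None
            return ts - synack_times[0]
    return None
```

For a sender behind a congested aDSL link, this estimate would include any queueing delay in the modem buffer, not just propagation delay.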

3. SpamFlow Architecture

SpamAssassin Plugin
So... we built it. Moving from research to production:
[Figure: SpamFlow plugin architecture. SMTP traffic feeds both the MTA (postfix) and a pcap capture; the components shown are the MTA, SpamAssassin, the SF plugin, the SpamFlow daemon, a classifier, and its model, connected by flows labeled email, msgid, features, prediction, score, and packets.]
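The slide does not say which learner sits behind the Classifier and Model boxes. Purely to illustrate the predict-then-learn loop that an online deployment needs, here is a tiny perceptron over feature dictionaries; the perceptron is swapped in for illustration, and where the training labels come from is left as an assumption.

```python
from typing import Dict, List


class OnlinePerceptron:
    """Tiny online linear classifier: predict() now, update() when a label arrives.

    Illustrative stand-in for the Classifier/Model boxes; not SpamFlow's learner.
    """

    def __init__(self, feature_names: List[str], lr: float = 0.1):
        self.names = feature_names
        self.w: Dict[str, float] = {n: 0.0 for n in feature_names}
        self.b = 0.0
        self.lr = lr

    def margin(self, x: Dict[str, float]) -> float:
        return self.b + sum(self.w[n] * x.get(n, 0.0) for n in self.names)

    def predict(self, x: Dict[str, float]) -> int:
        """+1 = spam, -1 = ham."""
        return 1 if self.margin(x) > 0 else -1

    def update(self, x: Dict[str, float], label: int) -> None:
        """Perceptron rule: adjust the weights only when the prediction was wrong."""
        if self.predict(x) != label:
            for n in self.names:
                self.w[n] += self.lr * label * x.get(n, 0.0)
            self.b += self.lr * label
```

A typical loop would call `predict()` when a message's features arrive and `update()` once a label for that message becomes available.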

SpamAssassin Plugin Architecture: Entering Traffic
Email traffic enters the system; the MTA (postfix) passes each message to SpamAssassin.

SpamAssassin Plugin Architecture: Collecting Features
Concurrently, the SpamFlow daemon collects packets (via pcap) and produces per-flow features.
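Producing per-flow features requires grouping captured packets by connection. A sketch of a direction-independent flow table keyed by the two (address, port) endpoints; the class and method names are hypothetical, and in practice the packets would come from the pcap capture rather than being passed in by hand.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Endpoint = Tuple[str, int]          # (IP address, TCP port)
FlowKey = Tuple[Endpoint, Endpoint]


def flow_key(src: Endpoint, dst: Endpoint) -> FlowKey:
    """Direction-independent key: both halves of a TCP connection map to one flow."""
    return (src, dst) if src <= dst else (dst, src)


class FlowTable:
    """Groups captured packets into per-connection flows (hypothetical daemon state)."""

    def __init__(self) -> None:
        self.flows: DefaultDict[FlowKey, List[dict]] = defaultdict(list)

    def add(self, src: Endpoint, dst: Endpoint, pkt: dict) -> FlowKey:
        key = flow_key(src, dst)
        self.flows[key].append(pkt)
        return key

    def expire(self, key: FlowKey) -> List[dict]:
        """Remove a finished flow and hand back its packets for feature extraction."""
        return self.flows.pop(key, [])
```

Normalizing the key this way lets one table hold both the remote MTA's segments and the server's replies, which features such as handshake timing need.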

SpamAssassin Plugin Architecture: Matching Emails and Flows
The SpamFlow (SF) plugin takes a msg ID.
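The slides do not say how a msg ID is tied to a flow. As one hypothetical approach, the daemon could read the Message-ID header from the message text carried after DATA in the reassembled stream and index the flow under it, so the plugin can look the flow up using the same header. Everything here (`MsgIndex`, using Message-ID as the key) is an assumption.

```python
import email
from typing import Dict, Optional, Tuple

FlowKey = Tuple[Tuple[str, int], Tuple[str, int]]


def message_id(raw_message: str) -> Optional[str]:
    """Return the Message-ID header of an RFC 5322 message, if present."""
    msg = email.message_from_string(raw_message)
    mid = msg.get("Message-ID")
    return mid.strip() if mid else None


class MsgIndex:
    """Hypothetical msg ID -> flow lookup kept by the daemon."""

    def __init__(self) -> None:
        self._by_id: Dict[str, FlowKey] = {}

    def record(self, data_section: str, key: FlowKey) -> None:
        """Index a flow by the Message-ID found in the text sent after DATA."""
        mid = message_id(data_section)
        if mid:
            self._by_id[mid] = key

    def lookup(self, mid: str) -> Optional[FlowKey]:
        return self._by_id.get(mid)
```

Message-ID is chosen only because both sides can see it; a sender can omit or forge it, so a production design might prefer an identifier assigned by the receiving MTA.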

SpamAssassin Plugin Architecture: Matching Emails and Flows
The plugin communicates with the SpamFlow daemon via XML-RPC to query for that msg ID.
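The plugin-to-daemon query can be sketched with Python's standard-library XML-RPC modules (`xmlrpc.server.SimpleXMLRPCServer` and `xmlrpc.client.ServerProxy`). The method name `get_features`, the in-memory feature store, and the address are hypothetical. A real SpamAssassin plugin is a Perl module, so its client side would be written in Perl; Python is used for both ends here only to keep the example self-contained.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical daemon-side store: msg ID -> feature dict (filled by packet capture).
FEATURES = {"<abc@mx.alice.com>": {"rsts": 1, "max_idle": 5.0}}


def get_features(msgid: str) -> dict:
    """XML-RPC method: return the flow features for a msg ID ({} if unknown)."""
    return FEATURES.get(msgid, {})


def start_daemon(host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Serve get_features in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(get_features, "get_features")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def plugin_query(url: str, msgid: str) -> dict:
    """Plugin side: ask the daemon for a message's flow features."""
    with xmlrpc.client.ServerProxy(url) as proxy:
        return proxy.get_features(msgid)
```

Usage: start the daemon, build `http://host:port/` from `server.server_address`, call `plugin_query(url, "<abc@mx.alice.com>")`, and call `server.shutdown()` when done.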
