EPL682 – Advanced Security Topics: Paper Reviews



SLIDE 1

EPL682 – Advanced Security Topics: Paper Reviews

Name: Ioannis Yiangou

Instructor: Dr. Elias Athanasopoulos

Date: 27 February 2020

SLIDE 2

 Term: Irrelevant online content sent to numerous users

 Forms: Emails, Social Media posts

 Goal: Lure unsuspecting users to “read” it

 Purpose: Advertising, spreading malware, phishing

 Motivation: Great research interest in studying SPAM

  • mechanisms, defenses, behavior, trends etc.

 2 papers studying SPAM, as propagated by two major mediums:

  • “Spamalytics: An Empirical Analysis of Spam Marketing Conversion”  Email SPAM

  • “@spam: The Underground on 140 Characters or Less”  Social Media SPAM (posts/tweets on Twitter)

SLIDE 3

Focus: E-mail SPAM

SLIDE 4

Spam structure is “unclear”:

  • Not much is known about cost to send, conversion rate or profit
  • REASON: “Underground” nature

 No transaction evidence  Spammers do not file formal financial reports anywhere  Campaigns act entirely online etc.

  • SOLUTION: Become a spammer yourself!

 Build an e-commerce site & market it via spam  Record sales  Conversion rate (how many “ads” turned to “purchases”)  Become a convincing spammer  Use technologies used by spammers  utilize botnets for email distribution  affect proxy responses etc.

  • Botnet Infiltration:

 Authors used an existing botnet, taking over part of its spam  Redirected users to their own (harmless) servers, instead  Took measurements

SLIDE 5

 “Storm” Botnet

  • Peer-to-peer botnet with available spam agents
  • Propagates spam: directs users to download executables from a specified web site
  • Hierarchy:

 Worker bots:

 Request work from higher levels  Receive orders & send spam

 Proxy bots:

 Link Workers w/ Master servers  Give status reports

 Master Servers:

 Directed by the Bot Master  Give commands, workloads  Interpret status reports

SLIDE 6

 Spam workload models:

  • “Orders” given to Worker bots, by Master servers
  • Forwarded by proxies
  • Characteristics:

 Spam templates

 polymorphic messages  can bypass spam filtering. Written in a macro language, loaded with info such as “target mail addresses”, “IP addresses”, date & time etc.

 Delivery list of email addresses  targets  Dictionaries with info needed for “spam templates”

 IDEA: Botnet Infiltration

 Gain access into “Command & Control” (C&C) network  C&C channels:

 Located between Workers & Proxies  All spam requests & delivery reports pass from those channels first  Opportunity to monitor & process spam activity
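The polymorphic template mechanism above can be sketched as a small macro expander. This is a minimal illustration only: the `%TO%`/`%DATE%` placeholder syntax and the dictionary fields below are invented for the sketch, not Storm's actual macro language.

```python
import random
from datetime import datetime, timezone

def expand_template(template: str, target: str,
                    dictionaries: dict, rng: random.Random) -> str:
    """Fill a spam template: fixed fields plus random dictionary picks,
    so every generated message differs (polymorphism)."""
    msg = template.replace("%TO%", target)
    msg = msg.replace("%DATE%", datetime.now(timezone.utc).strftime("%d %b %Y"))
    for key, words in dictionaries.items():
        msg = msg.replace("%" + key + "%", rng.choice(words))
    return msg

# Hypothetical workload: one template + a delivery list + dictionaries
template = "Hi %TO%, %SUBJECT% at %URL% (%DATE%)"
dicts = {"SUBJECT": ["cheap meds", "great deal"],
         "URL": ["example.com/a", "example.com/b"]}
rng = random.Random(0)
msgs = [expand_template(template, t, dicts, rng)
        for t in ["a@x.com", "b@y.com"]]
```

Each worker instantiates the same template against its delivery list, which is why the resulting messages are similar in structure yet textually distinct.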

SLIDE 7

 APPROACH: Rewrite C&C Protocol:

  • Elements:

 Click-based network element:

 Adds a destination header to flowing messages, in C&C  Message destination changed to IP address given by authors

 User-space Proxy Server:

 Impersonates a valid proxy bot  Receives connections for specified address  Forwards those connections to the Click-based element  From here, C&C traffic can be parsed & processed, as wished
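In spirit, the rewriting element is a man-in-the-middle that patches the destination of each C&C message before forwarding it. A minimal sketch, assuming messages are simple dicts with a `dest` field (the real system operates on Storm's binary C&C traffic; the TEST-NET address below is a placeholder for the authors' server):

```python
AUTHORS_SERVER = "192.0.2.10"   # hypothetical measurement server (TEST-NET)

def rewrite_dest(msg: dict, new_dest: str = AUTHORS_SERVER) -> dict:
    """Click-based element: redirect the message to the authors' harmless
    server, remembering the original destination for later analysis."""
    patched = dict(msg)
    patched["original_dest"] = msg.get("dest")
    patched["dest"] = new_dest
    return patched

def proxy_forward(stream, sink) -> None:
    """User-space proxy: impersonate a proxy bot, pass every C&C message
    through the rewriting element, then forward it on."""
    for msg in stream:
        sink.append(rewrite_dest(msg))
```

Because all worker traffic already transits the proxy tier, interposing at this point lets the authors observe and redirect spam without modifying the bots themselves.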

SLIDE 8

  • Created many email accounts at different commercial providers:

 e.g. Yahoo

  • Tested spam delivery:

 Filtered them using Spam Filtering products:

 Ensure spam can be passed successfully

 Set Storm worker bots to send spam to them:

 Append those accounts to the delivery list of every workload  Remove references to those accounts from every report  Ensure Bot master does not notice authors’ changes

 Check accounts for spam messages received by authors’ campaigns

SLIDE 9

“Do users visit sites advertised in spam? If yes, how often?”

Authors launched two different spam campaigns  created one site for each

Monitored activity (i.e. “visits”) to find out

CAMPAIGN #1: “Pharmaceutical Campaign”

 Design: Identical to the original one

  • Same naming convention + identifier at the end of URL
  • Similar UI

 No functionality: protect clients

 Log all accesses

  • 1 purchase attempt = 1 conversion
SLIDE 10

CAMPAIGN #2: “Malware Self-Propagation Campaign”

Lures users into downloading “postcard reader” software  Hidden Malware

Design: Looks & feels like a legitimate site

No functionality: protect clients  Links direct to harmless executables

Services: 3 (harmless) executables to download

  • If run: Send HTTP POST request to authors’ server

Log accesses: HTTP POST lets authors know if downloaded file was executed

  • 1 execution = 1 conversion
  • Users might download but not execute file
  • Anti-viruses might block execution etc.
SLIDE 11

 Not all visits to the Web sites are conversions  Automated & semi-automated processes visit:

  • Pure Web crawlers: visiting without interacting
  • “Honey-client” systems: collect info
  • Security researchers: working on identifying new malware

 SOLUTION: Filter out such visits

  • Heuristics: Identify visits with “crawler” behavior

 E.g. trying to: perform well, enhance spam defense, ensure quality measurements, retrieve info

  • Blacklist them!

SLIDE 12

 Heuristic Blacklisting:

  • Hosts blacklisted for doing the following:

 Accessing Pharmacy site with URL missing the identifier  Accessing Web crawling instruction files (e.g. txt containing URLs)  Attempting to exploit sites for information retrieval  Disabling costly features (e.g. Javascript, embedded images)  Paying targeted visits

 e.g. visiting the pharmacy site, with the same IP, multiple times using different unique identifiers  Could be taking measurements/studying spam mechanisms

 Tracking updates

 e.g. downloading post-card files more than 10 times

 Accessing workload delivery lists & visiting all featured IP addresses
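The heuristics above amount to simple predicates over the access log. A sketch, where each visit is a dict with hypothetical field names (`ip`, `path`, `has_identifier`, `downloads`) that do not come from the paper:

```python
def is_crawler(visit: dict, ids_per_ip: dict) -> bool:
    """Flag a visit as automated, per the blacklisting heuristics above
    (field names and thresholds are illustrative)."""
    if visit["path"].endswith("robots.txt"):      # crawler instruction files
        return True
    if not visit.get("has_identifier", True):     # pharmacy URL w/o identifier
        return True
    if visit.get("downloads", 0) > 10:            # postcard fetched >10 times
        return True
    # Same IP reappearing under many different unique identifiers
    if ids_per_ip.get(visit["ip"], 0) > 3:
        return True
    return False
```

Visits matching any predicate are blacklisted and excluded before conversions are counted, so automated traffic does not inflate the measurements.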

SLIDE 13

E-mails sent (per hour, for each campaign)

* “April Fool” campaign is similar to the postcard one (self-propagation of malware using a spam postcard site, but only near the 1st of April).

E-mail messages sent & Workers used (per hour, on each day of campaign)

SLIDE 14

 8 proxies used

 Most workers only connected:

  • Once to proxies
  • To a single proxy

 Few cases (90 workers) connected to all proxies

 Most connections to proxies, from a single worker: 269

  • An academic network in North Carolina, USA  “Infected” 19 times

 Average Connection Duration: 40 minutes

 Many Connection cases (40%) did not even exceed 1 minute

 Longest Connection Duration: 81 hours

3 most targeted domains

SLIDE 15

 Shows the whole process of spam distribution  From workers receiving target e-mail addresses, to user conversion  Shows how many of the initially intended targets will remain un-filtered in all stages, up until their conversion
SLIDE 16

 STAGE A:

  • Action: Worker bots receive target e-mail addresses
  • Filter: Some addresses might be invalid or blacklisted
  • Problem: Such addresses will not receive spam messages
SLIDE 17

 STAGE B:

  • Action: Target e-mail addresses receive spam messages from worker bots
  • Filter: Anti-Spam Mechanisms of E-Mail Provider
  • Problem: Many of those spam e-mails are blocked
SLIDE 18

 STAGE C:

  • Action: Mails which survived anti-spam filtering end up to inbox
  • Filter: User may ignore or delete spam mails
  • Problem: Many emails failed to persuade users to visit/”convert”
SLIDE 19

 STAGE D:

  • Action: User opens spam e-mails and visits advertised URLs
  • Filter: Many visiting users will not purchase anything
  • Problem: No conversion, despite visiting
SLIDE 20

 STAGE E:

  • Action: User “converts”  buys from Pharmacy/ executes Postcard malware
  • Filter: Many users are “crawlers”  no real intention to “convert”
  • Problem: Many of the final conversions are not real “conversions”
SLIDE 21

 OBSERVATIONS:

  • Many spam messages are filtered by each stage of the pipeline
  • Way too small a number of spam messages “survive” the full Pipeline process
  • Conversion Rates: Extremely low (less than 0.0001%, in all campaigns)
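The stage-by-stage filtering above (Stages A–E) is a multiplicative funnel. A sketch with made-up survival rates; the real per-stage numbers are in the paper's pipeline figure:

```python
def survivors(initial: int, stage_rates: list) -> list:
    """Apply each stage's survival rate in turn, returning the count
    remaining after every stage (A: valid address, B: passes filters,
    C: user opens, D: user visits, E: real conversion)."""
    counts, n = [], initial
    for rate in stage_rates:
        n = int(n * rate)
        counts.append(n)
    return counts

# Illustrative rates only -- not the paper's measurements
funnel = survivors(1_000_000, [0.9, 0.25, 0.05, 0.01, 0.5])
```

Because the rates multiply, even modest per-stage losses compound into the extremely low end-to-end conversion rates the authors observed.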

SLIDE 22

 Time:

  • From: when a worker bot sends spam to a user
  • To: when user “clicks” on the spam URL & visits the site

 Importance:

  • Time-to-click lets us know “how long” the scam must stay available

 Longer it takes users to act  Longer it needs to stay available to profit

Observations & Conclusions

  • Similar behavior of “Users” & “Converters” lines
  • Cannot conclude about user conversion, using time-to-click (little correlation)
  • 10% of times-to-click take a week or month
  • Spam sites must be available for long periods of time for good profit

SLIDE 23

 Composite Blocking List (CBL):

  • Over 4 million addresses which send spam because of an infection
  • Constant Update (every 30m)  up to date
  • Monitored to determine:

 Blacklisted worker bots  Relationship between blacklisting & spam activity

 OBSERVATIONS for Workers:

  • Many blacklisted before even receiving orders (workload)
  • Many who sent successful delivery reports got blacklisted:

 Average time needed = 1.5hrs  too soon

  • Many not blacklisted did not send successful delivery reports

 CONCLUSIONS:

  • Workers’ Botnet spamming activity did lead to their early detection & blacklisting
  • Spam Campaign targets affect the blacklisting process

 e.g. successful spam delivery triggers the blacklisting process

SLIDE 24

 RATIO IMPORTANCE:

  • Some domains use powerful anti-spam filters  spam cannot get to the domain, even when not blacklisted

  Small Per-Domain Delivery Rate

  • Many domains do not take effective measures against spam

  Large Per-Domain Delivery Rate

SLIDE 25

 Authors studied:

  • Possible factors influencing response to spam
  • Worldwide trends of responses to spam

 General Conclusion: Users worldwide DO respond to spam

SLIDE 26

 Responded-to-Sent Ratio (by target)

  • Most targets: USA (hotmail.com, etc.)
  • Most responses: India, France, USA
  • Highest Response Rates: India, Pakistan, Bulgaria  respond almost to all & anything they receive
  • Lowest Response Rate: USA, Japan, Taiwan  respond to a really small fraction of spam emails they receive
  • Interesting fact about USA: Many responses to spam (in numbers) + Lowest Response Rate (in percentages)
  • The many responses are still too few, compared to the volume of received spam that gets ignored

Observations & Conclusions:

SLIDE 27

 Responses to the 2 campaigns, per country

Observations & Conclusions:

  • Equal Response: Most of the countries (approaching median)
  • Highest Response (Self-Propagation Spam): USA (lower part, largest distance from median)
  • Highest Response (Pharmaceutical Spam): France (upper part, largest distance from median)

SLIDE 28

 Different response rates in countries worldwide occur:

  • BECAUSE OF:

 Structural causes

 e.g. quality/availability of spam filtering, public anti-spam education & awareness

  • NOT BECAUSE OF:

 Interests of cultural/national nature, for specific products advertised by spam

SLIDE 29

1st large-scale quantitative study of spam delivery & conversion

RESULTS:

  • 26 Days  350 million spam messages sent  ONLY 28 “conversions”:
  • Conversion Rate <= 0.00001%
  • Avg. Purchase Price close to $100
  • Total (26 days) = $2,731.88  more than $100/day  respectable
  • Authors used small percentage of Storm (only 1.5% of workers)

 Actual total amount is much bigger

Costs are pretty high: Domain Registration, Hosting Fees, Spam Distribution Costs etc.

Storm keeps its profits: NO use of 3rd-party services  no commissions

Main reason Spam Distributions are expensive: Involvement of 3rd-party services

Making good profit requires organizations that cover all stages of spam distribution

  • Eliminate need for 3rd-party services

Profit can be very limited  spammers must ensure they:

  • Know everything about the status of their campaigns
  • Are always ready to respond, economically, to any new anti-spam measures
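The headline numbers above are easy to sanity-check. The 1.5% scale-up below is a naive linear extrapolation from the slide's figures, not the paper's exact estimate:

```python
sent = 350_000_000      # spam messages over 26 days
conversions = 28        # purchase attempts observed
revenue = 2731.88       # total revenue over 26 days, USD
days = 26

rate = conversions / sent        # fraction of sent messages that convert
daily = revenue / days           # average revenue per day
full_storm = revenue / 0.015     # naive scale-up: authors drove ~1.5% of workers
```

The rate works out to roughly 0.000008%, consistent with the slide's <= 0.00001% bound, and the daily average is just over $100; scaling linearly to the whole botnet suggests revenue on the order of $7,000/day.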
SLIDE 30

Focus: Twitter SPAM

SLIDE 31

 Over 106 million users  Over 1 billion posts/month  2 million URLs posted  blacklisted

 Successful spam campaigns in Twitter require:

  • A Twitter Account
  • Enough unique URLs to:

 Post the same spam links excessively  Avoid spam detection measures

 Relatively high clickthrough rate (0.13%)

  • REASON: no filters utilized
  • Users are more vulnerable on Twitter, than on Email

 Actions limited to:

  • Prevention
  • Spam activity measurement
  • User feedback (e.g. reporting)
  • Deletion of accounts & messages suspected as “spam”
SLIDE 32

  • Tweets: Status update post
  • Followers: Set of users who receive a tweet, once it is posted
  • Friends: Set of users an account subscribes to, to access their status updates
  • Mentions: References a user directly (e.g. @username, in a tweet)
  • Retweets: “RT @username” or “@username”  text from another user’s profile
  • Hashtags: Tags to arbitrary topics (“#topic”)

 If a topic becomes popular:  Appears in “trending” list  Tweets  syndicated to all of Twitter

  • User timelines: History of tweets posted by friends

SLIDE 33

 Authors used:

  • URL blacklists for spam identification
  • Custom Twitter monitoring infrastructure

 Analyzes spamming techniques:

 Generation of click traffic  Attraction of victims  Masking of suspicious spam pages from users

 How Twitter Monitoring Infrastructure Works:

  • Tweet appears in URL stream  collects URL
  • Web crawling: from collected URL, to final landing page
  • Record: path + URLs

“Unmasks” URLs  Reveals final landing page
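The crawling step — follow each shortened URL hop by hop and record the full path to the landing page — can be sketched without touching the network, given a `resolve` callable that returns a URL's redirect target (or `None` at the landing page). In practice `resolve` would issue HTTP requests; the chain below is an invented example:

```python
def crawl_to_landing(url: str, resolve, max_hops: int = 10) -> list:
    """Return the full redirect path, from the posted URL to the final
    landing page. Stops on loops or after max_hops (nested shorteners
    can chain many hops)."""
    path, seen = [url], {url}
    while len(path) <= max_hops:
        nxt = resolve(path[-1])
        if nxt is None or nxt in seen:   # landing page reached, or a loop
            break
        path.append(nxt)
        seen.add(nxt)
    return path

# Hypothetical chain of shorteners in front of one landing page
chain = {"sho.rt/a": "tin.y/b", "tin.y/b": "scam.example/landing"}
path = crawl_to_landing("sho.rt/a", chain.get)
```

Recording the whole path, not just the first URL, is what lets the infrastructure "unmask" shortened links and attribute tweets to their true landing pages.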

SLIDE 34

 Tweets collected: 200-plus million  3-plus million spam

  • 7 million per day, for 1 month

 Total URLs crawled: 25 million (2 million blacklisted)

 Samples from Twitter Stream:

  • Random samples: used to find how many tweets contain URLs
  • Collection of all tweets w/ URLs: used for all other measurements

 Collection of complete History of over 120,000 public accounts

 Collection of Click-Through statistics & Click Stream data:

  • Downloaded from URL shortening services  records of served URLs
  • Used for identifying successful spam campaigns & their traffic
  • Successful tweet = very likely to be clicked by a follower
slide-35
SLIDE 35

 URLs collected & checked against 3 popula

ular black ckli lists ts:

  • Google

le Safebro browsin sing: for malware & phishing pages

  • URIBL:

BL: for domains found in spam emails

  • Joewein: similar activity to URIBL

 If page marked

ed as spam: : Involved spam tweets & users are suspicious

 Othe

her techni hnique ques:

  • Whitelists

elists: verify blacklisted pages which are not entirely spam hosts

  • Manu

nual al Classifi sification cation: click & decide if its spam, based on content

SLIDE 36

 Classification into spam categories (if possible)

  • Roughly 50%: NOT categorized  not popular
  • Remaining 50%: categorized successfully

 RESULTS: Twitter-specific spam categories exist

SLIDE 37

 Call outs/Mentions (@):

  • Refer to a person directly
  • More likely to click in a personal message
  • Spammers use them to communicate with non-followers
  • Least popular

 Retweets (RT):

  • 4 sources:

 Purchased by spammers from respected Twitter members  Spam accounts retweeting other spam  Hijacked  retweet content, after injecting it with spam URLs  Unintentional spam retweeting

SLIDE 38

 Tweet hijacking:

  • Hijack tweets of others, load with spam URLs & retweet
  • Exploitation of user trust  hijack famous users & retweet
  • Mostly: phishing/malware retweets, less frequently for spam

 Trend setting:

  • Hashtags (#) simplify content searching
  • Enough tweets of the same hashtag  “trend”
  • Spammers attempt to create “trends”  14% set by spammers
SLIDE 39

 Trend hijacking:

  • Spammers can utilize currently trending topics:

 Attachment of “trends” to a spam message  Searching for the topic  see spam message  Mixed with other posts  spam goes unnoticed

  • No need for followers  “trending” topics find crowds
  • 86% of trends used to spam also appear in benign tweets
SLIDE 40

  • GOAL:

 Analyze use of Twitter-specific features in spamming

  • REQUIRES:

 Distinguishing use in spam tweets, compared to regular tweets

  • RESULTS:

Spam Blacklist VS Tweets with URL: No significant difference  Tweets with URLs are most likely spam!

SLIDE 41

 Clickthrough statistics of URL shortening services

  • How is Clickthrough related to features like followers & tweet behavior?

 Most URLs receive no clicks  Those that do get over 1.6 million visitors  Spam on Twitter is successful!

SLIDE 42

Relation of Clickthrough & features (‘ρ’):

 Small ‘ρ’ ratio  Barely Related  Big ‘ρ’ ratio  Largely Related

Large Correlations (ρ > 0.5)

  • Accounts involved: Use many accounts  more clicks
  • Followers receiving a link: Send to more people  more clicks
  • Use Hashtags “#”: Involve/create “trends”  more clicks
  • Retweets w/ Hashtags “#, RT”: Notify more people w/ a “trend”  more clicks

UNEXPECTED: Small Correlation (ρ = .28):

  • # of times spam is tweeted: Re-posting more does NOT also mean more clicks
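The ρ values above are correlation coefficients between each feature and clickthrough. A self-contained sketch of rank (Spearman-style) correlation, which suits the heavy-tailed counts involved; the follower/click data below is toy data, not the paper's:

```python
def rank(xs: list) -> list:
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs: list, ys: list) -> float:
    """Pearson correlation computed on the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Toy data: clicks grow monotonically with follower count -> rho near 1
followers = [10, 200, 50, 3000, 800]
clicks = [1, 20, 4, 310, 75]
rho = spearman(followers, clicks)
```

A ρ near 1 means the feature and clickthrough rise together in rank order, which is how "followers receiving a link" earns its place in the large-correlation group.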

SLIDE 43

Effectiveness of attracting followers = Clicks occurring, out of all the “exposures”

R = T × F, where:  R: reach, total number of exposures to spam  T: total tweets sent  F: number of followers exposed to each tweet

Ratio of clicks ‘ρ’ / Reach ‘R’ = Effectiveness

 0.13% of spam tweets get visits  much higher, compared to email (MAX 0.006%)

 Twitter: better “spam” spreading platform than email

Why might Twitter be more effective than email, for spamming?

  • Users only see 140 characters in a tweet  decide if URL is spam: Lack of info
  • Users implicitly trust their Twitter friends: Naivety
  • Less time cost: Only a server involved (unlike email which uses many bots)
  • Weak spam filtering
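A quick worked example of the reach formula above, with invented numbers chosen to land on the 0.13% figure:

```python
def effectiveness(clicks: int, tweets: int, followers_per_tweet: int) -> float:
    """Clickthrough effectiveness = clicks / reach, with reach R = T * F."""
    reach = tweets * followers_per_tweet
    return clicks / reach

# 10,000 spam tweets, each seen by 200 followers, drawing 2,600 clicks:
# 2,600 / 2,000,000 = 0.0013, i.e. 0.13%
eff = effectiveness(2_600, 10_000, 200)
```

Note the denominator counts exposures, not tweets: the same tweet shown to 200 followers contributes 200 to the reach.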

SLIDE 44

Two categories:

  • Career spamming account
  • Compromised account

 Career spamming accounts:

  • Created solely for promoting spam
  • 2 Tests to identify career spamming accounts out of all accounts
  • One that analyzes tweet timing

 “χ2 test on timestamp”

  • One that measures the entropy (=amount of useful information)

 “Tweet text & link entropy test”

SLIDE 45

“χ2 test on timestamp”

  • IDEA: Legitimate account tweets follow a uniform (Poisson) process
  • STEPS:

 Examine tweet timestamps for posting patterns (mins & secs)  Check if posting behavior fits a uniform distribution  If NOT: Likely an automation  Posts at standard times  Likely to be a career spam account

 “Tweet text & link entropy test”

  • IDEA: Users who consistently tweet same content  probably spammers
  • STEPS:

 Examine tweet history of each spam account  “Binning” of text & URLs (similar to hashing)  distribution  Cases of binning:  No repetitions = High Entropy  Important/new information  Probably not a spammer account  Strong repetitions = Low Entropy  Unimportant/old information  Probably a spammer account  Uniform repetitions = Average Entropy  No conclusions
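Both tests can be sketched in a few lines. The bin counts, thresholds, and example data below are illustrative assumptions, not the paper's exact parameters:

```python
import math
from collections import Counter

def chi2_uniform(values: list, bins: int = 60) -> float:
    """Chi-square statistic of values (e.g. tweet minutes-of-hour) against
    a uniform distribution over `bins` bins. A large statistic means 'not
    uniform', i.e. posting at standard times -> likely automated."""
    expected = len(values) / bins
    counts = Counter(v % bins for v in values)
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(bins))

def text_entropy(tweets: list) -> float:
    """Shannon entropy of the tweet-text distribution ('binning' here is
    simply grouping by exact text). Low entropy = strong repetition ->
    likely a career spammer."""
    counts = Counter(tweets)
    n = len(tweets)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

bot_minutes = [0, 0, 0, 30, 30, 30] * 10   # posts only at :00 and :30
human_minutes = list(range(60))             # spread across the hour
spam_texts = ["buy now http://x"] * 9 + ["hello"]
```

On this toy data the scheduled bot yields a large χ2 statistic while the evenly spread human yields zero, and the repetitive spammer's entropy sits far below that of an account with all-distinct tweets.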

SLIDE 46

 Running χ2 & Entropy tests

 RESULTS (for sampled accounts):

  • If only 2 spam messages in total  small spam activity
  • Rest of their tweets:

 harmless URLs OR  text without links  no “career” at spamming

  • Tweets posted at random intervals  no automations used
  • CONCLUSION:

 Majority of accounts tweeting spam URLs are not career spammers, but more likely, compromised accounts

 Spammers prefer to steal accounts & tweet from there, instead of creating accounts exclusively for running spam campaigns

SLIDE 47

 Compromised accounts:

  • Initially created by a legitimate user, but later stolen through fraud (e.g. phishing, malware, password guessing)
  • Benefits for spammers:

 Exploit user trust

 Use a victim’s reputation to promote spam to followers

 Avoid need of creating new account

 Build reputation & concentrate followers

  • Intentional VS. Unintentional:

 Sometimes, users mistakenly tweet URLs, unaware that they were spam  Non-career spammers tweet at least 20 spam URLs  Too much + too frequent to be “mistakes”  compromised account

  • Combating Account Compromising:

 Twitter detects it & notifies the legitimate owner  Owner identifies suspicious activity early & acts

SLIDE 48

 Automation tools (e.g. HootSuite, twitterfeed)

  • Pre-scheduling of tweets at specific intervals

 Third-Party Access applications

  • Make spamming possible, by using compromised accounts

 Typical Desktop Applications (e.g. Twitter application)

 Many spam tweets posted from Third-Party Access Applications

  • Strengthens findings:

 Most spammers use compromised accounts

SLIDE 49

 Spam Campaign:

  • Accounts with same spam-related goals

 Identification: post blacklisted pages in common

 Most usual setup: 1 Twitter Account + 1 Landing Page

 Clustering URLs into campaigns:

  • Importance: Finds pages and accounts participating in each campaign
  • Algorithm:

 Find all blacklisted landing pages posted by each account  Consider each account as an individual campaign  For each pair of different campaigns: check for at least 1 same page

 If exists: Cluster in the same campaign

 Repeat until: No other same pages found between each pair of campaigns  Return: pages & accounts for each campaign
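The algorithm above is transitive merging of per-account campaigns that share a landing page — which is what a disjoint-set (union-find) structure computes in one pass. A sketch with hypothetical account and page names:

```python
def cluster_campaigns(pages_by_account: dict) -> list:
    """Merge accounts into campaigns whenever they share a blacklisted
    landing page, transitively (union-find keyed by landing page)."""
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for account, pages in pages_by_account.items():
        for page in pages:
            union(account, "page:" + page)     # account joins each page's set

    clusters = {}
    for account in pages_by_account:
        clusters.setdefault(find(account), set()).add(account)
    return list(clusters.values())

# acct1 & acct2 share scam-a -> one campaign; acct3 stands alone
campaigns = cluster_campaigns({
    "acct1": {"scam-a"},
    "acct2": {"scam-a", "scam-b"},
    "acct3": {"scam-c"},
})
```

Keying the union on pages (with a "page:" prefix to avoid name clashes) gives the same fixed point as the slide's pairwise repeat-until-stable loop, without comparing every pair of campaigns.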

SLIDE 50

 Clustering Results:

 Observations:

  • More than 11% of campaigns  more than 1 account collaborating
  • More than 13% of campaigns  more than 1 landing page involved

Initial Pages  Final Clusters (figure labels)

SLIDE 51

 Interesting campaigns identified by clustering:

  • Phishing for followers:

 Service: Websites supposedly provide users with followers  Requirement: users give account credentials  Result: compromised accounts  Analysis:  21,284 accounts tweeted any of the 1,210 spam URLs  12 different domains  Users using the same hashtags  80% from each domain (Sub-campaigns)  χ2 & entropy tests results: 88% of accounts  compromised

  • Nested URL shortening

 Continuous shortening of already shortened URLs  “URL compression”  Identified by Clustering: Multiple links  finally clustered at the same landing page  Use of multiple redirector services:

 Crawler visited many shortening sites before reaching the landing scam page  8 different URL shortening sites found to have been used

 Unclear motivation: maybe to avoid spam filtering on shortening services
SLIDE 52

 Interesting campaigns identified by clustering:

  • Personalized mentions:

 One spam campaign (host: http://twitprize.com)  Clustering results: 1,850 accounts & 2,552 unique affiliate URLs involved  Targets victims by:  Personally mentioning the user  Adjusting URLs to personalize the greeting message  Promising prizes  Features:  URLs: shortened & customized to fit victim’s info  directly to landing page  99% of tweets: retweets or mentions & contain usernames  Personalization  Accounts successfully pass entropy tests  REASON: “personalization”  Different usernames in each tweet  NO repeated tweets = NO spam  Link customized for each user  NO repeated URLs = NO spam

SLIDE 53

 Interesting campaigns identified by clustering:

  • Buying Retweets:

 Services sell access to followers by retweeting messages  Not a scam, as a service BUT employed by spammers  55 accounts: re-tweeted both malware & scams  χ2 test results: 84% of accounts subscribed to the service  career spammers

  • Distributing malware:

 Clustering result: Largest malware distribution campaign found  113 accounts involved  57 unique URLs  Features:

 No large account base, following or number of tweets  Accounts actually belong to career spammers: No compromised accounts  Multiple redirects to mask the landing page:  Google Safebrowsing API, used by Twitter & URL shortening services, can filter pages of at most 2 hops

 (e.g. page1  page2  malware landing site)

 SOLUTION: Redirect through more pages to bypass filters

SLIDE 54

 Blacklist effectiveness: important!

 Twitter uses Google’s SafeBrowsing API

  • Flaw: Blacklists URLs upon posting

 Old URLs go undisturbed

 Examined Blacklist characteristics:

  • Delay
  • Evasion susceptibility
  • Limitations

SLIDE 55

 Blacklist Delay

  • Timestamps Analysis:

 Analyzed history timestamps for each tweet w/ blacklisted URLs  Measured delay between a tweet’s posting & blacklisting times  Problem: spam is active until blacklisted  Millions of users might click  blacklist failed to prevent

OBSERVATION: Most spam tweets appear on Twitter many days prior to being blacklisted

  • Clickthrough Analysis:

 Authors measured rate of clicks arrived for 20,000 random spam links

 Observations:

  • 80% of clicks  1st day of appearing on Twitter
  • 90% of clicks  within first 2 days

 Conclusions:

  • Spam links don’t get too old before being clicked
  • For effective Blacklisting in Twitter: Delay must be 0, to prevent clicking
slide-56
SLIDE 56

Blacklist klist Evasi sion

  • n
  • Blacklists rely on detecting reuse

se of spam m domai ains ns

 PROBLEM M IF: emails/tweets contained unique domains  URL shortening services  “mimic” new domains (w/ shortened URLs) with small cost

  • Redire

rects ts: threat to blacklists

 After a point of redirecting from site to site:  Spam links appear from non-blacklisted page, although landing page is blacklisted  Evades blacklisting  55% of blacklisted URLs cross a domain border  SOLUTION ION: Keep crawling until reaching landing page  blacklist it!

Domai main n Blacklist list Limitat tation ions

  • Some blacklists mark domains, not URLS (e.g. URIBL, Joewein)
  • PROBL

BLEM EM: False Positiv tives

 Entire domains blacklisted, although not affiliated with spam campaigns

  • REASON

ON: Many domains offer content uploading services

 Abused by spammers  Spam activity found in domains, BUT most accounts  legitimate users

  • SOLUTION

UTIONS: S:

 Remove sources of spam from domains  Use blacklists that go beyond domains (e.g. target full URLs)

SLIDE 57

1st ever study of spam on Twitter

  • Spam behavior, Clickthrough, Blacklisting Effectiveness
  • 400 million messages & 25 million URLs from public Twitter data

8% of unique Twitter links point to spam

Spam accounts:

  • 16% automated bots
  • 84% compromised accounts

Clustering Method: Easy to identify spam campaigns

Twitter spam: more successful than email  Clickthrough rate = 0.13%

Measuring delay before blacklisting URLs, we find that:

  • If blacklists were used by Twitter, they would still not protect all users
  • Extensive URL shortening masks spam URLs  renders blacklists ineffective
  • To improve defenses & solve the Redirection problem:

 Crawl URLs, keep following redirects until reaching the final landing page for blacklisting

  • Blacklist Delay is a problem
  • Retroactive blacklisting:

 Suspends accounts which spam for long periods  finds old activities  Forces spammers to get new accounts & followers  prohibitive due to large costs
