SLIDE 1
MadDroid: Characterizing and Detecting Devious Ad Contents for Android Apps
2020 International World Wide Web Conference (WWW '20)
SLIDE 2 MOTIVATION
The perspective of the mobile advertisers themselves, who provide ad content and pay ad networks, has rarely been studied.
[Diagram: ADVERTISER, AD NETWORK, USER; edge labels: Pay, Distribute, View]
SLIDE 3 AD CONTENT TYPE
- 1. Redirection link to a landing page
- 2. Deep link that switches the current page to the Google Play Store
- 3. Automatic download of a file
Landing Page 952634.cn
SLIDE 4
CONTENT TYPE
A click-deceptive Image
SLIDE 5
5 CATEGORIES OF DEVIOUS AD CONTENT
A click-deceptive Image
SLIDE 6
SLIDE 7 TCM: TRAFFIC COLLECTION MODULE
- Challenge: Not all traffic is ad traffic
- Solution: Click on the Main UI and Exit UI [61], and click on WebView, ImageView, and ViewFlipper views
- Implementation: Breadth-first search (BFS) of the view tree, selecting views based on their attributes
Figure 5: An example of a view tree
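The BFS step can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `View` node (class name plus children) rather than the real Android view hierarchy, and it matches views by class-name suffix only; the attribute checks MadDroid actually applies are not given on the slide.

```python
from collections import deque
from dataclasses import dataclass, field

# View types the slide lists as likely ad containers.
AD_VIEW_CLASSES = ("WebView", "ImageView", "ViewFlipper")


@dataclass
class View:
    """Hypothetical, simplified node of a view tree."""
    cls: str
    children: list = field(default_factory=list)


def find_ad_candidates(root):
    """Traverse the view tree breadth-first and collect candidate ad views."""
    found = []
    queue = deque([root])
    while queue:
        view = queue.popleft()
        # str.endswith accepts a tuple of suffixes.
        if view.cls.endswith(AD_VIEW_CLASSES):
            found.append(view)
        queue.extend(view.children)
    return found


root = View("android.widget.FrameLayout", [
    View("android.widget.LinearLayout", [View("android.webkit.WebView")]),
    View("android.widget.ImageView"),
    View("android.widget.TextView"),
])
candidates = [v.cls for v in find_ad_candidates(root)]
```

Because the traversal is level by level, shallow candidates are returned before deeper ones; a click driver would then click each returned view in turn.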
SLIDE 8 CONTENT EXTRACT MODULE
- Purpose: Get images and executable scripts
- Method: Fiddler + HTTP function hook
- Challenge 1: How to determine whether a domain is an ad domain?
- Challenge 2: For a given set of ad libs, the domain may change.
- Input: Hosts and ad libs
- Output: Host-lib mapping
- Solution: Iteratively find libs
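One way to read "iteratively find libs" is as a fixed-point expansion over the (host, lib) pairs recorded by the HTTP hook. The sketch below is an assumption about how such an iteration could work, not the paper's algorithm; the function name, the seed parameters, and the example hosts and libs are all hypothetical.

```python
def map_hosts_to_libs(requests, seed_libs=(), seed_hosts=()):
    """Expand seed ad libs/hosts until no new ones are found.

    requests: (host, lib) pairs, i.e. which library issued a request
    to which host. Returns {ad_host: set of libs that contacted it}.
    """
    requests = list(requests)  # iterated several times below
    ad_libs = set(seed_libs)
    ad_hosts = set(seed_hosts)
    changed = True
    while changed:
        changed = False
        for host, lib in requests:
            if lib in ad_libs and host not in ad_hosts:
                ad_hosts.add(host)   # a known ad lib contacted a new host
                changed = True
            elif host in ad_hosts and lib not in ad_libs:
                ad_libs.add(lib)     # a new lib contacted a known ad host
                changed = True
    mapping = {}
    for host, lib in requests:
        if host in ad_hosts:
            mapping.setdefault(host, set()).add(lib)
    return mapping


observed = [
    ("ads.example.com", "com.adlib.a"),
    ("ads.example.com", "com.adlib.b"),
    ("cdn.example.net", "com.adlib.b"),
    ("api.example.org", "com.app.core"),
]
mapping = map_hosts_to_libs(observed, seed_libs={"com.adlib.a"})
```

A naive expansion like this can over-approximate: a general-purpose networking library that contacts one ad host would pull all of its other hosts into the mapping, so a real implementation would need some filtering.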
SLIDE 9 DEVIOUSNESS DETECTION MODULE
- 5 categories -> 5 dedicated parts
- Click-deceptive image: Object recognition
- Censored image: Google API
- Gambling: OCR
- Malicious app, script, or redirection link: Online antivirus platform
SLIDE 10
EVALUATION: RESEARCH QUESTIONS
SLIDE 11
RQ1: Can MadDroid detect devious mobile ad contents?
SLIDE 12
RQ1: Can MadDroid detect devious mobile ad contents?
SLIDE 13
RQ1: Can MadDroid detect devious mobile ad contents?
SLIDE 14
RQ1: Can MadDroid detect devious mobile ad contents?
SLIDE 15
RQ2: How effective is the HTTP hooking approach (in the CEM module) in locating ad traffic from general network traffic?
- Host Only: Input is the host name
- Lib Only: Input is the lib
- Host&Lib: Input is both the host name and the lib
SLIDE 16
RQ3: ACCURACY
- Click-deceptive image: 97.51% recall, 97.99% accuracy
- Censored image: 100%
- The rest: not specified
SLIDE 17
THANK YOU
SLIDE 18 COMPARISON: TCM: TRAFFIC COLLECTION MODULE
- Purpose: Generate Ad Traffic only
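- Method: BFS, based on view attributes
My framework: Collect traffic for 20 s after launch
- Is there sufficient theoretical justification for doing this?
- Does traffic also need to be collected in other ways?
- Generalization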
SLIDE 19 COMPARISON: CONTENT EXTRACT MODULE
- Purpose: Get images and executable scripts
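- Method: Fiddler + HTTP function hook
My framework:
- text/html: gambling text
- Obtain a mapping between buyer domains and the function names they use
- The mapping can be used to study the campaign ecosystem; what else can it be used for?
- Producer - libraries - domain (IP)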
SLIDE 20 COMPARISON: DEVIOUSNESS DETECTION MODULE
- 5 categories -> 5 dedicated parts
- Click-deceptive image: Object recognition
- Censored image: Google API
- Malicious app, script, or redirection link: Online antivirus platform
- Gambling: OCR
My framework:
- Detect gambling text with keywords; OCR can be used for gambling images
- NLP is probably unnecessary
- Discover new keywords, following Tsinghua Duan
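The keyword-based check for gambling text can be sketched as follows. The keyword list is a hypothetical placeholder (the notes do not give one), and matching is plain case-insensitive substring search; text extracted from images by OCR could be passed through the same function.

```python
# Hypothetical seed keywords; a real list would be curated and extended.
GAMBLING_KEYWORDS = ("casino", "jackpot", "betting", "lottery")


def find_gambling_keywords(text, keywords=GAMBLING_KEYWORDS):
    """Return the keywords that occur in the text, case-insensitively."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def is_gambling_text(text, min_hits=1):
    """Flag text that contains at least min_hits distinct keywords."""
    return len(find_gambling_keywords(text)) >= min_hits


hits = find_gambling_keywords("Win the JACKPOT at our online casino!")
flagged = is_gambling_text("Weather forecast for today")
```

Substring matching is deliberately crude: it would also match "betting" inside "abetting", so word-boundary matching or a higher `min_hits` may be needed to limit false positives.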
SLIDE 21
EXPERIMENT
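- JSBundle: 2
- Pop-up: 18
- Embedded: 17
- Normal: 3
- HTML: WebKit, CFNetwork
Finding:
- Almost all normal apps show the pattern in which the server returns a URL and the app then visits that URL.
- All devious apps show this pattern as well.
- This pattern alone therefore cannot be used to tell them apart.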
SLIDE 22
TO-DO
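- Use dynamic debugging to observe what the functions called by the apps have in common
- New challenges on iOS:
- Contradiction: a large number of apps load their content through WebView, so it is impossible to tell whether an app is gray-market
- Parallel: similar share of WebView use
- Third-party libraries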