 
              System Security: From Discovery to Innovation XiaoFeng Wang James H. Rudy Professor Indiana University at Bloomington xw7@Indiana.edu http://www.informatics.indiana.edu/xw7/
System Security Research  Inherently Interdisciplinary and Multi-dimensional
Follow the Tech Trends
System Security Research  Inherently Interdisciplinary and Multi-dimensional  Discovery-driven, utility-centric
Sources for Security Innovations  Software security  E.g., memory attack, jump to libraries => ASLR  Mobile security  Malware infection => app sandbox + app store vetting  OS security  E.g., OS-level attacks => TEE (such as Intel’s SGX)  Network security  E.g., DDoS attacks => SYN cookies, combined detection and blocking (e.g., AWS Shield)  Browser security  E.g., Cross-origin attacks (such as XSS) => Chrome’s site isolation  Data privacy  Inference attack => Differential privacy, as integrated in iOS  Others:  Side channels on mobile systems => closing procfs on Android; data perturbation on iOS  Credential attacks => multi-factor authentication …
Destructive Research Security research needs wreckers !!! Build a better system Fix the cracks Understand the cracks and the fundamentality Find the cracks Wreck “secure” systems
How to Innovate in Security Research  Follow the technical tide  Current: Mobile security  Emerging: IoT/CPS security  Future: ML Security, Genome Privacy  Understanding new technologies  Finding weaknesses  Finding utilities and constraints  Asking big questions  Fundamental causes of the problem?  How to do better (under the constraints)?
Examples: Destructive Research on Mobile and IoT Security CCS’13, Oakland’15, NDSS’14, 15, CCS’17
NO Bugs in apps NO implementation flaws in system What can a zero-permission app still learn?
Android Public Resource Usability Application Framework Adversary Goals Model Public APIs (Audio Usage, CPU Usage, Running application list) Linux Kernel Public files (procfs, sysfs)
Finding Your Location  Zero-permission app monitoring /proc/net/arp  Deliver BSSID through browser  Adversary-controlled web server
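A minimal sketch of the monitoring step, assuming the standard six-column layout of /proc/net/arp and assuming that the access point also acts as the gateway, so that its hardware address in the ARP table doubles as the BSSID. The sample table, the function names, and the default device name `wlan0` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Sample text in the /proc/net/arp layout; all addresses are made up.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:11:22:33:44:55     *        wlan0
192.168.1.7      0x1         0x0         00:00:00:00:00:00     *        wlan0
"""


def parse_arp_table(text: str) -> List[Dict[str, str]]:
    """Parse ARP table text into one dict per entry, skipping the header row."""
    entries = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore blank or malformed rows
        ip, _hw_type, flags, mac, _mask, device = fields
        entries.append({"ip": ip, "flags": flags, "mac": mac.lower(), "device": device})
    return entries


def candidate_bssids(text: str, device: str = "wlan0") -> List[str]:
    """Return hardware addresses of completed entries (flag bit 0x2) on one device."""
    return [
        entry["mac"]
        for entry in parse_arp_table(text)
        if entry["device"] == device and int(entry["flags"], 16) & 0x2
    ]


if __name__ == "__main__":
    # On a real Linux system the text would come from open("/proc/net/arp").read().
    print(candidate_bssids(SAMPLE_ARP))
```

Reading the file needs no permission, which is the point of the slide: the sensitive value leaks through a public resource rather than through a bug.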
Why is BSSID Sensitive?  [Figure: BSSIDs mapped to GPS locations through a BSSID-to-GPS dataset]
Coverage
Evaluation
Another Example: Identity Inference  Per-app mobile data usage: yet another piece of public data Tweet 580-720B Download 541-544B
Attack  [Figure: timeline of Timestamp1 to Timestamp5; overlapping sets of people who tweeted at Timestamp1±60s, Timestamp2±60s and Timestamp3±60s]
Identity Recovery  Manual analysis of approx. 4000 Twitter accounts  First and last name 79%  Location 32%  Bio 21%
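The attack can be sketched as two steps: spot tweet events in the public per-app data usage counter, then intersect the sets of accounts that tweeted within 60 seconds of each event. The 580-720 byte range and the ±60s window come from the slides; the sample data, the account names, and the function names are hypothetical, and treating the counter as a list of (timestamp, cumulative bytes) samples is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

TWEET_MIN_BYTES, TWEET_MAX_BYTES = 580, 720  # usage increment of one tweet (from the slides)
WINDOW_SECONDS = 60                          # matching window around each event


def detect_tweet_times(samples: List[Tuple[int, int]]) -> List[int]:
    """Return timestamps where the cumulative usage grew by a tweet-sized amount."""
    times = []
    for (_, prev_bytes), (cur_time, cur_bytes) in zip(samples, samples[1:]):
        if TWEET_MIN_BYTES <= cur_bytes - prev_bytes <= TWEET_MAX_BYTES:
            times.append(cur_time)
    return times


def candidates_at(public_tweets: Dict[str, List[int]], event_time: int) -> Set[str]:
    """Accounts with at least one public tweet within the window of event_time."""
    return {
        user
        for user, stamps in public_tweets.items()
        if any(abs(stamp - event_time) <= WINDOW_SECONDS for stamp in stamps)
    }


def infer_identity(public_tweets: Dict[str, List[int]], tweet_times: List[int]) -> Set[str]:
    """Intersect the candidate sets of all observed tweet events."""
    remaining: Optional[Set[str]] = None
    for event_time in tweet_times:
        here = candidates_at(public_tweets, event_time)
        remaining = here if remaining is None else remaining & here
    return remaining or set()


if __name__ == "__main__":
    usage = [(0, 1000), (100, 1650), (200, 1660), (500, 2300)]  # hypothetical counter samples
    tweets = {"alice": [95, 520], "bob": [110, 900], "carol": [480]}  # hypothetical accounts
    print(infer_identity(tweets, detect_tweet_times(usage)))
```

Each additional observed tweet shrinks the candidate set, which is why a handful of timestamps can be enough to single out one account.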
Why Identity is Important
Other Findings  Your health/financial information  Mobile data usage of Yahoo! Finance and WebMD  Your driving routes  Monitor the speaker status (on or off) when running Navigator  Stealthiness  Monitor running apps  Send data through browser when LCD is off
Our Solution  A new policy enforcement framework  Each app can specify the permissions for disclosing its mobile data usage  Four settings: NO_Access, Rounding, Aggregation and NO_Protection  Enforced by Android framework  Rounding: round the usage to a multiple of a fixed size (e.g., 256B)  Aggregation: release the total usage every hour, day or week
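A rough sketch of the Rounding and Aggregation settings, to show why they blunt the attack. The 256-byte unit and the hourly period come from the slide; rounding down rather than to the nearest multiple is an assumption, and the function names and sample values are hypothetical. This is not the Android framework's enforcement code.

```python
from typing import Dict, List, Tuple

ROUND_UNIT = 256  # bytes; the example unit from the slide


def round_usage(usage_bytes: int, unit: int = ROUND_UNIT) -> int:
    """Rounding: release usage rounded down to a multiple of unit (direction assumed)."""
    return (usage_bytes // unit) * unit


def aggregate_usage(events: List[Tuple[int, int]], period_seconds: int = 3600) -> Dict[int, int]:
    """Aggregation: release only the total per period, keyed by the period start time."""
    totals: Dict[int, int] = {}
    for timestamp, nbytes in events:
        period_start = (timestamp // period_seconds) * period_seconds
        totals[period_start] = totals.get(period_start, 0) + nbytes
    return totals


if __name__ == "__main__":
    # Every increment in the 580-720 byte tweet range is reported as the same value.
    print(round_usage(580), round_usage(720))
    # Per-event timing disappears; only hourly totals remain.
    print(aggregate_usage([(10, 600), (3599, 50), (3600, 700)]))
```

Rounding hides the exact size that fingerprints an action, while Aggregation hides the timing that lets an observer correlate it with public events.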
App Guardian  Demo: http://sit.soic.indiana.edu/en/2015/09/11/app-guardian-oarland/  App: https://play.google.com/store/apps/details?id=edu.iub.seclab.appguardian
IoT Devices  What you know  What are new
Sensitive Data  Those medical devices are in FDA-approved Category II  In the same category as X-ray machines, infusion pumps, …  The data they collect are highly sensitive  But can Android protect them?
What Goes Wrong here?  Android is not designed to protect its external devices  No device-app authentication ⇒ misbinding threat
Our Solution: SEACAT Policy DAC Policy Manager Service Manager BT stack Fast Resource-Type Cache AVC DAC MAC Policy Module
Security by Construction: What is the problem and How to make it work
What We Learned
What needs to be done  Communication  Find out whether expected protection has been provided by the system  Challenges: limited documentation, default assumptions, etc.  Evolution  Individualize policy settings for apps with different protection demands  How to make this happen is a million-dollar question
A Step Further: Automate Security Analysis  Security requirements, utility constraints?  Attacker’s resources, information?  Vulnerability discovery in complicated systems?
Towards Data-Driven, Intelligent Security  Automatic understanding of the system  Knowledge discovery from documents  Automatic building of system model  Automatic determination of security requirements  Automatic analysis of the adversary  Cyber threat intelligence gathering and analysis  Intelligent vulnerability discovery  Knowledge-driven system analysis
A Baby Step: Semantics-based Fuzzing
Toward Automated Vulnerability Discovery  First an easier problem: Can we recover a known vulnerability automatically?  Why important?  Patching delay => Attack Window  Security Implications of Public Bug Information
Why Hard?  Complicated bugs  cannot be patched by adding a check  a whole chunk of code is replaced  difficult to formalize how the patch works  Limitations of symbolic execution and constraint solving  path explosion  limited formula solving capability
How About Auxiliary Information?  Various sources of vulnerability information  How experienced attackers benefit from auxiliary information?  Question: is it possible to automate this process?
Semantics-Driven Fuzzing  Basic idea: Retrieve Guide SemFuzz Exploits  Target program: Linux kernel 4.0+  Information sources: CVE reports, Linux git logs  Results: 16 vulnerability types beyond input validation; 18 successful exploits; 2 unknown vulnerabilities
Guidance for CVE-2017-6347
Workflow Stage 1 Stage 2
Retrieving Critical Variables  Symbol Table (Type Name): struct sock sk; int offset; struct sk_buff skb; unsigned int len; …...  Parse Tree
Retrieving System Calls  Identifying system call names is insufficient  match syscall name: MSG_MORE UDP loopback ==> syscall: socket, sendto  Building a knowledge base  goal: keywords in descriptions ==> system call and parameter values  source: Linux Programmer's Manual (LPM)  result: 1082 LPM pages, 373 system calls, 2000+ keywords
MSG_MORE ==> sendto(flags = MSG_MORE)  loopback ==> sendto(dest_addr = {INADDR_LOOPBACK})  UDP ==> socket(socket_type = SOCK_DGRAM)  Resulting calls: r0 = socket(AF_INET, SOCK_DGRAM, 0)  sendto(r0, ..., MSG_MORE, {INADDR_LOOPBACK}, …)
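The keyword lookup can be sketched as a small table from description keywords to a system call, a parameter, and a value. The three entries mirror the mappings on the slide; the table layout, the substring matching, the function name, and the sample description are assumptions for illustration and are not SemFuzz's actual knowledge base or the real CVE text.

```python
from typing import Dict, Tuple

# Hypothetical miniature knowledge base: keyword -> (system call, parameter, value).
KNOWLEDGE_BASE: Dict[str, Tuple[str, str, str]] = {
    "MSG_MORE": ("sendto", "flags", "MSG_MORE"),
    "loopback": ("sendto", "dest_addr", "INADDR_LOOPBACK"),
    "UDP": ("socket", "socket_type", "SOCK_DGRAM"),
}


def extract_guidance(description: str) -> Dict[str, Dict[str, str]]:
    """Map keywords found in a bug description to per-syscall parameter hints."""
    guidance: Dict[str, Dict[str, str]] = {}
    lowered = description.lower()
    for keyword, (syscall, param, value) in KNOWLEDGE_BASE.items():
        if keyword.lower() in lowered:  # naive case-insensitive substring match
            guidance.setdefault(syscall, {})[param] = value
    return guidance


if __name__ == "__main__":
    # Made-up description text containing the three keywords from the slide.
    text = "Crash when the MSG_MORE flag is used on a UDP socket over loopback."
    for syscall, params in extract_guidance(text).items():
        print(syscall, params)
```

The output of such a lookup is a set of constraints on the fuzzer's seed calls, so mutation starts from inputs that are already close to the vulnerable path.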
Effectiveness of Semantics-based Fuzzing  Result  16% (18/112) trigger the target vulnerability  49% (46/94) reach the vulnerable functions  20% (19/94) reach the patched basic blocks  Zero-day vulnerability  found when fuzzing CVE-2016-4794  new vulnerability appears around the known flaws  reported and confirmed  Undisclosed vulnerability  found when fuzzing CVE-2016-3841  similar problems inside equivalent components  patched before we reported, but no reports disclosed
Performance  Trigger vulnerability  count: 18 (SemFuzz) vs. 7 (Syzkaller)  time: 13.2h (SemFuzz) vs. 33.9h (Syzkaller)  Reach vulnerable functions  count: 18 (SemFuzz) vs. 14 (Syzkaller)  time: 1.8h (SemFuzz) vs. 5.2h (Syzkaller)
Future of System Security Research
Where Technologies Go, Opportunities Follow  Machine Learning and Security  Adversarial learning => secure ML  Inference attacks on ML models => privacy-preserving ML  Security in Smart Things and CPS  Smart-home/smart-city security  Industrial control security  Smart grid security  Biomedical Data Privacy  Genomic data privacy (www.humangenomeprivacy.org)  Other Omics privacy  Others (e.g., blockchain)
Riding the New Tech Wave  Data-centric, Intelligent Security  NLP-enhanced protection (e.g., CTI gathering, analysis)  AI (ML/reasoning) based protection (e.g., Intelligent CTF)  Hardware enhanced protection  Scalable TEE-based protection
Moving Forward  Learn  Understand it, analyze it and crack it  Think  Ask BIG questions, seek deep insight  Do  Protect what needs to be protected  Build what will be used
Data-Centric Intelligent Security