How risky is the software you use?
https://shmoo18.cyber-itl.org
Cyber Independent Testing Lab { Sarah Zatko , Tim Carstens , Patrick Stach , Parker Thompson , mudge } @ CITL
A non-profit organization based in the USA. Founded by Sarah
security by providing the public with accurate reporting on the security of popular software
https://www.consumerreports.org
& The Digital Standard
https://thedigitalstandard.org
[Figure labels: Hardened Gentoo, Ubuntu 16 LTS, Samsung UN55KS9000, LG 55UH8500]
                 Vizio    LG        Samsung     Ubuntu
                 P55-E1   49UJ7700  UN55KS9000  16.04
# binaries       504      1740      4243        4991
aslr             98%      67%       80%         100%
stack DEP        99%      99%*      99%*        99%
64 bit           0%       0%        0%          98%
RELRO            100%     4%        9%          96%
stack guards     68%      1%        57%         79%
fully fortified  7%       0%        6%          11%
partial fort     43%      1%        37%         42%
has good         3%       3%        25%         4%
has risky        68%      66%       67%         67%
has bad          28%      34%       23%         28%
has ick          3%       5%        5%          3%
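Each percentage in the table above is a share of binaries with a given property. The sketch below shows one way such shares could be aggregated from per-binary results. The `BinaryFeatures` record, its field names, and the sample data are hypothetical and only illustrate the arithmetic; they are not CITL's tooling or data.

```python
from dataclasses import dataclass


@dataclass
class BinaryFeatures:
    # Hypothetical per-binary record; field names are illustrative only.
    aslr: bool
    dep: bool
    relro: bool
    stack_guards: bool


FEATURE_NAMES = ["aslr", "dep", "relro", "stack_guards"]


def feature_rates(binaries):
    """Return the percentage of binaries that have each feature enabled."""
    if not binaries:
        return {}
    total = len(binaries)
    return {
        name: round(100 * sum(getattr(b, name) for b in binaries) / total)
        for name in FEATURE_NAMES
    }


# Made-up sample of four binaries, not measured data.
sample = [
    BinaryFeatures(aslr=True, dep=True, relro=True, stack_guards=False),
    BinaryFeatures(aslr=True, dep=True, relro=False, stack_guards=False),
    BinaryFeatures(aslr=False, dep=True, relro=False, stack_guards=True),
    BinaryFeatures(aslr=True, dep=True, relro=False, stack_guards=False),
]
rates = feature_rates(sample)
print(rates)
```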
1. Remain independent of vendor influence
2. Automated, comparable, quantitative analysis
3. Act as a user watchdog
vendors
Information Theory Perspective
Develop a heuristic which can efficiently answer (1) but not necessarily (2)
Years in the field give us a good starting point: look for the same things we'd look at when trying to pick a soft target to exploit. But the field does not yet know enough about the impact and effectiveness of best practices.
Browser            “Underground” Exploit Price
Microsoft Edge     $80,000
Google Chrome      $80,000
Apple Safari       $50,000
Mozilla Firefox    $30,000
software’s “security”
exploitable
heuristics
P(s is secure)
P_{h,k} = P(h units of fuzzing against s yields < k unique crashes)
P_{h,k}(M) = P(h units of fuzzing against s yields < k unique crashes | M is true of s)
Which M have P_{h,k}(M) > 0.5 for large log(h)/k?
Which indicators (M) can be used to predict fuzzing performance?
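The conditional probability above can be estimated empirically as a simple frequency: among the binaries for which indicator M holds, what fraction produced fewer than k unique crashes after the same fuzzing budget h? The sketch below assumes a hypothetical list of fuzzing results; the record layout, the indicator name, and the numbers are made up for illustration and are not CITL data.

```python
def estimate_p_hk(results, k, indicator):
    """Estimate P_{h,k}(M) as a frequency.

    `results` is assumed to hold one record per binary, all fuzzed with the
    same budget h. Returns None when no binary has the indicator.
    """
    with_m = [r for r in results if r["indicators"].get(indicator, False)]
    if not with_m:
        return None
    under_k = sum(1 for r in with_m if r["unique_crashes"] < k)
    return under_k / len(with_m)


# Made-up illustrative results, not measurements.
results = [
    {"indicators": {"stack_guards": True}, "unique_crashes": 0},
    {"indicators": {"stack_guards": True}, "unique_crashes": 1},
    {"indicators": {"stack_guards": True}, "unique_crashes": 5},
    {"indicators": {"stack_guards": True}, "unique_crashes": 0},
    {"indicators": {"stack_guards": False}, "unique_crashes": 7},
    {"indicators": {"stack_guards": False}, "unique_crashes": 0},
]

p = estimate_p_hk(results, k=2, indicator="stack_guards")
print(p)  # 3 of the 4 binaries with the indicator had fewer than 2 crashes
```

An indicator would be interesting under the criterion above when this estimate stays above 0.5 as the budget h grows relative to k.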
Indicators might not be causal, and that’s OK:
teams who ship reliable software
Want to find:
Look for:
(Moha112100 @ Wikipedia)
(Rob Lavinsky)
(Weinrich Minerals, Inc.)
[Timeline figure. Tools: Static (Prototype), Static (Extensible), AFL, CITL-fuzz, new fuzzer. Milestones: Today, First Data, First Reports, Final Model & Reports]
good usage of CFI
than OSX
worse than OSX’s
                 Ubuntu   Windows  OSX
                 16.04    10       10.13.1
64 bit           97%      66%      77%
aslr             100%     99%      100%
dep              99%      98%      100%
stack_guards     79%      40%      73%
fully fortified  11%      -        2%
partial fort     42%      -        33%
cfi              -        92%      -
good             4%       19%      29%
risky            67%      30%      60%
bad              28%      3%       24%
ick              3%       0%       2%
by a nose in static analysis
score yet
                       Chrome        Firefox  Opera
version                63.0.3239.13  57.0.4   50.0.2762.4
64bit                  100%          100%     100%
aslr                   100%          100%     100%
dep                    100%          100%     100%
relro                  86%           100%     11%
stack_guards           86%           87%      100%
partial fortification  29%           70%      56%
functions
  good                 12%           4%       22%
  risky                86%           91%      100%
  bad                  62%           61%      89%
scores
  5th %                35            64       43
  50th %               58            78       48
  95th %               71            86       65
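The score rows above summarize each browser's per-binary scores by their 5th, 50th, and 95th percentiles. The slides do not say which percentile definition is used, so the sketch below assumes the nearest-rank method; the function name and the score list are hypothetical and purely illustrative.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile (pct in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest-rank: smallest value with at least pct% of the data at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up per-binary scores for one application, not CITL data.
scores = [35, 40, 52, 58, 60, 61, 66, 70, 71, 74]
summary = {pct: nearest_rank_percentile(scores, pct) for pct in (5, 50, 95)}
print(summary)
```

Reporting three percentiles rather than a single mean keeps the weakest binaries visible, which matters if an attacker only needs one soft target.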
bit with ASLR, Stack DEP
guards and fortification
Heap protection flag
much
percentile for High Sierra (71)
                       Chrome        Firefox  Opera         Safari
version                63.0.3239.13  57.0.4   50.0.2762.45  11.0.1
count                  9             19       8             25
64bit                  89%           100%     100%          88%
aslr                   89%           100%     100%          100%
dep                    100%          100%     100%          100%
heap                   11%           0%       0%            0%
stack_guards           78%           95%      88%           68%
partial fortification  33%           47%      38%           4%
good                   33%           37%      25%           8%
risky                  89%           95%      100%          44%
bad                    44%           68%      38%           8%
scores
  5th %                33            43       38            24
  50th %               51            56       51            51
  95th %               63            71       63            64
a hair
the others
guards
neither, has more risky functions
               Chrome     Edge      Firefox  Opera
version        63.0.3239  41.16299  57.0.4   50.0.2762
count          31         7         31       16
64bit          62%        100%      94%      100%
dep            100%       100%      100%     100%
aslr           100%       100%      100%     100%
cfi            13%        100%      13%      38%
stack guards   94%        57%       61%      94%
functions
  good         0%         0%        3%       0%
  risky        9%         0%        16%      0%
  bad          9%         0%        0%       0%
scores
  5th %        23         44        7.5      44
  50th %       44         64        44       44
  95th %       64         64        44       64
                 OSX      OSX      OSX      OSX      total
                 10.10.5  10.11.6  10.12.6  10.13.1  change
# binaries       6449     6456     7017     6622
64bit            69%      71%      73%      77%      +8
aslr             99%      99%      100%*    100%*    +1
heap             5%       5%       4%       4%       -1
stack_guards     71%      71%      72%      73%      +2
good functions   27%      27%      27%      29%      +2
risky functions  62%      62%      60%      60%      -2
bad functions    25%      25%      24%      24%      -1
performance
Safari in OSX
                       10.10.5  10.11.6  10.12.6  10.13.1  total change
# binaries             9        13       22       25
64bit                  83%      92%      86%      88%      +5*
stack_guards           67%      69%      73%      68%      +1
partial fortification  0%       0%       0%       4%       +4
good functions         17%      15%      9%       8%       -9
risky                  50%      36%      38%      44%      -6*
bad                    0%       8%       5%       8%       +8
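The "total change" column in the two OSX tables above is the difference, in percentage points, between the newest release (10.13.1) and the oldest (10.10.5). A minimal sketch of that arithmetic follows, using a few rows copied from the tables; the function and variable names are illustrative.

```python
def total_change(series):
    """Percentage-point change from the first release to the last."""
    return series[-1] - series[0]


# Values per release: 10.10.5, 10.11.6, 10.12.6, 10.13.1 (from the tables above).
rows = {
    "OSX 64bit": [69, 71, 73, 77],
    "OSX risky functions": [62, 62, 60, 60],
    "Safari good functions": [17, 15, 9, 8],
    "Safari bad": [0, 8, 5, 8],
}
changes = {name: total_change(values) for name, values in rows.items()}

for name, delta in changes.items():
    print(f"{name}: {delta:+d}")
```

Note that an endpoint difference hides movement in between: Safari's risky share drops to 36% in 10.11.6 before climbing back to 44%.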
https://www.consumerreports.org
https://thedigitalstandard.org
compute the surrogate security scores at scale
already making surprising discoveries: see our recent talks at DEFCON/Blackhat
Make sure your software employs every exploit mitigation our TAB has ever heard of!
{ Sarah Zatko , Tim Carstens , Parker Thompson , Patrick Stach , mudge } @ CITL