SLIDE 1 Information Leakage
CS 161: Computer Security
TAs: Jethro Beekman, Mobin Javed, Antonio Lupher, Paul Pearce & Matthias Vallentin
http://inst.eecs.berkeley.edu/~cs161/
April 25, 2013
SLIDE 2 Announcements / Goals
- HKN surveys at the end of next
Thursday’s lecture (May 2nd)
- Next Thursday’s lecture will be course
review: flag what you’d like covered!
- Today’s topic: information leakage
– Sneaky ways of communicating
– Sneaky ways of extracting information
– Privacy: ways in which sites track information about users
SLIDE 3 Covert Channels
- Communication between two cooperating
parties that uses a hidden (secret) channel
- Goal: evade inspection by a reference
monitor (“warden”)
– Warden doesn’t realize communication is possible
- Main requirement is agreement between sender
and receiver (established in advance)
- Example: suppose (unprivileged) process A wants
to send 128 bits of secret data to (unprivileged) process B …
– But can’t use pipes, sockets, signals, or shared memory; and can only read files, can’t write them
SLIDE 4 Covert Channels, con’t
- Method #1: A syslog’s data, B reads via /var/log/…
- Method #2: select 128 files in advance. A opens for
read only those corresponding to 1-bits in the secret.
– B recovers bit values by inspecting access times on files
- Method #3: divide A’s running time up into 128
slots. A either runs CPU-bound or idle in a slot,
depending on the corresponding bit in the secret. B monitors A’s CPU usage in each slot.
- Method #4: Suppose A can run 128 times. Each
time it either exits after 2 seconds (0 bit) or after 5 seconds (1 bit).
– There are zillions of Method #5’s!
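Method #2 can be sketched as a small simulation. Since real access-time updates are often disabled (noatime/relatime mounts), this toy model tracks atimes explicitly rather than touching a real filesystem; all names here are illustrative:

```python
import time

class FSModel:
    """Toy stand-in for the filesystem's access-time bookkeeping.
    (A real attack would read st_atime, but many systems mount with
    noatime/relatime, so this demo just models the mechanism.)"""
    def __init__(self, names):
        self.atime = {n: 0.0 for n in names}

    def open_for_read(self, name):
        self.atime[name] = time.time()  # reading bumps the access time

def send_bits(fs, files, bits):
    # Sender A: open file i for reading iff secret bit i is 1
    for name, bit in zip(files, bits):
        if bit:
            fs.open_for_read(name)

def recv_bits(fs, files, baseline):
    # Receiver B: a bumped access time decodes as a 1-bit
    return [1 if fs.atime[n] > baseline[n] else 0 for n in files]

files = [f"f{i}" for i in range(8)]   # 8 bits for brevity (slide uses 128)
fs = FSModel(files)
baseline = dict(fs.atime)             # B records access times in advance
secret = [1, 0, 1, 1, 0, 0, 1, 0]
send_bits(fs, files, secret)
assert recv_bits(fs, files, baseline) == secret
```

The only coordination needed, as the slide notes, is agreeing in advance on the 128 files and the encoding.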
SLIDE 5 Covert Channels, con’t
- Defenses?
- #1 challenge is identifying the channels
– Can then prevent sender or receiver from accessing them
- Some mechanisms can be very hard to completely
remove
– E.g., duration of program execution
- Fundamental issue is the covert channel’s capacity
– Bits (or bit-rate) that adversary can obtain using it
- Crucial for defenders to consider their threat model
– (also true for Side Channels as we’ll discuss next)
- Usual assumption is that Attacker Wins (can’t
effectively stop communication, esp. if very low rate)
SLIDE 6 Side Channels
- Inferring information meant to be hidden /
private by exploiting how system is structured
– Note: unlike for steganography & covert channels, here we do not assume a cooperating sender / receiver
- Can be difficult to recognize because often
system builders “abstract away” seemingly irrelevant elements of system structure
- Side channels can arise from physical
structure …
SLIDE 7
SLIDE 8 Side Channels
- Inferring information meant to be hidden /
private by exploiting how system is structured
– Note: unlike for steganography & covert channels, here we do not assume a cooperating sender / receiver
- Can be difficult to recognize because often
system builders “abstract away” seemingly irrelevant elements of system structure
- Side channels can arise from physical
structure …
– … or higher-layer abstractions
SLIDE 9
/* Returns true if the password from the
 * user, 'p', matches the correct master
 * password. */
bool check_password(char *p)
{
    static char *master_pw = "T0p$eCRET";
    int i;

    for(i=0; p[i] && master_pw[i]; ++i)
        if(p[i] != master_pw[i])
            return FALSE;

    /* Ensure both strings are same len. */
    return p[i] == master_pw[i];
}
Attacker knows the code, but not this value (the master password)
SLIDE 10 Inferring Password via Side Channel
- Suppose the attacker’s code can call
check_password many times (but not billions/trillions)
– But attacker can’t breakpoint or inspect the code
- How could the attacker infer the master
password using side channel information?
- Consider layout of p in memory:
p = "wildGUe$s"

...
if(check_password(p))
    BINGO();
...
SLIDE 11 Spread p across different memory pages: keep the first
character ('w') resident, with the rest ("ildGUe$s") on a following
page. Arrange for that second page to be paged out.
If the master password doesn’t start with ‘w’, then the loop exits on the first iteration (i=0):

for(i=0; p[i] && master_pw[i]; ++i)
    if(p[i] != master_pw[i])
        return FALSE;

If it does start with ‘w’, then the loop proceeds to the next iteration, touching p[1] and generating a page fault that the caller can observe
SLIDE 12
Ajunk....   No page fault
Bjunk....   No page fault
…
Tjunk....   Page fault!
TAunk....   No page fault
TBunk....   No page fault
…
T0unk....   Page fault!
T0Ank....   No page fault
…
T0p$eCRET ?
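The guess-by-guess search above can be sketched as a simulation (names are hypothetical; the page-fault observation is modeled as an oracle that tells the attacker how many characters were compared before the loop exited):

```python
import string

def make_oracle(master_pw):
    # Models the page-fault channel: reveals how many loop iterations ran,
    # i.e. the length of the guess's matching prefix
    def oracle(guess):
        i = 0
        while i < len(guess) and i < len(master_pw) and guess[i] == master_pw[i]:
            i += 1
        return i
    return oracle

def recover(oracle, alphabet, max_len=64):
    known = ""
    while len(known) < max_len:
        for ch in alphabet:
            # A correct next character pushes the comparison one step further
            if oracle(known + ch) > len(known):
                known += ch
                break
        else:
            return known  # nothing extends the prefix: password fully recovered
    return known

oracle = make_oracle("T0p$eCRET")
alphabet = string.ascii_letters + string.digits + string.punctuation
assert recover(oracle, alphabet) == "T0p$eCRET"
```

Note the cost: linear in password length times alphabet size, which is why "many times (but not billions/trillions)" of calls suffice.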
Fix?
SLIDE 13
bool check_password2(char *p)
{
    static char *master_pw = "T0p$eCRET";
    int i;
    bool is_correct = TRUE;

    for(i=0; p[i] && master_pw[i]; ++i)
        if(p[i] != master_pw[i])
            is_correct = FALSE;

    /* Ensure both strings are same len. */
    if(p[i] != master_pw[i])
        is_correct = FALSE;

    return is_correct;
}
Note: still leaks length of master password
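check_password2 removes the early exit so the comparison time no longer depends on where the first mismatch falls. In Python, a comparable standard-library hedge is hmac.compare_digest (which, like check_password2, can still leak a length mismatch):

```python
import hmac

MASTER_PW = b"T0p$eCRET"  # stand-in for the secret master password

def check_password(p: bytes) -> bool:
    # compare_digest examines every byte regardless of where the first
    # mismatch occurs, so no prefix information leaks through timing
    # (whether the lengths differ is still detectable)
    return hmac.compare_digest(p, MASTER_PW)

assert check_password(b"T0p$eCRET")
assert not check_password(b"wildGUe$s")
```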
SLIDE 14 Exploiting Side Channels For Stealth Scanning
- Can attacker using system A scan victim V’s
system to see what services V runs …
- … without V being able to learn A’s IP
address?
- Seems impossible: how can A receive the
results of probes A sends to V, unless probes include A’s IP address for V’s replies?
SLIDE 15 IP Header Side Channel
4-bit Version | 4-bit Header Length | 8-bit Type of Service (TOS) | 16-bit Total Length (Bytes)
16-bit Identification | 3-bit Flags | 13-bit Fragment Offset
8-bit Time to Live (TTL) | 8-bit Protocol | 16-bit Header Checksum
32-bit Source IP Address
32-bit Destination IP Address
Payload
ID field is supposed to be unique per IP packet. One easy way to do this: increment it each time system sends a new packet.
SLIDE 23 SYN-ACK
Upon receiving RST, Patsy ignores it and does nothing, per TCP spec.
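This is the classic "idle scan": A bounces probes off an idle "patsy" host and reads the victim's answer out of the patsy's IP ID counter, so V only ever sees the patsy's address. A minimal simulation of the logic (class and method names made up for illustration):

```python
class Patsy:
    """Idle host whose IP ID counter increments with every packet it sends."""
    def __init__(self):
        self.ipid = 1000

    def on_syn_ack(self):
        # An unexpected SYN-ACK makes the patsy reply with a RST,
        # consuming (and revealing) one IP ID value
        self.ipid += 1
        return self.ipid

    def on_rst(self):
        pass  # a stray RST is silently ignored, per the TCP spec

class Victim:
    def __init__(self, open_ports):
        self.open_ports = set(open_ports)

    def receive_spoofed_syn(self, patsy, port):
        if port in self.open_ports:
            patsy.on_syn_ack()  # open port: SYN-ACK goes to the patsy's address
        else:
            patsy.on_rst()      # closed port: RST goes to the patsy's address

def idle_scan(patsy, victim, port):
    before = patsy.on_syn_ack()              # A probes patsy, reads its IP ID
    victim.receive_spoofed_syn(patsy, port)  # SYN spoofed from patsy's address
    after = patsy.on_syn_ack()               # A probes patsy again
    return after - before == 2               # extra increment => port open

patsy, victim = Patsy(), Victim(open_ports={80})
assert idle_scan(patsy, victim, 80) is True
assert idle_scan(patsy, victim, 25) is False
```

A delta of 2 means the patsy sent an extra packet in between: the RST it fired at the victim's SYN-ACK, i.e. the port is open.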
SLIDE 30 UI Side Channel Snooping
- Scenario: Ann the Attacker works in a
building across the street from Victor the
Victim. Late one night Ann can see Victor
hard at work in his office, but can’t see his CRT display, just the glow of it on his face.
- Can Ann still somehow snoop on what
Victor’s display is showing?
SLIDE 31
SLIDE 32
CRT display is made up of an array of phosphor pixels
640x480 (say)
SLIDE 33 Electron gun sweeps across a row
of pixels, illuminating each that
should be lit, one after the other
SLIDE 34
When done with row, proceeds to next. When done with screen, starts over.
SLIDE 35
Thus, if image isn’t changing, each pixel is periodically illuminated at its own unique time
SLIDE 36
Illumination is actually short-lived (100s of nsec).
SLIDE 37
So if Ann can synchronize a high-precision clock with when the beam starts up here …
SLIDE 38
Then by looking for changes in light level (flicker) matched with high-precision timing, she can tell whether say this pixel is on or off …
SLIDE 39
… or for that matter, the values of all of the pixels
SLIDE 40
Photomultiplier + high-precision timing + deconvolution to remove noise
SLIDE 41
SLIDE 42 Information Leakage via Inducing Faults
- Suppose there’s a sealed black box that performs
RSA decryption:
– X → (box) → Y, where Y = X^d mod N (N = pq)
- Attacker gets access to box, can play with it freely
– Knows N … but not d, p, or q
– Can repeatedly feed it X’s, observe corresponding Y’s
- Suppose for efficiency the box computes X^d mod N
using the Chinese Remainder Theorem (CRT)
– Number theory trick that’s faster than directly exponentiating mod N
– (Note, this is a common performance approach)
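A sketch of the CRT computation with textbook-sized toy parameters (p = 61, q = 53; real deployments use primes hundreds of digits long):

```python
# Toy RSA parameters (textbook-sized; illustrative only)
p, q = 61, 53
N = p * q              # 3233
e = 17
d = 2753               # satisfies e*d = 1 (mod (p-1)(q-1))
q_inv = pow(q, -1, p)  # q^{-1} mod p (Python 3.8+)

def crt_decrypt(X):
    # The two "half-size" exponentiations
    y1 = pow(X, d % (p - 1), p)   # = (X mod p)^(d mod (p-1)) mod p
    y2 = pow(X, d % (q - 1), q)   # = (X mod q)^(d mod (q-1)) mod q
    # Cheap recombination f(y1, y2): Garner's formula
    h = (q_inv * (y1 - y2)) % p
    return (y2 + h * q) % N       # Y = y1 (mod p), Y = y2 (mod q)

# Matches the straightforward computation of X^d mod N
X = 65
assert crt_decrypt(X) == pow(X, d, N)
```

The speedup at real key sizes comes from doing two exponentiations with half-length moduli and exponents instead of one full-length exponentiation.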
SLIDE 43 Inducing Faults, con’t
- CRT works by first computing:
– y1 = (X mod p)^(d mod (p-1)) mod p
– y2 = (X mod q)^(d mod (q-1)) mod q
- Given that, CRT provides a cheap function f
so that for Y = f(y1, y2) we have:
– Y ≡ y1 (mod p); Y ≡ y2 (mod q)
- … and that gives us our goal, Y = X^d mod N
- Suppose now attacker repeatedly feeds the
same X into the box, observing resulting Y …
– … but can induce the box to sometimes glitch (causes one computation step to work incorrectly)
SLIDE 44 Inducing Faults, con’t
- Assume glitch induces a random fault
- Most likely it occurs during computation of
either y1 = (X mod p)^(d mod (p-1)) mod p
or y2 = (X mod q)^(d mod (q-1)) mod q
- Attacker can tell a glitch occurred, since they will
observe the box produce Y' != Y
- Suppose glitch occurs when computing y1 …
- Then Y' is incorrect mod p …
– … but correct mod q (since y2 okay)
SLIDE 45 Inducing Faults, con’t
- Attacker has Y' != Y mod p, Y' = Y mod q
– Y-Y' is a multiple of q but not p
- Attacker computes Z = GCD(Y-Y', N) (fast!)
- Z = ?
– Well, must be either 1, p, q, or N (since N = pq)
– But Y-Y' is a multiple of q, so it’s either q or N
– But Y-Y' is not a multiple of p, so it’s q
– Attacker just factored N!
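The full fault attack fits in a few lines with the same toy parameters as above (hypothetical small primes; the gcd step stays fast even at real key sizes):

```python
from math import gcd

# Toy RSA parameters (illustrative only)
p, q = 61, 53
N, d = p * q, 2753
q_inv = pow(q, -1, p)  # q^{-1} mod p (Python 3.8+)

def crt_combine(y1, y2):
    h = (q_inv * (y1 - y2)) % p
    return (y2 + h * q) % N

def decrypt(X, glitch=False):
    y1 = pow(X, d % (p - 1), p)
    y2 = pow(X, d % (q - 1), q)
    if glitch:
        y1 = (y1 + 1) % p  # model a random fault while computing y1
    return crt_combine(y1, y2)

X = 42
Y = decrypt(X)                   # correct answer
Y_bad = decrypt(X, glitch=True)  # wrong mod p, still correct mod q
Z = gcd(abs(Y - Y_bad), N)       # Y - Y_bad is a multiple of q but not p
assert Z == q and N // Z == p    # N factored!
```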
– Defense: box could check that Y^e mod N = X before outputting Y
SLIDE 46
Information Leakage: Tracking Web Usage
SLIDE 47 Tracking Your Web Surfing
- The sites you visit learn:
– The URLs you’re interested in
- Google/Bing also learns what you’re searching for
– Your IP address
- Thus, your service provider & geo-location
- Can often link you to other activity including at other
sites
– Your browser’s capabilities, which OS you run, which language you prefer
– Which URL you looked at that took you there
SLIDE 48 Tracking Your Web Surfing, con’t
- Oh and also cookies.
- Cookies = state that server tells browser to
store locally
– Name/value pair, plus expiration date
- Browser returns the state any time visiting
the same site
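The name/value-plus-attributes structure can be poked at with Python's http.cookies module (the cookie name, value, and domain here are invented):

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header, shaped like the ones in the screenshots
cookie = SimpleCookie()
cookie.load('PREF=abc123; Path=/; Domain=.example.com; Max-Age=15552000')

morsel = cookie['PREF']                    # one name/value pair ("morsel")
assert morsel.value == 'abc123'
assert morsel['domain'] == '.example.com'  # returned to any *.example.com page
assert morsel['max-age'] == '15552000'     # in seconds: roughly six months
```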
- Where’s the harm in that?
And are these used much anyway?
SLIDE 49 Let’s remove all of our cookies …
SLIDE 50
Cool, no web site is tracking us …
SLIDE 51
We do a search on “private browsing”
SLIDE 52
SLIDE 53 Google has stored a couple of cookies
SLIDE 54
Goodness knows what info they decided to put in the cookie
SLIDE 55
But it lasts for 6 months …
SLIDE 56
We click on the top result
SLIDE 57
Note that this mode is privacy from your family, not from web sites!
SLIDE 58
Ironically, we’ve gained a bunch of cookies in the process
SLIDE 59
This one sticks around for two years.
SLIDE 60
How did YouTube enter the picture??
SLIDE 61
YouTube is remembering the version of Flash I’m running … for the next 10 years!
SLIDE 62
We navigate to The New York Times …
SLIDE 63
SLIDE 64
What a lot of yummy cookies!
SLIDE 65
Here are the ones from the website itself …
SLIDE 66 This one tracks the details
SLIDE 67
This one tracks my IP address
SLIDE 68
doubleclick.net - who’s that? And how did it get there from visiting www.nytimes.com?
SLIDE 69 Third-Party Cookies
- How can a web site enable a third party to plant
cookies in your browser & later retrieve them?
– Answer: using a “web bug”
– Include on the site’s page (for example):
<img src="http://doubleclick.net/ad.gif" width=1 height=1>
- Why would a site do that?
– Site has a business relationship w/ DoubleClick
– Now DoubleClick sees all of your activity that involves their web sites (each of them includes the web bug)
- Because your browser dutifully sends them their cookies for
any web page that has that web bug
- Identifier in cookie ties together activity as = YOU*
* Owned by Google, by the way
SLIDE 70
Remember this 2-year Mozilla cookie?
SLIDE 71 Google Analytics
- Any web site can (anonymously) register with
Google to instrument their site for analytics
– Gather information about who visits, what they do when they visit
- To do so, site adds a small Javascript snippet
that loads http://www.google-analytics.com/ga.js
– You can see sites that do this because they introduce a "__utma" cookie
- Code ships off to Google information associated
with your visit to the web site
– Shipped by fetching a GIF w/ values encoded in URL
– Web site can use it to analyze their ad “campaigns”
– Not a small amount of info …
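The "GIF with values encoded in the URL" mechanism looks roughly like this sketch (parameter names follow the classic ga.js __utm.gif beacon; all values are invented):

```python
from urllib.parse import urlencode

# Hypothetical report values; real beacons carry many more fields
params = {
    "utmwv": "5.3.8",            # tracker version
    "utmhn": "www.example.com",  # hostname of the instrumented site
    "utmp": "/some/page",        # page being viewed
    "utmac": "UA-000000-1",      # the site's analytics account ID
}
beacon = "http://www.google-analytics.com/__utm.gif?" + urlencode(params)
assert beacon.startswith("http://www.google-analytics.com/__utm.gif?")
assert "utmp=%2Fsome%2Fpage" in beacon
```

The browser fetches this 1x1 GIF as a side effect of the Javascript snippet, shipping the visit details to Google.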
SLIDE 72
SLIDE 73
Values Reportable via Google Analytics
SLIDE 74 Still More Tracking Techniques …
- Any scenario where browsers execute
programs that manage persistent state can support tracking by cookies
– Such as … Flash?
SLIDE 75 My browser had Flash cookies from 67 sites! Sure, this is where you’d think to look to analyze what Flash cookies are stored on your machine
Some Flash cookies “respawn” regular browser cookies that you previously deleted!
SLIDE 76 Tracking - What’s the Big Deal?
- Cookies etc. form the core of how Internet
advertising works today
– Without them, arguably you’d have to pay for content up front a lot more
- (and payment would mean you’d lose anonymity anyway)
– A “better ad experience” is not necessarily bad
- Ads that reflect your interests; not seeing repeated ads
- But: the ease of gathering so much data ⇒
concern about losing control of how it’s used
– Content shared with friends doesn’t just stay with friends …
SLIDE 77
When you interview, they Know What You’ve Posted
SLIDE 78
SLIDE 79 Privacy - What’s the Big Deal?
- Cookies etc. form the core of how Internet
advertising works today
– Without them, arguably you’d have to pay for content up front a lot more
- (and payment would mean you’d lose anonymity anyway)
– A “better ad experience” is not necessarily bad
- Ads that reflect your interests; not seeing repeated ads
- But: the ease of gathering so much data ⇒
concern about losing control of how it’s used
– Content shared with friends doesn’t just stay with friends … – You really don’t have a good sense of just what you’re giving away …