Information Leakage - CS 161: Computer Security - Prof. Vern Paxson



SLIDE 1

Information Leakage

CS 161: Computer Security

Prof. Vern Paxson

TAs: Jethro Beekman, Mobin Javed, Antonio Lupher, Paul Pearce & Matthias Vallentin

http://inst.eecs.berkeley.edu/~cs161/

April 25, 2013

SLIDE 2

Announcements / Goals

  • HKN surveys at the end of next Thursday’s lecture (May 2nd)
  • Next Thursday’s lecture will be course review: flag what you’d like covered!
  • Today’s topic: information leakage
– Sneaky ways of communicating
– Sneaky ways of extracting information
– Privacy: ways in which sites track information about users

SLIDE 3

Covert Channels

  • Communication between two cooperating parties that uses a hidden (secret) channel
  • Goal: evade inspection by a reference monitor (“warden”)
– Warden doesn’t realize communication is possible
  • Main requirement is agreement between sender and receiver (established in advance)
  • Example: suppose (unprivileged) process A wants to send 128 bits of secret data to (unprivileged) process B …
– But can’t use pipes, sockets, signals, or shared memory; and can only read files, can’t write them

SLIDE 4

Covert Channels, con’t

  • Method #1: A syslog’s the data; B reads it via /var/log/…
  • Method #2: select 128 files in advance. A opens for read only those corresponding to 1-bits in the secret.
– B recovers the bit values by inspecting the access times on the files
  • Method #3: divide A’s running time up into 128 slots. A either runs CPU-bound or idle in a slot, depending on the corresponding bit in the secret. B monitors A’s CPU usage in each slot.
  • Method #4: suppose A can run 128 times. Each time it either exits after 2 seconds (0 bit) or after 5 seconds (1 bit).
  • Method #5: …
– There are zillions of Method #5’s!

SLIDE 5

Covert Channels, con’t

  • Defenses?
  • The #1 challenge is identifying the channels
– Can then prevent sender or receiver from accessing them
  • Some mechanisms can be very hard to completely remove
– E.g., duration of program execution
  • The fundamental issue is the covert channel’s capacity
– Bits (or bit-rate) that the adversary can obtain using it
  • Crucial for defenders to consider their threat model
– (also true for side channels, as we’ll discuss next)
  • Usual assumption is that Attacker Wins (can’t effectively stop communication, esp. if very low rate)

SLIDE 6

Side Channels

  • Inferring information meant to be hidden / private by exploiting how the system is structured
– Note: unlike for steganography & covert channels, here we do not assume a cooperating sender / receiver
  • Can be difficult to recognize, because system builders often “abstract away” seemingly irrelevant elements of system structure
  • Side channels can arise from physical structure …

SLIDE 7

SLIDE 8

Side Channels, con’t

  • Side channels can arise from physical structure …
– … or higher-layer abstractions

SLIDE 9

    /* Returns true if the password from the
     * user, 'p', matches the correct master
     * password. */
    bool check_password(char *p)
    {
        static char *master_pw = "T0p$eCRET";
        int i;

        for(i=0; p[i] && master_pw[i]; ++i)
            if(p[i] != master_pw[i])
                return FALSE;

        /* Ensure both strings are same len. */
        return p[i] == master_pw[i];
    }

Attacker knows code, but not this value

SLIDE 10

Inferring Password via Side Channel

  • Suppose the attacker’s code can call check_password many times (but not billions/trillions of times)
– But the attacker can’t breakpoint or inspect the code
  • How could the attacker infer the master password using side channel information?
  • Consider the layout of p in memory:

    wildGUe$s

    ...
    if(check_password(p))
        BINGO();
    ...

SLIDE 11

Spread p (wildGUe$s) across different memory pages, arranging for the page that holds the tail of the guess to be paged out.

If the master password doesn’t start with ‘w’, then the loop exits on the first iteration (i=0):

    for(i=0; p[i] && master_pw[i]; ++i)
        if(p[i] != master_pw[i])
            return FALSE;

If it does start with ‘w’, then the loop proceeds to the next iteration, generating a page fault that the caller can observe.

SLIDE 12

Ajunk....  →  no page fault
Bjunk....  →  no page fault
…
Tjunk....  →  page fault!

TAunk....  →  no page fault
TBunk....  →  no page fault
…
T0unk....  →  page fault!

T0Ank....  →  no page fault
…

T0p$eCRET ?

Fix?

SLIDE 13

    bool check_password2(char *p)
    {
        static char *master_pw = "T0p$eCRET";
        int i;
        bool is_correct = TRUE;

        for(i=0; p[i] && master_pw[i]; ++i)
            if(p[i] != master_pw[i])
                is_correct = FALSE;

        if(p[i] != master_pw[i])
            is_correct = FALSE;

        return is_correct;
    }

Note: still leaks length of master password

SLIDE 14

Exploiting Side Channels For Stealth Scanning

  • Can an attacker using system A scan victim V’s system to see what services V runs …
  • … without V being able to learn A’s IP address?
  • Seems impossible: how can A receive the results of the probes A sends to V, unless the probes include A’s IP address for V’s replies?

SLIDE 15

IP Header Side Channel

4-bit Version | 4-bit Header Length | 8-bit Type of Service (TOS) | 16-bit Total Length (Bytes)
16-bit Identification | 3-bit Flags | 13-bit Fragment Offset
8-bit Time to Live (TTL) | 8-bit Protocol | 16-bit Header Checksum
32-bit Source IP Address
32-bit Destination IP Address
Payload

The ID field is supposed to be unique per IP packet. One easy way to do this: increment it each time the system sends a new packet.

SLIDES 16–30

(Animation of the “idle scan”: the attacker first probes the patsy to sample its IP ID, then sends a spoofed SYN to the victim with the patsy’s address as the source. If the victim’s port is open, its SYN-ACK goes to the patsy, which answers with a RST and thereby consumes another IP ID. If the port is closed, the victim sends a RST; upon receiving a RST, the patsy ignores it and does nothing, per the TCP spec. Probing the patsy’s ID again tells the attacker which case occurred, so the attacker learns the result without the victim ever seeing the attacker’s address.)

UI Side Channel Snooping

  • Scenario: Ann the Attacker works in a building across the street from Victor the Victim. Late one night Ann can see Victor hard at work in his office, but can’t see his CRT display, just the glow of it on his face.
  • Can Ann still somehow snoop on what Victor’s display is showing?

SLIDE 31
SLIDE 32

CRT display is made up of an array of phosphor pixels

640x480 (say)

SLIDE 33

Electron gun sweeps across a row of pixels, illuminating each one that should be lit, one after the other

SLIDE 34

When done with row, proceeds to next. When done with screen, starts over.

SLIDE 35

Thus, if image isn’t changing, each pixel is periodically illuminated at its own unique time

SLIDE 36

Illumination is actually short-lived (100s of nsec).

SLIDE 37

So if Ann can synchronize a high-precision clock with when the beam starts up here …

SLIDE 38

Then by looking for changes in light level (flicker) matched with high-precision timing, she can tell whether say this pixel is on or off …

SLIDE 39

… or for that matter, the values of all of the pixels

SLIDE 40

Photomultiplier + high-precision timing + deconvolution to remove noise

SLIDE 41
SLIDE 42

Information Leakage via Inducing Faults

  • Suppose there’s a sealed black box that performs RSA decryption:
– X → Y, where Y = X^d mod N (N = pq)
  • The attacker gets access to the box and can play with it freely
– Knows N … but not d, p or q
– Can repeatedly feed it X’s and observe the corresponding Y’s
  • Suppose for efficiency the box computes X^d mod N using the Chinese Remainder Theorem (CRT)
– A number theory trick that’s faster than repeated exponentiation
– (Note: this is a common performance approach)

SLIDE 43

Inducing Faults, con’t

  • CRT works by first computing:
– y1 = (X mod p)^(d mod (p-1))
– y2 = (X mod q)^(d mod (q-1))
  • Given that, CRT provides a cheap function f so that for Y = f(y1, y2) we have:
– Y ≡ y1 mod p and Y ≡ y2 mod q
  • … and that gives us our goal, Y = X^d mod N
  • Suppose now the attacker repeatedly feeds the same X into the box, observing the resulting Y …
– … but can induce the box to sometimes glitch (causes one computation step to work incorrectly)

SLIDE 44

Inducing Faults, con’t

  • Assume the glitch induces a random fault
  • Most likely it occurs during the computation of either y1 = (X mod p)^(d mod (p-1)) or y2 = (X mod q)^(d mod (q-1))
  • The attacker can tell the glitch occurred, since they will observe the box produce Y' != Y
  • Suppose the glitch occurs when computing y1 …
  • Then Y' is incorrect mod p …
– … but correct mod q (since y2 is okay)

SLIDE 45

Inducing Faults, con’t

  • Attacker has Y' != Y mod p, Y' = Y mod q
– So Y-Y' is a multiple of q but not of p
  • Attacker computes Z = GCD(Y-Y', N) (fast!)
  • Z = ?
– Well, it must be either 1, p, q, or N (since N = pq)
– But Y-Y' is a multiple of q, so it’s either q or N
– But Y-Y' is not a multiple of p, so it’s q
  • Whoops!
– The attacker just factored N!
  • Fix?
– Box could check that Y^e mod N = X

SLIDE 46

Information Leakage: Tracking Web Usage

SLIDE 47

Tracking Your Web Surfing

  • The sites you visit learn:
– The URLs you’re interested in
  • Google/Bing also learn what you’re searching for
– Your IP address
  • Thus, your service provider & geo-location
  • Can often link you to other activity, including at other sites
– Your browser’s capabilities, which OS you run, which language you prefer
– Which URL you looked at that took you there
  • Via the “Referer” header
SLIDE 48

Tracking Your Web Surfing, con’t

  • Oh, and also cookies.
  • Cookies = state that a server tells your browser to store locally
– Name/value pair, plus expiration date
  • The browser returns the state any time you visit the same site
  • Where’s the harm in that? And are these used much anyway?
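Concretely, cookies ride on ordinary HTTP headers. In this hypothetical exchange (the site, cookie name, value, and dates are all made up for illustration), the server sets state in a response and the browser echoes it back on every later request to that site:

```
HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: PREF=ID=1a2b3c4d5e6f; expires=Fri, 25-Oct-2013 00:00:00 GMT; path=/; domain=.example.com

... some time later, a new request to the same site ...

GET /search?q=private+browsing HTTP/1.1
Host: www.example.com
Cookie: PREF=ID=1a2b3c4d5e6f
```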

SLIDE 49

Let’s remove all of our cookies
SLIDE 50

Cool, no web site is tracking us …

SLIDE 51

We do a search on “private browsing”

SLIDE 52
SLIDE 53

Google has stored a couple of cookies on our system
SLIDE 54

Goodness knows what info they decided to put in the cookie

SLIDE 55

But it lasts for 6 months …

SLIDE 56

We click on the top result

SLIDE 57

Note that this mode is privacy from your family, not from web sites!

SLIDE 58

Ironically, we’ve gained a bunch of cookies in the process

SLIDE 59

This one sticks around for two years.

SLIDE 60

How did YouTube enter the picture??

SLIDE 61

YouTube is remembering the version of Flash I’m running … for the next 10 years!

SLIDE 62

We navigate to The New York Times …

SLIDE 63
SLIDE 64

What a lot of yummy cookies!

SLIDE 65

Here are the ones from the website itself …

SLIDE 66

This one tracks the details of my system & browser
SLIDE 67

This one tracks my IP address

SLIDE 68

doubleclick.net - who’s that? And how did it get there from visiting www.nytimes.com?

SLIDE 69

Third-Party Cookies

  • How can a web site enable a third party to plant cookies in your browser & later retrieve them?
– Answer: using a “web bug”
– Include on the site’s page (for example):

    <img src="http://doubleclick.net/ad.gif" width=1 height=1>

  • Why would a site do that?
– The site has a business relationship w/ DoubleClick*
– Now DoubleClick sees all of your activity that involves their web sites (each of them includes the web bug)
  • Because your browser dutifully sends them their cookies for any web page that has that web bug
  • The identifier in the cookie ties the activity together as = YOU

* Owned by Google, by the way

SLIDE 70

Remember this 2-year Mozilla cookie?

SLIDE 71

Google Analytics

  • Any web site can (anonymously) register with Google to instrument their site for analytics
– Gather information about who visits and what they do when they visit
  • To do so, the site adds a small Javascript snippet that loads http://www.google-analytics.com/ga.js
– You can see sites that do this because they introduce a "__utma" cookie
  • The code ships off to Google information associated with your visit to the web site
– Shipped by fetching a GIF w/ values encoded in the URL
– The web site can use it to analyze their ad “campaigns”
– Not a small amount of info …

SLIDE 72
SLIDE 73

Values Reportable via Google Analytics

SLIDE 74

Still More Tracking Techniques …

  • Any scenario where browsers execute programs that manage persistent state can support tracking by cookies
– Such as … Flash?

SLIDE 75

My browser had Flash cookies from 67 sites!

Sure, this is where you’d think to look to analyze what Flash cookies are stored on your machine …

Some Flash cookies “respawn” regular browser cookies that you previously deleted!

SLIDE 76

Tracking - What’s the Big Deal?

  • Cookies etc. form the core of how Internet advertising works today
– Without them, arguably you’d have to pay for content up front a lot more
  • (and payment would mean you’d lose anonymity anyway)
– A “better ad experience” is not necessarily bad
  • Ads that reflect your interests; not seeing repeated ads
  • But: the ease of gathering so much data so easily ⇒ concern over losing control of how it’s used
– Content shared with friends doesn’t just stay with friends …
SLIDE 77

When You Interview, They Know What You’ve Posted

SLIDE 78
SLIDE 79

Privacy - What’s the Big Deal?

(Repeats the previous “Tracking - What’s the Big Deal?” slide, adding:)
– You really don’t have a good sense of just what you’re giving away …