Protecting Privacy by Spying on Users Andrew Patrick & - - PDF document



SLIDE 1

Protecting Privacy by Spying on Users

Andrew Patrick

& Information Security Group

(Larry Korba, Ronggong Song, George Yee, Scott Buffett, Yunli Wang, Liqiang Geng, Steve Marsh, Hongyu Liu)

SLIDE 2

data breaches caused by laptop theft, hacks

SLIDE 3

Insider activities

SLIDE 4

Privacy laws in Canada exist at the federal and provincial levels. There are no laws yet about informing the public of data breaches.

SLIDE 5

U.S. Privacy Regulations

Numerous state laws require disclosure of data breaches. California just vetoed a law that would have made it an offense to retain credit card data. Minnesota is the only state that has such a law.

SLIDE 6

Solutions: searches, traffic monitoring, encryption

SLIDE 7

Social Network Analysis

Research using Social Network Analysis: “A few years back, we were conducting a Social Network Analysis (SNA) in one of IBM's global operating units with the goal to improve overall collaboration among the geographically dispersed teams that made up a world-wide organization. Next we colored the nodes in the network according to their departmental membership and asked InFlow to arrange the network based on the actual links. We were looking for the emergent organization – how work was really done – what the real structure of the organization was. Figure 1 below shows us how work was really accomplished in the organization. Two nodes/people are linked if they both confirm that they exchange information and resources to get their jobs done. Each department involved in the study received a different color node.”

See http://www.orgnet.com/emergent.html

SLIDE 8

Social Network Analysis and Terrorism

Social Network Analysis has proven to be useful for understanding group behavior, communication patterns, etc. This example is a post-hoc analysis of relationships among the terrorists involved with the 9/11 attacks on the US.

Valdis E. Krebs, "Uncloaking Terrorist Networks", First Monday, volume 7, number 4 (April 2002). URL: http://firstmonday.org/issues/issue7_4/krebs/index.html

Image from http://www.orgnet.com/tnet.html

SLIDE 9

SNA & E-mail Anomalies

  • A. J. O'Donnell, W. C. Mankowski, and J. Abrahamson. Using E-Mail Social Network Analysis for Detecting Unauthorized Accounts. In Conference on Email and Anti-Spam (CEAS), Mountain View, CA, July 2006.

SLIDE 10

[Diagram labels, grouped by stage]

Policy labels: Privacy Policies, Prescribed Workflow, Security Policies, Access Control

Data Collection

  • Resources: Databases, Applications, Files
  • Activities: Users, Clients
  • Context: Operation, User

Analysis

  • Text: Personally-Identifiable Data Discovery, Context, Semantics, Other Analysis
  • Metadata: Social Network Analysis, Policy Analysis

Display

  • Audit Results: Dashboard, Meters
  • Log Interpretation: Search Tools, Correlation, Time, Data, Activity
  • Social Network Analysis: Workflow
  • Non-Compliance Highlighting

Action

  • Reflexive: Prevention
  • Prescriptive: Warnings
  • Feedback: Workflow, Security, Privacy Controls

Social Networks Applied to Privacy (SNAP)

Schematic of the SNAP architecture.

SLIDE 11

[Diagram labels: Knowledge Level; Network; and, repeated for each agent: Raw System Calls, Preprocessing, Local Context Discovery, PII Data Discovery, Correlation Discovery, File System]

SNAP Agents

Schematic of the SNAP agents and the levels of operation.

SLIDE 12

SNAP Prototype

Screen capture of the first crude interface prototypes. Notice the SNA diagram showing two people sharing some documents in common.

SLIDE 13

Enron

While building SNAP, we also want to explore and demonstrate the value of SNA for privacy protection. So we looked for other examples of interesting social behavior related to security and privacy, and found Enron…

Enron filed for bankruptcy protection in the Southern District of New York in late 2001 and selected Weil, Gotshal & Manges as their bankruptcy counsel. Enron employed around 21,000 people (McLean & Elkind, 2003) and was one of the world's leading electricity, natural gas, pulp and paper, and communications companies, with claimed revenues of $111 billion in 2000. Fortune named Enron "America's Most Innovative Company" for six consecutive years. It achieved infamy at the end of 2001, when it was revealed that its reported financial condition was sustained mostly by institutionalized, systematic, and creatively planned accounting fraud. Enron has since become a popular symbol of willful corporate fraud and corruption. (Wikipedia)

SLIDE 14

Enron email corpus is very popular for SNA studies. SNA is an exploratory tool whose goal is to detect and interpret patterns of social ties among people. http://jheer.org/enron/v1/

SLIDE 15

Method

  • 517,431 email messages
  • headers parsed for From, To, Date …
  • alias substitution
  • cleaning duplicates left 250,641 unique messages from 31,718 email addresses
  • 63% of addresses only appear once
  • dataset scanned for password-related patterns:
      – “password: *”
      – “password is *”
  • clean obvious non-passwords:
      – “case” (case sensitive)
      – “your” (your birthday)

We developed our own methods to clean the data.

Enron Email Dataset (http://www.cs.cmu.edu/~enron/): This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
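The cleaning and scanning steps listed on this slide could look roughly like the sketch below. It is an illustration under stated assumptions, not the code used in the study: the `ALIASES` table, the deduplication key, the regular expression, and all function names are hypothetical. Only the two search patterns and the two excluded words come from the slide.

```python
import re
from email import message_from_string

# Hypothetical alias table: maps alternate addresses to one canonical address.
ALIASES = {"jdoe@enron.com": "john.doe@enron.com"}

# The two patterns named on the slide: "password: *" and "password is *".
PASSWORD_RE = re.compile(r"password(?::| is)\s*(\S+)", re.IGNORECASE)

# Obvious non-passwords named on the slide.
NON_PASSWORDS = {"case", "your"}


def parse_message(raw):
    """Parse the From, To and Date headers plus the body of one raw message."""
    msg = message_from_string(raw)
    sender = msg.get("From", "").strip().lower()
    return {
        "from": ALIASES.get(sender, sender),
        "to": msg.get("To", "").strip().lower(),
        "date": msg.get("Date", "").strip(),
        "body": msg.get_payload(),
    }


def drop_duplicates(messages):
    """Keep the first copy of each message (assumed key: all four fields)."""
    seen, unique = set(), []
    for m in messages:
        key = (m["from"], m["to"], m["date"], m["body"])
        if key not in seen:
            seen.add(key)
            unique.append(m)
    return unique


def find_passwords(body):
    """Return candidate passwords, skipping the obvious non-passwords."""
    candidates = (c.strip(".,;\"'") for c in PASSWORD_RE.findall(body))
    return [c for c in candidates if c and c.lower() not in NON_PASSWORDS]


# Made-up message for illustration only.
RAW = (
    "From: jdoe@enron.com\n"
    "To: new.hire@enron.com\n"
    "Date: Mon, 16 Jul 2001 09:00:00 -0500\n"
    "\n"
    "Your password is WELCOME! and the password is case sensitive.\n"
)
messages = drop_duplicates([parse_message(RAW), parse_message(RAW)])
hits = find_passwords(messages[0]["body"])
```

In this made-up example the two identical copies collapse to one message, and the phrase "password is case sensitive" is discarded by the non-password filter, leaving only the real candidate.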

SLIDE 16

Password Sharing

  • 642 messages from 500 different addresses contain passwords

  • 418 addresses appear only once
  • 500 different connections (arcs)
  • density = 0.2%

We ended up with 642 unique instances of password sharing. The network has 500 addresses (people) and 500 connections. It is a very sparse network, containing only 0.2% of the connections that are possible.
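The 0.2% figure can be checked with a short calculation. The sketch below assumes the network is treated as directed, so that n addresses allow n × (n − 1) possible arcs. The slide does not say which convention was used, but the directed one reproduces the reported value; the function name is illustrative.

```python
def directed_density(num_nodes, num_arcs):
    """Share of the possible directed connections that actually exist."""
    possible_arcs = num_nodes * (num_nodes - 1)
    return num_arcs / possible_arcs


# Figures from the slide: 500 addresses and 500 connections (arcs).
density = directed_density(500, 500)
print(f"{density:.4%}")  # prints 0.2004%
```

Counting each pair only once (an undirected network) would halve the number of possible connections and give roughly 0.4% instead.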

SLIDE 17

Internal and External

This diagram shows the overall structure of the password sharing network. Blue nodes are internal, red nodes are external.
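The slide does not say how nodes were classified as internal or external. One simple rule, shown here purely as an assumption, is to treat any address in the enron.com domain as internal. The function names and sample addresses are made up for illustration.

```python
def is_internal(address, internal_domain="enron.com"):
    """Assumed rule: an address is internal if it belongs to the company domain."""
    domain = address.rsplit("@", 1)[-1].strip().lower()
    return domain == internal_domain or domain.endswith("." + internal_domain)


def node_color(address):
    """Colors as used in the diagram: blue for internal, red for external."""
    return "blue" if is_internal(address) else "red"


# Made-up addresses for illustration only.
colors = {a: node_color(a) for a in ["jane.doe@enron.com", "jane.doe@aol.com"]}
```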

SLIDE 18

Password Factions

This analysis breaks the network into factions based on the number of links between nodes. Nodes with strong links are placed within the same faction. Each color represents a different faction in the network. The black nodes are a residual faction that represents nodes that are relatively unconnected, often involving pairs of people. There is fairly good separation between the blue and red factions, suggesting there are two fairly distinct groups involved with password sharing that we should investigate.

SLIDE 19

Core and Periphery

This analysis uses different colors to represent each connected portion of the network. It is clear that there is a main network of red nodes that is relatively well connected, and dozens of small portions that are isolated. It is also clear, again, that there are two central areas in the core network.
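Splitting a network into its connected portions can be sketched with a breadth-first search. This is a generic illustration, not the tool used for the diagram: arc direction is ignored, and the arcs and names below are made up.

```python
from collections import defaultdict, deque


def connected_components(arcs):
    """Group nodes into connected portions, ignoring arc direction."""
    neighbors = defaultdict(set)
    for sender, recipient in arcs:
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)

    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    # Largest first, so components[0] is the main (core) network.
    return sorted(components, key=len, reverse=True)


# Made-up arcs for illustration only.
arcs = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "d"),
        ("x", "y"), ("p", "q"), ("q", "r")]
components = connected_components(arcs)
core, isolated = components[0], components[1:]
```

Here `core` holds the five nodes of the largest portion, and `isolated` holds the smaller portions that have no link to it.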

SLIDE 20

[Figure: core network diagram; labeled nodes: 1 and 12]

This is an analysis of the core, connected network. There are two fairly distinct components to this network centered around two key people.

Node 12 is a senior person in EnronOnline responsible for creating new accounts

  • sent out passwords 74 times
  • used only 20 different passwords
  • some passwords used frequently
  • q#9M#npX = 30 times to 30 recipients
  • WELCOME! = 20 times to 15 recipients

Node 1 is the “performance management” system

  • passwords sent 62 times
  • WELCOME used 39 times, announcements about new round of evaluations
  • other passwords were random strings, e.g., KTDVWCCH, automated message reminding

Node 25 is a senior manager of research

  • sent passwords 28 times
  • 14 times sent to his aol address, 5 times sent from his aol address
  • mostly self-sharing of third-party account information, e.g., subscriptions
  • password sharing of third party account with internal colleague
  • sent password and install instructions for software to colleague multiple times
  • July 16 2001, web access to Outlook mailboxes announced
  • about half of self-sharing occurred after this date
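Per-node figures like these (times sent, number of different passwords, reuse per password) can be tallied from a list of sharing records. The sketch below assumes each record is a (sender, recipient, password) tuple. That layout, the function name, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def sender_profile(shares, sender):
    """Summarise one sender's password messages."""
    sent = [(recipient, pw) for s, recipient, pw in shares if s == sender]
    uses = Counter(pw for _, pw in sent)
    recipients = defaultdict(set)
    for recipient, pw in sent:
        recipients[pw].add(recipient)
    return {
        "times_sent": len(sent),
        "distinct_passwords": len(uses),
        # password -> (times used, number of different recipients)
        "reuse": {pw: (n, len(recipients[pw])) for pw, n in uses.most_common()},
    }


# Made-up records: (sender node, recipient node, password).
shares = [
    (12, 301, "WELCOME!"),
    (12, 302, "WELCOME!"),
    (12, 301, "WELCOME!"),
    (12, 303, "x7Kq2#pL"),
    (1, 304, "WELCOME"),
]
profile = sender_profile(shares, 12)
```

For the made-up node 12 records, the profile reports four sends, two different passwords, and one password used three times across two recipients.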

SLIDE 21

Sharing Anomalies

This display highlights partitions of the network that are isolated and larger than 3 nodes. These might represent anomalies in sharing behavior…
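The selection rule described here (isolated, and larger than 3 nodes) can be sketched as a filter over the connected portions of the network. The node numbers for the four cases are taken from the arc listings on the case slides that follow; the core and the isolated pair are made-up stand-ins, and the function name is illustrative.

```python
def sharing_anomalies(components, larger_than=3):
    """Return the isolated components with more than `larger_than` nodes."""
    if not components:
        return []
    ordered = sorted(components, key=len, reverse=True)
    isolated = ordered[1:]  # everything except the largest (core) component
    return [c for c in isolated if len(c) > larger_than]


components = [
    set(range(1000, 1020)),                     # stand-in for the core (made up)
    {117, 118, 206, 246},                       # Case 1
    {68, 69, 315, 465},                         # Case 2
    {109, 110, 116, 156, 253, 265, 273, 495},   # Case 3
    {44, 45, 288, 489},                         # Case 4
    {7001, 7002},                               # isolated pair (made up)
]
anomalies = sharing_anomalies(components)
```

With this input the four case groups are returned, while the core and the isolated pair are left out.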

SLIDE 22

Case 1

Case 1: purple nodes at 11 o'clock

  • password protected memos and documents shared internally

1: 117.118 val=2.0000
1: 117.206 val=1.0000
1: 117.246 val=1.0000

SLIDE 23

Case 2

Case 2: red nodes at 8 o'clock

  • login and password shared to diagnose system problem
  • sharing of accounts for airline reservations (3 times)
  • informing new employee of id number and password (birth date in YYYYMMDD format)

1: 465.68 val=1.0000
1: 68.69 val=3.0000
1: 68.315 val=1.0000

SLIDE 24

Case 3

Case 3: green nodes at 12 o'clock

Employee has a folder named "pswds"

  • assignment of id/password for 3rd party service
  • external – password for external service
  • external – receipt for auto parts purchase
  • external – receipt for conference registration
  • external – account for auto loan application
  • from personal MSN account to Enron account – account for discussion forum about trucks
  * sharing of id/password for EnronOnline with colleague, possible policy violation

1: 109.110 val=1.0000
1: 116.110 val=1.0000
1: 156.110 val=1.0000
1: 253.110 val=1.0000
1: 265.110 val=1.0000
1: 495.110 val=1.0000
1: 110.273 val=1.0000

SLIDE 25

Case 4

Case 4: blue nodes at 9 o'clock

  • external – account for third party service
  * internal sharing of accounts for 3 third party services
  • sending account information for third party service to personal address (4 times)

1: 288.44 val=1.0000
1: 45.44 val=1.0000
1: 44.45 val=1.0000
1: 44.489 val=4.0000
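The arc listings on the case slides can be turned into weighted edges with a small parser. The sketch assumes each entry reads "source.target val=count", with the value giving the number of messages along that arc. That reading is an interpretation, although it agrees with the counts mentioned in the bullets (for example, val=4.0000 on arc 44.489). The meaning of the leading "1:" is not stated on the slides and is ignored here. The function name and pattern are illustrative.

```python
import re

# Assumed entry format: "<source>.<target> val=<count>".
ARC_RE = re.compile(r"(\d+)\.(\d+)\s+val=(\d+(?:\.\d+)?)")


def parse_arcs(listing):
    """Return (source, target, weight) tuples found in an arc listing."""
    return [(int(s), int(t), float(v)) for s, t, v in ARC_RE.findall(listing)]


# Arc listing copied from the Case 4 slide.
case4 = "1: 288.44 val=1.0000 1: 45.44 val=1.0000 1: 44.45 val=1.0000 1: 44.489 val=4.0000"
arcs = parse_arcs(case4)
heaviest = max(arcs, key=lambda arc: arc[2])
```

In the Case 4 listing the heaviest arc runs from node 44 to node 489 with a weight of 4.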

SLIDE 26

Conclusions

  • SNA useful for…
      – identifying key creators/sharers of personal information (passwords)
      – identifying areas for remedial attention (e.g., password quality, offsite password storing)
      – finding possible policy violations (e.g., sharing of internal and external accounts)
      – finding possible insider collusion (e.g., “secret” documents)