1
Protecting Privacy by Spying on Users
Andrew Patrick
& Information Security Group
(Larry Korba, Ronggong Song, George Yee, Scott Buffett, Yunli Wang, Liqiang Geng, Steve Marsh, Hongyu Liu)
2
Data breaches caused by laptop theft and hacks
3
Insider activities
4
Privacy laws exist in Canada at both the federal and provincial levels. There are no laws yet requiring that the public be informed about data breaches.
5
U.S. Privacy Regulations
Numerous state laws require disclosure of data breaches. California just vetoed a bill that would have made it an offense to retain credit card data. Minnesota is the
6
Solutions: searches, traffic monitoring, encryption
7
Social Network Analysis
Research using Social Network Analysis: “A few years back, we were conducting a Social Network Analysis (SNA) in one of IBM's global operating units with the goal to improve overall collaboration among the geographically dispersed teams that made up a world-wide organization. Next we colored the nodes in the network according to their departmental membership and asked InFlow to arrange the network based on the actual links. We were looking for the emergent organization – how work was really done – what the real structure of the organization was. Figure 1 below shows us how work was really accomplished in the organization. Two nodes/people are linked if they both confirm that they exchange information and resources to get their jobs done. Each department involved in the study received a different color node.” See http://www.orgnet.com/emergent.html
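The quoted study links two people only when both confirm the exchange. A minimal sketch of that rule, assuming survey answers are stored as a mapping from each respondent to the set of people they named (the function name and data are hypothetical, not from the study):

```python
# Sketch: keep a tie only when both people report it (reciprocated ties).
# Assumption: answers are stored as {respondent: set of people they named}.

def confirmed_ties(reports: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return undirected ties {a, b} that both a and b reported."""
    ties: set[frozenset[str]] = set()
    for person, contacts in reports.items():
        for other in contacts:
            # Keep the tie only if the other person also named this person.
            if other != person and person in reports.get(other, set()):
                ties.add(frozenset((person, other)))
    return ties


# Hypothetical survey answers.
EXAMPLE_REPORTS = {
    "ana": {"bo", "cy"},
    "bo": {"ana"},
    "cy": {"dee"},
    "dee": {"cy", "ana"},
}

if __name__ == "__main__":
    for tie in sorted(tuple(sorted(t)) for t in confirmed_ties(EXAMPLE_REPORTS)):
        print(tie)  # ('ana', 'bo') then ('cy', 'dee')
```

Requiring reciprocity drops one-sided reports (here ana naming cy, and dee naming ana); whether to keep those as weaker directed ties is a design choice the quote does not address.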
8
Social Network Analysis and Terrorism
Social Network Analysis has proven useful for understanding group behavior, communication patterns, etc. This example is a post-hoc analysis of relationships among the terrorists involved in the 9/11 attacks on the US. Source: Valdis E. Krebs, “Uncloaking Terrorist Networks,” First Monday, volume 7, number 4 (April 2002), http://firstmonday.org/issues/issue7_4/krebs/index.html. Image from http://www.orgnet.com/tnet.html
9
SNA & E-mail Anomalies
Network Analysis for Detecting Unauthorized Accounts. In Conference on Email and Anti-Spam (CEAS), Mountain View, CA, July 2006.
10
Social Networks Applied to Privacy (SNAP)
[Figure: Schematic of the SNAP architecture. Labeled elements: Privacy Policies, Security Policies, Prescribed Workflow, Access Control; Data Collection (Resources, Activities, Context); Analysis (Text, Identifiable Data Discovery, Metadata, Social Network Analysis, Policy Analysis); Display (Audit Results, Log Interpretation, Social Network Analysis, Non-Compliance Highlighting); Action (Reflexive, Prescriptive, Feedback, Controls).]
11
SNAP Agents
[Figure: Schematic of the SNAP agents and the levels of operation. Levels shown: Knowledge Level, Network, and three agents, each labeled Raw System Calls, Preprocessing, Local Context Discovery, PII Data Discovery, Correlation Discovery, and File System.]
12
SNAP Prototype
Screen capture of the first crude interface prototypes. Note the SNA diagram showing two people who have some documents in common.
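The prototype links two people who have documents in common. One way to derive such ties, sketched here under the assumption that an agent can produce a mapping from each user to the set of documents they accessed (function name and data are hypothetical), is to project the user-document relation onto pairs of users and weight each pair by the number of shared documents:

```python
from collections import Counter
from itertools import combinations


def shared_document_ties(access: dict[str, set[str]]) -> Counter:
    """Weight each pair of users by the number of documents both accessed."""
    # Invert the mapping: document -> users who accessed it.
    users_by_doc: dict[str, set[str]] = {}
    for user, docs in access.items():
        for doc in docs:
            users_by_doc.setdefault(doc, set()).add(user)

    ties: Counter = Counter()
    for users in users_by_doc.values():
        # Each pair of users sharing this document gains one unit of weight.
        for a, b in combinations(sorted(users), 2):
            ties[(a, b)] += 1
    return ties


# Hypothetical access records produced by monitoring agents.
EXAMPLE_ACCESS = {
    "alice": {"budget.xls", "clients.csv", "memo.doc"},
    "bob": {"clients.csv", "memo.doc"},
    "carol": {"notes.txt"},
}

print(shared_document_ties(EXAMPLE_ACCESS))  # Counter({('alice', 'bob'): 2})
```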
13
While building SNAP, we also wanted to explore and demonstrate the value of SNA for privacy protection, so we looked for other examples of interesting social behavior related to security and privacy, and found Enron.
Enron
Enron filed for bankruptcy protection in the Southern District of New York in late 2001 and selected Weil, Gotshal & Manges as its bankruptcy counsel. Enron employed around 21,000 people (McLean & Elkind, 2003) and was one of the world's leading electricity, natural gas, pulp and paper, and communications companies, with claimed revenues of $111 billion in 2000. Fortune named Enron "America's Most Innovative Company" for six consecutive years. It achieved infamy at the end of 2001, when it was revealed that its reported financial condition was sustained mostly by institutionalized, systematic, and creatively planned accounting corruption. (Source: Wikipedia)
14
The Enron email corpus is very popular for SNA studies. SNA is an exploratory tool whose goal is to detect and interpret patterns of social ties among people. http://jheer.org/enron/v1/
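As a sketch of how an email corpus becomes a social network, the code below counts directed sender-to-recipient ties from message headers using Python's standard `email` package. It assumes each message is available as raw header-plus-body text; the function name and sample messages are hypothetical, and a real corpus would need further cleaning.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def email_edges(raw_messages: list[str]) -> Counter:
    """Count directed sender -> recipient ties across raw email messages."""
    edges: Counter = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() splits header values into (display name, address) pairs.
        senders = [a.lower() for _, a in getaddresses(msg.get_all("From", [])) if a]
        recipient_fields = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [a.lower() for _, a in getaddresses(recipient_fields) if a]
        for sender in senders:
            for recipient in recipients:
                if sender != recipient:  # ignore messages to oneself
                    edges[(sender, recipient)] += 1
    return edges


# Hypothetical messages in raw header-plus-body form.
SAMPLE = [
    "From: kay@enron.com\nTo: lee@enron.com, max@example.com\nSubject: hi\n\nbody\n",
    "From: lee@enron.com\nTo: kay@enron.com\nCc: max@example.com\nSubject: re\n\nbody\n",
]

for (sender, recipient), count in sorted(email_edges(SAMPLE).items()):
    print(sender, "->", recipient, count)
```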
15
Method
email addresses
– “password: *”
– “password is *”
– “case” (case sensitive)
– “your” (your birthday)
We developed our own methods to clean the data.
Enron Email Dataset (http://www.cs.cmu.edu/~enron/): This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
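The search patterns on this slide can be sketched with regular expressions. This is one hypothetical reading, not the authors' code: “password: *” and “password is *” capture the word that follows, and the captured words “case” (as in “case sensitive”) and “your” (as in “your birthday”) are treated as false positives and discarded.

```python
import re

# Hypothetical reading of the slide's patterns (not the authors' code).
PATTERNS = [
    re.compile(r"password:\s*(\S+)", re.IGNORECASE),    # "password: *"
    re.compile(r"password is\s+(\S+)", re.IGNORECASE),  # "password is *"
]
# Captured words treated as false positives, e.g. "password is case sensitive"
# or "password is your birthday". Compared case-sensitively (an assumption).
EXCLUDED = {"case", "your"}


def candidate_passwords(text: str) -> list[str]:
    """Return words following 'password:' or 'password is', minus exclusions."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            token = match.group(1).rstrip(".,;")  # drop trailing punctuation
            if token not in EXCLUDED:
                found.append(token)
    return found


sample = ("Your password: Blue42. Note the password is case sensitive. "
          "My password is tulip7")
print(candidate_passwords(sample))  # ['Blue42', 'tulip7']
```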
16
Password Sharing
contain passwords
We ended up with 642 unique instances of password sharing. The network has 500 addresses (people) and 500 connections. It is a very sparse network, containing
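To put “very sparse” in numbers, assuming an undirected network with no self-ties and the counts above (500 nodes, 500 ties), the density and average degree work out as follows:

```python
def undirected_density(nodes: int, ties: int) -> float:
    """Share of all possible ties (n * (n - 1) / 2) that are present."""
    possible = nodes * (nodes - 1) / 2
    return ties / possible


def mean_degree(nodes: int, ties: int) -> float:
    """Average number of ties per node; each tie touches two nodes."""
    return 2 * ties / nodes


# Counts from the slide: 500 addresses and 500 connections.
print(f"density = {undirected_density(500, 500):.4f}")  # density = 0.0040
print(f"mean degree = {mean_degree(500, 500):.1f}")      # mean degree = 2.0
```

Roughly 0.4% of the 124,750 possible ties are present, and the average address is tied to two others.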
17
Internal and External
This diagram shows the overall structure of the password-sharing network. Blue nodes are internal; red nodes are external.
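The slide does not say how addresses were labeled internal or external. A plausible rule, assumed here, is to treat addresses in the enron.com domain (including subdomains) as internal; the sketch then labels each tie as internal, mixed, or external. Function names and data are hypothetical.

```python
from collections import Counter

INTERNAL_DOMAIN = "enron.com"  # assumption: this domain and its subdomains are internal


def is_internal(address: str) -> bool:
    domain = address.rsplit("@", 1)[-1].lower()
    return domain == INTERNAL_DOMAIN or domain.endswith("." + INTERNAL_DOMAIN)


def classify_ties(ties: list[tuple[str, str]]) -> Counter:
    """Label each tie by how many of its two endpoints are internal."""
    labels = {2: "internal", 1: "mixed", 0: "external"}
    return Counter(labels[is_internal(a) + is_internal(b)] for a, b in ties)


# Hypothetical password-sharing ties.
EXAMPLE_TIES = [
    ("kay@enron.com", "lee@enron.com"),
    ("kay@enron.com", "max@example.com"),
    ("ops@ect.enron.com", "help@example.org"),
    ("ann@example.com", "bob@example.org"),
]
print(classify_ties(EXAMPLE_TIES))
```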
18
Password Factions
This analysis breaks the network into factions based on the number of links between
represents a different faction in the network. The black nodes are a residual faction that represents nodes that are relatively unconnected, often involving pairs of
suggesting there are two fairly distinct groups involved with password sharing that we should investigate.
19
Core and Periphery
This analysis uses different colors to represent each connected portion of the
connected, and dozens of small portions that are isolated. It is also clear, again, that there are two central areas in the core network.
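Finding the “connected portions” is a standard connected-components computation. A minimal breadth-first-search sketch over an undirected edge list (hypothetical data), returning the largest component, which plays the role of the core, first:

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group the nodes of an undirected edge list into connected components."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    # Largest first: the core, followed by the smaller isolated portions.
    return sorted(components, key=len, reverse=True)


EXAMPLE_EDGES = [(1, 2), (2, 3), (3, 4), (5, 6), (7, 8)]  # hypothetical
print(connected_components(EXAMPLE_EDGES))
```

Nodes that appear in no tie are not listed; in a password-sharing network built from messages, every address has at least one tie, so this loses nothing there.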
20
This is an analysis of the core, connected network. There are two fairly distinct components to this network, centered around two key people. Node 12 is a senior person in EnronOnline responsible for creating new accounts.
Node 1 is the “performance management” system.
Node 25 is a senior manager of research.
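The slide does not name the measure used to pick out the key nodes. One simple option, assumed here, is degree centrality: rank nodes by how many ties they have. The edge list below is hypothetical:

```python
from collections import Counter


def top_degree_nodes(edges, k=2):
    """Return the k nodes with the most ties, with their (unnormalized) degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(k)


# Hypothetical core network with two hubs, "a" and "b".
EXAMPLE_EDGES = [("a", "x1"), ("a", "x2"), ("a", "x3"), ("a", "b"),
                 ("b", "y1"), ("b", "y2"), ("c", "y2")]
print(top_degree_nodes(EXAMPLE_EDGES))  # [('a', 4), ('b', 3)]
```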
21
Sharing Anomalies
This display highlights isolated partitions of the network that contain more than three nodes.
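A sketch of this filter, assuming “isolated” means not connected to the core (taken to be the largest component) and “larger than 3” means more than three nodes. It groups nodes with a union-find structure; the function names and data are hypothetical:

```python
def _find(parent, x):
    """Return the representative of x's group, halving paths as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def isolated_groups(edges, threshold=3):
    """Components other than the largest (the core) with more than `threshold` nodes."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = _find(parent, a), _find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
    groups = {}
    for node in parent:
        groups.setdefault(_find(parent, node), set()).add(node)
    ordered = sorted(groups.values(), key=len, reverse=True)
    # Assumption: the largest component is the core, so it is skipped.
    return [group for group in ordered[1:] if len(group) > threshold]


EXAMPLE_EDGES = [
    (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),  # core
    (10, 11), (10, 12), (10, 13),            # isolated group of four
    (20, 21),                                # isolated pair
    (30, 31), (31, 32),                      # isolated group of three
]
print(isolated_groups(EXAMPLE_EDGES))
```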
22
Case 1
Case 1: purple nodes at 11 o'clock
1: 117.118 val=2.0000
1: 117.206 val=1.0000
1: 117.246 val=1.0000
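The entries listed with each case appear to be network listing output. Assuming each entry has the form `<matrix>: <node>.<node> val=<weight>`, that is, a tie between two node IDs with a numeric weight (the slides do not say what the weight counts), they can be parsed as follows; the regular expression and function name are hypothetical:

```python
import re

# Assumed entry format: "<matrix>: <node>.<node> val=<weight>"
ENTRY = re.compile(r"(\d+):\s*(\d+)\.(\d+)\s+val=(\d+(?:\.\d+)?)")


def parse_entries(text: str) -> list[tuple[int, int, float]]:
    """Return (node, node, weight) tuples for each entry found in `text`."""
    return [
        (int(m.group(2)), int(m.group(3)), float(m.group(4)))
        for m in ENTRY.finditer(text)
    ]


case_1 = "1: 117.118 val=2.0000 1: 117.206 val=1.0000 1: 117.246 val=1.0000"
print(parse_entries(case_1))
# [(117, 118, 2.0), (117, 206, 1.0), (117, 246, 1.0)]
```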
23
Case 2
Case 2: red nodes at 8 o'clock
format)
1: 465.68 val=1.0000
1: 68.69 val=3.0000
1: 68.315 val=1.0000
24
Case 3
Case 3: green nodes at 12 o'clock
Employee has a folder named "pswds"
about trucks
* sharing of id/password for EnronOnline with colleague, possible policy violation
1: 109.110 val=1.0000
1: 116.110 val=1.0000
1: 156.110 val=1.0000
1: 253.110 val=1.0000
1: 265.110 val=1.0000
1: 495.110 val=1.0000
1: 110.273 val=1.0000
25
Case 4
Case 4: blue nodes at 9 o'clock
* internal sharing of accounts for three third-party services
1: 288.44 val=1.0000
1: 45.44 val=1.0000
1: 44.45 val=1.0000
1: 44.489 val=4.0000
26
Conclusions
– identifying key creators/sharers of personal information (passwords)
– identifying areas for remedial attention (e.g., password quality, offsite password storing)
– finding possible policy violations (e.g., sharing of internal and external accounts)
– finding possible insider collusion (e.g., “secret” documents)