Implementing Database Access Control Policy from Unconstrained Natural Language Text
LAS Research Presentation John Slankas June 24th, 2015
Relation Extraction slides are from Dan Jurafsky's NLP course on Coursera.
Research Path &
Publications: POLICY 2012, NaturaLiSE 2013, ICSE Doctoral Symposium 2013, PASSAT 2013, ACSAC 2014, ESEM 2015 (1), RE 2014 (3), ESEM 2014 (2), ASE Science Journal 2013
Research areas: Feasibility, Classification, Access Control Extraction, Database Model Extraction
Notes: (1) to be submitted; (2) 2nd author; (3) 3rd author
Motivation Goal Related Work Solution Studies Limitations Future Work
[Peterson 2015] [Bennett 2015] [Husain 2015] [Redhead 2015] [Westin 2015]
Company: IBM
Location: New York
Date: June 16, 1911
Original-Name: Computing-Tabulating-Recording Co.
The Leland Stanford Junior
Stanford EQ Leland Stanford Junior University
Stanford LOC-IN California
Stanford IS-A research university
Stanford LOC-NEAR Palo Alto
Stanford FOUNDED-IN 1891
Stanford FOUNDER Leland Stanford
(acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)
ARTIFACT: User-Owner-Inventor-Manufacturer
GENERAL AFFILIATION: Citizen-Resident-Ethnicity-Religion, Org-Location-Origin
ORG AFFILIATION: Founder, Employment, Membership, Ownership, Student-Alum, Investor, Sports-Affiliation
PART-WHOLE: Geographical, Subsidiary
PERSON-SOCIAL: Business, Family, Lasting Personal
PHYSICAL: Located, Near
17 relations from 2008 “Relation Extraction Task”
Relations extracted from Infobox:
Stanford state California
Stanford motto “Die Luft der Freiheit weht”
…
people/person/nationality
location/location/contains
people/person/profession
people/person/place-of-birth
biology/organism_higher_classification
film/film/genre
Examples from the WordNet Thesaurus
Pattern | Example
X and other Y | ...temples, treasuries, and other important civic buildings.
X or other Y | Bruises, wounds, broken bones or other injuries...
Y such as X | The bow lute, such as the Bambara ndang...
Such Y as X | ...such authors as Herrick, Goldsmith, and Shakespeare.
Y including X | ...common-law countries, including Canada and England...
Y, especially X | European countries, especially France, England, and Spain...
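The lexico-syntactic patterns above can be approximated with regular expressions. The following is a minimal sketch, not the method used in the slides: it covers only three of the six patterns, treats Y as the single word before the cue phrase, and treats X as a comma- or conjunction-separated list. The names `PATTERNS`, `split_list`, and `hearst_pairs` are hypothetical.

```python
import re

# Simplified regexes for three of the patterns listed above (an assumption:
# Y is approximated by the single word immediately before the cue phrase).
PATTERNS = [
    re.compile(r"(?P<y>\w+),? such as (?P<xs>[^.]+)"),
    re.compile(r"(?P<y>\w+),? including (?P<xs>[^.]+)"),
    re.compile(r"(?P<y>\w+),? especially (?P<xs>[^.]+)"),
]


def split_list(text):
    """Split 'A, B, and C' or 'A and B' into its individual items."""
    parts = re.split(r",\s*(?:and |or )?|\s+(?:and|or)\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def hearst_pairs(sentence):
    """Return (hyponym X, hypernym Y) pairs found in the sentence."""
    pairs = []
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            for x in split_list(match.group("xs")):
                pairs.append((x, match.group("y")))
    return pairs


if __name__ == "__main__":
    print(hearst_pairs("European countries, especially France, England, and Spain."))
```

A production extractor would replace the single-word Y with a noun-phrase chunker; the regexes here are only meant to show how a cue phrase anchors the X/Y roles.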
Sentence: American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said
Mention 1 (M1): American Airlines
Mention 2 (M2): Tim Wagner
Headwords of M1 and M2, and their combination: Airlines, Wagner, Airlines-Wagner
Bag of words and bigrams in M1 and M2: {American, Airlines, Tim, Wagner, American Airlines, Tim Wagner}
Words to the left and right of M2: M2: -1 spokesman; M2: +1 said
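The word-level features shown for the American Airlines example can be computed directly from token spans. This is a small sketch under stated assumptions: the sentence is pre-tokenized on whitespace, each mention is given as a (start, end) token span with an exclusive end, and the head word is taken to be the last token of the mention. The function name `mention_features` is hypothetical.

```python
def mention_features(tokens, m1, m2):
    """Build simple relation-extraction features for two mention spans.

    m1 and m2 are (start, end) token indices, end exclusive.
    Assumption: the head word of a mention is its last token.
    """
    m1_tokens = tokens[m1[0]:m1[1]]
    m2_tokens = tokens[m2[0]:m2[1]]
    head1, head2 = m1_tokens[-1], m2_tokens[-1]
    # Unigrams of both mentions plus each full mention string.
    bag = set(m1_tokens) | set(m2_tokens) | {" ".join(m1_tokens), " ".join(m2_tokens)}
    return {
        "head_m1": head1,
        "head_m2": head2,
        "head_pair": f"{head1}-{head2}",
        "bag": bag,
        "m2_minus_1": tokens[m2[0] - 1] if m2[0] > 0 else None,
        "m2_plus_1": tokens[m2[1]] if m2[1] < len(tokens) else None,
    }


if __name__ == "__main__":
    sentence = ("American Airlines , a unit of AMR , immediately matched "
                "the move , spokesman Tim Wagner said")
    features = mention_features(sentence.split(), (0, 2), (14, 16))
    print(features["head_pair"], features["m2_minus_1"], features["m2_plus_1"])
```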
$F_1 = \frac{2PR}{P + R}$
Role Extraction and Database Enforcement
Inputs: Text Documents, Database Design, Domain Knowledge
Outputs: Generated SQL Commands for access control, Completeness and Conflict Report, Traceability Report
Process:
1) Parse natural language product artifacts
2) Classify sentence
3) Extract access control elements
4) Extract database model elements
5) Map data model to physical database schema
6) Implement access control
Named Entities: A = action, R = resource, S = subject
Parts of Speech: NN = noun, VB = verb
Relationships: dobj = direct object, nn = noun compound modifier, nsubj = nominal subject, prep_for = prepositional modifier (for)
Seed pattern (graph): Specific Action (VB, A), with nsubj to * (NN, S) and dobj to * (NN, R)
Figure (access control rule extraction process). Elements: Generate Seed Patterns; Match Subject and Resources; Apply Patterns; Known Subjects & Resources; Access Control Rules; Subject & Resource Search; Pattern Extraction and Transformation; Classify Patterns; Pattern Set; Inject Patterns
Semantic Relations:
(order, nurse, lab procedure)
(order_for, nurse, patient)
Relational Pattern (graph): order (VB, A), with aux to can (MD), nsubj to nurse (NN, S), dobj to lab procedure (NN, R), and prep_for to patient (NN, R)
Semantic Relations:
(order, nurse, lab procedure)
(order_object_for, lab procedure, patient)
Semantic Relational Pattern (graph): order (VB, A), with aux to can (MD), nsubj to nurse (NN, S), dobj to lab procedure (NN, R), and prep_for to patient (NN, R)
Access Control Rules:
(nurse, order, lab procedure, create)
(nurse, order_for, patient, read)
Database Elements:
Entities: lab procedure, patient
Relationships: nurse orders lab procedure; lab procedure for patient
Physical Database Schema:
Tables: lab_procedure_tbl, patient_tbl, lab_procedure_patient_tbl
Role: nurse_rl
Merged ACRs:
(nurse, order, lab procedure, create)
(nurse, order_for, patient, read)
(nurse, order, lab procedure_patient, create)
(nurse, order_for, lab procedure_patient, read)
Database Access Rules:
(nurse, lab procedure, create)
(nurse, patient, read)
(nurse, lab procedure_patient, create/read)
create role nurse_rl;
grant insert on lab_procedure_tbl to nurse_rl;
grant select on patient_tbl to nurse_rl;
grant insert, select on lab_procedure_patient_tbl to nurse_rl;
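The step from database access rules to SQL can be sketched as a simple mapping. The sketch below is illustrative only and is not the REDE implementation: it assumes rules arrive as (subject, resource, permissions) tuples, reuses the `_rl` and `_tbl` naming seen in the statements above, and maps create to insert and read to select as in that example. The update and delete entries, and the names `PRIVILEGES`, `to_identifier`, and `generate_sql`, are assumptions.

```python
# Permission-to-privilege mapping. "create" and "read" follow the example
# statements above; "update" and "delete" are assumed for completeness.
PRIVILEGES = {"create": "insert", "read": "select", "update": "update", "delete": "delete"}


def to_identifier(name, suffix):
    """Turn 'lab procedure' into 'lab_procedure' plus a suffix such as '_tbl'."""
    return name.strip().lower().replace(" ", "_") + suffix


def generate_sql(rules):
    """Generate role and grant statements from (subject, resource, permissions) rules."""
    statements = []
    roles_seen = []
    for subject, resource, permissions in rules:
        role = to_identifier(subject, "_rl")
        if role not in roles_seen:
            # Create each role once, before its first grant.
            roles_seen.append(role)
            statements.append(f"create role {role};")
        privileges = ", ".join(PRIVILEGES[p] for p in permissions)
        table = to_identifier(resource, "_tbl")
        statements.append(f"grant {privileges} on {table} to {role};")
    return statements


if __name__ == "__main__":
    rules = [
        ("nurse", "lab procedure", ["create"]),
        ("nurse", "patient", ["read"]),
        ("nurse", "lab procedure_patient", ["create", "read"]),
    ]
    print("\n".join(generate_sql(rules)))
```

Real identifiers would come from the mapping to the physical schema (step 5) rather than from string munging, and should be validated or quoted before being placed in SQL.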
[NaturaliSE 2013]
[PASSAT 2013] [ASE Science Journal 2013] [ACSAC 2014]
Document | Domain | Document Type | # of Sentences | # of ACR Sentences | # of ACRs | Fleiss’ Kappa
iTrust | Healthcare | Use Case | 1160 | 550 | 2274 | 0.58
iTrust for Text2Policy | Healthcare | Use Case | 471 | 418 | 1070 | 0.73
IBM Course Mgmt | Education | Use Case | 401 | 169 | 375 | 0.82
CyberChair | | | 303 | 139 | 386 | 0.71
Collected ACP Docs | Multiple | Sentences | 142 | 114 | 258 | n/a
Measure | iTrust | iTrust_t2p | IBM CM | CyberChair | Collected
Text2Policy Pattern – Modal Verb | 210 | 130 | 46 | 71 | 93
Text2Policy Pattern – Passive voice w/ to Infinitive | 66 | 21 | 10 | 39 | 9
Text2Policy Pattern – Access Expression | 32 | 7 | 5 | 1 | 18
Text2Policy Pattern – Ability Expression | 45 | 21 | 14 | 11 | 3
Number of sentences with multiple types | 383 | 146 | 77 | 105 | 36
Number of patterns appearing | 680 | 173 | 162 | 184 | 97
ACRs with ambiguous subjects (e.g., “system”, “user”, etc.) | 193 | 119 | 139 | 1 | 13
ACRs with blank subjects | 557 | 206 | 29 | 187 | 5
ACRs with pronouns as subjects | 109 | 28 | 5 | 11 | 11
ACRs with ambiguous objects (e.g., entry, list, name, etc.) | 422 | 228 | 45 | 47 | 34
Total Number of ACR Sentences | 550 | 418 | 169 | 139 | 114
Total Number of ACR Rules | 2274 | 1070 | 375 | 386 | 258
[ESEM 2015 (to submit)]
* https://github.com/RealsearchGroup/REDE
[Bennett 2015] Bennett, Cory. Weak Login Security at Heart of Anthem Breach. http://thehill.com/policy/cybersecurity/232158-weak-login-security-at-heart-of-anthem-breach Accessed: 3/15/2015
[Chen 1983] Chen, Peter. English Sentence Structure and Entity-Relationship Diagrams. Information Series 29: 127-149. 1983.
[He 2009] He, Q. and Antón, A.I., Requirements-based Access Control Analysis and Policy Specification (ReCAPS). Information and Software Technology, vol. 51, no. 6, pp. 993-1009, 2009.
[Husain 2015] Husain, Azam. What the Anthem Breach Teaches US About Access Control. http://www.healthitoutcomes.com/doc/what-the-anthem-breach-teaches-us-about-access-control-
[Omar 2004] Omar, Nazlia. Heuristics-Based Entity-Relationship Modelling through Natural Language Processing, PhD Dissertation, University of Ulster, 2004.
[Peterson 2015] Peterson, Andrea. 2015 is already the year of the health-care hack – and it’s only going to get
[Redhead 2015] Redhead, C. Stephen. Anthem Data Breach: How Safe Is Health Information Under HIPAA, http://fas.org/sgp/crs/misc/IN10235.pdf. Congressional Research Service Report. Accessed: 3/16/2015
[Sagar 2014] Sagar, Vidhu Bhala R. Vidya and Abirami, S. Conceptual Modeling of Natural Language Functional Requirements, Journal of Systems and Software, vol. 88, pp. 25-41, 2014.
[Westin 2015] Westin, Ken. How Anthem Could be Breached. http://www.tripwire.com/state-of-security/incident-detection/how-the-anthem-breach-could-have-happened/. Accessed: 3/15/2015
[Xiao 2009] Xiao, X., Paradkar, A., Thummalapenta, S. and Xie, T. Automated Extraction of Security Policies from Natural-Language Software Documents. International Symposium on the Foundations of Software Engineering (FSE), Raleigh, North Carolina, USA, 2012.
[Slankas 2015] Slankas, John, and Williams, Laurie, "Relation Extraction for Inferring Database Models from Natural Language Artifacts", 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2015), to be submitted.
[Slankas 2014] Slankas, John, Xiao, Xusheng, Williams, Laurie, and Xie, Tao, "Relation Extraction for Inferring Access Control Rules from Natural Language Artifacts", 2014 Annual Computer Security Applications Conference (ACSAC 2014), New Orleans, LA.
[Riaz 2014b] Riaz, Maria, Slankas, John, King, Jason, and Williams, Laurie, "Using Templates to Elicit Implied Security Requirements from Functional Requirements − A Controlled Experiment", ACM / IEEE 8th International Symposium on Empirical Software Engineering and Measurement (ESEM 2014), Torino, Italy, September 18-19, 2014.
[Riaz 2014a] Riaz, Maria, King, Jason, Slankas, John, and Williams, Laurie, "Hidden in Plain Sight: Automatically Identifying Security Requirements from Natural Language Artifacts", 2014 Requirements Engineering (RE 2014), Karlskrona, Sweden, August 25-29, 2014.
[Slankas 2013d] Slankas, John and Williams, Laurie, 2013. Access Control Policy Identification and Extraction from Project Documentation, Academy of Science and Engineering Science Journal, Volume 2, Issue 3, pp. 145-159.
[Slankas 2013c] Slankas, John and Williams, Laurie, "Access Control Policy Extraction from Unconstrained Natural Language Text", 2013 ASE/IEEE International Conference on Privacy, Security, Risk, and Trust (PASSAT 2013), Washington D.C., USA, September 8-14, 2013.
[Slankas 2013b] Slankas, John and Williams, Laurie, "Automated Extraction of Non-functional Requirements in Available Documentation", 1st International Workshop on Natural Language Analysis in Software Engineering (NaturaLiSE 2013), San Francisco, CA.
[Slankas 2013a] Slankas, John, "Implementing Database Access Control Policy from Unconstrained Natural Language Text", 35th International Conference on Software Engineering - Doctoral Symposium (ICSE DS 2013), San Francisco, CA.
[Slankas 2012] Slankas, John and Williams, Laurie, "Classifying Natural Language Sentences for Policy", IEEE International Symposium on Policies for Distributed Systems and Networks (POLICY 2012).
Additional Information
Apache OpenNLP: http://opennlp.apache.org/
Berkeley Parser: http://nlp.cs.berkeley.edu/
BLLIP (Charniak-Johnson): http://bllip.cs.brown.edu/
GATE: https://gate.ac.uk
MALLET: http://mallet.cs.umass.edu/
Python Natural Language Toolkit: http://www.nltk.org/
Stanford Natural Language Parser: http://nlp.stanford.edu/
Criteria:
POS Tagging:
The/DT nurse/NN can/MD order/VB a/DT lab/NN procedure/NN for/IN a/DT patient/NN ./.
Parse:
(ROOT
  (S
    (NP (DT The) (NN nurse))
    (VP (MD can)
      (VP (VB order)
        (NP (DT a) (NN lab) (NN procedure))
        (PP (IN for)
          (NP (DT a) (NN patient)))))
    (. .)))
Typed Dependency:
det(nurse-2, The-1)
nsubj(order-4, nurse-2)
aux(order-4, can-3)
root(ROOT-0, order-4)
det(procedure-7, a-5)
nn(procedure-7, lab-6)
dobj(order-4, procedure-7)
prep(order-4, for-8)
det(patient-10, a-9)
pobj(for-8, patient-10)
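To make the link between the typed dependencies and the extracted access control elements concrete, here is a minimal sketch that reads the dependency lines for "The nurse can order a lab procedure for a patient" and pulls out (subject, action, resource) triples. It is an illustration under stated assumptions, not the presented tool: it handles only the nsubj + dobj and nsubj + prep/pobj shapes, joins nn modifiers to their head noun, and names prepositional actions as verb_preposition. All function names are hypothetical.

```python
import re

DEPENDENCIES = """
det(nurse-2, The-1) nsubj(order-4, nurse-2) aux(order-4, can-3)
root(ROOT-0, order-4) det(procedure-7, a-5) nn(procedure-7, lab-6)
dobj(order-4, procedure-7) prep(order-4, for-8) det(patient-10, a-9)
pobj(for-8, patient-10)
"""

# relation(governor-index, dependent-index)
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text):
    """Return (relation, governor, dependent) triples; nodes are (word, index)."""
    return [(rel, (gov, int(gi)), (dep, int(di)))
            for rel, gov, gi, dep, di in DEP_RE.findall(text)]


def phrase(node, deps):
    """Join a noun with its nn (noun compound) modifiers in sentence order."""
    parts = [dep for rel, gov, dep in deps if rel == "nn" and gov == node] + [node]
    return " ".join(word for word, _ in sorted(parts, key=lambda n: n[1]))


def extract_rules(deps):
    """Extract (subject, action, resource) triples from nsubj/dobj/prep/pobj edges."""
    rules = []
    for rel, verb, subject in deps:
        if rel != "nsubj":
            continue
        for rel2, gov2, obj in deps:
            if rel2 == "dobj" and gov2 == verb:
                rules.append((phrase(subject, deps), verb[0], phrase(obj, deps)))
        for rel2, gov2, prep in deps:
            if rel2 == "prep" and gov2 == verb:
                for rel3, gov3, pobj in deps:
                    if rel3 == "pobj" and gov3 == prep:
                        action = f"{verb[0]}_{prep[0]}"
                        rules.append((phrase(subject, deps), action, phrase(pobj, deps)))
    return rules


if __name__ == "__main__":
    for rule in extract_rules(parse_dependencies(DEPENDENCIES)):
        print(rule)
```

Working on the uncollapsed prep/pobj pair keeps the sketch aligned with the dependency listing above; a parser configured for collapsed dependencies would emit prep_for directly.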
 | Expected: Yes | Expected: No
Predicted: Yes | True Positive | False Positive
Predicted: No | False Negative | True Negative
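The cells of this matrix feed the standard precision, recall, and F1 measures. The snippet below is a small sketch using those standard definitions; the counts in the usage example are made up for illustration and are not results from the studies.

```python
def classification_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
    p, r, f1 = classification_metrics(tp=8, fp=2, fn=8)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.3f}")
```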
Fleiss’ Kappa | Agreement Interpretation
<= 0 | Less than chance
0.01 – 0.20 | Slight
0.21 – 0.40 | Fair
0.41 – 0.60 | Moderate
0.61 – 0.80 | Substantial
0.81 – 0.99 | Almost perfect
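For readers who want to reproduce an agreement score, the following sketch computes Fleiss' kappa from a ratings matrix using the standard formula and maps the result onto the interpretation bands above. The input layout (one row per item, one column per category, each cell the number of raters who chose that category) and the function names are assumptions for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning item i to category j.

    Assumes every item was rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Proportion of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_categories)]
    # Per-item agreement among rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def interpret(kappa):
    """Map a kappa value onto the interpretation bands in the table above."""
    if kappa <= 0:
        return "Less than chance"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                         (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


if __name__ == "__main__":
    ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]  # hypothetical: 4 items, 3 raters
    kappa = fleiss_kappa(ratings)
    print(round(kappa, 3), interpret(kappa))
```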
A(s, a, r, n, l, c, H, p)
A(nurse, order, lab procedure, , , (V: nurse, order, lab procedure; E: (order, nurse, nsubj); (order, lab procedure, dobj)), create)
A(nurse, order, patient, , , (V: nurse, order, patient; E: (order, nurse, nsubj); (order, patient, prep_for)), read)
Relationship: Association
  % (VB, R), with nsubj to Entity_1 (NN, E) and dobj to Entity_2 (NN, E)
Relationship: Aggregation / Composition
  have (VB, R), with nsubj to whole (NN, E) and dobj to part (NN, E)
Relationship: Inheritance
  be (VB, R), with nsubj to Specific Entity (NN, E) and prep_of to General Entity (NN, E)
Entity-attributes
  Attribute (NN, A), with poss to Entity (NN, E)
  Attribute (NN, A), with prep_of to Entity (NN, E)
Entity
  % (VB), with nsubj to Entity (NN, E)
  % (VB), with dobj to Entity (NN, E)
  % (VB), with prep_% to Entity (NN, E)
Figure (database model extraction process). Elements: Classify Patterns; Known Entities and Relationships; Generate patterns from templates; Wildcard Patterns; Pattern Set; Pattern Search; Extract Database Design Elements; Manually Identified Patterns; Inject Additional Patterns; Extracted Access Control Rules; Transform Patterns
Requirements Engineering, vol. 12, no. 2, pp. 103–120, Mar. 2007.
Access Control (AC), Audit (AU), Availability (AV), Legal (LG), Look & Feel (LF), Maintenance (MT), Operational (OP), Privacy (PR), Performance & Scalability (PS), Recoverability (RC), Reliability (RL), Security (SC), Usability (US), Other (OT)
accessibility, accountability, accuracy, adaptability, agility, auditability, availability, buffer space performance, capability, capacity, clarity, code-space performance, cohesiveness, commonality, communication cost, communication time, compatibility, completeness, component integration time, composability, comprehensibility, conceptuality, conciseness, confidentiality, configurability, consistency, coordination cost, coordination time, correctness, cost, coupling, customer evaluation time, customer loyalty, customizability, data-space performance, decomposability, degradation of service, dependability, development cost, development time, distributivity, diversity, domain analysis cost, domain analysis time, efficiency, elasticity, enhanceability, evolvability, execution cost, extensibility, external consistency, fault-tolerance, feasibility, flexibility, formality, generality, guidance, hardware cost, impact analyzability, independence, informativeness, inspection cost, inspection time, integrity, inter-operable, internal consistency, intuitiveness, learnability, main-memory performance, maintainability, maintenance cost, maintenance time, maturity, mean performance, measurability, mobility, modifiability, modularity, naturalness, nomadicity, observability, off-peak-period performance, operability, operating cost, peak-period performance, performability, performance, planning cost, planning time, plasticity, portability, precision, predictability, process management time, productivity, project stability, project tracking cost, promptness, prototyping cost, prototyping time, reconfigurability, recoverability, recovery, reengineering cost, reliability, repeatability, replaceability, replicability, response time, responsiveness, retirement cost, reusability, risk analysis cost, risk analysis time, robustness, safety, scalability, secondary storage performance, security, sensitivity, similarity, simplicity, software cost, software production time, space boundedness, space performance, specificity, stability, standardizability, subjectivity, supportability, surety, survivability, susceptibility, sustainability, testability, testing time, throughput, time performance, timeliness, tolerance, traceability, trainability, transferability, transparency, understandability, uniform performance, uniformity, usability, user-friendliness, validity, variability, verifiability, versatility, visibility, wrappability
Columns: Document | Document Type | Size | AC AU AV LG LF MT OP PR PS RC RL SC US OT FN NA (non-empty cells only, in column order)
CCHIT Ambulatory Requirements | Requirement | 306 | 12 27 1 2 10 1 5 2 28 4 8 228 6
iTrust | Requirement, Use Case | 1165 | 439 44 2 2 18 2 9 9 9 55 2 734 376
PromiseData | Requirement | 792 | 164 20 36 10 50 26 89 7 75 4 12 71 101 19 340
Open EMR Install Manual | Installation Manual | 225 | 3 5 1 6 1 25 2 184
Open EMR User Manual | User Manual | 473 | 169 14 8 4 286 95
NC Public Health DUA | DUA | 62 | 1 20 4 1 41
US Medicare/Medicaid DUA | DUA | 140 | 1 26 17 5 2 108
California Correctional Health Care | RFP | 1893 | 94 120 9 85 133 94 52 13 16 13 193 14 38 987 409
Los Angeles County EHR | RFP | 1268 | 58 37 8 3 2 28 19 3 11 8 13 108 21 10 639 380
HIPAA Combined Rule | CFR | 2642 | 28 8 3 78 213 9 41 1 317 2018
Meaningful Use Criteria | CFR | 1435 | 8 116 1311
Health IT Standards | CFR | 1475 | 10 20 119 1 2 2 71 1 2 164 1146
Total | | 11876 | 979 276 57 152 68 413 207 300 100 50 43 563 148 82 3568 6076
Document | Domain | Number of Sentences | Number of ACR Sentences | Number of ACRs | Fleiss’ Kappa
iTrust | Healthcare | 1160 | 550 | 2274 | 0.58
iTrust for Text2Policy | Healthcare | 471 | 418 | 1070 | 0.73
IBM Course Management | Education | 401 | 169 | 375 | 0.82
CyberChair | | 303 | 139 | 386 | 0.71
Collected ACP Documents | Multiple | 142 | 114 | 258 | n/a
(VB root(NN nsubj)(NN dobj))
(VB root(NN nsubjpass))
(VB root(NN nsubj)(NN prep))
(VB root(NN dobj))
(VB root(NN prep_%))
Ambiguity | Occurrence % in ACR Sentences
Pronouns | 3.2%
“System” / “user” | 11.0%
No explicit subject | 17.3%
Other ambiguous terms | 21.5%
Missing objects | 0.2%
System: Open Conference System
Version: 2.3.6, released May 28th, 2014
Language: PHP
Supported DBMSs: MySQL, PostgreSQL
Architecture: Web-based application
Number of PHP files: 1557
Number of lines in PHP files: 22198
Number of application defined roles: 7
Number of database tables: 52
Number of fields in database tables: 369
Top 10 ACR Extraction Errors
Number Times Missed | Error Type | Pattern
89 | FN | ( % VB root ( % NN dobj ))
36 | FN | ( % VB root ( % PRP nsubj )( % NN dobj ))
20 | FN | ( % VB root ( % NN prep_% ))
18 | FN | ( % VB root ( % NN nsubj )( % NN dobj ))
17 | FP | ( % VB root ( % NN nsubjpass ))
12 | FN | ( % VB root ( % PRP nsubj )( % NN prep_% ))
8 | FP | ( % VB root ( % PRP nsubj )( % NN dobj ))
5 | FN | ( allow VB root ( % PRP dobj )( % VB dep ( % NN dobj )))
5 | FN | ( % VB root ( % NN nsubj )( % NN prep_% ))
5 | FN | ( % VB root ( % NN nsubjpass ))
computeVertexDistance(Vertex a, Vertex b)
  if a = NULL or b = NULL return 1
  if a.partOfSpeech <> b.partOfSpeech return 1
  if a.parentCount <> b.parentCount return 1
  for each parent in a.parents
    if not b.parents.contains(parent) return 1
  if a.lemma = b.lemma return 0
  if a and b are numbers, return 0
  if ner classes match, return 0
  wnValue = wordNetSynonyms(a.lemma, b.lemma)
  if wnValue > 0 return wnValue
  return 1
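The pseudocode above translates almost line for line into Python. The version below is a sketch, not the original implementation: the `Vertex` fields, the representation of parents as a list of identifiers, and the number check are assumptions, and `wordnet_synonyms` is a hypothetical placeholder that always returns 0 because the slide does not show how the WordNet score is computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vertex:
    lemma: str
    part_of_speech: str
    parents: List[str] = field(default_factory=list)  # assumed: parent identifiers
    ner_class: Optional[str] = None


def wordnet_synonyms(lemma_a, lemma_b):
    """Hypothetical placeholder for a WordNet-based score; always returns 0 here."""
    return 0.0


def is_number(text):
    """Assumed number test: the lemma parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def compute_vertex_distance(a, b, synonym_score=wordnet_synonyms):
    """Return 0 for equivalent vertices, 1 for unrelated ones, else the synonym score."""
    if a is None or b is None:
        return 1
    if a.part_of_speech != b.part_of_speech:
        return 1
    if len(a.parents) != len(b.parents):
        return 1
    for parent in a.parents:
        if parent not in b.parents:
            return 1
    if a.lemma == b.lemma:
        return 0
    if is_number(a.lemma) and is_number(b.lemma):
        return 0
    if a.ner_class is not None and a.ner_class == b.ner_class:
        return 0
    wn_value = synonym_score(a.lemma, b.lemma)
    if wn_value > 0:
        return wn_value
    return 1


if __name__ == "__main__":
    nurse = Vertex("nurse", "NN", ["nsubj"])
    doctor = Vertex("doctor", "NN", ["nsubj"])
    print(compute_vertex_distance(nurse, nurse), compute_vertex_distance(nurse, doctor))
```

Passing the synonym scorer as a parameter keeps the structural checks testable without a WordNet installation; a real scorer can be supplied in its place.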
Source: Privacy Rights Clearinghouse[2]