An Incremental Learner for Language-Based Anomaly Detection in XML

An Incremental Learner for Language-Based Anomaly Detection in XML Harald Lampesberger Department of Secure Information Systems University of Applied Sciences Upper Austria harald.lampesberger@fh-hagenberg.at LangSec Workshop, 26. May 2016


SLIDE 1

An Incremental Learner for Language-Based Anomaly Detection in XML

Harald Lampesberger

Department of Secure Information Systems University of Applied Sciences Upper Austria harald.lampesberger@fh-hagenberg.at

LangSec Workshop, 26. May 2016

SLIDE 2

Motivation

Extensible Markup Language (XML)

  • Data serialization format for many protocols
  • SOAP/WS-*, XMPP, SAML, XHTML, RSS, Atom, ...

Schema validation is a first-line defense

  • A schema specifies types of elements and production rules
  • Validation rejects unacceptable inputs

Two language-theoretic flaws

  1. XML Schema (XSD) extension points are wildcards
  2. References raise expressiveness beyond context-free

Harald Lampesberger An Incremental Learner for Language-Based Anomaly Detection in XML 1/10

SLIDE 3

XSD Extension Points

From http://schemas.xmlsoap.org/soap/envelope/

...
<xs:element name="Header" type="tns:Header"/>
<xs:complexType name="Header">
  <xs:sequence>
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
  </xs:sequence>
  <xs:anyAttribute namespace="##other" processContents="lax"/>
</xs:complexType>
...
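Extension points such as the wildcards in the SOAP Header type can be located mechanically. The following is a minimal sketch using only Python's standard library. The helper name `find_extension_points` and the `xs:schema` wrapper around the fragment are assumptions added for illustration and are not part of the talk.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The fragment from the slide, wrapped in an xs:schema root (an assumption)
# so that it parses on its own.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}"
           xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/"
           targetNamespace="http://schemas.xmlsoap.org/soap/envelope/">
  <xs:element name="Header" type="tns:Header"/>
  <xs:complexType name="Header">
    <xs:sequence>
      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"
              processContents="lax"/>
    </xs:sequence>
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>
</xs:schema>
"""

def find_extension_points(schema_text):
    """Return (wildcard kind, namespace, processContents) for each wildcard."""
    root = ET.fromstring(schema_text)
    found = []
    for kind in ("any", "anyAttribute"):
        for node in root.iter(f"{{{XS}}}{kind}"):
            found.append((kind, node.get("namespace"), node.get("processContents")))
    return found

for point in find_extension_points(SCHEMA):
    print(point)
```

Every reported wildcard is a place where a validator with `processContents="lax"` lets through elements or attributes that the schema does not describe.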

SLIDE 5

Signature Wrapping Attack

Digitally signed part = processed part

  • Used in WS-Security and SAML single sign-on
  • Somorovsky et al. (2012): 11/14 SAML implementations vulnerable

[Figure: two SOAP message trees. Benign message: soap:Envelope contains soap:Header with wsse:Security, ds:Signature, ds:SignedInfo and ds:Reference (@URI = #123), and a soap:Body (@wsu:Id = 123) with MonitorInstances; this body is both verified and processed. Attack message: the original soap:Body (@wsu:Id = 123, MonitorInstances) is moved into a Wrapper element and is still the part that is verified, while a new soap:Body (@wsu:Id = attack) with CreateKeyPair is the part that is processed.]

Jensen et al. (2011): removing extension points is hard
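The mismatch between the verified and the processed part can be reproduced in a few lines. This is a simplified sketch, not the WS-Security processing model: no cryptographic verification happens, the `Reference` element is reduced to a bare `URI` attribute, and the placement of `Wrapper` inside the header is an assumption. The function names `signed_element` and `processed_element` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "wsu": "http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-utility-1.0.xsd",
}
WSU_ID = f"{{{NS['wsu']}}}Id"

# Simplified attack message: the signed body is hidden in a Wrapper
# (placement assumed), and a new body carries the attacker's operation.
ATTACK = f"""
<soap:Envelope xmlns:soap="{NS['soap']}" xmlns:wsu="{NS['wsu']}">
  <soap:Header>
    <Reference URI="#123"/>
    <Wrapper>
      <soap:Body wsu:Id="123"><MonitorInstances/></soap:Body>
    </Wrapper>
  </soap:Header>
  <soap:Body wsu:Id="attack"><CreateKeyPair/></soap:Body>
</soap:Envelope>
"""

def signed_element(envelope):
    """Signature logic: dereference the URI by ID, anywhere in the document."""
    target_id = envelope.find(".//Reference").get("URI").lstrip("#")
    for elem in envelope.iter():
        if elem.get(WSU_ID) == target_id:
            return elem
    return None

def processed_element(envelope):
    """Application logic: take the Body that is a direct child of the Envelope."""
    return envelope.find("soap:Body", NS)

envelope = ET.fromstring(ATTACK)
verified = signed_element(envelope)
processed = processed_element(envelope)
print("verified operation: ", verified[0].tag)
print("processed operation:", processed[0].tag)
```

Both lookups succeed, but they return different elements, which is exactly the situation the extension point in the header schema permits.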

SLIDE 9

Language-Based Anomaly Detection

Approach: learn the acceptable language

  1. Datatyped XML Visibly Pushdown Automaton (dXVPA)
      • Mixed-content XML streaming
      • Datatypes generalize character data
      • Character-data XVPA (cXVPA) for stream validation
  2. Incremental learner for grammatical inference
      • Constructs a dXVPA from examples
      • Unlearning and sanitization against poisoning attacks
  3. Experiments
      • Train and test
      • Two synthetic scenarios from ToXgene
      • Two realistic scenarios from Axis2 web service

SLIDE 10

dXVPAs

Event stream alphabets

  • Σcall . . . startElement
  • Σret . . . endElement
  • Σint . . . datatypes

Stack alphabet = states

States partitioned into modules (schema types)

Transitions in and between modules

cXVPA representation

  • Unified text checks
  • Fast validation

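[Figure: example dXVPA with start state q0, final state qf and two modules, Order and Item, each with an entry state e and an exit state x. Call and return transitions are labeled ord/q0, itm/eOrder and itm/xOrder; internal transitions are labeled with the datatypes token and int.]

Example document:

<ord>
  <itm>Product A</itm>
  <itm>8877955335</itm>
</ord>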
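The three event alphabets can be illustrated by turning the example document into a datatyped event stream. This is a minimal sketch using Python's built-in expat bindings. The function name `event_stream` is hypothetical, attributes are ignored, and the two labels `token` and `integer` are a crude stand-in for the lexical datatype system, which is not reproduced here.

```python
import xml.parsers.expat

def event_stream(xml_text):
    """Map a document to (alphabet, symbol) events: call, ret and int."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of character data in one piece

    def start(name, attrs):
        events.append(("call", name))  # startElement

    def end(name):
        events.append(("ret", name))   # endElement

    def chars(data):
        text = data.strip()
        if text:  # skip whitespace between elements
            # Stand-in datatyping: digits only vs. anything else.
            events.append(("int", "integer" if text.isdigit() else "token"))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return events

doc = "<ord> <itm>Product A</itm> <itm>8877955335</itm> </ord>"
for event in event_stream(doc):
    print(event)
```

The automaton then reads datatype symbols instead of raw text, so two documents that differ only in concrete values of the same datatype produce the same event stream.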

SLIDE 11

Incremental Learning Step

Learner computes an updated dXVPA

  • Datatyped event stream
  • Ai . . . incrementally updateable automaton
  • ωi . . . frequencies of states and transitions

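Validator checks acceptance

[Figure: learner and validator pipeline. Learner: incWeightedVPA updates Ai−1, ωi−1 with training document doci to Ai, ωi; trim and genXVPA then derive dXVPAi and cXVPAi. Validator: uses cXVPAi to decide whether a document is accepted (yes/no).]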
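The shape of one incremental step, taking the previous state and one training document and returning the updated state, can be sketched with a much simpler model. The code below is not the dXVPA learner: it swaps the automaton for a table of adjacent event pairs with counters standing in for ω. The names `learn_step`, `accepts` and `transitions` are hypothetical, and a mind change is counted whenever a training document adds a new transition.

```python
from collections import Counter

def transitions(events):
    """Adjacent event pairs with start/end markers (stand-in for VPA transitions)."""
    padded = ["<start>"] + list(events) + ["<end>"]
    return list(zip(padded, padded[1:]))

def learn_step(omega_prev, events):
    """One step: (omega_{i-1}, doc_i) -> (omega_i, mind_change)."""
    omega = Counter(omega_prev)  # copy, so the previous state stays intact
    doc_transitions = transitions(events)
    changed = any(t not in omega for t in doc_transitions)
    omega.update(doc_transitions)  # increment the frequency counters
    return omega, changed

def accepts(omega, events):
    """Validator: accept only documents built from known transitions."""
    return all(t in omega for t in transitions(events))

training = [
    ["ord", "itm", "/itm", "/ord"],
    ["ord", "itm", "/itm", "itm", "/itm", "/ord"],
    ["ord", "itm", "/itm", "/ord"],
]

omega = Counter()
mind_changes = 0
for doc in training:
    omega, changed = learn_step(omega, doc)
    mind_changes += changed

print("mind changes:", mind_changes)
print(accepts(omega, ["ord", "itm", "/itm", "itm", "/itm", "itm", "/itm", "/ord"]))
print(accepts(omega, ["ord", "evil", "/evil", "/ord"]))
```

The third training document leaves the model unchanged, which is the kind of convergence the mind-change counter is meant to capture.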

SLIDE 12

How Learning Works

Every event stream prefix gets a unique state

  • A named state is a pair (u, v)
  • u . . . typing-context string
  • v . . . left-sibling string

Merge two states if they are k-l-locally the same

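[Figure: example document tree with root dealer, children usedcars and newcars, and ad elements containing model and year (values VW, 2014, Tesla). With k = 1 and l = 1 (1-1 local), the named state (dealer#usedcars · newcars, ad · ad) is identified with (newcars, ad).]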
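The naming and merging of states can be sketched as follows. This is a simplification under stated assumptions: u is taken to be the tuple of ancestor element names and v the tuple of left-sibling names, which may carry less information than the typing context used in the paper. The helper names `named_states`, `k_l_local` and `merge` are hypothetical.

```python
import xml.etree.ElementTree as ET

def named_states(root):
    """Yield one pair (u, v) per element: ancestor names and left-sibling names."""
    def walk(elem, ancestors):
        left = []
        for child in elem:
            yield ancestors, tuple(left)
            yield from walk(child, ancestors + (child.tag,))
            left.append(child.tag)
    yield (), ()
    yield from walk(root, (root.tag,))

def k_l_local(u, v, k, l):
    """Keep only the last k symbols of u and the last l symbols of v."""
    return u[max(len(u) - k, 0):], v[max(len(v) - l, 0):]

def merge(root, k, l):
    """Group named states that are k-l-locally the same."""
    classes = {}
    for u, v in named_states(root):
        classes.setdefault(k_l_local(u, v, k, l), []).append((u, v))
    return classes

doc = ET.fromstring(
    "<dealer>"
    "<usedcars><ad/><ad/></usedcars>"
    "<newcars><ad/><ad/><ad/></newcars>"
    "</dealer>"
)

for local, members in merge(doc, k=1, l=1).items():
    print(local, "<-", members)
```

With k = 1 and l = 1, the second and third `ad` under `newcars` fall into the same class `(('newcars',), ('ad',))`, so the learned automaton generalizes to any number of further `ad` elements in that position.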

SLIDE 16

Poisoning Attacks

ωi . . . frequencies of states and transitions from learning

Unlearning

  • An already learned attack is later identified
  • Remove specific knowledge by decrementing ωi
  • Trim zero-weight states and transitions

Sanitization

  • Hidden poisoning attacks
  • Assumption: only few of those
  • Decrement ωi and trim zero-weight states and transitions
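Both countermeasures rest on the frequency counters. The sketch below illustrates them on a deliberately simple model, a table of adjacent event pairs, rather than on a dXVPA. The class `WeightedModel` and its method names are hypothetical, and reading sanitization as a uniform decrement of every counter is an assumption.

```python
from collections import Counter

class WeightedModel:
    """Transition table with frequency counters (stand-in for A and omega)."""

    def __init__(self):
        self.omega = Counter()

    @staticmethod
    def _transitions(events):
        return list(zip(events, events[1:]))

    def learn(self, events):
        self.omega.update(self._transitions(events))

    def trim(self):
        """Drop transitions whose weight has fallen to zero."""
        for t in [t for t, w in self.omega.items() if w <= 0]:
            del self.omega[t]

    def unlearn(self, events):
        """Remove one identified training example again."""
        for t in self._transitions(events):
            if t in self.omega:
                self.omega[t] -= 1
        self.trim()

    def sanitize(self, amount=1):
        """Assumed reading: decrement every counter, so rare knowledge vanishes."""
        for t in list(self.omega):
            self.omega[t] -= amount
        self.trim()

    def accepts(self, events):
        return all(t in self.omega for t in self._transitions(events))

good = ["ord", "itm", "/itm", "/ord"]
poison = ["ord", "evil", "/evil", "/ord"]

model = WeightedModel()
for doc in (good, good, good, poison):
    model.learn(doc)
print("before unlearning:", model.accepts(poison))
model.unlearn(poison)
print("after unlearning: ", model.accepts(poison), model.accepts(good))
```

Unlearning needs the offending document to be known, whereas `sanitize` works blindly and relies on the assumption that poisoned inputs are rare compared with legitimate ones.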

SLIDE 17

Experiments

Two synthetic and two realistic datasets

Learning progress

  • Train and test, binary classification, mind changes (MC)

[Figure: learning progress for two datasets, Catalog (k = 1, l = 2; training iterations 0 to 100) and VulnShopAuthOrder (k = 1, l = 2; training iterations 0 to 200). Each panel plots F1 and FPR (0% to 100%) and the number of mind changes (MC) against the training iteration.]
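The two plotted measures follow from the confusion counts of a binary classifier: F1 = 2·TP / (2·TP + FP + FN) and FPR = FP / (FP + TN). A short sketch, assuming that "positive" means "flagged as attack". The label and prediction lists are made-up illustration values and not results from the experiments.

```python
def confusion(labels, predictions):
    """Count TP, FP, FN, TN; True means attack (label) or rejected (prediction)."""
    tp = sum(l and p for l, p in zip(labels, predictions))
    fp = sum((not l) and p for l, p in zip(labels, predictions))
    fn = sum(l and (not p) for l, p in zip(labels, predictions))
    tn = sum((not l) and (not p) for l, p in zip(labels, predictions))
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def false_positive_rate(fp, tn):
    return fp / (fp + tn) if (fp + tn) else 0.0

# Made-up test set: 3 attacks, 5 normal documents, one attack is missed.
labels      = [True, True, True,  False, False, False, False, False]
predictions = [True, True, False, False, False, False, False, False]

tp, fp, fn, tn = confusion(labels, predictions)
print("F1 :", f1_score(tp, fp, fn))
print("FPR:", false_positive_rate(fp, tn))
```

A missed attack lowers F1 but leaves FPR at zero, which is why the two curves are reported side by side.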

SLIDE 18

Conclusions

Learner outperformed schema validation

  • All signature wrapping attacks were detected (schema validation: 0)
  • No false positives
  • False negatives resulted from coarse XSD datatypes
  • Fast convergence

Contributions in the paper

  • dXVPA and cXVPA language representations
  • Lexical datatype system for datatype inference from text
  • Algorithms for the incremental learner
  • Details on experiments

Use cases

  • Security mechanism for any XML-based interaction
  • Especially for systems using composed schemas
  • XML firewall

SLIDE 19

Appendix

SLIDE 20

Lexical Datatype System

Learner needs a datatyped event stream

  • Lexically distinct XSD datatypes instead of character data

  1. Lexical subsumption
      • Minimally required datatypes
      • Strict subsumption is ambiguous
      • false → {language, boolean, NCName}
  2. Preference heuristic
      • Datatypes partitioned into kinds
      • “Preferred” relation, e.g. boolean < language
      • false → {boolean}

Example

  • {1, 0, true, 33} → {boolean, unsignedByte}

[Figure: hierarchy of the lexically distinct XSD datatypes with top element ⊤. Datatypes shown: boolean, unsignedByte, byte, language, NCName, duration, dayTimeDuration, yearMonthDuration, QName, Name, NMTOKEN, token, normalizedString, string, base64Binary, gMonth, gDay, gMonthDay, gYearMonth, double, decimal, integer, unsignedShort, unsignedInt, unsignedLong, nonNegativeInteger, short, int, long, gYear, nonPositiveInteger, negativeInteger, hexBinary, anyURI, dateTime, dateTimeStamp, time, date, positiveInteger, NMTOKENS, ENTITIES.]
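The two steps, lexical subsumption followed by a preference heuristic, can be sketched for a handful of datatypes. The regular expressions below are simplified ASCII approximations of the XSD lexical spaces. Only "boolean before language" comes from the slide; the rest of the order in `LEXICAL` is an assumption, and the function names are hypothetical.

```python
import re

# (datatype, simplified lexical space), listed in an assumed preference order.
LEXICAL = [
    ("boolean", re.compile(r"true|false|1|0")),
    ("unsignedByte", re.compile(r"0*(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])")),
    ("language", re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")),
    ("NCName", re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")),
]

def candidates(text):
    """Step 1: every datatype whose lexical space contains the text."""
    return {name for name, pattern in LEXICAL if pattern.fullmatch(text)}

def preferred(text):
    """Step 2: resolve the ambiguity with the preference order."""
    for name, pattern in LEXICAL:
        if pattern.fullmatch(text):
            return name
    return "string"  # fallback when nothing more specific matches

def infer(values):
    """Minimal set of datatypes needed to cover all observed values."""
    return {preferred(value) for value in values}

print(sorted(candidates("false")))
print(preferred("false"))
print(sorted(infer(["1", "0", "true", "33"])))
```

The sketch reproduces the slide's two examples: `false` is ambiguous between three datatypes until the preference picks `boolean`, and the value set {1, 0, true, 33} needs both `boolean` and `unsignedByte`.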