SPiCE Workshop: Team Ping! Flexible State-Merging with Python

SLIDE 1

SPiCE Workshop: Team Ping!

Flexible State-Merging with Python

10th October 2016

Chris Hammerschmidt, firstname.lastname@uni.lu

Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg

SLIDE 2

Context for our Participation

Core Assumptions

We live in a deterministic world where everything is regular.

We assume: everything is generated by a PDFA (probabilistic deterministic finite automaton).
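
The core assumption can be made concrete with a toy PDFA: every state has at most one successor per symbol (the "deterministic" part), and each state assigns probabilities to its outgoing symbols and to stopping. This is a minimal sketch; the dictionary representation and the class name `PDFA` are illustrative assumptions, not taken from dfasat.

```python
import random
from typing import Dict, Tuple


class PDFA:
    """Toy probabilistic deterministic finite automaton (illustrative only)."""

    def __init__(self, start: int,
                 trans: Dict[Tuple[int, str], Tuple[int, float]],
                 stop: Dict[int, float]) -> None:
        self.start = start
        # (state, symbol) -> (next state, probability); a dict key allows only
        # one successor per (state, symbol), which makes the automaton deterministic
        self.trans = trans
        self.stop = stop  # state -> probability of ending the string there

    def probability(self, word: str) -> float:
        """Probability that the automaton generates exactly `word`."""
        state, p = self.start, 1.0
        for sym in word:
            if (state, sym) not in self.trans:
                return 0.0  # no transition: the word cannot be generated
            state, q = self.trans[(state, sym)]
            p *= q
        return p * self.stop.get(state, 0.0)

    def sample(self, rng: random.Random) -> str:
        """Generate one string by walking from the start state until it stops."""
        state, out = self.start, []
        while True:
            r = rng.random() - self.stop.get(state, 0.0)
            if r < 0:
                return "".join(out)
            edge = None
            for (src, sym), (dst, q) in self.trans.items():
                if src != state:
                    continue
                edge = (sym, dst)  # last candidate doubles as a rounding fallback
                r -= q
                if r < 0:
                    break
            if edge is None:  # no outgoing transitions: stop here
                return "".join(out)
            out.append(edge[0])
            state = edge[1]


# Two states: from 0 emit "a" (0.5) or stop (0.5); from 1 emit "b" (0.6) or stop (0.4).
pdfa = PDFA(0, {(0, "a"): (1, 0.5), (1, "b"): (0, 0.6)}, {0: 0.5, 1: 0.4})
```

For example, `pdfa.probability("ab")` multiplies 0.5 · 0.6 · 0.5 = 0.15, while any string starting with "b" has probability 0.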

C. Hammerschmidt (SnT)

Team Ping! @ SPiCE ICGI 2016 10th October 2016 1 / 30

SLIDE 3

State-Merging for PDFA I

Reminder: Merging Conceptually
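
State-merging usually starts from a prefix tree of the training strings, whose states carry the counts that later merge decisions inspect. A minimal sketch, assuming a hypothetical `Node` class with visit, end, and outgoing-symbol counts; it is not dfasat's internal data structure.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class Node:
    """One state of the prefix tree with the counts a merge heuristic inspects."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.visits = 0  # number of samples that pass through this state
        self.ends = 0    # number of samples that end in this state
        self.out: DefaultDict[str, int] = defaultdict(int)  # outgoing symbol counts


def build_prefix_tree(samples: List[str]) -> Node:
    """Insert every sample; shared prefixes share states, so counts accumulate."""
    root = Node()
    for word in samples:
        node = root
        node.visits += 1
        for sym in word:
            node.out[sym] += 1
            if sym not in node.children:
                node.children[sym] = Node()
            node = node.children[sym]
            node.visits += 1
        node.ends += 1
    return root


tree = build_prefix_tree(["ab", "a", "b", "ab"])
```

Merging then folds one subtree onto another, adding up these counts, as long as the chosen heuristic considers the two states compatible.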

SLIDE 4

State-Merging for PDFA II

Reminder: Merging Conceptually

SLIDE 5

State-Merging for PDFA III

Reminder: Merging Conceptually

SLIDE 6

State-Merging for PDFA IV

Reminder: Merging Conceptually

SLIDE 7

State-Merging for PDFA V

Reminder: Merging Conceptually

SLIDE 8

State-Merging for PDFA VI

Reminder: Merging Conceptually

SLIDE 9

State-Merging for PDFA VII

Reminder: Merging Conceptually

SLIDE 10

State-Merging for PDFA VIII

Reminder: Merging Conceptually

SLIDE 11

State-Merging for PDFA IX

Reminder: Merging Conceptually

SLIDE 12

State-Merging for PDFA X

Reminder: Merging Conceptually

Animation graphics made by Sicco Verwer.
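
The DFASAT example later in the talk selects the ALERGIA heuristic (`hName="alergia"`). ALERGIA decides whether two states may be merged by comparing their stopping and transition frequencies with a Hoeffding bound, and then does the same recursively for their children. A sketch of that compatibility test, assuming nodes are plain dictionaries with the hypothetical keys `n`, `end`, `out`, and `kids`; `alpha` is the usual significance parameter.

```python
import math
from typing import Any, Dict, Optional


def hoeffding_different(f1: int, n1: int, f2: int, n2: int, alpha: float) -> bool:
    """ALERGIA's test: are the frequencies f1/n1 and f2/n2 significantly different?"""
    if n1 == 0 or n2 == 0:
        return False  # no evidence either way
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (1 / math.sqrt(n1) + 1 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound


def compatible(a: Dict[str, Any], b: Dict[str, Any], alpha: float = 0.05) -> bool:
    """Recursively check whether two prefix-tree nodes may be merged."""
    if hoeffding_different(a["end"], a["n"], b["end"], b["n"], alpha):
        return False
    for sym in set(a["out"]) | set(b["out"]):
        if hoeffding_different(a["out"].get(sym, 0), a["n"],
                               b["out"].get(sym, 0), b["n"], alpha):
            return False
        if sym in a["kids"] and sym in b["kids"]:
            if not compatible(a["kids"][sym], b["kids"][sym], alpha):
                return False
    return True


def make_node(n: int, end: int, out: Dict[str, int],
              kids: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """n = visits, end = strings ending here, out = symbol counts, kids = children."""
    return {"n": n, "end": end, "out": out, "kids": kids or {}}


p = make_node(100, 50, {"a": 50})
q = make_node(100, 48, {"a": 52})
r = make_node(100, 5, {"a": 95})
```

With `alpha=0.05` and 100 visits on each side the bound is about 0.27, so `p` and `q` (0.50 vs 0.48 stopping frequency) are compatible, while `p` and `r` (0.50 vs 0.05) are not.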

SLIDE 13

Our Results

Submitting baselines and our algorithms

SLIDE 14

Implementation and Algorithms

A Python library for state-merging

Taken from "GI-learning: an optimized framework for grammatical inference" by P. Cottone, M. Ortolani, and G. Pergola, in Proceedings of the 17th International Conference on Computer Systems and Technologies.

SLIDE 16

Piggyback on scikit-learn with our existing tool: dfasat

SVM Estimator in sklearn

    from sklearn import svm

    # get training samples and labels (get_data and sequence are placeholders)
    X_samples, Y_labels = get_data()
    # initialize classifier
    clf = svm.SVC(gamma=0.001, C=100)
    # learn and predict
    clf.fit(X_samples, Y_labels)
    clf.predict(sequence)

DFASAT Estimator in sklearn

    from dfasat import DFASATEstimator

    # get training samples and labels
    X_samples, Y_labels = get_data()
    # initialize classifier
    estimator = DFASATEstimator(hName="alergia", hData="alergia_data", tries=1, state_count=25)
    # learn and predict
    estimator.fit(X_samples, Y_labels)
    estimator.predict(sequence)
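
The DFASAT estimator can replace the SVM because scikit-learn relies only on conventions: hyperparameters are constructor arguments stored unchanged, `fit` returns `self`, learned attributes end in an underscore, and `get_params`/`set_params` expose the hyperparameters. A real package would subclass `sklearn.base.BaseEstimator` and `ClassifierMixin`, which supply the parameter methods and `score`. The dependency-free sketch below writes those methods by hand; `PrefixMajorityClassifier` is a hypothetical toy, not `DFASATEstimator`, and it does no state-merging.

```python
from collections import Counter, defaultdict
from typing import Any, DefaultDict, Dict, List, Sequence


class PrefixMajorityClassifier:
    """Hypothetical sequence classifier following scikit-learn's estimator conventions.

    Instead of merging states it labels a sequence with the majority label of the
    training sequences that share its first `depth` symbols.
    """

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth  # hyperparameters are stored unchanged; no work in __init__

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {"depth": self.depth}

    def set_params(self, **params: Any) -> "PrefixMajorityClassifier":
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X: Sequence[Sequence[str]], y: Sequence[int]) -> "PrefixMajorityClassifier":
        votes: DefaultDict[tuple, Counter] = defaultdict(Counter)
        for seq, label in zip(X, y):
            votes[tuple(seq[: self.depth])][label] += 1
        # learned attributes end with "_" by convention
        self.table_ = {prefix: c.most_common(1)[0][0] for prefix, c in votes.items()}
        self.default_ = Counter(y).most_common(1)[0][0]
        return self  # returning self allows clf.fit(X, y).predict(X)

    def predict(self, X: Sequence[Sequence[str]]) -> List[int]:
        return [self.table_.get(tuple(seq[: self.depth]), self.default_) for seq in X]


X = ["ab", "aa", "ac", "ba"]
y = [1, 1, 1, 0]
clf = PrefixMajorityClassifier(depth=1).fit(X, y)
```

Here `clf.predict(["ax", "bz", "c"])` gives `[1, 0, 1]`: the unseen prefix "c" falls back to the overall majority label.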

SLIDE 17

Compatibility with the Rest of the World!

As well as piggybacking on scikit-learn’s features

Added bonus: scikit-learn infrastructure:

◮ cross-validation
◮ ensembles
◮ grid search
◮ ...
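
Cross-validation and grid search only need an object with `fit`, `predict`, and `set_params`, which is why a state-merger wrapped as an estimator gets them for free. In scikit-learn they come from `cross_val_score` and `GridSearchCV`, with `clone` copying the estimator. The sketch below re-creates the idea without dependencies; `LengthThreshold`, `k_fold_accuracy`, and `grid_search` are hypothetical names, and the stand-in estimator is deliberately trivial.

```python
import copy
import itertools
from typing import Any, Dict, List, Sequence


class LengthThreshold:
    """Stand-in estimator (hypothetical): label 1 if a sequence has >= `threshold` symbols."""

    def __init__(self, threshold: int = 1) -> None:
        self.threshold = threshold

    def set_params(self, **params: Any) -> "LengthThreshold":
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X: Sequence, y: Sequence) -> "LengthThreshold":
        return self  # a state-merger would build its automaton here

    def predict(self, X: Sequence) -> List[int]:
        return [int(len(seq) >= self.threshold) for seq in X]


def k_fold_accuracy(estimator: Any, X: Sequence, y: Sequence, k: int = 3) -> List[float]:
    """Accuracy on each of k folds; fold f holds out samples f, f+k, f+2k, ..."""
    scores = []
    for fold in range(k):
        test = list(range(fold, len(X), k))
        train = [i for i in range(len(X)) if i % k != fold]
        model = copy.deepcopy(estimator).fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return scores


def grid_search(estimator: Any, grid: Dict[str, List[Any]],
                X: Sequence, y: Sequence, k: int = 3) -> Dict[str, Any]:
    """Return the parameter combination with the best mean k-fold accuracy."""
    names = sorted(grid)
    best: Dict[str, Any] = {}
    best_score = -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        candidate = copy.deepcopy(estimator).set_params(**params)
        score = sum(k_fold_accuracy(candidate, X, y, k)) / k
        if score > best_score:
            best, best_score = params, score
    return best


X = ["a", "ab", "abc", "abcd", "b", "bc"]
y = [0, 0, 1, 1, 0, 0]
```

On this toy data `grid_search(LengthThreshold(), {"threshold": [1, 2, 3, 4]}, X, y)` selects `{"threshold": 3}`, the only value that classifies every fold correctly.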

SLIDE 18

Flexible State-Merging

Why we care about our implementation.

◮ GI is often a heuristic process that intends to recover, or converge to, a target
◮ what to do if there is no clear target?
◮ what to do if we have extra information from an application field?

Our approach: Change the heuristic!

◮ Use case: distilling/privileged data
◮ wind speed prediction, protocol reverse engineering

E.g. learn from a tuple ⟨a, b⟩ with a ∈ A, b ∈ B, and classify/predict only from a ∈ A.

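
One possible reading of the ⟨a, b⟩ example: the prefix tree is built over the observable sequence a only, the merge heuristic additionally consults the privileged value b, and classification afterwards needs nothing but a. A minimal sketch under those assumptions; the sample format (a, b, label), the `tolerance` parameter, and all function names are illustrative rather than taken from dfasat, and the merge step itself is omitted.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, float, int]  # (observable sequence a, privileged value b, label)


def build_states(samples: List[Sample]) -> Dict[str, Dict[str, list]]:
    """States are keyed by prefixes of a only; each state also collects the
    privileged b values and the labels of the samples passing through it."""
    states: Dict[str, Dict[str, list]] = {}
    for a, b, label in samples:
        for i in range(len(a) + 1):
            state = states.setdefault(a[:i], {"b": [], "labels": []})
            state["b"].append(b)
            state["labels"].append(label)
    return states


def merge_score(states: Dict[str, Dict[str, list]], p: str, q: str,
                tolerance: float = 1.0) -> Optional[int]:
    """Heuristic consulting b: veto merges whose mean b differs by more than tolerance."""
    bp, bq = states[p]["b"], states[q]["b"]
    if abs(mean(bp) - mean(bq)) > tolerance:
        return None  # inconsistent under the privileged information
    return len(bp) + len(bq)  # evidence: number of supporting samples


def predict(states: Dict[str, Dict[str, list]], a: str) -> Optional[int]:
    """Classification reads only a: majority label of the longest known prefix."""
    for i in range(len(a), -1, -1):
        if a[:i] in states:
            return Counter(states[a[:i]]["labels"]).most_common(1)[0][0]
    return None  # empty model


states = build_states([("ab", 2.0, 1), ("aa", 4.0, 1), ("b", 10.0, 0)])
```

The design point is the split of responsibilities: b shapes which states end up merged, but `predict` never looks at it.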
SLIDE 20

Flexible State-Merging

What do we mean by it?

Consistency checks, score calculation, and summary-statistic collection on a tree of node objects managed by a state-merger.

SLIDE 21

Flexible State-Merging

How does it work?

Plug-and-play evaluation functions.
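
A sketch of what plug-and-play evaluation functions could look like: the merger owns the tree traversal, and a plug-in object decides, for each pair of states a merge would fold together, whether the pair is consistent, updates its statistics, and finally returns a score. The `Evaluation` interface, its hook names, and the toy `OverlapSize` heuristic are assumptions for illustration, not dfasat's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class Node:
    """Minimal prefix-tree node (illustrative)."""

    def __init__(self, **children: "Node") -> None:
        self.children: Dict[str, "Node"] = dict(children)


class Evaluation(ABC):
    """Hypothetical plug-in interface called by the merger for every state pair."""

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def consistent(self, left: Node, right: Node) -> bool: ...

    @abstractmethod
    def update(self, left: Node, right: Node) -> None: ...

    @abstractmethod
    def score(self) -> float: ...


class OverlapSize(Evaluation):
    """Toy heuristic: never vetoes, scores a merge by how many state pairs overlap."""

    def reset(self) -> None:
        self.pairs = 0

    def consistent(self, left: Node, right: Node) -> bool:
        return True

    def update(self, left: Node, right: Node) -> None:
        self.pairs += 1

    def score(self) -> float:
        return float(self.pairs)


def try_merge(left: Node, right: Node, evaluation: Evaluation) -> Optional[float]:
    """Walk both subtrees in lockstep without modifying them; the plug-in may
    veto the merge (None) or score it."""
    evaluation.reset()
    stack: List[Tuple[Node, Node]] = [(left, right)]
    while stack:
        a, b = stack.pop()
        if not evaluation.consistent(a, b):
            return None
        evaluation.update(a, b)
        for sym, child in b.children.items():
            if sym in a.children:
                stack.append((a.children[sym], child))
    return evaluation.score()


left = Node(a=Node(), b=Node())
right = Node(a=Node(c=Node()))
```

Swapping heuristics then means passing a different `Evaluation` subclass; the traversal code stays untouched.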

SLIDE 22

Flexible State-Merging

Example: Evidence-Driven Heuristic
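
In a common formulation of the evidence-driven heuristic (EDSM), a merge is inconsistent if it folds an accepting state onto a rejecting one, and otherwise earns one point for every pair of labelled states with matching labels. A recursive sketch, assuming both sides are still trees and a hypothetical `Node` whose `label` is `True` (accept), `False` (reject), or `None` (unknown).

```python
from typing import Dict, Optional


class Node:
    """Prefix-tree node with an optional accept/reject label (illustrative)."""

    def __init__(self, label: Optional[bool] = None, **children: "Node") -> None:
        self.label = label  # True = accepting, False = rejecting, None = unknown
        self.children: Dict[str, "Node"] = dict(children)


def edsm_score(a: Node, b: Node) -> Optional[int]:
    """Evidence-driven score of merging b into a; None if the merge is inconsistent."""
    score = 0
    if a.label is not None and b.label is not None:
        if a.label != b.label:
            return None  # accept/reject conflict
        score += 1       # one more piece of agreeing evidence
    for sym, child in b.children.items():
        if sym in a.children:
            sub = edsm_score(a.children[sym], child)
            if sub is None:
                return None
            score += sub
    return score


red = Node(True, a=Node(False), b=Node(True))
blue = Node(True, a=Node(False))
```

Here `edsm_score(red, blue)` is 2 (the roots agree, and so do their "a" children); a merger would pick the highest-scoring consistent merge.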

SLIDE 24

Flexible State-Merging

Example: Mealy Machine Heuristic
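
In a Mealy machine every transition carries an output, so a merge is consistent only if the two states produce the same output on every input they share, recursively. A sketch, assuming traces of (input, output) pairs and hypothetical `MealyNode`, `add_trace`, and `mealy_score` names; counting agreeing transitions is one plausible score, not necessarily the one used in the talk.

```python
from typing import Dict, List, Optional, Tuple


class MealyNode:
    """Prefix-tree state of a Mealy machine (illustrative)."""

    def __init__(self) -> None:
        self.edges: Dict[str, Tuple[str, "MealyNode"]] = {}  # input -> (output, target)


def add_trace(root: MealyNode, trace: List[Tuple[str, str]]) -> MealyNode:
    """Insert one trace of (input, output) pairs into the tree rooted at root."""
    node = root
    for inp, out in trace:
        if inp in node.edges:
            seen, nxt = node.edges[inp]
            if seen != out:  # the system under learning is assumed deterministic
                raise ValueError(f"conflicting outputs for input {inp!r}")
        else:
            nxt = MealyNode()
            node.edges[inp] = (out, nxt)
        node = nxt
    return root


def mealy_score(a: MealyNode, b: MealyNode) -> Optional[int]:
    """Score merging b into a; None if some shared input yields different outputs."""
    score = 0
    for inp, (out_b, child_b) in b.edges.items():
        if inp not in a.edges:
            continue  # b's extra transitions would simply be copied over
        out_a, child_a = a.edges[inp]
        if out_a != out_b:
            return None
        sub = mealy_score(child_a, child_b)
        if sub is None:
            return None
        score += 1 + sub  # one agreeing transition plus agreement below it
    return score


root = add_trace(MealyNode(), [("req", "ack"), ("req", "ack")])
add_trace(root, [("ping", "pong")])
```

Merging the state reached after one "req" back into the root scores 1, while a state that answers "req" with "nack" is rejected.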

SLIDE 25

Encapsulating C++ Code in Python

Lessons learned

It’s very easy ... to shoot yourself in the foot.

SLIDE 27

Evaluation of Wrappers

◮ Around a dozen wrappers using different methods
◮ How did we decide?
  ◮ Performance: our own benchmark suite
  ◮ Ease of use

ctypes: direct access to shared compiled libraries (gcc -shared -fPIC)
SWIG: code generator, automatically creates bindings from C++ headers
Boost.Python: interface library, explicit mappings, works both ways
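
With ctypes, a compiled shared library is loaded at run time and each function's C signature is declared by hand. A small sketch using the C standard library's `strlen`, assuming a POSIX system where `ctypes.util.find_library("c")` locates libc; one's own code compiled with `gcc -shared -fPIC` would be loaded the same way by passing its path to `ctypes.CDLL`.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes a POSIX system).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Call C's strlen on the UTF-8 bytes of `text`."""
    return libc.strlen(text.encode("utf-8"))


# Your own library would be loaded the same way, e.g. after
#   gcc -shared -fPIC -o libmerge.so merge.c
# with ctypes.CDLL("./libmerge.so")  (file names are hypothetical)
```

Forgetting `argtypes`/`restype` is one easy way to shoot yourself in the foot: ctypes then assumes `int` return values and converts arguments by guesswork.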

SLIDE 28

Evaluation of Wrappers

[Figure: run times of ctypes, SWIG, and Boost.Python on four benchmarks: "Read a global variable" and "Call a foreign function" (axes in units of 10⁻² seconds), "Iteration" and "Recursion" (axes in seconds).]
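
The "call a foreign function" benchmark could be measured with a harness along these lines: time many calls through the binding and compare them with a pure-Python baseline. This sketch covers only ctypes (calling libc's `labs`, assuming a POSIX libc); SWIG and Boost.Python would need compiled extension modules, and the timings it prints depend entirely on the machine, so no numbers are claimed here.

```python
import ctypes
import ctypes.util
import timeit
from typing import Callable, Dict

# Load libc and declare: long labs(long x);  (assumes a POSIX system)
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.labs.argtypes = [ctypes.c_long]
libc.labs.restype = ctypes.c_long


def py_abs(x: int) -> int:
    """Pure-Python baseline doing the same work."""
    return x if x >= 0 else -x


def time_calls(fn: Callable[[int], int], arg: int, number: int = 100_000) -> float:
    """Total wall-clock seconds for `number` calls of fn(arg)."""
    return timeit.timeit(lambda: fn(arg), number=number)


results: Dict[str, float] = {
    "ctypes (labs)": time_calls(libc.labs, -7),
    "pure Python": time_calls(py_abs, -7),
}
for name, seconds in results.items():
    print(f"{name:>14}: {seconds:.4f} s")
```

Timing many calls in one `timeit` run amortizes timer overhead; the per-call cost is the total divided by `number`.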

SLIDE 29

Boost.Python I

Py++

SLIDE 30

Boost.Python I

The ultimate goal: rapid prototyping heuristics.

SLIDE 31

Implementation

Jupyter notebooks

SLIDE 32

Package Criticism

Shortcomings

◮ the np.array input format is weird, as columns are not features
◮ we also provide ensemble methods, as we don’t have serialization yet and the student graduated

    # the package's own bagging ensemble (not sklearn.ensemble.BaggingClassifier);
    # trailing comments name the corresponding dfasat command-line flags
    bag = BaggingClassifier(estimator=DFASATEstimator, n=50,
                            output_file=file_prefix,
                            random_seed=True, random_counts=[5, 15, 25],
                            hData='alergia_data', hName='alergia',
                            symbol_count=5,  # -y
                            state_count=5,   # -t
                            parameter=0.5,   # -p
                            method=1)        # -m
    fit = bag.fit(train_data_x, train_data_y, subset=True)
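
The first shortcoming stems from scikit-learn expecting a 2-D array of shape (n_samples, n_features): variable-length sequences have to be padded into rows, so column j holds the j-th symbol of every sequence, a position rather than a feature. A sketch of the round trip, assuming integer-coded symbols and a hypothetical padding value `PAD = -1`; calling `np.array(rows)` on the result would give the array the estimator receives.

```python
from typing import List, Sequence

PAD = -1  # hypothetical padding value; must not collide with real symbol codes


def sequences_to_rows(seqs: Sequence[Sequence[int]], pad: int = PAD) -> List[List[int]]:
    """Pad variable-length sequences to a common width.

    Column j holds the j-th symbol of each sequence: a position, not a feature.
    """
    width = max((len(s) for s in seqs), default=0)
    return [list(s) + [pad] * (width - len(s)) for s in seqs]


def rows_to_sequences(rows: Sequence[Sequence[int]], pad: int = PAD) -> List[List[int]]:
    """Strip trailing padding again, e.g. inside fit/predict before building a prefix tree."""
    out = []
    for row in rows:
        seq = list(row)
        while seq and seq[-1] == pad:
            seq.pop()
        out.append(seq)
    return out


rows = sequences_to_rows([[0, 1, 2], [1], []])
```

The padding makes the data fit scikit-learn's shape checks, but the estimator has to undo it, which is exactly why the format feels weird for sequences.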

We should team up to create a sequence-prediction counterpart to scikit-learn.

SLIDE 33

Conclusion

◮ It’s possible to benefit from the existing ecosystem, but it doesn’t come for free.

◮ Flexible state-merging has a lot of unexplored use cases.

Returning to the topic: SPiCE

◮ we were good at software traces and really bad at NLP problems (no surprise)

◮ we did OK on most synthetic datasets, but they were generated by a slightly more expressive model

◮ I think we can leverage grammatical information in our framework, but it requires some additional feature engineering

SLIDE 35

Thank You!

Also to Sicco Verwer, Benjamin Loos¹, and the team.

Time for questions.

¹ And supervisors Thomas Engel and Radu State.

SLIDE 36

Remarks I

dfasat is currently on the Python Package Index and can be installed with "pip install dfasat", although it’s not quite production-ready. Talk to me about your needs in a tool.
