SPiCE Workshop: Team Ping! Flexible State-Merging with Python
10th October 2016
Chris Hammerschmidt, firstname.lastname@uni.lu
Interdisciplinary Centre for Security, Reliability and Trust
University of Luxembourg
Context for our Participation
Core Assumptions
We live in a deterministic world where everything is regular
We assume: Everything is generated by a PDFA.
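To make the assumption concrete: a PDFA (probabilistic deterministic finite automaton) assigns a probability to a sequence by following exactly one transition per symbol, multiplying the transition probabilities, and finally multiplying by the stopping probability of the state it ends in. The sketch below is a minimal illustration; the two-state automaton, its numbers, and the name `sequence_probability` are made up for this example.

```python
# Hypothetical two-state PDFA over the alphabet {"a", "b"}.
# transitions[state][symbol] = (probability, next_state)
# stop[state] = probability of ending the sequence in that state
# Per state, outgoing probabilities plus the stopping probability sum to 1.
transitions = {
    0: {"a": (0.5, 1), "b": (0.25, 0)},
    1: {"a": (0.25, 0), "b": (0.25, 1)},
}
stop = {0: 0.25, 1: 0.5}


def sequence_probability(seq, start=0):
    """Probability that the PDFA generates exactly `seq` and then stops."""
    prob, state = 1.0, start
    for sym in seq:
        if sym not in transitions[state]:
            return 0.0  # symbol cannot be produced from this state
        p, state = transitions[state][sym]
        prob *= p
    return prob * stop[state]


print(sequence_probability("ab"))
```

Determinism is what keeps this cheap: there is only one path per sequence, so no summation over hidden state sequences is needed.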
- C. Hammerschmidt (SnT)
Team Ping! @ SPiCE ICGI 2016 10th October 2016 1 / 30
State-Merging for PDFA I to X
Reminder: Merging Conceptually
[Animation in ten frames; graphics not reproduced in this text version.]
Animation graphics made by Sicco Verwer.
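The core operation behind the animation can be sketched in a few lines: merging folds one state into another, adds up their counts, and recursively merges children reached by the same symbol so that the result stays deterministic. The node layout (`count`, `children`) and the function name `merge` are illustrative assumptions, not the dfasat data structures.

```python
def make_node(count=0, **children):
    """Build a prefix-tree node: an occurrence count plus symbol -> child node."""
    return {"count": count, "children": dict(children)}


def merge(red, blue):
    """Fold `blue` into `red` in place and return `red`.

    Children reached by the same symbol are merged recursively, which keeps
    the merged structure deterministic; other children are simply adopted.
    """
    red["count"] += blue["count"]
    for sym, child in blue["children"].items():
        if sym in red["children"]:
            merge(red["children"][sym], child)
        else:
            red["children"][sym] = child
    return red


red = make_node(3, a=make_node(2))
blue = make_node(2, a=make_node(1), b=make_node(1))
merged = merge(red, blue)
print(merged["count"], merged["children"]["a"]["count"])
```

This works on trees only. Merging inside an automaton that already contains cycles needs extra bookkeeping, which is omitted here.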
Our Results
Submitting baselines and our algorithms
Implementation and Algorithms
A Python library for state-merging
Taken from "GI-learning: an optimized framework for grammatical inference" by P. Cottone, M. Ortolani, G. Pergola, in Proceedings of the 17th International Conference on Computer Systems and Technologies.
Piggyback on scikit-learn with our existing Tool: dfasat
SVM Estimator in sklearn
    from sklearn import svm

    # get training samples and labels
    X_samples, Y_labels = get_data()

    # initialize classifier
    clf = svm.SVC(gamma=0.001, C=100)

    # learn and predict
    clf.fit(X_samples, Y_labels)
    clf.predict(sequence)

DFASAT Estimator in sklearn

    from dfasat import DFASATEstimator

    # get training samples and labels
    X_samples, Y_labels = get_data()

    # initialize classifier
    estimator = DFASATEstimator(hName="alergia", hData="alergia_data",
                                tries=1, state_count=25)

    # learn and predict
    estimator.fit(X_samples, Y_labels)
    estimator.predict(sequence)
Compatibility with the Rest of the World!
As well as piggybacking on scikit-learn's features
Added bonus: scikit-learn infrastructure:
◮ cross-validation
◮ ensembles
◮ grid-search
◮ ...
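What makes this reuse possible is the small estimator protocol: anything exposing fit and predict can be handed to generic tooling. The sketch below shows the idea with a hand-written k-fold cross-validation in plain Python, so that it runs without scikit-learn. `NextSymbolEstimator` and `k_fold_accuracy` are hypothetical names for illustration and not part of dfasat or scikit-learn.

```python
from collections import Counter, defaultdict


class NextSymbolEstimator:
    """Toy model: predicts the label seen most often after the last `order` symbols."""

    def __init__(self, order=1):
        self.order = order  # must be >= 1

    def fit(self, X, y):
        self.counts_ = defaultdict(Counter)
        self.default_ = Counter(y).most_common(1)[0][0]
        for seq, label in zip(X, y):
            self.counts_[tuple(seq[-self.order:])][label] += 1
        return self

    def predict(self, X):
        out = []
        for seq in X:
            seen = self.counts_.get(tuple(seq[-self.order:]))
            out.append(seen.most_common(1)[0][0] if seen else self.default_)
        return out


def k_fold_accuracy(make_estimator, X, y, k=3):
    """Average held-out accuracy over k contiguous folds."""
    n = len(X)
    scores = []
    for i in range(k):
        test_idx = list(range(i * n // k, (i + 1) * n // k))
        held_out = set(test_idx)
        train_idx = [j for j in range(n) if j not in held_out]
        est = make_estimator().fit([X[j] for j in train_idx],
                                   [y[j] for j in train_idx])
        preds = est.predict([X[j] for j in test_idx])
        hits = sum(p == y[j] for p, j in zip(preds, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / k


# Label is fully determined by the last symbol, so every fold is predicted correctly.
X = [[0, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
y = [0, 1, 0, 1, 0, 1]
accuracy = k_fold_accuracy(NextSymbolEstimator, X, y, k=3)
print(accuracy)
```

The same loop doubles as a crude grid search if it is run once per candidate value of `order` and the best score is kept.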
Flexible State-Merging
Why we care about our implementation.
◮ GI is often a heuristic process that aims to recover, or converge to, a target
◮ what to do if there is no clear target?
◮ what to do if we have extra information from an application field?
Our approach: Change the heuristic!
◮ Use case: distilling/privileged data
◮ wind speed prediction, protocol reverse engineering
E.g. learn from tuples <a, b> with a ∈ A, b ∈ B, and classify/predict only from a ∈ A.
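One way the tuple idea could look in code: the side information b is recorded in the states during training and consulted only by the merge check, while lookups at prediction time follow the a-symbols alone. This is a sketch under assumptions. The `Node` layout and the compatibility rule (same set of observed b-values) are invented for illustration and are not the dfasat heuristic.

```python
from collections import Counter


class Node:
    def __init__(self):
        self.children = {}          # a-symbol -> Node
        self.side_info = Counter()  # privileged b-values observed on arrival


def build_prefix_tree(traces):
    """traces: list of sequences of (a, b) pairs."""
    root = Node()
    for trace in traces:
        node = root
        for a, b in trace:
            node = node.children.setdefault(a, Node())
            node.side_info[b] += 1
    return root


def compatible(n1, n2):
    """Hypothetical consistency check: merge only states that saw the same b-values."""
    return set(n1.side_info) == set(n2.side_info)


def walk(root, a_sequence):
    """Follow a-symbols only; b is not needed at prediction time."""
    node = root
    for a in a_sequence:
        node = node.children.get(a)
        if node is None:
            return None
    return node


traces = [[("x", "hi"), ("y", "lo")], [("x", "hi"), ("x", "hi")]]
root = build_prefix_tree(traces)
print(compatible(walk(root, ["x"]), walk(root, ["x", "x"])))
```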
Flexible State-Merging
What do we mean by it?
Consistency check, score calculation, and summary-statistic collection on a tree of node objects managed by a state-merger.
Flexible State-Merging
How does it work?
Plug and play evaluation functions.
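A rough picture of what "plug and play" could mean: the merger owns the search loop and delegates every decision to an evaluation object that can be swapped out. The class and method names below (`Evaluation`, `consistent`, `score`, `best_merge`) are assumptions for this sketch and are not the dfasat API.

```python
class Evaluation:
    """Hypothetical plug-in interface: accept every merge, score them all equally."""

    def consistent(self, left, right):
        return True

    def score(self, left, right):
        return 0


class SameLabel(Evaluation):
    """Example plug-in: only merge states with equal labels, prefer well-supported ones."""

    def consistent(self, left, right):
        return left["label"] == right["label"]

    def score(self, left, right):
        return min(left["count"], right["count"])


def best_merge(states, evaluation):
    """Return (score, p, q) for the best consistent pair of state names, or None."""
    best = None
    names = sorted(states)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if not evaluation.consistent(states[p], states[q]):
                continue
            s = evaluation.score(states[p], states[q])
            if best is None or s > best[0]:
                best = (s, p, q)
    return best


states = {
    "q0": {"label": "A", "count": 5},
    "q1": {"label": "B", "count": 3},
    "q2": {"label": "A", "count": 2},
}
print(best_merge(states, SameLabel()))
```

Swapping `SameLabel()` for another subclass changes the learned model without touching the search loop, which is the point of keeping the two apart.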
Flexible State-Merging
Example: Evidence Driven Heuristic
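The code shown on this slide is not recoverable from the text, so here is a simplified sketch of the evidence-driven idea instead: a merge is rejected if it would put an accepting and a rejecting state together, and otherwise scored by the number of labelled state pairs that agree. It compares two prefix-tree subtrees only; a full implementation performs the merge on the automaton itself. The node layout and the names `node` and `edsm_score` are assumptions.

```python
def node(label=None, **children):
    """label: True (accepting), False (rejecting) or None (unknown)."""
    return {"label": label, "children": dict(children)}


def edsm_score(red, blue):
    """Return the evidence count for merging blue into red, or None on a conflict."""
    score = 0
    if red["label"] is not None and blue["label"] is not None:
        if red["label"] != blue["label"]:
            return None  # accepting state would be merged with a rejecting one
        score += 1       # one more pair of labels that agree
    for sym, blue_child in blue["children"].items():
        red_child = red["children"].get(sym)
        if red_child is not None:
            sub = edsm_score(red_child, blue_child)
            if sub is None:
                return None
            score += sub
    return score


t1 = node(True, a=node(False), b=node(True))
t2 = node(True, a=node(False), c=node(False))
t3 = node(True, a=node(True))
print(edsm_score(t1, t2), edsm_score(t1, t3))
```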
Flexible State-Merging
Example: Mealy Machine Heuristic
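The slide's code is likewise not recoverable. A plausible consistency check for Mealy machines, given here as an assumption rather than the dfasat implementation, is that two states may merge only if every input they share produces the same output, recursively for the states that follow.

```python
def mealy_consistent(s1, s2):
    """State layout (assumed): dict mapping input -> (output, next_state).

    Works on tree-shaped data; cyclic machines would need a visited set.
    """
    for inp, (out1, next1) in s1.items():
        if inp in s2:
            out2, next2 = s2[inp]
            if out1 != out2 or not mealy_consistent(next1, next2):
                return False
    return True


s1 = {"a": ("0", {"b": ("1", {})})}
s2 = {"a": ("0", {"b": ("1", {}), "c": ("0", {})})}
s3 = {"a": ("0", {"b": ("0", {})})}
print(mealy_consistent(s1, s2), mealy_consistent(s1, s3))
```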
Encapsulating C++ Code in Python
Lessons learned
It’s very easy ... to shoot yourself in the foot.
Evaluation of Wrappers
◮ Around a dozen wrappers using different methods
◮ How did we decide?
◮ Performance: our own benchmark suite
◮ Ease of use
ctypes: Direct access to shared compiled libraries (gcc -shared -fPIC)
SWIG: Code generator, automatically creates bindings from C++ headers
Boost.Python: Interface library, explicit mappings, works both ways
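For a feel of the first option, here is a minimal ctypes sketch. It assumes a POSIX system, where `ctypes.CDLL(None)` exposes the C standard library already loaded into the process; on other platforms the library would have to be loaded by name. It calls two libc functions and is not taken from the dfasat code base.

```python
import ctypes

# POSIX assumption: passing None gives access to symbols of the running
# process, which include the C standard library.
libc = ctypes.CDLL(None)

# Declaring argument and return types avoids silent conversion mistakes.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

result = libc.abs(-42)
length = libc.strlen(b"dfasat")
print(result, length)
```

Leaving out `argtypes` and `restype` still appears to work for simple integer calls, which is one of the easy ways to end up with wrong results for pointers or 64-bit values.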
Evaluation of Wrappers
[Figure: four bar charts comparing ctypes, SWIG and Boost.Python. Panels: "Read a global variable" (seconds, ·10^-2), "Call a foreign function" (seconds, ·10^-2), "Iteration" (seconds), "Recursion" (seconds). Bar values are not recoverable from the text.]
Boost.Python I
Py++
Boost.Python I
The ultimate goal: rapid prototyping heuristics.
Implementation
Jupyter notebooks
Package Criticism
Shortcomings
◮ np.array input format is weird, as columns are not features
◮ we also provide ensemble methods as we don’t have serialization yet and the student graduated

    bag = BaggingClassifier(estimator=DFASATEstimator,
                            n=50,
                            output_file=file_prefix,
                            random_seed=True,
                            random_counts=[5, 15, 25],
                            hData='alergia_data',
                            hName='alergia',
                            symbol_count=5,  # -y
                            state_count=5,   # -t
                            parameter=0.5,   # -p
                            method=1         # -m
                            )
    fit = bag.fit(train_data_x, train_data_y, subset=True)
We should team up to create a sequence-prediction counterpart to scikit-learn.
Conclusion
◮ It’s possible to benefit from the existing ecosystem, but it doesn’t come for free.
◮ Flexible state-merging has a lot of unexplored use-cases.
Returning to the topic: SPiCE
◮ we were good at software traces, and really bad at NLP problems (no surprise)
◮ we did OK on most synthetic datasets, but they were generated by a slightly more expressive model
◮ I think we can leverage grammatical information in our framework, but it requires some additional feature engineering
Thank You!
Also to Sicco Verwer, Benjamin Loos [1], and the team.
Time for questions.
[1] and supervisors Thomas Engel and Radu State
Remarks I
dfasat is currently on the Python Package Index and can be installed via "pip install dfasat", although it’s not quite production-ready. Talk to me about what you need from such a tool.