Moven Machine/Deep Learning Models Distribution Relying on the - PowerPoint PPT Presentation

Moven: Machine/Deep Learning Models Distribution Relying on the Maven Infrastructure
Sergio Fernández (Redlink GmbH)
November 14th, 2016 - Sevilla

SSIX aims to exploit the predictive power of Social Media on Financial Markets

High-Level Technical Architecture
[Architecture diagram: Data Collection, Analysis Pipeline, Storage, Dashboard, RESTful API, X-Scores, ...apps]
Further details at http://ssix-project.eu/

Models in SSIX and Redlink
At Redlink, particularly in the SSIX project, we deal with quite deep neural networks that produce very large models (several gigabytes). We therefore looked at how to address two problems:
1. How to properly manage their distribution and versioning?
2. How to automate their testing?

Moven
https://bitbucket.org/ssix-project/moven
moven = models + maven
We started working on Moven because of the lack of proper state-of-the-art technology for the two needs described before (distribution and testability). Some examples:
● TensorFlow public models use a regular git repository
● In Spark ML most people use shared storage (e.g., HDFS)
● OpenNLP bundles them as JARs
● Freeling uses a shared folder from the native installation packages
● Some other proprietary methods...
As Maven does a great job for software artifacts, we decided to reuse that infrastructure for models too.

Moven key features
● Model agnostic
● Publication based on a regular Maven plugin
● Distribution relying on the existing Maven infrastructure
  ○ benefiting from all the features provided by existing tooling (access control, mirroring, etc.)
● Retrieval currently supported in:
  ○ Java (Maven, of course)
  ○ Python (relying on jip)
● Built-in gzip-based compression
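To illustrate what gzip-based compression involves for multi-gigabyte models, here is a minimal Java sketch (Java 9+, for InputStream.transferTo) that compresses and restores a model file by streaming, so the model is never fully loaded into memory. The class ModelGzip and its methods are hypothetical names; this is not Moven's actual implementation, and it makes no claim about at which stage (publication or retrieval) Moven applies compression.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper illustrating gzip compression of model files.
// NOT Moven's actual implementation.
public class ModelGzip {

    // Compresses src into dst. Data is streamed in chunks, so even
    // multi-gigabyte models never need to fit in memory.
    public static void compress(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(dst))) {
            in.transferTo(out);
        }
    }

    // Reverse operation: restores the original model bytes from a gzip file.
    public static void decompress(Path src, Path dst) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(src));
             OutputStream out = Files.newOutputStream(dst)) {
            in.transferTo(out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path model = Files.createTempFile("model", ".bin");
        Files.write(model, new byte[1024 * 1024]); // 1 MiB of zeros, highly compressible
        Path gz = Files.createTempFile("model", ".bin.gz");
        Path restored = Files.createTempFile("model", ".restored");
        compress(model, gz);
        decompress(gz, restored);
        System.out.println(Files.size(model) + " bytes -> " + Files.size(gz) + " bytes compressed");
    }
}
```

Streaming rather than reading whole files into byte arrays is the relevant design choice here, given the model sizes mentioned earlier.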

Related work
There is some interesting work related to our goals:
● StanfordNLP has recently (>3.5.2) changed to bundle the models as Maven artifacts
● TensorFlow Serving helps to deploy new algorithms for TensorFlow models
● PipelineIO combines several technologies (Spark, NetflixOSS, etc.; they call it the PANCAKE STACK) to provide model distribution, including incremental training, among many other features.

Publish Moven models
Create a regular Maven artifact, placing the models at src/main/models, and just include a plugin configuration:

  <plugin>
    <groupId>io.redlink.ssix.moven</groupId>
    <artifactId>moven-maven-plugin</artifactId>
    <version>0.1.0-SNAPSHOT</version>
    <executions>
      <execution>
        <phase>process-resources</phase>
        <goals>
          <goal>copy-models</goal>
        </goals>
      </execution>
    </executions>
  </plugin>

Then deploy your artifacts as usual with mvn deploy
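The copy-models goal runs in the process-resources phase. Its exact behaviour is not described here, so the following is only a sketch under an assumption: that the goal copies everything under src/main/models into target/classes/META-INF/resources/models, the classpath location used when consuming models from Java, so the files get packaged into the JAR. CopyModelsSketch and copyModels are hypothetical names, not the plugin's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a "copy-models" step; NOT the real moven-maven-plugin code.
// Assumed layout: src/main/models -> target/classes/META-INF/resources/models
public class CopyModelsSketch {

    // Copies every regular file under src/main/models, preserving
    // subdirectories, and returns how many files were copied.
    public static int copyModels(Path projectDir) throws IOException {
        Path source = projectDir.resolve("src/main/models");
        Path target = projectDir.resolve("target/classes/META-INF/resources/models");
        if (!Files.isDirectory(source)) {
            return 0; // this project has no models to package
        }
        List<Path> files;
        try (Stream<Path> walk = Files.walk(source)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Path dest = target.resolve(source.relativize(file));
            Files.createDirectories(dest.getParent());
            Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        Path project = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Copied " + copyModels(project) + " model file(s)");
    }
}
```

Because everything ends up under target/classes, the standard Maven packaging and mvn deploy steps need no further changes, which is the point of reusing the Maven infrastructure.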

Using your Moven models
From Java:
● Declare the dependency on your models in your pom.xml
● Then the models will be available in your classpath:
  this.getClass().getClassLoader().getResourceAsStream("META-INF/resources/models/foo.ex")
● Also exposed via HTTP as static resources when the JAR is deployed in any Servlet >=3.0 container (inspired by James Ward and the WebJars project).
From Python:
● Install it: pip install moven
● Declare your models in models.txt in your project (as we do with requirements.txt) with a syntax similar to Groovy's Grape:
  io.redlink.ssix.moven:moven-syntaxnet-example:1.0-SNAPSHOT
● Execute moven models.txt to retrieve all models to ./moven, organized by artifactId.
● Thought out specifically for container deployments
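Because Moven relies on the standard Maven infrastructure, each Grape-style coordinate in models.txt maps onto a predictable location in a Maven repository. A minimal sketch of that mapping, assuming three-part groupId:artifactId:version coordinates and the standard Maven 2 repository layout; ModelCoordinate and its methods are hypothetical names, not part of Moven's or jip's API.

```java
import java.util.Objects;

// Hypothetical helper, NOT Moven's actual code: parses a coordinate as used
// in models.txt and derives the JAR path under the standard Maven 2 layout.
public final class ModelCoordinate {
    public final String groupId;
    public final String artifactId;
    public final String version;

    private ModelCoordinate(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Accepts exactly three non-empty, colon-separated parts.
    public static ModelCoordinate parse(String line) {
        String[] parts = Objects.requireNonNull(line).trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected groupId:artifactId:version, got: " + line);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty coordinate part in: " + line);
            }
        }
        return new ModelCoordinate(parts[0], parts[1], parts[2]);
    }

    // Directory inside a repository: the dots of the groupId become separators.
    public String repositoryDirectory() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version;
    }

    // JAR path with the plain version in the file name. Remote repositories
    // typically store SNAPSHOT builds under timestamped file names resolved via
    // maven-metadata.xml, which this sketch does not handle.
    public String jarPath() {
        return repositoryDirectory() + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        ModelCoordinate c = parse("io.redlink.ssix.moven:moven-syntaxnet-example:1.0-SNAPSHOT");
        System.out.println(c.jarPath());
    }
}
```

For the example coordinate above, the derived path is io/redlink/ssix/moven/moven-syntaxnet-example/1.0-SNAPSHOT/moven-syntaxnet-example-1.0-SNAPSHOT.jar, relative to the repository root.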

let’s play https://www.flickr.com/photos/gsfc/3533864222

Current status and future
Moven is still at a very early stage, but it is already being used in production in SSIX and other Redlink projects. We will keep exploring such approaches to find better ways to manage the lifecycle of the models that drive our information extraction stack (Natural Language Processing, Machine Learning, Deep Learning, etc.). For example, we want to target more specific needs in some concrete environments, such as Apache Spark and/or the Apache Beam Runners API.

Thank you!

Sergio Fernández
Software Engineer
sergio.fernandez@redlink.co
https://www.wikier.org/
Redlink GmbH
http://redlink.co
Coworking Salzburg
Jakob Haringer Straße 3
5020 Salzburg (Austria)
Project partially funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 645425.
