SLIDE 1

A practical approach of different programming techniques to implement a real-time application using Django

Dipl.-Math. Sebastian Stigler sebastian.stigler@hs-aalen.de Marina Burdack, MSc marina.burdack@hs-aalen.de Aalen University of Applied Sciences, Germany

SLIDE 2

Motivation

Dipl.-Math. Stigler, Burdack, MSc 2/18

SLIDE 3

Aims for the Django Application

How far do we get with a Python-only approach? A tool to configure and run a DA / ML pipeline:

Datasource → Preprocessing Tasks → Machine Learning Tasks → Presentation of the Result

SLIDE 4

The Focus of this Paper

• The preprocessing part of the pipeline
• How does our application scale?
• What are the knobs we can use to scale the application?

SLIDE 5

Preprocessing App (in German)

Source: own graphic

SLIDE 6

Methodology

SLIDE 7

Types of Scaling

• Single-threaded ✗
• Multithreaded [4, 8] ✗
• Multiprocessing ✓
• Distributed Task Queue ✓
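
The ✗ for multithreading follows from CPython's Global Interpreter Lock [4, 8]: only one thread executes Python bytecode at a time, so CPU-bound preprocessing does not get faster with more threads. The following is a minimal sketch to observe this, assuming a standard (GIL-enabled) CPython build; `cpu_bound` is a hypothetical stand-in for a preprocessing task and `timed_run` a hypothetical helper, and the actual timings depend on the machine.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Type


def cpu_bound(n: int) -> int:
    """Hypothetical CPU-bound stand-in for a preprocessing task (pure Python loop)."""
    return sum(i * i for i in range(n))


def timed_run(executor_cls: Type[Executor], jobs: int, n: int, workers: int = 4) -> float:
    """Run `jobs` calls of cpu_bound(n) on the given executor and return wall-clock seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        # list() forces all results, so the timing covers the full computation
        list(executor.map(cpu_bound, [n] * jobs))
    return time.perf_counter() - start


if __name__ == "__main__":
    jobs, n = 8, 1_000_000
    start = time.perf_counter()
    for _ in range(jobs):
        cpu_bound(n)
    print(f"single-threaded: {time.perf_counter() - start:.2f} s")
    print(f"multithreaded:   {timed_run(ThreadPoolExecutor, jobs, n):.2f} s")
    print(f"multiprocessing: {timed_run(ProcessPoolExecutor, jobs, n):.2f} s")
```

On a multi-core machine one would expect the thread pool to take roughly as long as the single-threaded loop, while the process pool finishes faster because each worker process has its own interpreter and GIL.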

SLIDE 8

Multiprocessing

Multiprocessing Workflow

Source: own graphic

The multiprocessing pool is realized with the ProcessPoolExecutor class of the concurrent.futures module [5] from Python 3.7's standard library.
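
A minimal sketch of how incoming messages could be dispatched to a ProcessPoolExecutor, assuming a message is a list of rows and each row a list of floats with `None` for missing values; `preprocess_message` and `process_messages` are hypothetical names, not the authors' code.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Sequence

Row = List[Optional[float]]
Message = List[Row]


def preprocess_message(msg: Message) -> Message:
    """Hypothetical preprocessing step: replace missing values (None) with 0.0."""
    return [[0.0 if value is None else value for value in row] for row in msg]


def process_messages(messages: Sequence[Message], max_workers: int = 4) -> List[Message]:
    """Submit every message to the pool; results are returned in input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(preprocess_message, msg) for msg in messages]
        # Future.result() blocks until the worker process has finished that message
        return [future.result() for future in futures]


if __name__ == "__main__":
    messages = [[[1.0, None, 3.0], [None, 2.0, None]] for _ in range(8)]
    results = process_messages(messages)
    print(len(results), results[0])
```

Each message is pickled to a worker process and its result pickled back, so every message carries a fixed transfer overhead on top of the actual work; this is one reason why grouping more rows into one message can pay off.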

SLIDE 9

Task Queue

Celery Workflow

Source: own graphic

The task queue is realized with Celery 4.3 [7] and Redis [6].
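
Celery and Redis cannot be shown self-contained here, so the sketch below swaps in the standard library: a `multiprocessing.Queue` plays the role of the broker, and long-running worker processes pull messages from it, which is the same producer/worker pattern Celery workers follow. It illustrates the pattern under that substitution and is not Celery's API; `row_sums_task`, `worker` and `run_task_queue` are hypothetical names.

```python
import multiprocessing as mp
from typing import List, Optional, Sequence

Row = List[Optional[float]]
Message = List[Row]
SENTINEL = None  # tells a worker to shut down


def row_sums_task(msg: Message) -> List[float]:
    """Hypothetical task: sum the non-missing values of every row in a message."""
    return [sum(v for v in row if v is not None) for row in msg]


def worker(task_queue: "mp.Queue", result_queue: "mp.Queue") -> None:
    """Long-running worker: pull (id, message) pairs until the sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is SENTINEL:
            break
        msg_id, msg = item
        result_queue.put((msg_id, row_sums_task(msg)))


def run_task_queue(messages: Sequence[Message], n_workers: int = 2) -> List[List[float]]:
    """Enqueue all messages, let the workers drain the queue, return results in input order."""
    task_queue: "mp.Queue" = mp.Queue()
    result_queue: "mp.Queue" = mp.Queue()
    workers = [mp.Process(target=worker, args=(task_queue, result_queue))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for msg_id, msg in enumerate(messages):
        task_queue.put((msg_id, msg))
    for _ in workers:
        task_queue.put(SENTINEL)
    # Collect every result before join(): a process with unflushed queue data may not exit
    results = dict(result_queue.get() for _ in messages)
    for w in workers:
        w.join()
    return [results[i] for i in range(len(messages))]


if __name__ == "__main__":
    messages = [[[1.0, None, 2.0], [3.0, 4.0, None]] for _ in range(5)]
    print(run_task_queue(messages))
```

Unlike the process pool, the producer here only enqueues messages and never calls the task directly; with Celery the queue lives in an external broker, so producers and workers can also run on different machines.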

SLIDE 10

Structure of a Chained Task

Processing of a Chained Task

Source: own graphic
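
In a chained task, the output of each subtask becomes the input of the next one. Celery provides a `chain` primitive for this. The plain-Python sketch below only mirrors the data flow, assuming rows are lists of floats with `None` for missing values. The task names are taken from the evaluation plots, but their bodies are hypothetical stand-ins: `task_normalizer` mimics the row-wise L2 scaling of scikit-learn's `Normalizer` (whose default norm is `'l2'`).

```python
import math
from functools import reduce
from typing import Callable, List, Optional, Sequence

Rows = List[List[Optional[float]]]
Task = Callable[[Rows], Rows]


def task_prepare(rows: Rows) -> Rows:
    """Hypothetical prepare step: copy the rows so later tasks never mutate the input."""
    return [list(row) for row in rows]


def task_fillna_zero(rows: Rows) -> Rows:
    """Hypothetical Pandas-style subtask: replace missing values with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in rows]


def task_normalizer(rows: Rows) -> Rows:
    """Hypothetical scikit-learn-style subtask: scale each row to unit L2 norm.
    Expects rows without None (i.e. after task_fillna_zero); zero rows stay unchanged."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else list(row))
    return result


def task_result(rows: Rows) -> Rows:
    """Hypothetical result step: here it simply returns the final rows."""
    return rows


def run_chain(rows: Rows, tasks: Sequence[Task]) -> Rows:
    """Feed the output of each task into the next one, like a chained task."""
    return reduce(lambda acc, task: task(acc), tasks, rows)


CHAIN: List[Task] = [task_prepare, task_fillna_zero, task_normalizer, task_result]

if __name__ == "__main__":
    print(run_chain([[3.0, None, 4.0], [None, None, None]], CHAIN))
```

Because each subtask only depends on its predecessor's output, the whole chain can be shipped to one worker as a single unit of work per message.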

SLIDE 11

The Math

Queueing Theory [1]

A queue with c servers is stable (it won't grow without bound) if the following inequality holds:

$$\rho = \frac{\lambda}{c\,\mu} < 1 \tag{1}$$

where ρ is the server utilization, λ is the arrival rate and µ is the service rate (the inverse of the service time) for one task.
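
Equation (1) can be turned into a quick capacity check. A minimal sketch, assuming λ and µ are given in the same time unit (for example messages per second); `utilization`, `is_stable` and `min_servers` are hypothetical names.

```python
import math


def utilization(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Server utilization rho = lambda / (c * mu) from equation (1)."""
    return arrival_rate / (servers * service_rate)


def is_stable(arrival_rate: float, service_rate: float, servers: int) -> bool:
    """The queue stays bounded only if rho < 1 (rho == 1 is not enough)."""
    return utilization(arrival_rate, service_rate, servers) < 1


def min_servers(arrival_rate: float, service_rate: float) -> int:
    """Smallest integer c with lambda / (c * mu) < 1, i.e. the smallest c > lambda / mu."""
    return math.floor(arrival_rate / service_rate) + 1


if __name__ == "__main__":
    lam = 10.0       # arrival rate: 10 messages per second
    mu = 1 / 0.25    # mean service time 0.25 s -> service rate of 4 messages per second
    c = min_servers(lam, mu)
    print(c, utilization(lam, mu, c))  # 3 workers, rho = 10 / 12
```

Taking µ as the inverse of a measured mean service time per message, this gives a lower bound on the number of worker processes (or task-queue workers) for a given message rate; it says nothing about waiting times, which grow as ρ approaches 1.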

SLIDE 12

Evaluation

SLIDE 13

Test Data

• 750′000 measurements (rows) from a Davis weather station
• 33 values per row in total, 26 of them numerical
• 75 to 750′000 messages (msg) are the output of the buffer, at rates from 10000 down to 1 rows/msg
• 16 subtasks:
  • Prepare and Result tasks
  • 6 tasks which directly use methods from Pandas [3]
  • 8 tasks which use preprocessing methods from scikit-learn [2]
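
The buffer groups incoming rows into messages of a fixed size, so the number of messages is ⌈rows / (rows/msg)⌉: 750′000 rows give 75 messages at 10000 rows/msg and 750′000 messages at 1 row/msg. A minimal sketch of such a buffer, assuming a fixed batch size; `buffer_rows` and `message_count` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_rows(rows: Iterable[T], rows_per_msg: int) -> Iterator[List[T]]:
    """Group a stream of rows into messages of `rows_per_msg` rows (the last may be shorter)."""
    if rows_per_msg < 1:
        raise ValueError("rows_per_msg must be at least 1")
    iterator = iter(rows)
    while True:
        msg = list(islice(iterator, rows_per_msg))
        if not msg:
            return
        yield msg


def message_count(n_rows: int, rows_per_msg: int) -> int:
    """Number of messages the buffer emits: ceil(n_rows / rows_per_msg)."""
    return -(-n_rows // rows_per_msg)


if __name__ == "__main__":
    for rows_per_msg in (10_000, 1_000, 100, 10, 1):
        print(rows_per_msg, message_count(750_000, rows_per_msg))
```

A buffer in a real-time setting would likely also need a time-based flush so that a partially filled message is not held back indefinitely; that is not shown here.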

SLIDE 14

Test Runs

[Figure (source: own graphic): two rows of log-log plots, "Mean Service Time per Message" and "Mean Service Time per Row", each with the panels task_prepare (GEN), task_fillna_zero (PAN) and task_normalizer (SKN). x-axis: rows/msg; y-axis: mean service time. Series: task queue, multiprocessing, groundtruth. Each panel is annotated "saturation".]
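
The two rows of plots show the same quantity normalized in two ways: mean service time per message, and per row (the per-message time divided by the rows/msg rate). A minimal sketch of how both numbers could be measured for a task, assuming the task is called synchronously on each message; `mean_service_times` is a hypothetical helper, not the authors' measurement code.

```python
import time
from typing import Callable, List, Sequence, Tuple

Message = List[List[float]]


def mean_service_times(task: Callable[[Message], object],
                       messages: Sequence[Message]) -> Tuple[float, float]:
    """Return (mean seconds per message, mean seconds per row) for running `task` on each message."""
    total = 0.0
    for msg in messages:
        start = time.perf_counter()
        task(msg)
        total += time.perf_counter() - start
    n_rows = sum(len(msg) for msg in messages)
    return total / len(messages), total / n_rows


if __name__ == "__main__":
    rows = [[float(i), float(i + 1)] for i in range(10_000)]
    for rows_per_msg in (1, 10, 100, 1_000, 10_000):
        messages = [rows[i:i + rows_per_msg] for i in range(0, len(rows), rows_per_msg)]
        per_msg, per_row = mean_service_times(lambda m: [sum(r) for r in m], messages)
        print(f"{rows_per_msg:>6} rows/msg: {per_msg:.2e} s/msg, {per_row:.2e} s/row")
```

This measures only the in-process cost of the task; for the process pool or the task queue, the measured service time would also include the dispatch and transfer overhead of each message.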

SLIDE 15

Conclusion

SLIDE 16

• Python libraries are sophisticated enough for scaling real-time applications.
• Buffering incoming data rows can compensate for the overhead of task queues.
• $\lambda / \mu < c$ (equation (1) rearranged) determines the scaling of the application.
• All results are applicable to the machine learning process too.
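
Why buffering compensates for the task-queue overhead can be made explicit with a simple cost model. The model is an assumption of this sketch, not a result from the paper: if processing a message of n rows takes a fixed overhead o plus w per row, the service time per row is o/n + w, so the overhead share shrinks as n grows. Combined with λ/µ < c (rows arriving at rate λ_row yield λ_row/n messages per second, each served at rate µ = 1/(o + n·w)), the number of workers needed is the smallest c > λ_row·(o/n + w). The function names and numbers below are invented for illustration.

```python
import math


def per_row_time(overhead: float, per_row: float, rows_per_msg: int) -> float:
    """Service time per row under the assumed linear model: (o + n*w) / n = o/n + w."""
    return overhead / rows_per_msg + per_row


def workers_needed(row_rate: float, overhead: float, per_row: float, rows_per_msg: int) -> int:
    """Smallest c with lambda_msg / mu < c, where lambda_msg = row_rate / n and 1/mu = o + n*w."""
    load = row_rate * per_row_time(overhead, per_row, rows_per_msg)  # equals lambda_msg / mu
    return math.floor(load) + 1


if __name__ == "__main__":
    # Illustrative values only: 5 ms overhead per message, 50 us per row, 1000 rows/s arriving
    o, w, row_rate = 5e-3, 50e-6, 1_000.0
    for n in (1, 10, 100, 1_000):
        print(n, per_row_time(o, w, n), workers_needed(row_rate, o, w, n))
```

With these made-up numbers, sending single rows needs 6 workers, while 10 or more rows per message fit on one worker; the trade-off is that each row waits in the buffer until its message is full.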

SLIDE 17

Thank you for your attention!

This was

A practical approach of different programming techniques to implement a real-time application using Django

Dipl.-Math. Sebastian Stigler sebastian.stigler@hs-aalen.de Marina Burdack, MSc marina.burdack@hs-aalen.de

SLIDE 18

References

[1] U. Narayan Bhat. An Introduction to Queueing Theory. Modelling and Analysis in Applications. Birkhäuser Basel, 2015. doi: 10.1007/978-0-8176-8421-1.

[2] David Cournapeau and contributors. scikit-learn. url: https://scikit-learn.org.

[3] Wes McKinney et al. Pandas. Python Data Analysis Library. url: https://pandas.pydata.org/.

[4] Python Software Foundation. Thread State and the Global Interpreter Lock. url: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock.

[5] Brian Quinlan. concurrent.futures — Launching parallel tasks. url: https://docs.python.org/3/library/concurrent.futures.html.

[6] Salvatore Sanfilippo and contributors. Redis. url: https://redis.io.

[7] Ask Solem and contributors. Celery: Distributed Task Queue. url: www.celeryproject.org.

[8] Thomas Wouters. GlobalInterpreterLock. url: https://wiki.python.org/moin/GlobalInterpreterLock.