EuroPython 2020: Real Time Machine Learning with Python

Alejandro Saucedo | as@seldon.io | Twitter: @AxSaucedo


  1. EuroPython 2020: Real Time Machine Learning with Python. Alejandro Saucedo | as@seldon.io | Twitter: @AxSaucedo

  2. Hello, my name is Alejandro Saucedo. Engineering Director, Seldon Technologies; Chief Scientist, The Institute for Ethical AI & ML; Head of Solutions Eng & Sci, Eigen Technologies; Software Engineer & DevX Lead, Bloomberg LP.

  3. Seldon: OSS Production ML Deployment

  4. The Institute for Ethical AI & Machine Learning

  5. We are part of the LFAI

  6. Today
     ● Conceptual intro to stream processing
     ● Machine learning for real time
     ● Tradeoffs across tools
     ● Hands on use-case

  7. Real Time Reddit Processing
     ● Real time ML model for reddit comments
     ● 200k comments for training model
     ● /r/science comments removed by mods
     We will be fixing the front page of the internet

  8. A trip to the present (not the past): ETL. E - Extract, T - Transform, L - Load
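
The three ETL stages can be sketched as three small functions. This is a minimal illustration only: the CSV sample, the field names and the in-memory `warehouse` dictionary are assumptions made for the example, not part of the talk.

```python
import csv
import io

# Hypothetical raw input; the talk's real data comes from Reddit.
RAW_CSV = """id,body,score
a1,  Great study!  ,12
a2,,3
a3,Buy cheap pills,-4
"""

def extract(raw_text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: drop empty comments, trim whitespace, cast the score to int."""
    cleaned = []
    for row in rows:
        body = row["body"].strip()
        if not body:
            continue
        cleaned.append({"id": row["id"], "body": body, "score": int(row["score"])})
    return cleaned

def load(rows, store):
    """Load: write the transformed rows into the target store, keyed by id."""
    for row in rows:
        store[row["id"]] = row
    return store

warehouse = load(transform(extract(RAW_CSV)), {})
print(warehouse)
```

Swapping the order of the last two calls (load the raw rows first, transform them inside the store later) would give the ELT variation.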

  9. Variations
     ● ETL - Extract Transform Load
     ● ELT - Extract Load Transform
     ● EL - Extract Load
     ● LT - Load Transform
     ● WTF - LOL

  10. Specialised Tools

  11. Specialised tools by category
     ● EL: Nifi, Flume, …
     ● ETL: Oozie, Airflow, Jupyter notebook?
     ● ELT: Elasticsearch, Data Warehouse

  12. Batch VS Streaming. The spectrum of data processing

  13. Batch AND (rather than VS) Streaming. The right tool for the challenge

  14. Unifying Worlds. Massive drive on converging worlds

  15. Streaming Concepts: Windows. Processing of batches in real time
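
One common kind of window is the tumbling window: fixed-size, non-overlapping time buckets. The sketch below is an illustration in plain Python, assuming events arrive as `(timestamp, value)` pairs; the function name and sample data are hypothetical.

```python
from collections import defaultdict

def tumbling_windows(events, size_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows.

    Each window is identified by its start time: floor(ts / size) * size.
    """
    windows = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // size_seconds) * size_seconds
        windows[window_start].append(value)
    return dict(windows)

# Six events, each counting as 1, bucketed into 10-second windows.
events = [(0, 1), (3, 1), (9, 1), (10, 1), (14, 1), (21, 1)]
counts = {start: sum(values) for start, values in tumbling_windows(events, 10).items()}
print(counts)
```

Each window is a small batch, which is why the slide describes windows as processing batches in real time.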

  16. Streaming Concepts: Checkpoints. Keeping track of stream progress
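
A checkpoint can be as simple as a committed offset into the stream. The following sketch assumes an in-memory list as the log and a dictionary as the checkpoint store; both are stand-ins chosen for the example, and real stream processors persist this state durably.

```python
def process_stream(log, checkpoint, handler, fail_at=None):
    """Process records from the checkpointed offset onwards.

    `checkpoint` holds the next offset to read. It is updated only after a
    record has been handled, so a restart resumes from the last committed
    position instead of the beginning of the log.
    """
    offset = checkpoint.get("offset", 0)
    while offset < len(log):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError(f"simulated crash at offset {offset}")
        handler(log[offset])
        offset += 1
        checkpoint["offset"] = offset  # commit progress

log = ["c0", "c1", "c2", "c3", "c4"]
checkpoint = {}
seen = []

try:
    process_stream(log, checkpoint, seen.append, fail_at=3)
except RuntimeError:
    pass  # the worker died after committing offsets 0..2

process_stream(log, checkpoint, seen.append)  # restart resumes at offset 3
print(checkpoint, seen)
```

Because the commit happens after the handler runs, a crash between the two steps would replay one record on restart, which is the usual at-least-once trade-off.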

  17. Streaming Concepts: Watermarks. Considering data that comes late in windows and stream batches
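
A rough sketch of how a watermark decides when a window is finished, assuming the watermark simply trails the highest event time seen so far by a fixed allowed lateness. The function, its parameters and the sample event times are hypothetical, and production systems use more elaborate watermark strategies.

```python
from collections import defaultdict

def windowed_counts(events, window_size, allowed_lateness):
    """Count events per tumbling window, using a watermark to handle late data.

    The watermark trails the highest event time seen so far by
    `allowed_lateness`. A window is finalised once the watermark reaches its
    end; events arriving for an already finalised window are reported as late.
    """
    open_windows = defaultdict(int)
    closed, late = {}, []
    max_event_time = 0
    for event_time in events:  # events are listed in arrival order
        window_start = (event_time // window_size) * window_size
        if window_start in closed:
            late.append(event_time)
        else:
            open_windows[window_start] += 1
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                closed[start] = open_windows.pop(start)
    return closed, dict(open_windows), late

# Event times in arrival order: 8 arrives out of order but in time, 9 arrives too late.
closed, still_open, late = windowed_counts(
    [1, 4, 12, 8, 15, 23, 9], window_size=10, allowed_lateness=5)
print(closed, still_open, late)
```

The event at time 8 is still counted because its window had not been finalised when it arrived, while the event at time 9 arrives after the watermark has passed the end of its window.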

  18. Some Stream Processing Tools
     ● Flink (Multiple Languages)
     ● Kafka Streams (Multiple Languages)
     ● Spark Streaming (Multiple Languages)
     ● Faust (Python)
     ● Apache Beam (Python)

  19. Today we’re using: Stream Processing, ML Serving, ML Training

  20. Machine Learning Workflow

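  21. Model Training

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from ml_utils import CleanTextTransformer, SpacyTokenTransformer

      # Clean Text
      clean_text_transformer = CleanTextTransformer()

      # SpaCy Tokenizer
      spacy_tokenizer = SpacyTokenTransformer()

      # TFIDF Vectorizer (the input is already tokenized, so the
      # preprocessor and tokenizer are identity functions)
      tfidf_vectorizer = TfidfVectorizer(
          min_df=3,
          max_features=1000,
          preprocessor=lambda x: x,
          tokenizer=lambda x: x,
          token_pattern=None,
          ngram_range=(1, 3),
          use_idf=True,
          smooth_idf=True,
          sublinear_tf=True)

      # Logistic Regression
      lr_model = LogisticRegression(C=1.0, verbose=True)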

  22. Model Training

      # Input: "You are a DUMMY!!!!!"
      x_train_clean = \
          clean_text_transformer.transform(x_train)

      # After cleaning: "You are dummy"
      x_train_tokenized = \
          spacy_tokenizer.transform(x_train_clean)

      # After tokenizing: [ PRON, IS, DUMB ]
      tfidf_vectorizer.fit(
          x_train_tokenized[TOKEN_COLUMN].values)

      x_train_tfidf = \
          tfidf_vectorizer.transform(
              x_train_tokenized[TOKEN_COLUMN].values)

      # After vectorizing: [ 1000, 0100, 0010 ]
      lr_model.fit(x_train_tfidf, y_train)

      pred = lr_model.predict(x_test_tfidf)
      # Prediction: [ 1 ]
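
To make the clean, tokenize and vectorize steps concrete without the talk's `ml_utils` helpers, here is a dependency-free sketch. `clean_text` and `tokenize` are hypothetical stand-ins for `CleanTextTransformer` and `SpacyTokenTransformer`, the three sample sentences are invented, and the weighting follows the smoothed-IDF and sublinear-TF formulas documented for scikit-learn's `TfidfVectorizer` while ignoring n-grams, `min_df` and `max_features`.

```python
import math
import re
from collections import Counter

def clean_text(text):
    """Hypothetical stand-in for CleanTextTransformer: lowercase, strip punctuation."""
    return re.sub(r"[^a-z\s]", "", text.lower()).strip()

def tokenize(text):
    """Hypothetical stand-in for SpacyTokenTransformer: plain whitespace split."""
    return text.split()

def fit_idf(tokenized_docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(token for doc in tokenized_docs for token in set(doc))
    return {token: math.log((1 + n_docs) / (1 + df)) + 1 for token, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sublinear term frequency (1 + ln(tf)) times idf, L2-normalised."""
    counts = Counter(token for token in tokens if token in idf)
    weights = {token: (1 + math.log(tf)) * idf[token] for token, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {token: w / norm for token, w in weights.items()}

docs = ["You are a DUMMY!!!!!", "You are a great scientist", "Great study, great data"]
tokenized = [tokenize(clean_text(doc)) for doc in docs]
idf = fit_idf(tokenized)
vector = tfidf_vector(tokenized[0], idf)
print(tokenized[0])
print(vector)
```

Rare tokens such as "dummy" end up with a larger weight than tokens shared across comments, which is the signal the logistic regression then learns from.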

  23. More on EDA & Model Evaluation: https://github.com/axsaucedo/reddit-classification-exploration/

  24. Overview of Components. Diagram components: Reddit Source; Queue with topics reddit_stream, prediction and alert; Stream processor with processors fetch_stream and ml_predict; ML Service (seldon model).

  25. Generating comments (diagram: component overview showing the fetch_stream processor)

      @app.timer(0.1)
      async def generate_reddit_comments():
          reddit_sample = await fetch_reddit_comment()
          reddit_data = {
              "id": reddit_sample["id"].values[0],
              "score": int(reddit_sample["score"].values[0]),
              # ... cut down for simplicity
          }
          await app.topic("reddit_stream").send(
              key=reddit_data["id"],
              value=reddit_data)

  26. ML Stream Processing Step (diagram: component overview showing the ml_predict processor)

      @app.agent(app.topic("reddit_stream"))
      async def predict_reddit_content(tokenized_stream):
          async for key, comment_extended in tokenized_stream.items():
              tokens = comment_extended["body_tokens"]
              probability = seldon_prediction_req(tokens)
              data = {
                  "probability": probability,
                  "original": comment_extended["body"]
              }
              await app.topic("reddit_prediction").send(
                  key=key,
                  value=data)
              if probability > MODERATION_THRESHOLD:
                  await reddit_mod_alert_topic.send(
                      key=key,
                      value=data)
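
The routing logic of this step can be tried without Kafka or Faust by using `asyncio.Queue` objects as stand-ins for the topics. Everything below is an assumption made for illustration: the threshold value, the `fake_prediction` function that replaces the model call, the sample comments and the `None` sentinel that ends the demo stream.

```python
import asyncio

MODERATION_THRESHOLD = 0.5  # assumed value; the slides do not state it

def fake_prediction(tokens):
    """Hypothetical stand-in for the model call: flags comments containing 'dummy'."""
    return 0.9 if "dummy" in tokens else 0.1

async def predict_comments(stream, prediction_topic, alert_topic):
    """Consume comments until a None sentinel, routing results to two queues."""
    while True:
        comment = await stream.get()
        if comment is None:
            break
        probability = fake_prediction(comment["body_tokens"])
        data = {"probability": probability, "original": comment["body"]}
        # Every prediction is published; only risky ones raise an alert.
        await prediction_topic.put((comment["id"], data))
        if probability > MODERATION_THRESHOLD:
            await alert_topic.put((comment["id"], data))

async def main():
    stream, predictions, alerts = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    comments = [
        {"id": "a1", "body": "You are a dummy", "body_tokens": ["you", "are", "a", "dummy"]},
        {"id": "a2", "body": "Great study", "body_tokens": ["great", "study"]},
    ]
    for comment in comments:
        await stream.put(comment)
    await stream.put(None)  # sentinel: end of the demo stream
    await predict_comments(stream, predictions, alerts)
    alert_ids = [alerts.get_nowait()[0] for _ in range(alerts.qsize())]
    return predictions.qsize(), alert_ids

n_predictions, alert_ids = asyncio.run(main())
print(n_predictions, alert_ids)
```

Both comments produce a prediction message, but only the first one crosses the threshold and reaches the alert queue.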

  27. ML Model Request Step (diagram: component overview)

      import numpy as np
      from seldon_core.seldon_client import SeldonClient

      sc = SeldonClient(
          gateway_endpoint="istio-ingress.istio-system.svc.cluster.local",
          deployment_name="reddit-model",
          namespace="default")

      def seldon_prediction_req(tokens):
          data = np.array(tokens)
          output = sc.predict(data=data)
          return output.response["data"]["ndarray"]
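
The value returned from `output.response["data"]["ndarray"]` is a nested list rather than a single number, so it may need unpacking before it is compared with a threshold. The sketch below assumes the response carries one row of class probabilities and that index 1 is the "removed" class; the example payload is hand-written for illustration, not captured from a real deployment.

```python
def removal_probability(response, positive_index=1):
    """Pull the positive-class probability for the first row of the response.

    Assumes the shape {"data": {"ndarray": [[p_class0, p_class1]]}}.
    """
    rows = response["data"]["ndarray"]
    return float(rows[0][positive_index])

# Hand-written example response (an assumption, not real model output).
example_response = {"data": {"ndarray": [[0.2, 0.8]]}}

probability = removal_probability(example_response)
print(probability)
```

Comparing the raw nested list against a float would raise a `TypeError` in Python 3, so a helper along these lines would sit between the request function and the threshold check.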

  28. Overview of Seldon Model Serving

  29. Wrapping ML models for Serving with Seldon

      import dill
      from ml_utils import CleanTextTransformer, SpacyTokenTransformer

      class RedditClassifier:
          def __init__(self):
              self._clean_text_transformer = CleanTextTransformer()
              self._spacy_tokenizer = SpacyTokenTransformer()
              with open('tfidf_vectorizer.model', 'rb') as model_file:
                  self._tfidf_vectorizer = dill.load(model_file)
              with open('lr.model', 'rb') as model_file:
                  self._lr_model = dill.load(model_file)

          def predict(self, X, feature_names):
              clean_text = self._clean_text_transformer.transform(X)
              spacy_tokens = self._spacy_tokenizer.transform(clean_text)
              tfidf_features = self._tfidf_vectorizer.transform(spacy_tokens)
              predictions = self._lr_model.predict_proba(tfidf_features)
              return predictions
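
The wrapper pattern above (load serialized artifacts once in `__init__`, then expose a `predict(X, feature_names)` method) can be exercised locally with a toy model. In this sketch `ToyClassifier` and its keyword-score "model" are hypothetical stand-ins, and `pickle` from the standard library replaces `dill` so the example has no dependencies.

```python
import pickle

class ToyClassifier:
    """Same shape as the slide's RedditClassifier, with a toy keyword model."""

    def __init__(self, model_bytes):
        # Load the serialized model once at start-up, as the slide does
        # with dill.load(...) on files.
        self._keyword_scores = pickle.loads(model_bytes)

    def predict(self, X, feature_names=None):
        # Return one [p_keep, p_remove] pair per input text, mimicking the
        # row-per-sample layout of predict_proba.
        predictions = []
        for text in X:
            tokens = text.lower().split()
            p_remove = max(
                (self._keyword_scores.get(token, 0.0) for token in tokens),
                default=0.0)
            predictions.append([1.0 - p_remove, p_remove])
        return predictions

# "Training" artifact: a dictionary of keyword scores, serialized to bytes.
model_bytes = pickle.dumps({"dummy": 0.9})

classifier = ToyClassifier(model_bytes)
predictions = classifier.predict(["You are a dummy", "Great study"])
print(predictions)
```

Keeping the loading in the constructor means the cost of deserializing is paid once per process rather than once per request.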

  30. Overview of Components. Diagram components: Reddit Source; Queue with topics reddit_stream, prediction and alert; Stream processor with processors fetch_stream and ml_predict; ML Service (seldon model).

  31. Recap of Today
     ● Conceptual intro to stream processing
     ● Machine learning for real time
     ● Tradeoffs across tools
     ● Hands on use-case

  32. EuroPython 2020: Real Time Machine Learning with Python. Alejandro Saucedo | as@seldon.io | @AxSaucedo
