
Big Data in Network and Service Management: An Opportunity for Synergy
Stan Matwin, CRC
Institute for Big Data Analytics, Dalhousie University, Halifax, NS, Canada
stan@cs.dal.ca

Toni Morrison, Nobel Prize in Literature 1993 [1931-2019]
"I think there’s data, and then there’s information that comes from the data, and then there’s knowledge that comes from information. And then, after knowledge, there’s wisdom. I’m interested how to get from data to wisdom."
CNSM, Halifax, 24/10/19

Roadmap
• Big Data: Bird's-Eye View
• Sample of Big Data work at Dalhousie
• Some challenges before the Big Data field
• An outside view of the use of BD techniques in Networking
  • Traffic classification
  • QoS/QoE
  • Security
  • Data Centre mgmt
• Issues and opportunities discussion

Big Data: 5 Vs
• Volume
• Velocity
• Variety
• Veracity
• Value
In one minute:
• 2M Google queries
• 6M FB posts
• 100K tweets
• 1.3M video clip views
• 150 identity theft victims
• 135 virus infections
• More than 10^10 network-connected devices


Deep learning (2010-…)
• “Three Musketeers”
• promise of representation learning: no more feature engineering
• 2012 ImageNet dataset success: 72% → 85%
• contextual representations

Deep Learning toolbox
• Conv nets
• Embeddings
• Denoising autoencoders
• Transfer learning
• Generative Adversarial Networks
• t-SNE
• …
architecture engineering

Some challenges before the Big Data field
• Interpretability/transparency (data and algorithms)
• correlation/causality
• anytime algorithms
• standards
• need for [quality] data

Big Data at Institute for Big Data Analytics @ Dal
• Machine Learning [Torgo, Matwin]
• Deep Learning [Oore]
• Text/Web Analytics [Keselj, Milios, Matwin]
• Visualization [Paulovich]
• HCI [Orji, Reilly, Malloch]
• IoT [Haque]
• Applications [all of the above + Nur ZH]
OCEAN DATA

Big Data at Dal: Automatic Identification System (AIS)
• IMO/ITU standard
• 400,000 ships
• At least 100M records/day
• From weak to big signal
[Figure: by Oculus for Marlant N6, Royal Canadian Navy. Courtesy of ExactEarth, Inc.]
Institute for Big Data Analytics

Distance to Shore Calculation
• S-AIS vessel data enrichment
• Naive (GIS) approach is infeasible on the S-AIS dataset (10^9 points): years of runtime
• Revised approach:
  • Calculate distance values between shore and “cells”
  • PostGIS
  • Runtime: ~0.5 day for ~1M cells, but:
    • Database approach used is not scalable further
    • Accurate only to cell diameter (22 km +/- 11 km)
• Ideally, distance should be calculated directly to individual AIS vessel position reports

CUDA Implementation
Implementation    Time for 1M targets
Numpy             17 days
C (OpenMP)        2.5 days
CUDA              15 minutes
Hardware: Core i7-7700K, 16 GB main memory, NVIDIA GTX 1080 Ti
Shore representation: 26.7M points
Per-target loop: for (int i = 0; i < 1000000; i++) { target[i] → Pre-Haversine → Find Minimum → Post-Haversine → Distance[i] }
• Architecture not subject to the scalability issues previously encountered
• Greatly improved per-target runtime
• Direct distance calculation on the entire AIS dataset now feasible
• Distance values for 10^9 points calculable in ~10 days
• Further gains sought through tuning of CUDA kernel size and memory streaming
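The per-target loop on this slide can be sketched in plain Python. This is a minimal illustration, not the authors' CUDA kernel: the function names, the Earth-radius constant, and the toy coordinates are all assumptions. It also assumes one reading of the "Pre-Haversine" and "Post-Haversine" labels, namely that the inexpensive haversine term is computed for every shore point, the minimum is taken, and the arcsine is applied only once. That works because the final conversion is monotonic in the haversine term.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant, not from the slides)


def pre_haversine(lat1, lon1, lat2, lon2):
    """Haversine term a = sin^2(dphi/2) + cos(phi1) * cos(phi2) * sin^2(dlam/2)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    return math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2


def post_haversine(a):
    """Convert a haversine term into a great-circle distance in kilometres."""
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_to_shore_km(target, shore_points):
    """Brute-force search: smallest haversine term over all shore points, converted once."""
    lat, lon = target
    smallest = min(pre_haversine(lat, lon, s_lat, s_lon) for s_lat, s_lon in shore_points)
    return post_haversine(smallest)


# Hypothetical toy data: (latitude, longitude) in degrees.
shore = [(44.65, -63.57), (44.0, -64.0), (45.0, -62.0)]
targets = [(44.65, -63.57), (43.0, -63.0)]
distances = [distance_to_shore_km(t, shore) for t in targets]
```

The reported figures are mutually consistent: at 15 minutes per 1M targets, 10^9 points need 1000 × 15 min = 15,000 min, or roughly 10.4 days, which matches the "~10 days" estimate.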

Big Data at Dal: Machine Learning from Passive Acoustic Monitoring data
Marine mammal species detection and classification
[Figure: Short Time Fourier Transform]
• Reframe the problem of detecting whale calls from an auditory task to a visual task: train a Convolutional Neural Network (CNN)
• Identify vocalizations from other species via transfer learning
• No need to re-train the entire CNN!
Thomas, M., Martin, B., Kowarski, K., Gaudet, B., & Matwin, S. (2019). Marine Mammal Species Classification using Convolutional Neural Networks and a Novel Acoustic Representation. ECML-PKDD 2019.
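Turning audio into an image that a CNN can consume starts with a Short Time Fourier Transform. The sketch below is a deliberately small, standard-library illustration of that step and does not reproduce the acoustic representation used in the cited paper. The frame length, hop size, Hann window, and the synthetic test tone are all assumptions chosen for the example.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per overlapping frame."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame; fine for a toy example, slow for real data.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for a recording: a pure 128 Hz tone sampled at 1024 Hz.
SAMPLE_RATE = 1024
FRAME_LEN = 64
signal = [math.sin(2 * math.pi * 128 * t / SAMPLE_RATE) for t in range(512)]

spec = stft_magnitude(signal, frame_len=FRAME_LEN, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

Each row of `spec` is one time slice and each column one frequency bin, so the nested list can be treated as a grayscale image with time on one axis and frequency on the other.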

Thomas, M., Martin, B., and Matwin, S. (2019). Detecting Endangered Baleen Whales within Acoustic Recordings using Region-based Convolutional Neural Networks. Joint Workshop on AI for Social Good at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
[Figure: This is what the R-CNN can see]
• We did not have sufficient data to train a CNN to recognize humpback whales
• We applied transfer learning to the CNN and obtained good results

Big Data in Networking
• Good fit, because there are MASSIVE amounts of data in all aspects of networking
• BUT [Boutaba et al. 18]:
  • networks (e.g. enterprise) differ a lot
  • change continuously
• Easier with Software-defined Networks
  • easier data collection
  • easier to apply resulting control actions on legacy networks

A brief look at…
• Payload-based traffic classification
• QoS/QoE
• IDS/IPS
• Data center mgmt.

Payload-based traffic classification
• Many applications of different techniques on different data sets
• Lots of ingenious feature engineering
  • Bag of Flow [Zhang et al 13]
• Good results often obtained with the use of
  • K-NN
  • Random Forests and Boosting
  • SVMs
• Some methodological questions?
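As a reminder of how simple the K-NN baseline named above is, here is a minimal sketch. The two features (mean packet size in bytes, mean inter-arrival time in milliseconds), the class labels, and every data point are invented for illustration. None of them come from the works cited on the slide.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training flows closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled flows: ((mean packet size, mean inter-arrival ms), application class).
train = [
    ((1400.0, 2.0), "video"),
    ((1350.0, 3.0), "video"),
    ((1300.0, 4.0), "video"),
    ((120.0, 40.0), "chat"),
    ((90.0, 55.0), "chat"),
    ((150.0, 35.0), "chat"),
]

prediction_a = knn_predict(train, (1250.0, 5.0))
prediction_b = knn_predict(train, (100.0, 50.0))
```

In practice, features on such different scales would normally be standardized first, since raw Euclidean distance lets the larger-valued feature dominate the vote.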

QoS/QoE
• Mapping between network flow characteristics (delay, jitter, loss ratio, …) and Mean Opinion Scores given by the user
• User-labeled data: always limited
  • Use of GANs?
• Possible inspiration from internet marketing (user experience visiting a web portal)
• Data privacy issues

Security/anomaly detection
• IDS/IPS
• Progress from using the KDD 99 challenge dataset
  • Classifying network traffic into five categories of attacks
• Limitation of the classification approaches
• Clustering-based methods: unsupervised anomaly detection
• Flow-based vs payload-based approaches
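One common form of clustering-based, unsupervised anomaly detection is to cluster the traffic and flag points that sit far from every cluster centre. The sketch below uses a small k-means loop for this. The two-dimensional feature vectors, the initial centroids, and the "three times the median distance" threshold are assumptions made for the example, not a method taken from the slides.

```python
import math
import statistics


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids; returns the final centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


def flag_anomalies(points, centroids, factor=3.0):
    """Indices of points whose distance to the nearest centroid exceeds factor * median."""
    dists = [min(math.dist(p, c) for c in centroids) for p in points]
    threshold = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > threshold]


# Hypothetical flow features: two dense groups plus one stray point at index 8.
flows = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3),
    (9.0, 1.0),
]

centres = kmeans(flows, centroids=[flows[0], flows[4]])
anomalies = flag_anomalies(flows, centres)
```

No labels are needed, which is the appeal of this approach. The weakness is also visible here: the stray point pulls its cluster centre toward itself, so the threshold has to be chosen with care.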

Data Center Management with ML [Salman et al. 18]
• Data Centers: a key internet component
• Typical optimization:
  • Gather performance data
  • Run a linear programming algorithm to find a good solution
  • Take action on a subset of the network:
    • augmenting flexible links,
    • turning off links,
    • moving traffic
[Figure: reinforcement learning. From [Salman et al 18]]

• Several DL agents for different tasks:
  • Traffic engineering
  • energy savings
  • …
• Reward function: maximize link utilization, minimize flow-completion time
• Each runs on top of an SDN
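A reward that pushes link utilization up and flow-completion time down can be written as a single scalar. The sketch below is one hypothetical way to combine the two objectives. The function name, the use of simple means, and the `fct_weight` trade-off parameter are assumptions, and the slide does not say how [Salman et al. 18] actually weight them.

```python
def reward(link_utilizations, flow_completion_times, fct_weight=0.5):
    """Scalar reward: mean link utilization minus a weighted mean flow-completion time.

    link_utilizations: fractions in [0, 1], one per link.
    flow_completion_times: seconds, one per completed flow.
    fct_weight: hypothetical trade-off between the two objectives.
    """
    mean_util = sum(link_utilizations) / len(link_utilizations)
    mean_fct = sum(flow_completion_times) / len(flow_completion_times)
    return mean_util - fct_weight * mean_fct


# Two hypothetical network states with identical flow-completion times.
busy = reward([0.8, 0.6], [0.2, 0.4])
idle = reward([0.3, 0.1], [0.2, 0.4])
```

With equal completion times, the better-utilized state scores higher. Raising `fct_weight` shifts the agent toward latency at the expense of utilization.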

Opportunities
• “Spectrogramming” and CNNs
• Embeddings
  • Training a representation, and then
  • Transfer learning?
• Semi-supervised learning and Distillation?
• Simple vs complex methods?
  • Naïve Bayesian models?
• Lessons from computational advertising

Some general remarks on ML in networking research
• Efficiency of the learned models?
  • Are they efficient enough to be embedded in production systems?
  • Combined evaluation/utility measure involving decision time?
• Lack of standardized benchmark datasets

Discussion …
