 
Metrics and models for Web performance evaluation or, How to measure SpeedIndex from raw encrypted packets, and why it matters. QoE = f(QoS). Dario Rossi (dario.rossi@huawei.com), Director, DataCom Lab (*), Huawei. Brussels, Feb 1st 2020. (*) Data Communication Network Algorithm and Measurement Technology Laboratory
Dario Rossi and, in alphabetical order, Alemnew Asrese, Alexis Huet, Diego Da Hora, Enrico Bocchi, Flavia Salutari, Florian Metzger, Gilles Dubuc, Hao Shi, Jinchun Xu, Luca De Cicco, Marco Mellia, Matteo Varvello, Renata Teixeira, Tobias Hossfeld, Shengming Cai, Vassillis Christophides, Zied Ben Houidi
Internet, ISP: offering good user QoE is a common goal
Internet, ISP, user. App: QoE and QoS. Net: QoS. For ISPs/vendors, encryption makes the inference harder. Detecting, forecasting and preventing QoE degradation is important!
Quality at different layers. Context: context influence factors. User (User QoE): human influence factors. Application (Application QoS): system influence factors. Network (Network QoS). Network QoS affects application QoS, which in turn influences user QoE. 31/01/2020
Quality at different layers. Context: device type, activity, location. User QoE: user PLT (uPLT), mean opinion score (MOS), engagement metrics. Application QoS: page load time (PLT), video bitrate, SpeedIndex. Network QoS: packet loss, latency, bandwidth, Wi-Fi quality. Network QoS affects application QoS, which in turn influences user QoE.
Quality at different layers. Context: device type, activity, location. User QoE: user PLT (uPLT), mean opinion score (MOS). Application QoS: HTTP/2, QUIC… (true for any other apps). Network QoS: packet loss, latency, bandwidth, Wi-Fi quality. Network QoS affects application QoS, which in turn influences user QoE.
Agenda. User QoE: data collection (crowdsourcing campaign); models (data driven vs expert models). Application QoS: browser metrics (instant vs integral vs compound); models (from raw encrypted packets). Network QoS.
Agenda. 1. Data collection (crowdsourcing campaign). 2. Models (data driven vs expert models). 3. Browser metrics (instant vs integral vs compound). 4. Method (from raw encrypted packets).
Data collection: Crowdsourcing campaigns. https://webqoe.telecom-paristech.fr/data
Mean opinion score (MOS): "Rate your experience from 1-poor to 5-excellent". Lab experiments (award winning), volunteers dataset [PAM18]
• Small user diversity
• Web browsing, but artificial websites
• Artificial controlled conditions
User perceived PLT (uPLT): "Which of these two pages finished first?". Crowdsourcing (paid crowdworkers). Ongoing, with [partner logo]
• Larger userbase, but higher noise
• Side-to-side videos ≠ Web browsing!
• Artificial controlled conditions
User acceptance: "Did the page load fast enough?" (Yes/No). Collab with [partner logo]. Experiments from operational website [WWW19]
• Actual service users
• Browsing in typical user conditions
• Huge heterogeneity (devices/browsers/nets)
Models: Data driven vs Expert models (User QoE). https://webqoe.telecom-paristech.fr/models
Data driven: learn y=f(x)
• x = vector of input features
• Optimal f(.) selected & tuned by machine learning
• More flexible and (slightly) more accurate [PAM18]
Expert models: fit predetermined y=f(x)
• x = single scalar metric, generally Page Load Time (PLT)
• f(.) = pre-selected by the expert
• IQX hypothesis [1]
• Weber-Fechner
• Standard ITU-T G.1030, https://www.itu.int/rec/T-REC-G.1030/en
Comparison of the two models in [QoMEX-18] [INFOCOM19]. Still room for improvement (see [WWW19]).
[1] M. Fiedler et al. "A generic quantitative relationship between quality of experience and quality of service." IEEE Network, 2010
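As an illustration of what "f(.) pre-selected by the expert" can look like, the sketch below evaluates two fixed functional forms on PLT: the exponential form of the IQX hypothesis [1], QoE = α·exp(−β·QoS) + γ, and a logarithmic form in the spirit of Weber-Fechner. All parameter values (`alpha`, `beta`, `gamma`, `a`, `b`, `plt_min`) are hypothetical, picked only so that the output stays on the 1 to 5 MOS scale; they are not fitted to the datasets of this talk.

```python
import math

def iqx_mos(plt_s, alpha=4.0, beta=0.3, gamma=1.0):
    """IQX-style exponential mapping: MOS = alpha * exp(-beta * PLT) + gamma.

    Parameter values are illustrative assumptions, not fitted values.
    """
    return alpha * math.exp(-beta * plt_s) + gamma

def log_mos(plt_s, a=5.0, b=1.2, plt_min=0.5):
    """Logarithmic (Weber-Fechner style) mapping, clipped to the 1..5 scale.

    plt_min is a hypothetical PLT below which the score saturates at 5.
    """
    raw = a - b * math.log(max(plt_s, plt_min) / plt_min)
    return min(5.0, max(1.0, raw))

if __name__ == "__main__":
    for plt_s in (0.5, 2.0, 8.0):
        print(f"PLT={plt_s:4.1f}s  IQX={iqx_mos(plt_s):.2f}  log={log_mos(plt_s):.2f}")
```

In both cases the expert fixes the shape of the curve and only a few scalar parameters are fitted, whereas the data-driven approach also selects the shape.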
Browser metrics: Time Instant vs Time Integral (1/2). t=DOM, page structure loaded. [Figure: example page www.iceberg.com; visual progress x(t), from 0 to 1, over time t, with the instants TB, DOM, ATF and PLT marked] * Images by vvstudio, vectorpocket, Ydlabs / Freepik
Browser metrics: Time Instant vs Time Integral (1/2). t=ATF, visible portion (aka Above the Fold) loaded. [Figure: example page www.iceberg.com; visual progress over time t, with the instants DOM, ATF and PLT marked]
Browser metrics: Time Instant vs Time Integral (1/2). t=ATF, visible portion (aka Above the Fold) loaded. SpeedIndex is the time integral of the missing visual progress, $\mathrm{SpeedIndex} = \int (1 - x(t))\,dt$. [Figure: example page www.iceberg.com; visual progress x(t), from 0 to 1, over time t, with the instants DOM, ATF and PLT marked]
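Browser metrics: Time Instant vs Time Integral (1/2). t=PLT, all page content loaded. SpeedIndex is the time integral of the missing visual progress, $\mathrm{SpeedIndex} = \int (1 - x(t))\,dt$. [Figure: example page www.iceberg.com; visual progress x(t), from 0 to 1, over time t, with the instants DOM, ATF and PLT marked]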
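The difference between a time-instant metric (PLT) and a time-integral one (SpeedIndex) can be checked numerically. The sketch below integrates 1 − x(t) for a piecewise-constant visual progress curve; the function name `speed_index`, the sample format and the two example curves are assumptions made for illustration, and times are in seconds.

```python
def speed_index(samples):
    """Integrate 1 - x(t) over a piecewise-constant visual progress curve.

    samples: list of (t, x) pairs with t increasing and x in [0, 1].
    Progress is assumed to be 0 before the first sample, each x holds
    until the next sample, and the last sample should reach x == 1.0.
    """
    area = 0.0
    prev_t, prev_x = 0.0, 0.0
    for t, x in samples:
        # Area still "missing" between the previous sample and this one.
        area += (1.0 - prev_x) * (t - prev_t)
        prev_t, prev_x = t, x
    return area

if __name__ == "__main__":
    # Two hypothetical loads with the same PLT (4 s) but different progress.
    fast = [(1.0, 0.8), (4.0, 1.0)]   # most content visible after 1 s
    slow = [(3.0, 0.2), (4.0, 1.0)]   # most content visible only at 4 s
    print(f"fast: {speed_index(fast):.2f}  slow: {speed_index(slow):.2f}")
```

Both curves complete at the same instant, so PLT cannot tell them apart, while the integral is smaller for the curve that shows content earlier.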
Browser metrics: Time Instant vs Time Integral (2/2)
• SpeedIndex, RUMSI, PSSI
› Processing intensive
› Only at L7 (in browser)
› Visual progress metric
• ObjectIndex, ByteIndex and ImageIndex
› Lightweight
› ByteIndex also at L3 (in network)
› Highly correlated with SpeedIndex
› Possibly far from user QoE?
Progress signal x(t) of each metric: SpeedIndex, % of visual completeness (histogram, rectangles or SSIM); ObjectIndex, % of objects downloaded; ByteIndex, % of bytes downloaded; ImageIndex, % of bytes of images.
Browser metrics: Time Instant vs Time Integral (2/2)
• SpeedIndex, RUMSI, PSSI
› Processing intensive
› Only at L7 (in browser)
› Visual progress metric
• ObjectIndex, ByteIndex and ImageIndex
› Lightweight
› ByteIndex also at L3 (in network)
› Highly correlated with SpeedIndex
› Possibly far from user QoE?
[Figure: a second progress curve x'(t) with the same PLT but slower loading]
Progress signal x(t) of each metric: SpeedIndex, % of visual completeness (histogram, rectangles or SSIM); ObjectIndex, % of objects downloaded; ByteIndex, % of bytes downloaded; ImageIndex, % of bytes of images.
Browser metrics: Time Instant vs Time Integral (2/2)
• SpeedIndex, RUMSI, PSSI
› Processing intensive
› Only at L7 (in browser)
› Visual progress metric
• ObjectIndex, ByteIndex and ImageIndex
› Lightweight
› ByteIndex also at L3 (in network)
› Highly correlated with SpeedIndex
› Possibly far from user QoE?
[Figure: different cutoffs]
Progress signal x(t) of each metric: SpeedIndex, % of visual completeness (histogram, rectangles or SSIM); ObjectIndex, % of objects downloaded; ByteIndex, % of bytes downloaded; ImageIndex, % of bytes of images.
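The lightweight indexes differ only in how the progress signal x(t) is weighted. When x(t) is a step function that jumps by w_i/W at the completion time t_i of object i, the integral of 1 − x(t) reduces to the weighted mean completion time, Σ w_i·t_i / Σ w_i. The sketch below uses that identity; the `Obj` record, the helper `index` and the example page are hypothetical, and crediting each object entirely at its completion time is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    t_done: float    # completion time, seconds since navigation start
    size: int        # object size in bytes
    is_image: bool   # True for image objects

def index(objs, weight):
    """Integral of 1 - x(t), with x(t) the weighted fraction completed by t.

    For a step-function x(t) this equals the weighted mean completion time.
    """
    total = sum(weight(o) for o in objs)
    if total == 0:
        return 0.0
    return sum(weight(o) * o.t_done for o in objs) / total

def object_index(objs):
    return index(objs, lambda o: 1)                # every object counts once

def byte_index(objs):
    return index(objs, lambda o: o.size)           # weight by bytes

def image_index(objs):
    return index(objs, lambda o: o.size if o.is_image else 0)  # image bytes only

if __name__ == "__main__":
    page = [                       # hypothetical page on domain x.com
        Obj(0.2, 10_000, False),   # htm
        Obj(0.5, 20_000, False),   # css
        Obj(0.8, 70_000, False),   # js
        Obj(1.0, 300_000, True),   # img1
        Obj(2.0, 600_000, True),   # img2
    ]
    print(object_index(page), byte_index(page), image_index(page))
```

Only the byte-weighted variant needs nothing beyond byte counts and timestamps, which is consistent with ByteIndex being the one metric also available at L3.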
Method: From raw packets to browser metrics (1/2). Single session on domain x.com: the individual objects (htm, js, css, img1, img2) drive the visual progress x(t), with the instants DOM, ATF and PLT, and $\mathrm{SpeedIndex} = \int (1 - x(t))\,dt$.
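Method: From raw packets to browser metrics (1/2). Single session on domain x.com: the individual objects (htm, js, css, img1, img2) drive the visual progress x(t), with the instants DOM, ATF and PLT, and $\mathrm{SpeedIndex} = \int (1 - x(t))\,dt$. In the network, the same session is only seen as packets (??! !?!) over time.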
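Method: From raw packets to browser metrics (1/2). Single session on domain x.com: the individual objects (htm, js, css, img1, img2) drive the visual progress x(t), with the instants DOM, ATF and PLT, and $\mathrm{SpeedIndex} = \int (1 - x(t))\,dt$. In the network, the same session is only seen as packets (??! !?!) over time. Train ML models (XGBoost, 1D-CNN) to go from the packets to the browser metrics.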
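The slides do not say which features are fed to the ML models. One plausible input, shown here purely as an assumption, is a fixed-length signal of cumulative bytes per time bin, which needs only packet timestamps and sizes and is therefore unaffected by payload encryption. The function name `packets_to_signal` and the parameters `n_bins` and `horizon_s` are hypothetical.

```python
def packets_to_signal(packets, n_bins=10, horizon_s=5.0):
    """Turn a packet trace into a fixed-length cumulative byte signal.

    packets: iterable of (timestamp_s, size_bytes), with timestamps
             relative to the session start.
    Returns n_bins integers: the bytes seen up to the end of each bin.
    Packets at or after horizon_s (or before 0) are ignored.
    """
    width = horizon_s / n_bins
    per_bin = [0] * n_bins
    for ts, size in packets:
        if 0 <= ts < horizon_s:
            # min() guards against float rounding at the upper edge.
            per_bin[min(int(ts / width), n_bins - 1)] += size
    signal, running = [], 0
    for b in per_bin:
        running += b
        signal.append(running)
    return signal

if __name__ == "__main__":
    trace = [(0.1, 1500), (0.6, 1500), (2.4, 500), (7.0, 999)]
    print(packets_to_signal(trace, n_bins=5, horizon_s=5.0))
```

A vector of this kind could serve as the input row of a tree-based regressor or as the one-dimensional input of a 1D-CNN, with the browser-side metric (for example SpeedIndex) as the training label.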
Method: From raw packets to browser metrics (2/2)
[Figure: user, browser (L7), network (L3)]
• Works with encryption
• Handles multi-sessions (not in this talk)
• Exact online algorithm for ByteIndex
• Machine learning for any metric
• Accurate on joint tests with Orange
• Accurate for unseen pages & networks
• Available soon in Huawei products
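The slide states that ByteIndex can be computed exactly and online but does not describe the algorithm. One way to obtain such a computation, offered as a sketch and not necessarily the authors' method, follows from the fact that for a byte-progress step function the integral of 1 − x(t) equals the byte-weighted mean packet arrival time, Σ b_i·t_i / Σ b_i. Two running sums are then enough, and the total page size need not be known in advance. The class name `OnlineByteIndex` is hypothetical, and the session start time `t0` is assumed to be known.

```python
class OnlineByteIndex:
    """Streaming ByteIndex for one session, using O(1) state.

    Relies on: integral of (1 - B(t)/B_total) dt over the session
               == sum(b_i * (t_i - t0)) / sum(b_i)
    where b_i is the size of packet i and t_i its arrival time.
    """

    def __init__(self, t0):
        self.t0 = t0              # session start time (assumed known)
        self.bytes_seen = 0       # running sum of b_i
        self.weighted_time = 0.0  # running sum of b_i * (t_i - t0)

    def update(self, ts, size):
        """Account for one packet of `size` bytes observed at time `ts`."""
        self.bytes_seen += size
        self.weighted_time += size * (ts - self.t0)

    def value(self):
        """ByteIndex of the traffic seen so far, in the unit of the timestamps."""
        if self.bytes_seen == 0:
            return 0.0
        return self.weighted_time / self.bytes_seen

if __name__ == "__main__":
    bi = OnlineByteIndex(t0=10.0)
    for ts, size in [(10.0, 100), (10.5, 300), (12.0, 600)]:
        bi.update(ts, size)
    print(f"ByteIndex = {bi.value():.2f} s")
```

Because only sizes and timestamps are used, the computation is unaffected by payload encryption; delimiting sessions in the packet stream is a separate problem that this sketch does not address.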
Aftermath (1/3): From raw packets to rough sentiments
• Expert-driven feature engineering
› Explainable but inherently heuristic approach
› Hard to keep in sync with application/network change
• Neural Networks
› Less interpretable but more versatile
› Downside: requires lots of samples
Possible inputs
› Feed NN with x(t) signal: still lightweight
› Feed NN using a filmstrip: more complex
Possible outputs
› User feedback (e.g. MOS, user PLT, etc.)
› Smartphone sensors (e.g. happiness estimation via facial recognition)
› Brain signals acquired with sensors: activity of brain areas correlated with user happiness
Aftermath (2/3): Divide et impera
• World Wild Web: one average model
› Huge diversity, not captured by a single model
• Increase accuracy: many per-page models
› Per-page QoE models
› Inherently non-scalable
• Increase accuracy & scalability
› Per-page QoE models (e.g. Alexa top 100 pages)
› Aggregate QoE models (e.g. 100 clusters over the Alexa top 1M)
› Generic QoE model (for the tail, up to 1B pages)
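The three tiers above amount to a lookup with fallbacks: use a per-page model when one exists, else the model of the page's cluster, else the generic model. The sketch below shows that dispatch only; the function `pick_model`, the dictionary layout and the toy models are hypothetical, and how pages are clustered is not covered.

```python
def pick_model(host, per_page, cluster_of, per_cluster, generic):
    """Return the most specific QoE model available for `host`.

    per_page:    dict host -> model, for the most popular pages
    cluster_of:  dict host -> cluster id, for pages that were clustered
    per_cluster: dict cluster id -> aggregate model
    generic:     fallback model for the long tail
    """
    if host in per_page:
        return per_page[host]
    cluster = cluster_of.get(host)
    if cluster is not None and cluster in per_cluster:
        return per_cluster[cluster]
    return generic

if __name__ == "__main__":
    # Toy models mapping a metric in seconds to a score; values are made up.
    per_page = {"x.com": lambda m: 4.5 - 0.5 * m}
    cluster_of = {"news.example": 7}
    per_cluster = {7: lambda m: 4.2 - 0.4 * m}
    generic = lambda m: 4.0 - 0.3 * m

    for host in ("x.com", "news.example", "unknown.example"):
        model = pick_model(host, per_page, cluster_of, per_cluster, generic)
        print(host, round(model(2.0), 2))
```

The per-page table stays small, the cluster table is bounded by the number of clusters, and every other page still gets an answer from the generic model.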