SLIDE 1

FIXING BUFFERBLOAT IN WIFI

STATUS AND NEXT STEPS

Toke Høiland-Jørgensen, Karlstad University toke.hoiland-jorgensen@kau.se Netdev 2.2, Seoul, South Korea Nov 10th, 2017

Toke Høiland-Jørgensen <toke.hoiland-jorgensen@kau.se>

SLIDE 2

OUTLINE

  • The problem
  • 802.11 MAC protocol constraints
  • The new mac80211 queueing structure and airtime scheduler
  • Next steps - feedback wanted!
      • More latency reductions
      • Airtime policies
      • QoS handling
      • Configurability
  • Summary

SLIDE 3

BUFFERBLOAT

[Figure: cumulative probability (y-axis, 0.0 to 1.0) of induced one-way delay in ms (x-axis).]

Best qdisc still had 100ms+ of bloat on WiFi.

SLIDE 4

AIRTIME (UN)FAIRNESS

Effective transmission time $T(i)$ and rate $R(i)$ for station $i \in I$:

$$T(i) = \begin{cases} \dfrac{1}{|I|} & \text{with fairness} \\[2ex] \dfrac{T_{data}(i)}{\sum_{j \in I} T_{data}(j)} & \text{otherwise} \end{cases}$$

$$R(i) = T(i)\, R_0(i)$$

Where $R_0(i) = \dfrac{L_i}{T_{data}(i) + T_{oh}}$ is the effective rate of a station transmitting without collisions.

Network throughput is determined by the slowest station.
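
As a rough worked example of the formulas above, the sketch below evaluates T(i) and R(i) for three stations, with and without fairness. The values chosen for L_i, T_data(i) and T_oh are made-up illustrative numbers (bits and microseconds), not measurements from the talk, and the function names are hypothetical.

```python
def airtime_shares(t_data, fairness):
    """Return T(i) for every station.

    t_data:   per-station data transmission times T_data(i)
    fairness: True gives equal shares 1/|I|; False gives shares
              proportional to T_data(i), as in the 'otherwise' case.
    """
    n = len(t_data)
    if fairness:
        return [1.0 / n] * n
    total = sum(t_data)
    return [t / total for t in t_data]


def effective_rates(lengths, t_data, t_oh, fairness):
    """Return R(i) = T(i) * R0(i), with R0(i) = L_i / (T_data(i) + T_oh)."""
    shares = airtime_shares(t_data, fairness)
    return [
        share * (length / (t + t_oh))
        for share, length, t in zip(shares, lengths, t_data)
    ]


# Assumed inputs: two fast stations and one slow station, all sending
# equal-sized aggregates. Bits per microsecond equals Mbit/s.
LENGTHS = [12000.0, 12000.0, 12000.0]   # L_i, in bits
T_DATA = [100.0, 100.0, 1800.0]         # T_data(i), in microseconds
T_OH = 100.0                            # T_oh, in microseconds

unfair = effective_rates(LENGTHS, T_DATA, T_OH, fairness=False)
fair = effective_rates(LENGTHS, T_DATA, T_OH, fairness=True)
print("without fairness:", unfair, "total:", sum(unfair))
print("with fairness:   ", fair, "total:", sum(fair))
```

With these assumed numbers the slow station occupies 90% of the airtime when there is no fairness, which is what drags the total down.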

SLIDE 5

802.11 MAC PROTOCOL CONSTRAINTS

  • 1. We must handle aggregation per Traffic ID (TID)
  • 2. We must handle re-injection of packets for retransmission
  • 3. We must be able to keep the hardware busy
  • 4. We must support low-power access points (down to 32MB of RAM)
  • 5. We cannot modify clients

Also, some operations are sensitive to reordering (crypto, seqno).

Constraints 1 and 2 mean we can't use a qdisc.

SLIDE 6

WHAT IS ALREADY IN MAINLINE?

  • WiFi bufferbloat reduced by an order of magnitude
  • Almost perfect airtime fairness
  • Support in ath9k and ath10k (partial)

[Figure: two panels, labelled "Linux >= 4.9" and "Linux < 4.9".]

SLIDE 7

SO HOW DID WE DO IT?

We increased the amount of queueing in the WiFi stack by a factor of 16. Queue smarter, not harder!

SLIDE 8

SERIOUSLY, HOW DOES IT WORK?

We designed a queueing structure and an airtime fairness scheduler.

QUEUEING STRUCTURE

  • Per-flow queueing based on FQ-CoDel
  • Shared pool of queues to avoid memory explosion
  • Supports per-TID dequeueing and scheduling

AIRTIME SCHEDULER

  • Measure actual airtime usage of each station
  • Run a DRR-based scheduler to even them out
  • Optimise for sparse stations
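
To make the queueing structure more concrete, here is a minimal sketch of a shared pool of flow queues with per-TID dequeueing. It is not the mac80211 code: the class and method names are hypothetical, the pool size is arbitrary, CoDel and the per-flow DRR are left out (a plain rotation between backlogged flows stands in for them), and the per-TID overflow queue used on hash collisions is my reading of the design, so treat it as an assumption.

```python
from collections import deque


class SharedFlowQueues:
    """Fixed pool of flow queues shared by all TIDs (illustrative only)."""

    def __init__(self, num_queues=8):
        self.queues = [deque() for _ in range(num_queues)]
        self.owner = [None] * num_queues  # TID currently using each queue
        self.overflow = {}                # TID -> fallback queue on collisions
        self.active = {}                  # TID -> rotation of backlogged queues

    def enqueue(self, tid, flow, pkt):
        """Hash the flow into the shared pool and queue the packet."""
        idx = hash(flow) % len(self.queues)
        if self.owner[idx] is None:
            self.owner[idx] = tid
        if self.owner[idx] == tid:
            queue, key = self.queues[idx], idx
        else:
            # The slot belongs to another TID: fall back to this TID's
            # own overflow queue so that TIDs never share a queue.
            queue, key = self.overflow.setdefault(tid, deque()), "overflow"
        if not queue:
            # Queue was idle, so add it to this TID's rotation.
            self.active.setdefault(tid, deque()).append(key)
        queue.append(pkt)

    def dequeue(self, tid):
        """Return the next packet for this TID, or None if it has none."""
        active = self.active.get(tid)
        if not active:
            return None
        key = active.popleft()
        queue = self.overflow[tid] if key == "overflow" else self.queues[key]
        pkt = queue.popleft()
        if queue:
            active.append(key)      # still backlogged: back of the rotation
        elif key != "overflow":
            self.owner[key] = None  # empty: hand the slot back to the pool
        return pkt


fq = SharedFlowQueues()
fq.enqueue(tid=0, flow=1, pkt="bulk-1")
fq.enqueue(tid=0, flow=1, pkt="bulk-2")
fq.enqueue(tid=0, flow=2, pkt="ping")
print([fq.dequeue(0) for _ in range(3)])
```

Because memory is bounded by the pool size rather than by the number of stations times TIDs times flows, this kind of structure avoids the memory explosion that fully separate per-TID FQ instances would cause.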

SLIDE 9

[Figure: WiFi queueing structure (ath9k) before and after the redesign.
Before: qdisc layer (FIFO, limit 1000*), MAC layer (assign TID), ath9k driver (per-TID buf_q and retry_q, RR between TIDs, retries), then a 2-aggregate FIFO per HW queue (x4) to hardware.
After: qdisc layer bypassed, MAC layer (assign TID, split flows, per-TID FQ-CoDel, global limit 8192), ath9k driver (per-TID retry_q, RR, retries), then a 2-aggregate FIFO per HW queue (x4) to hardware.
*Can be replaced with an arbitrary configuration.]

SLIDE 10

EVALUATION

Four scenarios:

  • FIFO: Default before modifications
  • FQ-CoDel: FQ-CoDel qdisc on WiFi interface
  • FQ-MAC: Our restructured MAC layer queues
  • Airtime fairness: FQ-MAC + airtime fairness scheduler

[Figure: testbed with three stations (Fast 1, Fast 2, Slow), the AP, and a server.]

SLIDE 11

LATENCY (TCP)

SLIDE 12

THROUGHPUT (TCP)


SLIDE 13

AIRTIME (UDP)

SLIDE 14

NEXT STEPS

  • More latency reductions
  • Airtime policies
  • QoS handling
  • Configurability

Feedback wanted!

SLIDE 15

MORE LATENCY REDUCTIONS

  • Minimise hardware buffering
      • When aggregation is in firmware: BQL-like mechanism
      • Otherwise: Precise aggregate scheduling
      • Interrupt at TX start?
  • Time-based max retransmission counter
      • Maybe include queueing time?
  • Dynamic aggregate sizing
      • Many stations active → Smaller aggregates
      • Throughput/latency tradeoff; what's the right one?
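
One way to picture dynamic aggregate sizing is to cap each aggregate so that a full round over all active stations fits within a latency target. The sketch below is only one possible policy, not something the talk proposes: the function name, the 20 ms target, and both bounds are assumptions chosen for illustration.

```python
MAX_AGGREGATE_US = 4000  # assumed upper bound on one aggregate's airtime
MIN_AGGREGATE_US = 250   # assumed floor, so aggregation stays worthwhile


def aggregate_limit_us(active_stations, latency_target_us=20000):
    """Return the airtime cap (in microseconds) for a single aggregate.

    The cap shrinks as more stations become active, so that serving every
    active station once takes roughly latency_target_us at most.
    """
    if active_stations <= 0:
        return MAX_AGGREGATE_US
    per_station = latency_target_us // active_stations
    # Clamp between the floor and the ceiling.
    return max(MIN_AGGREGATE_US, min(MAX_AGGREGATE_US, per_station))


for n in (1, 5, 10, 30, 100):
    print(n, "active stations ->", aggregate_limit_us(n), "us per aggregate")
```

The floor is where the throughput/latency tradeoff shows up: below some aggregate size the per-transmission overhead dominates, so the latency target can no longer be met without giving up throughput.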

SLIDE 16

AIRTIME POLICIES

Strict fairness generally desirable, with a few exceptions. Such as:

  • When the wireless music player is in the next room
      • A bit more airtime can make it work
  • When a group of clients should be limited
      • E.g., limiting a guest network

SLIDE 17

AIRTIME POLICIES - MECHANISM

Idea: Implement scheduling groups

  • Stations can be assigned to groups (by userspace)
  • Airtime is divided between groups, then between stations within
  • Groups could also be recursive (station is always its own group)
  • Groups can be weighted
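
The two-level division can be sketched as follows. This is an illustration of the idea only, assuming a flat (non-recursive) set of groups and an equal split between the stations inside a group; the function name, group names, and station names are hypothetical.

```python
def group_airtime_shares(groups):
    """Divide airtime between weighted groups, then between their stations.

    groups: dict mapping group name -> (weight, list of station names)
    Returns a dict mapping station name -> fraction of total airtime.
    Groups without stations are skipped, so they do not waste airtime.
    """
    active = [(weight, stations) for weight, stations in groups.values() if stations]
    total_weight = sum(weight for weight, _ in active)
    shares = {}
    for weight, stations in active:
        group_share = weight / total_weight
        for station in stations:
            # Stations inside a group split the group's share equally.
            shares[station] = group_share / len(stations)
    return shares


# Assumed setup: a guest group with three stations and a home group with one,
# both groups weighted equally.
print(group_airtime_shares({
    "home": (1, ["laptop"]),
    "guest": (1, ["guest-1", "guest-2", "guest-3"]),
}))
```

In this assumed setup the guest group as a whole is held to half the airtime no matter how many stations join it, while a heavier weight on a group would raise its share relative to the others.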

SLIDE 18

EX: PRIORITISE SLOW STATION

[Figure: four scheduling groups with weights G1 W=1, G2 W=1, G3 W=1 and G4 W=2.]

The slow station gets twice its fair airtime share

SLIDE 19

EX: LIMIT GUEST NETWORK

[Figure: two scheduling groups with weights G1 W=1 and G2 W=1.]

The guest network (G2) gets only half the airtime.

But what if the guest network is G1?

SLIDE 20

AIRTIME POLICIES - SUMMARY

  • Grouping mechanism pretty expressive, but some limitations
  • Alternative: Just allow userspace to divide airtime - How? BPF?
  • Maybe need to move scheduling and accounting out of the fast path.
  • Prereq: Move airtime scheduler to mac80211 (patch series)

Comments?

SLIDE 21

QOS HANDLING

Some potential issues with current QoS handling:

  • No admission control - potential for lockout
  • Strict priority can be inefficient
  • Interactions with airtime fairness

SLIDE 22

SOFT QOS ADMISSION CONTROL

[Figure: a station with separate FQ-CoDel queues for the VO and BE QoS levels.]

Idea: Demote entire flow if it builds a queue.
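
A minimal sketch of the demotion idea is shown below. It assumes that "builds a queue" can be approximated by a per-flow backlog threshold in packets; a real implementation would more likely use queueing delay. The class name, the threshold, and the flow identifiers are all hypothetical, and dequeueing is left out.

```python
from collections import deque

QUEUE_THRESHOLD = 5  # assumed backlog (packets) above which a VO flow is demoted


class StationQueues:
    """Per-station VO and BE flow queues with soft admission control."""

    def __init__(self):
        self.levels = {"VO": {}, "BE": {}}  # QoS level -> flow id -> packets
        self.demoted = set()                # flows that lost their VO status

    def enqueue(self, flow, pkt, level="BE"):
        if flow in self.demoted:
            level = "BE"  # demotion applies to the entire flow from now on
        queue = self.levels[level].setdefault(flow, deque())
        queue.append(pkt)
        if level == "VO" and len(queue) > QUEUE_THRESHOLD:
            # The flow is building a queue in VO: move its whole backlog
            # to BE in order, so packets are not reordered.
            backlog = self.levels["VO"].pop(flow)
            self.levels["BE"].setdefault(flow, deque()).extend(backlog)
            self.demoted.add(flow)


sta = StationQueues()
for seq in range(8):
    sta.enqueue("bulk-marked-as-voice", seq, level="VO")
print(sorted(sta.demoted), list(sta.levels["BE"]["bulk-marked-as-voice"]))
```

A well-behaved voice flow sends small packets at a steady rate and rarely accumulates a backlog, so under this assumption it keeps its VO treatment while a bulk flow that marks itself as voice does not.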

SLIDE 23

COMBINING QOS LEVELS

[Figure: packets from the VO and BE queues placed in one combined aggregate.]

Idea: Combine packets from different QoS levels in single aggregate.

SLIDE 24

QOS VS AIRTIME FAIRNESS

[Figure: VO and BE queues for two stations. Sta 1 airtime deficit: -300 usec. Sta 2 airtime deficit: +300 usec.]

Which station goes first?

SLIDE 25

CONFIGURABILITY

Currently, everything is in debugfs. How best to integrate with existing tools?

Configuration (per phy):

  • FQ knobs (packet/memory limit, quantum)
  • Airtime flags (count airtime on TX/RX)

Statistics (per sta):

  • FQ per-tid stats (drops/marks/bytes etc)
  • FQ multicast stats (per vif)
  • Airtime stats (TX/RX usecs, deficit)

SLIDE 26

SUMMARY

We have:

  • Reduced WiFi bufferbloat by an order of magnitude
  • Achieved almost perfect airtime fairness in most cases

Going forward:

  • More latency reductions
  • Airtime policies
  • QoS handling
  • Configurability

Original paper in USENIX ATC '17: "Ending the Anomaly: Achieving Low Latency and Airtime Fairness in WiFi", with Michał Kazior, Dave Täht, Per Hurtig and Anna Brunstrom.

Many thanks to Sven Eckelmann, Simon Wunderlich, Felix Fietkau, Tim Shepard, Eric Dumazet, Johannes Berg, Kalle Valo, and the numerous other contributors to the Make-Wifi-Fast and LEDE projects.

SLIDE 27

EXTRA SLIDES

SLIDE 28

HOW TO GET AIRTIME FAIRNESS IN YOUR DRIVER

Once the patches have been finalised and merged:

  • Implement the wake_tx_queue driver op
  • Set AIRTIME_ACCOUNTING HW flag
  • Fill in airtime usage in rx_status->airtime and tx_status->tx_time
      • Should contain time spent on transmission in microseconds
      • Should include failed transmission attempts (on TX)
      • TX: Get from hardware/firmware. RX: Can be calculated

SLIDE 29

APPLICATION PERFORMANCE IMPACT

We evaluate:

  • HTTP page load time performance
  • VoIP performance (MOS values)

SLIDE 30

HTTP PAGE LOAD TIMES

SLIDE 31

VOIP TEST

            QoS   MOS    Thrp   MOS    Thrp
FIFO        VO    4.17   27.5   4.13   21.6
            BE    1.00   28.3   1.00   22.0
FQ-CoDel    VO    4.17   25.5   4.08   15.2
            BE    1.24   23.6   1.21   15.1
FQ-MAC      VO    4.41   39.1   4.38   28.5
            BE    4.39   43.8   4.37   34.0
Airtime     VO    4.41   56.3   4.38   49.8
            BE    4.39   57.0   4.37   49.7

Synthetic MOS values calculated from the ITU-T G.107 E-model.
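
For readers unfamiliar with the E-model, the final step is a fixed mapping from the transmission rating factor R to an estimated MOS. The sketch below implements only that mapping as given in ITU-T G.107; how R was derived from the measured delay and loss in these tests is not shown in the slides, so that part is left out. The function name is hypothetical.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# R = 93.2 is the E-model's rating with all parameters at their defaults,
# i.e. roughly the best score a narrowband call can reach.
for r in (50.0, 70.0, 93.2):
    print(r, round(r_to_mos(r), 2))
```

This puts the table in context: a MOS of about 4.4 is close to the ceiling of the scale for this model, while 1.00 is the floor.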

SLIDE 32

AIRTIME FAIRNESS

SLIDE 33

SPARSE STATION OPTIMISATION

SLIDE 34

30 STATIONS TEST

  • We cooperated with another lab to evaluate our solution
  • 30 station testbed, one slow station (1 Mbps)

SLIDE 35

30 STATIONS

SLIDE 36

30 STATIONS - LATENCY

SLIDE 37

AIRTIME SCHEDULER

function on_tx(pkt) {
    station = get_station(pkt)
    station.deficit -= pkt.duration
}

function on_rx(pkt) {
    station = get_station(pkt)
    station.deficit -= calc_dur(pkt)
}

function schedule(hwq) {
    if full(hwq) {
        return
    }
begin:
    station = list_head(station_list)
    if station.deficit <= 0 {
        station.deficit += quantum
        list_move_end(station, station_list)
        goto begin
    }
    if !station.queue {
        list_del(station)
        goto begin
    }
    queue_aggregate(station)
}
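
The pseudocode above can be turned into a small runnable simulation. The version below is a sketch under several assumptions: the quantum value is arbitrary, each queued aggregate is represented only by its airtime in microseconds, the hardware-queue-full check is omitted, and airtime is charged immediately at scheduling time rather than later in a TX completion handler. All names are hypothetical.

```python
from collections import deque

QUANTUM_US = 300  # assumed airtime quantum, in microseconds


class Station:
    def __init__(self, name):
        self.name = name
        self.deficit = 0      # airtime credit, in microseconds
        self.queue = deque()  # pending aggregates, stored as airtime in us


def account(station, duration_us):
    """Charge used airtime (TX or RX) against the station's deficit."""
    station.deficit -= duration_us


def schedule(station_list):
    """Return (station, duration) for the next aggregate, or None if idle."""
    while station_list:
        station = station_list[0]
        if station.deficit <= 0:
            # Out of credit: top up and send to the back of the list.
            station.deficit += QUANTUM_US
            station_list.rotate(-1)
            continue
        if not station.queue:
            # Nothing to send: drop the station from the active list.
            station_list.popleft()
            continue
        duration = station.queue.popleft()
        account(station, duration)  # simplification: charge immediately
        return station, duration
    return None


# A fast station (100 us per aggregate) competing with a slow one (1000 us).
fast, slow = Station("fast"), Station("slow")
fast.queue.extend([100] * 2000)
slow.queue.extend([1000] * 200)
stations = deque([fast, slow])
airtime = {"fast": 0, "slow": 0}
for _ in range(1000):
    sta, dur = schedule(stations)
    airtime[sta.name] += dur
print(airtime)
```

Because a station that overspends goes into negative deficit and has to sit out several rounds, the slow station transmits far fewer aggregates, and the airtime used by the two stations stays close to equal.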

SLIDE 38

CONVERTING MAC80211 TO TXQS

Johannes outlined a plan for converting mac80211 to use TXQs:

  • Introduce a compatibility layer
  • Convert multicast PS buffering
  • Use TXQs for offchannel frames
  • Handle monitor mode
  • Handle non-data frames for stations & vifs
  • Remove all the now-dead code
  • Convert more infrastructure to TXQs
