Voice Coding with Opus: Koen Vos, Karsten Vandborg Sørensen, Søren … (PowerPoint presentation)



SLIDE 1

Voice Coding with Opus

Koen Vos, Karsten Vandborg Sørensen, Søren Skak Jensen, Jean-Marc Valin

SLIDE 2

Two Opus presentations

  • This talk: Voice Mode (Koen)

○ Features
○ Technology
○ Listening test results

  • Next talk: Audio Mode (Jean-Marc)
SLIDE 3

What is Opus?

  • Flexible speech and audio codec
  • Best-in-class performance across a wide range of applications

  • IETF Standard RFC 6716 (Sep. 2012)
  • Royalty free
  • Open source
SLIDE 4

Flexible Indeed

  • Bitrates from 6 to 510 kbps
  • Frame sizes from 2.5 to 60 ms
  • Narrowband to full-band (in 5 steps)
  • Speech and music
  • Mono and stereo
  • Rate control
  • Variable complexity

All changeable dynamically, signalled within the bitstream
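Each Opus packet starts with a table-of-contents (TOC) byte: a 5-bit configuration number that selects mode, bandwidth and frame size, a stereo bit, and a 2-bit frame-count code (RFC 6716, Section 3.1). The following is a minimal Python sketch of decoding that byte, assuming the configuration table from the RFC. `parse_toc` and `TocInfo` are hypothetical names and not part of libopus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TocInfo:
    """Hypothetical container for the fields carried in an Opus TOC byte."""
    mode: str              # "SILK", "Hybrid" or "CELT"
    bandwidth: str         # "NB", "MB", "WB", "SWB" or "FB"
    frame_ms: float        # duration of each frame in the packet
    stereo: bool           # the 's' bit
    frame_count_code: int  # the 'c' field (0..3): how many frames follow


def parse_toc(toc: int) -> TocInfo:
    """Decode the first byte of an Opus packet, laid out as config(5) | s(1) | c(2)."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3
    stereo = bool(toc & 0x04)
    code = toc & 0x03
    if config < 12:      # configs 0-11: SILK-only, NB / MB / WB
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10.0, 20.0, 40.0, 60.0)[config % 4]
    elif config < 16:    # configs 12-15: Hybrid, SWB / FB
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10.0, 20.0)[config % 2]
    else:                # configs 16-31: CELT-only, NB / WB / SWB / FB
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5.0, 10.0, 20.0)[config % 4]
    return TocInfo(mode, bandwidth, frame_ms, stereo, code)
```

For example, `parse_toc(0x78)` describes one 20 ms hybrid full-band mono frame. Because these fields are re-sent in every packet, the encoder can switch mode or bandwidth from one packet to the next.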

SLIDE 5

Merging Two Codecs

  • 1. SILK

○ Developed by Skype
○ Based on Linear Prediction
○ Efficient for voice
○ Up to 8 kHz audio bandwidth

  • 2. CELT

○ Developed by Xiph.Org
○ Based on MDCT
○ Good for universal audio/music

SLIDE 6

Hybrid Mode

For super-wideband or full-band voice
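In hybrid mode the two layers run side by side. SILK codes the spectrum up to 8 kHz, and CELT codes the band from 8 kHz up to the nominal audio bandwidth: 12 kHz for super-wideband and 20 kHz for full-band (RFC 6716). The following is a small sketch of that split. `hybrid_layers` is a hypothetical helper, used only to make the frequency ranges explicit.

```python
from __future__ import annotations

# Nominal audio bandwidths from RFC 6716 (Hz).
AUDIO_BANDWIDTH_HZ = {"NB": 4000, "MB": 6000, "WB": 8000, "SWB": 12000, "FB": 20000}
HYBRID_CROSSOVER_HZ = 8000  # SILK below, CELT above


def hybrid_layers(bandwidth: str) -> list[tuple[str, int, int]]:
    """Hypothetical helper: (layer, low_hz, high_hz) for a hybrid-mode bandwidth."""
    if bandwidth not in ("SWB", "FB"):
        raise ValueError("hybrid mode is only used for SWB and FB")
    top = AUDIO_BANDWIDTH_HZ[bandwidth]
    return [("SILK", 0, HYBRID_CROSSOVER_HZ), ("CELT", HYBRID_CROSSOVER_HZ, top)]
```

With this split, SILK's linear-prediction coding covers the range where most speech energy and pitch structure sit, and CELT adds the upper band.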

SLIDE 7

SILK Decoder

Standard defines only the decoder

  • Doesn’t get much simpler
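The core of the decoder is a cascade of two synthesis filters. A long-term (pitch) filter restores periodicity, and a short-term LPC filter restores the spectral envelope. The following is a floating-point toy under simplifying assumptions: a single-tap pitch filter and fixed parameters for the whole input. SILK itself uses a 5-tap LTP filter with quantized parameters that change from subframe to subframe. The function names are hypothetical.

```python
# Toy SILK-style synthesis (hypothetical names): single-tap LTP, float arithmetic.
def ltp_synthesis(excitation, lag, gain):
    """Long-term (pitch) synthesis: out[n] = exc[n] + gain * out[n - lag]."""
    out = []
    for n, e in enumerate(excitation):
        past = out[n - lag] if n >= lag else 0.0  # zero filter state before the input
        out.append(e + gain * past)
    return out


def lpc_synthesis(residual, lpc):
    """Short-term synthesis: y[n] = r[n] + sum_k lpc[k] * y[n - 1 - k]."""
    y = []
    for n, r in enumerate(residual):
        pred = sum(a * y[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0)
        y.append(r + pred)
    return y


def decode_frame(excitation, lag, gain, lpc):
    """Excitation -> LTP synthesis -> LPC synthesis."""
    return lpc_synthesis(ltp_synthesis(excitation, lag, gain), lpc)
```

For example, `decode_frame([1.0, 0.0, 0.0, 0.0], 2, 0.5, [0.5])` yields `[1.0, 0.5, 0.75, 0.375]`. The single pulse is repeated one pitch lag later and then smeared by the LPC filter.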
SLIDE 8

SILK Encoder

Standard includes high-quality reference implementation

SLIDE 9

Predictive Noise Shaping Quantization

  • Linear short- and long-term prediction to model formants and harmonics

○ Reduce entropy of residual

  • Short- and long-term emphasis filtering

○ Emphasize important spectral components
○ Reduce input noise

  • Short- and long-term noise shaping

○ Mask quantization noise
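The following is a minimal floating-point sketch of predictive quantization with short-term noise feedback. It assumes a scalar quantizer with a fixed step size and omits emphasis filtering, long-term prediction and long-term shaping. SILK's actual noise shaping quantizer is considerably richer, so the names and structure here are illustrative only. Two properties carry the idea. The prediction is computed from the *reconstructed* signal, so the decoder can repeat it exactly. Past quantization errors are fed back through the shaping coefficients, so the reconstruction error becomes `e[n] + sum_k f[k] * e[n-1-k]`, that is, the raw quantization error filtered by 1 + F(z).

```python
# Hypothetical sketch: short-term predictive quantization with noise feedback.
def predictive_nsq(x, lpc, shaping, step):
    """Quantize x; return integer indices and the decoder-side reconstruction.

    lpc:     short-term predictor coefficients (applied to reconstructed samples)
    shaping: noise-feedback coefficients f_k (assumed given by the encoder)
    step:    quantizer step size
    """
    indices, recon, err = [], [], []
    for n, xn in enumerate(x):
        # Prediction from past reconstructed samples, which the decoder also has.
        pred = sum(a * recon[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0)
        # Noise feedback: add filtered past quantization errors to the target.
        target = xn + sum(f * err[n - 1 - k] for k, f in enumerate(shaping) if n - 1 - k >= 0)
        q = round((target - pred) / step)  # scalar quantizer on the residual
        y = pred + q * step
        indices.append(q)
        recon.append(y)
        err.append(y - target)             # |err| <= step / 2
    return indices, recon


def decode_indices(indices, lpc, step):
    """Decoder: rebuild the signal from the indices with the same predictor."""
    recon = []
    for n, q in enumerate(indices):
        pred = sum(a * recon[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0)
        recon.append(pred + q * step)
    return recon
```

With an empty `shaping` list, the error is white and bounded by half a step. With `shaping=[0.5]`, it is bounded by 0.75 of a step, but its spectrum is tilted, which is what lets the encoder hide the noise under the signal.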

SLIDE 10

Predictive Noise Shaping Quant. II

SLIDE 11

Predictive Noise Shaping Quant. III

Example (short-term shaping only)

SLIDE 12

Stereo

  • Mid-Side representation
  • Side is predicted from mid; residual coded
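The following is a minimal sketch of mid-side coding with side-from-mid prediction. It assumes a single broadband weight chosen by least squares and no quantization. SILK's actual stereo predictor uses quantized weights and differs in detail, so `ms_encode` and `ms_decode` are hypothetical names illustrating the principle only.

```python
# Hypothetical sketch: mid/side conversion with a single least-squares side predictor.
def ms_encode(left, right):
    """Return mid, prediction weight w, and the side residual side - w * mid."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    energy = sum(m * m for m in mid)
    # Weight minimizing sum((s - w * m) ** 2); fall back to 0 for a silent mid.
    w = sum(s * m for s, m in zip(side, mid)) / energy if energy > 0 else 0.0
    residual = [s - w * m for s, m in zip(side, mid)]
    return mid, w, residual


def ms_decode(mid, w, residual):
    """Undo the prediction, then convert mid/side back to left/right."""
    side = [r + w * m for r, m in zip(residual, mid)]
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

For a source panned toward one channel, the side signal is nearly a scaled copy of mid. The residual then almost vanishes and costs few bits. Because w = 0 is always a candidate, the residual never has more energy than the side signal itself.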
SLIDE 13

Internet Robustness

  • Forward Error Correction (FEC)

○ Include a coarse encoding of the previous packet during active speech

  • Flexible Error Propagation

○ Code packets more independently for channels with packet loss

  • Discontinuous Transmission (DTX)

○ Reduce packet rate during silence

  • Packet Loss Concealment (PLC)

○ Decoder side
○ Fills in DTX blanks
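On the receiving side these tools combine into a per-frame decision. The following sketch assumes a jitter buffer that can look one packet ahead. In libopus the three outcomes correspond to calling `opus_decode()` on the packet, calling it on the *next* packet with `decode_fec=1`, or calling it with a NULL packet to trigger concealment. The function `plan_decoding` and its `has_fec` flag are hypothetical simplifications; libopus itself falls back to concealment when a packet requested with `decode_fec=1` carries no FEC data.

```python
from __future__ import annotations

from typing import Optional, Sequence


def plan_decoding(packets: Sequence[Optional[bool]]) -> list[str]:
    """Choose an action per frame (hypothetical helper).

    packets[i] is None if packet i was lost; otherwise it is a (hypothetical)
    has_fec flag saying whether packet i carries a coarse copy of packet i - 1.
    """
    actions = []
    for i, pkt in enumerate(packets):
        if pkt is not None:
            actions.append("decode")  # packet arrived: normal decode
        elif i + 1 < len(packets) and packets[i + 1]:
            actions.append("fec")     # rebuild from the next packet's FEC copy
        else:
            actions.append("plc")     # conceal; also covers DTX gaps
    return actions
```

For example, `plan_decoding([False, None, True, None, None, False])` recovers the first loss through FEC. It must conceal the burst of two losses, because the packet after the first lost frame is also missing.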

SLIDE 14

FEC

SLIDE 15

Flexible Error Propagation

  • Scale down the LTP filter state at the beginning of a packet, in both encoder and decoder

  • Spend more bits only during first pitch period
  • Other codecs constrain LTP filter coefficients and spend more bits throughout the packet
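The following toy illustrates the trade-off under simplifying assumptions: a single-tap pitch predictor, no quantization, and a scale factor applied only to predictions that reach back before the packet start. SILK's real scheme uses its 5-tap LTP filter and a quantized scale, and the names here are hypothetical. Suppose a loss leaves the decoder's history wrong. The error entering the new packet is then multiplied by the scale. The price is a larger residual, and therefore more bits, only during the first pitch period.

```python
# Hypothetical sketch: single-tap LTP with a scaled state at the packet start.
def ltp_residual(x, history, lag, gain, scale):
    """Encoder: residual for one packet; predictions reaching into `history` use scale."""
    res = []
    for n, xn in enumerate(x):
        if n < lag:
            past = scale * history[len(history) - lag + n]  # sample from previous packet
        else:
            past = x[n - lag]                               # sample from this packet
        res.append(xn - gain * past)
    return res


def ltp_decode(res, history, lag, gain, scale):
    """Decoder: same recursion, run on the decoder's own (possibly corrupted) history."""
    y = []
    for n, r in enumerate(res):
        past = scale * history[len(history) - lag + n] if n < lag else y[n - lag]
        y.append(r + gain * past)
    return y
```

Run on a periodic signal with a corrupted decoder history, halving `scale` halves the propagated error, while the residual after the first pitch period is unchanged.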

SLIDE 16

Effect of LTP scaling

SLIDE 17

Packet Loss Example

  • Original
  • AMR-WB, 30% packet loss
  • Opus without FEC, 30% packet loss
  • Opus with FEC, 30% packet loss
SLIDE 18

Listening Results: Narrowband

Google MUSHRA test

SLIDE 19

Listening Results: Wide/Full-Band

Google MUSHRA test

SLIDE 20

Questions?

Find all things Opus at http://www.opus-codec.org