The State of Ady0 Cmprshn Scott Selfon Senior Development Lead Xbox - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Squeeze Play: The State of Ady0 Cmprshn

Scott Selfon Senior Development Lead Xbox Advanced Technology Group | Microsoft

slide-2
SLIDE 2

Agenda

  • Why compress?
  • The tools at present
  • Measuring success
  • A glimpse of the future
slide-3
SLIDE 3

The Philosophy of Compression

slide-4
SLIDE 4

The tools of the present

  • Black box codecs
  • Parameters that may or may not have well-understood meaning
  • Results that may or may not be appropriate
  • Compression targets
  • Iteration slow enough to be discouraged
  • Bulk quality settings
slide-5
SLIDE 5

Compression formats, ca. 2012

  • Lossless codecs (<3:1): FLAC, Apple Lossless
  • Lossy codecs
  • “Reductions” (up to ∞:1): sample rate, bit depth, channel count, noise floor, culling
  • Time domain: A-law/u-law, ADPCM (~4:1)
  • Perceptual (6-40+:1): MP3, Ogg Vorbis, XMA, etc.
  • Hybrids (vary): AAL, WavPack, MP3 variants
slide-6
SLIDE 6

PCM

  • Pulse Code Modulation
  • Analog signal regularly sampled and stored digitally
  • Bit depth: Storage representation of a sample
  • Linear PCM = linear quantization
  • Sampling rate: Frequency of analog signal capture or reproduction
  • Nyquist frequency (SR/2)

Yes, still compression!

slide-7
SLIDE 7

PCM and Quantization

  • Frequency quantization
  • 44,100 Hz can represent sound frequencies up to 22,050 Hz
  • Amplitude quantization
  • 16 bits: $20 \log_{10}(2^{16}/2) \approx 90$ dB range
  • 8 bits: $20 \log_{10}(2^{8}/2) \approx 42$ dB range
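The two quantization figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and not from the talk.

```python
import math

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate (SR/2)."""
    return sample_rate_hz / 2.0

def dynamic_range_db(bit_depth: int) -> float:
    """Amplitude range using the slide's formula: 20 * log10(2**bits / 2)."""
    return 20.0 * math.log10(2 ** bit_depth / 2)

for bits in (8, 16):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
print(f"44,100 Hz sampling -> Nyquist {nyquist_hz(44100):.0f} Hz")
```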

slide-8
SLIDE 8

PCM A-Law/µ-Law (G.711)

  • Pulse Code Modulation (1972, ITU 1988)
  • Adds compander support
  • A-Law (13 bit signed8 bit signed)
  • µ-Law (14 bit signed8 bit signed)
  • Encodes location of most significant non-zero

bit, drops one or more LSBs

  • Designed for telephony (8 kHz, 8 bit)
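
The companding idea can be sketched with the continuous µ-law curve. Note the assumptions: this uses the textbook logarithmic formula with µ = 255 and a uniform 8-bit quantizer, not the bit-exact segmented encoding that G.711 specifies, and all function names are illustrative.

```python
import math

MU = 255.0  # companding constant commonly used for mu-law

def mulaw_compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1], giving quiet samples more resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x: float) -> int:
    """Compand, then quantize uniformly to a signed 8-bit code."""
    return max(-127, min(127, round(mulaw_compress(x) * 127)))

def decode_8bit(code: int) -> float:
    return mulaw_expand(code / 127)

quiet = 0.01
companded_error = abs(decode_8bit(encode_8bit(quiet)) - quiet)
linear_error = abs(round(quiet * 127) / 127 - quiet)
print(f"companded error {companded_error:.6f} vs linear 8-bit error {linear_error:.6f}")
```

For a quiet sample the companded round trip lands much closer to the input than plain linear 8-bit quantization, which is the point of the compander.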
slide-9
SLIDE 9

ADPCM (G.726)

  • Adaptive Differential Pulse Code Modulation (ITU 1970s, IMA 1990s)
  • Stores difference between samples
  • Quantized to a step size lookup table
  • ~4:1 compression (16 bits → 4 bits)
  • Cheap to decode on CPU, straightforward to HW accelerate
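
The mechanism above (store the difference, quantize it against a step size table, adapt the step) can be sketched as a toy 4-bit codec. This is an illustration only: the step table and index adjustments are made up for the example and are not the G.726 or IMA tables, and the predictor is simply the previous reconstructed sample.

```python
import math

# Toy step table: geometric growth, standing in for a codec's fixed lookup table.
STEPS = [round(16 * 1.1 ** i) for i in range(80)]
# Table-index change per 3-bit magnitude: small codes shrink the step, large ones grow it.
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]

def _update(code, predicted, index):
    """Shared decoder state update, so encoder and decoder stay in lockstep."""
    step = STEPS[index]
    magnitude = code & 7
    delta = step * (2 * magnitude + 1) // 8      # midpoint of the quantizer bin
    predicted += -delta if code & 8 else delta   # bit 3 carries the sign
    predicted = max(-32768, min(32767, predicted))
    index = max(0, min(len(STEPS) - 1, index + INDEX_ADJUST[magnitude]))
    return predicted, index

def adpcm_encode(samples):
    """Turn 16-bit integer samples into 4-bit codes (one int per code)."""
    predicted, index, codes = 0, 0, []
    for sample in samples:
        diff = sample - predicted
        magnitude = min(7, abs(diff) * 4 // STEPS[index])
        code = magnitude | (8 if diff < 0 else 0)
        codes.append(code)
        predicted, index = _update(code, predicted, index)
    return codes

def adpcm_decode(codes):
    predicted, index, out = 0, 0, []
    for code in codes:
        predicted, index = _update(code, predicted, index)
        out.append(predicted)
    return out

# A smooth 440 Hz tone at 44.1 kHz, then silence followed by an abrupt jump.
tone = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]
jump = [0] * 200 + [20000] * 200
for name, pcm in (("tone", tone), ("jump", jump)):
    decoded = adpcm_decode(adpcm_encode(pcm))
    worst = max(abs(a - b) for a, b in zip(pcm, decoded))
    print(f"{name}: worst-case error {worst}")
```

Because the step size has shrunk during the silence, the first sample of the jump is reconstructed far from its true value, a small-scale version of the transient problem such codecs are known for.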

slide-10
SLIDE 10

ADPCM Artifacts

  • Codec assumption: Signal slope doesn’t change suddenly
  • Poor response to transients, quick attacks
  • Settling time before silence
  • Challenged particularly at lower sampling rates (<32 kHz)
  • Step size quantization errors

[Figure: PCM source vs. ADPCM output]
slide-11
SLIDE 11

Perceptual Compression

  • MP3, WMA, XMA, AAC, Ogg Vorbis, ATRAC, AC-3…
  • Psychoacoustic: based on human frequency sensitivities
  • Frequency-domain compression
  • Take advantage of limits of auditory perception
slide-12
SLIDE 12

Perceptual Compression Strategies

  • Frequency sensitivities
  • Nominally 20 kHz, often realistically 16 kHz
  • Most sensitive to speech range
  • Absolute threshold of hearing
  • Masking
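
The absolute threshold of hearing can be approximated numerically. The sketch below uses Terhardt's widely quoted curve fit, which is an assumption brought in for illustration and is not taken from the talk; the function name is also illustrative.

```python
import math

def threshold_in_quiet_db_spl(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

for f in (100, 1000, 3300, 10000, 16000, 20000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db_spl(f):6.1f} dB SPL")
```

The curve dips in the low-kHz region, where hearing is most sensitive, and climbs steeply toward 16 to 20 kHz, which is why content up there is a common candidate for removal.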
slide-13
SLIDE 13

Acoustic Masking

  • Frequency Masking
  • Time Masking
  • Forward masking
  • Backward masking

[Figure: A narrow 1200 Hz noise band masks sounds at higher frequencies (Scharf 1975); frequency axis from 20 Hz to 16,000 Hz]

slide-14
SLIDE 14

Perceptual Codec Artifacts

  • Time  frequency domain artifacts
  • Window size limits accuracy

for transients: ringing or pre-echoes

  • Loss of phase information: warbles, ‘underwater’
  • Channel collapse/recreation artifacts
  • Spatial loss and cross-talk
slide-15
SLIDE 15

Game-Specific Perceptual Artifacts

(Or, Games are from Mars, Codecs are from Venus)

  • Pitch shifting
  • Mixing / Synchronization
  • Repetition and Reuse
  • Looping
slide-16
SLIDE 16

New Dog, Old Tricks

  • Sample rate reduction
  • Bit depth reduction
  • Channel reduction
  • Normalization

…can all be less effective (or ineffective) with perceptual codecs

slide-17
SLIDE 17

Choosing a Compression Format

  • Support (device platform, middleware)
  • Performance tradeoffs (CPU or hardware)
  • Licensing (or lack thereof)
slide-18
SLIDE 18

Evaluating Codec Capabilities

  • Storage and bandwidth
  • Decode latency
  • Multichannel support (and leveraging)
  • Looping accuracy
  • Seamless seeking
  • Perceptual quality
slide-19
SLIDE 19

Measuring Success

  • Critical listening and perceptual codecs
slide-20
SLIDE 20

Squeeze Play: The Game Show

Which wave is more compressed?

A B C

  • PCM (46 KB)
  • XMA q60 (8 KB, ~6:1)
  • ADPCM (12.5 KB, ~3.6:1)

slide-21
SLIDE 21

Which wave is more compressed?

  • Input (44.1 kHz PCM): 1.85 MB
  • Output (XMA, quality 1): 140 KB [13:1 compression]
  • Output (xWMA, 48 kbps): 76 KB [24:1 compression]

A B

slide-22
SLIDE 22

Measuring Success

  • Critical listening and perceptual codecs
  • Visual evaluations
slide-23
SLIDE 23

Which wave is more compressed?

  • Input (32 kHz PCM): 298 KB
  • Output (ADPCM): 82 KB [3.6:1 compression]
  • Output (xWMA, 20 kbps): 16 KB [18.6:1 compression]
  • Output (XMA, quality 1): 28 KB [10.6:1 compression]

A B C
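
The ratios quoted on this slide are simply input size divided by output size. A minimal sketch of that arithmetic, using the sizes listed on the slide (the helper name is illustrative):

```python
def compression_ratio(input_kb: float, output_kb: float) -> float:
    """How many times smaller the output is than the input."""
    return input_kb / output_kb

INPUT_KB = 298  # 32 kHz PCM source from the slide
outputs_kb = {"ADPCM": 82, "xWMA, 20 kbps": 16, "XMA, quality 1": 28}

for name, size_kb in outputs_kb.items():
    print(f"{name}: {compression_ratio(INPUT_KB, size_kb):.1f}:1")
```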

slide-24
SLIDE 24

Measuring Success

  • Critical listening and perceptual codecs
  • Visual evaluations
  • Delta evaluations (Taylor, 2011)
slide-25
SLIDE 25

Delta Evaluations
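
One way to read "delta evaluation" is to subtract the decoded signal from the source and then inspect or listen to what remains. The sketch below assumes that interpretation, and assumes the two signals are already time-aligned (codecs that add delay or padding would need alignment first). The function names, and the crude 8-bit quantizer standing in for a decoded signal, are illustrative.

```python
import math

def delta_signal(source, decoded):
    """Sample-wise difference: what the codec added or removed."""
    if len(source) != len(decoded):
        raise ValueError("signals must be time-aligned and of equal length")
    return [s - d for s, d in zip(source, decoded)]

def rms_dbfs(samples, full_scale=1.0):
    """RMS level relative to full scale, in dB (-inf for digital silence)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / full_scale ** 2)

# Stand-in for a codec round trip: quantize a sine to 8 bits.
source = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
decoded = [round(x * 127) / 127 for x in source]

delta = delta_signal(source, decoded)
print(f"source level: {rms_dbfs(source):.1f} dBFS")
print(f"delta level:  {rms_dbfs(delta):.1f} dBFS")
```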

slide-26
SLIDE 26

Measuring Success

  • Critical listening and perceptual codecs
  • Visual evaluations
  • Delta evaluations (Taylor, 2011)
  • Automated evaluation
  • PESQ (ITU-T Rec. P.862) / POLQA (ITU-T Rec. P.863)
  • PEAQ (ITU-R BS.1387-1)
  • Noise to Mask Ratio (NMR)
slide-27
SLIDE 27

NMR Evaluation

  • Noise to Mask Ratio
  • Windowed evaluation of Signal-to-Mask Ratio (SMR) minus Signal-to-Noise Ratio (SNR)

NMR at three XMA quality settings (Mathews 2012)
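
The SMR minus SNR arithmetic can be sketched per window as follows. This is a simplified broadband illustration: a real NMR measurement works per frequency band and derives the masking threshold from a psychoacoustic model, whereas here the mask level for each window is an input the caller must supply. All names are hypothetical.

```python
import math

def power_db(samples):
    """Mean power of a window in dB; the floor avoids log10(0)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))

def windowed_nmr_db(source, decoded, mask_db_per_window, window=1024):
    """NMR = SMR - SNR for each window, all quantities in dB."""
    results = []
    for i, mask_db in enumerate(mask_db_per_window):
        lo, hi = i * window, (i + 1) * window
        src = source[lo:hi]
        if not src:
            break  # more mask values than audio windows
        noise = [s - d for s, d in zip(src, decoded[lo:hi])]
        signal_db = power_db(src)
        noise_db = power_db(noise)
        smr = signal_db - mask_db    # signal-to-mask ratio
        snr = signal_db - noise_db   # signal-to-noise ratio
        results.append(smr - snr)    # equivalent to noise_db - mask_db
    return results

# One 8-sample window: coding noise at -40 dB against a -30 dB mask.
nmr = windowed_nmr_db([0.5] * 8, [0.49] * 8, [-30.0], window=8)
print(f"NMR: {nmr[0]:.1f} dB")
```

A negative NMR means the coding noise sits below the masking threshold in that window; values above 0 dB flag noise that is likely to be audible.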

slide-28
SLIDE 28

The Compression of the Future?

  • Self-correcting/adjusting compression
  • Communicating more with less
  • Linguistic sounds and speech synthesis
  • MIDI music: the revenge?
  • Parameterized procedural synthesis
  • Case study: impacts
slide-29
SLIDE 29

Impacts

  • Resonant decay + transient
  • Compress as modes + residual (>150:1)

Lloyd, Raghuvanshi, Govindaraju (ACM, 2011)

[Figure: Original = Modal (“Clean”) + Residual (“Noise”); axes: frequency, time]
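
The modes-plus-residual split can be sketched as a sum of decaying sinusoids with whatever is left over treated as the residual. This is not the method from the cited paper: the mode parameters below are invented for illustration, and a real system would have to estimate the modes from the recording and encode the residual separately.

```python
import math

def synthesize_modes(modes, n_samples, sample_rate=44100):
    """Sum of exponentially decaying sinusoids.

    Each mode is (frequency_hz, decay_per_second, amplitude).
    """
    out = [0.0] * n_samples
    for freq, decay, amp in modes:
        for n in range(n_samples):
            t = n / sample_rate
            out[n] += amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
    return out

def residual(original, modal):
    """Whatever the modal model does not explain (the transient or "noise")."""
    return [o - m for o, m in zip(original, modal)]

# Hypothetical modes for a struck object; these values are made up.
modes = [(440.0, 8.0, 0.6), (1230.0, 20.0, 0.3), (2750.0, 45.0, 0.1)]
modal = synthesize_modes(modes, n_samples=4410)

print(f"{len(modal)} samples described by {3 * len(modes)} mode parameters")
```

The modal part costs only three numbers per mode regardless of duration; the overall ratio then depends on how compactly the residual can be stored.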

slide-30
SLIDE 30

Conclusions

  • Know thy artifacts
  • And use appropriate techniques to counter
  • What’s the playback context?
  • More robust qualitative evaluation
  • Avoid the ‘bulk’ knob
  • Consider automating listening tests
slide-31
SLIDE 31

Questions?

scottsel@microsoft.com

Xbox LIVE Gamertag: Timmmmmay