Codec 2: open source speech codec, low bit rate (2400 bit/s and below) - PowerPoint PPT Presentation



SLIDE 1

Codec 2

  • open source speech codec
  • low bit rate (2400 bit/s and below)
  • applications include digital speech for HF and VHF radio
  • fills gap in open source speech codecs beneath 5000 bit/s

SLIDE 2

Why Open Source?

  • Ham radio is an experimental service
  • we need to be able to experiment, understand, and modify
  • open source means no license fees, e.g. include in SDR systems for free

SLIDE 3

Proprietary Codecs

  • come in hardware or licensed software form
  • difficult to distribute
  • they cannot be modified
  • understanding how they work is discouraged
  • modification may actually be illegal under the license

SLIDE 4

Codec 2 Author - David Rowe

  • Adelaide, South Australia
  • VK5DGR, first licensed over 30 years ago at age 13
  • PhD in speech coding (1999)
  • Built some of the first real time speech codecs in the late 1980s on early DSP chips
  • Now work full time on open software/open hardware for developing world communications

  • http://rowetel.com
SLIDE 5

Digital Voice Radio System

Block diagram: mic → A/D → codec2 enc → FEC enc → mod → HF/VHF radio → demod → FEC dec → codec2 dec → D/A → spkr

SLIDE 6

Patents and Codecs

  • The authors of proprietary/patented codecs borrowed heavily from the public domain
  • Perhaps 5% of the algorithms they use are original and patented
  • 95% of the algorithms in these codecs are public domain algorithms
  • To build an equivalent codec, we simply need alternatives for the 5% that is patented

SLIDE 7

Speech Coding

  • Take speech samples (e.g. 16 bit samples at 8 kHz sampling rate)

  • Convert to 2400 bit/s
  • What can we throw away?
  • Retain intelligible speech
  • Retain natural speech
  • Use a model of speech, send model parameters
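
As a rough worked example of the numbers on this slide, the Python sketch below computes the raw bit rate of the input samples and the reduction needed to reach 2400 bit/s. The 20 ms frame length is an assumption borrowed from the Model Parameter slide, and all variable names are illustrative.

```python
# Figures taken from the slide.
SAMPLE_RATE_HZ = 8000     # 8 kHz sampling rate
BITS_PER_SAMPLE = 16      # 16 bit samples
TARGET_BIT_RATE = 2400    # bit/s after coding

# Assumption: 20 ms frames, as used for the pitch example on a later slide.
FRAME_MS = 20

# Bit rate of the uncompressed speech samples.
raw_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# How much information has to be thrown away, as a ratio.
compression_ratio = raw_bit_rate / TARGET_BIT_RATE

# Bit budget available for all model parameters in one frame.
bits_per_frame = TARGET_BIT_RATE * FRAME_MS // 1000

print(f"raw: {raw_bit_rate} bit/s")
print(f"compression ratio: {compression_ratio:.1f}:1")
print(f"budget: {bits_per_frame} bits per {FRAME_MS} ms frame")
```

Under these assumptions the codec has to fit every model parameter for a frame into a few dozen bits, which is why sending model parameters rather than samples is attractive.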
SLIDE 8

Model Parameter

  • example of a model parameter is pitch
  • for humans in the range 50 to 500 Hz
  • can be quantised to 7 bits
  • updated every 20 ms
  • so 7/0.02 = 350 bit/s to represent pitch
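
The pitch example can be sketched in Python as follows. The slide only gives the range, the number of bits and the update rate; the logarithmic spacing of the quantiser levels and the function names `quantise_pitch` and `dequantise_pitch` are assumptions made for illustration, not a description of the quantiser Codec 2 actually uses.

```python
import math

# Figures taken from the slide.
PITCH_MIN_HZ = 50.0
PITCH_MAX_HZ = 500.0
PITCH_BITS = 7
FRAME_MS = 20

LEVELS = 2 ** PITCH_BITS  # 128 quantiser levels


def quantise_pitch(f0_hz):
    """Map a pitch in Hz to an index in 0..LEVELS-1.

    Assumption: levels are spaced uniformly in log frequency.
    """
    f0 = min(max(f0_hz, PITCH_MIN_HZ), PITCH_MAX_HZ)  # clamp to the range
    position = math.log(f0 / PITCH_MIN_HZ) / math.log(PITCH_MAX_HZ / PITCH_MIN_HZ)
    return round(position * (LEVELS - 1))


def dequantise_pitch(index):
    """Map an index back to a pitch in Hz (inverse of quantise_pitch)."""
    position = index / (LEVELS - 1)
    return PITCH_MIN_HZ * (PITCH_MAX_HZ / PITCH_MIN_HZ) ** position


# Bit rate spent on pitch alone: 7 bits every 20 ms.
pitch_bit_rate = PITCH_BITS * 1000 // FRAME_MS

index = quantise_pitch(230.0)
print(index, dequantise_pitch(index), pitch_bit_rate)
```

With log spacing the relative error stays roughly constant over the whole 50 to 500 Hz range, which is one plausible reason to prefer it over linear spacing.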
SLIDE 9

Sinusoidal Speech Coding

Figure: speech waveform, Amplitude (16 bit samples) against Time (samples). Annotation: pitch period 35 samples, or 4.4 ms at 8 kHz sample rate.

SLIDE 10

Sinusoidal Speech Coding

Figure: speech spectrum, Amplitude (dB) against Frequency (Hz). Annotations: pitch 230 Hz or 4.3 ms; harmonics of 230 Hz.

SLIDE 11

Sinusoidal Speech Model

Figure: L sinusoids, each with its own amplitude, phase and frequency (Amplitude 1, Phase 1, Frequency 1; Amplitude 2, Phase 2, Frequency 2; ... Amplitude L, Phase L, Frequency L).
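
A minimal Python sketch of the sinusoidal model: speech is approximated as a sum of L sinusoids, each described by an amplitude, a phase and a frequency. Placing the sinusoids at harmonics of the pitch follows the 230 Hz example on the previous slide; the function names and the 1/m amplitude roll-off in the usage lines are illustrative assumptions.

```python
import math

SAMPLE_RATE_HZ = 8000  # sampling rate used throughout the slides


def harmonic_frequencies(pitch_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the harmonics of pitch_hz that lie within half the sample rate."""
    count = int((sample_rate_hz / 2) // pitch_hz)
    return [m * pitch_hz for m in range(1, count + 1)]


def synthesise(sinusoids, num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sum a list of (amplitude, phase_rad, frequency_hz) sinusoids."""
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        samples.append(
            sum(a * math.cos(2 * math.pi * f * t + p) for a, p, f in sinusoids)
        )
    return samples


# Usage: a 20 ms frame (160 samples) built from the harmonics of 230 Hz.
# The 1/m amplitudes and zero phases are placeholders, not measured values.
freqs = harmonic_frequencies(230.0)
model = [(1.0 / m, 0.0, f) for m, f in enumerate(freqs, start=1)]
frame = synthesise(model, 160)
print(len(freqs), len(frame))
```

In this sketch the model for one frame is just the list of triples, which is what makes it possible to send parameters instead of samples.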

SLIDE 12

Amplitude Modelling

  • Adjacent amplitudes have similar values
  • This leads to coding efficiencies
  • We use LPC to represent amplitudes
  • fixed number of parameters
  • LPC envelope approximates amplitudes
  • Sampled at the decoder to recover amplitudes
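
The idea of sampling an LPC envelope at the decoder can be sketched as below. This assumes the usual all-pole form, with envelope magnitude G / |A(e^{jω})| and A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and simply evaluates it at each pitch harmonic. The function names, the single-coefficient example filter and the point-sampling approach are assumptions for illustration; the amplitude recovery algorithm used in Codec 2 may differ.

```python
import cmath
import math


def lpc_envelope(lpc_coeffs, gain, omega):
    """Magnitude of gain / A(e^{j*omega}), with A(z) = 1 + sum a_k z^-k."""
    a_of_z = 1 + sum(
        a * cmath.exp(-1j * omega * k) for k, a in enumerate(lpc_coeffs, start=1)
    )
    return gain / abs(a_of_z)


def recover_harmonic_amplitudes(lpc_coeffs, gain, pitch_hz, sample_rate_hz=8000):
    """Sample the LPC envelope at every pitch harmonic below half the sample rate."""
    omega0 = 2 * math.pi * pitch_hz / sample_rate_hz  # pitch in radians/sample
    num_harmonics = int(math.pi / omega0)
    return [
        lpc_envelope(lpc_coeffs, gain, m * omega0)
        for m in range(1, num_harmonics + 1)
    ]


# Usage: a hypothetical first-order filter with a low-pass shaped envelope.
amps = recover_harmonic_amplitudes([-0.9], 1.0, 230.0)
print(len(amps), amps[0], amps[-1])
```

The encoder only has to send the coefficients and a gain, a fixed number of parameters, however many harmonics the current pitch produces.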
SLIDE 13

Amplitude Modelling

SLIDE 14

Encoder Block Diagram

Block diagram labels: 16 bit, 8 kHz samples; FFT; Pitch est; Pitch Quant; LPC Analysis; LPC to LSP; LSP Quant; Energy Quant; MBE Voicing est; LPC Correction; 2550 bit/s.

SLIDE 15

Bit Allocation

  • Alpha V0.1 codec, subject to rapid change
  • 51 bits per 20ms frame, or 2550 bit/s
SLIDE 16

Decoder Block Diagram

Block diagram labels: LSPs; Energy; LPC Correction; Voicing; LSP to LPC; FFT; Recover Harm Amps; Phase Synthesis; Post Filter; Inverse FFT; Overlap Add; 16 bit, 8 kHz samples.

SLIDE 17

Prior Art Summary

  • Sinusoidal Coding, McAulay & Quatieri, 1984
  • Linear Predictive Coding, Makhoul, 1975
  • Line Spectrum Pairs, Itakura, 1975
  • MBE Voicing, Griffin & Lim, 1988
  • Overlap Add, Tribolet & Crochiere, 1979
  • NLP Pitch Estimation, Rowe, 1999
  • LPC Amplitude Recovery (algorithm used here), Rowe, 1991, 1999, 2009

  • Post Filter, Rowe, 2009
SLIDE 18

Further Work

  • Better phase model and voicing estimator
  • Toll quality at 2000 bit/s
  • Lower bit rate, 2400, 1200 bit/s
  • Better background noise performance
  • FEC and non-redundant error correction
  • Integration with modem and test over radio channels

  • Fixed point and DSP chip implementation
SLIDE 19

Brainstorms

  • what can we do with Codec 2? HF rather than VHF?
  • how can we get people using it?
  • work with others to integrate with modem and FEC code
  • create a digital voice application that can run on a laptop

  • novel combinations of codec, FEC, modulation
  • PSK31 low bit rate voice mode
  • Better than SSB on HF?