ROW.mp3
Colin Raffel, Jieun Oh, Isaac Wang
Music 422 Final Project
3/12/2010

Motivation

- The realities of mp3
  - widespread use
  - low quality vs. bit rate when compared to modern codecs
- Vision for row-mp3
  - backwards compatible with mp3 for easy adoption
  - higher quality
  - minimal data rate increase

Approach

- Coding the difference between the original and mp3
  - impracticality of lossless approach (mp3HD)
- Exploiting features specific to the difference data
  - "noisy"/largely stochastic
  - "flat" spectrum
  - (Take a listen to the difference files)
- Use ID3 tags in the metadata section of mp3
  - store up to 16 megabytes of data (ID3v2.x)
  - use TXXX user defined text information tag
  - row-mp3 ignorant players will play the mp3 as usual, while proper decoders will play a higher quality file

Overview

- Encoder
- Decoder

Flow Diagram for ROW.mp3 Encoder

Flow Diagram for ROW.mp3 Decoder

Implementation

- Noise shaping
- Non-stochastic error coding
- Huffman coding
- Using the ID3 tags
- Time matching error and mp3
- Dependencies

Noise shaping

- Exploit the "helpful" parts of noise and hearing
  - humans can't differentiate between noise signals
  - noisiness is (somewhat) easily measured
  - hearing is on a per-critical-band basis
- Don't code noise, just code noise level in each band
  - level estimate based on spectral flux
- Decode by synthesizing weighted noise signal
  - overlap-add to prevent discontinuities
  - interpolation between noise levels
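The decode step above (synthesize weighted noise, then overlap-add) can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the function name `synthesize_noise`, the 50% overlap, the sine window, and the uniform band edges in the usage note are all assumptions, and the interpolation between noise levels is left out for brevity.

```python
import numpy as np

def synthesize_noise(band_levels, band_edges, block_size=1024, seed=0):
    """Synthesize noise from per-band levels (hypothetical helper).

    band_levels: array of shape (n_blocks, n_bands), one magnitude per band.
    band_edges:  n_bands + 1 increasing FFT bin indices delimiting the bands.
    """
    rng = np.random.default_rng(seed)
    n_blocks, n_bands = band_levels.shape
    hop = block_size // 2                      # assumed 50% overlap
    n_bins = block_size // 2 + 1
    # Sine window: its square sums to one at 50% overlap, so the expected
    # power of independent noise frames stays constant across block joins.
    window = np.sin(np.pi * (np.arange(block_size) + 0.5) / block_size)
    out = np.zeros(hop * (n_blocks + 1))
    for b in range(n_blocks):
        # Random complex numbers with unit magnitude and random phase.
        spectrum = np.exp(2j * np.pi * rng.random(n_bins))
        # Zero any bins that fall outside the coded bands.
        spectrum[:band_edges[0]] = 0
        spectrum[band_edges[-1]:] = 0
        # Weight each band by its decoded noise level.
        for k in range(n_bands):
            spectrum[band_edges[k]:band_edges[k + 1]] *= band_levels[b, k]
        frame = np.fft.irfft(spectrum, n=block_size)
        out[b * hop:b * hop + block_size] += window * frame  # overlap-add
    return out
```

For a quick trial, `np.linspace(0, 513, 26).astype(int)` gives 25 uniform bands as a stand-in; real critical-band edges would be spaced non-uniformly.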

Synthesized noise spectrum

Non-stochastic (tonal) error coding

- Tonal component separation is difficult
  - complex algorithms with high cost
  - works poorly for high-noise signals (like coding error)
- Instead, use "inverse flux"
  - look for stationary spectral components
  - quotient approach for smoother output
  - power parameter determines repeat importance
- Code tonal error with PAC at low bit rate
  - simple signal makes PAC's job easier
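The slide does not give a formula for "inverse flux", so the following is only one possible reading of the bullets above: a per-bin weight formed as the quotient of the smaller over the larger magnitude in consecutive frames, raised to a power. The name `stationarity_weights`, the exact quotient, and the default `power` are assumptions made for illustration.

```python
import numpy as np

def stationarity_weights(mags, power=2.0, eps=1e-12):
    """Weight spectral bins by how stationary they are (hypothetical reading).

    mags: array of shape (n_frames, n_bins) holding magnitude spectra.
    Returns weights in [0, 1]; values near 1 mark bins whose magnitude
    barely changes from the previous frame.
    """
    # Previous frame for each frame; the first frame is compared with itself.
    prev = np.vstack([mags[:1], mags[:-1]])
    # Quotient of smaller over larger magnitude: 1 when unchanged, near 0
    # when the bin changes a lot between frames.
    ratio = (np.minimum(mags, prev) + eps) / (np.maximum(mags, prev) + eps)
    # A larger power penalizes changing bins more strongly.
    return ratio ** power
```

Multiplying the error spectrum by such weights would keep stationary (tonal-like) components and attenuate fluctuating ones, which is the behavior the bullets describe.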

Huffman coding

- row-mp3 applies Huffman coding to the noise level data
  - 25 floating-point numbers per block of 1024 samples
  - reduces the mantissa size by ~50% (when quantized to 4 bits)
  - ...assuming we generate a Huffman table specific to each given sound file
  - the Huffman table is not very big, so this is acceptable
- could potentially also be applied to the PAC coding stage
  - experimenting with PAC coding at 0.3 bits/sample using 3 scale and 2 mantissa bits:
    - mantissa coding: ~70% of the original
    - scale factor coding: ~90% of the original

Huffman coding: modules

- huffmanCode.py
  - creates a Huffman binary tree given a list of (symbol, frequency) pairs
  - for quick look-up of symbols and codes, also creates two dictionaries from this tree:
    - Symbol2Code
    - Code2Symbol
- trainNoise method in trainData.py
  - input: array of entire noise level
  - output:
    - Code2Symbol dictionary
    - Huffman-coded quantized noise values
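A minimal sketch of the idea behind these modules, assuming quantized noise levels arrive as a list of integers. It is not the contents of huffmanCode.py; the function names are hypothetical, and only the two dictionary roles (symbol to code, code to symbol) come from the slide.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tables(symbols):
    """Return (symbol2code, code2symbol) dictionaries for the given symbols."""
    freqs = Counter(symbols)
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        symbol2code = {s: "0" for s in heap[0][2]}
    else:
        while len(heap) > 1:
            # Merge the two least frequent subtrees, prefixing their codes.
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tie), merged))
        symbol2code = heap[0][2]
    code2symbol = {c: s for s, c in symbol2code.items()}
    return symbol2code, code2symbol

def encode(symbols, symbol2code):
    """Concatenate the code of every symbol into one bit string."""
    return "".join(symbol2code[s] for s in symbols)

def decode(bits, code2symbol):
    """Walk the bit string, emitting a symbol whenever a full code is seen."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in code2symbol:
            out.append(code2symbol[current])
            current = ""
    return out
```

Because the table is built from the file's own symbol statistics, the decoder needs the code-to-symbol dictionary alongside the coded data, which matches the slide's note about generating a table per sound file.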

Using the ID3 tags

- ID3 tag specifications
  - each tag can hold up to 16 MB
  - TXXX user defined text information tag
  - tags can only hold unicode strings
    - use Python pickle module to serialize as strings
  - use eyeD3 Python library
- Store extra data for error + noise in ID3v2.x tags
  - arrays of mantissas, scales, bit allocation for PAC-coded error
  - Huffman-encoded noise levels
  - Huffman table
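The serialization step can be sketched as below. This covers only turning Python objects into a text string and back; writing the string into a TXXX frame with eyeD3 is deliberately not shown. The base64 wrapping is an assumption added here so the pickled bytes become plain text under Python 3, and the helper names and the size check against the slide's 16 MB figure are hypothetical.

```python
import base64
import pickle

MAX_TAG_BYTES = 16 * 1024 * 1024  # the 16 MB limit quoted on the slide

def pack_for_text_frame(payload):
    """Serialize a Python object to an ASCII string for a text tag."""
    text = base64.b64encode(pickle.dumps(payload)).decode("ascii")
    if len(text) > MAX_TAG_BYTES:
        raise ValueError("payload too large for a single tag")
    return text

def unpack_from_text_frame(text):
    """Inverse of pack_for_text_frame."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))
```

Note that `pickle.loads` can execute arbitrary code, so a decoder built this way should only read tags from files it trusts.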

Time matching error and mp3

Dependencies

- LAME v3.98.3: wav to mp3 encoder
- mpg123 v1.10.1: mp3 to wav decoder
- eyeD3 v0.6.17: ID3 tag manipulation
- scipy v0.8.0: wav file reading/writing

Evaluation

- Data Rate Analysis
- Listening Test

Data Rate Analysis

- Error levels
  - 25 bands, 8 bits per band, 1024 samples per block, 44100 samples per second, 50% Huffman coding gain = 4 kbps per channel
- PAC tonal error
  - 0.2 bits per sample = 8 kbps per channel
- Total data rate
  - mp3 data rate per channel + 12 kbps per channel
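The arithmetic behind these figures can be reproduced with a few lines. The variable names are illustrative; the input values are the ones listed on the slide.

```python
# Noise-level side information, per channel.
bands = 25
bits_per_band = 8
block_size = 1024        # samples per block
sample_rate = 44100      # samples per second
huffman_gain = 0.5       # coded size as a fraction of the uncoded size

blocks_per_second = sample_rate / block_size
noise_bps = bands * bits_per_band * blocks_per_second * huffman_gain

# PAC-coded tonal error, per channel.
pac_bits_per_sample = 0.2
pac_bps = pac_bits_per_sample * sample_rate

total_extra_bps = noise_bps + pac_bps

print(f"noise levels: {noise_bps / 1000:.1f} kbps")
print(f"tonal error:  {pac_bps / 1000:.1f} kbps")
print(f"total extra:  {total_extra_bps / 1000:.1f} kbps")
```

Computed exactly, the two components come to about 4.3 kbps and 8.8 kbps, so the slide's 4, 8, and 12 kbps figures appear to be rounded down.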

Listening Test: MUSHRA

- Formats:
  - Reference file (lossless, 44.1 kHz 16 bit PCM)
  - 3.5 kHz low-pass filtered reference (as required by MUSHRA)
  - 128 kbps mp3
  - 128 kbps row-mp3
  - 64 kbps mp3
  - 64 kbps row-mp3
  - 320 kbps mp3
- Audio Sources:
  - Dance/electronic music
  - Pop/country music
  - Rock/blues music
  - Glockenspiel
  - Harpsichord
  - Male Speech
  - Castanets

https://ccrma.stanford.edu/~craffel/etc/mp3challenge/

Listening Test: Results

- Preference for row-mp3 for low bitrate for music
  - 64 kbps row-mp3 ranked significantly higher for "complex"/music signals
  - 128 kbps row-mp3 ranked roughly equivalent

Future Work

- An intelligent algorithm which analyzes an mp3 file and predicts the error in absence of the original lossless file
- Noise synthesis in the time domain with a scaled filter bank rather than using random complex numbers in the frequency domain
- Block switching when extracting the noisy component to deal with poor coding of transients
- Direct coding of missing transients in the time domain
- A more intelligent tonal algorithm with better reconstruction in the time domain
- A perceptual audio codec for the tonal component which is especially well suited for low data rates and coding highly tonal sound
- Application of Huffman coding for the perceptual audio coder component to further reduce the file size

Conclusion

In summary, row-mp3 does the following:

- (lossless audio file) - (mp3) => ID3 tag of mp3
  - Backwards-compatible with the mp3
  - Small storage size
- Exploits the noisy nature of the error:
  - Passes quantized, Huffman-coded per-critical-band noise level values
- For the remainder of the error:
  - Applies basic tonal extraction and a standard perceptual audio coder to decrease file size

With some potential improvements, the row-mp3 codec could provide a viable, backwards-compatible solution to low-quality mp3s at low bit rates.

Acknowledgments

Special thanks to:

- Professor Bosi for great lectures, advice, and feedback
- Craig Sapp for help on course materials
- All who participated in the "mp3 challenge"!
