WAVILA WP3 Benchmarking
Christian Kraetzer, Jana Dittmann, Andreas Lang
Motivation
• Evaluation is an important research field
• Promises improvements
• Identifies application fields
– Content protection,
– Authentication,
– Integrity protection,
– DRM,
– Annotation, ...
• Benchmarking provides recommendations
– Depending on the application field, watermarking algorithms have to meet different requirements for parameters such as robustness/fragility, transparency, capacity, ...
But … how can benchmarking be done?
• Generally: many ways to evaluate watermarking (WM) are possible
– subjective tests, single attacks, application scenarios, ...
• 1999, Kutter, Petitcolas, for images
– Attacks: JPEG, Geometric Transform, Gamma, Histogram, Color, Noise, etc.
• Some early benchmarking tool sets:
– StirMark for Images (www.petitcolas.net/fabien/watermarking/stirmark/)
– Optimark (poseidon.csd.auth.gr/optimark/download.htm)
– Certimark (www.igd.fhg.de/igd-a8/projects/certimark/)
– Checkmark (watermarking.unige.ch/Checkmark/)
– Image WET (www.datahiding.org)
• Some of the questions raised by the state of the art and answered by WP3:
– How can benchmarking results be made comparable?
– How can they be made interpretable for non-experts?
How can benchmarking results be made comparable? – Some applicable measures
BER – Bit Error Rate
BBER – Bit Burst Error Rate
BLER – Bit Lost Error Rate
HFR/LFR – High-, Low Frequency Ratio
MPSNR – Masked Peak Signal to Noise Ratio
PSNR – Peak Signal to Noise Ratio
RMS – Root Mean Square
SNR – Signal to Noise Ratio
TPE – Total Perceptual Error
WJR – Wrong Judge Rate
ZCR – Zero Crossing Rate
Need for a definition, formalisation and measurement of watermarking properties with the aim of comparability
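Some of the measures listed above have widely used textbook definitions. The sketch below shows BER, RMS, SNR, PSNR and ZCR in that textbook form only; the slides do not give the WP3 formalisations, so the function names, the `peak` parameter and the handling of edge cases are assumptions made for illustration.

```python
import math


def bit_error_rate(embedded, detected):
    """BER: fraction of message bits that differ after detection."""
    if len(embedded) != len(detected) or not embedded:
        raise ValueError("messages must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(embedded, detected) if a != b)
    return errors / len(embedded)


def rms(signal):
    """RMS: root mean square of the sample values."""
    if not signal:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def snr_db(original, marked):
    """SNR in dB: original signal power over the power of the difference."""
    noise = sum((o - m) ** 2 for o, m in zip(original, marked))
    if noise == 0:
        return math.inf  # identical signals: no embedding distortion
    power = sum(o * o for o in original)
    return 10 * math.log10(power / noise)


def psnr_db(original, marked, peak=1.0):
    """PSNR in dB; `peak` is the assumed maximum sample amplitude."""
    mse = sum((o - m) ** 2 for o, m in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


def zero_crossing_rate(signal):
    """ZCR: fraction of adjacent sample pairs whose sign differs."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)
```

BER describes the detected message, while SNR and PSNR describe the distortion introduced by embedding, so the two groups relate to robustness and transparency respectively. The perceptual measures (MPSNR, TPE) need a psychoacoustic model and are not sketched here.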
How can benchmarking results be made comparable? – Some test sets used
• 2001, Dittmann, Fates, Fontaine, Petitcolas, Raynal, Steinebach, Seibel
– 6 own, unspecified audio files
• 2003, Tachibana
– 3 own, unspecified audio files
• 2005, Donovan, Hurley, Silvestre
– 1000 own, unspecified audio files, CD quality, 30 s each
• 2007, Steinebach
– 1000 own, unspecified audio files, CD quality
• 2007, Wang, Huang, Yat-Sen
– 5 own, unspecified audio files, 44.1 kHz, 16 bit, mono, 10 s each
Need for the generation and distribution of test sets with the aim of comparability
• Evaluation definitions, measurements, strategies, etc. are required …
… and are introduced here using audio watermarking as the example!
Benchmarking framework
• Theoretical benchmarking framework: definitions and formalisations
• Design of audio signal modifications (malicious/non-malicious) that depend on the application profile
• Definition and formalisation of benchmarking profiles
• Evaluation methodology for practical framework
• Evaluation of:
– Single attacks
– Digital audio watermark schemes: basic profiles
– Digital audio watermark schemes: application profiles
• Application of the introduced framework to selected example WM schemes
The results are comparable because …
• Measured properties comparable
– Standardised definition of measured features
– Normalisation of measured values
• Evaluated watermarking schemes comparable
– Measure same properties
– Measure with same measurement function
– Same test set
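The comparability conditions above can be sketched as follows: every scheme is measured with the same measurement function on the same test set, and raw values are normalised before they are compared. The slides do not state which normalisation WP3 uses; the min-max mapping onto [0, 1] (1 = best) and all names below are hypothetical.

```python
def normalise(value, worst, best):
    """Map a raw measurement onto [0, 1], where 1 is the best value.

    Assumption: linear min-max scaling with clipping; `worst` and `best`
    are the raw values that should map to 0 and 1.
    """
    if worst == best:
        raise ValueError("worst and best must differ")
    scaled = (value - worst) / (best - worst)
    return min(1.0, max(0.0, scaled))


def evaluate(schemes, test_set, measure, worst, best):
    """Score every scheme with one measurement function on one test set.

    `schemes` maps a name to a scheme object, and `measure(scheme, item)`
    returns a raw value for one test item. The result is the mean
    normalised score per scheme.
    """
    if not test_set:
        raise ValueError("test set must not be empty")
    results = {}
    for name, scheme in schemes.items():
        scores = [
            normalise(measure(scheme, item), worst, best) for item in test_set
        ]
        results[name] = sum(scores) / len(scores)
    return results
```

For a measure where smaller is better, such as BER, passing `worst=1.0, best=0.0` inverts the scale, so a BER of 0.25 becomes a normalised score of 0.75.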
How can benchmarking results be made interpretable for non-experts?
• Recommendation of watermarking schemes
[Figure labels: Audio Watermarking Algorithm, Test Goal]
How can benchmarking results be made interpretable for non-experts?
• Application scenario specific benchmarking
• Identification (and description) of relevant characteristics
• Choice of easily understandable presentations/visualisations
How can benchmarking results be made interpretable for non-experts?
Practical Evaluation Results
Basic Profile: Transparency and Robustness for different kinds of audio material and six example watermarking algorithms
[Figure legend: light gray = Transparency, dark gray = Robustness]
Inter Algorithm Evaluation and Analysis
Further scopes of WP3 - example: PHDG
Future Directions
• Generalisation of the introduced approach for audio watermarking benchmarking to other types of media
• How can benchmarking results be made interpretable for non-experts?
• “Is benchmarking an academic chimera?”
– scientists tend to test based on very specific theoretical assumptions
Thank you very much for your attention!