 
Stefano Lonardi, March 2000
Data Compression Conference 2000

Fragile watermarks for LZ-77
Stefano Lonardi
University of California, Riverside
joint work with M. Atallah (CERIAS & CS, Purdue U.)

[Figure: Alice, Bob, Mallory]

Problem
• Alice sends a document T to Bob
• She wants to make sure that what Bob receives is
  – Authentic
  – Integral
• Mallory monitors the communication, and he will attempt to modify T and impersonate Alice

Signatures
• Signature requirements
  – Authentic/Unforgeable
  – Not reusable
  – Cannot be repudiated
• The signed document should be unalterable (integrity)
• Typical solution involves PKC (public-key cryptography)

Information Hiding
• Steganography
• Watermarking

Steganography
• The art/science of hiding a secret message within another one, in such a way that the adversary cannot discern the presence or the content of the hidden message

(Robust) Watermarking
• The art/science of hiding a secret message within another one, in such a way that the adversary cannot remove the hidden message (watermark) without destroying the cover

Example of watermarked image
[Figure: watermarked image, from research.ibm.com]

Image Watermarking
• Some methods have proved remarkably resilient to
  – Lossy compression/Filtering
  – Cropping/Resizing
  – Scanning and printing
  – Repeated photocopying
  (see, e.g., Cox et al., IEEE TIP 97)

Watermarking
• So far, most of the research has been focused on
  – Images
  – Movies
  – Audio
  – Source Code
• Little has been done for textual data

Information hiding in textual data
• It is believed that
  “… text is in many ways the most difficult to hide data … due largely to the lack of redundant information in a text as compared with a picture or a sound file …”

Information hiding in textual data
• Methods range from slightly changing the fonts or the spacing between words/lines, to rewriting some words/phrases of the text without changing the semantics
• Hiding information in textual data is a challenging problem

Motivation
• Lossless compression is very common nowadays
  – gzip, (win)zip, (win)rar, compress, bzip2, etc.
• Since we are sending the document over the network and it is likely that we are going to compress it anyway, why not watermark the compressed file?

Fragile watermarks
• A fragile watermark is a watermark designed to break as soon as the content of the document is changed
• An alternative way to authenticate a document and ensure that it reaches the destination in an integral state

Notation
• T: document, |T| = n
• k: secret key
• W: (fragile) watermark
• T’: watermarked & compressed document

Specifications
• T = T’ (or semantically equivalent)
• Unless k is known
  – it is very hard to retrieve W from T’
  – it is very hard to add W to another text and pretend to be Alice
• The presence of W in T’ would hold up in court (false positives are extremely rare)
• The security of the process should be based solely on the secrecy of the key (Kerckhoffs’ principle)

Approach
• We propose a method that hides W (the digest of T) directly in the compressed file as a fragile watermark, and therefore
  – is transparent to the casual observer
  – does not require sending the signature separately
• It also satisfies all the previous requirements

Which format?
• We choose Lempel-Ziv ‘77 because …
  … it is very popular and widespread
  … hiding data turns out to be very elegant

Lempel-Ziv 77 (gzip)
[Figure: three snapshots of LZ-77 running on T = abaababaabaababaababa, showing the already compressed part, the history (window positions 0–7), the lookahead, and the emitted triples (7,2,a) and (1,4,a)]
The LZ processing induces a parsing of T into phrases

Idea
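
For reference, the phrase parsing that LZ-77 induces on T can be sketched in Python before looking at how the pointers are chosen. This is a minimal textbook-style sketch, not gzip's actual format: the function names, the (distance, length, next character) triple layout, and the window size of 8 are assumptions made for illustration, and the triples it prints need not match the ones in the figure.

```python
def lz77_parse(text, window=8):
    """Greedy LZ77 parse of text into (distance, length, next_char) triples.

    Hypothetical helper: distance counts backwards from the current position,
    and a literal character always follows the copied match.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_dist, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so that a literal can follow the match.
            while i + length < n - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_dist, best_len = i - j, length
        phrases.append((best_dist, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_unparse(phrases):
    """Rebuild the text from the triples produced by lz77_parse."""
    out = []
    for dist, length, ch in phrases:
        for _ in range(length):
            out.append(out[-dist])  # copy one character at a time (overlap-safe)
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    T = "abaababaabaababaababa"
    phrases = lz77_parse(T)
    print(phrases)
    print(lz77_unparse(phrases) == T)
```

The sketch keeps the first longest match it finds. That tie-breaking rule is exactly the freedom the watermarking idea exploits.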
[Figure: history and current position, with several pointers into the history that all reach a longest match]
Which of these pointers do we choose?
[Figure: history and current position, with the four candidate pointers labelled 00, 01, 10, 11]
By choosing one of these pointers we are “hiding” two bits of the watermark. Note that we are not changing LZ-77
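
The pointer choice above can be sketched as a pair of small Python functions. This is an illustration under stated assumptions, not the authors' implementation: the names `capacity`, `embed`, and `extract` are hypothetical, the candidate pointers are assumed to be given in a fixed order that sender and receiver both know, and when the number of candidates q is not a power of two only the first 2^floor(log2 q) candidates are used.

```python
def capacity(q):
    """Number of watermark bits that q equally good pointers can carry."""
    return q.bit_length() - 1  # floor(log2(q)) for q >= 1


def embed(candidates, bits):
    """Choose the pointer whose index spells out the next watermark bits.

    candidates: positions of all longest matches, in an agreed order
    bits: string of '0'/'1' still to be hidden
    Returns (chosen_pointer, number_of_bits_consumed).
    """
    k = capacity(len(candidates))
    if k == 0:
        return candidates[0], 0
    chunk = bits[:k].ljust(k, "0")  # assumption: pad with zeros at the end
    return candidates[int(chunk, 2)], k


def extract(candidates, chosen):
    """Recover the bits hidden by embed from the pointer that was chosen."""
    k = capacity(len(candidates))
    if k == 0:
        return ""
    return format(candidates.index(chosen), "b").zfill(k)


if __name__ == "__main__":
    pointers = [3, 9, 14, 20]            # four candidates, so two bits
    chosen, used = embed(pointers, "10")
    print(chosen, used)                  # 14 2
    print(extract(pointers, chosen))     # 10
```

Because every candidate gives a match of the same length, a standard LZ-77 decoder reconstructs the same text whichever pointer is chosen.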

[Figure: three diagrams of the scheme; a bit string 0110100010010 is drawn next to the secret key k]
1. Embedding: document T (“Dear Bob, How are you doing today? …”) and secret key k enter LZS-77, which outputs the watermarked text T’ (T.gz), with W = H_k(T)
2. Decompression: the watermarked T’ (T.gz) passes through LZ-77 and yields the text T (“Dear Bob, How are you doing today? ...”)
3. Verification: the watermarked T’ (T.gz) and secret key k enter LZS-77, which outputs the text T and the verdict
   - Authentic
   - Integral
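
The keyed digest W = H_k(T) in the diagram can be sketched as follows. The slides do not say which function H is, so HMAC-SHA256 from the Python standard library is used here purely as an assumed stand-in. The function names and the 64-bit watermark length are also hypothetical.

```python
import hashlib
import hmac


def watermark_bits(text: bytes, key: bytes, nbits: int = 64) -> str:
    """Return the first nbits (at most 256) of a keyed digest of text.

    Assumption: H_k is modelled by HMAC-SHA256; the slides do not name H.
    """
    digest = hmac.new(key, text, hashlib.sha256).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:nbits]


def verify(text: bytes, key: bytes, extracted_bits: str) -> bool:
    """Check that the bits read from the pointers match the digest of text."""
    if not extracted_bits:
        return False  # nothing was embedded, so nothing can be confirmed
    expected = watermark_bits(text, key, len(extracted_bits))
    return hmac.compare_digest(expected, extracted_bits)


if __name__ == "__main__":
    key = b"secret key k"
    T = b"Dear Bob, How are you doing today?"
    W = watermark_bits(T, key)
    print(verify(T, key, W))               # the untouched text passes
    print(verify(T + b" PS", key, W))      # a modified text fails
```

The receiver decompresses T’ to obtain T, reads the hidden bits back from the pointer choices, and compares them with the digest recomputed from T and k.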

Method

Multiplicity
• Definition: a position i in the text T has multiplicity q if there exist exactly q matches of the longest prefix of T[i,n]
• Given a position with multiplicity q, we denote by p_0, p_1, …, p_{q-1} the q choices for the pointer
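
The definition of multiplicity can be made concrete with a brute-force Python sketch. The function name `longest_matches` is hypothetical. Positions are 0-based, the whole prefix T[0..i-1] is treated as the history (no sliding-window limit), and matches may run past position i, all of which are simplifying assumptions.

```python
def longest_matches(text, i):
    """Return (length, pointers) for position i of text.

    length: length of the longest prefix of text[i:] that also starts at
            some earlier position j < i
    pointers: all such starting positions p_0, ..., p_{q-1}, so that
              len(pointers) is the multiplicity q of position i
    """
    n = len(text)
    best_len, pointers = 0, []
    for j in range(i):
        length = 0
        while i + length < n and text[j + length] == text[i + length]:
            length += 1
        if length > best_len:
            best_len, pointers = length, [j]
        elif length == best_len and length > 0:
            pointers.append(j)
    return best_len, pointers


if __name__ == "__main__":
    length, pointers = longest_matches("abcabcabcx", 6)
    print(length, pointers)   # 3 [0, 3]
    print(len(pointers))      # multiplicity q = 2
```

With q = 2, the position in this example can carry one watermark bit.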