VXA: A Virtual Architecture for Durable Compressed Archives

Bryan Ford, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
http://pdos.csail.mit.edu/~baford/vxa/


  1. VXA: A Virtual Architecture for Durable Compressed Archives
     Bryan Ford
     Computer Science and Artificial Intelligence Laboratory
     Massachusetts Institute of Technology
     http://pdos.csail.mit.edu/~baford/vxa/

  2. The Ubiquity of Data Compression
     Everything is compressed these days:
     – Archive/Backup/Distribution: ZIP, tar.gz, ...
     – Multimedia streams: mp3, ogg, wmv, ...
     – Office documents: XML-in-ZIP
     – Digital cameras: JPEG, proprietary RAW, ...
     – Video camcorders: DV, MPEG-2, ...

  3. Compressed Data Formats
     Observation #1: Data compression formats evolve rapidly
     [Figure: timeline of lossless compression formats, 1980-2005]

  4. Compressed Data Formats
     Observation #1: Data compression formats evolve rapidly
     [Figure: timelines of lossless compression, image encoding, video encoding, and audio encoding formats, 1980-2005]

  5. Compressed Data Formats
     Observation #1: Data compression formats evolve rapidly
     Problems:
     – Inconvenient: each new algorithm requires a decoder install/upgrade
     – Impedes data portability: data is unusable on systems without a supported decoder
     – Threatens long-term data usability: old decoders may not run on new operating systems

  6. Archiving Compressed Data
     Observation #2: Processor architectures evolve more conservatively
     [Figure: x86 architecture timeline, 1980-2005: 8086, 80386 (32-bit), MMX (int vector), SSE (FP vector), x86-64 (64-bit)]

  7. Archiving Compressed Data
     Observation #2: Processor architectures evolve more conservatively
     Fully Backward Compatible Extensions
     [Figure: x86 architecture timeline, 1980-2005: 8086, 80386 (32-bit), MMX (int vector), SSE (FP vector), x86-64 (64-bit)]

  8. Archiving Compressed Data
     Observation #2: Processor architectures evolve more conservatively
     [Figure: x86 architecture timeline, 1980-2005: 8086, 80386 (32-bit), MMX (int vector), SSE (FP vector), x86-64 (64-bit)]
     [Figure: other architectures, 1980-2005: DEC Alpha, PA-RISC, PowerPC, SPARC, Itanium, 68000, MIPS, ARM]

  12. Archiving Compressed Data
      Observation #2: Processor architectures evolve more conservatively
      [Figure: x86 architecture timeline, 1980-2005: 8086, 80386 (32-bit), MMX (int vector), SSE (FP vector), x86-64 (64-bit)]
      [Figure: other architectures, 1980-2005: DEC Alpha, PA-RISC, PowerPC, SPARC, Itanium (annotated "Itanic"), 68000, MIPS, ARM]

  13. VXA: Virtual Executable Archives
      Observation 1+2: Instruction formats are historically more durable than compressed data formats
      ● Make archive self-extracting (data + executable decoder)
      ● To extract data, archive reader runs embedded decoder
      [Diagram: Archive Writer (Encoder), Archive (embedded decoder D), Archive Reader (Decoder)]

  14. Goals of VXA
      Make self-extracting archives...
      [Diagram: Archive Writer (Encoder), Archive (embedded decoder D), Archive Reader (Decoder)]

  15. Goals of VXA
      Make self-extracting archives...
      1. Safe: malicious decoders can't compromise host
      2. Future-proof: simple, well-defined architecture [Lorie]
      [Diagram: Archive Writer (Encoder), Archive (embedded decoder D), Archive Reader (Emulator running Decoder)]

  16. Goals of VXA
      Make self-extracting archives...
      1. Safe: malicious decoders can't compromise host
      2. Future-proof: simple, well-defined architecture [Lorie]
      3. Easy: allow reuse of existing code, languages, tools
      [Diagram: Archive Writer (Encoder), Archive (embedded decoder D), Archive Reader (x86 Emulator running Decoder)]

  17. Goals of VXA
      Make self-extracting archives...
      1. Safe: malicious decoders can't compromise host
      2. Future-proof: simple, well-defined architecture [Lorie]
      3. Easy: allow reuse of existing code, languages, tools
      4. Efficient: practical for short-term data packaging too
      [Diagram: Archive Writer (Encoder), Archive (embedded decoder D), Archive Reader (Fast x86 Emulator running Decoder)]

  18. Outline
      ● Archiver Operation
      ● vxZIP Archive Format
      ● Decoder Architecture
      ● Emulator Design & Implementation
      ● Evaluation (performance, storage overhead)
      ● Conclusion

  19. Archive Writer Operation
      [Diagram: VXA Archiver producing an Archive]

  20. Archive Writer Operation
      [Diagram: Uncompressed Input Files pass through the VXA Archiver's General Compressor; its Decoder 1 is stored in the Archive as D1]

  22. Archive Writer Operation
      [Diagram: Uncompressed Input Files pass through a General Compressor, Image Compressor, or Audio Compressor; the matching Decoders 1-3 are stored in the Archive as D1-D3]

  23. Archive Writer Operation
      [Diagram: as on slide 22, with Pre-Compressed Input Files added as a second kind of input]

  24. Archive Writer Operation
      [Diagram: Uncompressed Input Files pass through a General Compressor, Image Compressor, or Audio Compressor (Decoders 1-3); Pre-Compressed Input Files pass through an Image Format Recognizer or Audio Format Recognizer (Decoders 4-5); all five decoders are stored in the Archive as D1-D5]
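
The writer-side choice shown in slides 22-24 can be sketched roughly as follows. This is a minimal illustration, not the VXA archiver's actual logic: the slides do not say how recognizers identify a format, so matching on leading magic bytes is an assumption, and `RECOGNIZERS`, `plan_entry`, and the decoder labels are hypothetical names. zlib stands in for the general compressor.

```python
import zlib

# Hypothetical recognizer table: (leading magic bytes, decoder label).
# The signatures are the well-known JPEG and FLAC file prefixes.
RECOGNIZERS = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"fLaC", "flac"),
]


def plan_entry(data: bytes) -> tuple[str, bytes]:
    """Return (decoder label, bytes to store) for one input file.

    Pre-compressed input is stored unchanged and tagged with the decoder
    for its format; anything else goes through the general compressor.
    """
    for magic, decoder in RECOGNIZERS:
        if data.startswith(magic):
            return decoder, data            # already compressed: store as-is
    return "zlib", zlib.compress(data)      # assumed general-purpose fallback
```

Under these assumptions, a FLAC file is stored byte-for-byte with the `flac` decoder attached, while an unrecognized text file is compressed and paired with the general decoder.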

  25. Archive Reader Operation
      [Diagram: VXA Archive Reader containing an x86 Emulator; Archive holding D1-D5]

  26. Archive Reader Operation
      [Diagram: the reader's x86 Emulator runs Decoder 1 from the Archive to produce Original Uncompressed Files]

  27. Archive Reader Operation
      [Diagram: the reader's x86 Emulator runs Decoders 1-3 from the Archive to produce Original Uncompressed Files]

  28. Archive Reader Operation
      [Diagram: the reader's x86 Emulator runs Decoders 1-3 to produce Original Uncompressed Files; Original Pre-Compressed Files are also extracted]

  29. Archive Reader Operation
      [Diagram: the reader's x86 Emulator runs Decoders 1-3 to produce Original Uncompressed Files and Decoders 4-5 to produce De-compressed Files]

  30. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      [Diagram: vxZIP Archive layout: Image file, Audio file, Audio file, Central Directory]

  31. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      ● Decoders intermixed with archived files
      [Diagram: vxZIP Archive layout: JP2 Decoder, Image file, FLAC Decoder, Audio file, Audio file, Central Directory]

  32. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      ● Decoders intermixed with archived files
      ● Archived files have new extension header pointing to decoder
      [Diagram: vxZIP Archive layout: JP2 Decoder, Image file (JP2-encoded), FLAC Decoder, Audio file (FLAC-encoded), Audio file (FLAC-encoded), Central Directory]

  33. vxZIP Archive Format
      ● Backward compatible with legacy ZIP format
      ● Decoders intermixed with archived files
      ● Archived files have new extension header pointing to decoder
      ● Decoders are hidden, “deflated” (gzip)
      [Diagram: vxZIP Archive layout: JP2 Decoder (deflated), Image file (JP2-encoded), FLAC Decoder (deflated), Audio file (FLAC-encoded), Audio file (FLAC-encoded), Central Directory]
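
The "extension header pointing to decoder" idea can be illustrated with the ZIP extra-field mechanism, which readers that do not recognize a field ID skip. The sketch below uses Python's standard `zipfile` module. It is an assumption-laden illustration rather than the real vxZIP layout: the header ID `0x5658`, the 4-byte offset payload, the member name `decoders/flac.elf`, and the helper names are all hypothetical, and the decoder is stored as an ordinary deflated member instead of being hidden.

```python
import io
import struct
import zipfile

VX_EXTRA_ID = 0x5658  # hypothetical extra-field ID, NOT the real vxZIP value


def write_archive(decoder_image: bytes, name: str, payload: bytes) -> bytes:
    """Store a deflated decoder, then a file whose extra field points at it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        dec = zipfile.ZipInfo("decoders/flac.elf")    # hypothetical member name
        zf.writestr(dec, decoder_image, compress_type=zipfile.ZIP_DEFLATED)
        member = zipfile.ZipInfo(name)
        # Extra field layout: 2-byte ID, 2-byte size, then the data (assumed
        # here to be the decoder's local-header offset), all little-endian.
        member.extra = struct.pack("<HHI", VX_EXTRA_ID, 4, dec.header_offset)
        zf.writestr(member, payload, compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


def decoder_offset(extra: bytes):
    """Walk the extra fields; return the decoder offset, or None if absent."""
    pos = 0
    while pos + 4 <= len(extra):
        field_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        if field_id == VX_EXTRA_ID and len(body) == 4:
            return struct.unpack("<I", body)[0]
        pos += 4 + size
    return None
```

A reader that knows the field calls `decoder_offset(zf.getinfo(name).extra)` to find the decoder, while a reader that does not simply sees a normal ZIP member with an unknown extra field.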

  34. vxZIP Decoder Architecture
      ● Decoders are ELF executables for x86-32
        – Can be written in any language, safe or unsafe
        – Compiled using ordinary tools (GCC)
      ● Decoders have access to five “system calls”:
        – read stdin, write stdout, malloc, next file, exit
      ● Decoders cannot:
        – open files, windows, devices, network connections, ...
        – get system info: user name, current time, OS type, ...
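
To make the narrow decoder interface concrete, here is a small Python model of a host that offers only the five operations listed on slide 34. It is a sketch of the interface shape only: it does not emulate x86 and provides no real isolation, and `DecoderHost`, `DecoderExit`, `rle_decoder`, `run`, and the memory limit are hypothetical names and parameters. The toy run-length decoder stands in for a real codec.

```python
import io


class DecoderExit(Exception):
    """Raised when the decoder invokes the 'exit' call."""

    def __init__(self, status: int):
        super().__init__(status)
        self.status = status


class DecoderHost:
    """Exposes only: read, write, malloc, next_file, exit."""

    def __init__(self, streams, mem_limit=1 << 20):
        self._inputs = [io.BytesIO(s) for s in streams]
        self._current = 0
        self.outputs = [io.BytesIO()]
        self._mem_left = mem_limit

    def read(self, n: int) -> bytes:        # "read stdin"
        return self._inputs[self._current].read(n)

    def write(self, data: bytes) -> None:   # "write stdout"
        self.outputs[-1].write(data)

    def malloc(self, n: int) -> bytearray:  # bounded memory grant
        if n > self._mem_left:
            raise MemoryError("decoder exceeded its memory limit")
        self._mem_left -= n
        return bytearray(n)

    def next_file(self) -> bool:            # advance to the next encoded stream
        if self._current + 1 >= len(self._inputs):
            return False
        self._current += 1
        self.outputs.append(io.BytesIO())
        return True

    def exit(self, status: int = 0) -> None:
        raise DecoderExit(status)


def rle_decoder(host: DecoderHost) -> None:
    """Toy decoder: input is (count, byte) pairs; uses only the five calls."""
    while True:
        pair = host.read(2)
        while len(pair) == 2:
            count, value = pair
            host.write(bytes([value]) * count)
            pair = host.read(2)
        if not host.next_file():
            host.exit(0)


def run(decoder, streams):
    """Run a decoder over the streams; return (exit status, decoded outputs)."""
    host = DecoderHost(streams)
    try:
        decoder(host)
        status = 0
    except DecoderExit as stop:
        status = stop.status
    return status, [out.getvalue() for out in host.outputs]
```

Because the decoder sees nothing but these calls, it behaves as a pure stream transformer: `run(rle_decoder, [b"\x03a\x02b", b"\x01z"])` yields status 0 and the outputs `b"aaabb"` and `b"z"`.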
