Formalising Filesystems in the ACL2 Theorem Prover: An Application to FAT32
Mihir Mehta
Department of Computer Science, University of Texas at Austin
mihir@cs.utexas.edu
05 November, 2018

Why filesystem verification matters
◮ Filesystems form the basis of our current computing paradigm.
◮ They help application programmers address data by pathnames (not numbers), and additionally address security/efficiency/redundancy concerns which the average dev doesn’t want to be involved in.
◮ Thus, verifying the properties of filesystems in common use, and thereby making them reliable, is critically important.
◮ FAT32 - once widely used on Windows, and still used by a large number of embedded systems - qualifies.

The plan
◮ Modelling a real, widely used filesystem in ACL2 - FAT32
◮ Verification through refinement
◮ Binary compatibility and execution efficiency
◮ Co-simulation tests for accuracy

Outline
◮ FAT32
◮ The models
◮ Proofs and co-simulation
◮ Related and future work

Our FAT32 model aims to have ...
◮ ... the same space constraints as a FAT32 volume of the same size.
◮ ... the same success and failure conditions for file operations, and the same error codes for the latter.
◮ ... a way to read a FAT32 disk image from a block device, and a way to write it back.
◮ This is made easier by choosing to replicate the on-disk data structures of FAT32 in the model.

File operations in our model
◮ File operations are categorised into read operations, which do not change the state of the filesystem, and write operations, which do.
◮ Generic signature for read operations: (read fs-inst) ↦ (mv ret-val status errno)
◮ Generic signature for write operations: (write fs-inst) ↦ (mv fs-inst ret-val status errno)
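The two signatures can be illustrated outside ACL2. The following is a minimal Python sketch, not the ACL2 model: the toy filesystem type `Fs` and the functions `fs_read` and `fs_write` are hypothetical names chosen for illustration, and the multiple values of `mv` are imitated with tuples.

```python
import errno
from typing import Dict, Tuple

# Hypothetical toy filesystem: pathname -> file contents.
Fs = Dict[str, str]


def fs_read(fs: Fs, path: str) -> Tuple[str, int, int]:
    """Read operation: returns (ret_val, status, errno) and leaves fs alone."""
    if path not in fs:
        return "", -1, errno.ENOENT
    return fs[path], 0, 0


def fs_write(fs: Fs, path: str, text: str) -> Tuple[Fs, int, int, int]:
    """Write operation: returns (new_fs, ret_val, status, errno).

    The input fs is never mutated; the updated filesystem is returned,
    mirroring the extra fs-inst in the write signature.
    """
    if not path:
        return fs, 0, -1, errno.EINVAL
    new_fs = dict(fs)
    new_fs[path] = text
    return new_fs, len(text), 0, 0


fs0: Fs = {}
fs1, written, status, err = fs_write(fs0, "/tmp/ticket1", "hello")
contents, read_status, read_err = fs_read(fs1, "/tmp/ticket1")
```

Returning the new filesystem instead of mutating the old one keeps both operations pure functions, which is the form in which properties about them are easiest to state.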

The FAT32 specification
In a FAT32 volume, the unit of data storage is a cluster (also known as an extent). There are three on-disk data structures.
◮ reserved area: volume-level metadata such as the size of a cluster and the number of clusters.
◮ file allocation table: collection of clusterchains (linked lists of clusters), one for each regular file/directory file.
◮ data region: collection of clusters.

A FAT32 Directory Tree
Directory tree:
  /
    initrd.img
    tmp/
      ticket1.txt
      ticket2.txt
    vmlinuz
File allocation table:
  FAT index   FAT entry
  0           (reserved)
  1           (reserved)
  2           eoc
  3           4
  4           eoc
  5           eoc
  6           eoc
  7           eoc
  8           eoc
  9           0
  ...         ...
Directory entries in / (byte offset, name, first cluster):
  0           “vmlinuz”, 3
  32          “initrd.img”, 5
  64          “tmp”, 6
  ...         ...
Directory entries in /tmp/ (byte offset, name, first cluster):
  0           “ticket1”, 7
  32          “ticket2”, 8
  ...         ...
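The table on this slide can be walked mechanically. The sketch below, in Python rather than ACL2, follows a clusterchain through the file allocation table shown above. The function name `clusterchain` and the placeholder values for the two reserved entries are assumptions made for illustration; the end-of-clusterchain test (masked entry at or above 0x0FFFFFF8) follows the FAT32 specification.

```python
from typing import Dict, List

EOC = 0x0FFFFFFF          # one of the values FAT32 treats as end-of-clusterchain
EOC_MIN = 0x0FFFFFF8      # any masked entry >= this value ends a chain
ENTRY_MASK = 0x0FFFFFFF   # only the low 28 bits of a FAT32 entry are significant

# The table from the slide. Entries 0 and 1 are reserved; their values are
# not modelled here, so 0 is only a placeholder.
FAT: List[int] = [0, 0, EOC, 4, EOC, EOC, EOC, EOC, EOC, 0]

# First clusters taken from the directory entries in / on the slide.
ROOT_DIR: Dict[str, int] = {"vmlinuz": 3, "initrd.img": 5, "tmp": 6}


def clusterchain(fat: List[int], first: int) -> List[int]:
    """Return the list of cluster indices starting at `first`."""
    chain: List[int] = []
    cur = first
    while True:
        # Clusters 0 and 1 never hold data, and a repeat means a cycle.
        if cur < 2 or cur >= len(fat) or cur in chain:
            raise ValueError(f"malformed clusterchain at cluster {cur}")
        chain.append(cur)
        nxt = fat[cur] & ENTRY_MASK
        if nxt >= EOC_MIN:
            return chain
        cur = nxt


vmlinuz_chain = clusterchain(FAT, ROOT_DIR["vmlinuz"])      # [3, 4]
initrd_chain = clusterchain(FAT, ROOT_DIR["initrd.img"])    # [5]
```

Starting from a free cluster such as 9 raises an error, because its entry of 0 does not lead to an end-of-clusterchain marker.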

Abstract models
◮ Bootstrap - begin with abstract filesystem models, in order to explore the properties we require in a FAT32 model.
◮ Incrementally add the desired properties in a series of models.
◮ Wherever possible, capture common features expected to exist in different filesystems.

Abstract models in brief
◮ L1 - Filesystem is a literal directory tree; contents of regular files are represented as strings stored in the nodes.
◮ L2 - A single element of metadata, length, is stored within each regular file.
◮ L3 - Regular files are divided into fixed-size blocks; blocks are stored in an external “disk” data structure; storage for these blocks remains unbounded as in L1 and L2.
◮ L4 - Disk size is now bounded; allocation vector data structure is introduced to help allocate and garbage-collect blocks.
◮ L5 - Additional metadata for file ownership and access permissions is stored within each regular file.
◮ L6 - Allocation vector is replaced by a file allocation table.
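The step from L3 to L4 (fixed-size blocks on a bounded disk, tracked by an allocation vector) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the ACL2 definitions: the block size of 8 and the names `make_blocks`, `store`, `fetch` and `release` are all hypothetical.

```python
import errno
from typing import List, Tuple

BLOCK_SIZE = 8  # hypothetical block size, chosen only for the example


def make_blocks(text: str) -> List[str]:
    """Split file contents into fixed-size blocks (the last may be short)."""
    return [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)]


def store(disk: List[str], alv: List[bool],
          text: str) -> Tuple[List[str], List[bool], List[int], int]:
    """Allocate blocks for text; returns (disk, alv, block_indices, errno)."""
    blocks = make_blocks(text)
    free = [i for i, used in enumerate(alv) if not used]
    if len(free) < len(blocks):
        # Bounded disk: the write fails and nothing changes.
        return disk, alv, [], errno.ENOSPC
    indices = free[:len(blocks)]
    disk, alv = list(disk), list(alv)
    for i, block in zip(indices, blocks):
        disk[i] = block
        alv[i] = True
    return disk, alv, indices, 0


def fetch(disk: List[str], indices: List[int], length: int) -> str:
    """Reassemble a file from its block indices and its length metadata."""
    return "".join(disk[i] for i in indices)[:length]


def release(alv: List[bool], indices: List[int]) -> List[bool]:
    """Garbage-collect the blocks of a deleted file."""
    alv = list(alv)
    for i in indices:
        alv[i] = False
    return alv


disk0, alv0 = [""] * 3, [False] * 3
disk1, alv1, idx1, err1 = store(disk0, alv0, "hello, world")   # needs 2 blocks
_, _, _, err2 = store(disk1, alv1, "x" * 20)                   # needs 3, 1 free
alv2 = release(alv1, idx1)
```

Replacing the allocation vector by a table whose entries also record the next block of each file gives the shape of L6.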

Beginning to model FAT32
Next, in models M1 and M2, we model FAT32 more concretely, providing the standard POSIX system calls.
◮ M1 - another tree model, but with directory entries exactly matching the FAT32 spec.
◮ M2 - stobj model with fields for all the metadata in the reserved area and arrays for the file allocation table and data region.
This way, we benefit from efficient stobj array operations in M2, and we can simplify our reasoning in M1 by continuing with directory trees.

Read-over-write proofs
◮ Read-over-write properties show that write operations have their effects made available immediately for reads at the same location, and also that they do not affect reads at other locations.
◮ We’ve proved these properties for the abstract models L1 - L6, and we’ve also proved them for our concrete models M1 and M2, with the caveat that the transformations between M1 and M2 are not yet verified.
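Both properties can be stated concretely on an L1-style tree. The Python sketch below checks them on one example; it is a test of a single instance, not a proof, and it is not the ACL2 code. The names `l1_rdchs` and `l1_wrchs` are modelled on the theorem names in this deck but are assumptions, as is the choice to pad with spaces when writing past the end of a file.

```python
from typing import Dict, List, Optional, Union

# L1-style filesystem: a directory is a dict, a regular file is a string.
Tree = Dict[str, Union[str, "Tree"]]


def l1_rdchs(fs: Tree, path: List[str], start: int, n: int) -> Optional[str]:
    """Read n characters at offset start; None if path is not a regular file."""
    node: Union[str, Tree] = fs
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node[start:start + n] if isinstance(node, str) else None


def l1_wrchs(fs: Tree, path: List[str], start: int, text: str) -> Tree:
    """Return a new tree with text written at offset start of an existing
    regular file; the tree is returned unchanged for any other path."""
    if not path or not isinstance(fs, dict) or path[0] not in fs:
        return fs
    name, rest = path[0], path[1:]
    child = fs[name]
    if rest:
        new_child = l1_wrchs(child, rest, start, text)
    elif isinstance(child, str):
        padded = child.ljust(start)  # assumption: pad short files with spaces
        new_child = padded[:start] + text + padded[start + len(text):]
    else:
        return fs
    new_fs = dict(fs)
    new_fs[name] = new_child
    return new_fs


fs = {"vmlinuz": "kernel", "tmp": {"ticket1": "abc", "ticket2": "xyz"}}
target, other = ["tmp", "ticket1"], ["tmp", "ticket2"]
fs_after = l1_wrchs(fs, target, 1, "ZZZZ")

# First property: a read at the written location sees the written text.
same_location = l1_rdchs(fs_after, target, 1, 4)
# Second property: a read at another location is unaffected by the write.
other_location_unchanged = l1_rdchs(fs_after, other, 0, 3) == l1_rdchs(fs, other, 0, 3)
```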

Refinement proofs
◮ For the abstract models, we started by proving the read-over-write properties ab initio for L1.
◮ For each subsequent model in L2 - L6, we proved a refinement relationship where possible, or an equivalence where a strict refinement did not hold, with a previous model and used it to prove read-over-write properties by analogy.
◮ An illustration of such a proof follows.

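Proof example: first read-over-write in L2
[Figure: l2-wrchs-correctness-1. Commuting square: write(text) takes one l2 instance to another, l2-to-l1-fs maps each of them to an l1 instance, and write(text) relates the two l1 instances. write is overloaded for L2 and L1.]
[Figure: l2-rdchs-correctness-1. read on l2 yields text; l2-to-l1-fs maps l2 to l1, on which read yields the same text. read is overloaded for L2 and L1.]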

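Proof example: first read-over-write in L2
[Figure: l2-read-over-write-1. Composition of l2-wrchs-correctness-1 and l2-rdchs-correctness-1: write(text) followed by read on l2 yields text, by way of l2-to-l1-fs and the corresponding write(text) and read on l1.]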
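The diagrams can be exercised on a small example. The Python sketch below is an illustration under assumptions rather than the ACL2 development: an L2 regular file is modelled as a (contents, length) pair, `l2_to_l1_fs` drops the length, and the function bodies are hypothetical reconstructions that only borrow their names from the figures. It checks one instance of each correctness property; the ACL2 theorems cover all inputs.

```python
def splice(old, start, text):
    """Overwrite old with text at offset start (space-padded if too short)."""
    padded = old.ljust(start)
    return padded[:start] + text + padded[start + len(text):]


def l1_wrchs(fs, path, start, text):
    """L1 write: regular files are plain strings."""
    if not path or path[0] not in fs:
        return fs
    name, rest = path[0], path[1:]
    child = fs[name]
    if rest and isinstance(child, dict):
        child = l1_wrchs(child, rest, start, text)
    elif not rest and isinstance(child, str):
        child = splice(child, start, text)
    return {**fs, name: child}


def l2_wrchs(fs, path, start, text):
    """L2 write: regular files are (contents, length) pairs."""
    if not path or path[0] not in fs:
        return fs
    name, rest = path[0], path[1:]
    child = fs[name]
    if rest and isinstance(child, dict):
        child = l2_wrchs(child, rest, start, text)
    elif not rest and isinstance(child, tuple):
        contents = splice(child[0], start, text)
        child = (contents, len(contents))
    return {**fs, name: child}


def lookup(fs, path):
    """Follow path through nested dicts; None if it does not exist."""
    node = fs
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node


def l1_rdchs(fs, path, start, n):
    node = lookup(fs, path)
    return node[start:start + n] if isinstance(node, str) else None


def l2_rdchs(fs, path, start, n):
    node = lookup(fs, path)
    return node[0][start:start + n] if isinstance(node, tuple) else None


def l2_to_l1_fs(fs):
    """Forget the length metadata, turning an L2 tree into an L1 tree."""
    return {name: l2_to_l1_fs(c) if isinstance(c, dict) else c[0]
            for name, c in fs.items()}


l2 = {"vmlinuz": ("kernel", 6), "tmp": {"ticket1": ("abc", 3)}}
p = ["tmp", "ticket1"]

# Write square: writing then abstracting equals abstracting then writing.
write_commutes = (l2_to_l1_fs(l2_wrchs(l2, p, 1, "ZZZZ"))
                  == l1_wrchs(l2_to_l1_fs(l2), p, 1, "ZZZZ"))
# Read triangle: reading in L2 equals reading in the L1 abstraction.
read_agrees = l2_rdchs(l2, p, 0, 3) == l1_rdchs(l2_to_l1_fs(l2), p, 0, 3)
# Composite: read-over-write in L2.
read_over_write = l2_rdchs(l2_wrchs(l2, p, 1, "ZZZZ"), p, 1, 4)
```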

Relationships between abstract models
[Figure: graph relating the abstract models, with the following nodes.]
◮ L1 - tree
◮ L2 - length
◮ L3 - unbounded disk
◮ L4 - bounded disk with garbage collection
◮ L5 - permissions
◮ L6 - file allocation table

Co-simulation
◮ Ensure that our implementation lines up with FAT32, the target filesystem.
◮ POSIX system calls supported - lstat, open, pread, pwrite, close, mkdir and mknod.
◮ Wherever errno is to be set, do what Linux does.
◮ Compare the output of our ACL2 programs (based on the FAT32 model) with the utilities (such as cp and mkfs) which they replicate.

Related work - interactive theorem provers
◮ Bevier and Cohen - Synergy FS, executable model with processes and file descriptors, but no read-over-write theorems (ACL2).
◮ Klein et al. - COGENT, verifying compiler from a DSL to C code for a filesystem (Isabelle/HOL).
◮ Chen - FSCQ, high-performance filesystem with verified crash consistency properties (Coq).

Related work - non-interactive theorem provers
◮ Hyperkernel - microkernel with system calls simplified until the point where useful properties can be proved through SMT solving (Z3).
◮ Yggdrasil - filesystem verification through SMT solving, but constrained from modelling important features such as extents (Z3).

Future work
◮ Model the remaining POSIX system calls for FAT32 and use them to reason about sequences of file operations (i.e. do code proofs).
◮ Reuse FAT32 verification artefacts for a filesystem with crash consistency, for instance, ext4.
◮ Model concurrent file operations in a multiprogramming environment.

Recent progress
◮ Set of supported POSIX system calls expanded.
◮ Set of co-simulation tests, mostly based on coreutils programs, expanded based on these.
◮ Functions for converting M2 instances to FAT32 disk images and back proved to be inverses of each other.
◮ Equivalence relation developed to allow two FAT32 disk images to be compared modulo rearrangement of data and reordering of files within directories.
◮ This gives us a means to co-simulate programs which modify filesystem state, such as mv and rm.
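One way to picture such an equivalence is to compare the directory trees that two images denote, rather than their bytes. The Python sketch below is a heavily simplified illustration, not the relation defined in the ACL2 work. Every name in it is hypothetical, a directory is assumed to fit in one cluster and is stored as a list of (name, first cluster, is-directory) entries, and the root directory is assumed to start at cluster 2.

```python
EOC = 0x0FFFFFFF       # end-of-clusterchain marker used in this sketch
EOC_MIN = 0x0FFFFFF8   # any entry >= this value ends a chain
ROOT_CLUSTER = 2       # assumption: root directory starts at cluster 2


def chain(fat, first):
    """Cluster indices of the clusterchain starting at first."""
    out = []
    cur = first
    while 2 <= cur < len(fat) and cur not in out:
        out.append(cur)
        if fat[cur] >= EOC_MIN:
            return out
        cur = fat[cur]
    raise ValueError("malformed clusterchain")


def to_tree(fat, data, dir_cluster=ROOT_CLUSTER):
    """Abstract an image to a tree: dicts for directories, strings for files.

    Dicts are unordered for comparison, so the order of directory entries
    disappears, and file contents are read through their clusterchains, so
    the placement of clusters on disk disappears as well.
    """
    tree = {}
    for name, first, is_dir in data[dir_cluster]:
        if is_dir:
            tree[name] = to_tree(fat, data, first)
        else:
            tree[name] = "".join(data[c] for c in chain(fat, first))
    return tree


def equivalent(image_a, image_b):
    return to_tree(*image_a) == to_tree(*image_b)


# Image A: vmlinuz occupies clusters 3 -> 4, the empty directory tmp is at 5.
image_a = ([0, 0, EOC, 4, EOC, EOC],
           [None, None, [("vmlinuz", 3, False), ("tmp", 5, True)],
            "ker", "nel", []])
# Image B: same files, but entries reordered and data moved to clusters 4 -> 5.
image_b = ([0, 0, EOC, EOC, 5, EOC],
           [None, None, [("tmp", 3, True), ("vmlinuz", 4, False)],
            [], "ker", "nel"])
# Image C: like A, but the contents of vmlinuz differ.
image_c = ([0, 0, EOC, 4, EOC, EOC],
           [None, None, [("vmlinuz", 3, False), ("tmp", 5, True)],
            "ker", "nal", []])
```

Under this reading, two images produced by a model and by a reference implementation after the same mv or rm can be compared even though the two need not choose the same clusters.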

Conclusion
◮ FAT32 formalised, demonstrating the applicability of the refinement style to filesystem verification.
◮ Co-simulation infrastructure developed to compare filesystem models to a canonical implementation, such as that of Linux.
◮ FAT32’s allocation and garbage collection algorithms certified.
