Concept and problem case | Software design | Implementation | Benchmarks | Conclusion and future work | Literature

Dbfs - Database filesystem [1]

Timo Minartz

Software project WS 2008/09

April 6, 2009

[1] supervised by Julian Kunkel

1 / 25

Contents

  1 Concept and problem case
  2 Software design
  3 Implementation
  4 Benchmarks
  5 Conclusion and future work
  6 Literature

2 / 25

Project goal

Problem case specific

  • map filesystem sources and database tables in one namespace
  • implement a lightweight filesystem with FUSE [Sou]
  • easy to maintain database design
  • minimize database overhead

General

  • reusable software
  • well documented
  • usability

3 / 25

Problem case

Initial situation

  • a microscope generates lots of data in a specific folder hierarchy
  • in particular, it creates TIFF files with a size of a few MByte each
  • each TIFF file is identified by a collaboration, project, plate, replicate, well and file name
  • there are multiple collaborations, projects, etc., so lots of TIFF files are created

Further situation

  • the TIFF files should be evaluated by different applications
  • these applications store their results in simple files
  • it should be easy to manage these files (i.e. by a database system)

4 / 25

Problem case (2)

Initial file structure (base filesystem)

  /collaboration/project/plate/replicate/well-file.tiff

Resulting file structure (FUSE filesystem, Dbfs)

  /collaboration/project/application/plate/replicate/well/file.tiff

5 / 25

Example

Base filesystem structure

  /collab0/project0/plate0/replicate0/000-file1.tiff
  /collab0/project0/plate0/replicate0/000-file2.tiff
  /collab0/project0/plate0/replicate0/001-file3.tiff
  /collab0/project0/plate0/replicate0/metadata

Dbfs file structure

  /collab0/project0/application0/plate0/replicate0/000/file1.tiff
  /collab0/project0/application0/plate0/replicate0/000/file2.tiff
  /collab0/project0/application0/plate0/replicate0/001/file3.tiff
  /collab0/project0/application0/plate0/replicate0/metadata

6 / 25
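The mapping in the example can be read as a pure string transformation: drop the application layer and fold well/file back into well-file. The following is a minimal sketch of that idea, not the Dbfs source. The names split_path and to_base_path are hypothetical, and the sketch assumes that a 6-component path names a file directly in the replicate folder (such as metadata) and a 7-component path names a file below a well. Virtual files and virtual well directories would need a database or base filesystem lookup, which is not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split "/a/b/c" into {"a", "b", "c"}; empty components are skipped.
std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/')) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}

// Hypothetical helper: translate a Dbfs path into a base filesystem path.
// Assumed layout: collaboration/project/application/plate/replicate/...
// Returns an empty string for paths this sketch does not handle.
std::string to_base_path(const std::string& dbfs_path) {
    const std::vector<std::string> p = split_path(dbfs_path);
    if (p.size() != 6 && p.size() != 7) return "";
    std::string base;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (i == 2) continue;          // drop the application layer
        if (i == 6) {                  // fold well/file into well-file
            base += "-" + p[i];
            continue;
        }
        base += "/" + p[i];
    }
    return base;
}
```

With the paths from the example, to_base_path("/collab0/project0/application0/plate0/replicate0/000/file1.tiff") yields "/collab0/project0/plate0/replicate0/000-file1.tiff", and the metadata path loses only its application component.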

Virtual files examples

Dbfs filesystem

  /collaboration0/project0/application0/plate0/replicate0/000/ergs
  /collaboration0/project0/application0/plate0/replicate0/001/ergs

  • virtual files are stored in the database
  • virtual files are identified by collaboration, project, plate, replicate, well, file name AND application

7 / 25

Further constraints

Virtualization layers

  • one for the application and
  • one for the well

Permissions

  • read-only access to TIFF files
  • permissions for metadata files are inherited from the base filesystem
  • read and write permissions for virtual files are granted at the application level
  • no structural changes allowed (chmod, mkdir, ...)

8 / 25

Virtual files model

  • table for every application
  • table has columns for every subfolder and one for every virtual file

Table: Example database table collaboration0 project0 application0

  plate    replicate    well   ergs
  plate0   replicate0   000    “ergs for well 000”
  plate0   replicate0   001    “ergs for well 001”

9 / 25
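Given this table layout, reading a virtual file comes down to one SELECT on the application's table. The sketch below shows one way to build such a statement. It is an illustration under assumptions, not the Dbfs code: the helper names are hypothetical, and the table-name scheme (components joined by '_') is a guess that the slides do not confirm. Values are left as '?' placeholders for a prepared statement. Identifiers cannot be bound that way, so they are checked against a whitelist instead.

```cpp
#include <cassert>
#include <cctype>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Table and column names cannot be bound as statement parameters,
// so they are restricted to [A-Za-z0-9_] before being spliced in.
bool is_safe_identifier(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_') return false;
    }
    return true;
}

// Hypothetical helper: build the SELECT for one virtual file.
// ASSUMPTION: the table is named collaboration_project_application.
std::string build_virtual_file_query(const std::string& collaboration,
                                     const std::string& project,
                                     const std::string& application,
                                     const std::string& virtual_file) {
    for (const std::string* id :
         {&collaboration, &project, &application, &virtual_file}) {
        if (!is_safe_identifier(*id)) {
            throw std::invalid_argument("unsafe identifier: " + *id);
        }
    }
    const std::string table = collaboration + "_" + project + "_" + application;
    // plate, replicate and well stay as placeholders so the database
    // client can bind them as values instead of concatenating strings.
    return "SELECT " + virtual_file + " FROM " + table +
           " WHERE plate = ? AND replicate = ? AND well = ?";
}
```

For the example row, the virtual file ergs of well 000 would be fetched by binding plate0, replicate0 and 000 to the three placeholders.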

Permissions model

  • permissions on project level
  • second table for permissions
  • containing one column for application and one for the owner (user id from operating system)

Table: Example permission table permissions collaboration0 project0

  name           owner
  application0   1000
  application1   1001

10 / 25
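The constraints and the permission table together suggest a small decision function. The following is a sketch under stated assumptions rather than the actual Dbfs logic: FileKind, PermissionTable and may_write are hypothetical names, the table is held in memory instead of the database, and it is assumed that only the owner listed for an application may write that application's virtual files.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <sys/types.h>

enum class FileKind { Tiff, Metadata, Virtual };

// In-memory stand-in for the permission table of one project:
// application name -> owner (operating system user id).
using PermissionTable = std::map<std::string, uid_t>;

// Hypothetical check: may user `uid` write to a file of the given kind?
// `base_allows_write` stands for the answer of the base filesystem and is
// only consulted for metadata files, whose permissions are inherited.
bool may_write(FileKind kind, const PermissionTable& table,
               const std::string& application, uid_t uid,
               bool base_allows_write) {
    switch (kind) {
        case FileKind::Tiff:
            return false;                 // TIFF files are read-only
        case FileKind::Metadata:
            return base_allows_write;     // inherited from base filesystem
        case FileKind::Virtual: {
            // ASSUMPTION: only the owner of the application may write.
            const auto it = table.find(application);
            return it != table.end() && it->second == uid;
        }
    }
    return false;
}
```

With the example table, user 1000 may write virtual files of application0 but not of application1.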

Managing the directory structure

General

  • managing the structure requires changes in the base filesystem
  • and in the database tables (e.g. new virtual files)

How to

  • “by hand”, see documentation and/or README file
  • using a simple GUI

11 / 25

Managing the directory structure (2)

Figure: Graphical user interface to manage the directory structure

12 / 25

Optimizations and restrictions

Database overhead

  • multiple users each need their own database connection
  • lots of queries are generated for a simple command (like ls)

Optimization

  • thread-safe database connection pooling
  • simple caching of query results
  • both can be enabled in the source code

Restrictions

  • cache consistency problem
  • it occurs if the underlying base filesystem changes (creation of new (sub-)folders etc.)

13 / 25
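A simple query-result cache of the kind mentioned above could look like the following. This is a minimal sketch, not the Dbfs implementation: the class name QueryCache and its interface are hypothetical, results are kept as plain strings keyed by the query text, and the only answer to the consistency problem is a manual clear().

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Thread-safe cache: query text -> result text.
class QueryCache {
public:
    // Returns true and fills `result` if the query is cached.
    bool lookup(const std::string& query, std::string& result) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = entries_.find(query);
        if (it == entries_.end()) return false;
        result = it->second;
        return true;
    }

    // Remember the result of a query, replacing any older entry.
    void store(const std::string& query, const std::string& result) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[query] = result;
    }

    // Drop all entries, e.g. after a write to a virtual file or a known
    // change in the base filesystem, to avoid serving stale results.
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.clear();
    }

private:
    mutable std::mutex mutex_;                     // guards entries_
    std::map<std::string, std::string> entries_;
};
```

Changes made behind the filesystem's back (new folders in the base filesystem, direct edits in the database) are invisible to such a cache, which is exactly the restriction listed above.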

Implementation in C++

The implemented classes are divided into 4 modules

  • filesystem handling
  • database access
  • GUI and
  • helper classes and functions

Implemented filesystem operations

  • getattr
  • readdir
  • read and
  • write

14 / 25
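To give an impression of what two of these operations do, here is a sketch of getattr and read for virtual files. It is illustrative only and makes several assumptions: the names dbfs_getattr and dbfs_read are hypothetical, an in-memory map stands in for the database, and the FUSE-specific file-info parameter of read is left out so that the code builds without libfuse. In a real FUSE 2.x filesystem such handlers would be registered in a struct fuse_operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <map>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Stand-in for the database: path of a virtual file -> content.
static std::map<std::string, std::string> g_virtual_files = {
    {"/collaboration0/project0/application0/plate0/replicate0/000/ergs",
     "ergs for well 000"},
};

// getattr: fill in type, mode and size, or report "no such file".
// Handlers return 0 on success and a negated errno value on failure.
int dbfs_getattr(const char* path, struct stat* st) {
    std::memset(st, 0, sizeof(*st));
    const auto it = g_virtual_files.find(path);
    if (it == g_virtual_files.end()) return -ENOENT;
    st->st_mode = S_IFREG | 0644;   // regular file; mode chosen arbitrarily
    st->st_nlink = 1;
    st->st_size = static_cast<off_t>(it->second.size());
    return 0;
}

// read: copy up to `size` bytes starting at `offset` into `buf` and
// return the number of bytes copied (0 at or past the end of the file).
int dbfs_read(const char* path, char* buf, size_t size, off_t offset) {
    const auto it = g_virtual_files.find(path);
    if (it == g_virtual_files.end()) return -ENOENT;
    const std::string& data = it->second;
    if (offset < 0 || static_cast<size_t>(offset) >= data.size()) return 0;
    const size_t start = static_cast<size_t>(offset);
    const size_t n = std::min(size, data.size() - start);
    std::memcpy(buf, data.data() + start, n);
    return static_cast<int>(n);
}
```

For physical files the handlers would instead forward to the base filesystem; that part is omitted here.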

Implementation in C++ (2)

Further implementation details

  • documentation (PDF)
  • in-line documentation (doxygen)
  • run make doc in the software project root

15 / 25

FUSE stumbling blocks

Mounting fuse without administrative privileges

  • mount: ./dbfs mountpoint [args]
  • umount: fusermount -u mountpoint

Logging

  • fuse forks a new process, so logging to stdout is not possible
  • the parameter -f prevents fuse from forking
  • alternative: logging to a file (implemented)

Debugging with valgrind

  • problem with older kernel versions: fusermount is not traceable
  • a workaround is available: see README in project root
  • works out of the box with kernel 2.6.27-11-generic

16 / 25
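Logging to a file, the alternative named above, can be sketched as follows. The class FileLogger is hypothetical and not taken from the Dbfs source. It takes an absolute path on the assumption that a relative one may end up somewhere unexpected once the process runs in the background, and it serializes writes because filesystem handlers may be called from several threads.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>

// Minimal thread-safe logger that appends lines to a file.
class FileLogger {
public:
    // The file is opened in append mode so earlier entries are kept.
    explicit FileLogger(const std::string& absolute_path)
        : out_(absolute_path, std::ios::app) {}

    // False if the log file could not be opened.
    bool ok() const { return out_.is_open(); }

    // Write one line and flush, so the entry survives a crash.
    void log(const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        out_ << message << '\n';
        out_.flush();
    }

private:
    std::mutex mutex_;     // serializes concurrent log() calls
    std::ofstream out_;
};
```

During development, running the filesystem in the foreground with -f and printing to stdout remains the simpler option.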

The benchmark process

Test sets

  • comparison of Dbfs and tmpfs
  • evaluation of Dbfs
  • tmpfs as base filesystem
  • ext3 / tmpfs filesystem for the mysql database
  • clean / dirty database

Different use cases

  • reading filesystem attributes
  • reading metadata files and tiff-Files
  • reading virtual files
  • writing virtual files

17 / 25

Metadata

Figure: Reading filesystem attributes

18 / 25

Physical files

Figure: Read test for the physical files depending on blocksize

19 / 25

Physical files (2)

Figure: Read test for the physical files, time for reading one byte

20 / 25

Physical files (3)

Figure: Read test for the physical files, bytes per sec

21 / 25

Virtual files

Figure: Read test for virtual files

22 / 25

Virtual files (2)

Figure: Write test for virtual files

23 / 25

Virtual files (3)

Figure: Read and write for virtual files

24 / 25

Conclusion

Software project goal

  • mapping filesystem and database sources in one namespace can be solved by a FUSE implementation
  • good performance for physical files (stored on the underlying filesystem)
  • the bottleneck for virtual files is not the database access itself
  • whether to use this implementation must be decided for each concrete use case

Future work

  • implementation issues (SQL injection, dynamic virtualization layers)

25 / 25

Literature

  • ROFS, the Read-Only Filesystem for FUSE. http://mattwork.potsdam.edu/projects/wiki/index.php/Rofs
  • The IEEE; The Open Group: The Open Group Base Specifications Issue 6. http://www.opengroup.org/onlinepubs/009695399/functions/contents.html
  • Sun Microsystems: MySQL 6.0 Reference Manual. http://dev.mysql.com/doc/refman/6.0/en/index.html
  • Sourceforge.net: Main Page - fuse. http://apps.sourceforge.net/mediawiki/fuse/index.php?title=Main_Page

25 / 25