File Organisation Part - II Dr. V. V. Subrahmanyam Associate - - PowerPoint PPT Presentation



slide-1
SLIDE 1

File Organisation Part - II

  • Dr. V. V. Subrahmanyam

Associate Professor, SOCIS, IGNOU

slide-2
SLIDE 2

Heap File Organisation

  • The simplest file structure is an unordered file, or heap file.
  • The data in the pages of a heap file is not ordered.
  • Every record in the file has a unique rid (record id), and every page in the file is of the same size.

slide-3
SLIDE 3

Contd…

  • Records are inserted at the end of the file as and when they arrive.
  • Once the data block is full, the next record is stored in a new block. This new block need not be the very next block.
  • This method can select any block in the memory to store the new records.

slide-4
SLIDE 4

Contd…

  • It is similar to the pile file in the sequential method, but here data blocks are not selected sequentially.
  • They can be any data blocks in the memory.
  • It is the responsibility of the DBMS to store the records and manage them.

slide-5
SLIDE 5

Supported Operations on Heap Files

  • Create
  • Destroy
  • Insert a record with a given rid
  • Delete a record with a given rid
  • Get a record with a given rid
  • Scan all records in the file
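
The operations listed above can be sketched as a toy in-memory structure. This is only an illustration under stated assumptions: the class name `HeapFile`, the `slots_per_page` parameter, and the choice of a `(page_no, slot_no)` pair as the rid are hypothetical, and here `insert` chooses the rid and returns it rather than receiving one.

```python
class HeapFile:
    """Toy in-memory heap file; a rid is a (page_no, slot_no) pair (assumption)."""

    def __init__(self, slots_per_page=2):
        # "Create": every page has the same number of slots.
        self.slots_per_page = slots_per_page
        self.pages = []  # each page is a list of slots; None marks an empty slot

    def destroy(self):
        """Drop all pages of the file."""
        self.pages = []

    def insert(self, record):
        """Put the record in the first empty slot found and return its rid."""
        for page_no, page in enumerate(self.pages):
            for slot_no, slot in enumerate(page):
                if slot is None:
                    page[slot_no] = record
                    return (page_no, slot_no)
        # No empty slot anywhere: allocate a new page.
        self.pages.append([None] * self.slots_per_page)
        self.pages[-1][0] = record
        return (len(self.pages) - 1, 0)

    def get(self, rid):
        """Return the record stored at rid, or None if the slot is empty."""
        page_no, slot_no = rid
        return self.pages[page_no][slot_no]

    def delete(self, rid):
        """Empty the slot identified by rid."""
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = None

    def scan(self):
        """Yield (rid, record) for every stored record, in page order."""
        for page_no, page in enumerate(self.pages):
            for slot_no, record in enumerate(page):
                if record is not None:
                    yield (page_no, slot_no), record
```

Note that the scan order reflects where records happen to sit, not the order of any key, which is what "unordered" means here.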
slide-6
SLIDE 6

Two alternative ways

  • Linked list of pages
  • Directory of pages

In each of these alternatives, pages must hold two pointers (which are page ids) for file-level bookkeeping, in addition to the data.

slide-7
SLIDE 7

Linked List of Pages

  • One possibility is to maintain a heap file as a doubly linked list of pages.
  • The DBMS can remember where the first page is located by maintaining a table containing pairs of (heap_file_name, page_1_address).
  • The first page of the file is known as the header page.

slide-8
SLIDE 8

Contd…

  • An important task is to maintain information about empty slots created by deleting a record from the heap file.
  • This task has 2 distinct parts:

– How to keep track of free space within a page?
– How to keep track of pages that have some free space?

The second part can be addressed by maintaining 2 doubly linked lists: (i) one of pages with free space and (ii) one of full pages.

slide-9
SLIDE 9

Contd…

  • If a new page is required, it is obtained by making a request to the disk space manager and then added to the list of pages in the file.
  • If a page is deleted from the heap file, it is removed from the list and the disk space manager is told to deallocate it.

slide-10
SLIDE 10

Heap File Organisation with Doubly Linked Lists

[Figure: the Header Page heads two linked lists, a linked list of pages with free space (the Free Pages) and a linked list of full pages (Data Page 1, Data Page 2, ..., Data Page N).]
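
The two-list arrangement can be sketched as follows. This is a minimal sketch, not a definitive implementation: the names `Page`, `LinkedHeapFile`, `free_head` and `full_head` are hypothetical, a dictionary stands in for the disk, and each page is assumed to hold a fixed number of records. Each page carries the two page-id pointers (`prev`, `next`) used for file-level bookkeeping.

```python
class Page:
    def __init__(self, page_id, capacity):
        self.page_id = page_id
        self.capacity = capacity   # records per page (simplifying assumption)
        self.records = []
        self.prev = None           # page id of the previous page in its list
        self.next = None           # page id of the next page in its list


class LinkedHeapFile:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.pages = {}            # stands in for the disk: page id -> Page
        self.free_head = None      # header pointer: first page with free space
        self.full_head = None      # header pointer: first full page
        self.next_id = 0

    def _unlink(self, page, head_attr):
        """Remove page from the doubly linked list whose head is head_attr."""
        if page.prev is None:
            setattr(self, head_attr, page.next)
        else:
            self.pages[page.prev].next = page.next
        if page.next is not None:
            self.pages[page.next].prev = page.prev
        page.prev = page.next = None

    def _push(self, page, head_attr):
        """Add page at the front of the list whose head is head_attr."""
        head = getattr(self, head_attr)
        page.prev, page.next = None, head
        if head is not None:
            self.pages[head].prev = page.page_id
        setattr(self, head_attr, page.page_id)

    def insert(self, record):
        if self.free_head is None:
            # Stand-in for a request to the disk space manager.
            page = Page(self.next_id, self.capacity)
            self.pages[page.page_id] = page
            self.next_id += 1
            self._push(page, "free_head")
        page = self.pages[self.free_head]
        page.records.append(record)
        if len(page.records) == page.capacity:
            # The page just became full: move it to the other list.
            self._unlink(page, "free_head")
            self._push(page, "full_head")
        return page.page_id

    def delete(self, page_id, record):
        page = self.pages[page_id]
        was_full = len(page.records) == page.capacity
        page.records.remove(record)
        if was_full:
            self._unlink(page, "full_head")
            self._push(page, "free_head")

    def list_ids(self, head_attr):
        """Walk one list from its head and return the page ids in order."""
        ids, cur = [], getattr(self, head_attr)
        while cur is not None:
            ids.append(cur)
            cur = self.pages[cur].next
        return ids
```

The sketch leaves out deallocating a page that becomes empty; in a fuller version that step would unlink the page and hand it back to the disk space manager.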

slide-11
SLIDE 11

Disadvantage

  • Virtually all pages in a file will be on the free list if records are of variable length. To insert a typical record, we must retrieve and examine several pages on the free list before we find one with enough free space.
  • This is overcome in the directory-based heap file organisation.

slide-12
SLIDE 12

Directory of Pages

  • An alternative technique is to maintain a directory of pages.
  • The DBMS must remember where the first directory page of each heap file is located.
  • The directory is itself a collection of pages.
  • Each directory entry identifies a page in the heap file.

slide-13
SLIDE 13

Contd…

  • As the heap file grows or shrinks, the number of entries in the directory grows or shrinks correspondingly.
  • Free space can be managed by maintaining a bit per entry, indicating whether the corresponding page has any free space, or a count per entry, indicating the amount of free space on the page.

slide-14
SLIDE 14

Heap File Organisation with a Directory

[Figure: the Header Page starts the Directory, whose entries point to Data Page 1, Data Page 2, ..., Data Page N.]
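
A directory with a free-space count per entry might look like the sketch below. The names `DirectoryHeapFile` and `page_size` are hypothetical, records are assumed to be byte strings whose length is their storage cost, and the directory is kept as a plain list rather than as a collection of pages. The point illustrated is that an insert consults only the directory, not the data pages, to find a page with enough room.

```python
class DirectoryHeapFile:
    def __init__(self, page_size=100):
        self.page_size = page_size
        self.pages = []       # data pages: each a list of byte-string records
        self.directory = []   # one entry per data page: free bytes remaining

    def insert(self, record: bytes):
        need = len(record)
        if need > self.page_size:
            raise ValueError("record does not fit in a page")
        # Search the directory entries only; no data page is examined.
        for page_no, free in enumerate(self.directory):
            if free >= need:
                break
        else:
            # No page has enough room: add a page and a directory entry.
            self.pages.append([])
            self.directory.append(self.page_size)
            page_no = len(self.pages) - 1
        self.pages[page_no].append(record)
        self.directory[page_no] -= need
        return (page_no, len(self.pages[page_no]) - 1)
```

With the count-per-entry variant shown here, variable-length records can be placed without reading candidate pages first; the bit-per-entry variant would only say whether a page has any free space at all.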

slide-15
SLIDE 15

Multikey File Organisation

  • Allows records to be accessed by more than one key field.
  • The ability to search on many keys is enabled by building multiple index files "on top of" the data file.
  • The physical DB consists of one or more data files and many index files, and each data file contains either one or several record types.

slide-16
SLIDE 16

Two Approaches

  • Multilist file organisation
  • Inverted file organisation
slide-17
SLIDE 17

Contd…

  • An index for each secondary key.
  • An index entry for each distinct value of the secondary key.
  • The index may be tabular or tree-structured.
  • The entries in an index may or may not be sorted.
  • The pointers to data records may be direct or indirect.

slide-18
SLIDE 18

Contd…

  • The indexes differ in that:

– An entry in an inverted index has a pointer to each data record with that value.
– An entry in a multilist index has a pointer to the first data record with that value.

slide-19
SLIDE 19

Contd…

  • An inverted index may have variable-length entries, whereas a multilist index has fixed-length entries.
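
The difference between the two indexes can be sketched on a small made-up data file. Everything here is illustrative: the sample records, the function names, and the use of list positions as record pointers are assumptions. In the multilist version, the records with the same value are chained through a per-record "next" pointer, so each index entry stays a single pointer.

```python
# Illustrative data file; list positions act as record pointers.
records = [
    {"name": "Asha", "city": "Delhi"},
    {"name": "Ravi", "city": "Pune"},
    {"name": "Meena", "city": "Delhi"},
    {"name": "Kiran", "city": "Delhi"},
]


def build_inverted(records, key):
    """Inverted index: value -> list of pointers to every matching record."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[key], []).append(pos)
    return index


def build_multilist(records, key):
    """Multilist index: value -> pointer to the first matching record.

    next_ptr[pos] links each record to the next record with the same value.
    """
    index = {}
    next_ptr = [None] * len(records)
    last = {}  # value -> position of the most recent record with that value
    for pos, rec in enumerate(records):
        value = rec[key]
        if value in last:
            next_ptr[last[value]] = pos
        else:
            index[value] = pos
        last[value] = pos
    return index, next_ptr


def multilist_lookup(index, next_ptr, value):
    """Follow the chain starting at the index entry for value."""
    found, pos = [], index.get(value)
    while pos is not None:
        found.append(pos)
        pos = next_ptr[pos]
    return found
```

The inverted entry for "Delhi" holds three pointers while the entry for "Pune" holds one, which is why inverted entries are variable-length; every multilist entry holds exactly one pointer.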

slide-20
SLIDE 20

Hash / Direct File Organisation

  • A hash function is used to calculate the address of the block in which to store the records.
  • The hash function can be any simple or complex mathematical function.
  • The hash function is applied to some columns/attributes, either key or non-key columns, to get the block address.
  • Hence each record is stored at an effectively random location, irrespective of the order in which the records arrive.

slide-21
SLIDE 21

Contd…

  • This method is also known as direct or random file organisation.
  • If the hash function is applied to a key column, that column is called the hash key; if it is applied to a non-key column, that column is called the hash column.

slide-22
SLIDE 22

Contd…

  • When a record has to be retrieved, the address is generated from the hash key column and the whole record is retrieved directly from that address. There is no need to traverse the whole file.
  • Similarly, when a new record has to be inserted, the address is generated from the hash key and the record is inserted directly. The same is the case with update and delete.
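
Retrieval and insertion by computed address can be sketched as below. This is a minimal sketch under stated assumptions: the class name `HashFile`, integer keys, the hash function "key modulo number of blocks", and letting several records share one block are all illustrative choices, not something the slides prescribe.

```python
class HashFile:
    def __init__(self, num_blocks=4):
        # Each block holds (key, record) pairs that hash to its address.
        self.blocks = [[] for _ in range(num_blocks)]

    def address(self, key):
        """Illustrative hash function: integer key modulo number of blocks."""
        return key % len(self.blocks)

    def insert(self, key, record):
        """Compute the block address from the key and store the record there."""
        self.blocks[self.address(key)].append((key, record))

    def get(self, key):
        """Look only in the block the hash function points to."""
        for k, record in self.blocks[self.address(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        """Remove the record with this key from its block."""
        addr = self.address(key)
        self.blocks[addr] = [(k, r) for k, r in self.blocks[addr] if k != key]
```

Every operation touches exactly one block, whichever key is given, so no traversal of the rest of the file is needed.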

slide-23
SLIDE 23

Advantages

  • Records need not be sorted after any transaction. Hence the effort of sorting is avoided in this method.
  • Since the block address is given by the hash function, accessing any record is very fast. Similarly, updating or deleting a record is also very quick.
  • This method can handle multiple transactions: since the storage location of each record is independent of the others, multiple records can be accessed at the same time.
  • It is suitable for online transaction systems such as online banking and ticket booking systems.

slide-24
SLIDE 24

Disadvantages

  • Since all the records are randomly stored, they are scattered in the memory. Hence memory is not efficiently used.
  • If we are searching for a range of data, then this method is not suitable, because each record is stored at a random address. A range of values therefore does not map to a contiguous range of addresses, and searching will be inefficient.
  • Searching for records with an exact name or value will be efficient. Searching for student names starting with ‘B’ will not be efficient, as the query does not give the exact name of the student.
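
The contrast between exact and range searches can be made concrete by counting the blocks each one has to read. The sketch below is illustrative only: the roll numbers 1 to 12, the four blocks, and the modulo hash function are made-up assumptions.

```python
NUM_BLOCKS = 4
blocks = [[] for _ in range(NUM_BLOCKS)]


def address(roll_no):
    """Illustrative hash function: roll number modulo number of blocks."""
    return roll_no % NUM_BLOCKS


# Load roll numbers 1..12; consecutive values land in different blocks.
for roll_no in range(1, 13):
    blocks[address(roll_no)].append(roll_no)


def exact_search(roll_no):
    """Return (found, blocks_read); only the hashed block is read."""
    return roll_no in blocks[address(roll_no)], 1


def range_search(lo, hi):
    """Return (matches, blocks_read); every block must be read."""
    hits, blocks_read = [], 0
    for block in blocks:
        blocks_read += 1
        hits.extend(r for r in block if lo <= r <= hi)
    return sorted(hits), blocks_read
```

Here roll numbers 5, 6 and 7 sit in three different blocks, so the range query cannot be answered from one address and falls back to reading the whole file.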

slide-25
SLIDE 25
  • This method is efficient only when the search is done on the hash column. Otherwise, it will not be able to find the correct address of the data.
  • If multiple hash columns (say, the name and phone number of a person) are used together to generate the address, then searching for a record using the phone number or the name alone will not give correct results.
  • If these hash columns are frequently updated, then the data block address also changes accordingly. Each update will generate a new address.
  • The hardware and software required for memory management are costlier in this case.
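
The points about multiple hash columns and updates can be sketched as follows. All names, the sample name and phone values, and the hash function (sum of the bytes of name plus phone, modulo the number of blocks) are made-up assumptions chosen only to keep the example deterministic.

```python
NUM_BLOCKS = 8
blocks = [[] for _ in range(NUM_BLOCKS)]


def address(name, phone):
    """Illustrative hash over both columns: byte sum modulo number of blocks."""
    return sum((name + phone).encode("utf-8")) % NUM_BLOCKS


def insert(name, phone):
    blocks[address(name, phone)].append((name, phone))


def find(name, phone):
    """With both hash columns known, only one block is examined."""
    return (name, phone) in blocks[address(name, phone)]


def find_by_name(name):
    """The address cannot be computed from the name alone: scan every block."""
    return [rec for block in blocks for rec in block if rec[0] == name]


def update_phone(name, old_phone, new_phone):
    """Changing a hash column means removing the record and re-inserting it."""
    blocks[address(name, old_phone)].remove((name, old_phone))
    insert(name, new_phone)
```

In this sketch a lookup by name alone still works, but only by giving up the hash and scanning the file, and a change to the phone number moves the record to whatever block the new address names.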