Dec 11, 2023

SLIDE 1

File Organisation Part - II

Dr. V. V. Subrahmanyam

Associate Professor, SOCIS, IGNOU

SLIDE 2

Heap File Organisation

The simplest file structure is an unordered file, or heap file.

The data in the pages of a heap file is not ordered.

Every record in the file has a unique record id (rid), and every page in the file is of the same size.

SLIDE 3

Contd…

Records are inserted at the end of the file as and when they arrive.

Once a data block is full, the next record is stored in a new block. This new block need not be the very next block.

This method can select any block in the memory to store the new records.

SLIDE 4

Contd…

It is similar to the pile file in the sequential method, but here data blocks are not selected sequentially.

They can be any data blocks in the memory.

It is the responsibility of the DBMS to store the records and manage them.

SLIDE 5

Supported Operations on Heap Files

Create
Destroy
Insert a record with a given rid
Delete a record with a given rid
Get a record with a given rid
Scan all records in the file
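
The operations listed above can be sketched as a toy in-memory heap file. This is a minimal illustration under stated assumptions, not how any particular DBMS implements it: the class name `HeapFile`, the `slots_per_page` parameter and the `(page number, slot number)` rid layout are all hypothetical. Creating and destroying the file correspond to constructing and discarding the object, and `insert` returns the rid it assigns rather than taking one.

```python
from typing import Iterator, List, Optional, Tuple

Rid = Tuple[int, int]  # assumed rid layout: (page number, slot number)


class HeapFile:
    """Toy heap file: every page has the same number of slots, in no order."""

    def __init__(self, slots_per_page: int = 4) -> None:
        self.slots_per_page = slots_per_page
        self.pages: List[List[Optional[bytes]]] = []

    def insert(self, record: bytes) -> Rid:
        # Reuse the first empty slot in any page; otherwise add a new page.
        for page_no, page in enumerate(self.pages):
            for slot_no, slot in enumerate(page):
                if slot is None:
                    page[slot_no] = record
                    return (page_no, slot_no)
        new_page: List[Optional[bytes]] = [None] * self.slots_per_page
        new_page[0] = record
        self.pages.append(new_page)
        return (len(self.pages) - 1, 0)

    def get(self, rid: Rid) -> Optional[bytes]:
        page_no, slot_no = rid
        return self.pages[page_no][slot_no]

    def delete(self, rid: Rid) -> None:
        # Deleting leaves an empty slot that a later insert can reuse.
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = None

    def scan(self) -> Iterator[Tuple[Rid, bytes]]:
        # Visit every occupied slot, page by page.
        for page_no, page in enumerate(self.pages):
            for slot_no, slot in enumerate(page):
                if slot is not None:
                    yield (page_no, slot_no), slot
```

Because records carry no ordering, `get` and `delete` by rid are direct, while any search by value has to fall back on `scan`.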

SLIDE 6

Two alternative ways

Linked list of pages
Directory of pages

Note: In each of these alternatives, pages must hold two pointers (which are page ids) for file-level bookkeeping in addition to the data.

SLIDE 7

Linked List of Pages

One possibility is to maintain a heap file as a doubly linked list of pages.

The DBMS can remember where the first page is located by maintaining a table containing pairs of (heap_file_name, page_1_address).

The first page of the file is known as the header page.

SLIDE 8

Contd…

An important task is to maintain information about empty slots created by deleting a record from the heap file.

This task has 2 distinct parts:

– How to keep track of free space within a page?

– How to keep track of pages that have free space?

The second part can be addressed by maintaining 2 doubly linked lists: (i) one of pages with free space and (ii) one of full pages.

SLIDE 9

Contd…

If a new page is required, it is obtained by making a request to the disk space manager and then added to the list of pages in the file.

If a page is deleted from the heap file, it is removed from the list and the disk space manager is told to deallocate it.
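
The bookkeeping described above can be sketched as follows. This is a simplified illustration, assuming variable-length records measured in bytes. Plain Python lists stand in for the two doubly linked lists, `_allocate_page` stands in for a request to the disk space manager, and all names (`Page`, `LinkedHeapFile`, `page_capacity`) are hypothetical. `insert` returns the number of free-list pages it had to examine, which makes the cost of the search visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    page_id: int
    capacity: int  # bytes available in an empty page
    records: List[bytes] = field(default_factory=list)

    def free_space(self) -> int:
        return self.capacity - sum(len(r) for r in self.records)


class LinkedHeapFile:
    def __init__(self, page_capacity: int = 16) -> None:
        self.page_capacity = page_capacity
        self.free_pages: List[Page] = []  # stands in for the free-space list
        self.full_pages: List[Page] = []  # stands in for the full-page list
        self.next_page_id = 0

    def _allocate_page(self) -> Page:
        # Stand-in for asking the disk space manager for a new page.
        page = Page(self.next_page_id, self.page_capacity)
        self.next_page_id += 1
        self.free_pages.append(page)
        return page

    def insert(self, record: bytes) -> int:
        """Store the record; return how many free-list pages were examined."""
        if len(record) > self.page_capacity:
            raise ValueError("record does not fit in a page")
        examined = 0
        target: Optional[Page] = None
        for page in self.free_pages:
            examined += 1
            if page.free_space() >= len(record):
                target = page
                break
        if target is None:
            target = self._allocate_page()
        target.records.append(record)
        if target.free_space() == 0:
            # A page with no space left moves to the list of full pages.
            self.free_pages.remove(target)
            self.full_pages.append(target)
        return examined
```

With variable-length records, pages tend to stay on the free list with small leftover gaps, so the loop in `insert` may touch several pages before one fits.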

SLIDE 10

Heap File Organisation with Doubly Linked Lists

[Figure: a header page heading two doubly linked lists, a linked list of pages with free space and a linked list of full pages. Labels: Header Page; Free Page (three shown); Data Page 1, Data Page 2, ..., Data Page N.]

SLIDE 11

Disadvantage

Virtually all pages in a file will be on the free list if records are of variable length.

To insert a typical record, we must retrieve and examine several pages on the free list before we find one with enough free space.

This is overcome in the directory-based heap file organisation.

SLIDE 12

Directory of Pages

An alternative technique is to maintain a directory of pages.

The DBMS must remember where the first directory page of each heap file is located.

The directory is itself a collection of pages.

Each directory entry identifies a page in the heap file.

SLIDE 13

Contd…

As the heap file grows or shrinks, the number of entries in the directory grows or shrinks correspondingly.

Free space can be managed by maintaining a bit per entry, indicating whether the corresponding page has any free space, or a count per entry, indicating the amount of free space on the page.
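
The count-per-entry variant can be sketched as below. This is a minimal illustration with hypothetical names (`DirectoryHeapFile`, `page_capacity`), assuming records are measured in bytes and that the directory fits in a single in-memory dictionary rather than a collection of pages. The search for a suitable page consults only the directory counts and does not read any data page.

```python
from typing import Dict, List


class DirectoryHeapFile:
    def __init__(self, page_capacity: int = 16) -> None:
        self.page_capacity = page_capacity
        self.pages: Dict[int, List[bytes]] = {}  # page id -> records
        self.directory: Dict[int, int] = {}      # page id -> free bytes

    def insert(self, record: bytes) -> int:
        """Store the record and return the id of the page that received it."""
        if len(record) > self.page_capacity:
            raise ValueError("record does not fit in a page")
        # Look only at directory entries; no data page is examined here.
        for page_id, free in self.directory.items():
            if free >= len(record):
                break
        else:
            # No existing page has room: add a page and a directory entry.
            page_id = len(self.pages)
            self.pages[page_id] = []
            self.directory[page_id] = self.page_capacity
        self.pages[page_id].append(record)
        self.directory[page_id] -= len(record)
        return page_id
```

A bit per entry would be smaller, but it could only say that a page has some free space, not whether a given record fits, so the count is the more useful choice for variable-length records.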

SLIDE 14

Heap File Organisation with a Directory

[Figure: a header page and a directory whose entries identify Data Page 1, Data Page 2, ..., Data Page N.]

SLIDE 15

Multikey File Organisation

Multikey file organisation allows records to be accessed by more than one key field.

The ability to search on many keys is enabled by building multiple index files "on top of" the data file.

The physical DB consists of one or more data files and many index files, and each data file contains either one or several record types.

SLIDE 16

Two Approaches

Multilist file organisation
Inverted file organisation

SLIDE 17

Contd…

Both approaches have:

An index for each secondary key.

An index entry for each distinct value of the secondary key.

The index may be tabular or tree-structured.

The entries in an index may or may not be sorted.

The pointers to data records may be direct or indirect.

SLIDE 18

Contd…

The indexes differ in that:

– An entry in an inverted index has a pointer to each data record with that value.

– An entry in a multilist index has a pointer to the first data record with that value.

SLIDE 19

Contd…

An inverted index may have variable-length entries, whereas a multilist index has fixed-length entries.
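
The difference between the two index types can be sketched on a small, made-up data file. The records, the `city` secondary key and all function names below are hypothetical. List positions play the role of record addresses, and the multilist "next" pointers are kept in a separate list for simplicity, although in a multilist file they would normally be stored with the data records.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data file: the list position acts as the record address.
records = [
    {"name": "Asha", "city": "Delhi"},
    {"name": "Bala", "city": "Pune"},
    {"name": "Chitra", "city": "Delhi"},
    {"name": "Dev", "city": "Delhi"},
]


def build_inverted(recs: List[dict], key: str) -> Dict[str, List[int]]:
    # One entry per distinct value, holding a pointer to every matching record.
    index: Dict[str, List[int]] = {}
    for addr, rec in enumerate(recs):
        index.setdefault(rec[key], []).append(addr)
    return index


def build_multilist(
    recs: List[dict], key: str
) -> Tuple[Dict[str, int], List[Optional[int]]]:
    # One entry per distinct value, holding a pointer to the first record only;
    # next_ptr[addr] chains to the next record with the same value.
    index: Dict[str, int] = {}
    next_ptr: List[Optional[int]] = [None] * len(recs)
    last_seen: Dict[str, int] = {}
    for addr, rec in enumerate(recs):
        value = rec[key]
        if value in last_seen:
            next_ptr[last_seen[value]] = addr
        else:
            index[value] = addr
        last_seen[value] = addr
    return index, next_ptr


def multilist_lookup(
    index: Dict[str, int], next_ptr: List[Optional[int]], value: str
) -> List[int]:
    # Follow the chain starting from the first record for this value.
    result: List[int] = []
    addr = index.get(value)
    while addr is not None:
        result.append(addr)
        addr = next_ptr[addr]
    return result
```

The inverted entry for a value grows with the number of matching records, which is why its entries are variable-length, while each multilist entry is always a single pointer.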

SLIDE 20

Hash / Direct File Organisation

A hash function is used to calculate the address of the block in which to store the records.

The hash function can be any simple or complex mathematical function.

The hash function is applied to some columns/attributes, either key or non-key columns, to get the block address.

Hence each record is stored at an effectively random address, irrespective of the order in which records arrive.

SLIDE 21

Contd…

This method is also known as Direct or Random file organisation.

If the hash function is applied to a key column, then that column is called the hash key; if it is applied to a non-key column, then that column is called the hash column.

SLIDE 22

Contd…

When a record has to be retrieved based on the hash key column, the address is generated and the whole record is retrieved directly from that address. There is no need to traverse the whole file.

Similarly, when a new record has to be inserted, the address is generated from the hash key and the record is inserted directly. The same is the case with update and delete.
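
A minimal sketch of this scheme is given below, under stated assumptions: the number of blocks (`NUM_BLOCKS`), the use of CRC-32 as the hash function and the class name `HashFile` are illustrative choices, not something the slides prescribe. Each block is modelled as a dictionary, so records that hash to the same block simply share it; overflow handling is left out.

```python
import zlib
from typing import Dict, List, Optional

NUM_BLOCKS = 8  # assumed number of blocks in the file


def block_address(hash_key: str) -> int:
    # CRC-32 is used only as a convenient deterministic hash function.
    return zlib.crc32(hash_key.encode("utf-8")) % NUM_BLOCKS


class HashFile:
    def __init__(self) -> None:
        self.blocks: List[Dict[str, dict]] = [{} for _ in range(NUM_BLOCKS)]

    def insert(self, hash_key: str, record: dict) -> None:
        # The address is computed from the key; no other block is touched.
        self.blocks[block_address(hash_key)][hash_key] = record

    def get(self, hash_key: str) -> Optional[dict]:
        return self.blocks[block_address(hash_key)].get(hash_key)

    def delete(self, hash_key: str) -> None:
        self.blocks[block_address(hash_key)].pop(hash_key, None)
```

Insert, retrieve and delete all go straight to one block, which is the property the slides describe as needing no traversal of the whole file.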

SLIDE 23

Advantages

Records need not be sorted after any transaction. Hence the effort of sorting is avoided in this method.

Since the block address is given by the hash function, accessing any record is very fast. Similarly, updating or deleting a record is also very quick.

This method can handle multiple transactions, because each record is independent of the others. As the storage location of one record does not depend on any other, multiple records can be accessed at the same time.

It is suitable for online transaction systems such as online banking, ticket booking systems etc.

SLIDE 24

Disadvantages

Since all the records are randomly stored, they are scattered in the memory. Hence memory is not efficiently used.

If we are searching for a range of data, then this method is not suitable, because each record is stored at a random address. Hence a range search will not give the correct address range and searching will be inefficient.

Searching for records with an exact name or value will be efficient. Searching for students whose name starts with 'B' will not be efficient, as the search does not give the exact name of the student.
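
The contrast between exact-match and prefix search can be illustrated with a small sketch. The names, the block count and the use of CRC-32 as the hash function are assumptions made for the example. An exact search inspects one block, while a prefix search cannot compute an address from the prefix and has to scan every block.

```python
import zlib
from typing import List

NUM_BLOCKS = 4  # assumed number of blocks


def block_address(name: str) -> int:
    # CRC-32 stands in for whatever hash function the file actually uses.
    return zlib.crc32(name.encode("utf-8")) % NUM_BLOCKS


names = ["Anil", "Bala", "Bhanu", "Chitra", "Bindu", "Dev"]
blocks: List[List[str]] = [[] for _ in range(NUM_BLOCKS)]
for n in names:
    blocks[block_address(n)].append(n)


def exact_search(name: str) -> bool:
    # The full name determines the block, so only that block is inspected.
    return name in blocks[block_address(name)]


def prefix_search(prefix: str) -> List[str]:
    # The prefix alone does not determine a block, so all blocks are scanned.
    return sorted(n for block in blocks for n in block if n.startswith(prefix))
```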

SLIDE 25

This method is efficient only when the search is done on the hash column. Otherwise, it will not be able to find the correct address of the data.

If multiple hash columns are used to generate the address, say the name and phone number of a person, then searching for a record using the phone number or the name alone will not give correct results.

If these hash columns are frequently updated, then the data block address also changes accordingly. Each update will generate a new address.

Hardware and software required for the memory management are costlier in this case.