Outline ! Introduction ! Basic - - PowerPoint PPT Presentation


slide-1
SLIDE 1

File Structures

An Introduction

Somchai Prasitjutrakul (สมชาย ประสิทธิ์จูตระกูล)

slide-2
SLIDE 2

rasitjutrakul

Outline

• Introduction
• Basic Concepts
• Secondary Storage
• Sequential Files
• Direct Files
• Indexed Files
• Tree-Based Files
• Multilist & Inverted Files

slide-3
SLIDE 3

Managing Large Quantities of Data

• Accessed by multiple people and programs
• Kept on external storage devices
• Always reliably available for processing
• Rapidly accessible when information is needed

slide-4
SLIDE 4

Speed & Capacity

• Disks are slow.
  – RAM ≈ 100 ns
  – Disk ≈ 10 ms
• Disks provide enormous capacity.
  – RAM ≈ 10 MB (volatile)
  – Disk ≈ 1000 MB (nonvolatile)
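
To make the gap concrete, the access-time and capacity figures quoted on this slide can be put side by side. The symbols t and C are introduced here only for illustration; the numbers are the slide's own.

```latex
\frac{t_{\text{disk}}}{t_{\text{RAM}}} \approx \frac{10\ \text{ms}}{100\ \text{ns}}
  = \frac{10^{-2}\ \text{s}}{10^{-7}\ \text{s}} = 10^{5},
\qquad
\frac{C_{\text{disk}}}{C_{\text{RAM}}} \approx \frac{1000\ \text{MB}}{10\ \text{MB}} = 10^{2}
```

With these figures a single disk access costs about as much as 100,000 memory accesses, while the disk holds about 100 times more data.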

slide-5
SLIDE 5

Design Goal

Minimizing disk accesses for files that keep changing in content and size.

slide-6
SLIDE 6

A Short History

• 1950s: Sequential access + indexes
• 1960s: Tree structures
• 1970s: B-tree
• 1980s: Extendible hashing

slide-7
SLIDE 7

Basic Concepts: Outline

• Files
• Records, Fields
• Keys
• Users
• File Processing
• File Design

slide-8
SLIDE 8

Filing System

• Persistence
• Sharability
• Size

slide-9
SLIDE 9

Files

• Savings Account File
• Checking Accounts File
• Loan Applications File
• Employee File

slide-10
SLIDE 10

Records

Checking Accounts File

Account      Name       Address                     Balance
018-745-96   Thongdee   36 Sathon, 10600          25,250.93
108-964-09   Dundee     488 Rama 4, 10330          2,252.00
116-057-43   Yudee      56 Chareonkrung, 10210    99,768.25
248-922-88   Wangdee    102 Bantadthong, 10330   125,899.29
741-673-76   Dundee     77 Saphanluang, 10330        232.48

Each row of the table is one record.

slide-11
SLIDE 11

Fields

Checking Accounts File

Account      Name       Address                     Balance
018-745-96   Thongdee   36 Sathon, 10600          25,250.93
108-964-09   Dundee     488 Rama 4, 10330          2,252.00
116-057-43   Yudee      56 Chareonkrung, 10210    99,768.25
248-922-88   Wangdee    102 Bantadthong, 10330   125,899.29
741-673-76   Dundee     77 Saphanluang, 10330        232.48

Each column of the table is one field.

slide-12
SLIDE 12

Files & Records

• A file is a collection of records of the same type.
• A record is a collection of related fields.

slide-13
SLIDE 13

Keys

Find the Balance of [ Account = 116-057-43 ]

• Locate the Checking Accounts file.
• Access the record whose Account field contains 116-057-43.
• Retrieve the record from the file.
• Examine the contents of the Balance field.

slide-14
SLIDE 14

Keys

A key is a field of a record whose contents identify the record.

Find the Balance of [ Account = 116-057-43 ]
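
The query above can be sketched in C as a search on the key field. This is a minimal in-memory sketch, not the deck's own code: the `Account` type, the `find_by_account` helper, and the choice of a string for the account number are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory record; the field names are illustrative. */
typedef struct {
    char   account[11];   /* key field, e.g. "116-057-43" */
    char   name[20];
    double balance;
} Account;

/* Return the record whose key field equals `key`, or NULL if none does. */
const Account *find_by_account(const Account *file, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(file[i].account, key) == 0)
            return &file[i];
    }
    return NULL;
}
```

Calling `find_by_account(file, n, "116-057-43")` and reading the `balance` member of the result corresponds to accessing the record and examining its Balance field.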

slide-15
SLIDE 15

Primary Keys

Account      Name       Address                     Balance
018-745-96   Thongdee   36 Sathon, 10600          25,250.93
108-964-09   Dundee     488 Rama 4, 10330          2,252.00
116-057-43   Yudee      56 Chareonkrung, 10210    99,768.25
248-922-88   Wangdee    102 Bantadthong, 10330   125,899.29
741-673-76   Rakdee     77 Saphanluang, 10330        232.48

Primary key: Account

A primary key is a field that uniquely identifies the record.

slide-16
SLIDE 16

Secondary Keys

Account      Name       Address                     Balance
018-745-96   Thongdee   36 Sathon, 10600          25,250.93
108-964-09   Dundee     488 Rama 4, 10330          2,252.00
116-057-43   Yudee      56 Chareonkrung, 10210    99,768.25
248-922-88   Wangdee    102 Bantadthong, 10330   125,899.29
741-673-76   Rakdee     77 Saphanluang, 10330        232.48

Secondary key: Name

A secondary key is a field that identifies records, but not uniquely: several records may share the same value.
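
Because a secondary key is not unique, a lookup on it has to be prepared to return several records. The following is a minimal sketch under assumptions: the `Account` type and the `find_by_name` helper are hypothetical names, and the file is held in memory as an array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory record; the field names are illustrative. */
typedef struct {
    char   account[11];   /* primary key: unique */
    char   name[20];      /* secondary key: may repeat */
    double balance;
} Account;

/* Store the positions of up to `max` records whose name equals `name`
   in `out`; return the total number of matching records. */
size_t find_by_name(const Account *file, size_t n, const char *name,
                    size_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(file[i].name, name) == 0) {
            if (found < max)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

A primary-key lookup can stop at the first match; this one must scan the whole file, since any later record may carry the same name.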

slide-17
SLIDE 17

Users

• End-users
• Application programmers
• System programmers

slide-18
SLIDE 18

File Processing Systems

[Figure: an end-user request, "Retrieve Balance of Account = 116-057-43", goes to the Checking Accounts File Processing System (application programmers), which uses the File System (system programmers) to reach the Checking Accounts File; the response 99,768.25 is returned to the end-user.]

slide-19
SLIDE 19

Users' Concerns

• End-users
  – receive accurate information.
• Application programmers
  – are aware of the file organization, record structure, and access mechanisms.
• System programmers
  – are aware of the available tools and resources to enhance the file system efficiency.

slide-20
SLIDE 20

Data Transfer

• Logical record: the application programmers' view of the records
• Physical block: the system programmers' view of the records

[Figure: the Application Program issues a logical READ to the File System, which performs a physical READ.]

slide-21
SLIDE 21

Logical Records

typedef struct customerTag {
    int   iAccount;
    char  szName[20];
    char  szAddress[50];
    float fBalance;
} recCustomer;

recCustomer CustomerRecord;

Record layout: iAccount | szName | szAddress | fBalance
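
One way an application program might read and write such a logical record is with fixed-length record I/O from the C standard library. This is a sketch under assumptions, not the deck's method: `write_customer` and `read_customer` are hypothetical helpers, and records are stored as raw structs (including any compiler padding), so the file is only readable by a program built with the same layout.

```c
#include <stdio.h>

typedef struct customerTag {
    int   iAccount;
    char  szName[20];
    char  szAddress[50];
    float fBalance;
} recCustomer;

/* Write one logical record at record position `pos` (0-based).
   Returns 1 on success, 0 on failure. */
int write_customer(FILE *fp, long pos, const recCustomer *rec)
{
    if (fseek(fp, pos * (long)sizeof *rec, SEEK_SET) != 0)
        return 0;
    return fwrite(rec, sizeof *rec, 1, fp) == 1;
}

/* Read the logical record at record position `pos` (0-based).
   Returns 1 on success, 0 if the record does not exist or on error. */
int read_customer(FILE *fp, long pos, recCustomer *rec)
{
    if (fseek(fp, pos * (long)sizeof *rec, SEEK_SET) != 0)
        return 0;
    return fread(rec, sizeof *rec, 1, fp) == 1;
}
```

The program deals only in whole records and record positions; how those bytes are grouped into blocks on disk is left to the layers underneath.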

slide-22
SLIDE 22

Physical Blocks

[Figure: a physical block consists of system data followed by a logical block holding logical records #1, #2, and #3.]

Blocking factor: the number of logical records stored in one physical block (3 in the figure).
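
For fixed-length records that are not split across blocks, the blocking factor follows from the block size, the space taken by system data, and the record size. The sketch below assumes exactly that situation; the function name `blocking_factor` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Number of whole logical records that fit in one physical block once
   `system_bytes` of system data are set aside. Assumes fixed-length
   records that never span two blocks. */
size_t blocking_factor(size_t block_size, size_t system_bytes,
                       size_t record_size)
{
    if (record_size == 0 || system_bytes > block_size)
        return 0;
    return (block_size - system_bytes) / record_size;
}
```

For instance, a 512-byte block with 12 bytes of system data holds five 100-byte records; these sizes are made up for the example.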

slide-23
SLIDE 23

Blocking & Deblocking

• Deblocking: a physical read brings a physical block into the input buffer, and logical records are extracted from it one at a time.
• Blocking: logical records are collected in the output buffer, and a physical write stores them as one physical block.
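
Blocking and deblocking can be sketched with a one-block buffer placed between the program and the file. Everything named here is an assumption for illustration: `BlockBuffer`, `write_logical`, `read_logical`, `flush_block`, the 16-byte record size, and the blocking factor of 4. The calls to `fread` and `fwrite` stand in for physical reads and writes; the C library adds buffering of its own underneath.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE      16   /* bytes per logical record (assumed) */
#define BLOCKING_FACTOR  4    /* records per physical block (assumed) */

typedef struct {
    FILE  *fp;
    char   buf[RECORD_SIZE * BLOCKING_FACTOR];  /* one physical block */
    size_t count;       /* records currently held in buf */
    size_t next;        /* next record to hand out (input side only) */
    long   transfers;   /* physical reads or writes performed so far */
} BlockBuffer;

void buffer_init(BlockBuffer *b, FILE *fp)
{
    b->fp = fp;
    b->count = 0;
    b->next = 0;
    b->transfers = 0;
}

/* Blocking: write the buffered records out as one physical block. */
int flush_block(BlockBuffer *b)
{
    if (b->count == 0)
        return 1;
    if (fwrite(b->buf, RECORD_SIZE, b->count, b->fp) != b->count)
        return 0;
    b->transfers++;
    b->count = 0;
    return 1;
}

/* Blocking: add one logical record; write the block once it is full. */
int write_logical(BlockBuffer *b, const void *record)
{
    memcpy(b->buf + b->count * RECORD_SIZE, record, RECORD_SIZE);
    b->count++;
    return b->count == BLOCKING_FACTOR ? flush_block(b) : 1;
}

/* Deblocking: hand out the next logical record, doing a physical read
   only when the input buffer is exhausted. Returns 0 at end of file. */
int read_logical(BlockBuffer *b, void *record)
{
    if (b->next == b->count) {
        size_t n = fread(b->buf, RECORD_SIZE, BLOCKING_FACTOR, b->fp);
        if (n == 0)
            return 0;
        b->transfers++;
        b->count = n;
        b->next = 0;
    }
    memcpy(record, b->buf + b->next * RECORD_SIZE, RECORD_SIZE);
    b->next++;
    return 1;
}
```

Writing ten records through `write_logical` followed by a final `flush_block` performs three physical writes instead of ten, and reading them back through `read_logical` performs three physical reads.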

slide-24
SLIDE 24

Disk Caching

[Figure: the data path runs from user space through the buffer and the disk cache to the disk; records are exchanged on the user-space side, blocks on the disk side.]

slide-25
SLIDE 25

Blocking Factor

• Blocking factor vs. number of block transfers
• Blocking factor vs. buffer size
• Optimal blocking factor

If the blocking factor were equal to the number of logical records in the file, one could argue that only one data transfer would be needed!
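
The trade-off can be put in numbers: a larger blocking factor means fewer block transfers but a larger buffer. The two helpers below are hypothetical names, and the sketch assumes fixed-length records read sequentially from start to end.

```c
#include <stddef.h>

/* Block transfers needed to read `n_records` sequentially when each
   block holds `bf` records: the ceiling of n_records / bf. */
size_t block_transfers(size_t n_records, size_t bf)
{
    if (bf == 0)
        return 0;
    return (n_records + bf - 1) / bf;
}

/* Buffer space, in bytes, needed to hold one block of `bf` records. */
size_t buffer_bytes(size_t bf, size_t record_size)
{
    return bf * record_size;
}
```

For a file of 1000 records of 100 bytes each (made-up sizes), a blocking factor of 10 needs 100 transfers and a 1000-byte buffer, while a blocking factor of 1000 needs a single transfer but a 100,000-byte buffer.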
slide-26
SLIDE 26

Logical & Physical File Structure

• Logical file structure
  – The organization of all logical records in the file.
• Physical file structure
  – The organization of all the physical blocks stored in secondary storage.

slide-27
SLIDE 27

Logical & Physical File Structure

[Figure: a sequential file of records 1 to 50 (key 1 to key 50) shown beside a physical linked sequential file, in which the same records appear in a different physical order (for example, records 49 and 50 before records 47 and 48).]

slide-28
SLIDE 28

Access Path

[Figure: a multilevel index leading to the data records.]

Index, top level:     31 70 90 130 162 200 250
Index, second level:  5 9 15 19 23 27 31 | 39 42 49 53 60 65 70
Data records:         2 Somchai P. ..., 3 Somboon T. ..., 5 Chukiat V. ..., 7 Samruay W. ..., 8 Supat R. ..., 9 Chatchart S. ..., 12 Kukiat R. ..., 14 Wiwat W. ..., 15 Boonchai S. ..., 34 Yingyong E. ..., 35 Rangsan S. ..., 39 Kriengkai F. ...
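
The access path through such an index can be sketched as one search per level. This is a reading of the figure rather than the deck's own algorithm: it assumes each index entry is the highest key in the node or block beneath it, it includes only the two second-level nodes whose entries are visible, and the names `first_ge` and `find_block` are hypothetical.

```c
#define FANOUT 7   /* entries per second-level index node, as in the figure */

/* Index entries transcribed from the figure. Assumption: each entry is
   the highest key stored in the node or block beneath it. */
static const int top_level[] = { 31, 70 };
static const int second_level[][FANOUT] = {
    {  5,  9, 15, 19, 23, 27, 31 },
    { 39, 42, 49, 53, 60, 65, 70 },
};

/* Position of the first entry >= key, or -1 if every entry is smaller. */
static int first_ge(const int *entries, int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (entries[i] >= key)
            return i;
    }
    return -1;
}

/* Follow the access path for `key`. Returns the number of the data block
   that would hold it (0-based, counted across both second-level nodes),
   or -1 if the key is beyond the indexed range. */
int find_block(int key)
{
    int node = first_ge(top_level, 2, key);
    if (node < 0)
        return -1;
    int slot = first_ge(second_level[node], FANOUT, key);
    if (slot < 0)
        return -1;
    return node * FANOUT + slot;
}
```

For key 8, the top level selects the entry 31, the second level selects the entry 9, and the search ends at data block 1, which in the figure holds keys 7, 8, and 9.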

slide-30
SLIDE 30

Access Methods

[Figure: an access method operates on the physical file structure to reach the target record.]

slide-31
SLIDE 31

Classification of Access Methods

Access methods
• Primary access methods
  – Sequential access methods
    • Sequential
  – Random access methods
    • Direct
    • Hash
    • Indexed sequential
    • Binary search
    • AVL-tree
    • Paged tree
    • B-tree
    • B+-tree
    • Trie
• Secondary access methods
  – Inverted file
  – Cellular inverted
  – Multilist
  – Cellular multilist

slide-32
SLIDE 32

File Design

• Logical file design
  – select one of the available file organizations
  – design a new file organization
• Physical file design
  – design the physical file

slide-33
SLIDE 33

File Design

• Selection of blocking factor
• Allocation of the I/O buffers
• Size of the physical file
• Organization of the physical blocks
• Design or selection of the access method
• Selection of the primary key
• File growth
• Reorganization point

slide-34
SLIDE 34

File Operations

• RetrieveAll
• Batch
• RetrieveOne
• RetrieveNext
• RetrievePrevious
• InsertOne
• DeleteOne
• UpdateOne
• RetrieveFew

slide-35
SLIDE 35

Performance

• Response time
  – The type of allowable operations.
  – The frequency of each type of operation.
    • Ex. 95% Retrieve_One, 5% Batch: random or sequential?

Search length
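
One way to reason about such a mix is to weight the cost of each operation by its frequency. The sketch below is illustrative only: it measures cost in records examined, uses the standard result that a successful sequential search over n equally likely records examines (n + 1) / 2 of them on average, and assumes a Batch run examines all n records. Both function names are hypothetical.

```c
/* Average number of records examined by a successful sequential search
   over n records, each equally likely to be the target: (n + 1) / 2. */
double avg_sequential_search_length(long n)
{
    return (n + 1) / 2.0;
}

/* Expected cost per operation for a workload made of two operation types,
   given each type's frequency (as a fraction) and its cost. */
double expected_cost(double f_one, double cost_one,
                     double f_batch, double cost_batch)
{
    return f_one * cost_one + f_batch * cost_batch;
}
```

For a 50-record sequential file, `avg_sequential_search_length(50)` is 25.5, and a 95% / 5% mix comes to `expected_cost(0.95, 25.5, 0.05, 50.0)`, about 26.7 records per operation. Repeating the calculation with the per-operation costs of a random-access organization gives the figure to compare against.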