11. Persistence: The use of files, streams and serialization for storing object model data (PowerPoint presentation)



SLIDE 1
  • 11. Persistence

The use of files, streams and serialization for storing object model data

SLIDE 2

Storing Application Data

  • Without some way of storing data off-line, computers would be virtually unusable
      • Imagine a word processor which forced you to complete a document and print it in a single session – who’d use it?
  • Most programs involve several types of data
      • Status information – e.g. index of current item in a list
      • Convenience information – e.g. location and size of main window (just another type of status information)
      • Object model data – the state of every live object in a running program
  • Some or all of this can be saved, either in a single site or spread among any of the available storage mechanisms

SLIDE 3

Storage Mechanisms

  • Windows Registry
      • Part of the operating system (files owned by the O/S)
      • Good for storing small amounts of data
  • Files
      • Standard way of persisting information
      • Can be highly structured or very simple, depending on the data being stored
  • XML
      • Files again, but based on standards that make it possible for different systems to share the data
  • Databases
      • Very structured and with a lot of program overhead, but very efficient for saving large amounts of data (we will cover this in detail next chapter)

SLIDE 4

The Registry

  • The Windows Registry is a large database that stores data in a hierarchical structure
  • Application data is stored in a tree structure – typically
      • The application name is the top level – e.g. MyProgram
      • A section name groups related items of data together – e.g. RecentFiles
      • A key name specifies a name and a single item of data – e.g. File1=“C:\MyData\Datafile1.dat”
  • The registry is an operating-system-wide resource, and so must be treated with care
      • Do NOT store large amounts of data, because potentially every program in Windows will use the registry
      • Use only the standard functions, GetSetting() and SaveSetting(), for reading and writing

SLIDE 5

e.g. Saving a form’s size and position in the registry

    ' Restore the form's position and size when it loads.
    ' The current value is passed as the default in case nothing was saved.
    Private Sub frmRegistry_Load(ByVal sender As System.Object, _
                                 ByVal e As System.EventArgs) _
                                 Handles MyBase.Load
        Me.Left = CInt(GetSetting("RegDemo", "Position", "Left", _
                                  CStr(Me.Left)))
        Me.Top = CInt(GetSetting("RegDemo", "Position", "Top", _
                                 CStr(Me.Top)))
        Me.Width = CInt(GetSetting("RegDemo", "Size", "Width", _
                                   CStr(Me.Width)))
        Me.Height = CInt(GetSetting("RegDemo", "Size", "Height", _
                                    CStr(Me.Height)))
    End Sub

    ' Save the form's position and size as it closes.
    Private Sub frmRegistry_Closing(ByVal sender As Object, _
                    ByVal e As System.ComponentModel.CancelEventArgs) _
                    Handles MyBase.Closing
        SaveSetting("RegDemo", "Position", "Left", CStr(Me.Left))
        SaveSetting("RegDemo", "Position", "Top", CStr(Me.Top))
        SaveSetting("RegDemo", "Size", "Width", CStr(Me.Width))
        SaveSetting("RegDemo", "Size", "Height", CStr(Me.Height))
    End Sub

SLIDE 6

File Storage

  • All computer data (including registry data and database data) is stored in files if it needs to be persisted
  • Various device types (disks, hard disks, CD-R/Ws, mag-tape, flash cards etc.) have data stored on them by the OS/file system, so that all appear the same to a program – simple file devices
  • There are only 4 basic operations to worry about when using files
      • Opening a file – prepares it for read and write operations
      • Reading from a file – extracts an item of data and moves on, ready to read the next item
      • Writing to a file – inserts new data at the end of the file
      • Closing a file – files that are open are vulnerable to corruption; closing a file puts it into a safe state

SLIDE 7

Files and Streams

  • Because of the way a file works, we can think of it as having a flow of data
      • Data is read from a file in exactly the same order it was written to it
      • The name used to indicate this is a stream (although streams can also be to a network, memory, a modem or other devices)
  • In .NET, most files are treated as streams
      • The StreamReader class defines objects that know how to read from a stream
      • The StreamWriter class defines objects that know how to write to a stream
  • Data sent to a stream can be ambiguous, because there is no automatic way to separate one item from the next
      • e.g. save 10, 20, 30 and 40, and it will be written as 10203040 – all crunched together
  • To deal with this, we use delimiters to mark the end of each item of data
      • CSV – Comma-Separated Values, so the 4 numbers are saved as “10, 20, 30, 40”
      • Other delimiters (e.g. Tab, Space) can be used instead, but comma is normal
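
The four file operations and the comma delimiter can be sketched together as follows. This is a minimal illustration rather than code from the original slides: the module name and the file name numbers.csv are made up, and the Using statement assumes Visual Basic 2005 or later (in earlier versions, call Close() explicitly).

```vbnet
Imports System
Imports System.IO

Module CsvStreamDemo
    Sub Main()
        ' Hypothetical file in the temp folder, used only for this demo.
        Dim fileName As String = Path.Combine(Path.GetTempPath(), "numbers.csv")
        Dim numbers() As Integer = {10, 20, 30, 40}

        ' Open and write: Using closes the file even if an error occurs.
        Using writer As New StreamWriter(fileName)
            Dim items(numbers.Length - 1) As String
            For i As Integer = 0 To numbers.Length - 1
                items(i) = numbers(i).ToString()
            Next
            ' The comma delimiter keeps the items apart: 10,20,30,40
            writer.WriteLine(String.Join(",", items))
        End Using

        ' Open and read: items come back in the order they were written.
        Using reader As New StreamReader(fileName)
            Dim line As String = reader.ReadLine()
            For Each item As String In line.Split(","c)
                Console.WriteLine(Integer.Parse(item))
            Next
        End Using
    End Sub
End Module
```

Without the delimiter the reader would see the single token 10203040 and could not tell where one number ends and the next begins.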

SLIDE 8

Structured Data and Streams

  • Saving objects to a stream brings new problems
      • How to separate the individual object member fields
      • How to separate the different objects
  • Best approach is to precede each object with a header, indicating its class – do this for EVERY class, including individual member variables
  • When reading objects from a stream, start by reading the header, then create the object and read the data into it (up to the next header)
  • This process is called Serialization

Class: BankAccount
    Name : String
    Address : String
    AccountNo : Long
    Balance : Decimal

Object: :BankAccount
    Joe Bloggs
    1 High St., Sometown
    12345678
    £550.00

Stream:
    BANKACCOUNT~STRING Joe Bloggs~STRING 1 High St., Sometown~LONGINT12345678~DECIMAL550.00

Note: ~ is used as a header prefix in this example
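
A rough sketch of this header scheme in code, under some stated assumptions: the Save() and Load() names, the public fields and the exact header spellings are illustrative, every header (including the class header) starts with ~, and the data itself never contains a ~ character.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Module HeaderSerializationDemo

    ' Hypothetical class mirroring the BankAccount diagram.
    Class BankAccount
        Public Name As String
        Public Address As String
        Public AccountNo As Long
        Public Balance As Decimal

        ' Write a class header, then a type header in front of each field.
        Public Sub Save(ByVal writer As TextWriter)
            writer.Write("~BANKACCOUNT")
            writer.Write("~STRING" & Name)
            writer.Write("~STRING" & Address)
            writer.Write("~LONGINT" & AccountNo.ToString())
            writer.Write("~DECIMAL" & _
                         Balance.ToString(CultureInfo.InvariantCulture))
        End Sub

        ' Read the headers back and rebuild the object from the data.
        Public Shared Function Load(ByVal data As String) As BankAccount
            Dim parts() As String = data.Split("~"c)
            ' parts(0) is the empty text before the first ~.
            If parts.Length <> 6 OrElse parts(1) <> "BANKACCOUNT" Then
                Throw New FormatException("Not a BankAccount record")
            End If
            Dim account As New BankAccount()
            account.Name = parts(2).Substring("STRING".Length)
            account.Address = parts(3).Substring("STRING".Length)
            account.AccountNo = Long.Parse(parts(4).Substring("LONGINT".Length))
            account.Balance = Decimal.Parse(parts(5).Substring("DECIMAL".Length), _
                                            CultureInfo.InvariantCulture)
            Return account
        End Function
    End Class

    Sub Main()
        Dim original As New BankAccount()
        original.Name = "Joe Bloggs"
        original.Address = "1 High St., Sometown"
        original.AccountNo = 12345678
        original.Balance = 550D

        Dim buffer As New StringWriter()
        original.Save(buffer)
        Console.WriteLine(buffer.ToString())

        Dim copy As BankAccount = BankAccount.Load(buffer.ToString())
        Console.WriteLine(copy.Name & " / " & copy.AccountNo.ToString())
    End Sub
End Module
```

A real implementation would also need a rule for data that contains the header prefix (escaping it, or writing field lengths instead of delimiters).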

SLIDE 9

Serialization

  • There are two ways to do serialization in .NET
      • Write Load() and Save() methods for each class, including code to handle structure (collections etc.)
      • Use the .NET <Serializable()> attribute and the BinaryFormatter or SoapFormatter class to store the data (the XmlSerializer class is the usual route to plain XML)
  • The first of these is likely to produce output that is easier for a human to read, but involves a lot of work
  • The second requires less work, but produces binary or XML output. Binary can be difficult to fix if the data gets corrupted; XML contains a lot of redundant information.
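
As a hedged sketch of the automated route, the example below uses XmlSerializer from System.Xml.Serialization rather than a formatter class; XmlSerializer does not need the <Serializable()> attribute, and BinaryFormatter is discouraged in current .NET versions. The Subject class and its sample values are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical class: XmlSerializer needs a public class with a
' parameterless constructor and public fields or properties.
Public Class Subject
    Public Code As String
    Public Title As String
    Public Mark As Integer
End Class

Module AutoSerializationDemo
    Sub Main()
        Dim original As New Subject()
        original.Code = "COMP101"
        original.Title = "Programming"
        original.Mark = 65

        Dim serializer As New XmlSerializer(GetType(Subject))

        ' Serialize to a string; a StreamWriter would send it to a file instead.
        Dim xmlText As String
        Using writer As New StringWriter()
            serializer.Serialize(writer, original)
            xmlText = writer.ToString()
        End Using
        Console.WriteLine(xmlText)

        ' Deserialize: rebuild an equivalent object from the XML text.
        Dim copy As Subject
        Using reader As New StringReader(xmlText)
            copy = DirectCast(serializer.Deserialize(reader), Subject)
        End Using
        Console.WriteLine(copy.Title & ": " & copy.Mark.ToString())
    End Sub
End Module
```

No Load() or Save() code is written by hand here; the price is that the output format is decided by the serializer rather than by the programmer.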

SLIDE 10

XML

  • Serialization in general is not based on any specific standards
      • All programs/programmers/environments have their own variations, based on ease of programming, efficiency (in storage) and other preferences
  • This makes it difficult to exchange data between programs
      • Two programs written by the same programmers can share data without too much difficulty, but…
      • What about programs written by different programmers, in different languages, or for different environments (e.g. .NET and Linux)?
  • XML was created as a standard way of serializing data into files
      • XML uses plain text, so no problems about binary compatibility
      • XML documents are ‘self-describing’, so the content of a document is easy to interpret
      • XML is not a rigid language, but a format that allows new types of document to be designed easily, so that their content is described adequately for any given domain (e.g. finance, CAD) within the rules of XML

SLIDE 11

XML Format and Rules

  • An XML document has a tree structure with a single root node (e.g. customer)
  • Each element of data is enclosed in an opening and a closing tag: <tag>Data</tag>
  • Null data can be represented by an empty pair of tags <tag></tag> or an empty tag <tag/>
  • Elements can be nested, but this must be done correctly, e.g. <x><y>data</y></x>, not <x><y>data</x></y>
  • Tag names are case-sensitive, e.g. <Tag> is not the same as <TAG> or <tag>
  • Elements can have attributes, which appear within the opening tag as a name and value – the value must be in quotes

    <customer ID="12345">
      <name>Fred Bloggs</name>
      <address>
        <street>25 Glen Road</street>
        <town>Ayr</town>
        <postcode>KA11 1BG</postcode>
      </address>
      <lastorderdate>17/12/2002</lastorderdate>
      <email/>
    </customer>

Note the empty email tag

SLIDE 12

System.Xml

  • The System.Xml namespace in .NET provides a number of classes for reading, writing and formatting XML
  • Use the XmlTextWriter class to create an XML document
  • The XmlDocument class is used to read data from an XML file, and provides methods for extracting elements and attributes
  • The XmlNode class is used to accept single nodes extracted from an XmlDocument or to create new nodes
  • Since an XML element can be a complex item containing collections and hierarchy, an XmlNode can house anything from an entire XML document to a single element containing one item of data
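
A small sketch of these three classes working together. The module name is made up; the element names and values follow the customer document shown earlier, and the document is built in memory rather than in a file to keep the example short.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module XmlReadWriteDemo
    Sub Main()
        ' Create a small document with XmlTextWriter.
        Dim buffer As New StringWriter()
        Dim writer As New XmlTextWriter(buffer)
        writer.Formatting = Formatting.Indented
        writer.WriteStartElement("customer")
        writer.WriteAttributeString("ID", "12345")
        writer.WriteElementString("name", "Fred Bloggs")
        writer.WriteStartElement("address")
        writer.WriteElementString("town", "Ayr")
        writer.WriteEndElement()   ' closes address
        writer.WriteEndElement()   ' closes customer
        writer.Close()

        ' Load the text into an XmlDocument and extract data from it.
        Dim doc As New XmlDocument()
        doc.LoadXml(buffer.ToString())

        Dim root As XmlNode = doc.DocumentElement
        Console.WriteLine(root.Attributes("ID").Value)    ' attribute value
        Console.WriteLine(root.Item("name").InnerText)    ' child element

        ' An XmlNode can be any part of the tree, here a nested element.
        Dim town As XmlNode = doc.SelectSingleNode("/customer/address/town")
        Console.WriteLine(town.InnerText)
    End Sub
End Module
```

To work with a file instead, XmlTextWriter can be constructed with a file name and an encoding, and XmlDocument.Load() accepts a file name.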
SLIDE 13

XML and Object Models

  • Best approach is to provide each class in an application with methods for dealing with XML data
      • A WriteXML() method can be used to pack the class member data into an XML element and return it as a string
      • An overloaded New() method can be created to accept an XmlNode as a parameter, and construct an object from it
  • Using this approach, even complex hierarchies can be dealt with easily in an application, since each class that needs to be persisted to and retrieved from XML can fend for itself

SLIDE 14

Example XML-Aware class

    Imports System.Xml

    Class Subject
        Private mvarCode As String
        Private mvarTitle As String
        Private mvarMark As Integer

        Public Sub New(ByVal code As String, _
                       ByVal title As String, _
                       ByVal mark As Integer)
            mvarCode = code
            mvarTitle = title
            mvarMark = mark
        End Sub

        ' Constructor creates an object from data in an XML node
        Public Sub New(ByVal subjectNode As XmlNode)
            mvarCode = subjectNode.Attributes("Code").Value
            mvarTitle = subjectNode.Item("Title").InnerText
            mvarMark = _
                CType(subjectNode.Item("Mark").InnerText, Integer)
        End Sub

        ' WriteXML() method serializes the object to an XmlWriter
        Public Sub WriteXML(ByVal writer As XmlWriter)
            With writer
                .WriteStartElement("Subject")
                .WriteAttributeString("Code", mvarCode)
                .WriteElementString("Title", mvarTitle)
                .WriteElementString("Mark", _
                                    mvarMark.ToString())
                .WriteEndElement()
            End With
        End Sub
    End Class

SLIDE 15

Comparing persistence methods

  • Registry
      • Easy, good for small amounts of data only, text only
  • Files
      • Primitive, efficient, good for lots of data where structure does not change – e.g. plain text, tables; possible to have random access
  • Serialization
      • More support from .NET, possible to store simple or complex data structures, can be automated (using <Serializable()> attributes) or hand-coded (for human-readable structured output)
  • XML
      • Inefficient (can be more mark-up than data), but best for data interchange due to self-descriptive nature
  • All of the above
      • Limitations in dealing with VERY LARGE data sets, where not all of the data can be held in main memory. Primitive file handling with binary access files can be used to provide access to small amounts of data from a large file, but coding is difficult and requires use of ancillary structures (indexes) to make it work
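
A minimal sketch of binary random access, under a simplifying assumption: every record has the same fixed size (one 4-byte Integer), so the position of record n can be calculated and no index is needed. Variable-length records would need the ancillary index structures mentioned above. The module name and the file name marks.dat are made up, and the Using statement assumes Visual Basic 2005 or later.

```vbnet
Imports System
Imports System.IO

Module RandomAccessDemo
    ' Each record is one 4-byte Integer, so record n starts at byte n * 4.
    Const RecordSize As Integer = 4

    Sub Main()
        ' Hypothetical data file in the temp folder.
        Dim fileName As String = Path.Combine(Path.GetTempPath(), "marks.dat")

        ' Write 1000 records sequentially.
        Using dataFile As New FileStream(fileName, FileMode.Create)
            Dim writer As New BinaryWriter(dataFile)
            For i As Integer = 0 To 999
                writer.Write(i * 10)
            Next
            writer.Flush()
        End Using

        ' Jump straight to record 500 without reading the 500 before it.
        Using dataFile As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            Dim reader As New BinaryReader(dataFile)
            dataFile.Seek(CLng(500) * RecordSize, SeekOrigin.Begin)
            Console.WriteLine(reader.ReadInt32())
        End Using
    End Sub
End Module
```

Only the four bytes of the requested record are read, which is what makes this approach workable for files too large to hold in memory.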

SLIDE 16

Summary

  • Programs that are expected to create significant outputs need some persistence mechanism
  • Should distinguish between data stored for convenience (e.g. recent files) and data stored for more strategic purposes
  • Files are the basis of saving all computer data
  • The registry is a set of files for storing small amounts of data about the computer, its configuration and applications
  • Simple file handling can be used for simple data
  • More complex data must use more structured files – object models require serializing
  • XML is a form of serialization where data is stored embedded in descriptive tags, which makes the data easy to move to other systems
  • All forms of data storage have some limitations, either in scope (registry), convenience (files), complexity (serialization) or storage efficiency (XML)