11. Persistence
The use of files, streams and serialization for storing object model data
Storing Application Data
Without some way of storing data off-line, computers would be virtually unusable
Imagine a word processor that forced you to complete a document and print it in a single session – who’d use it?
Most programs involve several types of data
Status information – e.g. the index of the current item in a list
Convenience information – e.g. the location and size of the main window (just another type of status information)
Object model data – the state of every live object in a running program
Some or all of this can be saved, either in a single place or spread among any of the available storage mechanisms
Storage Mechanisms
Windows Registry
Part of the operating system (files owned by the O/S)
Good for storing small amounts of data
Files
Standard way of persisting information
Can be highly structured or very simple, depending on the data being stored
XML
Files again, but based on standards that make it possible for different systems to share the data
Databases
Very structured and with a lot of program overhead, but very efficient for saving large amounts of data (covered in detail in the next chapter)
The Registry
The Windows Registry is a large, system-wide database that stores data in a hierarchical structure
Application data is stored in a tree structure, typically:
The application name is the top level – e.g. MyProgram
A section name groups related items of data together – e.g. RecentFiles
A key name specifies a name and a single item of data – e.g. File1="C:\MyData\Datafile1.dat"
The registry is an operating-system-wide resource, and so must be treated with care
DO NOT STORE LARGE AMOUNTS OF DATA in it, because potentially every program in Windows uses the registry
Use only the standard functions GetSetting() and SaveSetting() for reading and writing; they keep each application's settings under HKEY_CURRENT_USER\Software\VB and VBA Program Settings
e.g. Saving a form’s size and position in the registry
' GetSetting() and SaveSetting() come from Microsoft.VisualBasic.Interaction
Private Sub frmRegistry_Load(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles MyBase.Load
    ' Restore the saved position and size; the current value is the default if nothing has been saved yet
    Me.Left = CInt(GetSetting("RegDemo", "Position", "Left", CStr(Me.Left)))
    Me.Top = CInt(GetSetting("RegDemo", "Position", "Top", CStr(Me.Top)))
    Me.Width = CInt(GetSetting("RegDemo", "Size", "Width", CStr(Me.Width)))
    Me.Height = CInt(GetSetting("RegDemo", "Size", "Height", CStr(Me.Height)))
End Sub

' FormClosing replaces the older Closing event, which is obsolete from .NET Framework 2.0 onwards
Private Sub frmRegistry_FormClosing(ByVal sender As Object, _
        ByVal e As System.Windows.Forms.FormClosingEventArgs) Handles MyBase.FormClosing
    ' Save position and size as strings: application "RegDemo", sections "Position" and "Size"
    SaveSetting("RegDemo", "Position", "Left", CStr(Me.Left))
    SaveSetting("RegDemo", "Position", "Top", CStr(Me.Top))
    SaveSetting("RegDemo", "Size", "Width", CStr(Me.Width))
    SaveSetting("RegDemo", "Size", "Height", CStr(Me.Height))
End Sub
File Storage
All computer data (including registry data and database data) is stored in files if it needs to be persisted
Various device types (disks, hard disks, CD-R/Ws, magnetic tape, flash cards etc.) have data stored on them by the OS and file system, so that they all appear the same to a program – as simple file devices
There are only 4 basic operations to worry about when using files
Opening a file – prepares it for read and write operations
Reading from a file – extracts an item of data and moves on, ready to read the next item
Writing to a file – inserts new data at the end of the file
Closing a file – files that are left open are vulnerable to corruption; closing a file puts it into a safe state
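The four operations can be sketched in VB.NET with StreamWriter and StreamReader. This is a minimal sketch; the file name demo.txt and the helper names AppendLines and ReadLinesInOrder are hypothetical. Opening and closing are handled by Using blocks, which close the file even if an exception occurs part-way through.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module FileOperationsDemo
    ' Hypothetical file name used only for this sketch
    Private Const DataFile As String = "demo.txt"

    ' Open (in append mode), write each line, then close
    Public Sub AppendLines(ByVal lines As String())
        ' True = append, so new data goes at the end of the file
        Using writer As New StreamWriter(DataFile, True)
            For Each line As String In lines
                writer.WriteLine(line)
            Next
        End Using ' End Using flushes and closes the file
    End Sub

    ' Open, read every line in the order it was written, then close
    Public Function ReadLinesInOrder() As List(Of String)
        Dim result As New List(Of String)()
        Using reader As New StreamReader(DataFile)
            ' Each ReadLine() call moves on to the next item; Nothing means end of file
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                result.Add(line)
                line = reader.ReadLine()
            End While
        End Using
        Return result
    End Function

    Sub Main()
        If File.Exists(DataFile) Then File.Delete(DataFile)
        AppendLines(New String() {"first", "second"})
        AppendLines(New String() {"third"})
        For Each line As String In ReadLinesInOrder()
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Running Main prints first, second and third on separate lines: the second call to AppendLines adds to the end of the file instead of replacing it.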
Files and Streams
Because of the way a file works, we can think of it as a flow of data
Data is read from a file in exactly the same order it was written to it
The name used to indicate this is a stream (although streams can also connect to a network, memory, a modem or other devices)
In .NET, most files are treated as streams
The StreamReader class defines objects that know how to read from a stream
The StreamWriter class defines objects that know how to write to a stream
Data sent to a stream can be ambiguous, because there is no automatic way to separate one item from the next
e.g. save 10, 20, 30 and 40, and they will be written as 10203040 – all crunched together
To deal with this, we use delimiters to mark the end of each item of data
CSV – Comma-Separated Values, so the 4 numbers are saved as “10, 20, 30, 40”
Other delimiters (e.g. tab, space) can be used instead, but comma is the norm
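The ambiguity and its CSV fix can be shown side by side. In this sketch (the module and function names are hypothetical) the numbers are first written with no delimiter, then joined with commas and split back into integers; Trim() lets FromCsv also accept the spaced form “10, 20, 30, 40”.

```vbnet
Imports System
Imports System.IO

Module CsvDemo
    ' Joins the numbers with a comma delimiter so each item can be told apart later
    Public Function ToCsv(ByVal values As Integer()) As String
        Return String.Join(",", Array.ConvertAll(values, Function(v As Integer) v.ToString()))
    End Function

    ' Splits a CSV line back into numbers; Trim() tolerates "10, 20" style spacing
    Public Function FromCsv(ByVal line As String) As Integer()
        Dim parts As String() = line.Split(","c)
        Dim result(parts.Length - 1) As Integer
        For i As Integer = 0 To parts.Length - 1
            result(i) = Integer.Parse(parts(i).Trim())
        Next
        Return result
    End Function

    Sub Main()
        Dim numbers As Integer() = {10, 20, 30, 40}

        ' Without a delimiter the items run together
        Using writer As New StringWriter()
            For Each n As Integer In numbers
                writer.Write(n)
            Next
            Console.WriteLine(writer.ToString())   ' 10203040
        End Using

        ' With a comma delimiter each item can be recovered
        Dim line As String = ToCsv(numbers)
        Console.WriteLine(line)                    ' 10,20,30,40
        Console.WriteLine(FromCsv(line).Length)    ' 4
    End Sub
End Module
```

The undelimited string 10203040 could equally have come from 1, 02, 03, 040 or many other sequences; only the commas make the original four items recoverable.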
Structured Data and Streams
Saving objects to a stream brings new problems
How to separate the individual object member fields
How to separate the different objects
The best approach is to precede each object with a header indicating its class – do this for EVERY class, including individual member variables
When reading objects from a stream, start by reading the header, then create the object and read the data into it (up to the next header)
This process is called Serialization
Figure: Class, Object and Stream
Class BankAccount – Name : String, Address : String, AccountNo : Long, Balance : Decimal
Object :BankAccount – Joe Bloggs, 1 High St., Sometown, 12345678, £550.00
Stream – ~BANKACCOUNT~STRINGJoe Bloggs~STRING1 High St., Sometown~LONGINT12345678~DECIMAL550.00
Note ~ is used as a header prefix in this example
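A hand-coded version of this header scheme might look like the sketch below. The BankAccount class (with public fields mirroring the diagram) and its Save() and Load() methods are hypothetical; it assumes "~" never appears inside the data, writes numbers with the invariant culture, and leaves out the currency symbol.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

' Hypothetical class mirroring the BankAccount diagram
Public Class BankAccount
    Public Name As String
    Public Address As String
    Public AccountNo As Long
    Public Balance As Decimal

    ' Writes a class header, then a type header before every member field
    Public Function Save() As String
        Dim sb As New StringBuilder()
        sb.Append("~BANKACCOUNT")
        sb.Append("~STRING").Append(Name)
        sb.Append("~STRING").Append(Address)
        sb.Append("~LONGINT").Append(AccountNo.ToString(CultureInfo.InvariantCulture))
        sb.Append("~DECIMAL").Append(Balance.ToString("0.00", CultureInfo.InvariantCulture))
        Return sb.ToString()
    End Function

    ' Reads the class header first, creates the object, then fills in the fields in order
    Public Shared Function Load(ByVal data As String) As BankAccount
        ' Assumes "~" never occurs inside the data itself
        Dim items As String() = data.Split("~"c)
        ' items(0) is the empty string before the first "~"
        If items(1) <> "BANKACCOUNT" Then Throw New FormatException("Not a BankAccount")
        Dim acct As New BankAccount()
        acct.Name = StripHeader(items(2), "STRING")
        acct.Address = StripHeader(items(3), "STRING")
        acct.AccountNo = Long.Parse(StripHeader(items(4), "LONGINT"), CultureInfo.InvariantCulture)
        acct.Balance = Decimal.Parse(StripHeader(items(5), "DECIMAL"), CultureInfo.InvariantCulture)
        Return acct
    End Function

    ' Checks that an item carries the expected type header and returns the value after it
    Private Shared Function StripHeader(ByVal item As String, ByVal header As String) As String
        If Not item.StartsWith(header, StringComparison.Ordinal) Then
            Throw New FormatException("Expected header " & header)
        End If
        Return item.Substring(header.Length)
    End Function
End Class

Module BankAccountDemo
    Sub Main()
        Dim original As New BankAccount With {.Name = "Joe Bloggs", .Address = "1 High St., Sometown",
                                              .AccountNo = 12345678, .Balance = 550D}
        Dim stream As String = original.Save()
        Console.WriteLine(stream)
        Dim copy As BankAccount = BankAccount.Load(stream)
        Console.WriteLine(copy.Name)
    End Sub
End Module
```

Save() produces the same layout as the stream in the figure, ~BANKACCOUNT~STRINGJoe Bloggs~STRING1 High St., Sometown~LONGINT12345678~DECIMAL550.00, and Load() rebuilds an equal object from it. A real format would also need an escape rule for a "~" inside the data.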
Serialization
There are two ways to do serialization in .NET
Write Load() and Save() methods for each class, including code to handle structure (collections etc.)
Use the .NET <Serializable()> attribute and the BinaryFormatter or SoapFormatter class (SoapFormatter writes XML) to store the data
The first of these is likely to produce output that is easier for a human to read, but involves a lot of work
The second requires less work, but produces binary or XML output: binary data can be difficult to fix if it gets corrupted, and XML contains a lot of redundant information
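For the automated route, this sketch swaps in XmlSerializer (System.Xml.Serialization) rather than <Serializable()> with BinaryFormatter, because BinaryFormatter is obsolete and disabled by default in recent .NET versions for security reasons. XmlSerializer does not need the attribute, but it does need a public class with a parameterless constructor and public properties. The Customer class and the helper names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical class: public, with a default constructor and public properties,
' which is what XmlSerializer requires
Public Class Customer
    Public Property ID As Integer
    Public Property Name As String
    Public Property Town As String
End Class

Module SerializerDemo
    ' Serializes every public property of the object to an XML string
    Public Function ToXml(ByVal c As Customer) As String
        Dim serializer As New XmlSerializer(GetType(Customer))
        Using writer As New StringWriter()
            serializer.Serialize(writer, c)
            Return writer.ToString()
        End Using
    End Function

    ' Rebuilds a Customer from XML produced by ToXml()
    Public Function FromXml(ByVal xml As String) As Customer
        Dim serializer As New XmlSerializer(GetType(Customer))
        Using reader As New StringReader(xml)
            Return CType(serializer.Deserialize(reader), Customer)
        End Using
    End Function

    Sub Main()
        Dim original As New Customer With {.ID = 12345, .Name = "Fred Bloggs", .Town = "Ayr"}
        Dim xml As String = ToXml(original)
        Console.WriteLine(xml)
        Dim copy As Customer = FromXml(xml)
        Console.WriteLine(copy.Name)   ' Fred Bloggs
    End Sub
End Module
```

No per-class Save()/Load() code is written here, which is the "less work" trade-off; the cost is that the element names and layout are chosen by the serializer rather than by the programmer.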
XML
Serialization in general is not based on any specific standards
All programs/programmers/environments have their own variations, based on ease of programming, efficiency (in storage) and other preferences
This makes it difficult to exchange data between programs
Two programs written by the same programmers can share data without too much difficulty, but…
What about programs written by different programmers, in different languages, or for different environments (e.g. .NET on Windows and Linux)?
XML was created as a standard way of serializing data into files
XML uses plain text, so there are no problems with binary compatibility
XML documents are ‘self-describing’, so the content of a document is easy to interpret
XML is not a rigid language, but a format that allows new types of document to be designed easily, so that their content is described adequately for any given domain (e.g. finance, CAD) within the rules of XML
XML Format and Rules
An XML document has a tree structure with a single root node (e.g. customer)
Each element of data is enclosed in an opening and a closing tag
<tag>Data</tag>
Empty (null) data can be represented by an empty pair of tags <tag></tag> or an empty tag <tag/>
Elements can be nested, but this must be done correctly, e.g. <x><y>data</y></x>, not <x><y>data</x></y>
Tag names are case-sensitive, e.g. <Tag> is not the same as <TAG> or <tag>
Elements can have attributes, which appear within the opening tag as a name and value – the value must be in quotes
<customer ID="12345">
  <name>Fred Bloggs</name>
  <address>
    <street>25 Glen Road</street>
    <town>Ayr</town>
    <postcode>KA11 1BG</postcode>
  </address>
  <lastorderdate>17/12/2002</lastorderdate>
  <email/>
</customer>
Note the empty email tag
System.Xml
The System.Xml namespace in .NET provides a number of classes for reading, writing and formatting XML
Use the XmlTextWriter class to create an XML document
The XmlDocument class is used to read data from an XML file, and provides methods for extracting elements and attributes
The XmlNode class represents a single node extracted from an XmlDocument; new nodes are created with XmlDocument methods such as CreateElement()
Since an XML element can be a complex item containing collections and hierarchy, an XmlNode can hold anything from an entire XML document to a single element containing one item of data
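Writing and then reading a cut-down version of the customer document could look like the sketch below. It uses XmlWriter.Create(), the factory recommended since .NET 2.0, in place of constructing an XmlTextWriter directly; the module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module XmlDemo
    ' Builds a cut-down customer document with an XmlWriter
    Public Function BuildCustomerXml() As String
        Dim settings As New XmlWriterSettings With {.Indent = True}
        Using sw As New StringWriter()
            Using writer As XmlWriter = XmlWriter.Create(sw, settings)
                writer.WriteStartElement("customer")             ' single root node
                writer.WriteAttributeString("ID", "12345")       ' attribute value is quoted automatically
                writer.WriteElementString("name", "Fred Bloggs")
                writer.WriteStartElement("address")              ' nested element
                writer.WriteElementString("town", "Ayr")
                writer.WriteEndElement()                         ' closes <address>
                writer.WriteStartElement("email")                ' no content, so written as an empty tag
                writer.WriteEndElement()
                writer.WriteEndElement()                         ' closes <customer>
            End Using
            Return sw.ToString()
        End Using
    End Function

    Sub Main()
        ' Read the document back and extract an attribute and elements
        Dim doc As New XmlDocument()
        doc.LoadXml(BuildCustomerXml())
        Dim root As XmlElement = doc.DocumentElement
        Console.WriteLine(root.GetAttribute("ID"))                          ' 12345
        Console.WriteLine(root.SelectSingleNode("address/town").InnerText)  ' Ayr
        Console.WriteLine(root.Item("email").IsEmpty)                       ' True for <email />
    End Sub
End Module
```

The writer tracks open elements itself, so the nesting rule (<x><y>data</y></x>) is enforced by the order of WriteStartElement/WriteEndElement calls rather than by hand-written tags.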
XML and Object Models
The best approach is to provide each class in an application with methods for dealing with XML data
A WriteXML() method can be used to pack the class member data into an XML element, either returning it as a string or writing it to an XmlWriter
An overloaded New() method can be created to accept an XmlNode as a parameter and construct an object from it
Using this approach, even complex hierarchies can be dealt with easily in an application, since each class that needs to be persisted to and retrieved from XML can fend for itself
Example XML-Aware class
Imports System.Xml

Class Subject
    Private mvarCode As String
    Private mvarTitle As String
    Private mvarMark As Integer

    Public Sub New(ByVal code As String, _
                   ByVal title As String, _
                   ByVal mark As Integer)
        mvarCode = code
        mvarTitle = title
        mvarMark = mark
    End Sub

    ' Constructor creates an object from data in an XML node
    Public Sub New(ByVal subjectNode As XmlNode)
        mvarCode = subjectNode.Attributes("Code").Value       ' attribute
        mvarTitle = subjectNode.Item("Title").InnerText       ' child element
        mvarMark = CType(subjectNode.Item("Mark").InnerText, Integer)
    End Sub

    ' WriteXML() method serializes the object to an XmlWriter
    Public Sub WriteXML(ByVal writer As XmlWriter)
        With writer
            .WriteStartElement("Subject")
            .WriteAttributeString("Code", mvarCode)
            .WriteElementString("Title", mvarTitle)
            .WriteElementString("Mark", mvarMark.ToString())
            .WriteEndElement()
        End With
    End Sub
End Class
Comparing persistence methods
Registry
Easy, good for small amounts of data only, text only
Files
Primitive, efficient, good for lots of data where the structure does not change – e.g. plain text, tables; random access is possible
Serialization
More support from .NET, possible to store simple or complex data structures, can be automated (using <Serializable()> attributes) or hand-coded (for human-readable structured output)
XML
Inefficient (can be more mark-up than data), but best for data interchange due to its self-descriptive nature
All of the above
Limitations in dealing with VERY LARGE data sets, where not all of the data can be held in main memory. Primitive file handling with binary access files can be used to provide access to small amounts of data from a large file, but the coding is difficult and requires ancillary structures (indexes) to make it work
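The core of binary random access is fixed-length records plus Seek(), so record n starts at byte n × record size. In this sketch the record layout (an Int64 account number followed by a 16-byte Decimal balance), the file name and the helper names are hypothetical, and the index that would map account numbers to record positions is not shown.

```vbnet
Imports System
Imports System.IO

Module RandomAccessDemo
    ' Each record: Int64 account number (8 bytes) + Decimal balance (16 bytes) = 24 bytes
    Private Const RecordSize As Integer = 24

    ' Writes fixed-length records one after another
    Public Sub WriteRecords(ByVal path As String, ByVal accounts As Long(), ByVal balances As Decimal())
        Using fs As New FileStream(path, FileMode.Create, FileAccess.Write)
            Using writer As New BinaryWriter(fs)
                For i As Integer = 0 To accounts.Length - 1
                    writer.Write(accounts(i))   ' 8 bytes
                    writer.Write(balances(i))   ' 16 bytes
                Next
            End Using
        End Using
    End Sub

    ' Jumps straight to record n without reading the records before it
    Public Function ReadBalance(ByVal path As String, ByVal n As Integer) As Decimal
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            fs.Seek(CLng(n) * RecordSize, SeekOrigin.Begin)
            Using reader As New BinaryReader(fs)
                reader.ReadInt64()              ' skip the account number
                Return reader.ReadDecimal()
            End Using
        End Using
    End Function

    Sub Main()
        Dim path As String = "accounts.dat"     ' hypothetical file name
        WriteRecords(path, New Long() {1001, 1002, 1003}, New Decimal() {10.5D, 250D, 75.25D})
        Console.WriteLine(ReadBalance(path, 2)) ' third record's balance
    End Sub
End Module
```

Only 24 bytes are read however large the file grows, which is what makes this approach suitable when the data set cannot fit in memory; the price is that every record must have the same size.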
Summary
Programs that are expected to create significant outputs need some persistence mechanism
We should distinguish between data stored for convenience (e.g. recent files) and data stored for more strategic purposes
Files are the basis of saving all computer data
The registry is a set of files for storing small amounts of data about the computer, its configuration and applications
Simple file handling can be used for simple data
More complex data must use more structured files – object models require serializing
XML is a form of serialization where data is stored embedded in descriptive tags, which makes the data easy to move to other systems
All forms of data storage have some limitations, in scope (registry), convenience (files), complexity (serialization) or storage efficiency (XML)