SLIDE 1

CSCI 3136 Principles of Programming Languages

Data Types and Memory Management

Summer 2013 Faculty of Computer Science Dalhousie University

SLIDE 2

What is a Type System?

A type system is a mechanism for defining types and associating them with operations that can be performed on objects of this type. A type system includes rules that specify

  • Type equivalence: Do two values have the same type?
  • Type compatibility: Can a value of a certain type be used in a certain context?
  • Type inference: How is the type of an expression computed from the types of its parts?
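
As a rough illustration of these three kinds of rules in C (a sketch only; the names `celsius`, `point_a`, `point_b`, `half`, `TYPE_NAME` and `same_type_name` are invented here, and a C11 compiler is assumed for `_Generic`):

```c
#include <string.h>

/* Type equivalence: a typedef is only an alias, so celsius and int are the
   same type. Two struct types are distinct even if their fields match. */
typedef int celsius;
struct point_a { int x, y; };
struct point_b { int x, y; };   /* not equivalent to struct point_a in C */

/* Type compatibility: an int argument is accepted where a double is
   expected and is converted implicitly. */
double half(double v) { return v / 2.0; }

/* Type inference: the type of an expression follows from its parts.
   C11 _Generic picks a string based on the expression's static type. */
#define TYPE_NAME(e) _Generic((e), \
    int: "int", double: "double", default: "other")

/* Returns 1 if the two type names are the same string, 0 otherwise. */
int same_type_name(const char *a, const char *b) { return strcmp(a, b) == 0; }
```

Here `TYPE_NAME(1 + 2)` selects "int" while `TYPE_NAME(1 + 2.0)` selects "double", because the int operand is converted before the addition.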

SLIDE 3

Types in a Language

  • Strongly typed: Prohibits application of an operation to any object not supporting this operation.
  • Statically typed: Strongly typed and type checking is performed at compile time (Pascal, C, Haskell, . . . )
  • Dynamically typed: Types of operands of operations are checked at run time (LISP, Smalltalk, . . . )
  • Type inference: Programmer does not specify types at all; compiler infers types from context (e.g., ML)

SLIDE 4

Definition of Types

Similar to subroutines in many languages, defining a type has two parts:

  • A type's declaration introduces its name into the current scope.
  • A type's definition describes the type (the simpler types it is composed of).

Three ways to think about types:

  • Denotational: A type is a set of values.
  • Constructive: A type is built-in or composite.
  • Abstraction-based: A type is defined by an interface, the set of operations it supports.
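
The split between declaration and definition, and the abstraction-based view, can be sketched in C. This is only an illustration; `struct counter` and its functions are hypothetical names, not part of the course material:

```c
#include <stdlib.h>

struct counter;                        /* declaration: introduces the name only */

/* Abstraction-based view: the type is known by the operations it supports. */
struct counter *counter_new(void);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

struct counter { int value; };         /* definition: the simpler types it is built from */

struct counter *counter_new(void) {
    struct counter *c = malloc(sizeof *c);
    if (c != NULL) c->value = 0;       /* a new counter starts at zero */
    return c;                          /* NULL if allocation failed */
}

void counter_increment(struct counter *c) { c->value++; }

int counter_value(const struct counter *c) { return c->value; }

void counter_free(struct counter *c) { free(c); }
```

Code that sees only the declaration and the four prototypes can use counters without knowing how they are represented.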

SLIDE 5

Classification of Types

Built-in types

  • Integers, Booleans, characters, real numbers, . . .

Enumeration and range types (neither built-in nor composite)

  • C:

    enum DAY            /* Defines an enumeration type */
    {
        saturday,       /* Names day and declares a */
        sunday = 0,     /* variable named workday with */
        monday,         /* that type */
        tuesday,
        wednesday,      /* wednesday is associated with 3 */
        thursday,
        friday
    } workday;

  • Pascal: 0..100

Composite types

  • Records, arrays, files, lists, sets, pointers, . . .

SLIDE 6

Records

  • A nested record definition in Pascal:

    type ore = record
        name : short_string;
        element_yielded : record
            name : two_chars;
            atomic_n : integer;
            atomic_weight : real;
            metallic : Boolean
        end
    end;

  • Accessing fields:

    ore.element_yielded.name
    name of element_yielded of ore

SLIDE 7

Memory Layout of Records

Aligned (fixed ordering)

  − Potential waste of space
  + One machine operation per element access
  + Guaranteed layout in memory (good for systems programming)

Packed

  + No waste of space
  − Multiple machine operations per memory access
  + Guaranteed layout in memory (good for systems programming)

Aligned (optimized ordering)

  ± Reduced space overhead
  + One machine operation per memory access
  − No guarantee of layout in memory (bad for systems programming)
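
A small C sketch can make the padding trade-off visible. The two struct types below are hypothetical and loosely modelled on the element record from the previous slide. C compilers keep fields in declaration order, so the "optimized ordering" is done by hand here; exact sizes are implementation-defined, so the code only measures them:

```c
#include <stddef.h>

/* Fields in a fixed, "natural" declaration order. */
struct element_fixed {
    char   name[2];
    int    atomic_n;
    double atomic_weight;
    char   metallic;
};

/* Same fields, reordered by hand from largest to smallest alignment. */
struct element_reordered {
    double atomic_weight;
    int    atomic_n;
    char   name[2];
    char   metallic;
};

/* Bytes actually occupied by the fields themselves. */
#define FIELD_BYTES \
    (sizeof(char[2]) + sizeof(int) + sizeof(double) + sizeof(char))

/* Padding = total record size minus the bytes used by its fields. */
size_t padding_fixed(void) {
    return sizeof(struct element_fixed) - FIELD_BYTES;
}

size_t padding_reordered(void) {
    return sizeof(struct element_reordered) - FIELD_BYTES;
}
```

On a typical 64-bit platform one would expect the first layout to need more padding than the second, but this depends on the ABI. A packed layout is not expressible in standard C; compilers offer it through extensions.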

SLIDE 8

Contiguous Memory Layouts for 2-d Arrays

[Figure: row-major layout and column-major layout]

There are more sophisticated block-recursive layouts which, combined with the right algorithms, achieve much better cache efficiency than the above.
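
The two contiguous layouts differ only in how an index pair is mapped to an offset. A minimal sketch in C, with hypothetical function names and zero-based indices assumed:

```c
#include <stddef.h>

/* Offset (in elements) of A[i][j] when rows are stored one after another. */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols) {
    (void)rows;                  /* row count is not needed for this layout */
    return i * cols + j;
}

/* Offset (in elements) of A[i][j] when columns are stored one after another. */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t cols) {
    (void)cols;                  /* column count is not needed for this layout */
    return j * rows + i;
}

/* Sums a row-major array, visiting elements in the order they sit in memory. */
long sum_row_major(const int *a, size_t rows, size_t cols) {
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += a[row_major_offset(i, j, rows, cols)];
    return total;
}
```

C's built-in multidimensional arrays use the row-major mapping, so for `int a[3][4]` the element `a[1][2]` sits at element offset 6 from the start of the array.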

SLIDE 9

Contiguous and Row-Pointer Memory Layout

Contiguous layout:

    char days[][10] = {
        "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"
    };
    /* days[2][3] == 'n' (the 'n' in "Wednesday") */

Row-pointer layout:

    char *days[] = {
        "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"
    };
    /* days[2][3] == 'n' (the 'n' in "Wednesday") */

Memory layout determines space usage and nature and efficiency of address calculations.
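
To compare the space used by the two layouts, one could measure them directly. The sketch below uses separate, hypothetical array names so both layouts can coexist in one file; the helper functions are invented for illustration:

```c
#include <stddef.h>
#include <string.h>

#define N_DAYS 7

/* Contiguous layout: 7 rows of exactly 10 chars each, padded with '\0'. */
static const char days_contiguous[N_DAYS][10] = {
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday"
};

/* Row-pointer layout: 7 pointers, each to a string of its own length. */
static const char *const days_row_pointer[N_DAYS] = {
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday"
};

/* Total bytes of the contiguous array. */
size_t contiguous_bytes(void) { return sizeof days_contiguous; }

/* Bytes for the pointer array plus each string including its terminator. */
size_t row_pointer_bytes(void) {
    size_t total = sizeof days_row_pointer;
    for (size_t i = 0; i < N_DAYS; i++)
        total += strlen(days_row_pointer[i]) + 1;
    return total;
}

/* One multiply-and-add address calculation. */
char contiguous_at(size_t i, size_t j) { return days_contiguous[i][j]; }

/* One extra memory load to fetch the row pointer first. */
char row_pointer_at(size_t i, size_t j) { return days_row_pointer[i][j]; }
```

The contiguous version always takes 70 bytes; the row-pointer version takes 57 bytes of characters plus seven pointers, so which one is smaller depends on the pointer size.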

SLIDE 10

Arrays

  • Time at which array shape is bound is important. For example,
      • a static array is bound at compile time (stored in static memory)
      • a local array is bound at either compile time or elaboration time (i.e., binding creation time) (stored on the stack)
      • . . .
  • Issues
      • Memory allocation
      • Bounds checks
      • Index calculations (higher-dimensional arrays)
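
The binding times and the bounds-check issue can be sketched in C. All names below are hypothetical, and the variable-length array assumes a compiler that supports C99 VLAs (an optional feature since C11):

```c
#include <stddef.h>

/* Shape bound at compile time; stored in static memory. */
static const int squares[4] = {0, 1, 4, 9};

/* Local array with shape known at compile time; stored on the stack. */
int sum_fixed(void) {
    int a[4] = {1, 2, 3, 4};
    int total = 0;
    for (size_t i = 0; i < 4; i++) total += a[i];
    return total;
}

/* Local array whose shape is bound at elaboration time (n must be > 0). */
int sum_first_n(int n) {
    int a[n];
    int total = 0;
    for (int i = 0; i < n; i++) a[i] = i + 1;
    for (int i = 0; i < n; i++) total += a[i];
    return total;
}

/* Explicit bounds check: C performs none on its own.
   Returns 1 and stores the element in *out, or returns 0 if i is out of range. */
int checked_get(const int *a, size_t len, size_t i, int *out) {
    if (i >= len) return 0;
    *out = a[i];
    return 1;
}

/* Checked lookup in the static table above. */
int square_of(size_t i, int *out) {
    return checked_get(squares, 4, i, out);
}
```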

SLIDE 11

Arrays, Lists, and Strings

[Figure: a list]

  • Most imperative languages provide excellent built-in support for array manipulation but not for operations on lists.
  • Most functional languages provide excellent built-in support for list manipulation but not for operations on arrays.
  • Arrays are a natural way to store sequences when manipulating individual elements in place (i.e., imperatively).
  • Lists are naturally recursive and thus fit extremely well into the recursive approach taken to most problems in functional programming.
  • Strings are arrays of characters in imperative languages and lists of characters in functional languages.
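
To make the contrast concrete, here is a sketch of a list built by hand in C, where the language offers no built-in support. The names `node`, `cons`, `list_sum`, `list_free` and `array_sum` are invented for this example:

```c
#include <stdlib.h>

/* A list is either empty (NULL) or a head element followed by a list. */
struct node {
    int head;
    struct node *tail;
};

/* Builds a new list cell; returns NULL if allocation fails. */
struct node *cons(int head, struct node *tail) {
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->head = head;
        n->tail = tail;
    }
    return n;
}

/* Recursive processing mirrors the recursive shape of the type. */
int list_sum(const struct node *list) {
    return list == NULL ? 0 : list->head + list_sum(list->tail);
}

/* Releases every cell of the list. */
void list_free(struct node *list) {
    while (list != NULL) {
        struct node *next = list->tail;
        free(list);
        list = next;
    }
}

/* The array version walks the elements in place with an index. */
int array_sum(const int *a, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) total += a[i];
    return total;
}
```

Note that the list type refers to itself through a pointer, and that every cell has to be allocated and released explicitly.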

SLIDE 12

Pointers

  • Point to memory locations that store data (often of a specified type, e.g., int*)
  • Are not required in languages with reference model of variables (Lisp, ML, CLU, Java)
  • Are required for recursive types in languages with value model of variables (C, Pascal, Ada)

Storage reclamation

  • Explicit (manual)
  • Automatic (garbage collection)

Advantages and disadvantages of explicit reclamation

  + Avoids garbage collection, which can incur serious run-time overhead
  − Potential for memory leaks
  − Potential for dangling pointers and segmentation faults

SLIDE 13

Pointer Allocation and Deallocation

C

  • p = (element *)malloc(sizeof(element))
  • free(p)
  • Explicit deallocation

Pascal

  • new(p)
  • dispose(p)
  • Explicit deallocation

Java/C++

  • p = new element() (semantics different between Java and C++, how?)

  • delete p (in C++)
  • Explicit deallocation in C++, garbage collection in Java

SLIDE 14

Dangling References

  • A dangling reference is a pointer to an already reclaimed object. Dangling references are notoriously hard to debug and a major source of program misbehaviour and security holes.
  • Techniques to catch them:
      • Tombstones
      • Keys and locks

SLIDE 15

Tombstones

    new(p)
    q := p
    delete(p)

Issues:

  • Space overhead
  • Runtime overhead
  • Easy to change location of object in heap (just change tombstone address)
  • Invalid tombstones (RIP = null pointer)
  • Deallocate the tombstones (need reference count or other garbage collection strategy)
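
The idea can be sketched in C as one extra level of indirection. This is a simplified illustration, not a production scheme; `struct tombstone`, `ts_ptr` and the `ts_*` functions are hypothetical names, and the tombstone itself is deliberately left allocated after the object is reclaimed:

```c
#include <stdlib.h>

/* Every pointer refers to a tombstone; the tombstone refers to the object.
   object == NULL plays the role of "RIP". */
struct tombstone {
    void *object;
};

typedef struct tombstone *ts_ptr;

/* new(p): allocate the object and its tombstone. Returns NULL on failure. */
ts_ptr ts_new(size_t size) {
    struct tombstone *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->object = malloc(size);
    if (t->object == NULL) {
        free(t);
        return NULL;
    }
    return t;
}

/* delete(p): reclaim the object but keep the tombstone, so that every
   copy of the pointer can still discover that the object is gone. */
void ts_delete(ts_ptr p) {
    if (p == NULL) return;
    free(p->object);
    p->object = NULL;
}

/* Checked dereference: returns NULL instead of a dangling address. */
void *ts_deref(ts_ptr p) {
    return p == NULL ? NULL : p->object;
}
```

After `q := p` and `delete(p)`, a dereference through `q` yields NULL rather than stale memory. The sketch also shows the last issue in the list: something still has to decide when the tombstone itself may be freed.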

SLIDE 16

Locks and Keys

    new(p)
    q := p
    delete(p)

  • Pointer = address + key; object has lock
  • When reclaiming object, change the lock
  • Tombstones vs. locks/keys: Efficiency comparison (space overhead or run-time overhead) unclear

Most compilers do not by default generate code to check for dangling references. Most Pascal compilers allow the programmer to request dynamic checks, which are usually implemented with locks and keys.
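
A minimal sketch of locks and keys in C follows. To keep the example well-defined, objects live in a small fixed pool so that a reclaimed slot can still be inspected; the pool, the `lk_*` functions and the key counter are all assumptions made for illustration, and key wrap-around is ignored:

```c
#include <stddef.h>

#define POOL_SIZE 16

/* Each object carries a lock; lock == 0 means the slot is free. */
struct slot {
    unsigned lock;
    int value;
};

/* Pointer = address + key. */
struct keyed_ptr {
    struct slot *addr;
    unsigned key;
};

static struct slot pool[POOL_SIZE];
static unsigned next_key = 1;

/* new(p): take a free slot and give object and pointer a matching lock/key. */
struct keyed_ptr lk_new(int value) {
    struct keyed_ptr p = { NULL, 0 };
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].lock == 0) {
            pool[i].lock = next_key++;
            pool[i].value = value;
            p.addr = &pool[i];
            p.key = pool[i].lock;
            break;
        }
    }
    return p;                    /* addr == NULL if the pool is full */
}

/* delete(p): change the lock so that no existing key fits any more. */
void lk_delete(struct keyed_ptr p) {
    if (p.addr != NULL && p.addr->lock == p.key)
        p.addr->lock = 0;
}

/* Checked dereference: NULL unless the key still matches the lock. */
int *lk_deref(struct keyed_ptr p) {
    if (p.addr == NULL || p.key == 0 || p.addr->lock != p.key)
        return NULL;
    return &p.addr->value;
}
```

Even if the slot is later reused for a new object, an old copy of the pointer keeps its old key and therefore still fails the check.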

SLIDE 17

Garbage Collection

Automatic reclamation of space/objects

  • Essential for functional languages
  • Popular in imperative languages (Clu, Ada, Modula-3, Java)
  • Difficult to implement
  • Slower than manual reclamation

Garbage collection methods

  • Reference counts
  • Mark and sweep
  • Mark and sweep variants
  • Stop and copy
  • Generational technique

SLIDE 18

Reference Counts

    a = new Obj();
    b = new Obj();
    b = a;
    a = null;
    b = null;

  • Associate reference count with each object.
  • Set to 1 when object is allocated.
  • Adjust
      • when one pointer assigned to another.
      • on subroutine return.

Pros/cons

  + Fairly simple to implement
  + Fairly low cost
  − Does not work when there are circular references.
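
The bookkeeping can be sketched by hand in C. The names `struct obj`, `obj_new`, `obj_retain`, `obj_release`, `obj_assign` and `obj_live_count` are hypothetical, and the live-object counter exists only so the effect can be observed:

```c
#include <stdlib.h>

static int live_objects = 0;     /* for observation only */

struct obj {
    int refs;                    /* number of pointers to this object */
};

/* Allocation sets the count to 1 for the pointer being returned. */
struct obj *obj_new(void) {
    struct obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refs = 1;
        live_objects++;
    }
    return o;
}

void obj_retain(struct obj *o) {
    if (o != NULL) o->refs++;
}

/* Dropping the last reference reclaims the object. */
void obj_release(struct obj *o) {
    if (o != NULL && --o->refs == 0) {
        free(o);
        live_objects--;
    }
}

/* Pointer assignment "*dst = src" with the counts adjusted.
   Retaining first keeps self-assignment safe. */
void obj_assign(struct obj **dst, struct obj *src) {
    obj_retain(src);
    obj_release(*dst);
    *dst = src;
}

int obj_live_count(void) { return live_objects; }
```

Replaying the five statements from the slide: after `b = a` the object first held by `b` is reclaimed, and after both pointers are set to null the remaining object is reclaimed too. Two objects that point at each other would never reach a count of zero, which is the circular-reference problem.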

SLIDE 19

Garbage

[Figure: heap objects object1 to object4 and two roots (a local variable and a static variable); three objects are labelled live and one is labelled garbage.]

SLIDE 20

Mark and Sweep

    for each root variable r
        mark (r);
    sweep ();

    void mark (Object p)
        if (!p.marked)
            p.marked = true;
            for each Object q referenced by p
                mark (q);

    void sweep ()
        for each Object p in the heap
            if (p.marked)
                p.marked = false
            else
                heap.release (p);
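
The pseudocode above can be turned into a small runnable sketch in C. Several simplifications are assumed here: the heap is a fixed array of slots, every object has at most two outgoing references, and the names `gc_alloc` and `gc_collect` are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 8
#define MAX_REFS 2

struct object {
    bool in_use;                       /* slot currently holds an object */
    bool marked;                       /* set during the mark phase */
    struct object *refs[MAX_REFS];     /* outgoing references, NULL if unused */
};

static struct object heap[HEAP_SIZE];

/* Hands out a free slot, or NULL if the heap is full. */
struct object *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (struct object){ .in_use = true };
            return &heap[i];
        }
    }
    return NULL;
}

/* Marks everything reachable from p; the mark bit stops cycles. */
static void mark(struct object *p) {
    if (p == NULL || p->marked) return;
    p->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++)
        mark(p->refs[i]);
}

/* Releases unmarked objects and clears marks; returns how many were released. */
static size_t sweep(void) {
    size_t released = 0;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) continue;
        if (heap[i].marked) {
            heap[i].marked = false;
        } else {
            heap[i].in_use = false;
            released++;
        }
    }
    return released;
}

/* One full collection starting from the given roots. */
size_t gc_collect(struct object **roots, size_t n_roots) {
    for (size_t i = 0; i < n_roots; i++)
        mark(roots[i]);
    return sweep();
}
```

If two objects reference each other but neither is reachable from a root, both are released, which is the case reference counting cannot handle. The recursive `mark` also shows why deep structures cost stack space.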

SLIDE 21

Mark and Sweep

Pros/cons:

  − More complicated to implement
  − Requires inspection of all allocated blocks in a sweep: costly.
  − High space usage if the recursion is deep.
  − Requires type descriptor at the beginning of each block to know the size of the block and to find the pointers in the block.
  + Works with circular data structures.
