SLIDE 1

Strong typing

  • A language is strongly typed if:
      – Every data element has a unique type whose properties are known at compile time
      – Type conversions take place in a controlled manner, by interpreting a value of one type as a value of another
          • Not by (mis)interpreting bits in memory!
          • Type conversions are checked at compile time
  • Static typing
      – Every variable has a definite type at compile time: Strong typing →! compile time typing

Other definitions of "strong typing":

  • All type errors are reported (at compile time or at run time)
  • Incompatible operations are prevented
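
The alternative definitions above (type errors reported at run time, incompatible operations prevented) can be illustrated with a minimal Python sketch. Python is used here only as an example of a language that checks types at run time; the helper name try_add is hypothetical and not part of the slides.

```python
def try_add(a, b):
    """Try a + b; return (True, result) or (False, name of the error raised)."""
    try:
        return True, a + b
    except TypeError as exc:
        # The incompatible operation is prevented and reported at run time.
        return False, type(exc).__name__


# str + int is an incompatible operation: it is rejected, not silently reinterpreted.
print(try_add("1", 1))        # (False, 'TypeError')

# A controlled conversion turns the value "1" into the integer 1 first.
print(try_add(int("1"), 1))   # (True, 2)
```

The conversion int("1") works on the value, not on the bits that happen to store the string.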

SLIDE 2

Typing in languages

  • Fortran (weak typing)
      – No type checks in parameter passing
      – EQUIVALENCE statement
  • Pascal (nearly strong typing)
      – Records that are not type-safe
  • Ada (strong typing)
      – Records are type-safe
  • C and C++ (weak typing)
      – union is not type-safe
      – (Implicit type casting)
      – Type conversion between incompatible types
  • Java and C# (strong typing)
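
Why a C-style union is not type-safe can be sketched with Python's ctypes module, which mimics the C memory layout. This is an illustration only: the class name FloatOrInt and the function float_bits are hypothetical, and the printed bit pattern assumes the platform stores C floats as IEEE 754 single precision.

```python
import ctypes


class FloatOrInt(ctypes.Union):
    """Two fields sharing the same 4 bytes, like a C union of a float and a uint32_t."""
    _fields_ = [("as_float", ctypes.c_float), ("as_uint", ctypes.c_uint32)]


def float_bits(x):
    """Write a float into the union and read the same bytes back as an integer."""
    u = FloatOrInt()
    u.as_float = x
    return u.as_uint   # no check that the last value written was an integer


print(hex(float_bits(1.0)))   # 0x3f800000: the bits of the float, read as an integer
print(int(1.0))               # 1: a value conversion, for comparison
```

Nothing stops the program from reading the field that was not written last, which is exactly the kind of bit (mis)interpretation a strongly typed language rules out.
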
SLIDE 3

Type conversions

  • Stronger typing (Ada)
      – Explicit conversions allowed only in a limited sense
      – Exception: Unchecked_Conversion
  • Weaker typing (C)
      – Compiler can make automatic type conversions
          • coercive (implicit) type conversions
      – Lack of consistency
          • When and what kind of automatic conversions take place
          • When an explicit conversion is possible

Strong typing cast:

  • Explicit conversion

coercion:

  • Implicit conversion
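
The difference between a coercion and a cast can be shown in a few lines. The sketch below uses Python's numeric types as a stand-in; the variable names are illustrative only, and the rules for when a coercion happens differ from language to language.

```python
# Coercion: the int operand is converted to float automatically by the language.
mixed = 1 + 2.5
print(type(mixed).__name__, mixed)   # float 3.5

# Cast: the programmer requests the conversion and accepts losing the fraction.
truncated = int(2.9)
print(truncated)                     # 2

# An explicit conversion in the other direction loses nothing here.
widened = float(3)
print(widened)                       # 3.0
```
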
SLIDE 4

Types in languages

  • Scalar types
  • Enumerated types
  • Subtypes
  • Structured types (struct, classes, ...)
      – Formed using type constructions
  • Pointer/reference types
  • Set types
  • Subroutine and function types
      – We will return to this in the section about functions
  • Task types (Ada)
      – Concurrency issue

SLIDE 5

Numeric types

  • Representing numeric types
      – Efficiency dictates the internal representation
      – Variations in the representation make portability harder
  • Two's complement for integers
      – Range -2^(n-1) .. 2^(n-1)-1, e.g., 16 bits: -32768..32767
  • Floating point: (s * significand * 2^exponent), where s is the sign
      – Standard for representation (IEEE 754)
          • bigger significand/mantissa yields more precision
          • bigger exponent yields a larger range
  • Decimals (Cobol, C#)
      – Fixed number of digits, fixed decimal point
      – High accuracy, small value range

Layout of a 32-bit floating-point number (field widths in bits):

    s | exponent | significand
    1 |    8     |     23
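
The two representations above can be checked with a short Python sketch. The function names are hypothetical; the field extraction assumes the 1/8/23 layout shown on the slide, which struct's "f" format produces in its standard-size mode.

```python
import struct


def twos_complement_range(bits):
    """Smallest and largest value of an n-bit two's complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def float32_fields(x):
    """Split a 32-bit float into (sign, biased exponent, significand bits)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    significand = raw & 0x7FFFFF    # 23 bits, the leading 1 is implicit
    return sign, exponent, significand


print(twos_complement_range(16))   # (-32768, 32767)
print(float32_fields(1.0))         # (0, 127, 0)
print(float32_fields(-2.5))        # (1, 128, 2097152)
```

For -2.5 = -1.25 * 2^1 the stored exponent is 1 + 127 = 128, and the fraction 0.25 sets bit 21 of the significand field.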

SLIDE 6

Character types

  • Representing one character in a language
  • Common operations:
      – Equality, lexicographical ordering
      – I/O
      – Conversions to string type and back
      – Upper to lower case and back, etc.
  • Representation in memory
      – ASCII (ISO-646) (7 bits), ISO-latin etc. 8-bit
      – Unicode code points (32 bit) (previously also a 16-bit version)
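
The common operations listed above look as follows in Python. Note the assumption: Python has no separate character type, so a character is modelled here as a string of length 1, and the variable names are illustrative only.

```python
ch = "a"

# Conversion between a character and its numeric code.
code = ord(ch)
restored = chr(code)
print(code, restored)              # 97 a

# Equality and ordering compare the numeric codes.
print(ch == "a", ch < "b")         # True True

# Case conversion in both directions.
print(ch.upper(), "A".lower())     # A a

# Conversion to a string and back to single characters.
word = ch + "bc"
print(word, list(word))            # abc ['a', 'b', 'c']
```
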

SLIDE 7

Char-type issues

  • Size and interpretation
      – Traditionally 8 bits (orig. 6-7), max. 256 chars
      – Not enough for non-English languages!
      – Solution 1: several encodings (iso-latin-X)
      – Solution 2: larger type (Unicode, 16- and 32-bit)
  • What determines the encoding?
      – Source file encoding (char and string literals, identifiers)
      – Memory representation?
  • Conversions with I/O
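
The "several encodings" solution means that one byte value has no meaning until an encoding is chosen. A small Python sketch, with illustrative variable names, decodes the same byte under two ISO 8859 encodings and then converts a character for output:

```python
raw = bytes([0xE4])

# The same byte under two different 8-bit encodings.
as_latin1 = raw.decode("iso8859-1")     # Latin small letter a with diaeresis
as_cyrillic = raw.decode("iso8859-5")   # Cyrillic small letter ef

print(hex(ord(as_latin1)))     # 0xe4
print(hex(ord(as_cyrillic)))   # 0x444

# Conversion with I/O: the in-memory character is encoded before it is written out.
for_output = as_latin1.encode("utf-8")
print(for_output)              # b'\xc3\xa4'
```
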

SLIDE 8

Issues with char types

  • Comparison
      – What determines alphabetical order? Is it the same as <?
      – When are two characters equal?
  • One or many chars?
      – Ligatures: ß, ij, Ľ, ŋ, œ, ä
      – Upper/lower case problems: ß → SS
      – Many representations: ä (U+00E4) vs. ä = a + ¨ (U+0061, U+0308)
  • Does the programming language support many character encodings?
      – Several types for each encoding: Trouble
      – One type: needs to be large enough (Unicode UCS4)
      – Downwards compatibility issues...
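
Both the "many representations" and the ß → SS problem can be reproduced with Python's standard unicodedata module. The variable names below are illustrative; NFC is one of several normalization forms and is used here only as an example.

```python
import unicodedata

composed = "\u00e4"       # a with diaeresis as a single code point
decomposed = "a\u0308"    # letter a followed by a combining diaeresis

# The two look identical on screen but are different sequences.
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 1 2

# After normalization to the same form they compare equal.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)            # True

# Upper-casing one character can produce two.
sharp_s = "\u00df"
print(sharp_s.upper() == "SS")           # True
```
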

SLIDE 9

Source code encoding

  • Problem: The encoding of a text file is not usually given
  • Char/string literals vs identifiers
  • Attempted solutions:
      – Coding is set by the language (Java)
      – Coding determined by source code (Python)
      – Given on the command line (C++ compilers)
      – Not taken into account (C++, others)
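
The Python approach, where the source file itself declares its encoding, can be demonstrated by compiling the same bytes with two different declarations. This is a sketch: the helper name literal_from_source is hypothetical, and the source is compiled from an in-memory byte string rather than from a real file.

```python
def literal_from_source(encoding):
    """Compile a tiny source text whose first line declares `encoding`
    and return the string literal it defines."""
    header = b"# -*- coding: " + encoding.encode("ascii") + b" -*-\n"
    body = b"s = '\xc3\xa4'\n"          # two raw bytes inside the literal
    namespace = {}
    exec(compile(header + body, "<demo>", "exec"), namespace)
    return namespace["s"]


as_utf8 = literal_from_source("utf-8")       # the two bytes form one character
as_latin1 = literal_from_source("latin-1")   # each byte is its own character

print(len(as_utf8), len(as_latin1))          # 1 2
```

The bytes in the file are identical in both cases; only the declared encoding changes what the literal means.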

SLIDE 10

String types

  • Traditional: list/array of characters
  • Typical operations:
      – Indexing
      – Iteration
      – Splicing
      – Printing etc.
  • How to encode?
      – Char type problems remain
  • 32 bits quadruples memory consumption
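
A short Python sketch of the typical operations, followed by a size comparison that shows the fourfold memory cost of 32-bit characters. The variable names are illustrative, and the byte counts refer to the encoded forms, not to Python's internal string storage.

```python
text = "hello"

print(text[1])                  # indexing: e
print([ch for ch in text])      # iteration: ['h', 'e', 'l', 'l', 'o']
print(text[1:4])                # taking a substring: ell

# The same five characters as 8-bit and as 32-bit code units.
bytes_8bit = len(text.encode("latin-1"))
bytes_32bit = len(text.encode("utf-32-le"))
print(bytes_8bit, bytes_32bit)  # 5 20
```
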

SLIDE 11

Unicode and strings

  • Standard characters: 137 439 (ver 11.0) (encoding allows 1 114 112)
  • Codes U+000000 – U+10FFFF for chars
  • Part of the codes reserved for other purposes
  • String encoding UTF-32 (UCS4)
      – String is a sequence of 32-bit chars (4 294 967 296 possible values)
      – Easy to handle, comparable to 8-bit strings
      – Lots of memory consumed, wasted bits
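
The numbers on this slide can be verified directly. In the sketch below the helper names are hypothetical; Python's chr() accepts exactly the code range U+0000 to U+10FFFF, and the surrogate range U+D800 to U+DFFF serves as an example of codes reserved for another purpose.

```python
MAX_CODE_POINT = 0x10FFFF


def in_code_range(n):
    """True if n is inside the Unicode code range accepted by chr()."""
    try:
        chr(n)
        return True
    except ValueError:
        return False


def is_encodable(n):
    """True if the code point can be encoded as a character (surrogates cannot)."""
    try:
        chr(n).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(MAX_CODE_POINT + 1)                        # 1114112 possible codes
print(in_code_range(0x10FFFF), in_code_range(0x110000))   # True False
print(is_encodable(0x41), is_encodable(0xD800))  # True False

# UTF-32 spends 4 bytes on every character, however small its code is.
print(len("A".encode("utf-32-le")))              # 4
```
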

SLIDE 12

Unicode and strings

  • UTF-16
      – String consists of 16-bit units (pairs of bytes)
      – Unicode symbols ≤ U+FFFF are coded directly
      – The rest are represented by surrogates, two 16-bit byte pairs
      – Surrogates are encoded so that the first and second code cannot be confused
      – Consequence 1: For any byte pair, one can tell immediately whether it is a character, a surrogate beginning or a surrogate ending.
      – Consequence 2: One symbol may be one or two pairs of bytes!
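
The surrogate mechanism can be written out in a few lines. This is a minimal sketch with hypothetical function names; it assumes the input is a valid code point outside the surrogate range and compares its result with Python's built-in UTF-16 codec.

```python
def to_utf16_units(cp):
    """Encode one code point as a list of one or two 16-bit units."""
    if cp <= 0xFFFF:
        return [cp]                      # coded directly
    offset = cp - 0x10000                # 20 bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits: surrogate beginning
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits: surrogate ending
    return [high, low]


def classify_unit(unit):
    """Tell from a single 16-bit unit what role it plays."""
    if 0xD800 <= unit <= 0xDBFF:
        return "surrogate beginning"
    if 0xDC00 <= unit <= 0xDFFF:
        return "surrogate ending"
    return "character"


units = to_utf16_units(0x1F600)
print([hex(u) for u in units])               # ['0xd83d', '0xde00']
print([classify_unit(u) for u in units])     # ['surrogate beginning', 'surrogate ending']
print(to_utf16_units(0xE4))                  # [228]
```

Because the two surrogate ranges do not overlap each other or any directly coded character, classify_unit needs no context.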

SLIDE 13

Unicode and strings

  • UTF-8
      – A symbol is formed from 8-bit bytes
      – Unicode symbols ≤ U+7F are coded directly (7-bit ASCII)
      – The rest up to ≤ U+7FF use 2 bytes (European chars)
      – The rest up to ≤ U+FFFF use 3 bytes (Asian chars)
      – The rest use 4 bytes (old languages, math symbols ...)
      – Consequence: One symbol is 1–4 bytes!
      – Bytes of multibyte symbols can be told apart from 1-byte symbols, and the first byte of a symbol from the bytes that follow it.
      – Consequence: bytes can be scanned until the beginning of a symbol is found.
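
The length rule and the resynchronization property can be sketched as follows. The function names are hypothetical; the scan goes backwards from an arbitrary byte position, which is one possible reading of "scanned until the beginning of a symbol is found", and the input is assumed to be valid UTF-8.

```python
def utf8_length(cp):
    """Number of bytes UTF-8 uses for a code point."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def is_continuation(byte):
    """Continuation bytes have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80


def symbol_start(data, i):
    """Step back from byte index i to the first byte of the symbol containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


# One symbol of each length: U+0061, U+00E4, U+20AC, U+1F600.
data = "a\u00e4\u20ac\U0001F600".encode("utf-8")
print([utf8_length(cp) for cp in (0x61, 0xE4, 0x20AC, 0x1F600)])   # [1, 2, 3, 4]
print(symbol_start(data, 5))   # 3: byte 5 belongs to the symbol starting at byte 3
print(symbol_start(data, 8))   # 6
```
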

SLIDE 14

Unicode challenges and problems

  • Indexing and other interpretation (UTF-8 and UTF-16)
      – Symbols differ in size, s[i] is not the (i+1)th symbol!
      – String length and table size are not the same!
      – Indexing can land in the middle of a symbol
      – Replacing a symbol is complicated if the new one is of a different size
      – → Working with encoded strings is complicated
      – Unicode iterator: returns Unicode symbols and moves correctly in a string
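
A Unicode iterator of the kind mentioned above can be sketched for UTF-8 in a few lines. The generator name iter_symbols is hypothetical, and the sketch assumes well-formed UTF-8 input that starts at a symbol boundary.

```python
def iter_symbols(data):
    """Yield the symbols of a UTF-8 byte string one at a time."""
    i = 0
    while i < len(data):
        first = data[i]
        # The first byte of a symbol tells how many bytes the symbol occupies.
        if first < 0x80:
            size = 1
        elif first < 0xE0:
            size = 2
        elif first < 0xF0:
            size = 3
        else:
            size = 4
        yield data[i:i + size].decode("utf-8")
        i += size


# Four symbols: U+0061, U+00E4, U+20AC, U+1F600.
data = "a\u00e4\u20ac\U0001F600".encode("utf-8")
symbols = list(iter_symbols(data))

print(len(data), len(symbols))   # 10 4: table size and string length differ
print(hex(data[2]))              # 0xa4: the middle of a symbol, not the third symbol
print(hex(ord(symbols[2])))      # 0x20ac: the third symbol
```
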

SLIDE 15

Unicode challenges

  • More memory and cache (esp. UTF-32)
  • Alphabets
      – Alphabetizing by code number is impossible
  • Equality
      – Strings must be normalised before comparing
  • Conversions are needed if multiple codings are supported
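
Why sorting by code number does not give alphabetical order can be seen with three words. The sketch uses only built-in string comparison; the case-folding key is shown as a partial workaround and is not a substitute for language-specific collation rules.

```python
words = ["apple", "Zebra", "\u00c4pfel"]   # the last word starts with A with diaeresis

# Plain sorting compares code numbers: Z (0x5A) < a (0x61) < A-diaeresis (0xC4).
by_code = sorted(words)
print([w[0].encode("unicode_escape") for w in by_code])   # [b'Z', b'a', b'\\xc4']

# Case folding fixes the upper/lower case problem, but the accented
# letter still sorts after every unaccented one.
by_folded = sorted(words, key=str.casefold)
print([w[0].encode("unicode_escape") for w in by_folded])  # [b'a', b'Z', b'\\xc4']
```
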

SLIDE 16

Unicode in some languages

  • Java
      – String coding UTF-16
      – Source code coding UTF-8
  • Python
      – Strings "raw" (Python 2) or Unicode (2 & 3)
      – Source code can declare which encoding is used
  • C++
      – C++03: not defined (wchar_t = UTF-32?)
      – C++11: support for UTF-8/16/32, but not full
      – Source encoding compiler-dependent
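
The practical effect of these choices shows up in what "length" means. The sketch below uses Python 3, where a str is a sequence of code points and bytes holds raw encoded data; the UTF-16 unit count is computed explicitly to show what a language with UTF-16 strings would count. Variable names are illustrative.

```python
smiley = "\U0001F600"   # one symbol outside the 16-bit range

code_points = len(smiley)                            # Python 3 str counts code points
utf16_units = len(smiley.encode("utf-16-le")) // 2   # 16-bit units in UTF-16
raw = smiley.encode("utf-8")                         # raw bytes, no character semantics

print(code_points, utf16_units, len(raw))            # 1 2 4
print(type(smiley).__name__, type(raw).__name__)     # str bytes
```
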