
Strong typing and type conversions (PowerPoint presentation)



  1. Strong typing
     ● A language is strongly typed if:
       – Every data element has a unique type whose properties are known at compilation
       – Type conversions take place in a controlled manner, by interpreting the value of one type as another
         • Not by (mis)interpreting bits in memory!
     ● Other definitions for "strong typing":
       • Type conversions are checked at compilation
       • All type errors are reported (at compile time or at run time)
       • Operations on incompatible types are prevented
     ● Static typing
       – Every variable has a definite type at compile time
       – Strong typing does not imply compile-time (static) typing
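The difference between interpreting a value and (mis)interpreting its bits can be sketched in Python. This is a minimal illustration, assuming IEEE 754 single precision via the standard `struct` module; `convert_value` and `reinterpret_bits` are hypothetical helper names.

```python
import struct

def convert_value(x: float) -> int:
    # Hypothetical helper. Controlled conversion: the *value* is
    # interpreted as an integer (fraction truncated), e.g. 1.0 -> 1.
    return int(x)

def reinterpret_bits(x: float) -> int:
    # Hypothetical helper. (Mis)interpretation: pack the float into 4 bytes
    # (IEEE 754 single precision, big-endian) and read the same bytes back
    # as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

print(convert_value(1.0))           # 1
print(hex(reinterpret_bits(1.0)))   # 0x3f800000, i.e. 1065353216
```

The same four bytes give a completely different number when read with the wrong type, which is exactly what a strongly typed language prevents.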

  2. Typing in languages
     ● Fortran (weak typing)
       – No type checks in parameter passing
       – EQUIVALENCE statements (variables of different types sharing storage)
     ● Pascal (nearly strong typing)
       – Variant records are not type-safe
     ● Ada (strong typing)
       – Records are type-safe
     ● C and C++ (weak typing)
       – union is not type-safe
       – (Implicit type casting)
       – Type conversion between incompatible types
     ● Java and C# (strong typing)

  3. Type conversions
     ● Stronger typing (Ada)
       – Explicit conversions allowed only in a limited sense
       – Exception: Unchecked_Conversion
     ● Weaker typing (C)
       – The compiler can make automatic type conversions
         • coercive (implicit) type conversions
       – Lack of consistency
         • when, and what kind of, automatic conversions take place
         • when an explicit conversion is possible
     ● Cast: explicit conversion
     ● Coercion: implicit conversion
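Cast versus coercion can be seen in Python, which is dynamically but fairly strongly typed: it coerces automatically only within the numeric types and otherwise demands an explicit conversion. A small sketch; the three function names are hypothetical.

```python
def implicit_coercion() -> float:
    # Hypothetical helper. Coercion: the int 1 is converted to float
    # automatically in mixed arithmetic.
    return 1 + 2.5

def rejected_mix() -> str:
    # Hypothetical helper. str + int is not coerced; Python raises TypeError
    # at run time instead of guessing an interpretation.
    try:
        "1" + 1  # type: ignore[operator]
    except TypeError as err:
        return type(err).__name__
    return "no error"

def explicit_cast() -> int:
    # Hypothetical helper. Cast: the programmer requests the conversion.
    return int("1") + 1

print(implicit_coercion())  # 3.5
print(rejected_mix())       # TypeError
print(explicit_cast())      # 2
```

C would accept many more implicit conversions silently (e.g. between integer, character and floating types), which is the inconsistency the slide refers to.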

  4. Types in languages
     ● Scalar types
     ● Enumerated types
     ● Subtypes
     ● Structured types (struct, classes, ...)
       – Formed using type constructors
     ● Pointer/reference types
     ● Set types
     ● Subroutine and function types
       – We will return to this in the section about functions
     ● Task types (Ada)
       – Concurrency issue

  5. Numeric types
     ● Representing numeric types
       – Efficiency mandates an internal (hardware) representation
       – Variations in the representation make portability harder
     ● Two's complement for integers
       – Range -2^(n-1) .. 2^(n-1)-1, e.g., 16 bits: -32768..32767
     ● Floating point: s * significand * 2^exponent (s = sign)
       – Standard representation (IEEE 754); single precision: 1 sign bit, 8 exponent bits, 23 significand bits
         • a bigger significand (mantissa) yields more precision
         • a bigger exponent yields a larger range
     ● Decimals (Cobol, C#)
       – Stored as decimal digits; the decimal point is fixed (Cobol) or scaled per value (C# decimal)
       – High precision, small value range
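The integer range, the 1/8/23-bit float layout and the advantage of decimal types can be checked with a short Python sketch. It assumes IEEE 754 single precision through `struct` and uses the standard `decimal` module; the helper names are hypothetical.

```python
import struct
from decimal import Decimal

def twos_complement_range(n: int) -> tuple[int, int]:
    # Hypothetical helper: range of an n-bit two's complement integer,
    # -2^(n-1) .. 2^(n-1)-1.
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1

def to_twos_complement(value: int, n: int) -> int:
    # Hypothetical helper: the n-bit pattern (as an unsigned int) that
    # stores `value` in two's complement.
    return value & ((1 << n) - 1)

def float32_fields(x: float) -> tuple[int, int, int]:
    # Hypothetical helper: split an IEEE 754 single-precision number into
    # its sign (1 bit), biased exponent (8 bits) and significand (23 bits).
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    significand = bits & 0x7FFFFF    # the leading 1 is implicit, not stored
    return sign, exponent, significand

print(twos_complement_range(16))        # (-32768, 32767)
print(hex(to_twos_complement(-1, 16)))  # 0xffff
print(float32_fields(-6.5))             # (1, 129, 5242880): -1.625 * 2^2

# Binary floating point cannot store 0.1 exactly; a decimal type can.
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```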

  6. Character types
     ● Representing one character in a language
     ● Common operations:
       – Equality, lexicographical ordering
       – I/O
       – Conversions to the string type and back
       – Upper to lower case and back, etc.
     ● Representation in memory
       – ASCII (ISO 646, 7 bits); ISO Latin etc. (8 bits)
       – Unicode code points (32 bits; previously also a 16-bit version)
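The common character operations look as follows in Python. Note the design choice: Python has no separate char type, so a character is simply a `str` of length 1. The helper `char_ops` is hypothetical.

```python
def char_ops() -> dict:
    # Hypothetical helper collecting typical character operations.
    c = "a"
    return {
        "code": ord(c),                       # character -> code point (97)
        "from_code": chr(66),                 # code point -> character ("B")
        "equal": c == "a",                    # equality
        "ordered": "a" < "b",                 # ordering by code point
        "upper": c.upper(),                   # case conversion ("A")
        "as_string": c + "bc",                # a char is already a string
        "ascii_bytes": "A".encode("ascii"),   # 7-bit ASCII: b"A"
        "latin1_bytes": "ä".encode("latin-1"),  # 8-bit ISO Latin-1: b"\xe4"
    }

for name, value in char_ops().items():
    print(name, repr(value))
```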

  7. Char-type issues
     ● Size and interpretation
       – Traditionally 8 bits (originally 6–7), max. 256 characters
       – Not enough for non-English languages!
       – Solution 1: several encodings (ISO Latin-X)
       – Solution 2: a larger type (Unicode, 16 and 32 bits)
     ● What determines the encoding?
       – Source file encoding (char and string literals, identifiers)
       – Memory representation?
     ● Conversions with I/O
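Why "several 8-bit encodings" is only a partial solution: a byte carries no information about its encoding, so the same byte means different characters depending on which table the reader assumes. A sketch using Python's built-in codecs; `decode_as` is a hypothetical helper.

```python
raw = bytes([0xE4])  # a single byte read from some file

def decode_as(encoding: str) -> str:
    # Hypothetical helper: interpret the same byte under a chosen encoding.
    return raw.decode(encoding)

print(decode_as("latin-1"))    # 'ä' (ISO 8859-1, Western European)
print(decode_as("iso8859_5"))  # 'ф' (ISO 8859-5, Cyrillic)
print(decode_as("iso8859_7"))  # 'δ' (ISO 8859-7, Greek)

# In UTF-8 the character 'ä' needs two bytes instead.
print("ä".encode("utf-8"))     # b'\xc3\xa4'
```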

  8. Issues with char types
     ● Comparison
       – What determines alphabetical order? Is < the same?
       – When are two characters equal?
     ● One or many chars?
       – Ligatures and special letters: ß, ij, Ľ, ŋ, œ, ä
       – Upper/lower case problems: ß → SS
       – Many representations: ä (U+00E4) vs. ä = a + ¨ (U+0061, U+0308)
     ● Does the programming language support many character encodings?
       – Several types, one per encoding: trouble
       – One type: must be large enough (Unicode UCS-4)
       – Backward compatibility issues...
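Two of the problems listed above, case mapping that changes length and multiple representations of the same letter, can be reproduced with Python's standard `unicodedata` module. Normalization (NFC composes, NFD decomposes) is the usual remedy before comparing.

```python
import unicodedata

composed = "\u00e4"     # ä as one code point (U+00E4)
decomposed = "a\u0308"  # a + COMBINING DIAERESIS (U+0061, U+0308)

print("ß".upper())                    # 'SS': one character becomes two
print(len(composed), len(decomposed)) # 1 2
print(composed == decomposed)         # False: different code points

# After normalization both spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```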

  9. Source code encoding
     ● Problem: the encoding of a text file is usually not given
     ● Char/string literals vs. identifiers
     ● Attempted solutions:
       – Encoding fixed by the language (Java)
       – Encoding declared in the source file itself (Python)
       – Encoding given on the command line (C++ compilers)
       – Not taken into account (C++, others)
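The Python approach, declaring the encoding inside the source (a PEP 263 coding comment on the first or second line), can be tried by compiling raw source bytes. Without a declaration Python 3 assumes UTF-8, so the same bytes are rejected. `run_source` is a hypothetical helper; catching both exception types is a defensive assumption.

```python
def run_source(source: bytes) -> object:
    # Hypothetical helper: compile and run source *bytes*, then return the
    # value bound to `s`, or None if the bytes cannot be decoded as source.
    namespace: dict = {}
    try:
        exec(compile(source, "<example>", "exec"), namespace)
    except (SyntaxError, UnicodeDecodeError):
        return None
    return namespace.get("s")

latin1_source = b"# -*- coding: latin-1 -*-\ns = '\xe4'\n"
undeclared_source = b"s = '\xe4'\n"   # same literal byte, no declaration

print(run_source(latin1_source))      # 'ä'
print(run_source(undeclared_source))  # None: 0xE4 alone is not valid UTF-8
```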

  10. String types
     ● Traditional: list/array of characters
     ● Typical operations:
       – Indexing
       – Iteration
       – Slicing
       – Printing, etc.
     ● How to encode?
       – The char type problems remain
       – 32-bit characters quadruple memory consumption compared with 8-bit ones
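The typical string operations and the memory cost of a 32-bit character type, sketched in Python. The `utf-32-le` codec is used so that no byte-order mark is added to the count.

```python
s = "Hello"

print(s[1])             # indexing: 'e'
print(s[1:4])           # slicing: 'ell'
print([c for c in s])   # iteration: ['H', 'e', 'l', 'l', 'o']

# Encoded size: 1 byte per character in an 8-bit encoding,
# 4 bytes per character in UTF-32.
print(len(s.encode("latin-1")))    # 5
print(len(s.encode("utf-32-le")))  # 20
```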

  11. Unicode and strings
     ● Standard characters: 137 439 (version 11.0); the encoding allows 1 114 112
     ● Codes U+0000 – U+10FFFF for characters
     ● Part of the codes is reserved for other purposes (e.g., surrogates U+D800–U+DFFF)
     ● String encoding UTF-32 (UCS-4)
       – A string is a sequence of 32-bit chars (4 294 967 296 possible values)
       – Simple, like 8-bit strings: one unit per character
       – Lots of memory consumed, wasted bits
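The size of the Unicode code space and the "wasted bits" of UTF-32 can be computed directly. A sketch; `MAX_CODE_POINT`, `is_valid_code_point` and `is_surrogate` are hypothetical names.

```python
MAX_CODE_POINT = 0x10FFFF  # hypothetical constant: last Unicode code point

def is_valid_code_point(cp: int) -> bool:
    # Hypothetical helper: inside U+0000 .. U+10FFFF?
    return 0 <= cp <= MAX_CODE_POINT

def is_surrogate(cp: int) -> bool:
    # Hypothetical helper: U+D800 .. U+DFFF is reserved for UTF-16
    # surrogates and does not encode characters.
    return 0xD800 <= cp <= 0xDFFF

print(MAX_CODE_POINT + 1)          # 1114112 code points
print(2 ** 32)                     # 4294967296 values of a 32-bit unit
print(MAX_CODE_POINT.bit_length()) # 21: 11 of 32 bits are always zero

try:
    chr(0x110000)                  # beyond the code space
except ValueError:
    print("outside the Unicode code space")
```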

  12. Unicode and strings: UTF-16
     ● UTF-16
       – A string consists of 16-bit code units (pairs of bytes)
       – Unicode symbols ≤ U+FFFF are encoded directly
       – The rest are represented by surrogates: two 16-bit code units
       – Surrogates are encoded so that the first and the second unit cannot be confused
       – Consequence 1: each unit can immediately be classified as a character, a surrogate beginning or a surrogate end
       – Consequence 2: one symbol may take one or two 16-bit units!
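The surrogate mechanism can be written out by hand: subtract 0x10000, then split the remaining 20 bits into a high half (added to 0xD800) and a low half (added to 0xDC00). A sketch assuming the input is a valid, non-surrogate code point; `utf16_units` and `classify` are hypothetical helpers, cross-checked against Python's own codec.

```python
def utf16_units(cp: int) -> list[int]:
    """Hypothetical helper: encode one code point as UTF-16 code units."""
    if cp <= 0xFFFF:
        return [cp]                 # BMP: one unit, stored directly
    v = cp - 0x10000                # 20 bits remain
    high = 0xD800 + (v >> 10)       # top 10 bits -> surrogate beginning
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits -> surrogate end
    return [high, low]

def classify(unit: int) -> str:
    # Hypothetical helper: a single unit reveals its role on its own.
    if 0xD800 <= unit <= 0xDBFF:
        return "surrogate beginning"
    if 0xDC00 <= unit <= 0xDFFF:
        return "surrogate end"
    return "character"

units = utf16_units(0x1F600)        # U+1F600 (an emoji)
print([hex(u) for u in units])      # ['0xd83d', '0xde00']
print([classify(u) for u in units]) # ['surrogate beginning', 'surrogate end']

# Same result as Python's UTF-16 codec (big-endian, no byte-order mark).
print("\U0001F600".encode("utf-16-be").hex())  # d83dde00
```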

  13. Unicode and strings: UTF-8
     ● UTF-8
       – Sequences of 8-bit bytes form a symbol
       – Unicode symbols ≤ U+7F directly (7-bit ASCII)
       – The rest up to U+7FF use 2 bytes (e.g., European characters)
       – The rest up to U+FFFF use 3 bytes (e.g., Asian characters)
       – The rest use 4 bytes (historic scripts, some math symbols, emoji...)
       – Consequence: one symbol is 1–4 bytes!
       – The first byte of a multibyte symbol can be distinguished from 1-byte symbols and from continuation bytes
       – Consequence: starting from any byte, bytes can be scanned until the beginning of a symbol is found
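The 1 to 4 byte layout and the resynchronization property can be sketched as a hand-written encoder plus a backward scan over continuation bytes (bit pattern 10xxxxxx). It assumes valid, non-surrogate code points; `utf8_encode`, `is_continuation` and `symbol_start` are hypothetical helpers, checked against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one code point as UTF-8 by hand."""
    if cp <= 0x7F:                       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),     # 11110xxx + 3 continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def is_continuation(byte: int) -> bool:
    # Continuation bytes look like 10xxxxxx and never start a symbol.
    return (byte & 0xC0) == 0x80

def symbol_start(data: bytes, i: int) -> int:
    """Hypothetical helper: move back from index i to its symbol's first byte."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i

for ch in "Aä€😀":
    print(ch, utf8_encode(ord(ch)).hex())  # 41, c3a4, e282ac, f09f9880

data = "a€b".encode("utf-8")   # 61 e2 82 ac 62
print(symbol_start(data, 3))   # 1: byte 3 is inside '€', which starts at 1
```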

  14. Unicode challenges and problems
     ● Indexing and other interpretation (UTF-8 and UTF-16)
       – Symbols differ in size: s[i] is not the (i+1)th symbol!
       – String length (in symbols) and array size (in code units) are not the same!
       – Indexing may land in the middle of a symbol
       – Replacing a symbol is complicated if the replacement has a different size
       – → Manipulating encoded strings is complicated
       – Unicode iterator: returns Unicode symbols and moves correctly through a string
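A minimal "Unicode iterator" over UTF-8 bytes reads the lead byte to learn each symbol's size and steps over the whole symbol, while plain byte indexing can land mid-symbol. The sketch assumes valid UTF-8 input; `code_points` is a hypothetical generator.

```python
from typing import Iterator

def code_points(data: bytes) -> Iterator[str]:
    """Hypothetical helper: yield one symbol at a time from valid UTF-8."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells how many bytes this symbol occupies.
        if lead < 0x80:
            size = 1
        elif lead >> 5 == 0b110:
            size = 2
        elif lead >> 4 == 0b1110:
            size = 3
        else:
            size = 4
        yield data[i:i + size].decode("utf-8")
        i += size

text = "Zürich €"
raw = text.encode("utf-8")
print(len(text), len(raw))    # 8 11: symbols vs. bytes
print(raw[2])                 # 188 (0xBC): the second byte of 'ü'
print(list(code_points(raw)) == list(text))  # True
```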

  15. Unicode challenges
     ● More memory and cache use (esp. UTF-32)
     ● Alphabets
       – Alphabetical ordering by code number is impossible
     ● Equality
       – Strings must be normalised before comparing
     ● Conversions are needed if multiple encodings are supported
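Why ordering by code number fails: upper case sorts before lower case, and accented letters land after 'z'. The `rough_key` below is a hypothetical, crude workaround (case folding plus stripping combining marks), not a real collation; proper alphabetical order is language-specific.

```python
import unicodedata

words = ["zebra", "Äpfel", "apple", "Zoo"]

# Code point order: 'Z' (U+005A) < 'a' (U+0061) < 'z' (U+007A) < 'Ä' (U+00C4)
print(sorted(words))   # ['Zoo', 'apple', 'zebra', 'Äpfel']

def rough_key(word: str) -> str:
    # Hypothetical helper: case-fold, decompose (NFD) and drop combining
    # marks, so 'Äpfel' is compared as 'apfel'.
    base = unicodedata.normalize("NFD", word.casefold())
    return "".join(c for c in base if not unicodedata.combining(c))

print(sorted(words, key=rough_key))   # ['Äpfel', 'apple', 'zebra', 'Zoo']
```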

  16. Unicode in some languages
     ● Java
       – String encoding: UTF-16
       – Source code encoding: UTF-8
     ● Python
       – Strings as raw bytes (Python 2 str) or Unicode (unicode in Python 2, str in Python 3)
       – The source code can declare which encoding it uses
     ● C++
       – C++03: not defined (wchar_t = UTF-32?)
       – C++11: support for UTF-8/16/32, but not complete
       – Source encoding is compiler-dependent
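In Python 3, text (`str`, a sequence of code points) and encoded data (`bytes`) are separate types, and converting between them is always explicit. The same string also has a third length when counted in UTF-16 units, the unit that Java's `String.length()` counts. A small sketch:

```python
text = "naïve 😀"
data = text.encode("utf-8")   # explicit conversion str -> bytes

print(type(text).__name__, type(data).__name__)  # str bytes
print(len(text))                          # 7 code points
print(len(data))                          # 11 UTF-8 bytes
print(len(text.encode("utf-16-le")) // 2) # 8 UTF-16 units (emoji = 2)

try:
    text + data               # str and bytes are never mixed implicitly
except TypeError:
    print("str and bytes do not mix")

print(data.decode("utf-8") == text)       # True: explicit bytes -> str
```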
