Information Theory (PowerPoint PPT Presentation)

SLIDE 1

Information Theory

  • The amount of information in a message is measured by the average number of bits needed to encode all possible messages in an optimal encoding.
  • In computer systems, programs and text files are usually encoded with 8-bit ASCII codes, regardless of the amount of information in them.
  • Text files can typically be compressed by about 40% without losing any information.
  • Amount of information: entropy
    – a function of the probability distribution over the set of all possible messages.

SLIDE 2

Entropy is defined by

H(X) = Σ p(X) log2(1/p(X)),

where the sum is taken over all possible messages X.

Example: Suppose there are two possibilities, Male and Female, both equally likely; thus p(Male) = p(Female) = 1/2. Then

H(X) = 2 × (1/2) log2 2 = 1 bit.

SLIDE 3

Because 1/p(X) decreases as p(X) increases, an optimal encoding uses short codes for frequently occurring messages at the expense of longer codes for infrequent messages. This principle is applied in Morse code, where the most frequently used letters are assigned the shortest codes. Huffman codes are optimal codes that can be assigned to characters, words, machine instructions, or phrases. Single-character Huffman codes are frequently used to compact large files.
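The greedy construction behind single-character Huffman codes can be sketched in a few lines of Python (an illustrative sketch; the function name and representation are mine, not from the slides):

```python
import heapq
from collections import Counter

def huffman_codes(freq):
    """Build an optimal prefix code from symbol frequencies (Huffman's algorithm)."""
    # Heap entries: (weight, tiebreak, {symbol: code-so-far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lowest-weight subtrees,
        # prefixing 0 to codes in one and 1 to codes in the other.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

text = "ABAACABC"
codes = huffman_codes(Counter(text))
encoded = "".join(codes[ch] for ch in text)
print(codes, len(encoded) / len(text))  # 1.5 bits per letter on average
```

Frequent symbols end up near the top of the merge tree and so receive the shortest codes, exactly the principle described above.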

SLIDE 4

Example: Consider the letters A, B, and C, with A twice as likely as B or C (p(A) = 1/2, p(B) = p(C) = 1/4). An optimal encoding assigns a 1-bit code to A and 2-bit codes to B and C: A is encoded as the bit 0, while B and C are encoded as 10 and 11, respectively. Using this encoding, the 8-letter sequence ABAACABC is encoded as the 12-bit sequence 010001101011:

A  B  A  A  C  A  B  C
0  10 0  0  11 0  10 11

The average number of bits per letter is 12/8 = 1.5. This encoding is optimal; the expected number of bits per letter would be at least 1.5 with any other encoding. Note that B, for example, cannot be encoded with the single bit 1, because it would then be impossible to decode the bit sequence 11 (it could be either BB or C).

Morse code avoids this problem by separating letters with spaces. Because spaces (blanks) must themselves be encoded in computer applications, this approach in the long run requires more storage.
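The worked example above can be checked mechanically. This sketch (the `decode` helper is mine, not a library function) greedily decodes a prefix code; it works precisely because no code is a prefix of another:

```python
def decode(bits, codes):
    """Greedily decode a bit string under a prefix code; None if bits remain."""
    inv = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:          # a complete codeword has been read
            out.append(inv[cur])
            cur = ""
    return "".join(out) if cur == "" else None

codes = {"A": "0", "B": "10", "C": "11"}
encoded = "".join(codes[ch] for ch in "ABAACABC")
print(encoded)                  # 010001101011 (12 bits for 8 letters)
print(decode(encoded, codes))   # ABAACABC
```

If B were encoded as the single bit 1, the mapping would no longer be prefix-free and "11" would be ambiguous (BB or C), which is exactly why the 2-bit codes are needed.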

SLIDE 5

Example: Suppose all messages are equally likely; that is, p(Xi) = 1/n for i = 1, . . . , n. Then

H(X) = n[(1/n) log2 n] = log2 n.

Thus log2 n bits are needed to encode each message. In particular, if there are n = 2^k possible messages, then H(X) = k, and k bits are needed to encode each possible message.

Example: Let n = 1 and p(X) = 1. Then H(X) = log2 1 = 0. There is no information because there is no choice.

SLIDE 6

Given n, H(X) is maximal for p(X1) = . . . = p(Xn) = 1/n, that is, when all messages are equally likely. H(X) decreases as the distribution of messages becomes more and more skewed, reaching a minimum of H(X) = 0 when p(Xi) = 1 for some message Xi.

Example: Suppose X is a 32-bit integer variable. Then X can have at most 32 bits of information. If small values of X are more likely than larger ones (as is typical in most programs), then H(X) will be less than 32; if the exact value of X is known, H(X) will be 0.
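The entropy definition is easy to compute directly; this small sketch illustrates the behavior described above (maximal at the uniform distribution, smaller for skewed distributions, zero when the outcome is certain):

```python
from math import log2

def entropy(probs):
    """H(X) = sum of p * log2(1/p) over outcomes with p > 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # 1.0: two equally likely messages
print(entropy([0.25] * 4))          # 2.0: uniform over four messages
print(entropy([0.9, 0.05, 0.05]))   # less than log2(3): skew lowers entropy
print(entropy([1.0]))               # 0.0: no choice, no information
```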

SLIDE 7

The entropy of a message measures its uncertainty, in that it gives the number of bits of information that must be learned when the message has been distorted by a noisy channel or hidden in ciphertext.

Example: If a cryptanalyst knows the ciphertext block "Z$JP7K" corresponds to either the plaintext "MALE" or the plaintext "FEMALE", the uncertainty is only one bit. The cryptanalyst need only determine one character, say the first, and because there are only two possibilities for that character, only the distinguishing bit of that character need be determined. If it is known that the block corresponds to a salary, then the uncertainty is more than one bit, but it can be no more than log2 n bits, where n is the number of possible salaries. Similar reasoning applies to bank PINs.

SLIDE 8

For a given language, consider the set of all messages N characters long. The rate of the language for messages of length N is defined by

r = H(X)/N,

that is, the average number of bits of information in each character. For large N, estimates of r for English range from 1.0 to 1.5 bits/letter.

The absolute rate of the language is the maximum number of bits of information that could be encoded in each character, assuming all possible sequences of characters are equally likely. If there are L characters in the language, then the absolute rate is given by R = log2 L, the maximum entropy of the individual characters. For English, R = log2 26 ≈ 4.7 bits/letter. The actual rate of English is thus considerably less than its absolute rate. The reason is that English, like all natural languages, is highly redundant. For example, the phrase "occurring frequently" could be reduced by 58% to "crng frq" without loss of information.
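The rate and redundancy figures quoted here (and on the later redundancy slide) follow from simple arithmetic, reproduced in this sketch:

```python
from math import log2

L = 26                      # letters in English
R = log2(L)                 # absolute rate, about 4.70 bits/letter
for r in (1.0, 1.5):        # estimated actual rates for English
    D = R - r               # redundancy D = R - r
    print(f"r={r}: D={D:.2f}, redundancy {D/R:.0%}")
```

With r = 1.0 this gives roughly 79% redundancy, and with r = 1.5 roughly 68%, matching the estimates in the slides.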

SLIDE 9

Statistical Properties (1)

  • 1- Single-letter frequency distributions: Certain letters such as E, T, and A occur much more frequently than others.
  • 2- Digram frequency distributions: Certain digrams (pairs of letters) such as TH and EN occur much more frequently than others. Some digrams (e.g., QZ) never occur in meaningful messages even when word boundaries are ignored (acronyms are an exception).
  • 3- Trigram distributions: The proportion of meaningful sequences decreases when trigrams are considered (e.g., BBB is not meaningful). Among the meaningful trigrams, certain sequences such as THE and ING occur much more frequently than others.
  • 4- N-gram distributions: As longer sequences are considered, the proportion of meaningful messages to the total number of possible letter sequences decreases. Long messages are structured not only according to letter sequences within a word but also by word sequences (e.g., the phrase PROGRAMMING LANGUAGES is much more likely than the phrase LANGUAGES PROGRAMMING).

SLIDE 10

Statistical Properties (2)

  • Programming languages have a similar structure:
    – Here there is more freedom in letter sequences – e.g., the variable name QZK is perfectly valid – but the language syntax imposes other rigid rules about the placement of keywords and delimiters.
  • The rate of a language (entropy per character) is determined by estimating the entropy of N-grams for increasing values of N.
  • As N increases, the entropy per character decreases because there are fewer choices and certain choices are much more likely. The decrease is sharp at first but tapers off quickly.
  • The rate is estimated by extrapolating for large N.
SLIDE 11

Rate / Absolute Rate of a Language

  • Rate of the language for messages of length N: r = H(X)/N, where X is the message
    – the average number of bits of information in each character
    – For large N, estimates of r for English range from 1.0 to 1.5 bits/letter
  • Absolute rate of the language: the maximum number of bits of information that can be encoded in each character, assuming all possible sequences of characters are equally likely
    – If there are L characters in the language, then R = log2 L
    – For English, R = log2 26 ≈ 4.7 bits/letter

SLIDE 12

Statistical Properties (3)

  • The redundancy of a language with rate r and absolute rate R is defined by D = R – r.
  • For R = 4.7 and r = 1.0, D = 3.7, whence the ratio D/R shows English to be about 79% redundant.
  • For r = 1.5, D = 3.2, implying a redundancy of 68%.
  • Conservative estimates are often used in practice.
SLIDE 13

Statistical Properties (4)

  • The uncertainty of messages may be reduced given additional information.
  • Example: Let X be a 32-bit integer such that all values are equally likely; thus the entropy of X is H(X) = 32.
  • Suppose it is known that X is even. Then the entropy is reduced by one bit, because the low-order bit must be 0.

SLIDE 14

Equivocation

The equivocation of X given Y is the conditional entropy

H_Y(X) = Σ_Y p(Y) Σ_X p_Y(X) log2(1/p_Y(X)),

where p_Y(X) is the probability of message X given that Y is known. Equivocation measures the uncertainty remaining about X once Y has been learned.

SLIDE 15

Equivocation

Example:

  • Let n = 4 and p(X) = 1/4 for each message X; thus H(X) = log2 4 = 2.
  • Similarly, let m = 4 and p(Y) = 1/4 for each message Y.
  • Now suppose each message Y narrows the choice of X to two of the four messages, where both are equally likely:
    – Y1: X1 or X2,  Y2: X2 or X3
    – Y3: X3 or X4,  Y4: X4 or X1
  • Then for each Y, p_Y(X) = 1/2 for two of the X's and p_Y(X) = 0 for the remaining two X's.

SLIDE 16

The equivocation is then given by

H_Y(X) = 4 × (1/4) × [2 × (1/2) log2 2] = 1.

Thus knowledge of Y reduces the uncertainty of X to one bit, corresponding to the two remaining choices for X.
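The equivocation in this example can be verified numerically; in this sketch (variable names are mine), each row holds the conditional distribution of X given one value of Y:

```python
from math import log2

# p_Y[i][j] = probability of Xj given Yi: each Y leaves two equally likely X's
p_Y = [
    [0.5, 0.5, 0.0, 0.0],   # Y1: X1 or X2
    [0.0, 0.5, 0.5, 0.0],   # Y2: X2 or X3
    [0.0, 0.0, 0.5, 0.5],   # Y3: X3 or X4
    [0.5, 0.0, 0.0, 0.5],   # Y4: X4 or X1
]
p = [0.25] * 4              # each Y equally likely

# H_Y(X) = sum over Y of p(Y) times the entropy of X given that Y
H_Y_X = sum(pY * sum(q * log2(1 / q) for q in row if q > 0)
            for pY, row in zip(p, p_Y))
print(H_Y_X)                # 1.0: knowing Y leaves exactly one bit of uncertainty
```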

SLIDE 17

Perfect Secrecy (Shannon)

Let p_C(M) be the probability that message M was sent given that ciphertext C was received; thus C is the encryption of message M. Perfect secrecy is defined by the condition

p_C(M) = p(M) for all C and M;

that is, intercepting the ciphertext gives a cryptanalyst no additional information.

Let p_M(C) be the probability of receiving ciphertext C given that M was sent; p_M(C) is the sum of the probabilities of the keys that encipher M as C. Usually a given M and C correspond to at most one key, but some ciphers can map many keys to the same transformation. Thus, for perfect secrecy, the condition p_M(C) = p(C) must hold for all M and C.

SLIDE 18

Example: Four messages, all equally likely, and four keys, also equally likely, with each key enciphering each message as a distinct ciphertext. Then p(M) = 1/4, and p_M(C) = p(C) = 1/4 for all M and C.

A cryptanalyst intercepting one of the ciphertext messages C1, C2, C3, or C4 would have no way of determining which of the four keys was used and, therefore, whether the correct message is M1, M2, M3, or M4.

SLIDE 19

Perfect Secrecy

  • Perfect secrecy requires that the number of keys be at least as great as the number of possible messages.

SLIDE 20

Example: Consider a cipher that shifts every letter of the message by the same amount K, applied to

M = THERE IS NO OTHER LANGUAGE BUT FRENCH

Only one of the keys (K = 18) produces a meaningful message, so a cryptanalyst who tries all possible keys can recover M from the ciphertext alone; the cipher does not achieve perfect secrecy.

SLIDE 21

SLIDE 22

Example:

  • With a slight modification to the preceding scheme, we can create a cipher having perfect secrecy.
  • The trick is to shift each letter by a random amount. Specifically, K is given by a stream of random integers k1, k2, . . . , where ki is an integer in the range [0, 25] giving the amount of shift for the ith letter.
  • Then the 31-character ciphertext C in the preceding example could correspond to any valid 31-character message, because each possible plaintext message is derived by some key stream. For example, the plaintext message THIS SPECIES HAS ALWAYS BEEN EXTINCT is derived by the key stream 18, 18, 14, 17, 4, . . . .
  • Though most of the 31-character possible plaintext messages can be ruled out as not being valid English, this much is known even without the ciphertext. Perfect secrecy is achieved because interception of the ciphertext does not reveal anything new about the plaintext message.
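A minimal sketch of the random-shift scheme described above (letters only, with Python's `secrets` module supplying the key stream; an illustration of the idea, not production cryptography):

```python
import secrets

ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift(letters, keystream, direction=1):
    """Shift the ith letter by the ith key amount, mod 26."""
    return "".join(ALPHA[(ALPHA.index(ch) + direction * k) % 26]
                   for ch, k in zip(letters, keystream))

msg = "THEREISNOOTHERLANGUAGEBUTFRENCH"      # the 31-letter example message
key = [secrets.randbelow(26) for _ in msg]   # one fresh random shift per letter
cipher = shift(msg, key, +1)
assert shift(cipher, key, -1) == msg         # decipher with the same key stream
```

Because every shift amount is independent and uniform, each ciphertext letter is equally likely regardless of the plaintext, which is why interception reveals nothing new.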

SLIDE 23

  • The key stream must not repeat or be used to encipher another message. Otherwise, it may be possible to break the cipher by correlating two ciphertexts enciphered under the same portion of the stream.
  • A cipher using a nonrepeating random key stream, such as the one described in the preceding example, is called a one-time pad. One-time pads are the only ciphers that achieve perfect secrecy.

SLIDE 24

Denning’s Information Flow Model (DFM)

DFM = (S, O, SC, ≤, ⊕), where

  • S is a set of subjects/principals (active agents responsible for all information flow),
  • O is a set of objects (information containers),
  • SC is a set of security classes,
  • ≤ is a binary relation on security classes that specifies permissible information flows:
    – sc1 ≤ sc2 means information in security class sc1 is permitted to flow into security class sc2,
  • ⊕ is the class-combining binary operator (associative and commutative) that specifies, for any pair of operand classes, the class in which the result of any binary function on values from the operand classes belongs.
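One concrete instance of this model, assuming security classes are sets of categories with ≤ taken as subset inclusion and ⊕ as set union (a standard lattice example, not the only possible choice):

```python
# Security classes as frozensets of categories:
# "can flow" (≤) is subset inclusion; the combining operator ⊕ is union,
# which is associative and commutative as the model requires.

def can_flow(sc1, sc2):
    return sc1 <= sc2          # info in sc1 may flow into sc2

def combine(sc1, sc2):
    return sc1 | sc2           # class of any result computed from both operands

low  = frozenset()
med  = frozenset({"medical"})
high = frozenset({"medical", "financial"})

assert can_flow(low, med) and can_flow(med, high)
assert not can_flow(high, med)                    # no downward flow
assert combine(med, frozenset({"financial"})) == high
```

Combining a value from `med` with one from `financial` data yields a result that must live in the union class, so it can never flow back down to either operand's class alone.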

SLIDE 25

Clinical Information Systems Security Policy

  • Intended for medical records
    – Conflict of interest is not the critical problem
    – Patient confidentiality, authentication of records and annotators, and integrity are
  • Entities:
    – Patient: subject of medical records (or agent)
    – Personal health information: data about a patient’s health or treatment enabling identification of the patient
    – Clinician: health-care professional with access to personal health information while doing their job

SLIDE 26

Assumptions and Principles

  • Assumes health information involves one person at a time
    – Not always true; OB/GYN involves father as well as mother
  • Principles derived from the medical ethics of various societies, and from practicing clinicians

SLIDE 27

Access

  • Principle 1: Each medical record has an access control list naming the individuals or groups who may read and append information to the record. The system must restrict access to those identified on the access control list.
    – The idea is that clinicians need access, but no one else. Auditors get access to copies, so they cannot alter records.

SLIDE 28

Access

  • Principle 2: One of the clinicians on the access control list must have the right to add other clinicians to the access control list.
    – Called the responsible clinician

SLIDE 29

Access

  • Principle 3: The responsible clinician must notify the patient of the names on the access control list whenever the patient’s medical record is opened. Except for situations given in statutes, or in cases of emergency, the responsible clinician must obtain the patient’s consent.
    – The patient must consent to all treatment, and must know of violations of security.

SLIDE 30

Access

  • Principle 4: The name of the clinician, the date, and the time of the access of a medical record must be recorded. Similar information must be kept for deletions.
    – This is for auditing. Don’t delete information; update it (the last part covers deletion of records after death, for example, or deletion of information when required by statute). Record information about all accesses.

SLIDE 31

Creation

  • Principle: A clinician may open a record, with the clinician and the patient on the access control list. If a record is opened as a result of a referral, the referring clinician may also be on the access control list.
    – The creating clinician needs access, and the patient should get it. If created from a referral, the referring clinician needs access to get the results of the referral.

SLIDE 32

Deletion

  • Principle: Clinical information cannot be deleted from a medical record until the appropriate time has passed.
    – This varies with circumstances.

SLIDE 33

Confinement

  • Principle: Information from one medical record may be appended to a different medical record if and only if the access control list of the second record is a subset of the access control list of the first.
    – This keeps information from leaking to unauthorized users. All users have to be on the access control list.
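The subset test in the Confinement Principle can be sketched as follows (the ACL entries are invented for illustration):

```python
# Appending information from a source record to a destination record is
# allowed only if everyone on the destination's ACL is already on the
# source's ACL, so no one gains access to information they could not read.

def may_append(acl_source, acl_dest):
    return set(acl_dest) <= set(acl_source)

rec1_acl = {"dr_jones", "dr_smith", "patient_p"}   # first (source) record
rec2_acl = {"dr_jones", "patient_p"}               # second (destination) record

assert may_append(rec1_acl, rec2_acl)       # rec2's ACL is a subset of rec1's
assert not may_append(rec2_acl, rec1_acl)   # dr_smith is not on rec2's list
```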

SLIDE 34

Aggregation

  • Principle: Measures for preventing the aggregation of patient data must be effective. In particular, a patient must be notified if anyone is to be added to the access control list for the patient’s record and if that person has access to a large number of medical records.
    – The fear here is that a corrupt investigator may obtain access to a large number of records, correlate them, and discover private information about individuals which can then be used for nefarious purposes (such as blackmail).

SLIDE 35

Enforcement

  • Principle: Any computer system that handles medical records must have a subsystem that enforces the preceding principles. The effectiveness of this enforcement must be subject to evaluation by independent auditors.
    – This policy has to be enforced, and the enforcement mechanisms must be auditable (and audited).

SLIDE 36

Compare to Bell-LaPadula

  • The Confinement Principle imposes a lattice structure on the entities in the model
    – Similar to Bell-LaPadula
  • CISS focuses on the objects being accessed; B-LP on the subjects accessing the objects
    – May matter when looking for insiders in the medical environment

SLIDE 37

Originator Controlled Access Control (ORCON)

  • Problem: the organization creating a document wants to control its dissemination
    – Example: The Secretary of Agriculture writes a memo for distribution to her immediate subordinates, and she must give permission for it to be disseminated further. This is “originator controlled” (here, the “originator” is a person).

SLIDE 38

Requirements

  • Subject s ∈ S marks object o ∈ O as ORCON on behalf of organization X. X allows o to be disclosed to subjects acting on behalf of organization Y with the following restrictions:
    1. o cannot be released to subjects acting on behalf of other organizations without X’s permission; and
    2. Any copies of o must have the same restrictions placed on them.
SLIDE 39

DAC Fails

  • The owner can set any desired permissions
    – This makes requirement 2 unenforceable

SLIDE 40

MAC Fails

  • First problem: category explosion
    – Category C contains o, X, Y, and nothing else. If a subject y ∈ Y wants to read o, x ∈ X makes a copy o′. Note o′ has category C. If y wants to give z ∈ Z a copy, z must be in Y – by definition, it’s not. If x wants to let w ∈ W see the document, we need a new category C′ containing o, X, W.
  • Second problem: abstraction
    – MAC classifications and categories are centrally controlled, and access is controlled by a centralized policy
    – ORCON is controlled locally

SLIDE 41

Combine Them

  • The owner of an object cannot change the access controls of the object.
  • When an object is copied, the access control restrictions of the source are copied and bound to the target of the copy.
    – These are MAC (the owner can’t control them)
  • The creator (originator) can alter the access control restrictions on a per-subject and per-object basis.
    – This is DAC (the owner can control it)