Towards Analytical Techniques for Systems Engineering Applications - PowerPoint PPT Presentation



SLIDE 1

Main Objectives of . . . Need for Analytical . . . Describing the User’s . . . Current State and . . . Analyzing Probability . . . Gauging Accuracy of . . . Data Mining Hypothesis Testing Testing Home Page Title Page ◭◭ ◮◮ ◭ ◮ Page 1 of 71 Go Back Full Screen Close Quit

Towards Analytical Techniques for Systems Engineering Applications

Griselda Valdepeñas Acosta

Systems Engineering Program University of Texas at El Paso, El Paso, Texas 79968, USA gvacosta@miners.utep.edu

SLIDE 2

Part I

Formulation of the Problem and a General Overview of the Results

SLIDE 3

1. Main Objectives of Systems Engineering: a Brief Reminder

  • One of the main goals of systems engineering is to design, maintain, and analyze systems to help users.
  • To design an appropriate system for an application domain, we need to know:
      – what are the users’ desires and preferences,
      – what is the current state and what is the dynamics of this application domain, and
      – how to use all this information to select the best alternatives for the system design and maintenance.

SLIDE 4

2. Need for Analytical Techniques

  • Designing a system includes selecting numerical values for many of the parameters describing this system.
  • At present, in many cases, this selection is made by following semi-heuristic recommendations.
  • Experience shows that such imprecise heuristic recommendations often lead to less-than-perfect results.
  • It is therefore desirable to come up with analytical techniques for system design, techniques based:
      – on valid numerical analysis and
      – on the solution of the corresponding optimization problems.

SLIDE 5

3. What We Do in This Dissertation: General Idea

  • Systems engineering is a very broad discipline, with many different application domains.
  • Each domain has its own specifics and requires its own analysis and, probably, its own analytical techniques.
  • In this dissertation, we:
      – formulate and analyze general problems of system design, implementation, testing, and monitoring,
      – and show how the corresponding analytical techniques can be applied to different application domains.

SLIDE 6

4. Describing the User’s Preferences

  • In the ideal world, we should be able to ask each user’s opinion about each of the alternatives.
  • However, for large systems, with many possible alternatives, this is not realistic.
  • Therefore, we need to extrapolate the user’s preferences based on available partial information.
  • There are analytical techniques for such extrapolation – e.g., the widely used matrix factorization technique.
  • However, this technique is purely empirical – and thus, not very reliable.
  • We provide a theoretical explanation for this technique.
  • The existence of such an explanation makes it more reliable.
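To make the extrapolation idea concrete, here is a minimal sketch of matrix factorization, assuming the common formulation in which a user's rating of an alternative is approximated by a dot product of two short vectors. The ratings, the rank, the learning rate, and all function names are hypothetical choices for illustration; the slides do not specify them.

```python
import random

def predict(U, V, i, j):
    """Predicted rating of alternative j by user i: dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def factorize(ratings, n_users, n_items, rank=2, steps=2000, lr=0.01, reg=0.01, seed=0):
    """Fit r[i][j] ~ sum_k U[i][k] * V[j][k] using only the known ratings (plain SGD)."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for (i, j), r in ratings.items():
            err = r - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on the squared error with a small L2 penalty.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

def mse(U, V, ratings):
    """Mean squared error on the known ratings."""
    return sum((r - predict(U, V, i, j)) ** 2 for (i, j), r in ratings.items()) / len(ratings)

# Hypothetical partial information: (user, alternative) -> rating on a 1..5 scale.
known = {(0, 0): 5, (0, 1): 4, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}

U0, V0 = factorize(known, 3, 3, steps=0)   # untrained factors, for comparison
U, V = factorize(known, 3, 3)              # trained factors
guess = predict(U, V, 0, 2)                # extrapolated rating that was never asked for
```

The value `guess` is an extrapolated preference for a (user, alternative) pair that is absent from `known`, which is the kind of missing opinion the slide refers to.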

SLIDE 7

5. Describing the User’s Preferences (cont-d)

  • We need to take into account that the user’s preferences are usually not very detailed.
  • Thus, because of their approximate nature, we should not waste time trying to fit them optimally.
  • This approximate nature is usually captured by the empirical 7 plus or minus 2 law.
  • According to this law, in the first approximation, a user usually divides alternatives into 7 plus or minus 2 groups.
  • This law is purely empirical – and thus, its use is not as reliable as we would like it to be.
  • To make this law more reliable, we provide a partial theoretical explanation of this law.

SLIDE 8

Figure 1: Why Seven?

SLIDE 9

6. What Is the Current State and Dynamics of an Application Domain?

  • We also need to know what is the current state and what is the dynamics of this application domain.
  • This information comes from two main sources:
      – from measurements and
      – from expert estimates.

SLIDE 10

7. Analytical Techniques for Analyzing Probability Distributions

  • It is important to take into account that many real-world processes are probabilistic.
  • In many cases, the corresponding probability distributions are Gaussian (normal).
  • This makes perfect sense, since such processes are affected by many independent factors.
  • It is known that in such cases, the distributions should be close to normal.
  • However, there are cases when the corresponding distribution is different – e.g., uniform.
  • Using a practical example, we explain why such distributions appear.
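The contrast between a single factor and the joint effect of many independent factors can be illustrated with a small simulation. This is only a sketch under assumed settings (12 uniform factors, 20000 samples, a fixed seed); it is not the practical example from the dissertation. For a normal distribution about 68% of values lie within one standard deviation of the mean, while for a uniform distribution the share is about 58%.

```python
import random
import statistics

def fraction_within_one_sd(samples):
    """Share of samples lying within one standard deviation of the sample mean."""
    mu = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(abs(x - mu) <= sd for x in samples) / len(samples)

rng = random.Random(1)  # fixed seed so the illustration is repeatable

# One factor: uniformly distributed on [0, 1].
single = [rng.random() for _ in range(20000)]

# Many independent factors: the sum of 12 uniform values (an assumed number of factors).
summed = [sum(rng.random() for _ in range(12)) for _ in range(20000)]

frac_single = fraction_within_one_sd(single)   # close to 1/sqrt(3), about 0.58
frac_summed = fraction_within_one_sd(summed)   # close to the normal value, about 0.68
```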

SLIDE 11

8. Analytical Techniques for Analyzing How Systems Change with Time

  • In general, systems change with time, and the corresponding probability distributions change.
  • There are some general rules about such changes, some of them well-explained, some more empirical.
  • It is well-known (and well-explained) that the entropy of a closed system increases with time.
  • This is known as the Second Law of Thermodynamics.
  • Interestingly, there is another empirical observation – which is not as well justified – that
      – while the entropy increases,
      – its rate of increase is often the smallest possible.
  • This is known as the minimum entropy production principle.

SLIDE 12

Figure 2: Blame Entropy. Part 1

SLIDE 13

SLIDE 14

9. Minimum Entropy Production Principle

  • This principle was first formulated and explained by a future Nobelist Ilya Prigogine.
  • Since then, many possible explanations of this principle have appeared.
  • However, all these explanations are very technical, based on complex analysis of differential equations.
  • Since this phenomenon is ubiquitous, it is desirable to look for a general system-based explanation.
  • We provide an explanation, based on the importance of keeping as many solution options open as possible.
  • In decision making, one of the main errors is to focus too quickly and to become blind to alternatives.

SLIDE 15

10. Analytical Techniques for Gauging Accuracy of Expert Knowledge

  • Dealing with expert estimates brings additional challenges; for example:
      – while measurement results come with guaranteed bounds on the measurement inaccuracy,
      – the only estimates of the inaccuracy of expert estimates come from the experts themselves.
  • It turns out that experts often misjudge the inaccuracy of their estimates.
  • This phenomenon is known as the Dunning-Kruger effect, after the two psychologists who discovered it.
  • While this phenomenon has been confirmed by many follow-up experiments, it remains largely unexplained.
  • We present an analytical model that provides a simple system-based explanation for this effect.

SLIDE 16

11. Analytical Techniques Help Enhance the Results of Data Mining

  • First, we collect information about the system from measurements and from expert estimates.
  • Then, we use this information to come up with a model describing the system.
  • The usual way to come up with such a model is:
      – to formulate several different hypotheses and
      – to select the one that best fits the data.
  • Techniques for formulating hypotheses based on the available information are known as data mining.
  • Often, the amount of data is not sufficient to make statistically justified conclusions.
  • In such cases, the dependencies produced by data mining are often caused by accidental coincidences.

SLIDE 17

12. Data Mining (cont-d)

  • We need to separate such accidental coincidences from true dependencies.
  • It is thus important to look for possible theoretical explanations for these empirical dependencies.
  • If such an explanation is possible and natural, this means that this dependence is probably real.
  • If no such natural explanation is possible, this is probably an accidental coincidence.
  • In this dissertation, we illustrate this general approach on four examples, all four biology-related.
  • In the first three examples, we have found a natural explanation for the observed phenomenon.
  • Thus, we confirmed the conclusions of data mining.

SLIDE 18

13. Data Mining (cont-d)

  • The first example is a surprising observation that was made from the analysis of records of cow insemination.
  • The second example is an empirical fact that pink noise enhances sleep and memory in humans.
  • The third example is that filtering out higher frequencies makes it easier for a human to carry a tune.
  • The fourth example is related to an observed decline in IQ scores.
  • In this example, a natural explanation invalidates the conclusion of data mining.

SLIDE 19

Figure 4: Happy Cow

SLIDE 20

SLIDE 21

Figure 6: Happy Singer

SLIDE 22

Figure 7: Decline in IQ Scores?

SLIDE 23

14. Analytical Techniques in Hypothesis Testing

  • Once we have come up with several reasonable models, we need to select the one that best fits the data.
  • There are many statistical techniques for selecting the model:
      – most of them well-justified but
      – some more heuristic – and thus, less reliable.
  • One such technique is the widely used area-under-the-curve method.
  • We use analytical techniques to provide a theoretical explanation for this method.
  • Thus, we make this method more reliable.
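The slide does not spell out the area-under-the-curve method. Assuming it refers to the area under the curve relating the two error rates of a test (the ROC curve), the area can be computed without drawing the curve: it equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The sketch below uses that pairwise formulation; the function name and the sample scores are hypothetical.

```python
def area_under_curve(scores_pos, scores_neg):
    """ROC area via pairwise comparison: share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive case correctly ranked above the negative one
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical scores produced by one testing technique.
auc_value = area_under_curve([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Under this reading, two testing techniques would be compared by computing this area for each of them on the same data and preferring the larger value.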

SLIDE 24

15. Hypothesis Testing (cont-d)

  • We also use analytical techniques to explain:
      – why, upon getting new data, it is desirable to revisit the selection of the best model, and
      – why the usual practice of sticking to the original model is faulty.
  • We illustrate the need for a careful comparison between different hypotheses on a historical example:
      – epicycles versus
      – more modern techniques in celestial mechanics.
  • Our conclusion is that, contrary to what one may read in modern astronomy and physics textbooks:
      – epicycles were a very efficient tool, foreseeing modern techniques such as Fourier series,
      – although not quite as efficient as Fourier series.
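The link between epicycles and Fourier series can be sketched as follows: a point carried by circles-on-circles has position z(t) equal to a sum of terms r_k · exp(i · ω_k · t), and when the angular speeds are integer multiples of a base speed this is a complex Fourier sum. The code below is an illustration under that assumption, with a made-up closed path; it is not the comparison carried out in the dissertation.

```python
import cmath
import math

def epicycle_coefficients(points):
    """Discrete Fourier coefficients: coefficient k is a circle of radius |c_k| turning k times per period."""
    n = len(points)
    return [
        sum(z * cmath.exp(-2j * math.pi * k * m / n) for m, z in enumerate(points)) / n
        for k in range(n)
    ]

def epicycle_position(coeffs, m):
    """Position at sample m obtained by adding up all the rotating circles."""
    n = len(coeffs)
    return sum(c * cmath.exp(2j * math.pi * k * m / n) for k, c in enumerate(coeffs))

# A hypothetical closed path in the complex plane, sampled at 16 points.
points = [
    complex(3 * math.cos(th), math.sin(th)) + 0.5 * cmath.exp(3j * th)
    for th in (2 * math.pi * m / 16 for m in range(16))
]

coeffs = epicycle_coefficients(points)
rebuilt = [epicycle_position(coeffs, m) for m in range(len(points))]
```

With as many circles as sample points, `rebuilt` reproduces `points` up to rounding error, which is why a sufficiently rich epicycle model can fit any observed periodic path.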

SLIDE 25

16. Analytical Techniques for System Design

  • First, we get an adequate description of the users’ preferences and of the corresponding application domain.
  • Then, we need to come up with a system design which is the most appropriate for this setting.
  • One idea is to use the experience of successful similar systems – engineering and even biological.
  • For example, many situations in engineering and in life require constant monitoring.
  • At first glance, this would require the system to maintain the same alert level.
  • However, interestingly, recent experiments have shown that the attention level constantly oscillates.
  • We show that such an oscillation is indeed helpful.

SLIDE 26

17. System Design (cont-d)

  • Thus, it is necessary to emulate such an oscillation when designing automatic systems, e.g., for driving.
  • In design, we need to take into account many different aspects of the resulting system.
  • In many practical situations, for each aspect, we have well-defined optimal design strategies.
  • However, there are no analytical techniques for taking all the aspects into account.
  • We show how several aspects can be taken into account on the example of a tradeoff between:
      – computation needs and
      – communication needs.

SLIDE 27

18. Analytical Techniques for Making Recommendations More Acceptable to Users

  • People do not necessarily follow the expert advice.
  • We provide an analytical model that explains the observed non-compliance.
  • We also use analytical techniques to explain how to make recommendations more acceptable.

SLIDE 28

19. Analytical Techniques for Testing

  • At all design stages, we need to test the designed system.
  • This testing has to be done on all levels:
      – from the original big-picture design draft
      – to the level of final detailed implementation.
  • On each level, there are numerous known techniques and methods for testing.
  • The problem is that our resources are limited.
  • So, we need to optimally distribute these testing resources between different levels.
  • We explain how to optimally distribute testing resources between different system levels.

SLIDE 29

Part II

First Detailed Example: Optimal Distribution of Testing Resources Between Different System Levels

SLIDE 30

20. The Cost of Errors on Different Levels

  • Errors can occur on all the levels:
      – we can make an error on the highest level, by deciding on a faulty overall design;
      – we can also make an error on the most detailed level, e.g., when manufacturing a component.
  • An error on a higher level is very costly:
      – we have to redo the overall design and
      – thus, redo all the details – i.e., largely, start “from scratch”.
  • On the other hand, errors on the lower levels are not that costly.
  • If we erred in designing one small component, then all we need to do is re-design this small component.

SLIDE 31

21. Errors on Different Levels (cont-d)

  • Let us number the levels from the most general one – which will be Level 1.
  • Let us denote the overall number of levels by $n$; then, the most detailed level is Level $n$.
  • In general, an error on each level $i$ leads to the need of redoing several details on the next-detailed level $i + 1$.
  • Let us denote the average number of details that need to be redone by $q$.
  • Then, an error on Level $i$ necessitates redoing $q$ details on the next-detailed Level $i + 1$.
  • Each of these re-doings requires redoing $q$ details on the next level $i + 2$.
  • Thus, an error on Level $i$ requires re-doing $q^2$ details on Level $i + 2$.

SLIDE 32

22. Errors on Different Levels (cont-d)

  • Similarly, we conclude that it requires re-doing $q^3$ details on Level $i + 3$.
  • In general, we need to re-do $q^k$ details on Level $i + k$.
  • In particular, for $k = n - i$, an error on Level $i$ requires redoing $q^{n-i}$ details on Level $n$.
  • Let $c$ denote the average cost of redoing a single detail on the most-detailed Level $n$.
  • Then, the overall cost of an error on Level $i$ can be obtained by multiplying:
      – this per-detail cost $c$ by
      – the total number of details $q^{n-i}$ that need to be corrected.
  • The resulting cost is $c \cdot q^{n-i}$.
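The cost formula above can be checked with a short sketch that follows the level-by-level argument literally: each step down multiplies the number of affected details by q. The function names and the sample values (n = 4, q = 3, c = 2) are hypothetical.

```python
def details_to_redo(i, n, q):
    """Number of Level-n details affected by one error on Level i, i.e. q ** (n - i)."""
    count = 1
    for _ in range(i, n):   # one multiplication by q for each level between i and n
        count *= q
    return count

def error_cost(i, n, q, c):
    """Cost of one error on Level i, where c is the cost of redoing one Level-n detail."""
    return c * details_to_redo(i, n, q)

top_level_cost = error_cost(1, 4, 3, 2.0)     # 2 * 3**3 = 54.0
bottom_level_cost = error_cost(4, 4, 3, 2.0)  # 2 * 3**0 = 2.0
```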

SLIDE 33

23. The Cost of Discovering Errors

  • How does the number N(t) of remaining errors depend on the time t spent to find them?
  • There are different ways to count errors; e.g., when we talk about software errors:
      – we can count the number of modules that do not perform as we intended,
      – we can count the number of lines of code where we made a mistake,
      – or we can count the number of erroneous operations on each line of code.
  • All three (and other) ways of counting errors make sense – but they differ by a factor.

SLIDE 34

24. The Cost of Discovering Errors (cont-d)

  • E.g., to go from the number of erroneous modules to the number of erroneous lines of code, we need to multiply:
      – the number of erroneous modules by
      – the average number of erroneous lines of code in an erroneous module.
  • Thus, if we change the way we count errors, we go from the original number $N(t)$ to the new number $C \cdot N(t)$.
  • Here $C$ is the corresponding factor.
  • Both the original function $N(t)$ and the new function $C \cdot N(t)$ make sense; thus:
      – instead of a single function $N(t)$ for describing how the number of remaining errors depends on time $t$,
      – we should consider the whole family of functions $\{C \cdot N(t)\}_C$ corresponding to all values $C > 0$.

SLIDE 35

25. The Cost of Discovering Errors (cont-d)

  • The time t is the time from the moment when we started testing.
  • This may sound well-defined, but in practice, it changes from one person to another.
  • Some programmers try to run the very first version of the program that they wrote.
  • Thus, they start debugging the code right away.
  • Other programmers:
      – first try some on-paper tests and
      – only start running when they are reasonably sure that they have eliminated the most obvious bugs.

SLIDE 36

26. The Cost of Discovering Errors (cont-d)

  • While the results of both programmers may be similar, the starting times for measuring $t$ are different:
      – what happened for the first programmer at time $t$,
      – for the second programmer, happens at time $t - t_0$.
  • Here $t_0$ is the time the second programmer spent analyzing his/her code before running it.
  • This value $t_0$ may be different for different programmers.
  • It is therefore reasonable to require that:
      – if we simply change the way we measure the time, i.e., if we go from $t$ to $t - t_0$,
      – then the approximating family $\{C \cdot N(t)\}_C$ should not change.

SLIDE 37

27. The Cost of Discovering Errors (cont-d)

  • In other words, the family $\{C \cdot N(t - t_0)\}_C$ should coincide with the original family $\{C \cdot N(t)\}_C$.
  • This means, in particular, that the function $N(t - t_0)$ should belong to the original family.
  • So, we should have $N(t - t_0) = C(t_0) \cdot N(t)$ for some value $C(t_0)$ depending on $t_0$.
  • The function $N(t)$ describes the number of remaining errors after time $t$.
  • It is thus (non-strictly) decreasing: when $t < t'$, then we should have $N(t) \ge N(t')$.
  • Thus, it is measurable.
  • Therefore, the function $C(t_0) = N(t - t_0)/N(t)$ is also measurable, as the ratio of two measurable functions.

SLIDE 38

28. The Cost of Discovering Errors (cont-d)

  • We know that $N(t - t_0) = C(t_0) \cdot N(t)$ for some value $C(t_0)$ depending on $t_0$.
  • It is known that for measurable functions, the only solutions to the above equation have the form $N(t) = N_0 \cdot \exp(-a \cdot t)$ for some $N_0$ and $a$.
  • Now, we are ready to formulate the problem in precise terms.
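As a quick check of the exponential form stated above, the following worked equations verify that it satisfies the shift condition, and sketch why no other measurable solution exists. The converse part assumes that N(t) is positive and defined for all real t; that assumption is added here for the sketch and is not stated on the slide.

```latex
% Direct check: the exponential form satisfies the shift-invariance condition.
N(t - t_0) = N_0 \cdot \exp(-a \cdot (t - t_0))
           = \exp(a \cdot t_0) \cdot N_0 \cdot \exp(-a \cdot t)
           = C(t_0) \cdot N(t), \quad \text{with } C(t_0) = \exp(a \cdot t_0).

% Sketch of the converse, assuming N(t) > 0 for all real t.
% Let g(t) = \ln N(t) - \ln N(0). Setting t = 0 in the condition gives \ln C(t_0) = g(-t_0), so
g(t - t_0) = g(t) + g(-t_0).
% A measurable function with this additivity property is linear, g(t) = -a \cdot t, hence
N(t) = N(0) \cdot \exp(-a \cdot t).
```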

SLIDE 39

29. Formulation of the Problem in Precise Terms

  • We want to divide the overall testing people-time $T$ into times $t_1, \ldots, t_n$ for testing on different levels: $t_1 + \ldots + t_n = T$.
  • According to the above formulas, for each level $i$, after the testing, we will have $N_0 \cdot \exp(-a \cdot t_i)$ errors.
  • The cost of each error on this level is $c \cdot q^{n-i}$.
  • So the overall cost of all these errors is $c \cdot q^{n-i} \cdot N_0 \cdot \exp(-a \cdot t_i)$.
  • We want to minimize the overall cost $E$:
    $$E = \sum_{i=1}^{n} c \cdot q^{n-i} \cdot N_0 \cdot \exp(-a \cdot t_i).$$
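The objective function can be written down directly. The sketch below evaluates E for a given split of testing time; the function names and the sample numbers (two levels, q = 2, c = 1, N0 = 10, a = 1) are hypothetical and only serve to show how moving time toward the higher level changes the cost.

```python
import math

def remaining_errors(t, n0, a):
    """Errors left on a level after testing it for time t: N0 * exp(-a * t)."""
    return n0 * math.exp(-a * t)

def total_cost(times, q, c, n0, a):
    """Overall cost E = sum over levels i = 1..n of c * q**(n - i) * N0 * exp(-a * t_i)."""
    n = len(times)
    return sum(
        c * q ** (n - i) * remaining_errors(t, n0, a)
        for i, t in enumerate(times, start=1)
    )

no_testing = total_cost([0.0, 0.0], 2.0, 1.0, 10.0, 1.0)    # 20 + 10 = 30
even_split = total_cost([1.0, 1.0], 2.0, 1.0, 10.0, 1.0)
top_heavy = total_cost([1.5, 0.5], 2.0, 1.0, 10.0, 1.0)     # same total time, lower cost
```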

SLIDE 40

30. Solving the Resulting Optimization Problem

  • A usual way to solve a constrained optimization problem is to use Lagrange multipliers:
      – the problem of minimizing a function $f(x)$ under a constraint $g(x) = 0$
      – is reduced to the unconstrained problem of minimizing an expression $f(x) + \lambda \cdot g(x)$.
  • Here the parameter $\lambda$ (known as the Lagrange multiplier) has to be determined from the condition $g(x) = 0$.
  • In our case, the constraint is $\sum_{i=1}^{n} t_i - T = 0$.
  • So, the corresponding unconstrained optimization problem means minimizing the expression
    $$\sum_{i=1}^{n} c \cdot q^{n-i} \cdot N_0 \cdot \exp(-a \cdot t_i) + \lambda \cdot \left(\sum_{i=1}^{n} t_i - T\right).$$

SLIDE 41

31. Solving the Problem (cont-d)

  • To find the minimum of this expression, we:
      – differentiate it with respect to each unknown $t_i$ and
      – equate the resulting (partial) derivative to 0.
  • As a result, we get the following formula:
    $$c \cdot q^{n-i} \cdot N_0 \cdot (-a) \cdot \exp(-a \cdot t_i) + \lambda = 0, \text{ i.e., } \exp(-a \cdot t_i) = \frac{\lambda}{a \cdot c \cdot N_0 \cdot q^{n-i}}.$$
  • Taking logarithms of both sides and dividing the result by $-a$, we get
    $$t_i = (n - i) \cdot \frac{|\ln(q)|}{a} + c_1, \text{ where } c_1 \stackrel{\text{def}}{=} -\frac{1}{a} \cdot \ln\left(\frac{\lambda}{a \cdot c \cdot N_0}\right).$$
  • Combining terms not depending on $i$ into a single expression, we get
    $$t_i = c_2 - i \cdot \frac{|\ln(q)|}{a}, \text{ where } c_2 \stackrel{\text{def}}{=} c_1 + n \cdot \frac{|\ln(q)|}{a}.$$

SLIDE 42

32. Solving the Problem (cont-d)

  • In line with the main idea of the Lagrange multiplier technique, we substitute $t_i$ into the constraint:
    $$T = \sum_{i=1}^{n} t_i = n \cdot c_2 - \left(\sum_{i=1}^{n} i\right) \cdot \frac{|\ln(q)|}{a}.$$
  • Here, $\sum_{i=1}^{n} i = 1 + 2 + \ldots + n = \frac{n \cdot (n + 1)}{2}$, thus
    $$T = n \cdot c_2 - \frac{n \cdot (n + 1)}{2} \cdot \frac{|\ln(q)|}{a}.$$
  • So, $c_2 = \frac{T}{n} + \frac{n + 1}{2} \cdot \frac{|\ln(q)|}{a}$.
  • Thus, we arrive at the following formula.

SLIDE 43

33. Resulting Solution

  • We consider situations where:
    – an error on the next level costs $q$ times less than the error on the previous level, and
    – the number of undetected errors decreases with detection time as $\exp(-a \cdot t)$.
  • In this case, the optimal allocation of the overall testing time $T$ into times $t_1, \ldots, t_n$ allocated to each level is: $t_i = \left(\dfrac{T}{n} + \dfrac{n+1}{2} \cdot \dfrac{|\ln(q)|}{a}\right) - i \cdot \dfrac{|\ln(q)|}{a}$.
  • In other words, the time linearly decreases from the most abstract level to the more detailed levels.
  • The fact that we allocate most of the testing time to the highest level makes perfect sense.
  • Indeed, as we have mentioned, errors on this level are the costliest ones.
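
The resulting allocation formula is easy to evaluate directly. Below is a minimal Python sketch under stated assumptions: the function name `optimal_testing_times` and the sample values of T, n, q, and a are hypothetical and chosen only for illustration.

```python
import math

def optimal_testing_times(T, n, q, a):
    """Allocate total testing time T across n levels (level 1 = most abstract).

    Implements t_i = T/n + (n + 1)/2 * |ln q|/a - i * |ln q|/a.
    """
    step = abs(math.log(q)) / a          # |ln(q)| / a
    c2 = T / n + (n + 1) / 2 * step      # constant term fixed by the constraint
    return [c2 - i * step for i in range(1, n + 1)]

# Illustrative (assumed) values only.
times = optimal_testing_times(T=100.0, n=4, q=2.0, a=0.1)
total = sum(times)  # should reproduce T up to rounding
```

The times sum to T and decrease by the same amount |ln(q)|/a from each level to the next. Note that for a small T or a large |ln(q)|/a the formula can yield negative times for the last levels; the derivation on the slides does not address that case, so a practical implementation would need to handle it separately.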

Part III

Second Detailed Example: Why Filtering Out Higher Harmonics Makes It Easier to Carry a Tune

34. Formulation of the Problem

  • According to its description, a patent by Freudenthal et al.:
    – “greatly improves the singing abilities of both novice and experienced singers
    – by amplifying the fundamental frequency of one’s voice to correct tone deafness.”
  • Amplifying the fundamental frequency is equivalent to filtering out the higher harmonics.
  • The device has been successfully tested and it clearly works, but why?
  • Our answer will be based on general engineering signal-processing ideas.
  • We also show that such filtering is the optimal way to make it easier for a person to carry a tune.

Figure 8: Happy Singer

35. Higher Harmonics: A Brief Reminder

  • Each note corresponds to a certain fundamental frequency $f_0$.
  • The resulting signal is periodic with the same frequency.
  • Thus, if we perform the Fourier transform, we only get components corresponding to multiples of $f_0$.
  • Components corresponding to frequencies $2f_0$, $3f_0$, etc., are known as higher harmonics.
  • The fact that the frequency $f_0$ is fundamental means that it has the largest energy: $S(2f_0) < S(f_0)$, $S(3f_0) < S(f_0)$, . . .
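
The claim that a periodic signal has Fourier components only at multiples of $f_0$ can be illustrated with a small discrete example. This is a sketch using a plain discrete Fourier transform written with the Python standard library; the toy signal (period, length, and harmonic amplitudes) is an assumption made for illustration, not data from the slides.

```python
import cmath
import math

def dft_energy(x):
    """Energy |X_k|^2 of each discrete Fourier component of the sequence x."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]

# Assumed toy signal: fundamental period of 16 samples, 4 full periods,
# with harmonic amplitudes 1, 0.5, 0.25 for f0, 2*f0, 3*f0.
period, n_samples = 16, 64
amplitudes = [1.0, 0.5, 0.25]
signal = [
    sum(a * math.sin(2 * math.pi * (h + 1) * t / period)
        for h, a in enumerate(amplitudes))
    for t in range(n_samples)
]

energy = dft_energy(signal)
fundamental_bin = n_samples // period  # DFT bin index that corresponds to f0
# Bins (up to the Nyquist bin) that carry non-negligible energy.
active_bins = [k for k in range(n_samples // 2 + 1) if energy[k] > 1e-6]
```

In this toy case the only active bins are multiples of `fundamental_bin`, and the chosen amplitudes make the fundamental the strongest component, matching the assumption $S(i \cdot f_0) < S(f_0)$.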

36. Why Is It Not Always Easy to Carry a Tune: Signal-Processing Analysis

  • In general, in signal processing, the quality of signal detection depends on the signal-to-noise ratio; thus:
    – if a singing person does not understand that he/she is singing out of tune,
    – this means that for the sound produced by this singing person, the signal-to-noise ratio is too low to detect this.
  • The overall energy $S$ of the signal is the sum of the energies at the fundamental frequency and its harmonics: $S = S(f_0) + S(2f_0) + S(3f_0) + \ldots$
  • Similarly, the overall energy $N$ of the noise is: $N = N(f_0) + N(2f_0) + N(3f_0) + \ldots$

37. Signal-Processing Analysis (cont-d)

  • The energy of the noise usually changes very little from one frequency to another.
  • In the first approximation, we can therefore assume that this energy is the same: $N(f_0) = N(2f_0) = N(3f_0) = \ldots$
  • Thus, $N = k \cdot N(f_0)$, where $k$ is the overall number of harmonics.
  • The corresponding signal-to-noise ratio of the original singing signal is thus equal to $\dfrac{S}{N} = \dfrac{S(f_0) + S(2f_0) + S(3f_0) + \ldots + S(k \cdot f_0)}{k \cdot N(f_0)}$.
  • The fact that a person has difficulty correctly carrying a tune means that this signal-to-noise ratio is too small.
  • We need to increase it.

38. Let Us Apply Filtering

  • In signal processing, a usual way to increase the signal-to-noise ratio is to perform some filtering.
  • Filtering means that we either amplify or damp certain frequencies.
  • This amplification or damping is applied to the combination of signal and noise, so it equally affects both.
  • The energy of the signal changes from $S(f)$ to $c(f) \cdot S(f)$, and the energy of the noise changes from $N(f)$ to $c(f) \cdot N(f)$.
  • Thus, the new signal-to-noise ratio is equal to $\dfrac{c(f_0) \cdot S(f_0) + c(2f_0) \cdot S(2f_0) + \ldots + c(k \cdot f_0) \cdot S(k \cdot f_0)}{(c(f_0) + c(2f_0) + \ldots + c(k \cdot f_0)) \cdot N(f_0)}$.
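
The filtered signal-to-noise ratio can be written as a small function. This is a minimal Python sketch; the function name `filtered_snr` and the sample energies are hypothetical values picked for illustration, with the noise energy taken equal at all harmonics as in the approximation above.

```python
def filtered_snr(coeffs, signal_energies, noise_energy):
    """SNR after scaling harmonic i by coeffs[i], with equal noise energy per harmonic."""
    filtered_signal = sum(c * s for c, s in zip(coeffs, signal_energies))
    filtered_noise = sum(coeffs) * noise_energy
    return filtered_signal / filtered_noise

# Assumed energies S(f0), S(2*f0), S(3*f0) and a common noise energy N(f0).
S = [1.0, 0.5, 0.25]
noise_f0 = 0.1

snr_unfiltered = filtered_snr([1.0, 1.0, 1.0], S, noise_f0)  # c(f) = 1: original signal
snr_damped = filtered_snr([1.0, 0.5, 0.1], S, noise_f0)      # higher harmonics damped
```

With all coefficients equal to 1, the function reduces to the ratio S/N of the original signal; in this example, damping the higher harmonics already raises the ratio.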

39. Which Filter Is Optimal: Formulation of the Problem

  • We want to find the coefficients $c(f_0), c(2f_0), \ldots$, for which the signal-to-noise ratio is the largest possible.
  • This will lead to the best possible chance of a person recognizing inaccuracies in his/her own singing.

40. Solving the Problem

  • Let us prove that the optimal filter is exactly the one that filters out all higher harmonics: $c(f_0) > 0$ and $c(2f_0) = \ldots = c(k \cdot f_0) = 0$.
  • For this filter, the signal-to-noise ratio is equal to $\dfrac{S'}{N'} = \dfrac{c(f_0) \cdot S(f_0)}{c(f_0) \cdot N(f_0)} = \dfrac{S(f_0)}{N(f_0)}$.
  • What if one of the higher harmonics is not completely filtered out, i.e., $c(i \cdot f_0) > 0$ for some $i \geq 2$?
  • In this case, for all such $i$, $S(i \cdot f_0) < S(f_0)$, hence $c(i \cdot f_0) \cdot S(i \cdot f_0) < c(i \cdot f_0) \cdot S(f_0)$.

41. Solving the Problem (cont-d)

  • By adding up these inequalities, we conclude that $S' = c(f_0) \cdot S(f_0) + c(2f_0) \cdot S(2f_0) + \ldots + c(k \cdot f_0) \cdot S(k \cdot f_0) < c(f_0) \cdot S(f_0) + c(2f_0) \cdot S(f_0) + \ldots + c(k \cdot f_0) \cdot S(f_0) = (c(f_0) + c(2f_0) + \ldots + c(k \cdot f_0)) \cdot S(f_0)$.
  • Dividing both sides of this inequality by the noise $N'$, we conclude that $\dfrac{S'}{N'} < \dfrac{(c(f_0) + c(2f_0) + \ldots + c(k \cdot f_0)) \cdot S(f_0)}{(c(f_0) + c(2f_0) + \ldots + c(k \cdot f_0)) \cdot N(f_0)}$, so $\dfrac{S'}{N'} < \dfrac{S(f_0)}{N(f_0)}$.
  • Thus, the signal-to-noise ratio is smaller than when all these harmonics are filtered out.
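
The inequality can also be probed empirically. The sketch below, in Python, draws random filters with strictly positive coefficients and compares their signal-to-noise ratio with the bound $S(f_0)/N(f_0)$. It is a sanity check under assumed energies, not a substitute for the proof; the function name, the energies, and the sampling range are all illustrative assumptions.

```python
import random

def filtered_snr(coeffs, signal_energies, noise_energy):
    """SNR after scaling harmonic i by coeffs[i], with equal noise energy per harmonic."""
    filtered_signal = sum(c * s for c, s in zip(coeffs, signal_energies))
    filtered_noise = sum(coeffs) * noise_energy
    return filtered_signal / filtered_noise

random.seed(0)  # fixed seed so the experiment is repeatable

# Assumed energies with S(i*f0) < S(f0) for every higher harmonic.
S = [1.0, 0.6, 0.3, 0.1]
noise_f0 = 0.1
best_possible = S[0] / noise_f0  # SNR of the fundamental-only filter

trial_snrs = []
for _ in range(1000):
    # Every coefficient is strictly positive, so no harmonic is fully removed.
    coeffs = [random.uniform(0.01, 1.0) for _ in S]
    trial_snrs.append(filtered_snr(coeffs, S, noise_f0))

max_trial = max(trial_snrs)
fundamental_only = filtered_snr([1.0, 0.0, 0.0, 0.0], S, noise_f0)
```

Every randomly drawn filter stays strictly below `best_possible`, while the fundamental-only filter attains it.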

42. Conclusion of This Part

  • We showed that filtering out higher harmonics increases the signal-to-noise ratio.
  • This explains why this filtering enhances detecting out-of-tune singing.
  • We also showed that filtering out higher harmonics is indeed the optimal approach, in the sense that:
    – it leads to the largest possible increase in the signal-to-noise ratio, and
    – thus, to the best chance of detecting out-of-tune deviations.

Part IV

Third Detailed Example: Why Pink Noise Is Best for Enhancing Sleep and Memory

43. Formulation of the Problem

  • Several researchers found out that acoustic stimulation during sleep enhances sleep and memory.
  • The very fact that exercising some organ is good should be expected.
  • It helps to exercise muscles, it helps to exercise brain activities, it helps to exercise ears, etc.
  • The best results were obtained with pink noise, with power spectral density $S(f) = \dfrac{c}{f}$.
  • But why?

44. Our Explanation: Main Idea

  • For exercise in general, the best results are usually achieved when different organs are all involved.
  • This means that the biological acoustic sensors corresponding to all the frequencies should be involved.
  • If the signal at one of the frequencies is very weak, the corresponding sensor is practically not exercised.
  • So, it is reasonable to require that each of these sensors is exercised similarly.
  • Thus, the energy of the part of the signal affecting each sensor should be the same for all sensors.
  • Let us describe this idea in precise terms.

45. Our Explanation: Details

  • In general, our perception, be it visual or acoustic or any other, follows Weber’s law:
  • For each perceived quantity $x$, the just noticeable difference $\Delta x$ is $\Delta x = \delta \cdot x$ for some $\delta > 0$.
  • In particular, the just noticeable difference in frequency $\Delta f(f)$ should be proportional to the frequency itself: $\Delta f(f) = \delta \cdot f$.
  • So, a biological acoustic sensor takes in all the frequencies from $f$ to $f + \Delta f(f) = f + \delta \cdot f$.
  • By definition, $S(f)$ is energy per unit frequency.
  • Thus, the overall energy $E(f)$ affecting this sensor is $S(f)$ times the width $\delta \cdot f$ of the frequency interval: $E(f) = S(f) \cdot \Delta f(f) = S(f) \cdot \delta \cdot f$.

46. Our Explanation: Details (cont-d)

  • The best effect is expected when each sensor gets the exact same amount of energy, i.e., when $E(f) = \text{const}$.
  • For the above expression $E(f) = S(f) \cdot \delta \cdot f$, the requirement $E(f) = \text{const}$ means that $S(f) = \dfrac{\text{const}}{\delta \cdot f}$.
  • This is exactly the pink noise.
  • Thus, we have explained why the pink noise leads to the most efficient stimulation of sleep and memory.
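
The equal-energy property is easy to check for a few frequencies. In the Python sketch below, the Weber fraction `delta`, the constant `c`, and the sample frequencies are illustrative assumptions; the flat-spectrum (white) noise is not discussed on the slides and is included only as a contrasting case.

```python
def sensor_energy(spectral_density, f, delta):
    """Energy E(f) = S(f) * delta * f reaching the sensor that covers [f, f + delta*f]."""
    return spectral_density(f) * delta * f

def pink(f, c=1.0):
    """Pink-noise power spectral density S(f) = c / f."""
    return c / f

def white(f, c=1.0):
    """Flat power spectral density S(f) = c, used here only for contrast."""
    return c

delta = 0.01  # assumed Weber fraction, for illustration only
frequencies = [100.0, 1000.0, 10000.0]

pink_energies = [sensor_energy(pink, f, delta) for f in frequencies]
white_energies = [sensor_energy(white, f, delta) for f in frequencies]
```

For the pink spectrum every sensor receives the same energy c · δ, whereas for the flat spectrum the energy per sensor grows in proportion to the frequency.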

Part V

Fourth Detailed Example: Case Study of Cow Insemination

47. Unexpected Empirical Fact

  • Computer-based systems are ubiquitous; e.g., farmers use sensors to identify cows in heat.
  • They then apply artificial insemination to these particular cows.
  • This guarantees that all the inseminated cows will become pregnant.
  • When analyzing the results of applying this technique, researchers found an unexpected phenomenon.
  • The gender of the resulting calves depends on the insemination time within the 16-hour heat window.
  • For inseminations at the beginning of the window, most resulting calves are female.
  • For inseminations during the later part of the window, most resulting calves are male. Why?

48. What Is a Perfect Bull/Cow Mix

  • Until recently, cows were inseminated by bulls.
  • From this viewpoint:
    – to understand why cows sometimes give birth to male calves and sometimes to female calves,
    – we need to understand what proportion of cows and bulls would be biologically perfect.
  • Each species aims to reproduce as much as possible, as much as the food and other resources allow.
  • From this viewpoint:
    – if there are too few bulls in the herd,
    – many cows will not be inseminated and
    – thus, the herd will not achieve its reproductive potential.

49. What Is a Perfect Bull/Cow Mix (cont-d)

  • On the other hand:
    – if there are too many bulls in the herd,
    – many more than needed to inseminate all the cows,
    – the herd’s reproductive potential will also be wasted.
  • Indeed, the same herd would reproduce more if, instead of the extra useless bulls, we had cows.
  • Thus, the ideal situation is when there are exactly as many bulls as needed to inseminate all the cows.

50. What If the Mix Is Imperfect: How to Balance the Situation

  • In real life, the cow-bull proportion may not be ideal.
  • If there are too many bulls, it is desirable to make sure that the majority of newborn calves are female.
  • If there are too few bulls, it is desirable to make sure that the majority of newborn calves are male.

51. How Can an Individual Cow Know That the Balance Is Imperfect

  • The gender of a calf is determined by the biological processes in the cow’s body.
  • How does the cow’s body know when there are too many bulls or too few bulls?
  • At first glance, it may seem that the cow does not have this information.
  • However, a detailed analysis of the situation shows that a cow can get this information.
  • Indeed, as we have mentioned, cows have a sixteen-hour period during which they can be inseminated.
  • If all the cows are inseminated in a shorter period of time, this means that we have an excess of bulls.

52. Detecting Gender Imbalance (cont-d)

  • With fewer bulls, we would still be able to inseminate all the cows by using the remaining unused time; e.g.:
    – if all the cows are inseminated during the first eight hours of their insemination period,
    – this means that we could use half as many bulls.
  • In the ideal cow-bull mix, all sixteen hours of the cow’s insemination period can be used.
  • Thus, the average time $\Delta t$ before insemination will be 8 hours.
  • If there are too many bulls, this means that, in general, all the cows in heat will be inseminated earlier.
  • Thus, $\Delta t$ will be, in general, smaller.
  • On the other hand, if there are too few bulls, this means that many cows will not be inseminated at all.

53. Detecting Gender Imbalance (cont-d)

  • And the “lucky” ones that are inseminated will be inseminated closer to the end of their insemination window.
  • In this case, the average value of the time $\Delta t$ will be larger.
  • So, if there are too many bulls in the herd, most cows will be inseminated earlier.
  • If there are too few bulls in the herd, most cows will be inseminated later.
  • Now, we are ready to explain the above phenomenon.

54. Our Explanation

  • Let us consider a cow inseminated during the earlier part of its insemination window.
  • To the cow’s organism, this is an indication that there may be too many bulls in the herd.
  • Thus, a natural biological reaction is to decrease this imbalance by producing mostly female calves.
  • Let us consider a cow inseminated during the later part of its insemination window.
  • To the cow’s organism, this is an indication that there may be too few bulls in the herd.
  • Thus, a natural biological reaction is to decrease this imbalance by producing mostly male calves.
  • This is exactly the phenomenon that has been observed.

55. Thanks

  • I would like to express my profound gratitude:
    – to my parents Griselda and Felipe H. Acosta,
    – to my siblings Rolando, Felipe, and Adriana,
    – to my Committee Members, Drs. Eric D. Smith, Vladik Kreinovich, Deidra Hodges, and Bill Tseng,
    – to all the faculty, staff, and students:
      ∗ of the Department of Electrical and Computer Engineering and
      ∗ of the Systems Engineering Program.
  • Their unwavering help, support, and encouragement helped me a lot.