Robust Data Processing in the Presence of Uncertainty and Outliers: Case of Localization Problems

Anthony Welte1, Luc Jaulin1, Martine Ceberio2, and Vladik Kreinovich2

1Lab-STICC, École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne), 2 rue François Verny, 29806 Brest, France, tony.welte@gmail.com, lucjaulin@gmail.com

2Department of Computer Science, University of Texas at El Paso

El Paso, Texas 79968, USA, mceberio@utep.edu, vladik@utep.edu


1. Outline

  • To properly process data, we need to take into account:
    – the measurement errors, and
    – the fact that some of the observations may be outliers.
  • This is especially important in radar-based localization, where some signals may reflect:
    – not from the analyzed object,
    – but from some nearby object.
  • There are known methods for situations when we have full information about the probabilities.
  • There are methods for dealing with measurement errors when we only have partial information about the probabilities.
  • In this talk, we extend these methods to situations with outliers.

2. Need for Data Processing

  • We are often interested in quantities p1, . . . , pm which are difficult to measure directly.
  • We find a measurable quantity y that depends on the pi and on settings xj:
    y = f(p1, . . . , pm, x1, . . . , xn).
  • For example, locating an object (robot, satellite, etc.) means finding its coordinates p1, p2, p3.
  • We cannot directly measure coordinates, but we can measure, e.g., a distance
    y = √(∑_{i=1}^3 (pi − xi)²).
  • In general, we measure yk under different settings (xk1, . . . , xkn), and reconstruct the pi from the conditions yk = f(p1, . . . , pm, xk1, . . . , xkn).
  • This is an important case of data processing.
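As a small illustration of this setup (the sensor layout and target location below are made-up numbers, not from the talk), the forward model f and the resulting measurements yk can be sketched in Python:

```python
import math

def f(p, x):
    """Forward model: Euclidean distance between the unknown
    position p and a known sensor location x."""
    return math.sqrt(sum((pi - xi) ** 2 for pi, xi in zip(p, x)))

# Position we want to reconstruct (used here only to simulate data).
p_true = (3.0, 4.0, 0.0)

# Known settings x_k: hypothetical sensor positions.
sensors = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]

# Idealized noise-free measurements y_k = f(p, x_k); "data processing"
# here means recovering p from these y_k and the known x_k.
y = [f(p_true, x) for x in sensors]
print(y[0])  # distance from (3, 4, 0) to the origin: 5.0
```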

3. Need to Take into Account Measurement Uncertainty and Outliers

  • Measurements are never absolutely accurate.
  • There is always a non-zero difference between the measurement result yk and the actual (unknown) value:
    ∆yk = yk − f(p1, . . . , pm, xk1, . . . , xkn) ≠ 0.
  • Sometimes, the measuring instrument malfunctions.
  • Then, we get outliers: values which are very different from the actual quantity.
  • This is especially important in radar-based localization, where some signals may reflect:
    – not from the analyzed object,
    – but from some nearby object.


4. Case When We Know the Probability Distribution ρ(∆y) of the Measurement Error

  • In this case, for each p, the probability to observe yk is proportional to ρ(∆yk) = ρ(yk − f(p, xk)).
  • Measurement errors corresponding to different measurements are usually independent.
  • So, the probability of observing all the observed values y1, . . . , yK is equal to the product
    L = ∏_{k=1}^K ρ(yk − f(p, xk)).
  • It is reasonable to select the most probable value p, i.e., the one for which this product is the largest.
  • This idea is known as the Maximum Likelihood Method.
  • For Gaussian distributions, this leads to the usual Least Squares Method:
    ∑_{k=1}^K (∆yk)² → min.
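For instance (with made-up sensor positions and small made-up errors), the Least Squares estimate can be found by a brute-force search over a grid of candidate positions; a dependency-free sketch, where the grid search merely stands in for a real optimizer:

```python
import math

def f(p, x):
    """Distance-based forward model for 2D localization."""
    return math.sqrt((p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2)

p_true = (3.0, 4.0)                    # used only to simulate data
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
errors = [0.03, -0.05, 0.02, 0.04]     # small measurement errors
y = [f(p_true, x) + e for x, e in zip(sensors, errors)]

def sum_sq(p):
    """Least-squares objective: sum of squared residuals (∆y_k)^2."""
    return sum((yk - f(p, xk)) ** 2 for yk, xk in zip(y, sensors))

# Coarse grid search over candidate positions in [0, 10] x [0, 10].
grid = [i * 0.1 for i in range(101)]
p_hat = min(((a, b) for a in grid for b in grid), key=sum_sq)
print(p_hat)  # close to the true position (3.0, 4.0)
```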


5. What If We Only Have Partial Information About the Probabilities: First Case

  • Sometimes, we know that the probability distribution belongs to a family of the form ρ(∆y, θ) for some parameters θ = (θ1, . . . , θℓ).
  • In this case, the corresponding likelihood function takes the form L = ∏_{k=1}^K ρ(∆yk, θ).
  • We then select a pair (p, θ) for which this probability is the largest:
    L = ∏_{k=1}^K ρ(yk − f(p, xk), θ) → max_{p,θ}.
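As a concrete instance (made-up residuals; the zero-mean Gaussian family ρ(∆y, σ) is just one possible choice of θ), maximizing the likelihood over σ at a fixed p gives the closed form σ̂² = (1/K) · ∑(∆yk)², which a quick numeric check confirms:

```python
import math

# Residuals ∆y_k = y_k - f(p, x_k) at some fixed candidate p (made up).
residuals = [0.3, -0.5, 0.2, 0.4, -0.1]
K = len(residuals)

def log_L(sigma):
    """Log-likelihood of the residuals under a zero-mean Gaussian ρ(∆y, σ)."""
    return sum(-0.5 * (r / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
               for r in residuals)

# Closed-form maximizer over the parameter σ at this fixed p.
sigma_hat = math.sqrt(sum(r * r for r in residuals) / K)
print(sigma_hat)
print(log_L(sigma_hat) >= log_L(1.1 * sigma_hat))  # True: σ̂ beats nearby σ
print(log_L(sigma_hat) >= log_L(0.9 * sigma_hat))  # True
```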


6. What If We Only Have Partial Information About the Probabilities: Non-Parametric Case

  • In many practical situations, we do not know a finite-parametric family containing the actual distribution.
  • Each possible distribution ρ(∆y) can be characterized by its entropy S = −∫ ρ(∆y) · ln(ρ(∆y)) d∆y.
  • Entropy describes how many binary questions we need to ask to uniquely determine ∆y.
  • We want to select a distribution that to the largest extent reflects this uncertainty.
  • In other words, it is reasonable to select the distribution for which the entropy is the largest possible.
  • For example, among the distributions ρ(∆y) located on [−∆, ∆], the uniform distribution has the largest entropy.
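This maximum-entropy claim is easy to check numerically. A sketch with ∆ = 1, comparing the uniform density against an arbitrary made-up competitor (a normalized cosine-shaped density on the same interval):

```python
import math

# Midpoint-rule estimate of S = -∫ ρ(∆y) ln ρ(∆y) d∆y over [-1, 1].
N = 20000
xs = [-1.0 + 2.0 * (i + 0.5) / N for i in range(N)]
dx = 2.0 / N

def entropy(rho):
    return -sum(rho(x) * math.log(rho(x)) * dx for x in xs if rho(x) > 0)

uniform = lambda x: 0.5                                       # 1/(2∆) with ∆ = 1
cosine = lambda x: (math.pi / 4) * math.cos(math.pi * x / 2)  # integrates to 1

S_uniform = entropy(uniform)
S_cosine = entropy(cosine)
print(S_uniform, S_cosine)  # S_uniform = ln 2 ≈ 0.693 is the larger value
```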


7. Need for Interval Computations

  • For the uniform distribution:
    – the value ρ(∆yk) = 0 if ∆yk is outside the interval [−∆, ∆], and
    – it is equal to a constant when ∆yk is inside this interval.
  • Thus, the product L of these probabilities is constant when |∆yk| ≤ ∆ for all k.
  • So, instead of a single tuple p, we now need to describe all the tuples p for which |yk − f(p, xk)| ≤ ∆ for all k = 1, . . . , K.
  • This is a particular case of interval computations.
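A brute-force sketch of this interval-computation step (made-up sensors and bound ∆; real interval methods enclose the solution set far more efficiently than a grid):

```python
import math

def f(p, x):
    """Distance from candidate position p to sensor x."""
    return math.sqrt((p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2)

sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
p_true = (3.0, 4.0)                    # used only to simulate data
y = [f(p_true, x) for x in sensors]    # exact measurements for simplicity
delta = 0.3                            # known bound on |∆y_k|

# Enumerate all grid positions consistent with every measurement:
# |y_k - f(p, x_k)| <= ∆ for all k.
grid = [i * 0.05 for i in range(201)]
feasible = [(a, b) for a in grid for b in grid
            if all(abs(yk - f((a, b), xk)) <= delta
                   for yk, xk in zip(y, sensors))]

# The answer is the whole set of consistent tuples; report its bounding box.
xs = [p[0] for p in feasible]
ys = [p[1] for p in feasible]
print(min(xs), max(xs), min(ys), max(ys))
```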

8. What If We Have No Information About the Probabilities of Measurement Errors

  • This situation is similar to the previous one, except that now we do not know the bound ∆.
  • A reasonable idea is to select the ∆ for which the corresponding likelihood L = 1/(2∆)^K is the largest possible.
  • Selecting the largest possible L is equivalent to selecting the smallest possible ∆.
  • The only constraint on ∆ is that ∆ ≥ |∆yk| for all k.
  • The smallest ∆ satisfying this is ∆ = max_k |∆yk|.
  • Thus, minimizing ∆ means selecting the p for which max_k |∆yk| = max_k |yk − f(p, xk)| is the smallest.
  • This minimax approach is indeed frequently used in data processing.
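A sketch of this minimax criterion on made-up data (same grid-search device as before; fixed small errors stand in for bounded measurement noise):

```python
import math

def f(p, x):
    """Distance-based forward model for 2D localization."""
    return math.sqrt((p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2)

sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
p_true = (3.0, 4.0)                  # used only to simulate data
errors = [0.1, -0.1, 0.05, -0.05]    # bounded measurement errors
y = [f(p_true, x) + e for x, e in zip(sensors, errors)]

def worst_residual(p):
    """The minimax objective: max_k |y_k - f(p, x_k)|."""
    return max(abs(yk - f(p, xk)) for yk, xk in zip(y, sensors))

grid = [i * 0.1 for i in range(101)]
p_hat = min(((a, b) for a in grid for b in grid), key=worst_residual)
print(p_hat, worst_residual(p_hat))
```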


9. How to Take Both Uncertainty and Outliers into Account

  • We considered 4 cases:
    – we know the exact distribution;
    – we know the finite-parametric family of distributions;
    – we know the upper bound on the absolute value of the corresponding difference; and
    – we have no information whatsoever, not even the upper bound.
  • In principle, we may have the same four possible types of information about the outlier probabilities ρ0(∆y).
  • At first glance, it may therefore seem that we can have 4 × 4 = 16 possible combinations.
  • In reality, however, not all such combinations are possible.


10. Which Combinations Are Possible?

  • Indeed, once we gather enough data, we can determine the corresponding probability distributions. Thus, the fact that we do not know the probability distribution of the measurement error means that we have not yet collected a sufficient number of measurement results.
  • The number of outliers is usually much smaller than the number of actual measurement results. So:
    – if we cannot determine the probability distribution of the measurement errors,
    – then we cannot determine the probability distribution of the outliers either.
  • In general, we have less information about the outliers than about the measurement errors.


11. Case When We Know Distributions ρ(∆y) of the Measurement Error and ρ0(∆y) of Outliers

  • If we know the set M ⊆ {1, . . . , K} of indices k of non-outliers, then
    L = ∏_{k∈M} ρ(∆yk) · ∏_{k∉M} ρ0(∆yk).
  • Now, we can use the Maximum Likelihood approach to determine both the parameter tuple p and the set M.
  • L is the largest when we assign k to M exactly when ρ0(∆yk) < ρ(∆yk); thus
    L = ∏_{k=1}^K max(ρ(∆yk), ρ0(∆yk)) → max_p.
  • From the computational viewpoint, this is similar to the usual maximum likelihood, with g(∆y) = max(ρ(∆y), ρ0(∆y)) instead of ρ(∆y).
  • The difference is that ∫ g(∆y) d∆y > ∫ ρ(∆y) d∆y = 1.
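A small numeric sketch of this construction, with a standard normal ρ for the measurement errors and, as an assumed example, a uniform outlier density ρ0 on [−10, 10]:

```python
import math

# Modified likelihood factor g(∆y) = max(ρ(∆y), ρ0(∆y)).
rho = lambda d: math.exp(-d * d / 2) / math.sqrt(2 * math.pi)   # N(0, 1)
rho0 = lambda d: 1 / 20 if abs(d) <= 10 else 0.0                # uniform on [-10, 10]
g = lambda d: max(rho(d), rho0(d))

# Index k goes to the non-outlier set M exactly when ρ(∆y_k) > ρ0(∆y_k):
print(rho(0.5) > rho0(0.5))   # small residual: treated as a regular measurement
print(rho(4.0) > rho0(4.0))   # large residual: treated as an outlier

# g is not a probability density: its integral exceeds 1 (midpoint rule).
N = 20000
integral = sum(g(-12 + 24 * (i + 0.5) / N) * (24 / N) for i in range(N))
print(integral > 1.0)
```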

12. Full Information About ρ(∆y), Finite-Parametric Family ρ0(∆y, ϕ) for ρ0(∆y)

  • We can determine all the parameters (p and ϕ) from the requirement that the likelihood is the largest:
    L = ∏_{k=1}^K max(ρ(∆yk), ρ0(∆yk, ϕ)) = ∏_{k=1}^K max(ρ(yk − f(p, xk)), ρ0(yk − f(p, xk), ϕ)) → max_{p,ϕ}.


13. Full Information About ρ(∆y), Bound W on the Outlier-Related Differences ∆yk

  • The Maximum Entropy approach selects the uniform distribution ρ0(∆y) on [−W, W], with ρ0(∆yk) = 1/(2W).
  • We determine the p that maximizes the likelihood
    L = ∏_{k=1}^K max(ρ(∆yk), 1/(2W)) = ∏_{k=1}^K max(ρ(yk − f(p, xk)), 1/(2W))
    under the constraint |∆yk| = |yk − f(p, xk)| ≤ W for all k.

14. Full Information about ρ(∆y), No Information About the Outlier-Related Differences ∆yk

  • As before, in this case, we take W = max_ℓ |∆yℓ| = max_ℓ |yℓ − f(p, xℓ)|.
  • Thus, we select the parameters p that maximize the likelihood
    L = ∏_{k=1}^K max(ρ(yk − f(p, xk)), 1/(2 · max_ℓ |yℓ − f(p, xℓ)|)).


15. Finite-Parametric Information About ρ(∆y) and About ρ0(∆y)

  • We have families of distributions ρ(∆y, θ) and ρ0(∆y, ϕ) with unknown parameters θ and ϕ.
  • In such a situation, we find the parameters p, θ, and ϕ that maximize the likelihood
    L = ∏_{k=1}^K max(ρ(∆yk, θ), ρ0(∆yk, ϕ)) = ∏_{k=1}^K max(ρ(yk − f(p, xk), θ), ρ0(yk − f(p, xk), ϕ)).


16. Finite-Parametric ρ(∆y), Bound W on the Outlier-Related Differences ∆yk

  • We have a family of distributions ρ(∆y, θ) with unknown parameters θ.
  • In such a situation, we find the parameters p and θ that maximize the likelihood
    L = ∏_{k=1}^K max(ρ(∆yk, θ), 1/(2W)) = ∏_{k=1}^K max(ρ(yk − f(p, xk), θ), 1/(2W))
    under the constraint |∆yk| = |yk − f(p, xk)| ≤ W for all k.

17. Finite-Parametric ρ(∆y), No Information About the Outlier-Related Differences ∆yk

  • As in similar cases, we should select the smallest possible W: W = max_ℓ |∆yℓ|.
  • Thus, we need to select the parameters p and θ that maximize the likelihood
    L = ∏_{k=1}^K max(ρ(yk − f(p, xk), θ), 1/(2 · max_ℓ |yℓ − f(p, xℓ)|)).


18. Bound ∆ on the Measurement Errors, Bound W on the Outlier-Related Differences ∆yk

  • In this case, by using the maximum entropy approach, we select the following distributions:
    – the measurement errors are uniformly distributed on the interval [−∆, ∆], with ρ(∆y) = 1/(2∆);
    – the outlier-related differences ∆yk are uniformly distributed on the interval [−W, W], with ρ0(∆y) = 1/(2W).
  • In this case, we need to select the parameters p that maximize the likelihood L = ∏_{k=1}^K g(∆yk), where g(∆y) = max(ρ(∆y), ρ0(∆y)).
  • Here, g(∆y) = 1/(2∆) when |∆y| ≤ ∆, g(∆y) = 1/(2W) when ∆ < |∆y| ≤ W, and g(∆y) = 0 otherwise.


19. Bound ∆ on the Measurement Errors, Bound W on the Differences ∆yk (cont-d)

  • Thus, maximizing the product L = ∏_{k=1}^K g(∆yk) means minimizing the number of outliers under the constraint |∆yk| = |yk − f(p, xk)| ≤ W for all k.
  • So, we select the p for which, under these constraints, the number of observations with |yk − f(p, xk)| > ∆ is the smallest.


20. Bound ∆ on the Measurement Errors, No Information About the Outlier Differences ∆yk

  • In this case, since we take W = max_ℓ |yℓ − f(p, xℓ)|, there are no longer any limitations on p.
  • Thus, in this case, the maximum likelihood method simply means:
    – selecting the values of the parameters p
    – for which the number of outliers (i.e., values for which |yk − f(p, xk)| > ∆) is the smallest possible.
  • This idea has been effectively used, as a heuristic, to deal with data processing in the presence of outliers.
  • Thus, we get a probability-based justification for this heuristic.
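A sketch of this outlier-count-minimization heuristic on made-up data, with one measurement deliberately corrupted (as if the radar echo came from a nearby object):

```python
import math

def f(p, x):
    """Distance-based forward model for 2D localization."""
    return math.sqrt((p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2)

sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
p_true = (3.0, 4.0)                # used only to simulate data
y = [f(p_true, x) for x in sensors]
y[3] += 5.0                        # one echo reflected from a nearby object
delta = 0.2                        # known bound on genuine measurement errors

def n_outliers(p):
    """Number of measurements violating |y_k - f(p, x_k)| <= ∆."""
    return sum(1 for yk, xk in zip(y, sensors) if abs(yk - f(p, xk)) > delta)

grid = [i * 0.1 for i in range(101)]
p_hat = min(((a, b) for a in grid for b in grid), key=n_outliers)
print(p_hat, n_outliers(p_hat))  # exactly one measurement flagged as an outlier
```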


21. Final Case, When We Have No Information About the Probabilities

  • Finally, let us consider the case when we have no information about the probabilities:
    – neither about the probabilities of different values of the measurement errors,
    – nor about the probabilities of different outlier-related differences ∆y = y − f(p, x).
  • In this case, we need to select the corresponding bounds ∆ and W for which the likelihood is the largest.
  • For each parameter tuple p, the maximum of L is attained when W(p) = max_ℓ |∆yℓ|.
  • So, it only remains to select p and ∆.
  • For each p and ∆, let us denote by n(p, ∆) the number of values k for which |yk − f(p, xk)| ≤ ∆.

22. Final Case (cont-d)

  • In terms of this notation, the desired likelihood value L(p, ∆) = ∏_{k=1}^K g(yk − f(p, xk)) has the form
    L(p, ∆) = 1/(2∆)^{n(p,∆)} · 1/(2W(p))^{K−n(p,∆)}.
  • Maximizing this expression is equivalent to minimizing its negative logarithm
    ψ(p, ∆) = −ln(L(p, ∆)) = K · ln(2W(p)) + n(p, ∆) · (ln(∆) − ln(W(p))).
  • Thus, we then select the p for which the following expression is the smallest possible:
    ψ(p) = min_∆ (K · ln(2W(p)) + n(p, ∆) · (ln(∆) − ln(W(p)))),
    where W(p) = max_ℓ |yℓ − f(p, xℓ)| and n(p, ∆) = #{k : |yk − f(p, xk)| ≤ ∆}.


23. Final Case: Checking How Well This Method Works

  • We applied this idea to situations when the ∆yk are distributed according to several reasonable distributions: normal, heavy-tailed power law, etc.
  • In all these cases, we get 5–20% of the values classified as outliers.
  • This is in line with the usual case of the normal distribution, where ≈ 5% of the values lie outside the 2σ interval and are, thus, usually dismissed as outliers.

24. Acknowledgments

  • This work was supported in part:
    – by the National Science Foundation grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721, and
    – by an award from Prudential Foundation.
  • This research was performed during Anthony Welte's visit to the University of Texas at El Paso.
  • The authors are also thankful to all the participants of the Summer Workshop on Interval Methods SWIM'2016 (Lyon, France) for valuable discussions.