
Selecting the Most Representative Sample is NP-Hard: Need for Expert (Fuzzy) Knowledge - PowerPoint PPT Presentation



  1. Selecting the Most Representative Sample is NP-Hard: Need for Expert (Fuzzy) Knowledge
     J. Esteban Gamez¹, François Modave¹, and Olga Kosheleva²
     Departments of ¹Computer Science and ²Teacher Education
     University of Texas, El Paso, TX 79968, USA
     contact email olgak@utep.edu

  2. 1. Outline
     • One of the main applications of fuzzy techniques is to formalize the notions of “typical”, “representative”, etc.
     • The main idea behind fuzzy techniques: formalize expert knowledge expressed by words from natural language.
     • In this talk, we show that
       – if we do not use this knowledge, i.e., if we only use the data,
       – then selecting the most representative sample becomes computationally difficult (NP-hard).
     • Thus, the need to find such samples in reasonable time justifies the use of fuzzy techniques.

  3. 2. Introduction to the problem
     • In practice: the population is often large, so we analyze a sample.
     • Examples: poll, educational survey.
     • Idea: the more “representative” the sample, the larger our confidence in the statistical results.
     • Requirement: a representative sample should have the same averages as the population.
     • Example: the same average age, average income, etc.
     • Additional requirement: the sample should exhibit the same variety as the population.
     • Example: the sample should include both poorer and richer people.
     • Formalization: a representative sample should have the same variance as the population.

  4. 3. Population: exact description
     By a population, we mean a tuple $p \stackrel{\text{def}}{=} \langle N, k, \{x_{j,i}\} \rangle$, where:
     • $N$ is an integer; this integer will be called the population size;
     • $k$ is an integer; this integer is called the number of characteristics;
     • $x_{j,i}$ ($1 \le j \le k$, $1 \le i \le N$) are real numbers;
     • the real number $x_{j,i}$ will be called the value of the $j$-th characteristic for the $i$-th object.

  5. 4. Statistical characteristics
     • Let $p = \langle N, k, \{x_{j,i}\} \rangle$ be a population, and let $j$ be an integer from 1 to $k$.
     • By the population mean $E_j$ of the $j$-th characteristic, we mean the value
       $$E_j = \frac{1}{N} \cdot \sum_{i=1}^{N} x_{j,i}.$$
     • By the population variance $V_j$ of the $j$-th characteristic, we mean the value
       $$V_j = \frac{1}{N} \cdot \sum_{i=1}^{N} (x_{j,i} - E_j)^2.$$
     • For every integer $d \ge 1$, by the central moment $M_j^{(2d)}$ of order $2d$ of the $j$-th characteristic, we mean the value
       $$M_j^{(2d)} = \frac{1}{N} \cdot \sum_{i=1}^{N} (x_{j,i} - E_j)^{2d}.$$
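The three population definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the storage layout (a list of rows `x[j][i]`, indexed from 0 rather than from 1) are assumptions made for the example, not part of the slides.

```python
def population_mean(x, j):
    """E_j = (1/N) * sum over i of x[j][i]."""
    row = x[j]
    return sum(row) / len(row)


def population_central_moment(x, j, order):
    """Central moment of the given order for characteristic j (order = 2d on the slide)."""
    e_j = population_mean(x, j)
    return sum((value - e_j) ** order for value in x[j]) / len(x[j])


def population_variance(x, j):
    """V_j is the central moment of order 2."""
    return population_central_moment(x, j, 2)


# Hypothetical population: k = 2 characteristics, N = 4 objects.
x = [
    [1.0, 2.0, 3.0, 4.0],      # characteristic 0
    [10.0, 10.0, 20.0, 20.0],  # characteristic 1
]

print(population_mean(x, 0))               # 2.5
print(population_variance(x, 0))           # 1.25
print(population_central_moment(x, 0, 4))  # 2.5625
```

Note that the variance here divides by $N$, as on the slide, not by $N - 1$.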

  6. 5. Sample
     • Let $N$ be a population size.
     • By a sample, we mean a non-empty subset $I \subseteq \{1, 2, \ldots, N\}$.
     • For every sample $I$, by its size, we mean the number of elements in $I$; below, this size is denoted by $n$.
     • By the sample mean $E_j(I)$ of the $j$-th characteristic, we mean the value
       $$E_j(I) = \frac{1}{n} \cdot \sum_{i \in I} x_{j,i}.$$
     • By the sample variance $V_j(I)$ of the $j$-th characteristic, we mean the value
       $$V_j(I) = \frac{1}{n} \cdot \sum_{i \in I} (x_{j,i} - E_j(I))^2.$$
     • For every $d \ge 1$, by the sample central moment $M_j^{(2d)}(I)$ of order $2d$ of the $j$-th characteristic, we mean the value
       $$M_j^{(2d)}(I) = \frac{1}{n} \cdot \sum_{i \in I} (x_{j,i} - E_j(I))^{2d}.$$
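As a rough illustration of the sample definitions, the sketch below computes the sample mean and variance over an index set. The names, the 0-based indices, and the toy data are assumptions for this example only. It also shows why the mean alone is not enough: two samples can share a mean and still differ in variance.

```python
def sample_mean(x, j, sample):
    """E_j(I) = (1/n) * sum over i in I of x[j][i], where n = |I|."""
    if not sample:
        raise ValueError("a sample must be non-empty")
    return sum(x[j][i] for i in sample) / len(sample)


def sample_central_moment(x, j, sample, order):
    """Sample central moment of the given order (order = 2d on the slide)."""
    e_j = sample_mean(x, j, sample)
    return sum((x[j][i] - e_j) ** order for i in sample) / len(sample)


def sample_variance(x, j, sample):
    """V_j(I) is the sample central moment of order 2."""
    return sample_central_moment(x, j, sample, 2)


# Hypothetical population with one characteristic and N = 4 objects.
x = [[1.0, 2.0, 3.0, 4.0]]

extremes = {0, 3}  # objects with values 1.0 and 4.0
middle = {1, 2}    # objects with values 2.0 and 3.0

print(sample_mean(x, 0, extremes), sample_variance(x, 0, extremes))  # 2.5 2.25
print(sample_mean(x, 0, middle), sample_variance(x, 0, middle))      # 2.5 0.25
```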

  7. 6. Statistics
     • Let $p = \langle N, k, \{x_{j,i}\} \rangle$ be a population, and let $I$ be a sample.
     • By an $E$-statistics tuple corresponding to $p$, we mean a tuple $t^{(1)} \stackrel{\text{def}}{=} (E_1, \ldots, E_k)$.
     • By an $E$-statistics tuple corresponding to $I$, we mean a tuple $t^{(1)}(I) \stackrel{\text{def}}{=} (E_1(I), \ldots, E_k(I))$.
     • By an $(E, V)$-statistics tuple corresponding to $p$, we mean a tuple $t^{(2)} \stackrel{\text{def}}{=} (E_1, \ldots, E_k, V_1, \ldots, V_k)$.
     • By an $(E, V)$-statistics tuple corresponding to $I$, we mean a tuple $t^{(2)}(I) \stackrel{\text{def}}{=} (E_1(I), \ldots, E_k(I), V_1(I), \ldots, V_k(I))$.
     • For every integer $d \ge 1$, we can similarly define a statistics tuple of order $2d$.
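One possible way to build these tuples in code is sketched below. The single helper `stats_tuple`, its `with_variance` flag, and the 0-based row layout `x[j][i]` are all assumptions for the sake of the example. Passing no index set gives the population tuple; passing an index set gives the tuple for that sample.

```python
def stats_tuple(x, indices=None, with_variance=False):
    """Return t^(1) (means only) or t^(2) (means, then variances).

    With indices=None the tuple describes the whole population;
    otherwise it describes the sample given by the 0-based indices.
    """
    idx = list(range(len(x[0]))) if indices is None else list(indices)
    if not idx:
        raise ValueError("a sample must be non-empty")
    means = [sum(row[i] for i in idx) / len(idx) for row in x]
    if not with_variance:
        return tuple(means)
    variances = [
        sum((row[i] - m) ** 2 for i in idx) / len(idx)
        for row, m in zip(x, means)
    ]
    return tuple(means) + tuple(variances)


# Hypothetical population: k = 2 characteristics, N = 4 objects.
x = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 20.0, 20.0],
]

print(stats_tuple(x))                              # (2.5, 15.0)
print(stats_tuple(x, with_variance=True))          # (2.5, 15.0, 1.25, 25.0)
print(stats_tuple(x, [0, 3], with_variance=True))  # (2.5, 15.0, 2.25, 25.0)
```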

  8. 7. How to describe closeness
     • By a distance function, we mean a mapping $\rho$ that maps tuples $t$ and $t'$ into a real value $\rho(t, t')$ such that
       • $\rho(t, t) = 0$ for all tuples $t$ and
       • $\rho(t, t') > 0$ for all $t \ne t'$.
     • Example: Euclidean metric between the tuples $t = (t_1, t_2, \ldots)$ and $t' = (t'_1, t'_2, \ldots)$:
       $$\rho(t, t') = \sqrt{\sum_j (t_j - t'_j)^2}.$$
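The Euclidean example can be written as a short Python function. This is a minimal sketch with an assumed function name; it also spot-checks the two defining properties of a distance function on a couple of hypothetical tuples, which illustrates them but of course does not prove them.

```python
import math


def euclidean(t, t_prime):
    """rho(t, t') = sqrt(sum over j of (t_j - t'_j)^2)."""
    if len(t) != len(t_prime):
        raise ValueError("tuples must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, t_prime)))


t = (0.0, 0.0)
t_prime = (3.0, 4.0)

print(euclidean(t, t))        # 0.0  (rho(t, t) = 0)
print(euclidean(t, t_prime))  # 5.0  (rho(t, t') > 0 when t != t')
```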

  9. 8. Formulation of the problem
     • Let $\rho$ be a distance function.
     • $E$-sample selection problem corresponding to $\rho$:
       – Given:
         ∗ a population $p = \langle N, k, \{x_{j,i}\} \rangle$, and
         ∗ an integer $n < N$.
       – Find: a sample $I \subseteq \{1, \ldots, N\}$ of size $n$ for which the distance $\rho(t^{(1)}(I), t^{(1)})$ is the smallest possible.
     • $(E, V)$-sample selection problem corresponding to $\rho$:
       – Given:
         ∗ a population $p = \langle N, k, \{x_{j,i}\} \rangle$, and
         ∗ an integer $n < N$.
       – Find: a sample $I \subseteq \{1, \ldots, N\}$ of size $n$ for which the distance $\rho(t^{(2)}(I), t^{(2)})$ is the smallest possible.
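To make the $E$-sample selection problem concrete, here is a brute-force sketch that simply tries every subset of size $n$. It is not the method from the talk, only a naive baseline; the function names, the 0-based indexing, and the toy data are assumptions. The number of candidates is the binomial coefficient $\binom{N}{n}$, so this approach is only feasible for very small populations.

```python
import math
from itertools import combinations


def e_tuple(x, indices):
    """E-statistics tuple (one mean per characteristic) over the given indices."""
    idx = list(indices)
    return tuple(sum(row[i] for i in idx) / len(idx) for row in x)


def euclidean(t, t_prime):
    """Euclidean distance between two tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, t_prime)))


def select_e_sample(x, n, rho):
    """Exhaustively search all size-n samples for the one closest to the population."""
    population_size = len(x[0])
    if not 1 <= n < population_size:
        raise ValueError("need 1 <= n < N")
    target = e_tuple(x, range(population_size))
    return min(
        combinations(range(population_size), n),
        key=lambda sample: rho(e_tuple(x, sample), target),
    )


# Hypothetical population: one characteristic, N = 5 objects, population mean 4.0.
x = [[1.0, 2.0, 3.0, 4.0, 10.0]]

best = select_e_sample(x, 2, euclidean)
print(best)             # (2, 3): values 3.0 and 4.0, sample mean 3.5
print(math.comb(5, 2))  # 10 candidate samples were examined
```

The same search works for the $(E, V)$ variant if `e_tuple` is replaced by a function that also appends the variances.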

  10. 9. Main results
     • For every distance function $\rho$, the corresponding $E$-sample selection problem is NP-hard.
     • For every distance function $\rho$, the corresponding $(E, V)$-sample selection problem is NP-hard.
     • For every distance function $\rho$ and for every $d \ge 1$, the $(2d)$-th order sample selection problem is NP-hard.

  11. 10. Auxiliary result
     • In our proofs: we considered the case when the desired sample contains half of the original population.
     • In practice: samples usually form a much smaller portion of the population.
     • A natural question:
       – fix $2P \gg 2$, and
       – look for samples which constitute the $(2P)$-th part of the original population.
     • Result: the resulting problems of selecting the most representative sample are still NP-hard.
