
Density-Based Fuzzy Clustering as a First Step to Learning Rules: Challenges and Solutions



Density-Based Fuzzy Clustering as a First Step to Learning Rules: Challenges and Solutions

Gözde Ulutagay¹ and Vladik Kreinovich²

¹ Department of Industrial Engineering, Izmir University, Izmir, Turkey, gozde.ulutagay@gmail.com
² University of Texas at El Paso, El Paso, Texas 79968, USA, vladik@utep.edu

1. Clustering is How We Humans Make Decisions

• Most algorithms for control and decision making take, as input, the values of the input parameters.
• In contrast, we normally only use a category to which this value belongs; e.g., when we select a place to eat:
  – instead of exact prices, we consider whether the restaurant is cheap, medium, or expensive;
  – instead of details of food, we check whether it is Mexican, Chinese, etc.
• When we select a hotel, we take into account how many stars it has and whether it is within walking distance of the conference site.
• First, we cluster possible situations, i.e., divide them into a few groups.
• Then, we make a decision based on the group to which the current situation belongs.

2. Clustering is a Natural First Step to Learning the Rules

• Computers process data much faster than we humans.
• However, e.g., in face recognition, we are much better than the best of the known computer programs.
• It is thus reasonable to emulate the way we humans make the corresponding decisions; e.g.:
  – to first cluster possible situations,
  – and then make a decision based on the cluster containing the current situation.

3. Clustering: Ideal Case

• Each known situation is described by the values x = (x_1, ..., x_n) of n known quantities.
• When we have many situations, we can talk about the density d(x): the number of situations per unit volume.
• Clusters are separated by voids: there are cats, there are dogs, but there is no continuous transition.
• Within each cluster, we have d(x) > 0.
• Outside clusters, we have d(x) = 0. So:
  – once we know the density d(x) at each point x,
  – we can find each cluster as a connected component of the set {x : d(x) > 0}.
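
To make the ideal-case recipe concrete, here is a minimal Python sketch (our illustration, not from the slides): a 1-D density d(x) sampled on a grid, where each maximal run of grid cells with d > 0 is reported as one cluster.

```python
import numpy as np

def components_of_support(d, grid):
    """Connected components of {x : d(x) > 0} on a 1-D grid:
    each maximal run of consecutive cells with d > 0 is one cluster."""
    clusters, start = [], None
    for i, positive in enumerate(d > 0):
        if positive and start is None:
            start = i                       # a new component begins
        elif not positive and start is not None:
            clusters.append(grid[start:i])  # the component just ended
            start = None
    if start is not None:
        clusters.append(grid[start:])
    return clusters

# Two groups ("cats" and "dogs") separated by a void where d(x) = 0.
grid = np.linspace(0, 4, 401)
d = np.maximum(0, 1 - np.minimum(np.abs(grid - 0.5), np.abs(grid - 3.5)) / 0.5)
print(len(components_of_support(d, grid)))  # -> 2
```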

4. Clustering: A More Realistic Case

• We often have objects in between clusters.
• For example, coughing and sneezing patients can be classified into cold, allergy, flu, etc.
• However, there are also rare diseases.
• Let t be the density of such rare cases.
  – If d(x) < t, then most probably x is not in any major cluster.
  – If d(x) > t, then some examples come from one of the clusters that we are trying to form.
• Resulting clustering algorithm:
  – we select a threshold t, and
  – we find each cluster as a connected component of the set {x : d(x) ≥ t}.
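
A sketch of this algorithm in 2-D, assuming scipy is available (scipy.ndimage.label performs the connected-component step; the toy density and the threshold values are ours):

```python
import numpy as np
from scipy import ndimage

def threshold_clusters(d_grid, t):
    """Label the connected components of {x : d(x) >= t} on a regular grid.
    Cells with density below the rare-case threshold t become background."""
    labels, num_clusters = ndimage.label(d_grid >= t)
    return labels, num_clusters

# Toy density: two bumps plus a faint "rare case" bridge between them.
xs, ys = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
d = (np.exp(-((xs + 1.5)**2 + ys**2) / 0.2)
     + np.exp(-((xs - 1.5)**2 + ys**2) / 0.2)
     + 0.05 * np.exp(-ys**2 / 0.5))            # low-density bridge

_, n_low = threshold_clusters(d, t=0.01)    # t below the bridge: blobs merge
_, n_high = threshold_clusters(d, t=0.10)   # t above the bridge: two clusters
print(n_low, n_high)  # -> 1 2
```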

5. How to Estimate the Density d(x)

• In practice, we only have finitely many examples x^(1), ..., x^(N).
• The measured values x^(j) are, in general, different from the actual (unknown) values x.
• Let ρ(Δx) be the probability density of measurement errors.
• Then, for each j, the probability density of actual values is ρ(x^(j) − x).
• Observations are equally probable, i.e., each p(x^(j)) = 1/N, so

  d(x) = p(x^(1)) · ρ(x^(1) − x) + ... + p(x^(N)) · ρ(x^(N) − x) = (1/N) · Σ_{j=1}^{N} ρ(x^(j) − x).

• This formula is known as the Parzen window.
• The corresponding function ρ(x) is known as a kernel.
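
The Parzen-window formula translates into numpy almost verbatim; a 1-D sketch with a Gaussian kernel of half-width sigma (the function name and the test data are ours):

```python
import numpy as np

def parzen_density(x, samples, sigma):
    """d(x) = (1/N) * sum_j rho(x^(j) - x), with a Gaussian kernel rho."""
    diffs = samples[:, None] - np.atleast_1d(x)[None, :]            # x^(j) - x
    rho = np.exp(-diffs**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return rho.mean(axis=0)                  # the average over all N examples

# Two well-separated groups of observations.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 0.3, 50), rng.normal(4, 0.3, 50)])
grid = np.linspace(-2, 6, 200)
d = parzen_density(grid, samples, sigma=0.3)  # bimodal, as expected
```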

6. Resulting Clustering Algorithm

• At first, we select a function ρ(x).
• Then, based on the observed examples x^(1), x^(2), ..., x^(N), we form a density function

  d(x) = (1/N) · Σ_{j=1}^{N} ρ(x^(j) − x).

• After that, we select a threshold t.
• We find the clusters as the connected components of the set {x : d(x) ≥ t}.
• For imprecise ("fuzzy") expert estimates, instead of probabilities, we have membership functions.
• So, we get similar formulas.
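
The fuzzy counterpart mentioned in the last two bullets can be sketched the same way: the probabilistic kernel ρ is replaced by a membership function μ, here a triangular one (this particular choice is our illustration, not prescribed by the slides):

```python
import numpy as np

def triangular_membership(dx, width):
    """mu(dx): degree 1 at dx = 0, falling linearly to 0 at |dx| = width."""
    return np.maximum(0.0, 1.0 - np.abs(dx) / width)

def fuzzy_density(x, samples, width):
    """Fuzzy analog of the Parzen formula: d(x) = (1/N) * sum_j mu(x^(j) - x),
    with membership degrees playing the role of the probability density rho."""
    diffs = samples[:, None] - np.atleast_1d(x)[None, :]
    return triangular_membership(diffs, width).mean(axis=0)

# The clustering step is unchanged: threshold d and take connected components.
samples = np.array([0.0, 0.2, 0.4, 5.0, 5.2])
print(fuzzy_density(np.array([0.2, 2.5]), samples, width=1.0))  # high, then ~0
```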

7. Discussion

• Empirical results:
  – The best kernel is the Gaussian function ρ(x) ∼ exp(−const · x²).
  – The best threshold t is the one for which the clustering is the most robust to the choice of t.
• Our 1st challenge is to provide a theoretical explanation for these empirical results.
• 2nd challenge: take into account that some observations may be erroneous.
• 3rd challenge: clustering algorithms should return degrees of belonging to different clusters.
• 4th challenge: hierarchy: animals should first be classified into dangerous and harmless, and only then further.
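
One way to read "most robust to the choice of t" is: scan candidate thresholds and pick one from the widest interval over which the number of clusters stays constant. A sketch under that reading (the plateau criterion is our interpretation, not a formula from the slides):

```python
import numpy as np
from scipy import ndimage

def count_clusters(d_grid, t):
    """Number of connected components of {x : d(x) >= t}."""
    return ndimage.label(d_grid >= t)[1]

def most_robust_threshold(d_grid, ts):
    """Return a t from the longest run of candidate thresholds over which
    the cluster count does not change, i.e., the most stable clustering."""
    counts = [count_clusters(d_grid, t) for t in ts]
    best_start, best_len, start = 0, 0, 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or counts[i] != counts[start]:
            if i - start > best_len:              # a longer stable plateau
                best_start, best_len = start, i - start
            start = i
    return ts[best_start + best_len // 2]         # middle of the widest plateau
```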

8. Why Gaussian Kernel: A Solution to the 1st Part of the 1st Challenge

• The Gaussian distribution of the measurement error indeed occurs frequently in practice.
• This empirical fact has a known explanation:
  – a measurement error usually consists of a large number of small independent components, and,
  – according to the Central Limit Theorem, the distribution of the sum of a large number of small independent components is close to Gaussian.
• Expert inaccuracy is also caused by a large number of relatively small independent factors.
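
This Central Limit Theorem argument is easy to check numerically; in the toy run below (our choice of component distribution), an "error" built from 500 small independent uniform components is already almost perfectly Gaussian:

```python
import numpy as np

# Each simulated measurement error is the sum of 500 small independent
# components, each uniform on [-0.01, 0.01] (far from Gaussian on its own).
rng = np.random.default_rng(0)
errors = rng.uniform(-0.01, 0.01, size=(100_000, 500)).sum(axis=1)

# Excess kurtosis: -1.2 for one uniform component, 0 for a Gaussian.
z = (errors - errors.mean()) / errors.std()
print(np.mean(z**4) - 3)   # close to 0: the sum is close to Gaussian
```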

9. Alternative Explanation

• We start with the discrete empirical distribution d_N(x) in which we get each of the N values x^(j) with equal probability.
• We "smoothen" d_N(x) by convolving it with the kernel function ρ(x):

  d(x) = ∫ d_N(y) · ρ(x − y) dy.

• This works if we properly select the half-width σ of the kernel:
  – if we select a very narrow half-width, then each original point x^(j) becomes its own cluster;
  – if we select a very wide half-width, then we end up with a single cluster.
• The choice of this half-width is usually performed empirically:
  – we start with a small value of the half-width, and
  – we gradually increase it.
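
A sketch of this empirical half-width selection in 1-D (sample values, grid, and threshold are ours): as σ grows, the number of clusters falls from one-per-point down to a single cluster, with the natural group structure in between.

```python
import numpy as np
from scipy import ndimage

def cluster_counts(samples, grid, sigmas, t=0.05):
    """For each half-width sigma, smooth the sample with a Gaussian kernel
    and count the connected components of {x : d(x) >= t}."""
    counts = []
    for sigma in sigmas:
        diffs = samples[:, None] - grid[None, :]
        d = np.exp(-diffs**2 / (2 * sigma**2)).mean(axis=0)
        counts.append(ndimage.label(d >= t)[1])
    return counts

samples = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])   # two natural groups
grid = np.linspace(-2, 7, 900)
# Narrow: every point is its own cluster; medium: two; wide: one.
print(cluster_counts(samples, grid, sigmas=[0.01, 0.3, 5.0]))  # -> [6, 2, 1]
```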

10. Alternative Explanation (cont'd)

• Since consecutive kernel functions (for nearby half-widths) are close to each other, the resulting convolutions are also close.
• So, it is computationally efficient to apply a small modifying convolution to the previous convolution result.
• The resulting convolution is thus the result of applying a large number of minor convolutions.
• From the mathematical viewpoint, a convolution means adding an independent random variable.
• Applying a large number of convolutions is therefore equivalent to adding many small independent random variables.
• By the Central Limit Theorem, this is equivalent to adding a Gaussian variable, i.e., to a Gaussian convolution.
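
The same argument, run numerically (the kernel choice and iteration count are ours): composing many small non-Gaussian convolutions yields an almost exactly Gaussian kernel.

```python
import numpy as np

# Compose 50 "minor" convolutions with a uniform (box) kernel, which is
# itself far from Gaussian, and check how Gaussian the result is.
box = np.ones(5) / 5.0
kernel = box.copy()
for _ in range(50):
    kernel = np.convolve(kernel, box)   # one more small smoothing step

# Excess kurtosis of the composed kernel: 0 would be exactly Gaussian.
support = np.arange(len(kernel))
mean = (support * kernel).sum()
var = ((support - mean)**2 * kernel).sum()
print(((support - mean)**4 * kernel).sum() / var**2 - 3)   # approx. 0
```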
