General Purpose Database Summarization: A web service architecture for on-line database summarization

  1. General Purpose Database Summarization: A web service architecture for on-line database summarization
     Régis Saint-Paul (speaker), Guillaume Raschia, Noureddine Mouaddib
     LINA - Polytech’Nantes - INRIA ATLAS-GRIM Group
     VLDB Conference, Sept. 1st 2005
     ATLAS-GRIM General Purpose Database Summarization VLDB 2005 1 / 28

  2. Table of Contents
     1 Introduction: Generalities; Related works
     2 Summary model: Description space; Building the summaries
     3 System architecture: Web service organization; Complexity and performance
     4 Conclusion

  4. Motivations
     Provide small versions of very large databases.
     Descriptive ability: scientific studies (epidemiology); commercial and marketing studies (customer segmentation); log analysis (connection/operation profiles); data obfuscation; data personalization and filtering.
     Data size reduction ability: approximate querying (hotel booking); database browsing (image databases); storing a rough view of the data on devices with low memory capacity (tourism GPS data).

  7. What is a summary?
     Definition: a summary is a concise representation of a set of structured data. ⇒ Semantic compression

     Tab.: Relation R
       Occupation          Income
       Ph.D. Student        1 000
       Lecturer             2 000
       Managing Director    8 500
       Politician          xx xxx

     Tab.: Summary R*
       Occupation          Income
       Research            Miserable
       Executive           Enormous

  8. Aggregate computation
     Aggregate computation: SDB, OLAP [Codd et al. 93], DataCubes [Gray et al. 93].
     Datacube summarization: QuotientCube [Lakshmanan et al. 2002].
     Limitations: the initial data schema is not preserved; the approach is subject-oriented and has to be designed in advance; granularity is fixed and crisp, which causes threshold effects.

  9. Clustering approaches for semantic compression
     Intuition: describe groups rather than individual observations.
     Clustering: ItCompress [Jagadish et al. 1999]
     Bayesian network classifier: Spartan [Babu et al. 2001]
     Association rules: Fascicles [Jagadish et al. 1999]
     Limitations: the shape of the classes depends on the selected criterion [Fasulo 1999]; the compressed relation has a single granularity; the intensional description of classes is non-intuitive.

  10. Foundations of our approach
      Intuition: try to reproduce human learning mechanisms.
      Formal concept analysis [Barbut et al. 1970, Wille 1982]
      Conceptual clustering [Michalski and Stepp 1983]: Unimem [Lebowitz 1986], Cobweb [Fisher 1987], Fuzz [Chen & Lu 1997]
      Limitations: these approaches were validated only on small data samples; they lack maintenance capabilities.

  12. Possibilistic Data Representation
      Theoretical foundations: fuzzy-set theory (Zadeh 1965) and possibility theory (Zadeh 1978, Dubois & Prade 1985).
      Management of uncertain, incomplete and gradual information: "John's age should be approximately between 16 and 20, but that is not certain."
      [Figure: a possibility distribution over Dom(AGE), with degrees ranging from 0.0 to 1.0 and the values 16 and 20 marked on the axis.]
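A statement such as "approximately between 16 and 20" is often encoded as a trapezoidal possibility distribution: fully possible on a core interval, gradually less possible on either side. The slide does not give the exact shape, so the sketch below assumes a trapezoid with core [16, 20] and a hypothetical support [14, 22]; `trapezoid` and `johns_age` are illustrative names, not part of the original system.

```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal possibility distribution with support [a, d] and core [b, c].

    Returns 1.0 on the core, 0.0 outside the support, and a linear
    interpolation on the rising slope [a, b] and the falling slope [c, d].
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical encoding of "John's age is approximately between 16 and 20":
# fully possible on [16, 20], with assumed slopes down to 0 at 14 and 22.
def johns_age(x: float) -> float:
    return trapezoid(x, 14.0, 16.0, 20.0, 22.0)


for age in (13, 15, 18, 21, 23):
    print(age, johns_age(age))  # possibility degree of each candidate age
```

With these assumed bounds, ages 18 and 21 get degrees 1.0 and 0.5 respectively, and ages outside [14, 22] are considered impossible.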

  13. Background knowledge
      For each attribute A with domain D_A, a set D+_A of linguistic labels is defined, together with their membership functions over D_A.
      Example on attribute income: D_income = [0, 200000], D+_income = {none, miserable, modest, ...}
      [Figure: membership functions (degrees from 0 to 1) of the labels none, miserable, modest, reasonable, comfortable, enormous and outrageous over D_income, with the axis in K$ from 0 to 100.]
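To make the role of the background knowledge concrete, here is a minimal sketch that stores one membership function per label of D+_income and describes a crisp income by the labels it satisfies. It works in K$ (assuming the domain [0, 200000] is in dollars, i.e. [0, 200] K$), and every breakpoint is hypothetical: the slide shows the labels but not their exact shapes.

```python
from typing import Dict, Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d): support [a, d], core [b, c]

# Hypothetical breakpoints in K$; the slide shows the labels but not their shapes.
INCOME_LABELS: Dict[str, Trapezoid] = {
    "none": (0.0, 0.0, 0.0, 5.0),
    "miserable": (0.0, 5.0, 10.0, 20.0),
    "modest": (10.0, 20.0, 30.0, 40.0),
    "reasonable": (30.0, 40.0, 50.0, 60.0),
    "comfortable": (50.0, 60.0, 70.0, 80.0),
    "enormous": (70.0, 80.0, 90.0, 100.0),
    "outrageous": (90.0, 100.0, 200.0, 200.0),
}


def membership(x: float, shape: Trapezoid) -> float:
    """Membership degree of x in a trapezoidal fuzzy set."""
    a, b, c, d = shape
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify(x: float, labels: Dict[str, Trapezoid]) -> Dict[str, float]:
    """Return the labels of D+_A that x satisfies, with their membership degrees."""
    degrees = {name: membership(x, shape) for name, shape in labels.items()}
    return {name: mu for name, mu in degrees.items() if mu > 0.0}


print(fuzzify(35.0, INCOME_LABELS))  # {'modest': 0.5, 'reasonable': 0.5}
print(fuzzify(60.0, INCOME_LABELS))  # {'comfortable': 1.0}
```

Overlapping labels are what make the description gradual: an income of 35 K$ is half "modest" and half "reasonable" rather than falling on one side of a crisp threshold.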

  14. Summary representation space
      Original tuple (raw data):
      $$ t = \langle t.A_1, \ldots, t.A_k \rangle, \qquad t \in R(A_1, \ldots, A_k) \subseteq \prod_{i=1}^{k} D_{A_i} $$
      Summarized tuple:
      $$ z = \langle z.A_1, \ldots, z.A_k \rangle, \qquad z \in R^*(A_1, \ldots, A_k) \subseteq \prod_{i=1}^{k} \mathcal{F}(D^+_{A_i}) $$
      where $\mathcal{F}(D^+_{A_i})$ denotes the set of fuzzy subsets of the label set $D^+_{A_i}$.
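The change of representation space can be sketched as a function that takes a raw tuple t and returns, for each attribute A_i, a fuzzy set over D+_{A_i} (here a dict from label to degree). This is a minimal illustration, not the authors' rewriting procedure: `rewrite`, `fuzzify` and the membership functions in `BK` are hypothetical, chosen so that the Occupation/Income example of slide 7 maps to the labels shown there.

```python
from typing import Any, Callable, Dict, List, Sequence

FuzzySet = Dict[str, float]            # an element of F(D+_A): label -> degree
Membership = Callable[[Any], float]    # membership function of one label over D_A


def fuzzify(value: Any, labels: Dict[str, Membership]) -> FuzzySet:
    """Describe one attribute value t.A_i as a fuzzy set over the labels D+_{A_i}."""
    result: FuzzySet = {}
    for name, mu in labels.items():
        degree = mu(value)
        if degree > 0.0:
            result[name] = degree
    return result


def rewrite(t: Sequence[Any], bk: Sequence[Dict[str, Membership]]) -> List[FuzzySet]:
    """Map t = <t.A_1, ..., t.A_k> to z = <z.A_1, ..., z.A_k>, one fuzzy set per attribute."""
    if len(t) != len(bk):
        raise ValueError("expected one background-knowledge entry per attribute")
    return [fuzzify(value, labels) for value, labels in zip(t, bk)]


# Hypothetical background knowledge for the relation R(Occupation, Income).
OCCUPATION: Dict[str, Membership] = {
    "research": lambda v: 1.0 if v in ("Ph.D. Student", "Lecturer") else 0.0,
    "executive": lambda v: 1.0 if v == "Managing Director" else 0.0,
}
INCOME: Dict[str, Membership] = {
    # degree 1 up to 1500, decreasing linearly to 0 at 3000
    "miserable": lambda x: max(0.0, min(1.0, (3000 - x) / 1500)),
    # degree 0 up to 5000, increasing linearly to 1 at 8000
    "enormous": lambda x: max(0.0, min(1.0, (x - 5000) / 3000)),
}
BK = [OCCUPATION, INCOME]

print(rewrite(("Managing Director", 8500), BK))  # [{'executive': 1.0}, {'enormous': 1.0}]
```

Because each z.A_i is a fuzzy set rather than a single label, a value lying between two labels (for instance an income of 2 000, which is "miserable" to degree 2/3 here) keeps its gradual description in R*.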

  15. Summary model
      A summary is a triple z = (I_z, R_z, E_z) with:
        I_z: the intensional content;
        R_z: the extensional content, a subset of the relation R;
        E_z: a set of edges toward other summaries.
      Example of a summary:
        Intension I_z (support 1.83)
          Attribute    Label               Satisfaction  Support
          OCCUPATION   employee            0.2           1.25
          OCCUPATION   manager             1.0           0.33
          OCCUPATION   managing director   0.7           0.25
          INCOME       comfortable         1.0           1.50
          INCOME       high                1.0           0.33
        Extension R_z = {t1, t2, t5, t13} (4 tuples)
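The triple z = (I_z, R_z, E_z) maps naturally onto a small record type. The sketch below is a hypothetical data structure, not the system's actual schema: it stores the (satisfaction, support) pairs of the example, the extension as a set of tuple identifiers, and E_z as a list of other summaries (assumed here to be links to more specific summaries). It also checks an observation on the example: the label supports of each attribute add up to 1.83, the value shown next to the intension.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

LabelStats = Tuple[float, float]  # (satisfaction, support), as in the example table


@dataclass
class Summary:
    """z = (I_z, R_z, E_z)."""
    intension: Dict[str, Dict[str, LabelStats]]  # I_z: attribute -> label -> stats
    extension: FrozenSet[str]                     # R_z: ids of the tuples of R it covers
    edges: List["Summary"] = field(default_factory=list)  # E_z (assumed: child summaries)

    def support(self, attribute: str) -> float:
        """Sum of the label supports recorded for one attribute, rounded to 2 decimals."""
        return round(sum(sup for _, sup in self.intension[attribute].values()), 2)


z = Summary(
    intension={
        "OCCUPATION": {
            "employee": (0.2, 1.25),
            "manager": (1.0, 0.33),
            "managing director": (0.7, 0.25),
        },
        "INCOME": {
            "comfortable": (1.0, 1.50),
            "high": (1.0, 0.33),
        },
    },
    extension=frozenset({"t1", "t2", "t5", "t13"}),
)

print(z.support("OCCUPATION"), z.support("INCOME"), len(z.extension))  # 1.83 1.83 4
```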

  16. Partial order on summaries
      Subsumption relation: z ⊑ z′ ⟺ R_z ⊆ R_z′
      Hierarchical organization: the root is the most general summary; the leaves are the most specific summaries.
      The user-defined background knowledge fixes the finest level and, consequently, the maximal hierarchy size.
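Since subsumption is defined purely on extensions, checking it reduces to set inclusion, and a hierarchy is well formed when every child is subsumed by its parent. The sketch below assumes a hypothetical dictionary layout (summary name mapped to its extension and child names); `subsumed_by` and `is_consistent` are illustrative names.

```python
from typing import Dict, FrozenSet, List, Tuple

Extent = FrozenSet[str]
# Hypothetical hierarchy layout: summary name -> (R_z, names of its child summaries)
Hierarchy = Dict[str, Tuple[Extent, List[str]]]


def subsumed_by(r_z: Extent, r_z_prime: Extent) -> bool:
    """z ⊑ z' holds exactly when R_z ⊆ R_z'."""
    return r_z <= r_z_prime


def is_consistent(h: Hierarchy, node: str) -> bool:
    """Check that every child summary is subsumed by its parent, recursively."""
    extent, children = h[node]
    return all(subsumed_by(h[c][0], extent) and is_consistent(h, c) for c in children)


H: Hierarchy = {
    "root": (frozenset({"t1", "t2", "t5", "t13"}), ["z1", "z2"]),
    "z1": (frozenset({"t1", "t2"}), []),
    "z2": (frozenset({"t5", "t13"}), []),
}

print(is_consistent(H, "root"))               # True
print(subsumed_by(H["z1"][0], H["root"][0]))  # True
print(subsumed_by(H["root"][0], H["z1"][0]))  # False
```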

  17. Algorithm outline
      Hierarchical conceptual classification: incremental process; top-down approach; selective local search.
      Advantages: summary freshness through incremental maintenance; linear time complexity w.r.t. the number of tuples.
      Weaknesses: sub-optimal model (dynamic environment); order effect (use of bidirectional learning operators).
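The outline (incremental, top-down, selective local search) can be illustrated with a simplified, Cobweb-flavoured insertion routine. This is not the authors' algorithm: the slide gives neither the scoring function nor the learning operators, so the `overlap` score, the 0.5 threshold and the "merge a partially matching leaf" rule are all hypothetical, and no split operator is implemented (so order effects are not mitigated). Each tuple is pushed down a single root-to-leaf path and only the children of nodes on that path are scored; when the hierarchy size is bounded, as the background knowledge guarantees on the previous slide, the cost per tuple is bounded and the total cost grows linearly with the number of tuples.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set

Labels = FrozenSet[str]  # a candidate tuple, e.g. {"OCC=research", "INC=miserable"}


@dataclass(eq=False)  # identity comparison: summaries with equal content stay distinct
class Node:
    counts: Counter = field(default_factory=Counter)  # label -> number of covered tuples with it
    extent: Set[str] = field(default_factory=set)     # R_z: ids of the covered tuples
    children: List["Node"] = field(default_factory=list)

    def labels(self) -> Set[str]:
        return set(self.counts)


def overlap(node: Node, labels: Labels) -> float:
    """Hypothetical score: fraction of the tuple's labels already in the node's description."""
    return len(labels & node.labels()) / len(labels)


def add_leaf(parent: Node, tid: str, labels: Labels) -> Node:
    leaf = Node(Counter(labels), {tid})
    parent.children.append(leaf)
    return leaf


def insert(root: Node, tid: str, labels: Labels, threshold: float = 0.5) -> Node:
    """Insert one candidate tuple top-down and return the leaf that now holds it."""
    if not labels:
        raise ValueError("a candidate tuple needs at least one label")
    node = root
    while True:
        node.extent.add(tid)        # every summary on the path now covers the tuple
        node.counts.update(labels)
        # Selective local search: only the children of the current node are scored.
        best: Optional[Node] = max(node.children, key=lambda c: overlap(c, labels), default=None)
        if best is None or overlap(best, labels) < threshold:
            return add_leaf(node, tid, labels)      # no good match: create a new class
        if best.children:
            node = best                             # descend into an internal summary
            continue
        if best.labels() == labels:
            best.extent.add(tid)                    # exact match with an existing leaf
            best.counts.update(labels)
            return best
        # Partial match with a leaf: group that leaf and the new tuple under a new summary.
        merged = Node(best.counts + Counter(labels), best.extent | {tid}, [best])
        node.children[node.children.index(best)] = merged
        return add_leaf(merged, tid, labels)


root = Node()
stream = [
    ("t1", {"OCC=research", "INC=miserable"}),
    ("t2", {"OCC=research", "INC=miserable"}),
    ("t3", {"OCC=research", "INC=modest"}),
    ("t4", {"OCC=executive", "INC=enormous"}),
]
for tid, labs in stream:
    insert(root, tid, frozenset(labs))
print(sorted(root.extent), len(root.children))  # ['t1', 't2', 't3', 't4'] 2
```

In this run, t3 partially matches the leaf holding t1 and t2, so both end up under a new intermediate summary, while t4 shares no label with it and starts a new class directly under the root.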
