knowledge management and applications



  1. KNOWLEDGE MANAGEMENT AND APPLICATIONS
     David Sánchez
     Department of Computer Science and Mathematics
     April 2013

  2. Tarragona

  3. The university
     - Created in 1991
     - 52 programmes of study
     - Over 12,000 students

  4. The faculty
     - Engineering degrees
       - Computer science
       - Telematics
     - Masters
       - Computer Security and Intelligent Systems
       - Artificial Intelligence
       - Security of the Information and Communication technologies
     - Doctoral program
       - Computer Engineering

  5. Research group
     - 9 professors and lecturers
     - 6 postdoctoral researchers
     - 7 Ph.D. students
     - 7 research assistants
     - Data privacy and electronic commerce
     - Privacy and security in mobile environments
     - Private information recovery and codes

  6. Contents
     - Introduction
     - Knowledge acquisition
     - Semantic operators
     - Applications to privacy

  7. Motivation
     - Numerical data is easy to manage and transform
       - 3 < 4 = true
       - (1 + 2) / 2 = 1.5
       - {3, 2, 5} -> {2, 3, 5}
     - A plethora of algorithms rely on arithmetical functions to deal with numerical data

  8. Motivation
     - What about text?
       - Is car > bike?
       - (apple + orange) / 2 = ??
       - {flu, cold, pneumonia} -> {?, ?, ?}
     - Arithmetical functions do not make sense
     - Text (words, noun phrases) refers to concepts
     - Concepts should be managed according to their formal semantics

  9. Ontologies
     - Provide a structured representation of a shared conceptualization
     - Elements
       - Classes (concepts)
       - Instances (individuals)
     - Semantics
       - Properties (semantic relationships)
       - Restrictions (logical definition of meanings)

  10. Contents
      - Introduction
      - Knowledge acquisition
      - Semantic operators
      - Applications to privacy

  11. Creating ontologies
      - Manually
        - Knowledge formalization is challenging
        - Knowledge can be subjective
        - Time consuming
      - Assisted
        - Proactive knowledge modelling tools
          - Wizards
          - Reasoners to check knowledge consistency
        - Knowledge engineering methods
          - 101, METHONTOLOGY, On-To-Knowledge

  12. Ontology learning
      - Semantics are implicitly referred to in text
      - Textual corpora can be analysed to acquire knowledge
        - Discover concepts and individuals
        - Discover and label relations
          - Taxonomic (cancer is a disease)
          - Non-taxonomic (cancer is treated with radiotherapy)
          - Attributes (cancer is non-contagious)
        - Discover restrictions
          - Axioms (Spain borders France -> France borders Spain)

  13. Ontology learning from the Web
      - Corpus: the Web
        - The largest electronic repository
        - Heterogeneous
        - It approximates the distribution of information at a social scale
      - Availability of massive IR tools: Web search engines

  14. Knowledge discovery from text
      - NL processing tools to identify nouns, noun phrases and named entities
        - Concepts and individuals
      - Linguistic patterns to discover semantics
        - Taxonomic: "cities such as (Nimes)", "cancers like (melanoma)"
        - Non-taxonomic: "cancer is treated with (surgery)"
        - Attributes: "camera has (10MP resolution)", "camera features (3x zoom)"
        - Axioms (functionality, transitivity, symmetry, reflexivity, etc.): "Spain borders France", "France borders Spain" -> symmetry
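
The pattern-based extraction on this slide could be sketched with plain regular expressions. This is only an illustration: the `PATTERNS` table and `extract_relations` are hypothetical names, the regexes capture single-word terms only, and a real system would rely on the NL processing tools mentioned above (noun-phrase chunking, named-entity recognition) rather than `\w+`.

```python
import re

# Hypothetical pattern table (an assumption, not from the slides):
# each regex captures the single words around a cue phrase.
PATTERNS = {
    "taxonomic": re.compile(r"\b(\w+) such as (\w+)", re.IGNORECASE),
    "non_taxonomic": re.compile(r"\b(\w+) is treated with (\w+)", re.IGNORECASE),
}

def extract_relations(text):
    """Return (relation_type, left_term, right_term) triples found in text."""
    triples = []
    for relation, pattern in PATTERNS.items():
        for left, right in pattern.findall(text):
            triples.append((relation, left.lower(), right.lower()))
    return triples

sample = "We visited cities such as Nimes. Cancer is treated with surgery."
relations = extract_relations(sample)
for triple in relations:
    print(triple)
```

Each triple is only a candidate; the statistical assessment step described later in the deck is what filters noisy extractions.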

  15. Retrieval of suitable corpora
      - Create appropriate web search queries
        - Taxonomic: "cities such as" […]
        - Non-taxonomic: "cancer is treated with" […]
        - Attributes: "camera features" […]
        - Axioms: "Spain borders" & "France borders"
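
As a rough sketch, the queries listed on this slide can be generated by joining a term with a cue phrase and quoting the result so that a search engine treats it as an exact phrase. The helper names `pattern_query` and `symmetry_queries` are hypothetical, and no actual search API is called here.

```python
def pattern_query(term, cue):
    """Build an exact-phrase query such as '"cities such as"'."""
    return f'"{term} {cue}"'

def symmetry_queries(subject, verb, obj):
    """Build the two queries used to look for a relation in both directions."""
    return pattern_query(subject, verb), pattern_query(obj, verb)

queries = [
    pattern_query("cities", "such as"),          # taxonomic
    pattern_query("cancer", "is treated with"),  # non-taxonomic
    pattern_query("camera", "features"),         # attributes
    *symmetry_queries("Spain", "borders", "France"),  # axioms
]
for query in queries:
    print(query)
```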

  16. Statistical assessment
      - Statistical assessor
        - Web search engine (WSE) page counts approximate query probabilities at a social scale
        - Use an association score to filter noisy extractions
          - Point-wise mutual information
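
A minimal sketch of how page counts could feed a point-wise mutual information score, PMI(a, b) = log2(p(a, b) / (p(a) p(b))), with each probability approximated as hits divided by the number of indexed pages. The function name `pmi`, the index size and all counts below are made-up values for illustration; the exact score used in the referenced papers may differ.

```python
import math

def pmi(hits_a, hits_b, hits_ab, total_pages):
    """Point-wise mutual information estimated from page counts."""
    p_a = hits_a / total_pages
    p_b = hits_b / total_pages
    p_ab = hits_ab / total_pages
    if p_ab == 0:
        return float("-inf")  # the two terms never co-occur
    return math.log2(p_ab / (p_a * p_b))

# Made-up page counts (assumptions, not real search engine results).
TOTAL = 1_000_000
score_related = pmi(hits_a=10_000, hits_b=5_000, hits_ab=2_000, total_pages=TOTAL)
score_noise = pmi(hits_a=10_000, hits_b=5_000, hits_ab=50, total_pages=TOTAL)

print(score_related, score_noise)
```

An extraction whose score stays near zero co-occurs with the concept about as often as chance would predict, so it is a natural candidate for filtering; where to place the threshold is a design choice the slides do not specify.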

  17. References
      - Taxonomic learning
        - David Sánchez, Antonio Moreno: Pattern-based automatic taxonomy learning from the Web. AI Communications 21(1): 27-48 (2008)
      - Non-taxonomic learning
        - David Sánchez, Antonio Moreno: Learning non-taxonomic relationships from web documents for domain ontology construction. Data & Knowledge Engineering 64(3): 600-623 (2008)
      - Attribute learning
        - David Sánchez: A methodology to learn ontological attributes from the Web. Data & Knowledge Engineering 69(6): 573-597 (2010)
      - Axiom learning
        - David Sánchez, Antonio Moreno, Luis Del Vasto Terrientes: Learning relation axioms from text: An automatic Web-based approach. Expert Systems with Applications 39(5): 5792-5805 (2012)

  18. Contents
      - Introduction
      - Knowledge acquisition
      - Semantic operators
      - Applications to privacy

  19. Exploiting ontologies
      - Structured knowledge enables a semantically coherent interpretation of textual data by defining semantically grounded operators
      - Semantic similarity is the most basic operator
        - Similarity(apple, orange) > Similarity(apple, bike)

  20. Semantic similarity
      - Semantic similarity
        - Degree of taxonomical resemblance
        - e.g., dogs and cats are similar as they are mammals
      - Semantic relatedness
        - Other non-taxonomic relationships are also considered
        - e.g., car and wheel, or pencil and paper
      - Similarity measures can be grouped in several families according to
        - the type of knowledge exploited
        - the principles on which similarity estimation relies

  21. Ontology-based similarity

  22. Edge-counting measures: $\mathrm{Distance}(a, b) = |\mathrm{min\_path}(a, b)|$
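
To make the edge-counting idea concrete, the sketch below counts the edges on the shortest taxonomic path between two concepts using a breadth-first search. The toy taxonomy in `PARENT` is hypothetical and not taken from the slides; a real ontology would be loaded from a resource such as WordNet.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent); an assumption for illustration.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def neighbours(concept):
    """Concepts one taxonomic edge away (children plus parent)."""
    linked = [child for child, parent in PARENT.items() if parent == concept]
    if concept in PARENT:
        linked.append(PARENT[concept])
    return linked

def edge_distance(a, b):
    """Number of edges on the shortest taxonomic path between a and b."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the concepts are not connected

print(edge_distance("dog", "cat"))      # path: dog - mammal - cat
print(edge_distance("dog", "sparrow"))  # path goes up to "animal" and back down
```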

  23. IC-based measures: $\mathrm{Sim}(a, b) = \mathrm{IC}(\mathrm{LCS}(a, b))$, where LCS is the Least Common Subsumer of a and b

  24. IC-based semantic similarity
      - IC calculus relies on probability assessments: $\mathrm{IC}(c) = -\log p(c)$
      - Based on corpora
        - Requires general and heterogeneous corpora
        - Language ambiguity hampers results
        - Data sparseness produces weak statistics

  25. Ontology-based IC computation
      - Assumption: concepts with many hyponyms in an ontology are more likely to appear in corpora
      - Concept probabilities are intrinsically approximated according to taxonomic knowledge
        - Number of hyponyms: $\mathrm{IC}(c) = -\log\left(\frac{\mathrm{hyponyms}(c)}{\mathrm{ontology\_size}}\right)$
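
The sketch below combines the last three slides: it estimates IC from hyponym counts and then scores similarity as the IC of the Least Common Subsumer. Two things are assumptions rather than part of the slides: the toy taxonomy in `PARENT`, and the `+ 1` added to the hyponym count so that leaf concepts (zero hyponyms) do not produce log(0). The published models use their own, more refined formulations.

```python
import math

# Hypothetical toy taxonomy (child -> parent); an assumption for illustration.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}
CONCEPTS = set(PARENT) | set(PARENT.values())

def subsumers(concept):
    """The concept followed by its ancestors, from specific to general."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def hyponym_count(concept):
    """Number of concepts located below `concept` in the taxonomy."""
    return sum(1 for other in CONCEPTS
               if other != concept and concept in subsumers(other))

def intrinsic_ic(concept):
    """IC from taxonomy structure; the +1 smoothing is an assumption."""
    probability = (hyponym_count(concept) + 1) / len(CONCEPTS)
    return -math.log(probability)

def lcs(a, b):
    """Least Common Subsumer: the most specific shared ancestor."""
    shared = set(subsumers(b))
    for candidate in subsumers(a):
        if candidate in shared:
            return candidate
    return None

def ic_similarity(a, b):
    """Sim(a, b) = IC(LCS(a, b))."""
    return intrinsic_ic(lcs(a, b))

print(lcs("dog", "cat"), ic_similarity("dog", "cat"))
print(lcs("dog", "sparrow"), ic_similarity("dog", "sparrow"))
```

With this smoothing the root concept gets a probability of 1 and therefore an IC of 0, which matches the intuition that the most general concept carries no information.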

  26. Feature-based measures: $\mathrm{Sim}(a, b) = \frac{\mathrm{common\_features}(a, b)}{\mathrm{disjoint\_features}(a, b)}$
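
A literal reading of the ratio on this slide can be sketched by taking a concept's features to be itself plus its taxonomic ancestors. That choice of features, the toy taxonomy in `PARENT` and the handling of identical concepts are all assumptions; the referenced feature-based measures define and normalise the feature sets differently.

```python
# Hypothetical toy taxonomy (child -> parent); an assumption for illustration.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def features(concept):
    """Assumed feature set: the concept itself plus all its ancestors."""
    feats = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        feats.add(concept)
    return feats

def feature_similarity(a, b):
    """Ratio of shared features to non-shared features."""
    feats_a, feats_b = features(a), features(b)
    common = feats_a & feats_b
    disjoint = feats_a ^ feats_b  # features held by exactly one concept
    if not disjoint:
        return float("inf")  # identical feature sets (assumed convention)
    return len(common) / len(disjoint)

print(feature_similarity("dog", "cat"))
print(feature_similarity("dog", "sparrow"))
```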

  27. References
      - Feature-based similarity measures
        - Montserrat Batet, David Sánchez, Aïda Valls: An ontology-based measure to compute semantic similarity in biomedicine. Journal of Biomedical Informatics 44(1): 118-125 (2011)
        - David Sánchez, Montserrat Batet, David Isern, Aïda Valls: Ontology-based semantic similarity: A new feature-based approach. Expert Systems with Applications 39(9): 7718-7728 (2012)
      - IC-based similarity measures
        - Based on corpora
          - David Sánchez, Montserrat Batet, Aïda Valls, Karina Gibert: Ontology-driven web-based semantic similarity. Journal of Intelligent Information Systems 35(3): 383-413 (2010)
        - Based on ontologies
          - David Sánchez, Montserrat Batet, David Isern: Ontology-based information content computation. Knowledge-Based Systems 24(2): 297-303 (2011)
          - David Sánchez, Montserrat Batet: A New Model to Compute the Information Content of Concepts from Taxonomic Knowledge. International Journal on Semantic Web and Information Systems 8(2): 34-50 (2012)
          - David Sánchez, Montserrat Batet: Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. Journal of Biomedical Informatics 44(5): 749-759 (2011)

  28. Other semantic operators
      - Semantic similarity/distance is the basis for developing other semantically grounded operators over a sample of textual data
      - Aggregation (mean/centroid): $\mathrm{Mean}(x_1, x_2, \ldots, x_n) = \arg\min_{c} \sum_{i=1}^{n} \mathrm{distance}(c, x_i)$

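  29. Aggregation
      Sample: colic, lumbago, lumbago, migraine, pain, appendicitis, gastritis, lumbago, migraine
      Distance from each mean candidate (rows) to each distinct sample value (columns, with its frequency in the sample):

      Mean candidate | colic (1) | lumbago (3) | migraine (2) | appendicitis (1) | gastritis (1) | pain (1) | Sum dist
      colic          |     0     |      3      |      3       |        4         |       4       |    1     |    24
      lumbago        |     3     |      0      |      2       |        5         |       5       |    2     |    19
      migraine       |     3     |      2      |      0       |        5         |       5       |    2     |    21
      appendicitis   |     4     |      5      |      5       |        0         |       2       |    3     |    34
      gastritis      |     4     |      5      |      5       |        2         |       0       |    3     |    34
      pain           |     1     |      2      |      2       |        3         |       3       |    0     |    17
      ache           |     2     |      1      |      1       |        4         |       4       |    1     |    16
      inflammation   |     3     |      4      |      4       |        1         |       1       |    2     |    27
      symptom        |     2     |      3      |      3       |        2         |       2       |    1     |    22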
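
The aggregation example on slide 29 can be reproduced with a few lines of code: each candidate's summed distance is the frequency-weighted sum of its distances to the distinct sample values, and the mean is the candidate with the smallest sum. The distances and frequencies are copied from the slide; the variable and function names are illustrative.

```python
# Distinct sample values and how often each occurs in the sample (from the slide).
COLUMNS = ["colic", "lumbago", "migraine", "appendicitis", "gastritis", "pain"]
FREQUENCY = {"colic": 1, "lumbago": 3, "migraine": 2,
             "appendicitis": 1, "gastritis": 1, "pain": 1}

# Distance from each mean candidate to each column value (from the slide).
DISTANCE = {
    "colic":        [0, 3, 3, 4, 4, 1],
    "lumbago":      [3, 0, 2, 5, 5, 2],
    "migraine":     [3, 2, 0, 5, 5, 2],
    "appendicitis": [4, 5, 5, 0, 2, 3],
    "gastritis":    [4, 5, 5, 2, 0, 3],
    "pain":         [1, 2, 2, 3, 3, 0],
    "ache":         [2, 1, 1, 4, 4, 1],
    "inflammation": [3, 4, 4, 1, 1, 2],
    "symptom":      [2, 3, 3, 2, 2, 1],
}

def summed_distance(candidate):
    """Frequency-weighted sum of distances from candidate to the sample."""
    return sum(dist * FREQUENCY[value]
               for dist, value in zip(DISTANCE[candidate], COLUMNS))

sums = {candidate: summed_distance(candidate) for candidate in DISTANCE}
mean = min(sums, key=sums.get)  # arg min over the candidates
print(mean, sums[mean])  # ache 16
```

Note that the selected mean, "ache", does not occur in the sample itself: candidates are drawn from the ontology, so a more general or neighbouring concept can summarise the sample better than any of its own values.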

  30. Sorting algorithm
      Algorithm. Sorting procedure
      Inputs: P (dataset)
      Output: P' (P sorted)
        Compute the mean of all values in P
        Consider the most distant value f to the mean
        Add f to P' and remove it from P
        while (|P| > 0) do
          Obtain the least distant value r to f
          Add r to P' and remove it from P
        Output P'
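
The sorting procedure above can be sketched in Python as follows. Two assumptions are made here: the mean is approximated by the medoid (the input value minimising the summed distance to the others), whereas the slides draw mean candidates from the ontology; and, reading the pseudocode literally, the reference value f stays fixed during the loop. The demo uses numbers with absolute difference as the distance, purely so the result is easy to check by hand; a semantic distance function could be passed instead.

```python
def medoid(values, distance):
    """Input value with the smallest summed distance to all values (assumed mean)."""
    return min(values, key=lambda c: sum(distance(c, x) for x in values))

def semantic_sort(values, distance):
    """Order values by distance to the value farthest from the mean."""
    remaining = list(values)
    if not remaining:
        return []
    centre = medoid(remaining, distance)
    # Most distant value f to the mean becomes the first element.
    first = max(remaining, key=lambda x: distance(centre, x))
    remaining.remove(first)
    ordered = [first]
    while remaining:
        # Least distant remaining value r to f is appended next.
        nearest = min(remaining, key=lambda x: distance(first, x))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered

result = semantic_sort([3, 2, 5], lambda a, b: abs(a - b))
print(result)  # [5, 3, 2]
```

On numbers the procedure yields a consistent ordering that starts from the most marginal value, here the reverse of the usual ascending order {2, 3, 5}.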

  31. References
      - Sergio Martínez, Aïda Valls, David Sánchez: Semantically-grounded construction of centroids for datasets with textual attributes. Knowledge-Based Systems 35: 160-172 (2012)
      - Sergio Martínez, David Sánchez, Aïda Valls: A semantic framework to protect the privacy of electronic health records with non-numerical attributes. Journal of Biomedical Informatics 46(2): 294-303 (2013)
      - Josep Domingo-Ferrer, David Sánchez, Guillem Rufian-Torrell: Anonymization of Nominal Data Based on Semantic Marginality. Information Sciences. To appear

  32. Contents
      - Introduction
      - Knowledge acquisition
      - Semantic operators
      - Applications to privacy
