Search for Appropriate Textual Information Sources


  1. Search for Appropriate Textual Information Sources
     Adam ALBERT¹, Marie DUŽÍ¹, Marek MENŠÍK¹, Miroslav PAJR², Vojtěch PATSCHKA¹
     ¹ VSB-Technical University Ostrava, Department of Computer Science FEI, 17. listopadu 15, 708 33 Ostrava, Czech Republic
     ² Silesian University in Opava, Institute of Computer Science, Bezručovo nám. 13, 746 01 Opava, Czech Republic

  2. Problem to be solved
     • One aspect of globalization is the dissemination of knowledge
     • Textual resources contain a huge amount of information
     • The task: search for relevant information resources in the labyrinth of input textual data
     • For instance, googling ‘cat’ returns approximately 3 180 000 000 results of several kinds:
       • Computer-assisted translation
       • Well-known excavator brand
       • Animal
     • Too much information → information overload

  3. How to deal with the problem
     • Our system generates explications of the concept in question, extracted from many textual resources
     • Background theory: Transparent Intensional Logic (TIL)
       • Procedural semantics
       • Concepts are defined as meaning procedures
       • An explication of a concept is a molecular procedure defining the object in question
       • “Cat is a feline animal” → 'Cat = λwλtλx [['Feline 'Animal]_wt x], where 'Cat is the explicandum and the right-hand side is its explication
     • Based on a chosen explication, the system computes and recommends the most relevant textual resources
       • By applying the data-mining method of association rules

  4. Search for Appropriate Textual Information Sources
     • Input: an atomic concept (explicandum) + textual resources
     • Extraction and TIL formalization of sentences that mention the concept in question (explicandum)
     • Generating Carnapian explications
       • Machine-learning methods applied to the formalized sentences
       • Results: molecular concepts, i.e. closed TIL constructions, that explicate the atomic concept
     • Evaluation of the results (relevant documents can be overlooked in a large amount of data):
       • Checking inconsistencies and/or looking for similarities, etc.
       • Based on associations between the constituents of the molecular concepts, the algorithm computes and recommends other relevant resources
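
The first processing step above, picking out the sentences that mention the explicandum, might look roughly like the following Python sketch. The function name `sentences_mentioning`, the naive regex sentence splitter, and the optional plural "s" are assumptions made for illustration only; the slides do not say how extraction is implemented, and the subsequent TIL formalization is not attempted here.

```python
import re


def sentences_mentioning(text, concept):
    """Return the sentences of `text` that mention `concept` (case-insensitive).

    Hypothetical helper: sentences are split naively at '.', '!' or '?'
    followed by whitespace, which is far cruder than a real NLP pipeline.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Match the concept as a whole phrase, tolerating a simple plural "s".
    pattern = re.compile(rf"\b{re.escape(concept)}s?\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


sample = (
    "The wild cat is a mammal. "
    "CAT is also an excavator brand. "
    "Wild cats mark their territory by clawing."
)
hits = sentences_mentioning(sample, "wild cat")
for sentence in hits:
    print(sentence)
```

Only the first and third sentences of the sample survive the filter; these would be the candidates passed on for formalization.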

  5. Machine learning (generating explications)
     • Symbolic method of supervised machine learning
     • Based on positive / negative examples: inserting or adjusting constituents of a molecular concept
     • Three heuristic methods:
       • Negative example → Specialization inserts negated concepts.
       • Positive example → Refinement inserts new constituents into the molecular construction learned so far.
       • Generalization adjusts the constituents.
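
To make the three heuristics concrete, here is a minimal Python sketch. It assumes a deliberately simplified representation that is not taken from the slides: a molecular concept is modelled as a set of required constituents, a set of negated constituents, and numeric ranges. The class `Concept` and the functions `refine`, `specialize` and `generalize` are hypothetical names, and reading "adjusting the constituents" as widening a numeric range is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Simplified stand-in for a molecular concept (an assumption, not TIL)."""
    required: set = field(default_factory=set)   # constituents that must hold
    excluded: set = field(default_factory=set)   # negated constituents
    ranges: dict = field(default_factory=dict)   # attribute -> (low, high)


def refine(concept, constituent):
    """Positive example: insert a new constituent."""
    concept.required.add(constituent)


def specialize(concept, constituent):
    """Negative example: insert a negated constituent."""
    concept.excluded.add(constituent)


def generalize(concept, attribute, value):
    """Positive example: widen an existing range so that it covers `value`."""
    low, high = concept.ranges.get(attribute, (value, value))
    concept.ranges[attribute] = (min(low, value), max(high, value))


wild_cat = Concept()
refine(wild_cat, "Mammal")
refine(wild_cat, "Has-fur")
generalize(wild_cat, "Weight", 1.2)   # first positive example
generalize(wild_cat, "Weight", 11)    # a heavier positive example widens the range
specialize(wild_cat, "Domestic")      # hypothetical negative example
print(wild_cat)
```

The weight bounds 1.2 and 11 are the ones that appear in the Wild Cat explication; the negative example "Domestic" is invented for the sketch.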

  6. Example: explication of Wild Cat
     ['Typ-p λwλtλx [['≤ ['Weight_wt x] '11] ∧ ['≥ ['Weight_wt x] '1.2]] ['Wild 'Cat]]
     ∧ ['Req 'Mammal ['Wild 'Cat]]
     ∧ ['Req 'Has-fur ['Wild 'Cat]]
     ∧ ['Typ-p λwλtλx [['≤ [['Average 'Body-Length]_wt x] '80] ∧ ['≥ [['Average 'Body-Length]_wt x] '47]] ['Wild 'Cat]]
     ∧ ['Typ-p λwλtλx ['= [['Average 'Skull-Size]_wt x] '41.25] ['Wild 'Cat]]
     ∧ ['Typ-p λwλtλx ['= [['Average 'Height]_wt x] '37.6] ['Wild 'Cat]]

  7. Association rule A ⟹ B
     • An association between items occurring in a dataset that satisfies a predefined minimal support and confidence.
     • Support indicates how frequently the itemset appears in the dataset D:
       $\mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|}$
     • Confidence indicates how far we can rely on the validity of the rule:
       $\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}$
     • By computing the rules that are valid with at least the user-defined minimal confidence, the algorithm proposes other textual resources that might be relevant as well.
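
The two measures translate directly into code. The sketch below is a plain-Python illustration of the standard definitions of support and confidence, not the authors' implementation; the toy transactions and item names are made up for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """supp(A ∪ B) / supp(A) for the rule A ⟹ B."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return support(antecedent | consequent, transactions) / base


# Toy dataset (invented): each transaction is a set of items.
transactions = [
    {"mammal", "has-fur", "clawing"},
    {"mammal", "has-fur"},
    {"mammal", "clawing"},
    {"excavator"},
]

supp_a = support({"mammal", "has-fur"}, transactions)
conf_rule = confidence({"mammal", "has-fur"}, {"clawing"}, transactions)
print(supp_a, conf_rule)
```

Here {mammal, has-fur} occurs in two of the four transactions, and only one of those two also contains clawing, so both values come out as 0.5.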

  8. Incidence matrix (rows: explications e1 to e8; columns: constituents 1 to 17)

           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
     e1    1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0
     e2    0  0  0  0  0  0  0  0  1  1  1  0  0  0  0  0  0
     e3    0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1
     e4    1  1  0  0  0  0  1  0  0  0  1  0  0  0  1  1  0
     e5    0  0  0  0  1  0  0  0  0  0  1  0  0  0  1  1  0
     e6    0  0  0  0  1  0  0  0  0  0  1  0  0  1  0  1  0
     e7    1  1  1  0  0  0  0  0  1  0  1  0  0  1  1  0  0
     e8    0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0

     Legend for columns 1 to 8:
     1. 'Mammal
     2. 'Has-fur
     3. λwλtλx ['≤ ['Weight_wt x] '11]
     4. λwλtλx ['≥ ['Weight_wt x] '1.2]
     5. λwλtλx ['≥ [['Average 'Body-Length]_wt x] '47]
     6. λwλtλx ['≤ [['Average 'Body-Length]_wt x] '80]
     7. λwλtλx ['= [['Average 'Skull-Size]_wt x] '41.25]
     8. λwλtλx ['= [['Average 'Height]_wt x] '37.6]
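
For computing with the matrix, it helps to turn each row into the set of constituents it contains. The following Python sketch copies the 0/1 values from the slide; the string encoding of the rows and the names `ROWS`, `constituents`, `itemsets` and `column_counts` are choices made for this illustration.

```python
# Rows of the incidence matrix, one character per column 1..17.
ROWS = {
    "e1": "11111111000000000",
    "e2": "00000000111000000",
    "e3": "00000000000111111",
    "e4": "11000010001000110",
    "e5": "00001000001000110",
    "e6": "00001000001001010",
    "e7": "11100000101001100",
    "e8": "00001000010000010",
}


def constituents(row):
    """Set of 1-based column numbers whose entry in `row` is '1'."""
    return {i + 1 for i, bit in enumerate(row) if bit == "1"}


# Each explication becomes an itemset of constituent numbers.
itemsets = {name: constituents(row) for name, row in ROWS.items()}

# How many explications contain each constituent.
column_counts = {
    col: sum(col in items for items in itemsets.values())
    for col in range(1, 18)
}

print(sorted(itemsets["e4"]))
print(column_counts[1], column_counts[11])
```

In this encoding e4 contains constituents 1, 2, 7, 11, 15 and 16; constituent 1 ('Mammal) occurs in three explications, while constituent 11 occurs in five.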

  9. Simple example of the computed results
     • Wild-cat: the atomic concept that has been explicated
     • Eight resources and thus eight explications
     • The user voted for the first one (e1), the biological explication (mammal, weight, body length, skull size, etc.)
     • Confidence = 0.66
     • The system computed s4 and s7, which describe the behaviour of wild cats
     {'Mammal, 'Has-fur} ⟹ e1 {λwλtλx [['Ter-Marking_wt x 'Clawing] ∨ ['Ter-Marking_wt x 'Urinating] ∨ ['Ter-Marking_wt x 'Leaves-Droppings]]}
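
One way the figures on this slide could arise from the incidence matrix is sketched below in Python. This is a reading of the example, not a confirmed reconstruction of the authors' algorithm: it assumes the antecedent is {1, 2} ('Mammal, 'Has-fur) taken from the chosen explication e1, that candidate consequents are single constituents not already in e1, and that the minimal confidence is 0.66. The function `recommend` and the set encoding of the rows are hypothetical.

```python
# Explications as sets of constituent (column) numbers from the incidence matrix.
RESOURCES = {
    "e1": {1, 2, 3, 4, 5, 6, 7, 8},
    "e2": {9, 10, 11},
    "e3": {12, 13, 14, 15, 16, 17},
    "e4": {1, 2, 7, 11, 15, 16},
    "e5": {5, 11, 15, 16},
    "e6": {5, 11, 14, 16},
    "e7": {1, 2, 3, 9, 11, 14, 15},
    "e8": {5, 10, 16},
}


def support(itemset, resources):
    """Fraction of explications containing every constituent of `itemset`."""
    return sum(itemset <= items for items in resources.values()) / len(resources)


def confidence(antecedent, consequent, resources):
    """supp(A ∪ B) / supp(A) for the rule A ⟹ B."""
    return support(antecedent | consequent, resources) / support(antecedent, resources)


def recommend(chosen, antecedent, resources, min_conf):
    """Find strong rules antecedent ⟹ {c} with c outside the chosen explication,
    and the other resources that satisfy at least one such rule."""
    new_constituents = set().union(*resources.values()) - resources[chosen]
    strong = {}
    for c in sorted(new_constituents):
        conf = confidence(antecedent, {c}, resources)
        if conf >= min_conf:
            strong[c] = conf
    recommended = sorted(
        name
        for name, items in resources.items()
        if name != chosen and antecedent <= items and items & set(strong)
    )
    return strong, recommended


strong, recommended = recommend("e1", {1, 2}, RESOURCES, min_conf=0.66)
print(strong)
print(recommended)
```

Under these assumptions, the rules {1, 2} ⟹ {11} and {1, 2} ⟹ {15} each hold in two of the three explications containing 'Mammal and 'Has-fur, giving a confidence of 2/3, and the resources satisfying them are the fourth and the seventh. The slides do not say what constituents 9 to 17 stand for, so which column corresponds to the territory-marking constituent is left open.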

  10. Thank you for your attention
