

SLIDE 1

Search for Appropriate Textual Information Sources

Adam ALBERT1, MARIE DUŽÍ1, Marek MENŠÍK1, Miroslav PAJR2, Vojtěch PATSCHKA1

1VSB-Technical University Ostrava, Department of Computer Science FEI

17. listopadu 15, 708 33 Ostrava, Czech Republic

2Silesian University in Opava, Institute of Computer Science,

Bezručovo nám. 13, 746 01 Opava, Czech Republic

SLIDE 2

Problem to be solved

  • One aspect of globalization is the dissemination of knowledge
  • There is a huge amount of information in textual resources
  • Search for relevant information resources in the labyrinth of input textual data

  • For instance, by googling ‘cat’ we obtain:
  • Approximate number of results: 3,180,000,000, of these types:
  • Computer-assisted translation
  • A well-known excavator brand
  • The animal
  • Too much information ⟹ information overload
SLIDE 3

How to deal with the problem

  • Our system generates explications of the concept in question, extracted from many textual resources

  • Background theory – Transparent Intensional Logic (TIL)
  • Procedural semantics
  • Concepts are defined as meaning procedures
  • Explication of a concept is a molecular procedure defining the object in question
  • “Cat is a feline animal” ⟹ ‘Cat = λwλt λx [[‘Feline ‘Animal]wt x]
  • Here ‘Cat is the explicandum and the λ-construction on the right is its explication

  • Based on a chosen explication, the system computes and recommends the most relevant textual resources
  • By applying a data-mining method of association rules
SLIDE 4

Search for Appropriate Textual Information Sources

  • Input: an atomic concept (explicandum) + textual resources
  • Extraction and TIL formalization of sentences that mention the concept in question (explicandum)
  • Generating Carnapian explications
  • Machine-learning methods applied to the formalized sentences
  • Results: molecular concepts, i.e. closed TIL constructions, that explicate the atomic concept
  • Evaluation of the results (relevant documents can be overlooked in a large amount of data):
  • Checking inconsistencies and/or looking for similarities, etc.
  • Based on associations between the constituents of the molecular concepts, the algorithm computes and recommends other relevant resources

SLIDE 5

Machine learning (generating explications)

  • Symbolic method of supervised machine learning
  • Based on positive/negative examples: inserting or adjusting constituents of a molecular concept
  • Three heuristic methods:
  • Negative example ⟹ Specialization inserts negated concepts
  • Positive example ⟹ Refinement inserts new constituents into the molecular construction learned so far
  • Generalization adjusts the constituents
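The three heuristics above can be illustrated with a toy sketch in which an explication is simplified to a set of constituent labels standing in for TIL constructions; the function names and the particular adjustment rules are illustrative assumptions, not the authors' algorithm:

```python
# Toy sketch of the three learning heuristics. An "explication" is a
# set of constituent labels; real constituents would be TIL constructions.

def refine(explication, positive_example):
    """Positive example: insert its new constituents into the
    molecular concept learned so far."""
    return explication | positive_example

def specialize(explication, negative_example):
    """Negative example: insert negated concepts for constituents
    that the negative example exhibits but the concept should not."""
    return explication | {f"not {c}" for c in negative_example - explication}

def generalize(explication, positive_example):
    """Adjust constituents: drop negated constituents that a later
    positive example contradicts (one possible adjustment rule)."""
    return {c for c in explication
            if not c.startswith("not ") or c[len("not "):] not in positive_example}

cat = {"Mammal", "Has-fur"}
cat = refine(cat, {"Mammal", "Retractable-claws"})   # positive example
cat = specialize(cat, {"Barks"})                     # negative example
print(sorted(cat))  # ['Has-fur', 'Mammal', 'Retractable-claws', 'not Barks']
```

The set-based representation is only a stand-in: in the actual system the constituents are closed TIL constructions, and insertion/adjustment operates on the structure of the molecular construction.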
SLIDE 6

Example; explication of Wild Cat

[‘Typ-p wt x [[‘ [‘Weightwt x] ‘11]  [‘ [‘Weightwt x] ‘1.2]] [‘Wild ‘Cat]]  [‘Req ‘Mammal [‘Wild ‘Cat]]  [‘Req ‘Has-fur [‘Wild ‘Cat]]  [‘Typ-p wt x [[‘ [[‘Average ‘Body-Length]wt x] ‘80]  [‘ [[‘Average ‘Body-Length]wt x] ‘47]] [‘Wild ‘Cat]]  [‘Typ-p wt x [‘= [[‘Average ‘Skull-Size]wt x] ’41.25] [‘Wild ‘Cat]]  [‘Typ-p wt x [‘= [[‘Average ‘Height]wt x] ’37.6] [‘Wild ‘Cat]]

SLIDE 7

Association rule B ⟹ C

  • Association between items occurring in a dataset that satisfies a predefined minimal support and confidence.
  • Support indicates how frequently the itemset appears in the dataset:

supp(B) = |{u ∈ E : B ⊆ u}| / |E|

  • Confidence indicates how far we can rely on the validity of the rule:

conf(B ⟹ C) = supp(B ∪ C) / supp(B)

  • By computing the rules that are valid at least with a user-defined minimal confidence, the algorithm proposes other textual resources that might be relevant as well.
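The two measures can be sketched directly from their definitions; the dataset E and its item labels below are invented for illustration:

```python
# Support and confidence of association rules over a dataset E of
# transactions (here: sets of concept-constituent labels).

def supp(itemset, E):
    """Fraction of transactions in E that contain the whole itemset."""
    return sum(1 for u in E if itemset <= u) / len(E)

def conf(B, C, E):
    """Confidence of the rule B => C: supp(B ∪ C) / supp(B)."""
    return supp(B | C, E) / supp(B, E)

# Illustrative dataset, not the paper's actual resources.
E = [{"Mammal", "Has-fur", "Weight<=11"},
     {"Mammal", "Has-fur"},
     {"Mammal", "Claws"}]

print(supp({"Mammal"}, E))                          # 1.0
print(round(conf({"Mammal"}, {"Has-fur"}, E), 2))   # 0.67
```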

SLIDE 8

Incidence matrix

[Table: 8 × 17 incidence matrix; each row ei marks which of the 17 concept constituents the explication extracted from resource i contains, e.g. e1 contains constituents 1–8.]

  • 1. ‘Mammal
  • 2. ‘Has-fur
  • 3. λwλt λx [‘≤ [‘Weightwt x] ‘11]
  • 4. λwλt λx [‘≥ [‘Weightwt x] ‘1.2]
  • 5. λwλt λx [‘≥ [[‘Average ‘Body-Length]wt x] ‘47]
  • 6. λwλt λx [‘≤ [[‘Average ‘Body-Length]wt x] ‘80]
  • 7. λwλt λx [‘= [[‘Average ‘Skull-Size]wt x] ‘41.25]
  • 8. λwλt λx [‘= [[‘Average ‘Height]wt x] ‘37.6]
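Rule mining over such an incidence matrix can be sketched as follows; the toy matrix is assumed for illustration and does not reproduce the slide's exact rows, and only size-2 antecedents are scanned for brevity:

```python
# Mining association rules from an incidence matrix whose rows are
# explications (e1, e2, ...) and whose entries are constituent indices.
from itertools import combinations

# Illustrative rows; real rows would list constituents 1..17 per resource.
matrix = {
    "e1": {1, 2, 3, 4},
    "e2": {1, 2, 3},
    "e3": {1, 2, 5},
}

def supp(itemset, rows):
    """Fraction of rows that contain the whole itemset."""
    return sum(1 for items in rows.values() if itemset <= items) / len(rows)

def rules(rows, min_conf=0.6):
    """All rules B => c with a size-2 antecedent meeting min_conf."""
    items = set().union(*rows.values())
    out = []
    for b in combinations(sorted(items), 2):
        B = set(b)
        s_b = supp(B, rows)
        if s_b == 0:
            continue
        for c in items - B:
            cf = supp(B | {c}, rows) / s_b
            if cf >= min_conf:
                out.append((B, c, cf))
    return out

for B, c, cf in rules(matrix):
    print(sorted(B), "=>", c, round(cf, 2))
```

With these rows, e.g. the rule {1, 2} ⟹ 3 comes out with confidence 2/3, mirroring how the system proposes further resources whose constituents co-occur with a chosen explication's constituents.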

SLIDE 9

Simple example of the computed results

Wild-cat: the atomic concept that has been explicated

  • Eight resources and thus eight explications
  • The user voted for the first one (e1), the biological explication (mammal, weight, body length, skull size, etc.)
  • Confidence = 0.66
  • The system computed s4 and s7, describing the behaviour of wild cats

{‘Mammal, ‘Has-fur} ⟹e1 {λwλt λx [[‘Ter-Markingwt x ‘Clawing] ∨ [‘Ter-Markingwt x ‘Urinating] ∨ [‘Ter-Markingwt x ‘Leaves-Droppings]]}
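As a sanity check on the reported confidence: it is consistent with, for instance, the antecedent {‘Mammal, ‘Has-fur} occurring in six of the eight explications and together with the territorial-marking constituent in four of them, giving 4/6 ≈ 0.667 (the counts here are assumed for illustration; the slide reports only the final value, 0.66):

```python
# Assumed counts for illustration; only the final confidence appears
# on the slide.
n = 8                    # eight explications e1..e8
rows_with_B = 6          # rows containing {'Mammal', 'Has-fur'}
rows_with_B_and_C = 4    # rows also containing the marking constituent
conf = (rows_with_B_and_C / n) / (rows_with_B / n)  # supp(B ∪ C) / supp(B)
print(round(conf, 2))
```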

SLIDE 10

Thank you for your attention