

SLIDE 1

Search for Appropriate Textual Information Sources

Adam ALBERT1, MARIE DUŽÍ1, Marek MENŠÍK1, Miroslav PAJR2, Vojtěch PATSCHKA1

1VSB-Technical University Ostrava, Department of Computer Science FEI

17. listopadu 15, 708 33 Ostrava, Czech Republic

2Silesian University in Opava, Institute of Computer Science,

Bezručovo nám. 13, 746 01 Opava, Czech Republic

SLIDE 2

Problem to be solved

  • One aspect of globalization is the dissemination of knowledge
  • There is a huge amount of information in textual resources
  • Search for relevant information resources in the labyrinth of input textual data

  • For instance, by googling ‘cat’ we obtain:
  • Approximate number of results: 3,180,000,000, of these types:
  • Computer-assisted translation
  • A well-known excavator brand
  • The animal
  • Too much information ⟹ information overload
SLIDE 3

How to deal with the problem

  • Our system generates explications of the concept in question, extracted from many textual resources

  • Background theory – Transparent Intensional Logic (TIL)
  • Procedural semantics
  • Concepts are defined as meaning procedures
  • Explication of a concept is a molecular procedure defining the object in question
  • “Cat is a feline animal” ⟹ ‘Cat = λwλt λx [[‘Feline ‘Animal]wt x]
  • Here ‘Cat is the explicandum and the λ-construction on the right is its explication

  • Based on a chosen explication, the system computes and recommends the most relevant textual resources
  • By applying a data-mining method of association rules
SLIDE 4

Search for Appropriate Textual Information Sources

  • Input: an atomic concept (explicandum) + textual resources
  • Extraction and TIL formalization of sentences that mention the concept in question (explicandum)
  • Generating Carnapian explications
  • Machine-learning methods applied to the formalized sentences
  • Results: molecular concepts, i.e. closed TIL constructions, that explicate the atomic concept
  • Evaluation of the results (relevant documents can be overlooked in a large amount of data):
  • Checking inconsistencies and/or looking for similarities, etc.
  • Based on associations between the constituents of the molecular concepts, the algorithm computes and recommends other relevant resources

SLIDE 5

Machine learning (generating explications)

  • Symbolic method of supervised machine learning
  • Based on positive/negative examples: inserting or adjusting constituents of a molecular concept
  • Three heuristic methods:
  • Negative example ⟹ Specialization inserts negated concepts
  • Positive example ⟹ Refinement inserts new constituents into the molecular construction learned so far
  • Generalization adjusts the constituents
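The three heuristics above can be illustrated with a toy sketch in which an explication is simplified to a set of constituent labels standing in for TIL constructions; the function names and the particular adjustment rules are illustrative assumptions, not the authors' algorithm:

```python
# Toy sketch of the three learning heuristics. An "explication" is a
# set of constituent labels; real constituents would be TIL constructions.

def refine(explication, positive_example):
    """Positive example: insert its new constituents into the
    molecular concept learned so far."""
    return explication | positive_example

def specialize(explication, negative_example):
    """Negative example: insert negated concepts for constituents
    that the negative example exhibits but the concept should not."""
    return explication | {f"not {c}" for c in negative_example - explication}

def generalize(explication, positive_example):
    """Adjust constituents: drop negated constituents that a later
    positive example contradicts (one possible adjustment rule)."""
    return {c for c in explication
            if not c.startswith("not ") or c[len("not "):] not in positive_example}

cat = {"Mammal", "Has-fur"}
cat = refine(cat, {"Mammal", "Retractable-claws"})   # positive example
cat = specialize(cat, {"Barks"})                     # negative example
print(sorted(cat))  # ['Has-fur', 'Mammal', 'Retractable-claws', 'not Barks']
```

The set-based representation is only a stand-in: in the actual system the constituents are closed TIL constructions, and insertion/adjustment operates on the structure of the molecular construction.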
SLIDE 6

Example; explication of Wild Cat

[‘Typ-p wt x [[‘ [‘Weightwt x] ‘11]  [‘ [‘Weightwt x] ‘1.2]] [‘Wild ‘Cat]]  [‘Req ‘Mammal [‘Wild ‘Cat]]  [‘Req ‘Has-fur [‘Wild ‘Cat]]  [‘Typ-p wt x [[‘ [[‘Average ‘Body-Length]wt x] ‘80]  [‘ [[‘Average ‘Body-Length]wt x] ‘47]] [‘Wild ‘Cat]]  [‘Typ-p wt x [‘= [[‘Average ‘Skull-Size]wt x] ’41.25] [‘Wild ‘Cat]]  [‘Typ-p wt x [‘= [[‘Average ‘Height]wt x] ’37.6] [‘Wild ‘Cat]]

SLIDE 7

Association rule B ⟹ C

  • Association between items occurring in a dataset that satisfies a predefined minimal support and confidence.
  • Support indicates how frequently the itemset appears in the dataset:

supp(B) = |{u ∈ E : B ⊆ u}| / |E|

  • Confidence indicates how far we can rely on the validity of the rule:

conf(B ⟹ C) = supp(B ∪ C) / supp(B)

  • By computing the rules that are valid at least with a user-defined minimal confidence, the algorithm proposes other textual resources that might be relevant as well.
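The two measures can be sketched directly from their definitions; the dataset E and its item labels below are invented for illustration:

```python
# Support and confidence of association rules over a dataset E of
# transactions (here: sets of concept-constituent labels).

def supp(itemset, E):
    """Fraction of transactions in E that contain the whole itemset."""
    return sum(1 for u in E if itemset <= u) / len(E)

def conf(B, C, E):
    """Confidence of the rule B => C: supp(B ∪ C) / supp(B)."""
    return supp(B | C, E) / supp(B, E)

# Illustrative dataset, not the paper's actual resources.
E = [{"Mammal", "Has-fur", "Weight<=11"},
     {"Mammal", "Has-fur"},
     {"Mammal", "Claws"}]

print(supp({"Mammal"}, E))                          # 1.0
print(round(conf({"Mammal"}, {"Has-fur"}, E), 2))   # 0.67
```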

SLIDE 8

Incidence matrix

[Table: 8 × 17 incidence matrix; each row ei marks which of the 17 concept constituents the explication extracted from resource i contains, e.g. e1 contains constituents 1–8.]

  • 1. ‘Mammal
  • 2. ‘Has-fur
  • 3. λwλt λx [‘≤ [‘Weightwt x] ‘11]
  • 4. λwλt λx [‘≥ [‘Weightwt x] ‘1.2]
  • 5. λwλt λx [‘≥ [[‘Average ‘Body-Length]wt x] ‘47]
  • 6. λwλt λx [‘≤ [[‘Average ‘Body-Length]wt x] ‘80]
  • 7. λwλt λx [‘= [[‘Average ‘Skull-Size]wt x] ‘41.25]
  • 8. λwλt λx [‘= [[‘Average ‘Height]wt x] ‘37.6]
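Rule mining over such an incidence matrix can be sketched as follows; the toy matrix is assumed for illustration and does not reproduce the slide's exact rows, and only size-2 antecedents are scanned for brevity:

```python
# Mining association rules from an incidence matrix whose rows are
# explications (e1, e2, ...) and whose entries are constituent indices.
from itertools import combinations

# Illustrative rows; real rows would list constituents 1..17 per resource.
matrix = {
    "e1": {1, 2, 3, 4},
    "e2": {1, 2, 3},
    "e3": {1, 2, 5},
}

def supp(itemset, rows):
    """Fraction of rows that contain the whole itemset."""
    return sum(1 for items in rows.values() if itemset <= items) / len(rows)

def rules(rows, min_conf=0.6):
    """All rules B => c with a size-2 antecedent meeting min_conf."""
    items = set().union(*rows.values())
    out = []
    for b in combinations(sorted(items), 2):
        B = set(b)
        s_b = supp(B, rows)
        if s_b == 0:
            continue
        for c in items - B:
            cf = supp(B | {c}, rows) / s_b
            if cf >= min_conf:
                out.append((B, c, cf))
    return out

for B, c, cf in rules(matrix):
    print(sorted(B), "=>", c, round(cf, 2))
```

With these rows, e.g. the rule {1, 2} ⟹ 3 comes out with confidence 2/3, mirroring how the system proposes further resources whose constituents co-occur with a chosen explication's constituents.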

SLIDE 9

Simple example of the computed results

Wild-cat: the atomic concept that has been explicated

  • Eight resources and thus eight explications
  • The user voted for the first one (e1), the biological explication (mammal, weight, body length, skull size, etc.)
  • Confidence = 0.66
  • The system computed s4 and s7, describing the behaviour of wild cats

{‘Mammal, ‘Has-fur} ⟹e1 {λwλt λx [[‘Ter-Markingwt x ‘Clawing] ∨ [‘Ter-Markingwt x ‘Urinating] ∨ [‘Ter-Markingwt x ‘Leaves-Droppings]]}
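As a sanity check on the reported confidence: it is consistent with, for instance, the antecedent {‘Mammal, ‘Has-fur} occurring in six of the eight explications and together with the territorial-marking constituent in four of them, giving 4/6 ≈ 0.667 (the counts here are assumed for illustration; the slide reports only the final value, 0.66):

```python
# Assumed counts for illustration; only the final confidence appears
# on the slide.
n = 8                    # eight explications e1..e8
rows_with_B = 6          # rows containing {'Mammal', 'Has-fur'}
rows_with_B_and_C = 4    # rows also containing the marking constituent
conf = (rows_with_B_and_C / n) / (rows_with_B / n)  # supp(B ∪ C) / supp(B)
print(round(conf, 2))
```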

SLIDE 10

Thank you for your attention