Web Mining For Knowledge Discovery Using Ontology as a Background - - PowerPoint PPT Presentation

SLIDE 1

Web Mining For Knowledge Discovery

Using Ontology as Background Knowledge

SLIDE 2

Searching using Google’s rules

  • Google searches for all words
  • Google ignores many common words (stop words)
  • Google finds results anywhere in a document, not just in its text

  • Google returns pages ordered by PageRank, a measure that Google uses to gauge a page’s popularity

  • Proximity matters
  • Simple Google searches are limited to ten keywords

  • Google finds its results based on the words that occur in Web pages, not by analyzing your search phrase for its meaning
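The rules on this slide can be illustrated with a minimal sketch of literal keyword matching. This is not Google's implementation: the stop-word list, the toy pages, the function names, and the choice to apply the ten-keyword limit after stop-word removal are all assumptions made for illustration.

```python
# Illustrative stop-word list (assumed; the slides do not give the real one).
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "and", "is", "for"}
MAX_KEYWORDS = 10  # "simple searches are limited to ten keywords"


def normalize_query(query):
    """Lowercase the query, drop stop words, and keep at most ten keywords."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return words[:MAX_KEYWORDS]


def matches(page_text, keywords):
    """A page matches only if every keyword literally occurs in it (implicit AND)."""
    page_words = set(page_text.lower().split())
    return all(k in page_words for k in keywords)


def search(pages, query):
    """Return the names of the pages that contain all query keywords."""
    keywords = normalize_query(query)
    return [name for name, text in pages.items() if matches(text, keywords)]


# Hypothetical toy collection.
pages = {
    "doc1": "ontology learning for web mining",
    "doc2": "mining gold in the mountains",
    "doc3": "knowledge discovery on the web using an ontology",
}
print(search(pages, "the ontology of web"))  # ['doc1', 'doc3']
```

The match is purely lexical: a page about "taxonomies" would not be returned for the keyword "ontology", which is the limitation the last bullet points at.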

SLIDE 3

Search Engine Optimization

  • Search Engine Optimization (SEO) is the black magic, craft, or art (depending upon whom you ask) of writing or editing Web pages and sites so that they move up in search engine rankings and are returned at the top of a list of search results.

  • This is an important subject because if a Web page is not in the top search results, very few people can find it. Webmasters want to know about SEO to improve their rankings and increase traffic to their sites.

  • As a general rule, people don’t look past the first three pages (or 30 listings) of search results.

SLIDE 4

SEO & Google Recommendations

  • Determine the most important keywords that are relevant to your content and use them in the titles, URLs, headings, and image tags on each page.

  • Pages whose content is updated often tend to get more attention than pages that don’t have anything new.

  • Simple site designs are better than busy pages.

  • Create links from your pages out to relevant, popular Web pages (Outbound Links)

  • Ask sites whose content is related to your pages to link to you (Inbound Links)

SLIDE 5

Unwanted Results

  • SPAM Pages
  • Commercial Pages
  • Error Pages
  • Login Pages

SLIDE 6

Occurrence Operators

SLIDE 7

Synonym Operator

  • When you place the synonym operator, ~, directly in front of a search term (without any spaces), the search matches Web synonyms as well as the given search term

  • Google does not use a synonym lookup table or a thesaurus. Instead, synonyms are determined by Web usage of the term.

  • Accordingly, this method of discovering synonyms sometimes leads to some pretty weird results (try ~patient, ~zebra, and ~cheap).

SLIDE 8

Interpreting User Query

  • Part of the Semantic Web vision is to provide web-scale access to semantically described content.

  • In particular, this implies understanding users’ information needs accurately enough to allow for retrieving a precise answer using semantic technologies.

  • Currently, however, most web search engines are based on purely statistical techniques.

SLIDE 9

Interpreting User Query

  • For restricted domains that can be formalized using ontologies, there is nevertheless hope that semantic technologies can be put to work to allow for more semantics-based search

  • Users are definitely used to expressing their information needs via simple keyword-based queries.

  • There is substantial recent work on interpreting full natural language questions semantically w.r.t. an ontology

SLIDE 10

Available Approaches

  • Approaches for interpreting keyword queries using background knowledge available in ontologies.
  • One approach translates a keyword query into a DL conjunctive query which can be evaluated with respect to an underlying knowledge base (KB)

  • Another line of existing work translates keywords to XML-based queries, e.g. interpreting keywords as XQuery queries on XML data.
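To make the first approach concrete, here is a minimal sketch of turning a keyword query into a conjunctive query over concept atoms C(x) and role atoms r(x, y). It is not the algorithm of any cited system: the lexicon, the concept and role names, and the rule that a role keyword links the concepts on either side of it are all assumptions for illustration.

```python
# Hypothetical lexicon linking keywords to ontology elements (assumed for illustration).
LEXICON = {
    "president": ("concept", "President"),
    "country": ("concept", "Country"),
    "visited": ("role", "visited"),
}


def keywords_to_conjunctive_query(keywords):
    """Build a conjunctive query string from keywords.

    Concept keywords become atoms C(x_i); a role keyword becomes an atom
    r(x_i, x_j) linking the concept before it to the concept after it.
    Keywords missing from the lexicon are ignored.
    """
    atoms = []
    variables = []
    pending_role = None
    for word in keywords:
        kind, name = LEXICON.get(word.lower(), (None, None))
        if kind == "concept":
            var = "x%d" % len(variables)
            atoms.append("%s(%s)" % (name, var))
            if pending_role is not None and variables:
                atoms.append("%s(%s, %s)" % (pending_role, variables[-1], var))
            pending_role = None
            variables.append(var)
        elif kind == "role":
            pending_role = name
    if not atoms:
        return None  # nothing in the query could be mapped to the ontology
    return "q(%s) :- %s" % (", ".join(variables), ", ".join(atoms))


print(keywords_to_conjunctive_query(["president", "visited", "country"]))
# q(x0, x1) :- President(x0), Country(x1), visited(x0, x1)
```

A real system would have to rank several candidate interpretations, since one keyword can map to more than one ontology element; this sketch keeps a single fixed mapping.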

SLIDE 11

Available Contribution

  • There has already been work on translating keywords to semantic queries. One such approach proposes mapping keywords to the corresponding WordNet synsets.

  • SemSearch also aims at answering complex keyword queries by translating them into a logical query.

SLIDE 12

Available Contributions

  • My approach is divided into three parts. The first part, like previous approaches, translates user keywords into a formal query using an ontology. Previous approaches use a ready-made ontology. My contribution attempts to build the domain ontology automatically or semi-automatically, using the same search results or other search results gathered beforehand.

  • The second part uses the built ontology to build a hierarchical structure for the results. This structure is built from the keywords extracted from the results that map to their counterparts in the ontology.

SLIDE 13

My Approach

  • Bush->Afghanistan->Democracy, Violence
  • Bush->Afghanistan->NATO Secretary General
  • Bush->Afghanistan->Sending Troops
      – Bush->Afghanistan->Sending Troops->More Troops by 2009
      – Bush->Afghanistan->Sending Troops->Casualty of War
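A hierarchy like the one above can be represented as a nested mapping built from keyword paths. The sketch below assumes the paths are already available, which the slides do not cover: extracting keywords from the results and matching them to the ontology are outside its scope. The function names are hypothetical, and only the unambiguous paths from the example are used.

```python
def build_hierarchy(paths):
    """Merge keyword paths into one nested dict; shared prefixes share nodes."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree


def render(tree, depth=0):
    """Return the hierarchy as indented lines, one label per line."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


# Paths taken from the slide's example.
paths = [
    ["Bush", "Afghanistan", "NATO Secretary General"],
    ["Bush", "Afghanistan", "Sending Troops"],
    ["Bush", "Afghanistan", "Sending Troops", "More Troops by 2009"],
    ["Bush", "Afghanistan", "Sending Troops", "Casualty of War"],
]
tree = build_hierarchy(paths)
print("\n".join(render(tree)))
```

Each search result could then be attached to the deepest node whose path it matches, so that results are browsed by topic rather than as a flat ranked list.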

SLIDE 14

My Approach

  • The final part is summarization
  • Summarization would be performed at the level of a single document and at the level of a collection of documents