information filtering

Information Filtering Information Systems M Prof. Paolo Ciaccia - PDF document

Information Filtering Information Systems M Prof. Paolo Ciaccia http://www-db.deis.unibo.it/courses/SI-M/


  1. The Information Filtering (IF) problem:
     - Deliver to users only the information that is relevant to them, filtering out all irrelevant new data items (news, papers, advertisements, …)
     - Although IF and IR share the common goal of providing users with relevant information, there are important differences:

                                             IR                                               IF
     Goal                                    Selecting relevant items (docs) for each query   Filtering out the many irrelevant data items
     Type of use                             Ad-hoc use                                       Repetitive use
     Type of users                           One-time users                                   Long-term users
     Representation of information needs     Queries                                          User profiles
     Index                                   Items                                            User profiles

  2. IF techniques find applications in a variety of scenarios, including:
     - Automatic delivery of news/alerts
     - Online display advertising
     - Publish/subscribe systems
     - …
     Recommender systems are a specific type of IF system and will be discussed later on.

     Due to its similarity with IR, it is no surprise that the most common approaches to IF are based on the Boolean and the Vector Space models.
     However, a more detailed and structured description of the user profile is now needed, in order to improve the effectiveness of matching.
     In what follows we sketch the details of a recent approach based on the Boolean model; examples of use of the Vector Space Model (VSM) will be given in the context of recommender systems.

  3. Reference: [WBS+09]
     - Scenario:
       - A (profiled) user visiting a web site (the user is also called an “assignment”)
       - Many advertisement campaigns managed by the site
       - Both specified using Boolean expressions (BE’s) over a multi-attribute space
     - Alternatively (pub/sub system):
       - An incoming item
       - Many stored user profiles
     - One “assignment” to be efficiently matched against many stored BE’s
     [Figure: an assignment is looked up in the BE index, which returns the matched BE’s]

     - Two types of Boolean predicates: ∈ and ∉
       - E.g.: state ∈ {CA,NY}, state ∉ {NY}
     - Ranges of values are converted into ∈ and ∉ predicates
       - age < 30 is converted into age ∈ {0,1,2} (0 = [0,9], 1 = [10,19], …)
     - A BE is in either DNF or CNF, e.g.:
       (state ∈ {CA,NY} & age ∈ {1,2}) | (state ∉ {NY} & gender ∈ {F})     (& = AND; | = OR)
     - In the following we only discuss the DNF case
     - An assignment S is a set (conjunction) of attribute-value pairs
       - E.g.: S: state = CA & gender = F
     - An attribute-value pair is also called a key
       - E.g.: (state,CA) is a key
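The notation above can be made concrete with a small sketch. The names `age_bucket`, `buckets_below` and `Predicate` are hypothetical and not taken from [WBS+09]; the 10-year bucket width is assumed from the slide's "0 = [0,9], 1 = [10,19]" example.

```python
from dataclasses import dataclass

# Hypothetical helpers: 10-year buckets, as in "0 = [0,9], 1 = [10,19], ...".
def age_bucket(age, width=10):
    """Bucket number that a single age value falls into."""
    return age // width

def buckets_below(limit, width=10):
    """Buckets covering the range age < limit (limit assumed a multiple of width)."""
    return set(range(limit // width))

@dataclass(frozen=True)
class Predicate:
    attr: str
    values: frozenset
    negated: bool = False  # False: "in" predicate, True: "not in" predicate

# A conjunction is a list of predicates; a BE in DNF is a list of conjunctions.
# This is the DNF example from the slide.
be = [
    [Predicate("state", frozenset({"CA", "NY"})),
     Predicate("age", frozenset({1, 2}))],
    [Predicate("state", frozenset({"NY"}), negated=True),
     Predicate("gender", frozenset({"F"}))],
]

# An assignment is a set of keys, i.e. (attribute, value) pairs.
assignment = {("state", "CA"), ("gender", "F")}

print(buckets_below(30))  # {0, 1, 2}
```

Representing assignments as sets of keys is convenient because the inverted index discussed next is also organized by key.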

  4. A BE E is satisfied by an assignment S if S makes E true
       S:  state = CA & gender = F
       E1: state ∈ {CA,NY}                   satisfied
       E2: state ∈ {CA,NY} & gender ∈ {M}    not satisfied
     - Since an assignment need not specify a value for every attribute, the semantics of matching needs to be refined:
       - Is (state ∈ {NY} & gender ∈ {F}) satisfied by gender = F? NO
       - Is (state ∉ {NY} & gender ∈ {F}) satisfied by gender = F? MAYBE…
     - Two alternative interpretations for ∉ predicates:
       - Strong-∉ predicate: violated if no value is specified for the attribute
       - Weak-∉ predicate: satisfied if no value is specified for the attribute
     - The default is weak-∉ predicates
     - The strong-∉ semantics can be enforced by writing, e.g., state ∉ {NY,NULL}, which requires a value for state to be present in the assignment

     - The basic idea is to build an inverted index on BE’s that, for each key, stores the BE’s containing it
     - The basic case is when BE’s are simple conjunctions of ∈ predicates
       E1: A ∈ {1}
       E2: A ∈ {1} & B ∈ {2} & C ∈ {3,4}
       S:  A = 1 & B = 2

       Inverted Index
       Key      Posting list
       (A,1)    E1, E2
       (B,2)    E2
       (C,3)    E2
       (C,4)    E2

     - The problem is that neither the intersection nor the union of the posting lists of the keys in S works here, since S satisfies only E1:
       - Intersection: E2
       - Union: E1 and E2
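The two points above (weak versus strong ∉ semantics, and the failure of plain intersection or union) can be checked with a short sketch. It is only an illustration: the tuple layout `(attribute, values, negated)`, the use of Python's `None` for NULL, and the helper names `pred_ok` and `satisfies` are assumptions made here.

```python
from collections import defaultdict

NULL = None  # stands for the NULL marker that forces the strong semantics

def pred_ok(attr, values, negated, s):
    """Evaluate one predicate; s maps attribute -> value, attributes may be missing."""
    if attr not in s:
        # An "in" predicate fails on a missing attribute. A weak "not in"
        # predicate succeeds, unless NULL is listed among the excluded values.
        return negated and NULL not in values
    return (s[attr] not in values) if negated else (s[attr] in values)

def satisfies(conj, s):
    """A conjunction is satisfied when all of its predicates are."""
    return all(pred_ok(a, v, n, s) for (a, v, n) in conj)

s = {"gender": "F"}
weak = [("state", {"NY"}, True), ("gender", {"F"}, False)]
strong = [("state", {"NY", NULL}, True), ("gender", {"F"}, False)]
print(satisfies(weak, s), satisfies(strong, s))  # True False

# Naive inverted index: key -> set of BE's containing that key.
bes = {
    "E1": [("A", {1}, False)],
    "E2": [("A", {1}, False), ("B", {2}, False), ("C", {3, 4}, False)],
}
index = defaultdict(set)
for name, conj in bes.items():
    for attr, values, _ in conj:
        for v in values:
            index[(attr, v)].add(name)

S = {"A": 1, "B": 2}
lists = [index[(a, v)] for a, v in S.items()]
intersection = set.intersection(*lists)  # {'E2'}: misses E1, wrongly returns E2
union = set.union(*lists)                # {'E1', 'E2'}: wrongly returns E2
correct = {n for n, c in bes.items() if satisfies(c, S)}  # {'E1'}
```

The mismatch between `intersection`, `union` and `correct` is what motivates partitioning the index by the number of ∈ predicates.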

  5. - Entries are partitioned based on the number K of ∈ predicates (conjuncts) in each BE
     - The partition of the inverted index storing information on BE’s with K conjuncts is called the “K-index”

     BE’s (conjunctions)
     ID   BE                                        K
     C1   age ∈ {3} & state ∈ {NY}                  2
     C2   age ∈ {3} & gender ∈ {F}                  2
     C3   age ∈ {3} & gender ∈ {M} & state ∉ {CA}   2
     C4   state ∈ {CA} & gender ∈ {M}               2
     C5   age ∈ {3,4}                               1
     C6   state ∉ {CA,NY}                           0

     Inverted Index
     K   Key          Posting list
     0   (state,CA)   (C6,∉)
         (state,NY)   (C6,∉)
         Z            (C6,∈)
     1   (age,3)      (C5,∈)
         (age,4)      (C5,∈)
     2   (state,NY)   (C1,∈)
         (age,3)      (C1,∈), (C2,∈), (C3,∈)
         (gender,F)   (C2,∈)
         (state,CA)   (C3,∉), (C4,∈)
         (gender,M)   (C3,∈), (C4,∈)

     - The “Z key” is used to handle the case K = 0 (notice that ∉ predicates do not contribute to the value of K)

     - Given an assignment S with t keys, two basic conditions are used to check whether a conjunction C matches S:
       1. For a K-index with K ≤ t, a conjunction C matches S only if there are K posting lists such that each list refers to a key (A,v) in S and (C,∈) is in the posting list
       2. For no key (A,v) in S is there a posting list in which (C,∉) appears
     - Example (the names C1 and C2 are local to this example):
       - C1: (age ∈ {3} & gender ∈ {M}) matches S: age = 3 & gender = M & state = CA
       - C2: (age ∈ {3} & gender ∈ {M} & state ∉ {CA}) does not match S, since the posting list of the key (state,CA) includes the entry (C2,∉)
     - The Conjunction algorithm iterates through the K-indexes, checking that the above conditions are satisfied
     - Further, it does not consider at all the K-indexes with K > t
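As a rough illustration of the partitioning described above, the following sketch builds the K-indexes for the six conjunctions of the slide. The data layout (`(attribute, values, is_member)` tuples, nested dictionaries, the `Z` sentinel tuple) and the function name `build_k_index` are assumptions made for this example, not the data structures of [WBS+09].

```python
from collections import defaultdict

# Conjunctions from the slide: each predicate is (attribute, values, is_member),
# where is_member is True for an "in" predicate and False for a "not in" one.
CONJS = {
    "C1": [("age", {3}, True), ("state", {"NY"}, True)],
    "C2": [("age", {3}, True), ("gender", {"F"}, True)],
    "C3": [("age", {3}, True), ("gender", {"M"}, True), ("state", {"CA"}, False)],
    "C4": [("state", {"CA"}, True), ("gender", {"M"}, True)],
    "C5": [("age", {3, 4}, True)],
    "C6": [("state", {"CA", "NY"}, False)],
}
Z = ("Z", None)  # special key for conjunctions with K = 0

def build_k_index(conjs):
    """Return {K: {key: [(conjunction id, 'in' or 'not in'), ...]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for cid, preds in conjs.items():
        # K counts only the "in" predicates.
        k = sum(1 for _, _, member in preds if member)
        for attr, values, member in preds:
            for v in sorted(values):
                index[k][(attr, v)].append((cid, "in" if member else "not in"))
        if k == 0:
            index[0][Z].append((cid, "in"))
    return index

index = build_k_index(CONJS)
for k in sorted(index):
    for key, postings in index[k].items():
        print(k, key, postings)
```

The printed rows should correspond to the inverted-index table of the slide, up to the order of keys within each K-index.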

  6. S: age = 3 & state = CA & gender = M

     Inverted Index (relevant posting lists only)
     K   Key          Posting list
     0   (state,CA)   (C6,∉)
         Z            (C6,∈)
     1   (age,3)      (C5,∈)
     2   (age,3)      (C1,∈), (C2,∈), (C3,∈)
         (state,CA)   (C3,∉), (C4,∈)
         (gender,M)   (C3,∈), (C4,∈)

     BE’s (conjunctions)
     ID   BE                                        K
     C1   age ∈ {3} & state ∈ {NY}                  2
     C2   age ∈ {3} & gender ∈ {F}                  2
     C3   age ∈ {3} & gender ∈ {M} & state ∉ {CA}   2
     C4   state ∈ {CA} & gender ∈ {M}               2
     C5   age ∈ {3,4}                               1
     C6   state ∉ {CA,NY}                           0

     - First, all the relevant posting lists are obtained (one K-index at a time)
     - For K = 2 it is recognized that neither C1 nor C2 can be satisfied by S
     - Although C3 satisfies condition 1, it violates condition 2
     - C4 satisfies both conditions
     - The same holds for C5 (K = 1)
     - C6 violates condition 2
     Result: {C4,C5}

     - To process BE’s in DNF it is sufficient to observe that a BE E is satisfied by an assignment S iff at least one of its conjunctions of predicates is satisfied by S
     - Example: (state ∈ {CA} & gender ∈ {M}) | (state ∈ {NY} & gender ∈ {F}) is satisfied by S: age = 3 & state = CA & gender = M
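The worked example can be replayed with a simplified, counting-based sketch of the two matching conditions. This is not the algorithm of [WBS+09] as published, which is not reproduced on the slides; the function names, the data layout, and the extra conjunction `C7` (the second disjunct of the DNF example, which is not in the slide's table) are assumptions made here. The counting step also assumes that each attribute appears in at most one ∈ predicate per conjunction.

```python
from collections import defaultdict

CONJS = {
    "C1": [("age", {3}, True), ("state", {"NY"}, True)],
    "C2": [("age", {3}, True), ("gender", {"F"}, True)],
    "C3": [("age", {3}, True), ("gender", {"M"}, True), ("state", {"CA"}, False)],
    "C4": [("state", {"CA"}, True), ("gender", {"M"}, True)],
    "C5": [("age", {3, 4}, True)],
    "C6": [("state", {"CA", "NY"}, False)],
    "C7": [("state", {"NY"}, True), ("gender", {"F"}, True)],  # hypothetical, for the DNF example
}
Z = ("Z", None)  # special key for conjunctions with K = 0

def build_k_index(conjs):
    index = defaultdict(lambda: defaultdict(list))
    for cid, preds in conjs.items():
        k = sum(1 for _, _, member in preds if member)  # only "in" predicates count
        for attr, values, member in preds:
            for v in sorted(values):
                index[k][(attr, v)].append((cid, "in" if member else "not in"))
        if k == 0:
            index[0][Z].append((cid, "in"))
    return index

def match_conjunctions(index, assignment):
    """assignment: set of (attribute, value) keys. Returns ids of matching conjunctions."""
    t = len(assignment)
    matched = set()
    for k, partition in index.items():
        if k > t:
            continue  # more "in" predicates than keys: cannot match
        keys = set(assignment) | ({Z} if k == 0 else set())
        hits = defaultdict(int)
        rejected = set()
        for key in keys:
            for cid, kind in partition.get(key, []):
                if kind == "in":
                    hits[cid] += 1     # condition 1: count matching "in" entries
                else:
                    rejected.add(cid)  # condition 2: a "not in" entry was hit
        need = max(k, 1)  # for K = 0 the single hit comes from the Z key
        matched |= {c for c, n in hits.items() if n >= need and c not in rejected}
    return matched

def match_dnf(be_to_conjs, index, assignment):
    """A DNF BE matches iff at least one of its conjunctions matches."""
    sat = match_conjunctions(index, assignment)
    return {be for be, cids in be_to_conjs.items() if sat & set(cids)}

index = build_k_index(CONJS)
S = {("age", 3), ("state", "CA"), ("gender", "M")}
print(sorted(match_conjunctions(index, S)))         # ['C4', 'C5']
print(match_dnf({"E": ["C4", "C7"]}, index, S))     # {'E'}
```

Counting hits per conjunction replaces the search for K suitable posting lists; it gives the same answer on this example but makes no claim about the efficiency of the original method.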
