
Towards Automatically Setting Language Bias in Relational Learning

Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak. Information and Data Management and Analytics (IDEA) Lab.


  1. Towards Automatically Setting Language Bias in Relational Learning. Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak. Information and Data Management and Analytics (IDEA) Lab.

  2. Design a drug to treat HIV. What is the structure of compounds that have anti-HIV activity? Oracle: a compound has anti-HIV activity if it has the following substructure (figure: a substructure with atoms N, O, N).

  3. Relational learning can learn a definition for an anti-HIV compound.
     Training data:
     compound(compId, atomId): (c1, a1), (c2, a10)
     atom(atomId, element): (a1, N), (a2, O)
     bond(atomId1, atomId2, type): (a1, a2, single), (a2, a3, single)
     anti-HIV(compId): c1, c3
     no-anti-HIV(compId): c2, c4
     Output of the relational learning algorithm:
     anti-HIV(x) :- compound(x,u), atom(u,N), compound(x,v), atom(v,O), compound(x,w), atom(w,N), bond(u,v,single), bond(v,w,single).

  4. Benefits of relational learning
     ✓ Leverage the structure of data and learn over complex schemas with multiple tables
     ✓ Automatic feature extraction and selection
     ✓ Results are interpretable (Datalog)
     Example: from the compound, atom, and bond tables, the relational learning algorithm produces
     anti-HIV(x) :- compound(x,u), atom(u,N), compound(x,v), atom(v,O), compound(x,w), atom(w,N), bond(u,v,single), bond(v,w,single).

  5. How relational learning works. What is the definition of the advisedBy relation?
     paperAuthor(paperId, authorId): (p1, f1), (p1, s1), (p2, s3), (p2, f3), …
     professor(id, position): (f1, faculty), (f2, faculty), (f3, adjunct)
     student(id, phase, year): (s1, post_quals, 3), (s2, pre_quals, 2), (s3, post_prelims, 5)
     advisedBy(studId, profId): (s1, f1), (s3, f3)
     not-advisedBy(studId, profId): (s2, f3), (s1, f3)
     These tables are the input to the relational learning algorithm; the definition of advisedBy is the unknown output.

  6. Generic relational learning algorithm.
     Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year).
     Scoring function: f = P - N, where P is the number of positive examples covered and N is the number of negative examples covered.
     Current definition: advisedBy(x,y) :- true.

  7. Generic relational learning algorithm.
     Current definition: advisedBy(x,y) :- true.
     Candidate literals and their scores (f = P - N): paperAuthor(z,x): f=1; professor(y,z): f=0; …: f=-1.

  8. Generic relational learning algorithm.
     The best-scoring candidate, paperAuthor(z,x) with f=1, is added to the body.
     Current definition: advisedBy(x,y) :- paperAuthor(z,x).

  9. Generic relational learning algorithm.
     Current definition: advisedBy(x,y) :- paperAuthor(z,x).
     Second-level candidate literals and their scores (f = P - N): student(x,v,w): f=1; paperAuthor(z,y): f=2; …: f=0.

  10. Generic relational learning algorithm.
     The best-scoring candidate, paperAuthor(z,y) with f=2, is added to the body.
     Current definition: advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).

  11. Generic relational learning algorithm.
     Current definition: advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).
     Third-level candidates score f=2, f=1, and f=1: no improvement over the current score of f=2, so no literal is added.

  12. Learned definition. What is the definition of the advisedBy relation? Given the paperAuthor, professor, student, advisedBy, and not-advisedBy tables, the relational learning algorithm outputs:
     advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).
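The greedy search walked through in slides 6 to 12 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the table rows are the ones legible in the transcript (rows hidden behind "…" are omitted), the candidate list is hand-picked for the example, and the names `covers`, `score`, and `learn` are hypothetical.

```python
# Toy database transcribed from the slides (rows behind "…" are omitted).
DB = {
    "paperAuthor": {("p1", "f1"), ("p1", "s1"), ("p2", "s3"), ("p2", "f3")},
    "professor": {("f1", "faculty"), ("f2", "faculty"), ("f3", "adjunct")},
    "student": {("s1", "post_quals", 3), ("s2", "pre_quals", 2),
                ("s3", "post_prelims", 5)},
}
POSITIVE = [("s1", "f1"), ("s3", "f3")]  # advisedBy
NEGATIVE = [("s2", "f3"), ("s1", "f3")]  # not-advisedBy

# Hand-picked candidate literals (an assumption for this sketch; a real
# learner derives them from the language bias).
# A literal is (relation name, tuple of variable names).
CANDIDATES = [
    ("paperAuthor", ("z", "x")),
    ("paperAuthor", ("z", "y")),
    ("professor", ("y", "z")),
    ("student", ("x", "v", "w")),
]


def covers(body, example):
    """True if some binding with (x, y) = example satisfies every literal."""
    def search(i, binding):
        if i == len(body):
            return True
        relation, variables = body[i]
        for row in DB[relation]:
            extended = dict(binding)
            # setdefault binds a fresh variable or returns its current value.
            if all(extended.setdefault(v, c) == c
                   for v, c in zip(variables, row)):
                if search(i + 1, extended):
                    return True
        return False
    return search(0, {"x": example[0], "y": example[1]})


def score(body):
    """f = P - N: covered positives minus covered negatives."""
    p = sum(covers(body, e) for e in POSITIVE)
    n = sum(covers(body, e) for e in NEGATIVE)
    return p - n


def learn():
    body = []  # advisedBy(x,y) :- true.
    best = score(body)
    while True:
        scored = [(score(body + [lit]), lit) for lit in CANDIDATES]
        top_score, top_literal = max(scored, key=lambda pair: pair[0])
        if top_score <= best:  # no improvement: stop
            return body, best
        body, best = body + [top_literal], top_score


body, f = learn()
print("advisedBy(x,y) :- "
      + ", ".join(f"{r}({','.join(vs)})" for r, vs in body) + ".")
# prints: advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).
```

On this toy data the search adds paperAuthor(z,x) (f=1), then paperAuthor(z,y) (f=2), and stops when no candidate scores higher than 2. The sketch enumerates only four candidates; without a language bias the candidate set grows very quickly, which is the problem the following slides address.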

  13. Hypothesis space in relational learning algorithms is huge
     • Hypothesis space: all Datalog definitions containing relations in the schema
     • Current solution: users must set language bias to restrict the hypothesis space
     Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year).
     Possible literals for the body of advisedBy(x,y) :- …
     paperAuthor(x,x), paperAuthor(z,x), paperAuthor(z,y), paperAuthor(x,y), paperAuthor(z,v), …
     professor(x,z), professor(x,y), student(x,v,w), student(x,y,z), …

  14. Syntactic bias restricts the structure of learned Datalog definitions
     • Which relations to query?
     • Which relations to join and over which attributes?
     • Should an attribute be a constant or a variable?
     Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year).
     Join paperId with professor id? advisedBy(x,y) :- paperAuthor(z,x), professor(z,v).
     Constant or variable? advisedBy(x,y) :- professor(y,z), professor(y,faculty). Here z is a variable and faculty is a constant.

  15. Predicate definitions
     • Assign types to each attribute in every relation
     • Only attributes with the same type can join
     Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year).
     Attribute types:
     professor[id]: professor
     professor[position]: position
     paperAuthor[paperId]: paper
     paperAuthor[authorId]: student
     paperAuthor[authorId]: professor
     student[id]: student
     …

  16. Predicate definitions
     • Assign types to each attribute in every relation
     • Only attributes with the same type can join
     Attribute types: professor[id]: professor; professor[position]: position; paperAuthor[paperId]: paper; paperAuthor[authorId]: student; paperAuthor[authorId]: professor; student[id]: student; …
     Input to the algorithm:
     professor(professor,position)
     paperAuthor(paper,student)
     paperAuthor(paper,professor)
     student(student,phase,year)
     …
     Example: advisedBy(x,y) :- paperAuthor(z,x), professor(z,v). Under these types the clause is not allowed, because z would join paperAuthor[paperId] (type paper) with professor[id] (type professor).
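The rule "only attributes with the same type can join" can be checked mechanically. The sketch below is an illustration under stated assumptions: the signatures for professor, paperAuthor, and student come from the slide, while the signature `advisedBy(student, professor)` for the target relation and the function name `well_typed` are assumptions made for the example.

```python
from itertools import product

# Predicate definitions from the slide: one or more typed signatures
# per relation.
PREDICATE_DEFS = {
    "professor": [("professor", "position")],
    "paperAuthor": [("paper", "student"), ("paper", "professor")],
    "student": [("student", "phase", "year")],
    # Assumed signature for the target relation (not given in the slide).
    "advisedBy": [("student", "professor")],
}


def well_typed(clause):
    """clause: list of (relation, variables), head literal first.

    A clause is accepted if, for some choice of one signature per literal,
    every variable receives the same type at all of its occurrences.
    """
    options = [PREDICATE_DEFS[relation] for relation, _ in clause]
    for choice in product(*options):
        types = {}
        consistent = all(
            types.setdefault(var, typ) == typ
            for (_, variables), signature in zip(clause, choice)
            for var, typ in zip(variables, signature)
        )
        if consistent:
            return True
    return False


head = ("advisedBy", ("x", "y"))
rejected = [head, ("paperAuthor", ("z", "x")), ("professor", ("z", "v"))]
accepted = [head, ("paperAuthor", ("z", "x")), ("paperAuthor", ("z", "y"))]
print(well_typed(rejected), well_typed(accepted))  # prints: False True
```

Listing paperAuthor twice, once per author type, is what lets the accepted clause bind x to a student and y to a professor through the same authorId attribute.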

  17. Mode definitions
     • Define the mode to call relations and create literals
     • Each attribute can be:
       – an existing variable (+)
       – an existing or new variable (-)
       – a constant (#)
     Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year).
     Input to the algorithm:
     professor(+,-)
     professor(-,+)
     professor(+,#)
     …
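To make the three symbols concrete, the following sketch expands a mode definition into the literals it permits. It follows the semantics stated on the slide (+ existing variable, - existing or new variable, # constant) and ignores types for brevity; the function name `candidate_literals` and the fresh-variable naming scheme are assumptions made for the example.

```python
from itertools import product

# professor(id, position) rows from the slides.
PROFESSOR = [("f1", "faculty"), ("f2", "faculty"), ("f3", "adjunct")]

# Mode definitions from the slide, one symbol per attribute.
MODES = [("professor", "+-"), ("professor", "-+"), ("professor", "+#")]


def candidate_literals(mode, existing_vars, rows):
    """Expand one mode definition into all literals it allows."""
    relation, symbols = mode
    choices = []
    for position, symbol in enumerate(symbols):
        if symbol == "+":    # must reuse an existing variable
            choices.append(list(existing_vars))
        elif symbol == "-":  # an existing variable or a fresh one
            choices.append(list(existing_vars) + [f"new{position}"])
        elif symbol == "#":  # a constant taken from that column of the table
            choices.append(sorted({row[position] for row in rows}))
        else:
            raise ValueError(f"unknown mode symbol: {symbol!r}")
    return [(relation, args) for args in product(*choices)]


for mode in MODES:
    for relation, args in candidate_literals(mode, ["x", "y"], PROFESSOR):
        print(f"{relation}({','.join(args)})")
```

With the head variables x and y, professor(+,#) yields four literals, including professor(y,faculty), while professor(+,-) yields six, including professor(y,new1) with a fresh variable in the second position.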

  18. Predicate and mode definitions are the “black magic” of relational learning
     • All relational learning algorithms require syntactic bias
     • Manually written by the user
     Cycle: Rewrite → Learn → Evaluate. Difficult and time-consuming; requires expertise; trial-and-error.

  19. Many lines of code to specify definitions
     movies(+movieid,-title,-year)
     movies2composers(+movieid,-composer)
     certificates(+movieid,#country,#certificate)
     movies2genres(+movieid,-genreid)
     movies2composers(-movieid,+composer)
     countries(+countryid,-country)
     movies2prodcompanies(+movieid,-prodcompanyid)
     composers(+composer,-name)
     countries(+countryid,#country)
     movies2costdes(+movieid,-costdes)
     runningtimes(+movieid,-time)
     movies2colors(+movieid,-colorid)
     movies2costdes(-movieid,+costdes)
     runningtimes(+movieid,#time)
     movies2directors(+movieid,-director)
     costdesigners(+costdes,-name)
     akatitles(+movieid,-languageid,-title)
     movies2directors(-movieid,+director)
     movies2editors(+movieid,-editor)
     akanames(+name,-name)
     movies2producers(+movieid,-producer)
     movies2editors(-movieid,+editor)
     altversions(+movieid,-text)
     movies2producers(-movieid,+producer)
     editors(+editor,-name)
     business(+movieid,-text)
     producers(+producer,-name)
     movies2misc(+movieid,-misc)
     plots(+movieid,-text)
     directors(+director,-name)
     misc(+misc,-name)
     biographies(+bio,-name,-text)
     colorinfo(+colorid,-color)
     movies2proddes(+movieid,-proddes)
     distributors(+movieid,-name)
     colorinfo(+colorid,#color)
     movies2proddes(-movieid,+proddes)
     mpaaratings(+movieid,-text)
     movies2writers(+movieid,-writer)
     proddesigners(+proddes,-name)
     mpaaratings(+movieid,#text)
     movies2writers(-movieid,+writer)
     genres(+genreid,-genre)
     releasedates(+movieid,-countryid,-date)
     writers(+writer,-name)
     genres(+genreid,#genre)
     releasedates(+movieid,-countryid,#date)
     movies2actors(+movieid,-actor,-character)
     prodcompanies(+prodcompanyid,-prodcompany)
     technical(+movieid,-text)
     actors(+actor,-name,-sex)
     technical(+movieid,#text)
     actors(+actor,-name,#sex)
     ratings(+movieid,-rank,-votes)
     language(+languageid,-language)
     movies2cinematgrs(+movieid,-cinemat)
     certificates(+movieid,-country,-certificate)
     language(+languageid,#language)
     movies2cinematgrs(-movieid,+cinemat)
     certificates(+movieid,#country,-certificate)
     movies2languages(+movieid,-languageid)
     cinematgrs(+cinemat,-name)
     certificates(+movieid,-country,#certificate)
     movies2countries(+movieid,-countryid)

  20. AutoMode: automatically induce syntactic bias
     • Leverage information in the schema and content of the database
     Pipeline: AutoMode (exact IND discovery, approximate IND discovery) → predicate and mode definitions → relational learning algorithm.

  21. AutoMode: generate predicate definitions
     • Use inclusion dependencies (referential integrity constraints) to find types of attributes
     • Key idea: the most frequently used joins are the ones over the attributes that participate in an IND
       – E.g., primary-key to foreign-key relationship
     professor(id, position): (f1, faculty), (f2, faculty), (f3, adjunct)
     taughtBy(courseId, profId, term): (c1, f1, Fall16), (c2, f2, Fall16)
     taughtBy[profId] ⊆ professor[id]
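An exact unary inclusion dependency such as taughtBy[profId] ⊆ professor[id] holds when every value in the first column also appears in the second. The brute-force check below is a minimal sketch on the two tables from the slide; it is not AutoMode's discovery algorithm, it does not cover approximate INDs, and the names `columns` and `unary_inds` are hypothetical.

```python
# Tables from the slide: relation -> (attribute names, rows).
TABLES = {
    "professor": (
        ("id", "position"),
        [("f1", "faculty"), ("f2", "faculty"), ("f3", "adjunct")],
    ),
    "taughtBy": (
        ("courseId", "profId", "term"),
        [("c1", "f1", "Fall16"), ("c2", "f2", "Fall16")],
    ),
}


def columns(tables):
    """Map (relation, attribute) to the set of values in that column."""
    result = {}
    for relation, (attributes, rows) in tables.items():
        for index, attribute in enumerate(attributes):
            result[(relation, attribute)] = {row[index] for row in rows}
    return result


def unary_inds(tables):
    """All pairs (A, B) of distinct columns whose values satisfy A ⊆ B."""
    cols = columns(tables)
    return [
        (lhs, rhs)
        for lhs, left_values in cols.items()
        for rhs, right_values in cols.items()
        if lhs != rhs and left_values <= right_values
    ]


for (r1, a1), (r2, a2) in unary_inds(TABLES):
    print(f"{r1}[{a1}] ⊆ {r2}[{a2}]")
# prints: taughtBy[profId] ⊆ professor[id]
```

Following the slide's key idea, the two attributes in a discovered IND would be assigned the same type, so that the learner is allowed to join taughtBy and professor over them. Comparing every pair of columns is quadratic in the number of attributes, which is acceptable for a sketch but not for a large schema.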
