
Techniques and Rule Patterns for Declaratively Querying Web Data with FLORID

Techniques and Rule Patterns for Declaratively Querying Web Data with FLORID. Bertram Ludäscher, Rainer Himmeröder, Wolfgang May. Institut für Informatik, Universität Freiburg.


  1. Techniques and Rule Patterns for Declaratively Querying Web Data with FLORID
     Bertram Ludäscher, Rainer Himmeröder, Wolfgang May
     Institut für Informatik, Universität Freiburg, Germany

     Overview
     - Introduction
     - FLORID Web model
     - Integration of Web Access with DOOD paradigm
     - Data Integration: A Case Study
     - Navigation
     - Conclusions

  2. INTRODUCTION
     Goal: a uniform framework/system for
     - Querying the Web:
       - express declaratively how to query/navigate on the Web
       - extract data from Web pages for populating a database
       - Web-data warehousing
     - Management of Semistructured Data:
       - structure is irregular, partial, unknown, implicit in the data
       - example: HTML pages
       - querying/navigation using general path expressions (both in the Web (via links) and in the database)
       - discover structure
     - Information Integration:
       - heterogeneous sources with different structure
       - wrappers, mediators

  3. QUERYING THE WEB WITH F-LOGIC/FLORID
     - DOOD Paradigm:
       - deduction: data-driven exploration of the Web and high-level querying
       - object-orientation: flexible modeling of semistructured data (optional methods instead of NULLs)
     - extension of F-logic for querying and restructuring the Web: Web-FLORID
     - declarative rule-based programming style: uniform language for wrappers and mediators
     - meta features: schema browsing/reasoning, variables at class/method positions
     - restructuring of information
     - navigation by (general) path expressions
     - ⇒ uniform access to local DB and Web data; integration of heterogeneous information

  4. F-LOGIC IN A NUTSHELL
     Basic constructs:
     - ISA relation: Object : Class
     - SUBCLASS relation: SubClass :: Class
     - SIGNATURE: Class[Method@(P-types) => R-type] (single-valued) and Class[Method@(P-types) =>> R-types] (multi-valued)
     - DATA: Object[Method@(Params) -> R] (single-valued) and Object[Method@(Params) ->> {R1, R2}] (multi-valued)
     - PATH EXPRESSION: Obj.M1@(P1).M2@(P2)[Spec; Spec]

     Object creation via path expressions in the head:
       X.father : man :- X : person.
       X.mother : woman :- X : person.
     A query over person with variables M and C then returns the answers M = father, C = man and M = mother, C = woman.
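One way to picture object creation via path expressions is that `X.father` in a rule head names a new object derived from `X`. The following Python sketch imitates the two rules above under that reading; the tuple encoding `("father", X)` for the created object and the function name `derive_parents` are illustrative assumptions, not FLORID internals.

```python
def derive_parents(facts):
    """Apply both rules once; facts are (object, class) pairs.

    X.father : man   :- X : person.
    X.mother : woman :- X : person.
    """
    new_facts = set()
    for obj, cls in facts:
        if cls == "person":
            # The path expression in the head acts as the identifier
            # of the newly created object (assumed encoding).
            new_facts.add((("father", obj), "man"))
            new_facts.add((("mother", obj), "woman"))
    return new_facts
```

Because the created objects are members of `man` and `woman` rather than `person`, a second application of the rules to these objects alone derives nothing further in this sketch.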

  5. WEB MODEL
     The Web = graph, consisting of nodes (urls) containing web documents, and links (hrefs@(label)).
     [Figure: two web documents wd1 and wd2 at url1 and url2, each an HTML page; wd1 contains the link <A HREF="url2">label</A>.]
     Link structure:
     - Signature: webdoc[hrefs@(string) =>> url]
     - Example: wd1:webdoc[hrefs@("label") ->> "url2"]
     Further attributes:
       webdoc[self => url; address => string; modif => string; ...; error =>> string]
     Additional: user-programmed evaluation of the web documents.
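The web model above can be mirrored outside F-logic. The following Python sketch is only an illustration, not FLORID's implementation: the `WebDoc` class and the `parse_webdoc` helper are hypothetical names, and only the `self`, `address`, `hrefs`, and `error` attributes of the signature are modeled.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class WebDoc:
    """Hypothetical counterpart of the webdoc class on the slide."""
    self_url: str                               # webdoc[self => url]
    address: str = ""                           # webdoc[address => string]
    hrefs: dict = field(default_factory=dict)   # hrefs@(label) =>> url
    error: list = field(default_factory=list)   # webdoc[error =>> string]


class _LinkCollector(HTMLParser):
    """Collects (label, url) pairs from <a href="...">label</a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_webdoc(url, html):
    """Build a WebDoc whose hrefs map each label to the set of linked URLs."""
    collector = _LinkCollector()
    collector.feed(html)
    doc = WebDoc(self_url=url, address=url)
    for label, target in collector.links:
        doc.hrefs.setdefault(label, set()).add(target)
    return doc
```

Storing `hrefs` as a mapping from label to a set of URLs follows the multi-valued signature `hrefs@(string) =>> url`: one label may lead to several targets.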

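  6. INTEGRATION OF THE WEB MODEL IN THE DEDUCTIVE SYSTEM
     [Figure: the F-logic DB with the classes url and webdoc; u.get leads from a url u to its web document, and hrefs lead from a web document to further urls.]
     Signature:
       url :: string.
       url[get => webdoc].
     Rule-based exploration:
     - U.get in a rule head, for U:url: generate the OID U.get, add it to webdoc (U.get : webdoc), and fill in its slots: U.get[address -> ...; hrefs@(...) ->> ...; ...]
     - U is explored once U:url.get has been evaluated
     - links yield new urls: NewU:url :- U:url.get[hrefs@(_) ->> NewU].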
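The exploration cycle on this slide (request a page, store its slots, derive new URLs from its links) can be sketched as a loop that runs until no new facts appear. This is a minimal Python illustration under stated assumptions: `explore_web`, the `fetch` callback, and the `follow` filter are hypothetical names, and `fetch` stands in for the actual web access.

```python
def explore_web(start_urls, fetch, follow=lambda url: True):
    """Data-driven exploration of a link graph.

    fetch(url) returns a dict mapping link labels to sets of target URLs
    (the hrefs of the document) or raises KeyError if the page is missing.
    follow(url) decides whether get is requested for a URL.
    """
    urls = set(start_urls)      # facts  U:url
    webdocs = {}                # facts  U.get[hrefs@(Label) ->> NewU]
    errors = {}                 # pages that could not be fetched
    changed = True
    while changed:              # iterate until no new facts are derived
        changed = False
        for u in sorted(urls):  # sorted() copies, so urls may grow below
            if u in webdocs or u in errors or not follow(u):
                continue
            try:
                webdocs[u] = fetch(u)
            except KeyError as exc:
                errors[u] = str(exc)
            else:
                for targets in webdocs[u].values():
                    urls |= targets     # NewU:url for every link target
            changed = True
    return urls, webdocs, errors
```

The `follow` filter plays the role of the rule body that decides for which URLs `get` is requested; without such a restriction the loop would try to fetch every reachable page.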

  7. SEMANTICS
     - Path expressions [FLU, VLDB '94]: closure axioms; extended Herbrand universe $U$ and Herbrand base $HB$
     - Web interface: a set of reserved names $R = \{\mathit{get}, \mathit{url}, \mathit{hrefs}, \ldots\}$ and a function $\mathit{explore}: \mathit{URL} \to \mathcal{P}(HB)$ that maps URLs to sets of new facts
     - Web access axiom: for $H \subseteq HB$ and every URL $u$ with $u{:}\mathit{url}.\mathit{get} \in H$, $H \models \varphi_{\mathit{new}}$ for all facts $\varphi_{\mathit{new}} \in \mathit{explore}(u)$. That is, if $\mathit{get}$ is defined for a URL $u$, then all explored data is in $H$. This yields the minimal Herbrand Web model.
     - Integration with bottom-up evaluation: $T_P^W(H) := T_P(H) \cup \bigcup \{\, \mathit{explore}(u) \mid u{:}\mathit{url}.\mathit{get} \in T_P(H) \,\}$
     - Declarative semantics: if $\mathit{explore} = \emptyset$, then Web-FLORID = FLORID
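The integration of web access with bottom-up evaluation can be imitated in a few lines of Python. This is a sketch under assumptions, not FLORID's evaluator: facts are encoded as tuples such as `("url", u)`, `("get", u)`, and `("href", u, label, v)`, rules are plain functions from a fact set to derived facts, and the operator is made inflationary (it keeps the input facts) so that the iteration is easy to follow. The names `tp`, `tp_web`, `least_fixpoint`, and the tiny in-memory `WEB` are all hypothetical.

```python
def tp(rules, facts):
    """Immediate-consequence step: apply every rule once, keep old facts."""
    derived = set(facts)
    for rule in rules:
        derived |= set(rule(facts))
    return derived


def tp_web(rules, facts, explore):
    """Consequence step extended by web access.

    For every fact ("get", u) in the result, the facts explore(u) are added.
    """
    derived = tp(rules, facts)
    for fact in list(derived):
        if fact[0] == "get":
            derived |= set(explore(fact[1]))
    return derived


def least_fixpoint(step, facts=frozenset()):
    """Iterate `step` from the given facts until nothing new is derived."""
    current = set(facts)
    while True:
        following = step(current)
        if following == current:
            return current
        current = following


# Two toy rules: request every known url, and turn link targets into urls.
def request_pages(facts):
    return {("get", f[1]) for f in facts if f[0] == "url"}


def follow_links(facts):
    return {("url", f[3]) for f in facts if f[0] == "href"}


RULES = [request_pages, follow_links]
WEB = {"a": [("href", "a", "next", "b")], "b": []}   # made-up link graph
```

With `explore = lambda u: WEB.get(u, [])` the fixpoint contains the facts for both pages; with an `explore` that always returns nothing, `tp_web` coincides with `tp`, which mirrors the last bullet of the slide.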

  8. EXAMPLE: INTEGRATION CIA WORLD FACTBOOK and WORLD ONLINE
     CIA WORLD FACTBOOK (CIA):
     - geography, people, government, economy, ...; no cities (apart from country capitals)
     - information: link structure, formatted text
     - flat (text) structure, quite regular; only <BR>, <B>, <I> tags used for structuring
     WORLD ONLINE (WOL):
     - administrative divisions, main cities
     - information: link structure, tables
     - structured (tables), but not regular (different table layout, columns)

  9. EXAMPLE: INTEGRATION CIA WORLD FACTBOOK and WORLD ONLINE

  10. INTEGRATION METHODOLOGY: Typical Steps and Rules
      CIA Factbook: Matching via Regular Expressions

      Accessing relevant pages:
        C[url@(cia)->U] :- C:continent[file@(cia)->FN], strcat(cia_src,FN,U).
        U:url.get :- C:continent[url@(cia)->U].
        cid(C):country[url@(cia)->U; name@(cia)->Label; continent->CT] :-
            CT:continent.url@(cia).get[hrefs@(Label)->>U].
        U:url.get :- _:country[url@(cia)->U].

      Extracting "raw data":
        pattern(capital_name, "Capital…\n…").
        pattern(total_area, "total area…\n… sq km…").
        C[Method->X] :- pattern(Method,RegEx),
            pmatch(C:country.url@(cia).get, RegEx, "\1", X).

      Restructuring and data cleaning:
        C:real_country :- C:country[capital_name->CA], not substr("none",CA).

      Patterns and rules for comma lists (ethnic groups, languages)
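The pattern-table idea on this slide (one regular expression per attribute, applied uniformly to each country page, followed by a cleaning rule) can be sketched in Python. The regular expressions on the slide are not recoverable, so the two patterns below are illustrative stand-ins, and the names `PATTERNS`, `extract`, and `is_real_country` as well as the sample page text are made up for this example.

```python
import re

# Hypothetical patterns: one entry per method name, as in pattern(Method, RegEx).
PATTERNS = {
    "capital_name": r"Capital:\s*([^<\n]+)",
    "total_area": r"total area:\s*([\d,]+) sq km",
}


def extract(page_text):
    """Apply every pattern to the page; the first group is the raw value."""
    record = {}
    for method, regex in PATTERNS.items():
        match = re.search(regex, page_text)
        if match:
            record[method] = match.group(1).strip()
    return record


def is_real_country(record):
    """Cleaning rule: keep entries whose capital exists and is not 'none'."""
    capital = record.get("capital_name")
    return capital is not None and "none" not in capital


# Made-up page text in the flat <BR>-separated style described for the CIA pages.
SAMPLE_PAGE = "Capital: Exampletown<BR>\ntotal area: 1,234 sq km<BR>\n"
```

Calling `extract(SAMPLE_PAGE)` gives a record with both attributes. Adding an attribute only requires a new entry in `PATTERNS`, which is the point of keeping the patterns as data rather than as separate extraction rules.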

  11. INTEGRATION METHODOLOGY: Typical Steps and Rules
      WOL Pages: Parsing (nsgmls parser integrated into FLORID) and Evaluating

      Accessing and parsing relevant pages:
        U:url.parse :- C:country[url@(wol)->U].
        % Generates the parse tree of the document; tables: Tab:(U:url.parse.table)
        element(Tab,Row,Col)[contents->Cont; type->Type] :-
            Tab:(U.parse.table),
            Tab.table…tbody…Row…tr…Col…X…Type…Cont.

      % Identifying the main-cities table and its column attributes
        C[main_city_tab->T[header_row->HZ; pop_year@(PS)->Y; city_col->>CS; pop_col->>PS]] :-
            C:country[url@(wol)->U], T:(U.parse.table),
            element(T,_,_)[contents->Cont], substr(Cont,"main cities"),
            element(T,HZ,CS)[contents->Header; type->"th"], substr("city",Header),
            element(T,HZ,PS)[contents->Header; type->"th"], substr("pop",Header),
            pmatch(Header, "…", "…", Y).

      % Evaluation of the main-cities table
        C[main_cities->>cty(C,CN):city[country->C; name->CN; population@(Y)->P]] :-
            C:country[main_city_tab->T[city_col->>CS; pop_col->>PS; pop_year@(PS)->Y]],
            element(T,DZ,CS)[contents->CN; type->"td"],
            element(T,DZ,PS)[contents->P; type->"td"].
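The two-step table evaluation (first identify the city and population columns from the header cells, then read the data rows) can be sketched in Python on an already parsed table. The sketch assumes the table is given as a dict from `(row, col)` to `(cell_type, contents)`, mirroring `element(T,Row,Col)[contents->Cont; type->Type]`; the function names, the four-digit year pattern, and the sample table are assumptions for illustration.

```python
import re


def main_city_columns(table):
    """Locate the header row and the city / population columns."""
    found = {}
    for (row, col), (cell_type, text) in sorted(table.items()):
        if cell_type != "th":
            continue
        lowered = text.lower()
        if "city" in lowered:
            found["header_row"], found["city_col"] = row, col
        elif "pop" in lowered:
            found["pop_col"] = col
            year = re.search(r"\d{4}", text)   # assumed: year written as 4 digits
            found["pop_year"] = year.group(0) if year else None
    return found if {"city_col", "pop_col"} <= found.keys() else None


def main_cities(table):
    """Return (city, population, year) triples from the data rows."""
    cols = main_city_columns(table)
    if cols is None:
        return []
    result = []
    for (row, col), (cell_type, text) in sorted(table.items()):
        if cell_type == "td" and col == cols["city_col"]:
            pop = table.get((row, cols["pop_col"]))
            if pop and pop[0] == "td":
                result.append((text, pop[1], cols["pop_year"]))
    return result


# Made-up table in the (row, col) -> (type, contents) encoding.
SAMPLE_TABLE = {
    (0, 0): ("th", "Main Cities"),
    (1, 0): ("th", "City"),
    (1, 1): ("th", "Population 1995"),
    (2, 0): ("td", "Exampletown"),
    (2, 1): ("td", "12345"),
    (3, 0): ("td", "Sampleville"),
    (3, 1): ("td", "6789"),
}
```

Identifying columns by header text rather than by position is what lets the same code handle the differing table layouts mentioned for the WOL pages.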
