Advanced Topics in Database Management (INFSCI 2711)
Textbooks: Database System Concepts (2010); Introduction to Information Retrieval (2008)
Vladimir Zadorozhny, DINS, SCI, University of Pittsburgh
Web Data Management
The Web document collection
• No central design or coordination
• Unstructured (text, HTML, …), semi-structured (XML, annotated photos), structured (databases), …
• Distributed content creation, linking, democratization of publishing
• Content includes truth, lies, obsolete information, contradictions, …
• Scale much larger than previous text collections
• Growth: slowed down from the initial “volume doubling every few months” but still expanding
I