Towards Big Data Provenance
Daniel Deutch
Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences
Tel Aviv University
cartoon by T. Gregorius
Big Data, Big Problem
Data-intensive systems are highly complex
– Manipulate large-scale data in intricate ways
– Machine Learning, Data Mining systems are often black box, even black magic?
– Errors in input (measurements, crowd, text)
– Errors in processing (ambiguities, imperfect text understanding, “bugs”)
legitimate data
getting out of control
variable named PROSSURG. It turned out this represented whether the patient had received prostate surgery, an incredibly predictive but
customer would open an account at a bank.”
problem.
– […] It turns out that a specific salesperson was assigned to take over cases where customers had already notified they intend to churn.”
“Leakage in Data Mining: Formulation, Detection, and Avoidance”, Kaufman, Rosset, Perlich, ACM Transactions on Knowledge Discovery from Data (TKDD) 6.4 (2012): 15
Bourhis, D., Moskovitch, Analyzing Data-Centric Applications: Why, What-if, and How-to, ICDE ‘16 (to appear)
D., Moskovitch, Tannen, A Provenance Framework for Data-Dependent Process Analysis, VLDB ‘14
D., Roy, Milo, Tannen, Provenance Circuits for Datalog, ICDT ‘14
D., Gilad, Moskovitch, Selective Provenance for Datalog Programs Using Top-K Queries, VLDB ‘15
Ainy, Bourhis, Davidson, D., Milo, Approximated Summarization of Data Provenance, CIKM ‘15
D., Frost, Gilad, NLProv: Natural Language Provenance, Submitted