Aggregation and Ordering in Factorized Databases B akibayev, K o y, - PowerPoint PPT Presentation

Aggregation and Ordering in Factorized Databases B akibayev, K oˇ y, O lteanu, and Z ´ cisk´ avodn´ y University of Oxford VLDB Sept 2, 2014 http://www.cs.ox.ac.uk/projects/FDB/ 1 / 19

Outline What are Factorized Databases? Applications A Glimpse at Aggregating Factorized Data 2 / 19

Factorized Databases by Example Orders Pizzas Items customer day pizza pizza item item price Mario Monday Capricciosa Capricciosa base base 6 Mario Friday Capricciosa Capricciosa ham ham 1 Pietro Friday Hawaii Capricciosa mushrooms mushrooms 1 Lucia Friday Hawaii Hawaii base pineapple 2 Hawaii ham Hawaii pineapple Consider the natural join of the three relations above: Orders ✶ Pizzas ✶ Items customer day pizza item price Mario Monday Capricciosa base 6 Mario Monday Capricciosa ham 1 Mario Monday Capricciosa mushrooms 1 Mario Friday Capricciosa base 6 Mario Friday Capricciosa ham 1 Mario Friday Capricciosa mushrooms 1 . . . . . . . . . . . . . . . 3 / 19

Factorized Databases by Example Orders ✶ Pizzas ✶ Items customer day pizza item price Mario Monday Capricciosa base 6 Mario Monday Capricciosa ham 1 Mario Monday Capricciosa mushrooms 1 Mario Friday Capricciosa base 6 Mario Friday Capricciosa ham 1 Mario Friday Capricciosa mushrooms 1 . . . . . . . . . . . . . . . A flat relational algebra expression encoding the above query result is: � Mario � × � Monday � × � Capricciosa � × � base � × � 6 � ∪ � Mario � × � Monday � × � Capricciosa � × � ham � × � 1 � ∪ � Mario � × � Monday � × � Capricciosa � × � mushrooms � × � 1 � ∪ � Mario � × � Friday � × � Capricciosa � × � base � × � 6 � ∪ � Mario � × � Friday � × � Capricciosa � × � ham � × � 1 � ∪ � Mario � × � Friday � × � Capricciosa � × � mushrooms � × � 1 � ∪ . . . It uses relational product ( × ), union ( ∪ ), and singleton relations (e.g., � 1 � ). The attribute names are not shown to avoid clutter. 4 / 19

Factorized Databases by Example The previous relational expression entails lots of redundancy due to the joins: � Mario � × � Monday � × � Capricciosa � × � base � × � 6 � ∪ � Mario � × � Monday � × � Capricciosa � × � ham � × � 1 � ∪ � Mario � × � Monday � × � Capricciosa � × � mushrooms � × � 1 � ∪ � Mario � × � Friday � × � Capricciosa � × � base � × � 6 � ∪ � Mario � × � Friday � × � Capricciosa � × � ham � × � 1 � ∪ � Mario � × � Friday � × � Capricciosa � × � mushrooms � × � 1 � ∪ . . . We can factorize the expression following the join structure, e.g.,: pizza � Capricciosa � × ( � Monday � × � Mario � ∪ � Friday � × � Mario � ) × ( � base � × � 6 � ∪ � ham � × � 1 � ∪ � mushrooms � × � 1 � ) day item ∪ � Hawaii � × � Friday � × ( � Lucia � ∪ � Pietro � ) price customer × ( � base � × � 6 � ∪ � ham � × � 1 � ∪ � pineapple � × � 2 � ) There are several algebraically equivalent factorized representations defined by distributivity of product over union and commutativity of product and union. 5 / 19

Properties of Factorized Representations Factorized representations of results of queries with select, project, join, aggregate, groupby, and orderby operators: Very high compression rate ◮ Can be exponentially more succinct than the relations they encode. ◮ Arbitrarily better than generic compression schemes, e.g., bzip2 ◮ Factorized representations of asymptotically-tight size bounds computable directly from input database and query Querying in the compressed domain ◮ Factorizations are relational expressions ◮ We developed the FDB in-memory query engine for this purpose Constant-delay enumeration of represented tuples ◮ Tuple iteration as fast as listing them from equivalent flat relations 6 / 19

Outline What are Factorized Databases? Applications A Glimpse at Aggregating Factorized Data 7 / 19

Spot the Factorized Database! Traditional Relational Clustered Hierarchical Customer( CustomerId , …) Customer( CustomerId , …) Campaign( CampaignId , CustomerId, …) Campaign( CustomerId , CampaignId , …) Logical AdGroup( AdGroupId , CampaignId, …) AdGroup( CustomerId , CampaignId , AdGroupId , …) Schema Primary key includes Foreign key references only foreign keys that reference the parent record. all ancestor rows. Joining related data often requires reads Customer(1,...) spanning multiple machines. Campaign(1,3,...) Related data is clustered AdGroup (1,3,6,...) for fast common-case Customer(1,...) AdGroup(6,3,...) AdGroup (1,3,7,...) join processing. Physical Customer(2,...) AdGroup(7,3,...) Campaign(1,4,...) Layout AdGroup(8,4,...) AdGroup (1,4,8,...) AdGroup(9,5,...) Customer(2,...) Campaign(3,1,...) Physical data partition boundaries occur Campaign(2,5,...) Campaign(4,1,...) between root rows. AdGroup (2,5,9,...) Campaign(5,2,...) Figure 2: The logical and physical properties of data storage in a traditional normalized relational schema compared with a clustered hierarchical schema used in an F1 database. Excerpt from F1: A Distributed SQL Database That Scales . PVLDB’13. Google’s DB supporting their lucrative AdWords business Database factorization increases data locality for common access patterns ◮ Tables pre-joined using a nesting structure defined by key-fkey constraints Data partitioned across servers into factorization fragments

Spot the Factorized Database!                                                                                                                                                                                                                                                                                                                                      Figure 3: (a) In relational domains, design matrices X have large blocks of repeating patterns (example from Figure 2). (b) Repeating patterns in X can be formalized by a block notation (see section 2.3) which stems directly from the relational structure of the original data. Machine learning methods have to make use of repeating patterns in X to scale to large relational datasets. Excerpt from Scaling Factorization Machines to Relational Data. PVLDB’13. Feature vectors for predictive modelling represented as very large design matrices (= relations with high cardinality) Standard learning algorithms cannot scale on design matrix representation Use repeating patterns in the design matrix as key to scalability

Aggregation and Ordering in Factorized Databases B akibayev, K o y, - PowerPoint PPT Presentation

Aggregation and Ordering in Factorized Databases B akibayev, K o y, O lteanu, and Z cisk avodn y University of Oxford VLDB Sept 2, 2014 http://www.cs.ox.ac.uk/projects/FDB/ 1 / 19 Outline What are Factorized Databases?

Factorized groups and solubility Bernhard Amberg Universit at Mainz Malta, March 2018

Solving linear programs on factorized databases Nicolas Crosetti joint work with F. Capelli, J.

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Information Ordering Ling573 Systems & Applications April 20, 2017 Roadmap

CS5412: HOW MUCH ORDERING? Lecture XVI Ken Birman Ordering 2 The key to consistency turns

Variable & Value Ordering Heuristics Heuristics for backtracking algorithms Variable

Module 3: Creating and Managing Databases Overview Creating Databases Creating

The Axiomatic Method in Social Choice Theory: Preference Aggregation, Judgment Aggregation, Graph

Elmwood Park: Electricity Aggregation Developing an Opt-In Municipal Aggregation Program to

simplifying the customer experience through account aggregation Sim Sangha Business Development

New progress on factorized groups and subgroup permutability Paz Arroyo-Jord Instituto

GEMS/Food Databases and GEMS/Food Databases and GEMS/Food Databases and in the Food Supply

Multi-modal Factorized High-order Pooling for Visual Question Answering Team HDU-USYD-UNCC with

Products of groups which contain abelian subgroups of finite index Bernhard Amberg Universit

RAAGs in Ham Misha Kapovich UC Davis June 30, 2011 Motivation M is a compact surface, is

Implementation Report: Release of the System-centric Middleware Component for Universal Multicast

Part I (B) Find the hidden parameter! (Fixed parameter tractable problems) (C) Find an approximate

COARSE-TO-FINE, COST-SENSITIVE CLASSIFICATION OF E-MAIL Jay Pujara jay@cs.umd.edu Lise Getoor

Alvord Lake - Stanyan across from Haight St. William Alvord (1833 - 1904) - merchant, banker, 14th

CS 188: Artificial Intelligence Nave Bayes Instructors: Brijen Thananjeyan and Aditya Baradwaj

CS325 Artificial Intelligence Natural Language Processing I (Ch. 22) Dr. Cengiz Gnay, Emory

David Ham dolfin-adjoint: automating the adjoints of models written in the Python interface to

Aggregation and Ordering in Factorized Databases B akibayev, K o y, - PowerPoint PPT Presentation

Aggregation and Ordering in Factorized Databases B akibayev, K o y, O lteanu, and Z cisk avodn y University of Oxford VLDB Sept 2, 2014 http://www.cs.ox.ac.uk/projects/FDB/ 1 / 19 Outline What are Factorized Databases?

Factorized groups and solubility Bernhard Amberg Universit at Mainz Malta, March 2018

Solving linear programs on factorized databases Nicolas Crosetti joint work with F. Capelli, J.

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Information Ordering Ling573 Systems &amp; Applications April 20, 2017 Roadmap

CS5412: HOW MUCH ORDERING? Lecture XVI Ken Birman Ordering 2 The key to consistency turns

Variable &amp; Value Ordering Heuristics Heuristics for backtracking algorithms Variable

Module 3: Creating and Managing Databases Overview Creating Databases Creating

The Axiomatic Method in Social Choice Theory: Preference Aggregation, Judgment Aggregation, Graph

Elmwood Park: Electricity Aggregation Developing an Opt-In Municipal Aggregation Program to

simplifying the customer experience through account aggregation Sim Sangha Business Development

New progress on factorized groups and subgroup permutability Paz Arroyo-Jord Instituto

GEMS/Food Databases and GEMS/Food Databases and GEMS/Food Databases and in the Food Supply

Multi-modal Factorized High-order Pooling for Visual Question Answering Team HDU-USYD-UNCC with

Products of groups which contain abelian subgroups of finite index Bernhard Amberg Universit

RAAGs in Ham Misha Kapovich UC Davis June 30, 2011 Motivation M is a compact surface, is

Implementation Report: Release of the System-centric Middleware Component for Universal Multicast

Part I (B) Find the hidden parameter! (Fixed parameter tractable problems) (C) Find an approximate

COARSE-TO-FINE, COST-SENSITIVE CLASSIFICATION OF E-MAIL Jay Pujara jay@cs.umd.edu Lise Getoor

Alvord Lake - Stanyan across from Haight St. William Alvord (1833 - 1904) - merchant, banker, 14th

CS 188: Artificial Intelligence Nave Bayes Instructors: Brijen Thananjeyan and Aditya Baradwaj

CS325 Artificial Intelligence Natural Language Processing I (Ch. 22) Dr. Cengiz Gnay, Emory

David Ham dolfin-adjoint: automating the adjoints of models written in the Python interface to

Information Ordering Ling573 Systems & Applications April 20, 2017 Roadmap

Variable & Value Ordering Heuristics Heuristics for backtracking algorithms Variable