264 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 2, FEBRUARY 2010

Optimization of Linear Recursive Queries in SQL

Carlos Ordonez

Abstract—Recursion is a fundamental computation mechanism which has been incorporated into the SQL language. This work focuses on the optimization of linear recursive queries in SQL. Query optimization is studied with two important graph problems: computing the transitive closure of a graph and getting the power matrix of its adjacency matrix. We present SQL implementations for two fundamental algorithms: Seminaive and Direct. Five query optimizations are studied: 1) storage and indexing; 2) early selection; 3) early evaluation of nonrecursive joins; 4) pushing duplicate elimination; and 5) pushing aggregation. Experiments compare both evaluation algorithms and systematically evaluate the impact of optimizations with large input tables. Optimizations are evaluated on four types of graphs: binary trees, lists, cyclic graphs, and complete graphs, going from the best to the worst case. In general, Seminaive is faster than Direct, except for complete graphs. Storing and indexing rows by vertex and pushing aggregation work well on trees, lists, and cyclic graphs. Pushing duplicate elimination is essential for complete graphs, but slows computation for acyclic graphs. Early selection with equality predicates significantly accelerates computation for all types of graphs.

Index Terms—Recursive query, SQL, query optimization, transitive closure.

1 INTRODUCTION

Recursion is fundamental in computer science. Most data structures, like trees or lists, are recursive. Most importantly, many search algorithms have a natural recursive definition. Despite its prominent importance, recursion was not available in SQL for a long time. But the ANSI '99 SQL standard introduced recursion into SQL with syntactic constructs to define recursive views and recursive derived tables. This paper studies the optimization of linear recursive queries [24], [26] in SQL, which constitute a broad class of queries used in practice [2], [3], [17], [19]. Typical problems solved by linear recursive queries include parent/child relationships, path computations in a graph, and bill of materials. Linear recursive queries have many applications in relational databases. Consider an ancestor/descendant example, with a table containing employee/manager information with the employee id of the manager. Examples are "who are all the employees that are managed directly or indirectly by person X?" or "is person X under person Y in the organization?". Suppose we have a table relating pairs of parts in a manufacturing environment where one column identifies one part and the second column corresponds to a subpart in a hierarchical fashion; this is the so-called bill of materials example. Examples are "list all subparts of part X" and "how many subparts does part X have two levels below?". Assume that there is a geographical table with locations where each row indicates that there exists a road (with distance as an attribute) between two locations. Examples are "which is the shortest path between X and Y?," "how many different routes are there between X and Y?," "what is the distance between X and Y?," or "what locations cannot be reached from X?".

Although recursive query optimization has been extensively studied in the past, mostly in deductive databases [3], [9], [19], [20], [24] and to a lesser extent in relational database systems [2], [5], [6], [11], [13], there is no recent work that studies the optimization of linear recursive queries in SQL. Most research has proposed complex algorithms using sophisticated data structures. Instead, our work studies how to optimize recursive queries with existing storage organization and indexing mechanisms and relational algebra transformations. Thus, this paper revisits the classical problem of optimizing linear recursive queries, but focusing on SQL.

This is a summary of contributions. We present implementations in SQL of two classical algorithms to evaluate linear recursive queries: Seminaive [3] and Direct [2]. We study the optimization of selection-projection-join-aggregation (SPJA) queries, following traditional query optimization principles [7]. SPJA queries represent the most common and general queries in a relational DBMS. Specifically, five query optimizations are studied:
1. storage and indexing of input, intermediate, and result tables;
2. early selection of rows by pushing predicates;
3. early or late evaluation of nonrecursive (external) joins;
4. pushing duplicate and cycle elimination into intermediate recursive steps; and
5. pushing aggregation through recursion.
We also study how to improve the performance when there are deep recursion levels, many duplicate rows, and cycles. We perform a systematic experimental evaluation with large tables storing graphs with different structure and levels of connectivity.


The author is with the Department of Computer Science, University of Houston, Houston, TX 77204. E-mail: ordonez@cs.uh.edu. Manuscript received 13 Mar. 2008; revised 7 Nov. 2008; accepted 16 Mar. 2009; published online 31 Mar. 2009. Recommended for acceptance by D. Papadias. For information on obtaining reprints of this article, please send e-mail to: tkde@computer.org, and reference IEEECS Log Number TKDE-2008-03-0142. Digital Object Identifier no. 10.1109/TKDE.2009.83.

1041-4347/10/$26.00 © 2010 IEEE. Published by the IEEE Computer Society.


This paper is organized as follows: Section 2 introduces definitions and examples. Section 3 presents SQL implementations for two well-known recursive query evaluation algorithms. Section 4 studies query optimization. Section 5 presents experiments focusing on query optimization and time complexity. Section 6 discusses related work. Section 7 presents the conclusions.

2 DEFINITIONS

2.1 Basic Definitions

To provide a mathematical framework for discussion, we use graphs. Let G = (V, E) be a directed graph with n vertices and m edges. An edge in E links two vertices in V and has a direction. Notice that our definition allows the existence of cycles in graphs. For instance, an edge can represent a parent/child relationship or a road between two locations. There are two common representations for graphs: one is called the adjacency list and the other one is called the adjacency matrix. The adjacency list representation of a graph is a set L of edges joining vertices in V. If there is no edge between two vertices, then there is no corresponding element in the list. Each edge has an associated weight (e.g., distance, capacity, or cost). A path is defined as a subset of E linking two vertices in V. The adjacency matrix is an n × n binary matrix A, where A[i,j] represents an edge from vertex i to vertex j and is 1/0 indicating the presence/absence of an edge. If G is an undirected graph, then A is a symmetric matrix.

Graph G is stored on table T (as an adjacency list) and the result of the recursive query is stored on table R (to be defined below in relational algebra and in SQL). Let the base table T be defined as T(i, j, p, v) with primary key (i, j), with p representing a path count and v representing a numeric value. Table T is the input for recursive queries, using columns i and j to join T with itself. Let R be the result table returned by a recursive query, defined as R(d, i, j, p, v) with primary key (d, i, j), where d represents recursion depth, i and j identify an edge at some recursion depth, p is a count (number of paths), and v represents a numeric value (typically recursively computed). Columns p and v are used to count paths between vertices and compute path lengths, respectively; we include them in both T and R to have consistent definitions and queries. A row from table T represents a weighted edge in G between vertices i and j in list L, some value attribute of either i or j, or an entry A[i,j] of the adjacency matrix A. Table T has m rows (edges), i ∈ {1, ..., n} and j ∈ {1, ..., n}. In table R, p counts the number of paths and v the value (e.g., distance or capacity) of a path between two vertices. For practical reasons, we assume that there is a recursion depth threshold k.

2.2 Problems

In this paper, we study queries of the form:

  R = R ∪ (R ⋈ T).    (1)

In (1), the result of R ⋈ T gets added to R itself. Since R is joined once with T, recursion is linear. Table R is joined with T based on some comparison between R.j and T.i. The most common predicate is based on the equi-join R.j = T.i (finding connected vertices). Within linear recursive queries, the most well-known problem is computing the transitive closure of G, which accounts for most practical problems [2]. We focus on computing the transitive closure of G and the power matrix A^k in SQL. Both problems are similar, but their solution with relational queries is slightly different. The transitive closure G+ computes all vertices reachable from each vertex in G and is defined as G+ = (V, E'), where E' = {(i, j) s.t. there exists a path between i and j}. That is, G+ is a new graph with the same vertices, but new edges representing connectivity between two vertices. The power matrix A^k (A multiplied by itself k times) contains the number of paths of length k between each pair of vertices, defined as A^k = ∏_{I=1}^{k} A. The power matrix can answer questions like: given a part, how many subparts does it have and what is their total cost? How many paths are there between two cities below a certain distance and number of intermediate cities visited? How many flights with no more than two stops are there between city X and city Y, and which one is the cheapest?

2.3 Recursive Views

In this paper, we focus on optimizing the SQL recursive view introduced below. Our discussion is based on tables T and R. The standard mechanism to define recursive queries in the DBMS is a recursive view. We omit syntax for an equivalent SQL construct for derived tables. A recursive view has one or more base (seed) SELECT statements without recursive references and one or more recursive SELECT statements. Linear recursion is specified by a join in a recursive SELECT statement, where the declared view name appears once in the "FROM" clause. In general, the recursive join condition can be any comparison expression, but we focus on equality (i.e., equi-join). To avoid long runs with large tables, infinite recursion with cyclic graphs, or infinite recursion with an incorrectly written query, it is advisable to add a "WHERE" clause to set a threshold on recursion depth (k, a constant). The statement without the recursive join is called the base step (also called seed step [3], [15]) and the statement with the recursive join is termed the recursive step. Both steps can appear in any order, but for clarity, the base step appears first.

We define recursive views for the two problems introduced in Section 2. The following view R computes the transitive closure of a graph G stored as an adjacency list in T with a maximum recursion depth k. Columns i, j, p, v are qualified to avoid ambiguity. The view computes the length/cost v of each path. R is the fundamental linearly recursive view.
CREATE RECURSIVE VIEW R(d, i, j, p, v) AS (
  SELECT 1, i, j, 1, v FROM T          /* base step */
  UNION ALL
  SELECT d + 1, R.i, T.j, R.p * T.p, R.v + T.v
  FROM R JOIN T ON R.j = T.i           /* recursive step */
  WHERE d < k
);

Based on R, the transitive closure TC (i.e., G+) is computed as follows. We consider DISTINCT as an optional clause to eliminate duplicates.

CREATE VIEW TC AS (
  SELECT DISTINCT i, j FROM R
);
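The CREATE RECURSIVE VIEW syntax above is the ANSI form and is not accepted verbatim by every engine; a widely available equivalent is a recursive common table expression. The following sketch is an assumption-based illustration using SQLite's WITH RECURSIVE and a small invented five-vertex graph containing the cycle 2→3→5→2; it mirrors the base and recursive steps of view R and then the TC view.

```python
import sqlite3

# A small directed graph stored as adjacency list T(i, j, p, v);
# p is a path count (1 per edge) and v an edge weight.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T(i INT, j INT, p INT, v INT, PRIMARY KEY(i, j))")
con.executemany("INSERT INTO T VALUES(?, ?, 1, ?)",
                [(1, 2, 1), (2, 3, 1), (3, 5, 1), (5, 2, 1), (1, 4, 1)])

k = 5  # recursion depth threshold, as in the view's WHERE d < k

# R(d,i,j,p,v): base step = T at depth 1; the recursive step joins R
# with T on R.j = T.i, mirroring the recursive view definition.
rows = con.execute("""
    WITH RECURSIVE R(d, i, j, p, v) AS (
        SELECT 1, i, j, 1, v FROM T               -- base step
        UNION ALL
        SELECT d + 1, R.i, T.j, R.p * T.p, R.v + T.v
        FROM R JOIN T ON R.j = T.i                -- recursive step
        WHERE d < ?
    )
    SELECT DISTINCT i, j FROM R ORDER BY i, j     -- the TC view
""", (k,)).fetchall()
print(rows)
```

The depth bound d < k is what keeps the query finite despite the cycle; with UNION ALL and no bound, the CTE would enumerate paths around the cycle forever.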




The PM view, based on R, returns A, A^2, ..., A^k, and the second statement returns A^k only:

CREATE VIEW PM AS (
  SELECT d, i, j, sum(p) AS p, max(v) AS v
  FROM R GROUP BY d, i, j
);
SELECT * FROM PM WHERE d = k;   /* A^k */

This SQL code computes the total number of paths (sum(p)) at a certain depth (d) and the maximum length among all paths with respect to v (max(v)). PM can be applied as follows: for instance, we can compute the number of paths at a certain recursion depth d or across all depths (sum(p)), the total cost/distance v for all paths (sum(v)), or the longest/shortest length for all paths between two vertices with respect to their value v (max(v)). Thus, the power matrix provides additional information about R (i.e., about G+). In general, the user can write queries or define additional views on R, treating it as any other table/view. Recursion must be linear; nonlinear recursion is not allowed (i.e., view name R appearing twice or more times in the "FROM" clause). Recursive views have several constraints. There must be no "group by," "distinct," "having," "not in," "outer join," or "order by" clauses inside the view definition. However, such syntactic constructs can appear outside, in any query calling the view, leaving the optimization task open for the query optimizer. Recursive views cannot be nested, to avoid indirect infinite recursion by mutual reference.
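The PM aggregation can be exercised end to end with a recursive CTE standing in for view R. In this sketch, the four-edge graph (chosen so that there are two parallel length-2 paths from vertex 1 to vertex 4, with costs 4 and 3), the depth bound, and the use of SQLite are all illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T(i INT, j INT, p INT, v INT, PRIMARY KEY(i, j))")
# Two parallel length-2 paths from 1 to 4: 1->2->4 (cost 1+3) and 1->3->4 (cost 2+1).
con.executemany("INSERT INTO T VALUES(?, ?, 1, ?)",
                [(1, 2, 1), (1, 3, 2), (2, 4, 3), (3, 4, 1)])

# R as in Section 2.3, then PM = GROUP BY d,i,j with sum(p) and max(v).
pm = con.execute("""
    WITH RECURSIVE R(d, i, j, p, v) AS (
        SELECT 1, i, j, 1, v FROM T
        UNION ALL
        SELECT d + 1, R.i, T.j, R.p * T.p, R.v + T.v
        FROM R JOIN T ON R.j = T.i
        WHERE d < 4
    )
    SELECT d, i, j, sum(p) AS p, max(v) AS v
    FROM R GROUP BY d, i, j
""").fetchall()

# The A^2 entries: for (1,4) there are two paths, the longest of cost 4.
a2 = [r for r in pm if r[0] == 2]
print(a2)
```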

Fig. 1 illustrates the definitions. Table T stores all the edges in G. Notice that G has a cycle (2,3,5). TC shows all vertices reachable from i excluding cycles; the longest path in this case has three edges. PM shows A^2; PM states that there are two paths (p = 2) with two edges (d = 2) between vertices 1 and 5 whose maximum v = 5 (e.g., distance).

3 ALGORITHMS TO EVALUATE RECURSIVE QUERIES

We introduce SQL implementations of two classic algorithms to evaluate recursive queries: Seminaive and Direct. Our proposed SQL implementations are based only on relational SQL queries and do not depend on any specific data structures or database system architecture.

3.1 Seminaive Algorithm to Evaluate a Recursive Query

The standard algorithm to evaluate a recursive query comes from deductive databases and is called Seminaive [2], [3]. Seminaive solves a broad class of recursive problems called fixpoint equations [2], [1]. Let Rd be the result table after step d, where d = 1 ... k. The base step produces R1 = T. The recursive steps produce R2 = T ⋈ T = R1 ⋈_{R1.j = T.i} T, R3 = T ⋈ T ⋈ T = R2 ⋈_{R2.j = T.i} T, and so on. In general, Rd+1 = Rd ⋈_{Rd.j = T.i} T. For each recursive step, the join condition is Rd.j = T.i. In each recursive step, projection (π) is needed to make partial result tables union-compatible. Projection computes d = d + 1, i = Rd.i, j = T.j, p = Rd.p * T.p, and v = Rd.v + T.v at each iteration:

  Rd+1 = π_{d,i,j,p,v}(Rd ⋈_{Rd.j = T.i} T).    (2)

To simplify notation from (2), sometimes we write neither π nor the join condition between R and T: Rd+1 = Rd ⋈ T. Finally, R = R1 ∪ R2 ∪ ... ∪ Rk. If Rd becomes empty because no rows satisfy the join condition, then query evaluation stops (i.e., R reaches a fixpoint [1], [22]). The query evaluation plan is a deep tree with k - 1 levels. The tree has k leaves with operand table T and k - 1 nodes with a ⋈ between Rd and T. In practical terms, the plan consists of a while loop of k - 1 joins, assuming recursion bounded by k. The following SQL code implements Seminaive. Cycles are filtered out to avoid double counting paths.
INSERT INTO R1 /* base step */
SELECT 1, i, j, 1, v FROM T;
WHILE |Rd| > 0 DO
  INSERT INTO Rd+1 /* recursive step */
  SELECT d + 1, Rd.i, T.j, Rd.p * T.p, Rd.v + T.v
  FROM Rd JOIN T ON Rd.j = T.i
  WHERE (Rd.i <> Rd.j) AND d < k;
  INSERT INTO R SELECT * FROM Rd+1;
  d = d + 1;
END;

3.2 Direct Algorithm to Evaluate a Recursive Query

Direct algorithms were adapted to evaluate a transitive closure query in a database system [2]. Such algorithms are called direct because their termination does not depend on path length; each vertex is processed once. We adapt the Warshall algorithm [25] (introduced as the fundamental direct algorithm [2]) to get the transitive closure of G in SQL, based on a modified, but fast, binary matrix multiplication. There is another direct version with better I/O characteristics [2] when implemented in a high-level language like C, but it requires two passes with careful row blocking, making it less efficient when programmed in SQL. Assume that we are manipulating G as the binary adjacency matrix A defined in Section 2. The goal is to leave G+ stored "in place" in A. We first write the Direct algorithm with relational operators in order to program it with SQL queries. The expression π(RK) below makes the partial result union-compatible and performs the following computations: d = Ri.d + Rj.d, i = Rj.i, j = Ri.j, p = Ri.p * Rj.p, and v = Ri.v + Rj.v.

R = T
for K = 1 to n do
  Ri = σ_{i=K}(R); Rj = σ_{j=K}(R)
  RK = Ri ⋈_{Ri.i = Rj.j} Rj
  S = π_{d,i,j,p,v}(RK)
  R = R ∪ S
end
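The relational-operator loop above can be mirrored with plain edge sets to make the predecessor/successor join concrete. This is only a sketch over (i, j) pairs: it ignores columns p and v, and the five-edge example graph (with cycle 2→3→5→2) is an invented illustration.

```python
# Direct (Warshall) transitive closure over an edge set, mirroring the
# relational-operator loop: for each vertex K, join the edges arriving
# at K (Rj, "column K") with the edges leaving K (Ri, "row K") and
# union the result into R.
def warshall_tc(edges):
    """edges: set of (i, j) pairs; returns the edge set of G+."""
    R = set(edges)
    n = max(max(i, j) for (i, j) in edges)
    for K in range(1, n + 1):
        Ri = {(i, j) for (i, j) in R if i == K}   # edges leaving K
        Rj = {(i, j) for (i, j) in R if j == K}   # edges arriving at K
        # Link each predecessor of K with each successor of K; the set
        # union plays the role of duplicate elimination (the ∪ above).
        R |= {(a, b) for (a, _) in Rj for (_, b) in Ri}
    return R

# Cycle 2->3->5->2 plus edges 1->2 and 1->4.
G = {(1, 2), (2, 3), (3, 5), (5, 2), (1, 4)}
print(sorted(warshall_tc(G)))
```

Note that, as the text explains, vertices on a cycle gain self-edges such as (2, 2): the predecessors and successors of a vertex on a cycle overlap.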


Fig. 1. Example: input table T for G with n = 5, m = 7; TC (i ≤ 3, k = 5); and PM for A^2 (d = 2).



There is no conditional logic to stop in the for loop. The algorithm traverses G, one vertex at a time. Old edges from R are replaced with new edges (i.e., they are left "in place" in R). The join between Ri (the Kth row of A) and Rj (the Kth column of A), for K = 1 ... n, links the predecessors of vertex K with its successors; this is the crucial step. The union (∪) eliminates duplicates in case the edge (i, j) already exists. The number of paths computed as p = Ri.p * Rj.p can be explained by the number of paths between Rj.i and Rj.j and the number of paths between Ri.i and Ri.j when Rj.i and Ri.j are linked by a new edge. Compared to Seminaive, Direct does not need k, but it needs n iterations. However, a path length threshold k may be included to avoid a large number of paths in dense graphs. If some vertex J does not have predecessors, then no edge is added by the join operator. Similarly, no edge is added if J has no successors. In Direct, it is not possible to filter out cycles, because that would produce incorrect results.

The SQL code below implements Direct. n is computed as the maximum of i and j. Duplicates and embedded cycles are not eliminated by default. Nevertheless, the path count p is set to zero for cycles to make the count a tighter bound for paths with embedded cycles. To avoid getting a large number of edges in dense/cyclic graphs and to get the same output as Seminaive, the WHERE clause includes k (i.e., a maximum path length). Notice that K and k have a different meaning.

SELECT max(CASE WHEN i > j THEN i ELSE j END) AS n FROM T;
INSERT INTO R SELECT 1, i, j, 1, v FROM T;
FOR K = 1 TO n DO
  INSERT INTO R
  SELECT Ri.d + Rj.d, Rj.i, Ri.j
       , CASE WHEN Rj.i <> Ri.j THEN Ri.p * Rj.p ELSE 0 END
       , Ri.v + Rj.v
  FROM R Ri JOIN R Rj ON Ri.i = Rj.j
  WHERE Ri.i = K AND Rj.j = K AND Ri.d + Rj.d <= k;
END;
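When Seminaive's depth bound covers the longest path of interest, both algorithms yield the same transitive closure. The following self-contained check is an illustration, not the paper's implementation: both routines track only (i, j) pairs (ignoring p and v), the five-edge cyclic graph is invented, and k = 10 is an arbitrary bound.

```python
def seminaive_tc(edges, k):
    """Seminaive on (i, j) pairs: R1 = T; Rd+1 = Rd join T on Rd.j = T.i;
    stop at the fixpoint (no new edges) or after k steps."""
    T = set(edges)
    R = set(T)           # accumulated result, R1 ∪ R2 ∪ ...
    Rd = set(T)          # partial result of the current step
    for _ in range(k - 1):
        Rd = {(i, b) for (i, j) in Rd for (a, b) in T if j == a}
        new = Rd - R
        if not new:      # fixpoint reached: no edge pair is new
            break
        R |= new
    return R

def direct_tc(edges):
    """Direct (Warshall): one pass over vertices, edges left in place."""
    R = set(edges)
    n = max(max(i, j) for (i, j) in edges)
    for K in range(1, n + 1):
        preds = {i for (i, j) in R if j == K}
        succs = {j for (i, j) in R if i == K}
        R |= {(a, b) for a in preds for b in succs}
    return R

G = {(1, 2), (2, 3), (3, 5), (5, 2), (1, 4)}  # contains the cycle 2->3->5->2
print(seminaive_tc(G, 10) == direct_tc(G))
```

The fixpoint test on new edge pairs (rather than an empty Rd) is what lets the Seminaive sketch terminate on the cyclic graph even when k is larger than needed.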

4 QUERY OPTIMIZATION

We focus on optimizing queries based on the recursive view R, introduced in Section 2.3. Such queries can include any valid SQL clause, treating the recursive view R as an input table. In particular, some queries refer to the transitive closure computation (TC) and the rest refer to the power matrix (PM). We study five optimizations:
1. storage and indexing for efficient join computation;
2. early selection;
3. early evaluation of nonrecursive joins;
4. pushing duplicate row elimination; and
5. pushing aggregation.
Our proposed optimizations cover SPJA queries, which are the most common queries in relational database systems. Storage and indexing is an essential aspect in a database system to efficiently process queries in SQL. All optimizations are presented and compared for the Seminaive algorithm and the Direct algorithm. Optimizations involve defining adequate storage and indexing of T and R, rewriting queries through sound transformations, and changing the order of evaluation of relational operations.

4.1 Storage and Indexing

We study the fundamental aspect of query processing: storage and indexing of table T and partial result tables Rd for efficient join computation. We consider graphs with and without cycles and of varying connectivity. We first discuss storage and indexing for Seminaive and then for Direct.

For Seminaive, we study two schemes. Let k be the recursion depth threshold. Recall that if the partial result table Rd becomes empty, then recursion stops sooner, at step d < k. The storage and indexing schemes explained below are defined based on two facts: 1) table T is used as a join operand k - 1 times and 2) table R is used as a join operand k - 1 times, retrieving rows from Rd-1. The final result table R is computed as R = ∪_d Rd (see Section 2). We now discuss the two schemes.

Scheme 1 allows efficient retrieval by vertex. This scheme pays particular attention to the efficient retrieval of rows to evaluate the recursive join. For the two problems, the join expression is R.j = T.i, where i and j are columns in both tables. Therefore, in T, all edges with the same value for i are stored together (clustered) on the same logical address (block) and T has a clustered index on i. On the other hand, R has all rows corresponding to vertex j on the same logical address (block) and R has an index on j. That is, rows from T are clustered by i and rows from R are clustered by j. Scheme 1 allows nonunique join values corresponding to multiple edges to be retrieved in fewer I/Os in both cases. This storage and indexing scheme is efficient for acyclic graphs (e.g., trees and lists), but it can be inefficient for graphs whose transitive closure computation generates many duplicates.

Scheme 2 enables efficient retrieval by edge. Edges having the same origin or destination vertex are not clustered. T has rows stored and indexed by both columns i, j, whereas R has rows stored and indexed by d, i, j. Scheme 2 can manage multiple edges more efficiently when G is highly connected, but it ignores the join condition for vertex connectivity. If there are multiple paths between i and j or there are cycles, this scheme is more efficient to store and retrieve repeated edges.

The justification behind Scheme 1 is that T and R are optimally indexed to perform a hash join based on R.j = T.i. But having many rows satisfying the condition for each value of i may affect join performance because of hashing collisions. On the other hand, having few rows (in particular, one or zero) satisfying the join condition can improve hash join performance. In Scheme 2, each recursive join cannot take advantage of the index because the join condition differs from the indexed columns, but each row can be identified uniquely and efficiently. In that case, the query optimizer uses a merge join, making a full scan on both tables R and T at each step. However, only rows from Rd are selected before the join.

In a similar manner to Seminaive, we now present two storage and indexing schemes for the Direct algorithm. For Direct, R is joined with itself n times and T does not participate in those joins. That is, T is only used for initializing R. Therefore, the impact of T on performance is marginal. Scheme 1 allows efficient retrieval by vertex.
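The two Seminaive schemes can be sketched as DDL. This is an assumption-laden illustration: SQLite has no clustered indexes, so a WITHOUT ROWID table (stored physically ordered by its primary key) stands in for the paper's clustered storage, and all table and index names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Scheme 1 (retrieval by vertex): cluster/index T on i and R on j,
# matching the recursive join condition R.j = T.i.
con.execute("""CREATE TABLE T1(i INT, j INT, p INT, v INT,
               PRIMARY KEY(i, j)) WITHOUT ROWID""")   # clustered by i first
con.execute("CREATE TABLE R1(d INT, i INT, j INT, p INT, v INT)")
con.execute("CREATE INDEX R1_j ON R1(j)")

# Scheme 2 (retrieval by edge): index T on (i, j) and R on (d, i, j).
con.execute("CREATE TABLE T2(i INT, j INT, p INT, v INT)")
con.execute("CREATE INDEX T2_ij ON T2(i, j)")
con.execute("CREATE TABLE R2(d INT, i INT, j INT, p INT, v INT)")
con.execute("CREATE INDEX R2_dij ON R2(d, i, j)")

# Under Scheme 1, the recursive join R.j = T.i can be answered by an
# index search rather than a full scan on the inner table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM R1 JOIN T1 ON R1.j = T1.i").fetchall()
print(plan)
```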




Therefore, R rows are clustered on the same logical address (block) by vertex and R has an index on i. In other words, all edges for the same vertex i are clustered together on secondary storage. Thus, Scheme 1 allows efficient retrieval of all edges departing from one vertex i. During the n iterations, we need to efficiently select the ith row and the jth column from the binary matrix represented by R. Therefore, Scheme 1 allows efficient retrieval by one subscript (i), but not when using the other one (j). In an analogous manner to Seminaive, Scheme 2 is designed for efficient edge retrieval. During evaluation, all repeated edges are stored on the same address. Each row in R is indexed on vertices (i, j), allowing duplicates. Scheme 2 allows the efficient elimination of duplicate edges during evaluation, since they are stored together by the previous iteration.

In summary, Seminaive considers storage and indexing for T, whereas T is not important for Direct. On the other hand, in Seminaive, R is joined with T multiple times, whereas in Direct, R is joined with itself. Therefore, in Seminaive, R and T have index definitions optimized based on the join condition R.j = T.i, whereas Direct has index definitions only to join R with itself, based on a different join condition: Ri.i = Rj.j.

4.2 Early Selection

Pushing selection through recursion is one of the most well-studied aspects of recursive query optimization [2], [18]. Here, we revisit the problem. Early selection may be used when there is a "WHERE" clause specifying a filter condition on columns from R. Evaluating selection earlier is possible for Seminaive, but not for Direct. We first explain how to evaluate selection predicates earlier for Seminaive, and then we explain why this is not possible for Direct. When G has cycles, the recursion can be infinite; this is a practical problem for many database applications. Therefore, we emphasize the use of a "WHERE" clause with d ≤ k, because it is the only way to guarantee that a recursive query will stop, in general. The queries we study are of the form:

SELECT i, j, p, v FROM R WHERE <condition>;

By default, the WHERE clause is evaluated at the end of recursion, producing correct results.
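The effect of pushing an equality predicate on i into the base step can be checked mechanically. In this sketch, the recursive CTE (SQLite syntax) stands in for view R and the four-edge acyclic graph is an invented illustration; both formulations return the same answer, but the early version keeps every intermediate table restricted to rows with i = 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T(i INT, j INT, v INT)")
con.executemany("INSERT INTO T VALUES(?, ?, 1)",
                [(1, 2), (2, 3), (3, 4), (5, 6)])

# Late selection: filter i = 1 only after the recursion finishes.
late = con.execute("""
    WITH RECURSIVE R(d, i, j, v) AS (
        SELECT 1, i, j, v FROM T
        UNION ALL
        SELECT d + 1, R.i, T.j, R.v + T.v
        FROM R JOIN T ON R.j = T.i WHERE d < 5)
    SELECT DISTINCT i, j FROM R WHERE i = 1
""").fetchall()

# Early selection: the same predicate pushed into the base step.
early = con.execute("""
    WITH RECURSIVE R(d, i, j, v) AS (
        SELECT 1, i, j, v FROM T WHERE i = 1    -- selection pushed down
        UNION ALL
        SELECT d + 1, R.i, T.j, R.v + T.v
        FROM R JOIN T ON R.j = T.i WHERE d < 5)
    SELECT DISTINCT i, j FROM R
""").fetchall()

print(sorted(late), sorted(early))
```

Pushing the predicate is safe here precisely because i is propagated unchanged by the recursive step (R.i appears in the projection but not in the join condition), which is the first case analyzed below.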
One of the general guidelines in traditional query optimization is to evaluate selection (σ) of rows and projection (π) as early as possible. The rationale behind this optimization is that a join (⋈) operation can then work on smaller tables, thus reducing work. This optimization involves transforming (rewriting) the given query into an equivalent query that is evaluated faster. This guideline also applies to recursive queries, but we distinguish two cases. The first case is given by a condition on the columns from the primary key of R other than d (i.e., i, j). The second case is given by a condition on the nonkey columns d, p, v, which change at each recursive step.

We explain the first case. If there is a "WHERE" condition on a column belonging to the primary key (i or j), and the column does not participate in the join condition, then the "WHERE" condition can be evaluated earlier. In this manner, each intermediate table is smaller. Let us recall the transitive closure view introduced in Section 2.3. Suppose we only want vertices reachable from i = 1:

SELECT i, j FROM R WHERE i = 1;

The clause "WHERE i = 1" can be evaluated earlier during the recursion. It can be evaluated at the base step and each recursive step, with caution, as explained below. Therefore, the earliest it can be evaluated is at the base step, to produce a subset of T stored in R1. This optimization propagates a reduction in the size of all intermediate tables Rd. Then, the base step of the recursive view SQL code, presented in Section 2.3, is rewritten as follows:

SELECT 1, i, j, v /* base step */
FROM T WHERE i = 1

Evaluating "WHERE i = 1" in the recursive step is tricky. First of all, i must be qualified. Using "WHERE T.i = 1" would produce incorrect results because it would only include vertex 1. Observe that the recursive step uses T.i in the "WHERE" clause, but not in the projected columns. Conversely, it uses R.i in the projected columns and not in the "WHERE" clause. Evaluating "WHERE R.i = 1" produces correct results because R.i is not part of the join condition, but in this case, it is redundant because the partial result table Rd only contains rows satisfying R.i = 1, propagated from the base step. Therefore, in this case, it is sufficient to evaluate selection on key i on the base step. This optimization cannot be applied to the next query:

SELECT d, i, j, sum(v) FROM R WHERE j = 1 GROUP BY d, i, j;

What hinders pushing the WHERE clause is that R.j is part of the join condition R.j = T.i. Even further, "WHERE T.j = 1" can be evaluated on neither the base step nor the recursive step.

A similar reasoning applies to more complex WHERE expressions, for instance, selecting a row/column from the power matrix A^k. Consider the query

SELECT d, i, j, sum(v) FROM R WHERE d = 10 AND i = 1 AND j = 1 GROUP BY d, i, j;

This query can be evaluated more efficiently by filtering with "WHERE T.i = 1" in the base step of R and "WHERE R.i = 1" at each recursive step. However, "WHERE R.i = 1" cannot be pushed into the base step because it uses R; "WHERE T.j = 1" cannot be pushed either.

We now discuss the second case, involving predicates on p, v. Row selection with general WHERE conditions on v is difficult to optimize, whereas conditions on d are easier to optimize. The corresponding WHERE clause may be pushed into both the base and recursive steps, depending on how v is computed. We distinguish two possibilities: v is recursively computed (with addition or product), or v is not recursively computed when it is a property of vertex i or vertex j (e.g., employee age). If there is no recursion depth k and the filter condition is of type "WHERE v ≤ vU" and v is recursively incremented, then the query can stop at some step. If all T rows satisfy v > 0 and v is incremented at each step, then the query will stop. But if there exist rows such that v = 0 or v < 0, then the query may not stop. Only in the case that v > 0 for all rows and v increases monotonically can we evaluate "WHERE v ≤ vU" at each recursive step. By a similar




reasoning, if the condition is "WHERE v ≥ vL" and v > 0 in every row, then the query may continue indefinitely; thus "WHERE v ≥ vL" cannot be evaluated at each recursive step. For instance, the power matrix may produce an infinite recursion for cyclic graphs when selecting rows with "WHERE v > 0" and d not having a threshold k. The transitive closure will eventually stop when the longest path between two vertices is found if there are no cycles, but it may produce an infinite recursion if there are cycles. If v is not recursively computed, then v may increase or decrease after each recursive step; then it is not possible to push the "WHERE" predicate because discarded rows may be needed to compute future joins. We can think of d and p as particular cases of v. Depth d monotonically increases at each recursive step since it is always incremented by 1. The number of paths will be one without pushing aggregation and p ≥ 1 when aggregation is pushed. The filter expression "WHERE d ≤ k" sets a limit on recursion depth, and then query evaluation iterates at most k steps; this is the case we use by default because recursion is guaranteed to stop. With a WHERE predicate d ≥ k, recursive steps may continue beyond k, perhaps indefinitely; we assume that no recursive view is defined with such a condition. Also, "WHERE d = k" cannot be evaluated earlier because it would discard rows needed for future steps. As mentioned earlier, it is not possible to evaluate selection predicates earlier for the Direct algorithm. Direct depends on the entire graph computed so far at each iteration. If a selection predicate is applied earlier, it may filter out edges that may be needed by a future vertex (in the loop). To illustrate this limitation, assume that G is a list with vertices connected in descending order with edges {(n, n−1), (n−1, n−2), ..., (2, 1)}. Then, G is analyzed in inverse order by Direct, as presented before. If we applied the selection "WHERE i = n" at every iteration, we would be left with T at the end of the n iterations: no new edge would be added, i.e., G+ = T. This is because the successor list of every vertex would be empty, given the filter condition.

Therefore, the transitive closure would be incorrectly computed. On the other hand, if we applied the filter in some of the first iterations, we would eliminate edges that belong to the path departing from i = n. A cartesian product appearing in a recursive view can produce huge tables, since the size of R grows fast as recursion depth k grows. In general, this is caused by a user error: a join condition is missing or incorrectly written. Also, dense and complete graphs may lead to infinite recursion since each partial result table is nonempty. These potential issues support always including a recursion depth threshold (k) in the WHERE clause.

4.3 Evaluation of Nonrecursive Joins

This optimization is applicable when a query has a nonrecursive join between the recursive view R and another table (describing properties of vertices or edges in G), or when there are nonrecursive joins inside the view definition

of R. Nonrecursive joins can also be understood as external joins because they do not need to form part of the recursive definition, and therefore, they can be evaluated outside of the recursion. For graphs, such queries are useful to get vertex properties. Suppose we want to join R with another table N (N ≠ T) to get vertex names (e.g., given a city or product id, we need their full name). Assume that vertex names and other vertex properties are stored in table N, defined as N(i, name). There exist two strategies to get vertex names: 1) performing two joins between the final result table R and N to get names for i and j, without changing the recursive view definition, and 2) creating a denormalized table TN joining T with N and substituting TN for T inside R. Both strategies sit at two extremes: perform join evaluation late (as given in the query) or earliest (rewritten by the query optimizer). Therefore, Strategy 1 is called late nonrecursive join and Strategy 2 is called early nonrecursive join. We omit discussion of a third strategy that evaluates external joins at each iteration because it is straightforward and inefficient. We now discuss the query plan for each strategy in more detail. Strategy 1 is the simplest: R is evaluated first and the external join is evaluated at the end between R and N. Strategy 1 follows traditional optimization guidelines: a join operation should be evaluated last. Strategy 2 represents the other extreme. In this case, the join operation is pushed all the way through recursion and it is evaluated once, in the base (seed) step. The query plan is otherwise similar: we just substitute TN for T. Strategy 2 defies common query optimization principles: a join operation is evaluated first. If referential integrity is violated or there is a selection predicate on N, Strategy 2 is clearly the best choice since it filters out T rows before recursion. Our discussion focuses on the Seminaive algorithm, but this optimization applies in a similar manner to the Direct algorithm. We start by analyzing Strategy 1, which does not change R. Consider the following query to compute the transitive closure:

SELECT d, i, Ni.name, j, Nj.name
FROM R JOIN N AS Ni ON R.i = Ni.i
JOIN N AS Nj ON R.j = Nj.i;

After R is computed, we just perform two joins to get each vertex name; N must be aliased to avoid ambiguity. The I/O cost for this query mainly depends on the size of R because N is comparatively smaller. N can be optimally indexed on i, but there are several indexing choices for R based on combinations of {d, i, j}, as seen before. Strategy 1 gets names in a lazy manner, after R is computed. Since vertex names remain static (constant through recursion), we can optimize the query by performing an early nonrecursive join before the base step to create a denormalized table TN having vertex names for i and j. Then, the nonrecursive joins are pushed through R and avoided during recursion. Hence, we call Strategy 2 early nonrecursive join. Strategy 2 may not always be more efficient than Strategy 1 (late nonrecursive join). Strategy 1 may be better when TN has much bigger rows than T or when there is also duplicate elimination. A wide denormalized table TN with many property columns for i and j will impact I/O. Notice that vertex names could potentially be retrieved from R1, since they are available for every edge after the base step, but that would require joining T with R twice, producing a nonlinear (quadratic) recursion. That is not feasible for SQL queries restricted to linear recursion.
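The two strategies can be sketched with a toy example. This is an illustrative sketch, not the paper's code: SQLite's `WITH RECURSIVE` stands in for the paper's unfolded Teradata scripts; tables T(i, j) and N(i, name) follow the paper's schemas, while the sample data, the depth bound `k`, and the TN CTE are assumptions made for the example.

```python
# Sketch: Strategy 1 (late nonrecursive join) vs. Strategy 2 (early
# nonrecursive join), using SQLite instead of the paper's DBMS.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T(i INT, j INT);      -- edges of G
CREATE TABLE N(i INT, name TEXT);  -- vertex properties
INSERT INTO T VALUES (1,2),(2,3),(3,4);
INSERT INTO N VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');
""")
k = 4  # hypothetical recursion depth threshold

# Strategy 1: evaluate R first, then two joins with N (aliased twice).
late = con.execute("""
WITH RECURSIVE R(d, i, j) AS (
  SELECT 1, i, j FROM T
  UNION ALL
  SELECT d + 1, R.i, T.j FROM R JOIN T ON R.j = T.i WHERE d < ?
)
SELECT d, R.i, Ni.name, R.j, Nj.name
FROM R JOIN N AS Ni ON R.i = Ni.i JOIN N AS Nj ON R.j = Nj.i
ORDER BY d, R.i, R.j""", (k,)).fetchall()

# Strategy 2: denormalize T with N once, before recursion, and carry
# the names through every recursive step.
early = con.execute("""
WITH RECURSIVE TN(i, iname, j, jname) AS (
  SELECT T.i, Ni.name, T.j, Nj.name
  FROM T JOIN N AS Ni ON T.i = Ni.i JOIN N AS Nj ON T.j = Nj.i
),
R(d, i, iname, j, jname) AS (
  SELECT 1, i, iname, j, jname FROM TN
  UNION ALL
  SELECT d + 1, R.i, R.iname, TN.j, TN.jname
  FROM R JOIN TN ON R.j = TN.i WHERE d < ?
)
SELECT d, i, iname, j, jname FROM R ORDER BY d, i, j""", (k,)).fetchall()

assert late == early  # both strategies return the same rows
```

The trade-off described above shows up directly: Strategy 2 joins once over the small T, but every intermediate row carries the extra name columns.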

ORDONEZ: OPTIMIZATION OF LINEAR RECURSIVE QUERIES IN SQL 269


4.4 Pushing Duplicate Row Elimination

We consider cycle detection and eliminating duplicate cycles as a special case of this optimization. We first discuss Seminaive. Consider the problem of computing the transitive closure of G when we are not interested in v, the weight/distance of each path: we just want to know all vertices reachable from each vertex. Refer to the recursive view given in Section 2.3; that query provides the answer. Query efficiency is affected by how connected G is. If G is dense or complete, then there are many paths for each pair of vertices. If G is cyclic, then there are probably two or more paths between vertices. This will produce duplicate rows that, in turn, will increase the size of partial tables after each recursive step. On the other hand, if G is acyclic, then there are fewer paths with less impact on join performance. In particular, if G is a tree, there is only one or no path between pairs of vertices, resulting in good join performance without applying this optimization. Pushing duplicate elimination works as follows: duplicate rows are eliminated at each iteration, instead of doing it at the end of the recursion. Using DISTINCT or a count() aggregation grouping by i, j is equivalent. If there are duplicate rows in any intermediate step, this optimization reduces the size of temporary tables. If there are no duplicate rows, this optimization has no effect on table sizes. The impact of this optimization will depend on the type of graph. A last "SELECT" statement on R (with DISTINCT or GROUP-BY) is required at the end of recursion to get all distinct rows regardless of path length. We now turn our attention to Direct. In the Direct algorithm, duplicates can also be eliminated at each iteration. In SQL, this can be done using GROUP-BY, DISTINCT, or set containment. Duplicates can be efficiently eliminated exploiting an index on either (i, j) for the transitive closure or (d, i, j) for the power matrix. The impact of this optimization will depend on the type of graph, as with Seminaive. We now explain how to manage cyclic graphs, which have multiple paths at deeper recursion levels. In the Seminaive algorithm, duplicate cycles (i.e., an edge (i, i)) are eliminated to avoid double counting paths. In such a case, existing cycles are not extended with new edges. However, the shortest cycles are detected and stored on R. On the other hand, for the Direct algorithm, it is not straightforward to filter out cycles during iterations without producing incorrect results. This is due to the fact that a new edge may create an embedded cycle, and such an edge cannot be formed without the corresponding vertex otherwise. Therefore, the SQL queries do not exclude the embedded cycle Ri.i, Ri.j or Rj.i, Rj.j when i = j, as the Seminaive algorithm does. However, duplicate cycles are eliminated to

achieve a partial reduction in size.

4.5 Pushing Aggregation

Recall that the power matrix Ak is computed from R as:

SELECT d, i, j, sum(p), max(v)
FROM R
WHERE d = k
GROUP BY d, i, j;

We are concerned with efficient evaluation of aggregation queries on the recursive view R. Our discussion is based on the power matrix problem, defined in Section 2. Notice that pushing aggregation can be used to eliminate duplicates (e.g., by simply computing count() for each group). Therefore, pushing aggregation is a generalization of pushing duplicate elimination. We propose evaluating the "group by" clause and the aggregate function at every iteration, when possible, instead of doing it at the end of the recursion. This optimization is applicable to all standard SQL aggregations (count(), sum(), min(), max()) and when the given "group by" clause includes both i and j. For Seminaive, the GROUP-BY clause is evaluated at each step, both base and recursive. On the other hand, this optimization can also be applied to Direct by pushing the GROUP-BY at each iteration, but it requires careful interpretation when applied to cyclic graphs. The equivalence between the nonoptimized and optimized queries results from the distributive laws [8] of the arithmetic operations + and *. For Seminaive, the equivalent query for the base step is:

SELECT 1 AS d, i, j, sum(p), max(v)
FROM T /* base step */
GROUP BY d, i, j;

In general, the base step query produces no performance improvement if there are no duplicate keys (i, j) in T. The equivalent query evaluated at each recursive step is:

SELECT d + 1 AS d, R.i, T.j, sum(R.p * T.p) AS p, max(R.v + T.v) AS v
FROM R JOIN T ON R.j = T.i /* recursive step */
WHERE R.d < k
GROUP BY d, R.i, T.j;

Going back to the transitive closure, this optimization is applicable to the following query since it involves the primary key of R. For instance, consider that the query computes the longest distance (based on v) between two locations at each depth. This is useful when there are two or more paths between locations:

SELECT d, i, j, max(v)
FROM R
GROUP BY d, i, j;

This optimization can also be applied if the grouping is done on i, j but not d. In practical terms, this is the same case as having "group by" on all the primary key columns of R. The important fact is that each step uses the primary key of the required partial aggregation table, but a "group by" at the end is required anyway. Also, each recursive step must still store partial results at depth d = 1 ... k. In the following query, the maximum of v can be computed for every pair of vertices at each depth by pushing "group by i, j." The final aggregation gets the maximum across all depths:

SELECT i, j, max(v)
FROM R
GROUP BY i, j;

This optimization is not directly applicable when the grouping columns do not include (i, j), but it can be partially applied using all grouping columns to eliminate redundant rows (if any). For our two problems, that means partially grouping by either i or j. Examples are computing the total sum of salaries of all employees under each manager or finding the most expensive/cheapest subpart of


each part. Such computations require "carrying" the aggregated salary of each subemployee or the aggregated subpart cost at each step for the future aggregation. Consider the query based on the modified transitive closure view R, using v = T.v instead of v = R.v + T.v:

SELECT i, max(v)
FROM R
GROUP BY i;

Performing an early "GROUP BY i" would incorrectly eliminate rows with different paths from i to j. This would, in turn, hinder recursive joins on the condition R.j = T.i and would return a different number of rows from those in the view. Therefore, early aggregation with "GROUP BY i" is not possible because intermediate vertices at each recursion depth are needed to perform the next recursive step. However, "GROUP BY i, j" can be evaluated at each step, saving work by eliminating redundant rows; that is the case if there are two or more paths between i and j. Therefore, if the query has "GROUP BY i" or "GROUP BY j," then "GROUP BY i, j" is pushed. This optimization is applicable to any distributive aggregation [8]. We now consider the transitive closure, where we only want to know all vertices reachable from each vertex. Pushing aggregation has the desirable effect of eliminating duplicate rows at each step because each group represents a partition of the partial table Rd. Hence, pushing aggregation can be used to eliminate duplicates and accelerate computation when there are many. In general, a query optimizer evaluates "SELECT DISTINCT" with a different plan from "SELECT/GROUP BY": distinct rows are generally obtained by performing a sort, whereas computing aggregation requires creating a temporary table indexed by the grouping columns. We now turn our attention to the Direct algorithm. For Direct, we can evaluate the GROUP-BY at each iteration using a temporary table RT to store the aggregation results and then replacing R with RT, leaving the compressed table "in place." This optimization reduces the size of R when there are multiple paths between vertices. We omit this SQL code. Evaluating aggregations with the Direct algorithm requires careful interpretation of results. We distinguish two main cases: acyclic and cyclic graphs. For acyclic graphs, the Direct algorithm computes the exact number of paths p for any pair of vertices in the power matrix problem. If G is acyclic, but not a tree, it can be decomposed into trees. Therefore, pushing aggregation at each iteration may produce a reduction in the size of R. However, such a reduction is not as significant as it can be for cyclic graphs. On the other hand, if G is cyclic, then the Direct algorithm computes an upper bound on the number of paths between a pair of vertices if there is an embedded cycle. If there are no cycles along the path, then p is accurate. This is an effect of analyzing V in some fixed order (e.g., 1 ... n), which causes new edges to be added to R in a random order, producing embedded cycles. Then, when G is cyclic and dense (resembling a complete graph), the number of paths in R will grow as k increases, producing a lower p for vertices analyzed earlier and a higher p for vertices analyzed later. Unfortunately, it is not straightforward to eliminate embedded cycles without keeping track of all paths (storing all intermediate vertices), which would render a different algorithm. Table 1 summarizes our five optimizations and their applicability to each algorithm. Notice that duplicate elimination also includes cycle elimination.
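The per-step GROUP BY above can be sketched concretely. This is an assumed, simplified Seminaive-style loop in SQLite (not the paper's Teradata scripts); the temporary-table names `Rd`/`Rn`, the sample data, and the helper function are made up for the example, and only the sum(p) path count is shown (max(v) distributes the same way).

```python
# Sketch: pushing aggregation into each step vs. aggregating at the end.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T(i INT, j INT, p INT);  -- edges, each one path (p = 1)
-- two parallel length-2 paths from 1 to 4: 1->2->4 and 1->3->4
INSERT INTO T VALUES (1,2,1),(1,3,1),(2,4,1),(3,4,1);
""")
k = 3  # hypothetical recursion depth threshold

def seminaive(push):
    con.executescript("DROP TABLE IF EXISTS R; DROP TABLE IF EXISTS Rd;")
    if push:  # aggregation evaluated in the base step as well
        con.execute("CREATE TABLE Rd AS "
                    "SELECT 1 AS d, i, j, sum(p) AS p FROM T GROUP BY i, j")
    else:
        con.execute("CREATE TABLE Rd AS SELECT 1 AS d, i, j, p FROM T")
    con.execute("CREATE TABLE R AS SELECT * FROM Rd")
    for d in range(2, k + 1):
        if push:  # GROUP BY pushed into every recursive step
            con.execute("CREATE TABLE Rn AS "
                        "SELECT ? AS d, Rd.i, T.j, sum(Rd.p * T.p) AS p "
                        "FROM Rd JOIN T ON Rd.j = T.i GROUP BY Rd.i, T.j", (d,))
        else:     # plain join; duplicate (i, j) rows accumulate
            con.execute("CREATE TABLE Rn AS "
                        "SELECT ? AS d, Rd.i, T.j, Rd.p * T.p AS p "
                        "FROM Rd JOIN T ON Rd.j = T.i", (d,))
        con.execute("INSERT INTO R SELECT * FROM Rn")
        con.execute("DROP TABLE Rd")
        con.execute("ALTER TABLE Rn RENAME TO Rd")
    # the final aggregation is still required in both variants
    return con.execute("SELECT d, i, j, sum(p) FROM R "
                       "GROUP BY d, i, j ORDER BY d, i, j").fetchall()

# distributivity of + and * makes both plans return the same rows
assert seminaive(push=True) == seminaive(push=False)
```

With pushing, the two parallel paths collapse into one row (2, 1, 4, p=2) immediately instead of at the end.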

5 EXPERIMENTAL EVALUATION

5.1 Overview of Experiments

This section presents experiments on a computer running the Teradata DBMS V2R6. The system had one CPU running at 3.2 GHz, 4 AMPs (parallel virtual processors), 4 GB of main memory, and 256 GB of disk. We perform an experimental evaluation with four types of graphs: trees, lists, cyclic, and complete graphs, summarized in Table 2. Each type of graph is described in detail in Section 5.2. The first set of experiments compares Seminaive and Direct. The second set of experiments evaluates the impact of each optimization individually on simple queries. The third set of experiments evaluates all optimizations interacting together on complex queries. The fourth set of experiments analyzes scalability varying problem sizes: n, the number of vertices in G, and k, the maximum recursion depth. Each experiment was repeated five times and the average time measurement in seconds is reported. Due to the demanding nature of linear recursive queries, we had to carefully set defaults for optimizations. Storage and indexing by vertex was used for trees, lists, and cyclic graphs. Duplicate elimination and aggregation were not used by default for acyclic graphs; the rest of the optimizations were query-dependent. Each experiment recreated the input table T to use the default storage scheme and avoid caching; all tables were read "fresh" from disk for each query, avoiding any caching by the DBMS. In summary tables, the "opt" header indicates whether the optimization is turned on (Y) or off (N). Marked table entries mean that query evaluation could not end within 1 hour and had to be interrupted. An SQL code generator was implemented in the Java language. The recursive view was unfolded by creating a script of SQL statements to evaluate it. The program had parameters to specify the input tables and columns, pick an algorithm, and turn optimizations on/off. Query evaluation was performed using temporary tables for each step, populating each table with SELECT statements. Time


TABLE 1 Summary of Applicability of Optimizations
TABLE 2 Type of Graph G


measurements were obtained with SQL using time stamps for maximum accuracy.

5.2 Input Tables with Graphs

We study query optimization with synthetic graphs. Graphs G were generated by varying the number of vertices (n) and the number of edges (m) to get different types of graphs. Each edge becomes a row in table T; therefore, m = |T|. Four types of graphs were used. To evaluate the best case, we used balanced binary trees, where G has n − 1 edges (i, j) (j = 2 ... n, i = j/2 with integer division) and no cycles; the number of rows grows linearly as n increases, m = n − 1 = O(n). Lists represented a good case (slightly worse than trees) since G has no cycles, but G is an unbalanced tree. To evaluate a bad case, we used cyclic graphs with two random edges per vertex; the number of rows grows linearly as n increases, m = O(n), but the number of paths grows much faster as k increases. To evaluate the worst case, we created complete graphs having two directed edges for every pair of vertices (one from i to j and another from j to i). In complete graphs, the number of rows grows quadratically as n increases, m = n(n − 1) = O(n²). For cyclic and complete graphs, we avoided generating self-cycles (a loop from i to i), since that provided no insight. Data set characteristics are summarized in Table 2. The time complexity column describes the expected time growth when evaluating the query: the existence of cycles and a higher number of edges make R grow faster. In our tables, we include an approximate value of m to make comparisons and interpretation easier (i.e., 100K instead of 99,999). Notice that it is m, and not n, that is the actual size of T. In most experiments, we duplicate m (×2) or multiply m by 10 (×10), starting at m = 100K, to understand scalability and complexity. In general, the largest graphs have around 1M edges, which represents a computationally intensive problem. For binary trees and cyclic graphs, Aij entries equal to zero are not included in T, producing an automatic performance improvement when computing the power matrix Ak. This is based on the fact that aggregations on the full matrix A with zero entries are equivalent to aggregations on a "lean" matrix version excluding zeroes: the absence of row [i, j, v] from A means Aij = 0. Otherwise, all computations for Ak on trees and cyclic graphs would require the same time as complete graphs. Likewise, for the transitive closure, the absence of a row from T means that the corresponding edge is not present, as explained in Section 2.

5.3 Seminaive versus Direct

Table 3 compares both algorithms with all types of graphs at different recursion depths. The table includes the number of rows in T (m) and recursion depth (k) when used. For trees, lists, and cyclic graphs, we increment n 10-fold to identify important trends (recall n = |V|). For complete graphs, it was not possible to increment n in the same manner because the number of edges grows O(n²). Therefore, for complete graphs, we increment V linearly. For large cyclic graphs, it was not possible to finish execution within 1 hour when k > 10. For complete graphs, the situation was even worse, since we could not go beyond k = 5 despite G being relatively small. Evidently, the issue was the exponential growth of edges (paths) as k grows. The first general trend is that Seminaive is faster than Direct, except for complete graphs and lists using unbounded k. Both algorithms are impacted by recursion depth; time growth becomes significant for cyclic and complete graphs, highlighting the challenge they represent. Since binary trees are balanced, k = O(log2(n)), and then it is not necessary to bound k. Lists represent a deep recursion problem, where Seminaive ends up being slightly slower than Direct. There is a jump in time when k goes from 5 to 10 for cyclic graphs in Seminaive, and evaluation cannot end within 1 hour for Direct. Seminaive is slower than Direct for complete graphs, but it is not an order of magnitude slower. However, the time trend indicates that Seminaive will always be slower, with a gap widening as n grows.

5.4 Storage and Indexing

Table 4 summarizes results for transitive closure. In general, storage by vertex and an index based on the join condition provide the best performance for trees, lists, and cyclic graphs; the gap widens as n increases. Storage and indexing by edge is better for small cyclic graphs and complete graphs; the difference in performance is significant, but the gap narrows as n increases. Therefore, storage and indexing by vertex should be the default alternative (Scheme 1). Table 5 compares storage/indexing for Direct. Recursion depth k is unbounded for trees, k = 10 for lists, and k = 5 for cyclic graphs. Interestingly enough, the trend is the same as for Seminaive: row storage by vertex and an index on the join condition provide the best performance.

5.5 Early Selection

We study the performance gained by performing selection (filtering) of rows as early as possible. The queries are based


TABLE 3 Seminaive versus Direct for Transitive Closure (Times in Seconds)

on the transitive closure view with a selection predicate "WHERE i = n/2," where n = |V|. Such a condition provides high selectivity for all types of graphs. Table 6 analyzes early selection. In all cases, early row selection is faster. The explanation is that pushed selection works on a smaller table and reduces the sizes of all intermediate tables (as in traditional query optimization). For trees and lists, the gain in performance is small at low k, but becomes significant at deep recursion depth k. For cyclic graphs, this optimization makes Seminaive almost two orders of magnitude faster, and three orders of magnitude faster for complete graphs. Therefore, we conclude that this optimization is valuable in all cases. The impact of this optimization will depend on the selectivity of the condition being pushed, as in traditional SPJ queries, but combined with recursion depth. A highly selective filter condition that can be pushed into the base step will significantly improve evaluation time.

5.6 Early Nonrecursive Joins

The following experiments compare the strategies proposed in Section 4 to evaluate nonrecursive joins with the transitive closure recursive view. We concentrate on comparing nonrecursive join evaluation with late (Strategy 1) and early (Strategy 2) evaluation. Table 7 compares efficiency varying n and k. Table N had small records with i (integer) and a vertex description (char(10)). Such a table simulates retrieving one property for each vertex (e.g., a city name or part description). For trees, lists, and cyclic graphs, early join evaluation is more efficient than late evaluation. On the other hand, late join evaluation becomes the winner for complete graphs. This may sound counterintuitive because these results state that it is better to evaluate a join operation first, rather than at the end as in traditional SPJ queries. However, we must point out that if table N has large records and several columns are needed, this would hurt the performance of the early join, because each recursive step would generate larger intermediate tables. The explanation is that the nonrecursive join is evaluated once at the beginning of the recursion with small tables; the price is that each row in intermediate tables becomes bigger. On the other hand, if this join is evaluated at the end of the recursion, it works with "leaner," but much larger, tables.

5.7 Pushing Duplicate Elimination

The next set of experiments studies the impact of pushing duplicate row elimination, that is, doing it at each iteration instead of at the end of recursion. Table 8 summarizes results for the Seminaive algorithm computing the transitive closure. We consider cycle elimination as an extra feature of duplicate elimination. Eliminating cycles turns out to always work well: evaluation is faster or at most equal, but never slower. Therefore, it is a good idea to eliminate cycles when they are detected. In complete graphs, this optimization is not only good, but essential.
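The effect on complete graphs can be sketched on a tiny instance. This is an assumed, simplified Seminaive-style loop in SQLite (not the paper's code); the temporary-table names `Rd`/`Rn`, the 4-vertex complete graph, and the depth bound `k` are made up for the example.

```python
# Sketch: pushing DISTINCT into each step on a small complete graph.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T(i INT, j INT)")
n = 4  # complete directed graph on 4 vertices, no self-loops
con.executemany("INSERT INTO T VALUES (?, ?)",
                [(i, j) for i in range(1, n + 1)
                        for j in range(1, n + 1) if i != j])
k = 4  # hypothetical recursion depth threshold

def closure(push):
    con.executescript("DROP TABLE IF EXISTS R; DROP TABLE IF EXISTS Rd;")
    con.execute("CREATE TABLE Rd AS SELECT i, j FROM T")
    con.execute("CREATE TABLE R AS SELECT * FROM Rd")
    sizes = []  # rows produced at each recursive step
    for _ in range(k - 1):
        dedup = "DISTINCT " if push else ""
        con.execute(f"CREATE TABLE Rn AS SELECT {dedup}Rd.i, T.j "
                    "FROM Rd JOIN T ON Rd.j = T.i")
        con.execute("INSERT INTO R SELECT * FROM Rn")
        sizes.append(con.execute("SELECT count(*) FROM Rn").fetchone()[0])
        con.execute("DROP TABLE Rd")
        con.execute("ALTER TABLE Rn RENAME TO Rd")
    # a last SELECT DISTINCT is required in both cases
    final = con.execute("SELECT DISTINCT i, j FROM R ORDER BY i, j").fetchall()
    return final, sizes

r_push, s_push = closure(push=True)
r_plain, s_plain = closure(push=False)
assert r_push == r_plain          # same transitive closure either way
assert s_push[-1] < s_plain[-1]   # but intermediate tables stay small
```

Without pushing, the step sizes multiply by the outdegree every iteration (36, 108, 324, ...); with pushing, they stay at the number of distinct pairs (16).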


TABLE 4 Seminaive: Storage/Indexing Schemes for Transitive Closure (Times in Seconds)
TABLE 5 Direct: Storage/Indexing Schemes for Transitive Closure (Times in Seconds)
TABLE 6 Seminaive: Early Selection for Transitive Closure (Times in Seconds)
TABLE 7 Seminaive: Nonrecursive Join Evaluation for Transitive Closure (Times in Seconds)


Without duplicate elimination, evaluation becomes slow, and with large graphs, it becomes extremely slow. Table 9 analyzes duplicate elimination for Direct. For cyclic graphs, we used smaller graphs (n = 1,000) because eliminating duplicates with larger graphs took several hours. Pushing duplicate elimination is expensive in large acyclic graphs: in trees, the time to evaluate queries pushing duplicate elimination almost triples. The trends are similar to Seminaive: eliminating duplicates is required for complete graphs.

5.8 Pushing Aggregations

The following experiments show the impact made by pushing aggregations. Table 10 analyzes the power matrix. For acyclic graphs (trees and lists), this optimization works well: in all cases, it is faster to push the aggregation at each recursive step instead of computing the aggregation at the end. For large cyclic graphs, pushing aggregations increases time, highlighting the cost of grouping rows. Finally, at k = 2 it is bad for complete graphs, but it is clearly required at k = 4, where duplicate rows are abundant. The explanation behind the ineffectiveness of this optimization for complete graphs at deep recursion depth is that the number of duplicates grows exponentially; therefore, the cost to eliminate or summarize duplicates at each recursive step is higher.

5.9 All Optimizations Working Together

We now consider the interaction of optimizations with each other. We consider two sets of complex queries: queries with a high-selectivity predicate and queries without selection. The first set of queries returns a small table R, whereas the second produces a large table R. For each type of graph, we used the optimal setting for optimizations, whose defaults were as follows: storage and indexing were based on vertex for trees, lists, and cyclic graphs, whereas we used edge for complete graphs; duplicate and cycle elimination were pushed at each iteration for all graphs; nonrecursive joins (i.e., with N) were evaluated early; and selection (when there was a WHERE selection predicate) was evaluated early for all graphs. Recursion depth was k = 4, which is challenging as seen in previous experiments. In order to understand the importance of each optimization, we tested optimizations with the largest graphs analyzed in previous experiments. We first analyze a "complex" query for transitive closure, which eliminates duplicates, has a nonrecursive join to get vertex names, and uses a high-selectivity condition in the WHERE clause (i = n/2). In this case, table N was small: N was precomputed selecting all vertices reachable from one vertex; such a small table acted as a filter when joined with T. Table 11 compares the impact of turning each optimization off, maintaining the rest turned on. Since results for smaller graphs were very similar, they are omitted. All optimizations working together achieve the minimum evaluation time. Evaluating selection at the end has the most significant impact for all graphs, followed by the nonrecursive join evaluated at the end of recursion. Duplicate elimination at the end of recursion had a significant impact only for complete graphs, confirming our findings from the experiments discussed before. Finally, storage/indexing had minimal impact, which can be explained by early selection producing a small table from the start of recursion. We now analyze the second case, where the "complex" query does not have a selection predicate and table N has all vertices. That is, the query has duplicate elimination and a nonrecursive (external) join. All rows from R are returned.
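The equivalence that makes early selection safe can be sketched directly. This is an illustrative SQLite example (not the paper's code): the list data, the depth bound `k`, and the value `src` (playing the role of n/2) are assumptions made for the example.

```python
# Sketch: pushing the equality predicate "WHERE i = n/2" into the
# base step gives the same answer as filtering R at the end.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T(i INT, j INT)")
con.executemany("INSERT INTO T VALUES (?, ?)",
                [(v, v + 1) for v in range(1, 8)])  # a list 1->2->...->8
k, src = 8, 4  # hypothetical depth bound and filter vertex (n/2)

# late selection: compute all of R, filter at the end
late = con.execute("""
WITH RECURSIVE R(d, i, j) AS (
  SELECT 1, i, j FROM T
  UNION ALL
  SELECT d + 1, R.i, T.j FROM R JOIN T ON R.j = T.i WHERE d < ?
)
SELECT d, i, j FROM R WHERE i = ? ORDER BY d, j""", (k, src)).fetchall()

# early selection: the predicate is pushed into the base step; i is
# copied unchanged by every recursive step, so no needed row is lost
early = con.execute("""
WITH RECURSIVE R(d, i, j) AS (
  SELECT 1, i, j FROM T WHERE i = ?
  UNION ALL
  SELECT d + 1, R.i, T.j FROM R JOIN T ON R.j = T.i WHERE d < ?
)
SELECT d, i, j FROM R ORDER BY d, j""", (src, k)).fetchall()

assert late == early  # identical answers; early works on far fewer rows
```

The push is valid here precisely because the predicate is an equality on i, a column the recursive step copies unchanged; the same rewrite on j would discard rows needed by future joins.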


TABLE 8 Seminaive: Pushing Duplicate and Cycle Elimination for Transitive Closure (Times in Seconds)
TABLE 9 Direct: Pushing Duplicate Elimination for Transitive Closure (Times in Seconds)
TABLE 10 Seminaive: Pushing Aggregation for Power Matrix (Times in Seconds)
TABLE 11 Seminaive: Optimizations Working Together (Times in Seconds)


Table 12 analyzes the impact of turning each optimization off, omitting times over 1 hour for trees, lists, and cyclic graphs. All optimizations turned on achieve the lowest time, but compared to the previous experiment, the time differences are not as significant (except for storage). Storage/indexing is the most important optimization: using storage/indexing by edge for trees, lists, and cyclic graphs pushes time over 1 hour, and time similarly goes over 1 hour when a complete graph is stored and indexed by vertex. Pushing duplicate elimination is essential for complete graphs and has a moderate impact on the other graphs (i.e., not an order of magnitude slower). Nonrecursive joins show the smallest gap; early evaluation of nonrecursive joins is still a slightly better alternative.

5.10 Time Complexity

The next experiments study scalability varying n and k with large tables, having two goals: 1) understanding time complexity and 2) quantifying the impact of the type of graph. We used the optimal storage and indexing scheme based on vertex. Queries did not have duplicate elimination.

Fig. 2 shows time growth as n and k vary. We use these default values for k: unbounded k for trees, k = 10 for lists, and k = 5 for cyclic graphs. Scalability is linear for all types of graphs. Time growth is better than linear for lists, quasi-linear for trees, and linear for cyclic graphs. There is a big gap in performance between cyclic graphs and acyclic graphs, despite the fact that k = 5. These results highlight the difficulty of computing the transitive closure on cyclic graphs, given the rapidly growing number of paths as k grows. Evidently, time growth is much faster for complete graphs; those times are not plotted. We now discuss time complexity for k. In this case, the default is n = 64k for Seminaive. Time grows slowly and linearly for trees and lists with Seminaive, but exponentially for cyclic graphs. Then, it becomes constant at k = 15, which is close to log(n) for n = 64k.

Fig. 3 shows trends for Direct. Scalability is worse than linear for trees and lists. Time grows even faster for cyclic graphs. Compared to Seminaive, there is also a big gap between acyclic and cyclic graphs. Time is almost the same for trees and lists, indicating that the dominating factor is n rather than the type of graph.

5.11 General Recommendations

Storage and indexing should be based on vertex, unless the graph is highly connected (e.g., similar to a complete graph). Selection should be evaluated as early as possible on any type of graph, preserving correctness. Nonrecursive joins should be performed early if the external table columns do not significantly increase result row size or if the external table is small; otherwise, a late join at the end of the recursion can be faster. Pushing duplicate elimination will generally speed up processing, and it is required to process highly connected graphs (e.g., complete graphs). Similarly, pushing aggregation is generally effective because it compresses intermediate tables during recursion. Finally, queries should always have a small recursion depth threshold k, which can be gradually increased as the user explores the properties of the underlying graph.
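Two of these recommendations, early selection and a small depth threshold k, can be sketched with a recursive CTE. This is a minimal illustration over SQLite with an illustrative edge table E(i, j), not the paper's exact SQL: the equality selection is pushed into the base step, and recursion is bounded by k.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE E(i INT, j INT)")
conn.executemany("INSERT INTO E VALUES (?,?)", [(1, 2), (2, 3), (3, 4), (4, 5)])

def reachable_from(conn, source, k):
    # Early selection: the WHERE i = ? in the base step means only paths
    # starting at `source` are ever generated; the WHERE d < ? in the
    # recursive step is the small depth threshold k.
    return conn.execute("""
        WITH RECURSIVE T(d, i, j) AS (
          SELECT 1, i, j FROM E WHERE i = ?
          UNION ALL
          SELECT d+1, T.i, E.j FROM T JOIN E ON T.j = E.i WHERE d < ?
        )
        SELECT DISTINCT j FROM T ORDER BY j
    """, (source, k)).fetchall()

# The threshold k can be increased gradually as the graph is explored.
print(reachable_from(conn, 1, 2))  # vertices reachable by paths of length <= 2
print(reachable_from(conn, 1, 4))  # vertices reachable by paths of length <= 4
```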

6 RELATED WORK

We start by discussing relational query optimization surveys [4], [7]. Chaudhuri [4] explains the main ideas behind pushing selection predicates in SPJ queries and performing a "group-by" operation before a join operation (early or eager); this optimization is applicable when the number of resulting groups is smaller than the size of one of the tables to be joined, and it is related to pushing aggregation. On the other hand, Graefe [7] discusses query optimization exploiting physical database operators and SPJ query transformation techniques. The optimization of recursive queries in SQL is not considered in [4], [7].

Research on recursive queries and transitive closure computation is extensive. Most work has been in the context of deductive databases [1], [3], [10], [22], [18], [19], [21], [26], or adapts deductive database techniques to relational databases [5], [14], [16], [15], [23]. There exists a somewhat orthogonal line of research that has adapted graph-based algorithms to solve the transitive closure problem in a database system (not necessarily relational) [2], [10]. Finally, there is significant theoretical work on recursive query computation in the Datalog language [12], [20], [24]. There
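The early ("eager") group-by that Chaudhuri [4] describes can be made concrete. This is an illustrative sketch over SQLite with hypothetical tables, not an example from [4]: when many rows share a grouping key, aggregating before the join shrinks one join input while preserving the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE E(i INT, j INT, v REAL);   -- weighted edges (hypothetical)
CREATE TABLE N(i INT, name TEXT);       -- vertex attributes (hypothetical)
INSERT INTO E VALUES (1,2,1.0),(1,3,2.0),(2,3,4.0);
INSERT INTO N VALUES (1,'a'),(2,'b');
""")

# Lazy plan: join first, then group.
lazy = conn.execute("""
SELECT N.name, SUM(E.v) FROM E JOIN N ON E.i = N.i
GROUP BY N.name ORDER BY N.name
""").fetchall()

# Eager plan: group E down to one row per vertex, then join. Valid here
# because the grouping key i functionally determines the join key.
eager = conn.execute("""
SELECT N.name, S.s
FROM (SELECT i, SUM(v) AS s FROM E GROUP BY i) S
JOIN N ON S.i = N.i ORDER BY N.name
""").fetchall()

print(lazy == eager)  # both plans return the same aggregate
```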

ORDONEZ: OPTIMIZATION OF LINEAR RECURSIVE QUERIES IN SQL 275

Fig. 2. Seminaive: Query evaluation time varying n and k.
Fig. 3. Direct: Query evaluation time varying n and k.

TABLE 12 Seminaive: Optimizations Working Together without Selection (Times in Seconds)



exist several algorithms to evaluate recursive queries, including Seminaive [3], Logarithmic [23], Direct [2], and BTC [10]. It has been shown that Seminaive solves the most general class of recursive queries based on so-called fixpoint equations [2], [3]. Both Seminaive [3] and Logarithmic [23] are based on iterative joining until no more rows are added to the result. Logarithmic [23], also called Smart, is an optimization of Seminaive in which paths whose length is a power of 2 are added to the transitive closure on a first pass, and then remaining edges are added on a second pass; our optimizations can be applied to Logarithmic. Direct algorithms [2] are a class of methods that terminate regardless of the underlying graph structure; they are not recursive and process each vertex a constant number of times (e.g., once). Direct algorithms are in theory more robust across graphs of different characteristics. However, we have shown that Direct algorithm performance is indeed impacted by graph structure and recursion depth when implemented in SQL. Another drawback of direct algorithms is that they are less general than Seminaive [2]; that is, their computation power is inferior. In [2], the authors present direct algorithms that evaluate a transitive closure recursive query in less time than Seminaive; they improve I/O time by rearranging tuples into blocks. Our implementation of Direct followed this work. There are several differences, though. We programmed Direct with SQL queries instead of exploiting any special data structures (lists) or bit operations, which would require internally modifying the database system. They do not consider the power matrix problem, which is more difficult. Last, we studied query optimization with larger graphs.

We now discuss optimization techniques for recursive queries. Pushing selection predicates is the most well-researched optimization in traditional SPJ query optimization [4], [7] and deductive databases [3], [19]. Optimization of row selection, mostly based on equality, has been extensively studied with the magic sets transformation [14], [16], [15], [21]. The magic sets transformation was proposed with equality comparison (passing variable bindings) and was later generalized to inequality comparisons [16], [15]. The magic sets transformation is similar to the early selection and pushing aggregation optimizations introduced in our work: both are query rewriting techniques that attempt to reduce the size of intermediate results. However, there are important differences. The standard magic sets query transformation creates additional tables (relations), introduces extra joins, and rewrites queries with additional clause terms. Filtering happens in a different way: in magic sets, filtering is performed when joins are evaluated, so only relevant tuples are kept at each evaluation step. On the other hand, early selection attempts to keep only relevant tuples by filtering rows as early as possible, like traditional SPJ queries. Magic sets were later adapted to work on relational database systems [16], even on nonrecursive queries, but the authors caution that other specialized techniques for linear recursive queries (like Direct algorithms) can provide better performance than magic sets. Magic sets require join reordering using cost-based optimization [16]. Deductive and relational databases have different semantics [16], [19]: SQL requires tuples to always have values (nonground facts are not allowed) and allows duplicates, nested queries, aggregations, and existential and universal quantifiers. Therefore, special care must be taken when applying deductive database optimizations such as magic sets [14], [16], [15]. Pushing aggregation through recursion is quite different from the magic sets transformation: in magic sets, the filtering predicate passes through recursion, but the group-by operation is evaluated in the same order, only on fewer rows. On the other hand, we have shown that a group-by operation can be evaluated before a join, as explained in [4]. Both optimizations are similar in the sense that they try to reduce the size of intermediate results. And we show that nonrecursive (external) joins can be evaluated at the beginning or at the end of the recursion. Early selection is similar to transforming queries with magic predicates in the sense that both produce smaller intermediate results. Sideways information passing (sip) refers to passing variable bindings from rule head to rule body [21]; it is related to early selection and pushing aggregation. Sippu and Soininen [21] explain that the magic sets transformation may be less efficient than Seminaive when unnecessary joins are computed. Indexing for the evaluation of recursive queries based on the Seminaive and Logarithmic algorithms is studied in [23]; the enhanced indexing scheme for Seminaive is similar to that proposed in [23], but we have shown that in some cases, the alternative indexing scheme based on edges is almost as good. Also, we have shown that the same storage and indexing scheme works well for the Direct algorithm. Compared to previous work, we focused on expressing recursive queries in relational algebra and evaluating them with SQL queries, without using any memory or disk-based data structures.

This paper is an expanded version of [17], studying two query evaluation algorithms instead of one, analyzing query optimization in more depth, and evaluating optimizations and scalability on much larger graphs.
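The "iterative joining until no more rows are added" that characterizes Seminaive can be sketched as an explicit delta loop. This is a minimal sketch in Python over SQLite with illustrative table names (E for edges, R for the closure, delta for newly found paths), not the paper's exact implementation: each pass joins only the newest paths with E and stops at the fixpoint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE E(i INT, j INT);
CREATE TABLE R(i INT, j INT, PRIMARY KEY(i, j));  -- transitive closure
CREATE TABLE delta(i INT, j INT);                 -- newest paths only
INSERT INTO E VALUES (1,2),(2,3),(3,4);
INSERT INTO R SELECT i, j FROM E;                 -- base step
INSERT INTO delta SELECT i, j FROM E;
""")

while True:
    # Join only the delta with E, keeping pairs not yet in the result.
    conn.execute("""
    CREATE TABLE new_delta AS
    SELECT DISTINCT d.i, E.j FROM delta d JOIN E ON d.j = E.i
    WHERE NOT EXISTS (SELECT 1 FROM R WHERE R.i = d.i AND R.j = E.j)
    """)
    added = conn.execute("SELECT COUNT(*) FROM new_delta").fetchone()[0]
    conn.execute("INSERT INTO R SELECT i, j FROM new_delta")
    conn.execute("DROP TABLE delta")
    conn.execute("ALTER TABLE new_delta RENAME TO delta")
    if added == 0:  # fixpoint reached: no new rows were added
        break

closure = sorted(conn.execute("SELECT i, j FROM R").fetchall())
print(closure)
```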

7 CONCLUSIONS

This work studied the optimization of linear recursive queries in SQL. In order to study query optimization under an abstract framework, we used directed graphs. A graph is represented by a relational table with one row per edge. We focused on two complementary and deeply related problems: computing the transitive closure of a graph and getting the power matrix of its adjacency matrix. We explained how to program two classical algorithms in SQL: Seminaive and Direct. These SQL implementations did not use any specific data structures, proprietary language extensions, or internal database system features. Both implementations can detect cycles, which significantly impact time and may produce infinite recursion. Five query optimizations were proposed and studied: enhanced storage and indexing, early selection, early evaluation of nonrecursive joins, pushing duplicate elimination, and pushing aggregation. Experiments on a relational database system compared both algorithms with large input tables, studying the impact of each optimization and time complexity. We studied optimizations on four types of graphs: binary trees, lists, cyclic graphs, and complete graphs, covering a wide spectrum of time complexity. Seminaive was significantly faster than Direct, except for complete graphs. The impact of optimizations was as follows: In general, optimized storage and early selection have a strong impact on any type of graph. For acyclic graphs, storage and indexing by vertex based on the join condition was the best scheme. For cyclic graphs, storage and indexing by edge worked well at low recursion depth and was the best for


complete graphs. Early selection produced a significant acceleration for Seminaive with equality predicates, but it cannot be applied to Direct. Early evaluation of nonrecursive joins worked well in general, producing a modest performance improvement. Pushing duplicate elimination was generally better than eliminating duplicates at the end of recursion; in complete graphs, pushing duplicate elimination proved to be required to make the problem tractable. Cycle elimination worked well in every case, but produced only a marginal time improvement. Pushing aggregation through recursion worked well in every case. When all optimizations work together on a complex query with selection, nonrecursive joins, and duplicate elimination, storage/indexing and early selection have the most significant impact. From a time complexity perspective, Seminaive scales linearly with respect to table size, whereas Direct is superlinear. Seminaive showed linear scalability for acyclic graphs with increasing recursion depth. Both algorithms show exponential time growth with increasing recursion depth for cyclic and complete graphs, highlighting a combinatorial explosion of paths.

Important issues for future research include the following. We plan to study recursive query optimization based on I/O cost models. Early row selection needs to be studied for range queries or when the filter condition cannot be evaluated in the base step. Pushing duplicate elimination needs to be studied in more depth, since it is an expensive computation and can be avoided in acyclic graphs. Nonrecursive join evaluation needs to be studied in more generality, with multiway joins, varying external table size, larger records, and more columns. Efficient mechanisms are needed to detect queries that may produce an infinite recursion.

ACKNOWLEDGMENTS

This research work was partially supported by US National Science Foundation grants CCF 0937562 and IIS 0914861.

REFERENCES

[1] S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases. Addison-Wesley, 1995.
[2] R. Agrawal, S. Dar, and H.V. Jagadish, "Direct and Transitive Closure Algorithms: Design and Performance Evaluation," ACM Trans. Database Systems, vol. 15, no. 3, pp. 427-458, 1990.
[3] F. Bancilhon and R. Ramakrishnan, "An Amateur's Introduction to Recursive Query Processing Strategies," Proc. ACM SIGMOD, pp. 16-52, 1986.
[4] S. Chaudhuri, "An Overview of Query Optimization in Relational Systems," Proc. ACM Principles of Database Systems (PODS) Conf., pp. 84-93, 1998.
[5] S. Dar and R. Agrawal, "Extending SQL with Generalized Transitive Closure," IEEE Trans. Knowledge and Data Eng., vol. 5, no. 5, pp. 799-812, Oct. 1993.
[6] G. Dong and J. Su, "Incremental Maintenance of Recursive Views Using Relational Calculus/SQL," SIGMOD Record, vol. 29, no. 1, pp. 44-51, 2000.
[7] G. Graefe, "Query Evaluation Techniques for Large Databases," ACM Computing Surveys, vol. 25, no. 2, pp. 73-170, 1993.
[8] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub-Total," Proc. Int'l Conf. Data Eng. (ICDE), pp. 152-159, 1996.
[9] J. Han and L.J. Henschen, "Handling Redundancy in the Processing of Recursive Database Queries," Proc. ACM SIGMOD, pp. 73-81, 1987.
[10] Y.E. Ioannidis, R. Ramakrishnan, and L. Winger, "Transitive Closure Algorithms Based on Graph Traversal," ACM Trans. Database Systems, vol. 18, no. 3, pp. 512-576, 1993.
[11] K. Koymen and Q. Cai, "SQL: A Recursive SQL," Information Systems, vol. 18, no. 2, pp. 121-128, 1993.
[12] L. Libkin and L. Wong, "Incremental Recomputation of Recursive Queries with Nested Sets and Aggregate Functions," Proc. Int'l Workshop Database Programming Languages (DBPL), pp. 222-238, 1997.
[13] V. Linnemann, "Non First Normal Form Relations and Recursive Queries: An SQL-Based Approach," Proc. IEEE Int'l Conf. Data Eng. (ICDE), pp. 591-598, 1987.
[14] I.S. Mumick, S.J. Finkelstein, H. Pirahesh, and R. Ramakrishnan, "Magic Is Relevant," Proc. ACM SIGMOD, pp. 247-258, 1990.
[15] I.S. Mumick, S.J. Finkelstein, H. Pirahesh, and R. Ramakrishnan, "Magic Conditions," ACM Trans. Database Systems, vol. 21, no. 1, pp. 107-155, 1996.
[16] I.S. Mumick and H. Pirahesh, "Implementation of Magic-Sets in a Relational Database System," Proc. ACM SIGMOD, pp. 103-114, 1994.
[17] C. Ordonez, "Optimizing Recursive Queries in SQL," Proc. ACM SIGMOD, pp. 834-839, 2005.
[18] R. Ramakrishnan, D. Srivastava, S. Sudarshan, and P. Seshadri, "Implementation of the Coral Deductive Database System," Proc. ACM SIGMOD, pp. 167-176, 1993.
[19] R. Ramakrishnan, D. Srivastava, S. Sudarshan, and P. Seshadri, "The CORAL Deductive System," Very Large Data Bases J., vol. 3, no. 2, pp. 161-210, 1994.
[20] S. Seshadri and J.F. Naughton, "On the Expected Size of Recursive Datalog Queries," Proc. ACM Principles of Database Systems (PODS) Conf., pp. 268-279, 1991.
[21] S. Sippu and E.S. Soininen, "An Analysis of Magic Sets and Related Optimization Strategies for Logic Queries," J. ACM, vol. 43, no. 6, pp. 1046-1088, 1996.
[22] J.D. Ullman, "Implementation of Logical Query Languages for Databases," ACM Trans. Database Systems, vol. 10, no. 3, pp. 289-321, 1985.
[23] P. Valduriez and H. Boral, "Evaluation of Recursive Queries Using Join Indices," Proc. Int'l Conf. Expert Database Systems, pp. 271-293, 1986.
[24] M.Y. Vardi, "Decidability and Undecidability Results for Boundedness of Linear Recursive Queries," Proc. ACM Principles of Database Systems (PODS) Conf., pp. 341-351, 1988.
[25] H.S. Warren, "A Modification of Warshall's Algorithm for the Transitive Closure of Binary Relations," Comm. ACM, vol. 18, no. 4, pp. 218-220, 1975.
[26] C. Youn, H. Kim, L.J. Henschen, and J. Han, "Classification and Compilation of Linear Recursive Queries in Deductive Databases," IEEE Trans. Knowledge and Data Eng., vol. 4, no. 1, pp. 52-67, Feb. 1992.

Carlos Ordonez received the degree in applied mathematics and the MS degree in computer science from UNAM University, Mexico, in 1992 and 1996, respectively, and the PhD degree in computer science from the Georgia Institute of Technology in 2000. He is currently an assistant professor at the University of Houston. His research is centered on the integration of machine learning techniques into database systems and their application to scientific problems.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
