Advanced Data Modelling in PostgreSQL Chris Travers Adjust GmbH - - PowerPoint PPT Presentation

advanced data modelling in postgresql
SMART_READER_LITE
LIVE PREVIEW

Advanced Data Modelling in PostgreSQL Chris Travers Adjust GmbH - - PowerPoint PPT Presentation

Introduction Basics Derivations Extractions Types Inheritance Thank you Advanced Data Modelling in PostgreSQL Chris Travers Adjust GmbH May 16, 2020 Introduction Basics Derivations Extractions Types Inheritance Thank you About Me


slide-1
SLIDE 1

Introduction Basics Derivations Extractions Types Inheritance Thank you

Advanced Data Modelling in PostgreSQL

Chris Travers

Adjust GmbH

May 16, 2020

slide-2
SLIDE 2

Introduction Basics Derivations Extractions Types Inheritance Thank you

About Me

  • New contributor to PostgreSQL (one bugfix so far)
  • Heads the PostgreSQL-related R&D at Adjust GmbH
  • Long-time PostgreSQL user (since 1999)
  • Been around the community for a long time.
slide-3
SLIDE 3

Introduction Basics Derivations Extractions Types Inheritance Thank you

About Adjust

We are big PostgreSQL users. Over 10PB of data, with near-real-time analytics on 1PB of raw data and 400,000 inbound requests per second. We provide mobile advertisement attribution and analytics services to companies who buy advertising.

slide-4
SLIDE 4

Introduction Basics Derivations Extractions Types Inheritance Thank you

What We Will Cover

  • Derived Data Modelling
  • Custom Data Types
  • Data Extracted from Semi-Structured Data
  • Inheritance and Polymorphic Relationships
  • Multiple Inheritance and Overlapping Sets
slide-5
SLIDE 5

Introduction Basics Derivations Extractions Types Inheritance Thank you

Quick Refresher of RDBMS Model

  • Relation as a set of corresponding facts
  • Relation entry modelled as tuple
  • Functional dependencies important for inference derivation
slide-6
SLIDE 6

Introduction Basics Derivations Extractions Types Inheritance Thank you

RDBMSs as Inferential Systems

  • Functional dependencies imply functions
  • Inferences are derived via function composition
slide-7
SLIDE 7

Introduction Basics Derivations Extractions Types Inheritance Thank you

ORDBMS Goals

Example Query

SELECT * FROM photos p JOIN locations l ON distance(p.pt, l.pt) < ’100km’::distance WHERE is sunset(p.graphics)

  • Distance calculation?
  • Distance type?
  • Detecting sunsets?
slide-8
SLIDE 8

Introduction Basics Derivations Extractions Types Inheritance Thank you

What is needed?

  • Custom data types to normalize and compare
  • Custom calculations
  • Pluggable calculations based on arbitrary data
slide-9
SLIDE 9

Introduction Basics Derivations Extractions Types Inheritance Thank you

Calculated Inferences

  • All inferences are calculated.
  • Some inferences require traversing known facts
  • Other inferences can be calculated based on a single known

fact or a single relation entry. This section is about the last category.

slide-10
SLIDE 10

Introduction Basics Derivations Extractions Types Inheritance Thank you

Examples of tuple-derived Inferences

  • tsvectors for searching notes with a subject, author, and

contents

  • Area of a polygon
  • Bounding box of a polygon

Given x, we get one and exactly one y, and we can calculate it knowing only x.

slide-11
SLIDE 11

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example: Searchable Notes

Table

create table note ( id serial primary key, subject text not null, author text not null, contents text not null, ... );

What we want

SELECT * from note WHERE plainto tsquery( ’Postgres is great’ ) @@ tsvector(note)

  • - or note.tsvector
slide-12
SLIDE 12

Introduction Basics Derivations Extractions Types Inheritance Thank you

The Function

Function

CREATE OR REPLACE FUNCTION tesvector(note) RETURNS TSVECTOR LANGUAGE SQL IMMUTABLE AS $$ SELECT to tsvector(array to string( array[$1.subject $1.contents], ’ ’ )); $$;

slide-13
SLIDE 13

Introduction Basics Derivations Extractions Types Inheritance Thank you

But there is more!

  • CREATE INDEX ON note using gin (tsvector(note));
  • We can change the function and reindex
  • Calculated on write, not on read
slide-14
SLIDE 14

Introduction Basics Derivations Extractions Types Inheritance Thank you

Fun with notation

Quick: Which of these does not work?

  • select tsvector(note) from note;
  • select subject(note) from note;
  • select note.subject, note.tsvector from note;
  • select tsvector from note;

Some of this is due to Postgres’s heritage before SQL.

slide-15
SLIDE 15

Introduction Basics Derivations Extractions Types Inheritance Thank you

Exercise

Exercise 1 https: //www.github.com/einhverfr/admcourse/exercises/1 (10 min to complete)

slide-16
SLIDE 16

Introduction Basics Derivations Extractions Types Inheritance Thank you

Particular Example: Extracting Data from Structured Text

slide-17
SLIDE 17

Introduction Basics Derivations Extractions Types Inheritance Thank you

Desired SQL

We want to pull protein documents that are patents and sort based

  • n patent date:

Something like this

select * from published doc d WHERE d.protein = 123 AND d.is patent ORDER BY d.patent date asc LIMIT 100

slide-18
SLIDE 18

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example Document

Problem is, this information is the published docs text field.

Example

DB patentd REF 123ZZ12433 REGD 19750305 .... Here the important data is stored in a structured text document along with different documents of different structures.

slide-19
SLIDE 19

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example is patent function

We want to make sure this is only called on these documents so we create a function which takes in a row of the appropriate type:

is patent

create or replace function is patent(published doc) returns bool strict immutable language sql as $$ select $1.document like ’%DB patentd%’ $$;

  • Simple. Part 1 solved.
slide-20
SLIDE 20

Introduction Basics Derivations Extractions Types Inheritance Thank you

Part 2: date extraction

patent date function

create or replace function patent date(published doc) returns date strict immutable language plperl as $$ use strict; my @lines = split /\n/, $ [1]; my ($regd) = grep /ˆREGD \d{4}\d{2}\d{2}/ @lines; return unless $regd; $regd =˜ /ˆREGD \d{4}\d{2}\d{2}/; return ”$1-$2-$3”; $$; create index doc patent date on published doc(patent date(published doc));

slide-21
SLIDE 21

Introduction Basics Derivations Extractions Types Inheritance Thank you

Exercise 2

Exercise 2 https: //www.github.com/einhverfr/admcourse/exercises/2 (10 min to complete)

slide-22
SLIDE 22

Introduction Basics Derivations Extractions Types Inheritance Thank you

Approach here

  • Composite types are used for illustration purposes
  • Most types should be implemented in C for heavier use
  • Indexing is not covered.
slide-23
SLIDE 23

Introduction Basics Derivations Extractions Types Inheritance Thank you

Use case 1: Interfaces

Use Cases:

  • Fixed-structure complex field in table
  • N1NF modelling (arrays of tuples in a field)
  • Way of passing data into or out of a user defined function
slide-24
SLIDE 24

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example for Contact/tagged text

Type definition

create type full customer as ( customer customer, contacts contact item[], addresses address[] ); Each field is a tuple or an array of tuples.

slide-25
SLIDE 25

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example function

Save Function

create or replace function full customer save(full customer in c) returns void language sql strict as $$ insert into customer select ((in c).customer).*; insert into contact item select u.* from unnest(contacts(in c)) u; insert into address select u.* from unnest(addresses(in c)) u; $$;

slide-26
SLIDE 26

Introduction Basics Derivations Extractions Types Inheritance Thank you

Exercise 3

Exercise 3 https: //www.github.com/einhverfr/admcourse/exercises/3 (10 min to complete)

slide-27
SLIDE 27

Introduction Basics Derivations Extractions Types Inheritance Thank you

Use Case 2: Tables as Types

  • Convert a table into another type
  • Arithmetic on rows
  • Aggregates on rows
slide-28
SLIDE 28

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example: work shift

Table Structure

CREATE TABLE work shift ( id serial primary key, employee id int references employee(id), shift start timestamp not null, shift end timestamp not null, ... );

slide-29
SLIDE 29

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example Derivation Function

CREATE OR REPLACE FUNCTION date(work shift)

RETURNS date language sql immutable as $$ select $1.shift start::date; $$;

CREATE OR REPLACE FUNCTION timestamp(work shift)

RETURNS interval language sql immutable as $$ select $1.shift end - $1.shift start; $$;

CREATE OR REPLACE FUNCTION add intervals(work shift, work shift)

RETURNS interval language sql immutable as $$ select interval($1) + interval($2)$$;

slide-30
SLIDE 30

Introduction Basics Derivations Extractions Types Inheritance Thank you

Example Aggregate

CREATE OR REPLACE AGGREGATE total interval(work shift) as (

sfunc = add intervals, stype = interval, initialcond = ’0 days’ );

slide-31
SLIDE 31

Introduction Basics Derivations Extractions Types Inheritance Thank you

Exercise 4

Exercise 4 https: //www.github.com/einhverfr/admcourse/exercises/4 (10 min to complete)

slide-32
SLIDE 32

Introduction Basics Derivations Extractions Types Inheritance Thank you

The Polymorphic Relationship Problem

slide-33
SLIDE 33

Introduction Basics Derivations Extractions Types Inheritance Thank you

Common example: Notes

  • notes in an accounting system
  • invoices, customers, vendors, journal entries, etc
  • Each of these have their own primary key spaces
slide-34
SLIDE 34

Introduction Basics Derivations Extractions Types Inheritance Thank you

What Inheritance Gives You

  • a collection for all of the collection entries
  • referential integrity enforcement
  • ability to differentiate on polymorphic relationship
  • Consistent point of management
  • casts for common functions like tsvector conversions etc
slide-35
SLIDE 35

Introduction Basics Derivations Extractions Types Inheritance Thank you

Gotchas

  • Multiple inheritance causes schema management challenges
  • Casting can happen at odd points leading to index strangeness
slide-36
SLIDE 36

Introduction Basics Derivations Extractions Types Inheritance Thank you

Exercise 5

Exercise 5 https: //www.github.com/einhverfr/admcourse/exercises/5 (10 min to complete)

slide-37
SLIDE 37

Introduction Basics Derivations Extractions Types Inheritance Thank you

Thank you

If you have comments or questions, feel free to reach out at chris.travers@gmail.com