Constant delay algorithms for regular document spanners

Fernando Florenzano, Cristian Riveros, Domagoj Vrgoč (PUC Chile)
Martín Ugarte, Stijn Vansummeren (Université Libre de Bruxelles)

Rule-based information extraction by example

Task: "Extract all pairs (time, id) of ERROR events"

Document (positions 1 to 41; the line breaks occupy positions 15 and 27):
  18:30 ERROR 06
  19:10 OK 00
  20:00 ERROR 19

Rule (RGX formula): Σ* ⋅ x{δδ:δδ} ⋅ ERROR ⋅ y{δδ} ⋅ Σ*, where δ = (0 + 1 + ... + 9)

Output (mappings):
  x           y
  [1, 6⟩      [13, 15⟩
  [28, 33⟩    [40, 42⟩

Rule-based information extraction by example

Problem: Evaluation of rules in information extraction.
  Input: RGX formula R and document d.
  Output: Enumerate all mappings of d that satisfy R.

Example: on the document "18:30 ERROR 06 ↱ 19:10 OK 00 ↱ 20:00 ERROR 19", the rule Σ* ⋅ x{δδ:δδ} ⋅ ERROR ⋅ y{δδ} ⋅ Σ*, with δ = (0 + 1 + ... + 9), outputs the mappings (x = [1, 6⟩, y = [13, 15⟩) and (x = [28, 33⟩, y = [40, 42⟩).
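As a rough illustration of the evaluation problem, the example rule can be approximated with an ordinary Python regular expression. This is only a sketch under stated assumptions: the names `DOC`, `RULE` and `evaluate` are hypothetical, the single spaces around ERROR are assumed, and `re.finditer` returns non-overlapping matches, which coincides with the rule's output on this document but is not spanner semantics in general.

```python
import re

# The three log lines from the slide; each line break counts as one position.
DOC = "18:30 ERROR 06\n19:10 OK 00\n20:00 ERROR 19"

# x captures dd:dd and y captures dd (assumption: one space on each side of ERROR).
RULE = re.compile(r"(?P<x>[0-9][0-9]:[0-9][0-9]) ERROR (?P<y>[0-9][0-9])")


def evaluate(doc):
    """Return one mapping per match, with 1-based half-open spans [start, end>."""
    mappings = []
    for match in RULE.finditer(doc):
        # Python offsets are 0-based, so shift both ends by one.
        mappings.append({v: (match.start(v) + 1, match.end(v) + 1) for v in ("x", "y")})
    return mappings


MAPPINGS = evaluate(DOC)
for mapping in MAPPINGS:
    print(mapping)
```

The spans use the same convention as the slide: `[i, j⟩` starts at position `i` and ends just before position `j`.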

Unfortunately, the output can easily become exponential

Document:
  18:30 ERROR 06
  19:10 OK 00
  20:00 ERROR 19

Rule (RGX formula): Σ* ⋅ x1{δδ} ⋅ Σ* ⋅ x2{δδ} ⋅ Σ*, where δ = (0 + 1 + ... + 9)

Output (mappings), Θ(|d|^2) of them:
  x1          x2
  [1, 3⟩      [4, 6⟩
  [1, 3⟩      [13, 15⟩
  ⋮           ⋮
  [1, 3⟩      [40, 42⟩
  [4, 6⟩      [13, 15⟩
  [4, 6⟩      [16, 18⟩
  ⋮           ⋮

In general, a RGX formula with k variables can have an output of size Θ(|d|^k): polynomial in the document, but exponential in the number of variables.
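To make the growth concrete, the two-variable rule can be evaluated by brute force on the example document. The helper names `two_digit_spans` and `pairs` are hypothetical, and the sketch simply enumerates every two-digit span and every ordered pair of them; it is meant to show the output size, not an efficient evaluation.

```python
DOC = "18:30 ERROR 06\n19:10 OK 00\n20:00 ERROR 19"
DIGITS = "0123456789"


def two_digit_spans(doc):
    """All spans [i, i+2> (1-based, half-open) covering two consecutive digits."""
    return [(i + 1, i + 3) for i in range(len(doc) - 1)
            if doc[i] in DIGITS and doc[i + 1] in DIGITS]


def pairs(doc):
    """All (x1, x2) where x1 ends no later than x2 starts, as the rule requires."""
    spans = two_digit_spans(doc)
    return [(s1, s2) for s1 in spans for s2 in spans if s1[1] <= s2[0]]


SPANS = two_digit_spans(DOC)
PAIRS = pairs(DOC)
print(len(SPANS), len(PAIRS))
```

With m candidate spans there are on the order of m^2 pairs, and a rule with k such variables would yield on the order of m^k tuples.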

Constant delay algorithms to the rescue

Definition: Given a RGX rule R and a document d, a constant delay algorithm is a two-phase enumeration algorithm:
1. Preprocessing phase: linear in |d| and, hopefully, linear in |R|.
2. Enumeration phase: constant time between two consecutive outputs.

Can we have an efficient constant delay algorithm for RGX formulas?

In this paper, we propose a constant delay algorithm for variable-set automata

Specifically, our contributions are:
1. We study the class of extended and deterministic variable-set automata.
2. We give a simple constant delay algorithm for deterministic functional extended variable-set automata.
3. We extend this algorithm to the full class of variable-set automata and spanner algebra.
4. We study the complexity of counting the number of output mappings.

In this talk: only the main ideas of the constant delay algorithm.

Outline
  Variable-set automata and their variants
  The constant delay algorithm

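Variable-set automata (VA)

[Figure: a VA over the variables x and y with states 0 to 7. From state 0 the variables are opened in either order, through state 1 or state 2, reaching state 3; the letters a and b are read between states 3 and 6; ⊣x and ⊣y label the transitions into state 5 and into state 7.]

Document: a a b (positions 1, 2, 3)

Run 1: 0 -y⊢-> 1 -x⊢-> 3 -a-> 3 -a-> 4 -⊣x-> 5 -b-> 6 -⊣y-> 7
  Output: x = [1, 3⟩, y = [1, 4⟩

Run 2: 0 -x⊢-> 2 -y⊢-> 3 -a-> 4 -a-> 4 -⊣y-> 5 -b-> 6 -⊣x-> 7
  Output: x = [1, 4⟩, y = [1, 3⟩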

Variable-set automata (VA)

Theorem (Freydenberger17, MRV18): The evaluation problem of variable-set automata is NP-complete.

How do we restrict VA to have constant delay algorithms?

Problematic behaviors of VA and their classes
1. Functional VA
2. Extended VA
3. Deterministic VA

Problem: A VA can have accepting runs that are NOT valid.

Example of an accepting run that is not valid (x is closed twice and y is never closed):
  0 -y⊢-> 1 -x⊢-> 3 -a-> 3 -a-> 4 -⊣x-> 5 -b-> 6 -⊣x-> 7

Problematic behaviors of VA and their classes: 1. Functional VA

Definition (functional VA): A VA is functional if every accepting run is a valid run.
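The validity condition on a single run can be sketched as a small check. The encoding is hypothetical: letters are strings and markers are `("open", var)` or `("close", var)` tuples. The sketch also assumes that a variable may stay unused, but once it appears it must be opened exactly once and closed exactly once afterwards; the slides do not spell out this detail.

```python
def is_valid(labels):
    """Check that the markers along one run open and close variables correctly."""
    opened, closed = set(), set()
    for label in labels:
        if isinstance(label, str):
            continue  # reading a letter never affects validity
        kind, var = label
        if kind == "open":
            if var in opened:
                return False  # opened twice
            opened.add(var)
        else:
            if var not in opened or var in closed:
                return False  # closed before being opened, or closed twice
            closed.add(var)
    return opened == closed  # every opened variable must also be closed


# A valid run on "aab": y and x are opened first, x closes before b, y after b.
GOOD_RUN = [("open", "y"), ("open", "x"), "a", "a", ("close", "x"), "b", ("close", "y")]
# An accepting but invalid run: x is closed twice and y is never closed.
BAD_RUN = [("open", "y"), ("open", "x"), "a", "a", ("close", "x"), "b", ("close", "x")]

print(is_valid(GOOD_RUN), is_valid(BAD_RUN))
```

A functional VA is one where this check can never fail on an accepting run, so the evaluation algorithm does not need to perform it at run time.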

Problematic behaviors of VA and their classes: 1. Functional VA

[Figure: a functional VA for the running example, with states 0 to 7 and two additional states 5' and 6'.]

Theorem (FKRV15): Every VA is equivalent to a functional VA of at most exponential size.

Problematic behaviors of VA and their classes: 2. Extended VA

[Figure: the functional VA with states 0 to 7, 5' and 6'; the paths through states 1 and 2 open x and y in different orders.]

Problem: A VA can use several paths of variables for the same extraction of spans.

Problematic behaviors of VA and their classes: 2. Extended VA

Definition (extended VA): An extended VA uses transitions extended with sets of variables such that between each pair of letters at most one of these transitions is used.

Problematic behaviors of VA and their classes: 2. Extended VA

[Figure: an extended VA for the running example; the transition leaving state 0 is labeled {x⊢, y⊢}, and the closing transitions are labeled {⊣x} and {⊣y}.]

Theorem: Every VA is equivalent to an extended VA of at most exponential size.
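The idea behind extended transitions can be illustrated on run labels: every maximal block of consecutive markers between two letters is merged into a single set, so the order in which variables are opened or closed at one position no longer matters. The function `collapse` and the tuple encoding of markers are hypothetical, and this sketch rewrites one run only; it is not the automaton construction from the theorem.

```python
def collapse(labels):
    """Group each maximal block of consecutive markers into one frozenset."""
    out, block = [], set()
    for label in labels:
        if isinstance(label, str):  # a letter ends the current block of markers
            if block:
                out.append(frozenset(block))
                block = set()
            out.append(label)
        else:
            block.add(label)
    if block:  # markers after the last letter
        out.append(frozenset(block))
    return out


# Two runs that differ only in the order in which x and y are opened.
RUN_Y_FIRST = [("open", "y"), ("open", "x"), "a", "a", ("close", "x"), "b", ("close", "y")]
RUN_X_FIRST = [("open", "x"), ("open", "y"), "a", "a", ("close", "x"), "b", ("close", "y")]

print(collapse(RUN_Y_FIRST) == collapse(RUN_X_FIRST))
```

Both runs collapse to the same sequence, which starts with the single set transition {x⊢, y⊢}.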

Problematic behaviors of VA and their classes: 3. Deterministic VA

Problem: A VA can have several runs that witness the same output.

Example of several runs with the same input/output:
  Run 1: 0 -{x⊢, y⊢}-> 3 -a-> 3 -a-> 4 -{⊣x}-> 5 -b-> 6 -{⊣y}-> 7
  Run 2: 0 -{x⊢, y⊢}-> 3 -a-> 4 -a-> 4 -{⊣x}-> 5 -b-> 6 -{⊣y}-> 7

Problematic behaviors of VA and their classes: 3. Deterministic VA

Definition (deterministic (Input/Output) VA): An extended VA is deterministic if the transition relation is a function.

Problematic behaviors of VA and their classes: 3. Deterministic VA

[Figure: a deterministic extended VA for the running example, with transitions labeled {x⊢, y⊢}, {⊣x} and {⊣y}.]

Theorem: Every extended VA is equivalent to a deterministic extended VA of at most exponential size.
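Determinism in this sense is easy to test on an explicit transition list: no pair of a state and a label (a letter or a set of markers) may lead to two different targets. The triples below are a hypothetical encoding loosely modeled on the example; `NONDET` has two a-transitions leaving state 3, and `DET` is an assumed variant without the self-loop, not necessarily the automaton on the slide.

```python
OPEN_XY = frozenset({("open", "x"), ("open", "y")})
CLOSE_X = frozenset({("close", "x")})
CLOSE_Y = frozenset({("close", "y")})


def is_deterministic(transitions):
    """True if every (state, label) pair has at most one target state."""
    seen = {}
    for source, label, target in transitions:
        # setdefault stores the first target and returns it on later lookups.
        if seen.setdefault((source, label), target) != target:
            return False
    return True


NONDET = [(0, OPEN_XY, 3), (3, "a", 3), (3, "a", 4), (4, "a", 4),
          (4, CLOSE_X, 5), (5, "b", 6), (6, CLOSE_Y, 7)]
DET = [(0, OPEN_XY, 3), (3, "a", 4), (4, "a", 4),
       (4, CLOSE_X, 5), (5, "b", 6), (6, CLOSE_Y, 7)]

print(is_deterministic(NONDET), is_deterministic(DET))
```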

Outline
  Variable-set automata and their variants
  The constant delay algorithm

The constant delay algorithm for extended VA

Given a deterministic and functional extended VA A = (Q, q0, F, δ).

procedure Evaluate(A, a1 ... an)
    for all q ∈ Q \ {q0} do
        list_q ← ε
    list_q0 ← [⊥]
    for i := 1 to n do
        Capturing(i)
        Reading(i)
    Capturing(n + 1)
    Enumerate({list_q}_{q ∈ Q}, F)

procedure Capturing(i)
    for all q ∈ Q do
        list_q^old ← list_q.lazycopy
    for all q ∈ Q with list_q^old ≠ ε do
        for all S ∈ Markers_δ(q) do
            node ← Node((S, i), list_q^old)
            p ← δ(q, S)
            list_p.add(node)

procedure Reading(i)
    for all q ∈ Q do
        list_q^old ← list_q
        list_q ← ε
    for all q ∈ Q with list_q^old ≠ ε do
        p ← δ(q, a_i)
        list_p.append(list_q^old)
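The three procedures can be tried out with a simplified Python sketch. It is not the constant delay implementation itself: plain Python lists and tuple snapshots stand in for the lazily copied linked lists, so the list operations are not O(1), and all mappings are collected into a list instead of being enumerated one by one. The encoding of δ as a dictionary, the marker tuples, and the example automaton (which extracts every span x covering a single b) are assumptions made for illustration.

```python
BOTTOM = None  # stands for the bottom element of the initial list


def paths(node):
    """Yield every chain of (S, i) labels from the bottom element up to node."""
    if node is BOTTOM:
        yield []
        return
    label, predecessors = node
    for pred in predecessors:
        for prefix in paths(pred):
            yield prefix + [label]


def to_mapping(path):
    """Turn the marker sets along one path into a variable -> span mapping."""
    start, end = {}, {}
    for markers, position in path:
        for kind, var in markers:
            (start if kind == "open" else end)[var] = position
    return {var: (start[var], end[var]) for var in start}


def evaluate(automaton, doc):
    states, q0, final, delta = automaton
    lists = {q: [] for q in states}
    lists[q0] = [BOTTOM]
    # Markers(q): the marker sets S for which delta(q, S) is defined.
    markers = {q: [lab for (src, lab) in delta
                   if src == q and isinstance(lab, frozenset)] for q in states}

    def capturing(i):
        # A tuple snapshot replaces the O(1) lazy copy of the pseudocode.
        old = {q: tuple(lists[q]) for q in states}
        for q in states:
            if old[q]:
                for S in markers[q]:
                    lists[delta[(q, S)]].append(((S, i), old[q]))

    def reading(i):
        old = dict(lists)
        for q in states:
            lists[q] = []
        for q in states:
            target = delta.get((q, doc[i - 1]))  # delta may be partial
            if old[q] and target is not None:
                lists[target].extend(old[q])

    for i in range(1, len(doc) + 1):
        capturing(i)
        reading(i)
    capturing(len(doc) + 1)
    return [to_mapping(p) for q in final for node in lists[q] for p in paths(node)]


# Hypothetical deterministic functional extended VA: x spans exactly one b.
OPEN_X, CLOSE_X = frozenset({("open", "x")}), frozenset({("close", "x")})
DELTA = {(0, "a"): 0, (0, "b"): 0, (0, OPEN_X): 1, (1, "b"): 2,
         (2, CLOSE_X): 3, (3, "a"): 3, (3, "b"): 3}
AUTOMATON = ([0, 1, 2, 3], 0, [3], DELTA)
RESULT = evaluate(AUTOMATON, "abb")
print(RESULT)
```

On the document "abb" the two b letters sit at positions 2 and 3, so the sketch reports the spans [2, 3⟩ and [3, 4⟩ for x. Each node stores a marker set with its position and a pointer to the list it extends, which is what lets the enumeration walk backwards from the final states without revisiting the document.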

Sketch idea of the constant delay algorithm in 3 steps

Given a deterministic and functional extended VA A = (Q, q0, F, δ).

1. Convert the document d into a deterministic extended VA A_d.

[Figure: the document d = a a b (positions 1, 2, 3) and the VA A_d, a chain d1 -a-> d2 -a-> d3 -b-> d4 with additional transitions labeled by sets of variable markers such as {x⊢}, {y⊢}, ..., {⊣x, y⊢}.]

Sketch idea of the constant delay algorithm in 3 steps

Given a deterministic and functional extended VA A = (Q, q0, F, δ).

1. Convert the document d into a deterministic extended VA A_d.
2. Build the product between A and A_d, and annotate the variable transitions with the position of d where they take place.
