Towards Efficient String Processing of Annotated Events, David Woods



SLIDE 1

Towards Efficient String Processing of Annotated Events

David Woods¹, Tim Fernando², Carl Vogel²

¹ ADAPT Centre

Trinity College Dublin, Ireland

² Computational Linguistics Group

Trinity Centre for Computing and Language Studies School of Computer Science Trinity College Dublin, Ireland

ISA-13, 2017

SLIDE 2

ISO-TimeML Fragment

SLIDE 3

TLINKs in an ISO-TimeML Document

SLIDE 4

Examples

1. <TLINK relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>

2. <TLINK relType="IS_INCLUDED" timeID="t1" relatedToEventInstance="ei9"/>

3. <TLINK relType="BEFORE" eventInstanceID="ei9" relatedToEventInstance="ei10"/>

SLIDE 5

Allen Relations

[Figure: Allen (1983, p. 835, Fig. 2)]

SLIDE 6

Example

“John slept through the fire alarm last Tuesday.” This sentence gives us two events and one time period:

1. js = “John slept” (event)
2. fa = “a fire alarm occurred” (event)
3. lt = “last Tuesday” (time period)

We can represent this information with the binary Allen Relations js di fa (the sleep contains the alarm) and js d lt (the sleep falls during last Tuesday).

SLIDE 7

Strings as Models

We can use strings as models to represent this event data.

Example: “John slept through the fire alarm last Tuesday.”

[lt][lt, js][lt, js, fa][lt, js][lt]

SLIDE 8

Sets as Symbols

Fix a finite set A of fluents. Fluents will be understood as naming an event instance (or time) in ISO-TimeML. We encode finite sets of these fluents as symbols, which may appear in a string.

SLIDE 9

Event-Strings

A string s = α₁ ⋯ αₙ of subsets αᵢ of A can be construed as a finite model consisting of n moments of time i ∈ {1, …, n}. Each αᵢ specifies all fluents in A that hold simultaneously at i, and αᵢ occurs chronologically before αⱼ if and only if i < j. The power set 2^A of A serves as the alphabet Σ = 2^A of an event-string s ∈ Σ⁺.
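As a minimal sketch (Python; the tuple-of-frozensets representation and the names `john`, `holds_at` and `moments` are illustrative assumptions, not taken from the slides), an event-string can be stored as one frozenset per moment. The string below is the model [lt][lt, js][lt, js, fa][lt, js][lt] for “John slept through the fire alarm last Tuesday.”

```python
# Sketch: an event-string is a tuple of frozensets of fluent names.
# Moment i (numbered from 1, as in the definition) is stored at index i - 1.
john = (
    frozenset({"lt"}),
    frozenset({"lt", "js"}),
    frozenset({"lt", "js", "fa"}),
    frozenset({"lt", "js"}),
    frozenset({"lt"}),
)


def holds_at(s, fluent, i):
    """True if `fluent` holds at moment i (1-based) of event-string s."""
    return fluent in s[i - 1]


def moments(s, fluent):
    """All moments (1-based) at which `fluent` holds in s."""
    return [i for i, box in enumerate(s, start=1) if fluent in box]


print(len(john))                # 5
print(moments(john, "js"))      # [2, 3, 4]
print(moments(john, "fa"))      # [3]
print(holds_at(john, "fa", 1))  # False
```

Chronological order is carried entirely by the index: αᵢ precedes αⱼ exactly when i < j, and no clock time is stored.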

SLIDE 10

No Time Without Change

“But neither does time exist without change” – Aristotle, Physics IV

SLIDE 11

No Time Without Change

The precise real-time duration of each symbol is disregarded (for now). Event-strings model a kind of inertial world. Change is the only marker of progression from one moment to the next.

SLIDE 12

Superposition

In order to collect information from multiple strings into a single string, we define the operation of superposition.
Definition: For two strings s and s′ of equal length, their superposition s & s′ is their componentwise union:

$$\alpha_1 \cdots \alpha_n \,\&\, \alpha'_1 \cdots \alpha'_n := (\alpha_1 \cup \alpha'_1) \cdots (\alpha_n \cup \alpha'_n)$$
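The definition translates directly into code. A sketch, assuming event-strings are tuples of frozensets; `superpose` and the small `box` helper are hypothetical names chosen for illustration:

```python
def superpose(s, t):
    """s & t: componentwise union of two event-strings of equal length."""
    if len(s) != len(t):
        raise ValueError("superposition is only defined for strings of equal length")
    return tuple(a | b for a, b in zip(s, t))


def box(*fluents):
    """Hypothetical helper: build one box (set of fluents)."""
    return frozenset(fluents)


# [a][c] & [b][d] = [a, b][c, d]
left = (box("a"), box("c"))
right = (box("b"), box("d"))
print(superpose(left, right) == (box("a", "b"), box("c", "d")))  # True
```

Raising on unequal lengths keeps the operation partial, as in the definition; asynchronous superposition is what later removes that restriction.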

SLIDE 13

Box Notation

For convenience of notation, we draw boxes [ ] rather than curly braces { } to represent sets of fluents in an event-string. Example: with a, b, c, d ∈ A, [a][c] & [b][d] = [a, b][c, d].

SLIDE 14

Stutter

We can make a string s = α₁ ⋯ αₙ stutter by repeating boxes, so that αᵢ = αᵢ₊₁ for some integer 0 < i < n. For example, [a][a][a][c][c] is a stuttering version of [a][c]. Since the real-time duration of each box is not taken into account, the interpretation of the string is unaffected.

SLIDE 15

Block Compression

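We can transform a stuttering string into a stutterless string through block compression:
Definition:

$$\mathrm{bc}(s) := \begin{cases} s & \text{if } \mathrm{length}(s) \le 1 \\ \mathrm{bc}(\alpha s') & \text{if } s = \alpha\alpha s' \\ \alpha\,\mathrm{bc}(\alpha' s') & \text{if } s = \alpha\alpha' s' \text{ with } \alpha \ne \alpha' \end{cases}$$

Thus, bc([a][a][a][c][c]) = [a][c].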
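A sketch of block compression in Python, assuming tuples of frozensets as event-strings. `bc` mirrors the three cases of the definition one-for-one; `bc_iterative` (a hypothetical name) computes the same result without recursion, which is safer for long strings:

```python
def bc(s):
    """Block compression, following the three cases of the definition."""
    s = tuple(s)
    if len(s) <= 1:              # length(s) <= 1: nothing to compress
        return s
    if s[0] == s[1]:             # s = αα s'  ->  bc(α s')
        return bc(s[1:])
    return (s[0],) + bc(s[1:])   # s = αα' s' with α ≠ α'  ->  α bc(α' s')


def bc_iterative(s):
    """Same result as bc, keeping one box per run of identical boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


a, c = frozenset({"a"}), frozenset({"c"})
print(bc((a, a, a, c, c)) == (a, c))  # True
```

Only adjacent repeats are collapsed, so [a][c][a] is already stutterless and is returned unchanged.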

SLIDE 16

Inverse Block Compression

We can generate infinitely many stuttering strings, all of which are bc-equivalent:
Example: bc⁻¹([a][c]) = {[a][c], [a][a][c], [a][c][c], …} = [a]⁺[c]⁺

Precisely, a string s′ is bc-equivalent to a string s iff s′ ∈ bc⁻¹bc(s), and s′ ∈ bc⁻¹bc(s) iff bc(s) = bc(s′).
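Because bc⁻¹bc(s) is infinite, code can only test membership or enumerate it up to some length. A sketch under those assumptions: `bc_equivalent` applies the iff above directly, and `inverse_bc` with its `max_len` cap is a hypothetical helper, not part of the framework:

```python
from itertools import product


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def bc_equivalent(s, t):
    """s and t are bc-equivalent iff they block-compress to the same string."""
    return bc(s) == bc(t)


def inverse_bc(s, max_len):
    """Yield the members of bc^{-1}(bc(s)) whose length is at most max_len."""
    t = bc(s)
    n = len(t)
    for length in range(n, max_len + 1):
        # Each of the n blocks gets one copy plus extra[i] repeats, totalling `length`.
        for extra in product(range(length - n + 1), repeat=n):
            if sum(extra) == length - n:
                yield tuple(box for box, e in zip(t, extra) for _ in range(e + 1))


a, c = frozenset({"a"}), frozenset({"c"})
print(len(list(inverse_bc((a, c), 3))))     # 3: [a][c], [a][a][c], [a][c][c]
print(bc_equivalent((a, a, c), (a, c, c)))  # True
```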

SLIDE 17

Asynchronous Superposition

This gives our initial definition of asynchronous superposition:
Definition (Initial): The asynchronous superposition of two strings s and s′ is the set of strings obtained by block compressing the results of superposing the strings which are bc-equivalent to s and s′:

$$s \mathbin{\&\ast} s' := \{\mathrm{bc}(s'') \mid s'' \in \mathrm{bc}^{-1}\mathrm{bc}(s) \,\&\, \mathrm{bc}^{-1}\mathrm{bc}(s')\}$$

Example: [a][c] &∗ [b][d] = {[a, b][c, d], [a, b][a, d][c, d], [a, b][b, c][c, d]}

SLIDE 18

Upper Bound on Asynchronous Superposition

We can improve this definition. It can be shown that, for two strings of length n and n′, the longest string produced by their asynchronous superposition (after block compression) has length n + n′ − 1. Thus, for any integer k > 0 and string s, we introduce a new operation pad_k(s), which generates the set of strings of length k that are bc-equivalent to s.
Definition:

$$\mathrm{pad}_k(s) = \mathrm{bc}^{-1}(s) \cap \Sigma^k$$
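pad_k(s) can be enumerated without touching the infinite set bc⁻¹(s): for a stutterless string of length n, a length-k member is fixed by choosing where the n − 1 changes of box fall among the k − 1 gaps, giving C(k − 1, n − 1) strings. A sketch of that counting argument, assuming tuples of frozensets; compressing the input first (so stuttering input is accepted too) is an assumption of this sketch:

```python
from itertools import combinations


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): every string of length k that block-compresses to bc(s)."""
    t = bc(s)  # assumption: compress first, so stuttering input is accepted too
    n = len(t)
    strings = set()
    # Choose n - 1 cut points in 1..k-1, splitting k positions into n non-empty runs.
    for cuts in combinations(range(1, k), n - 1):
        bounds = (0, *cuts, k)
        padded = []
        for box, lo, hi in zip(t, bounds, bounds[1:]):
            padded.extend([box] * (hi - lo))
        strings.add(tuple(padded))
    return strings


a, c = frozenset({"a"}), frozenset({"c"})
print(pad((a, c), 3) == {(a, a, c), (a, c, c)})  # True
print(pad((a, c), 1))                            # set(): k is shorter than bc(s)
```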

SLIDE 19

Upper Bound on Asynchronous Superposition

An improved definition of asynchronous superposition puts a clear finite bound on the infinite language generated by inverse block compression:
Definition (Improved): For any s, s′ ∈ Σ⁺ with nonzero lengths n and n′ respectively,

$$s \mathbin{\&\ast} s' = \{\mathrm{bc}(s'') \mid s'' \in \mathrm{pad}_{n+n'-1}(s) \,\&\, \mathrm{pad}_{n+n'-1}(s')\}$$
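With pad available, the improved definition becomes a finite double loop. A sketch, assuming tuples of frozensets and taking n and n′ as the lengths of the block-compressed inputs; `async_superpose` and `show` are hypothetical names:

```python
from itertools import combinations


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t: superpose every pair of padded strings, then block-compress."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


def show(s):
    """Render an event-string in box notation, e.g. [a, b][ ]."""
    return "".join("[" + ", ".join(sorted(box)) + "]" if box else "[ ]" for box in s)


a, b, c, d = (frozenset({x}) for x in "abcd")
for r in sorted(show(r) for r in async_superpose((a, c), (b, d))):
    print(r)
# [a, b][a, d][c, d]
# [a, b][b, c][c, d]
# [a, b][c, d]
```

The output reproduces the three strings of the [a][c] &∗ [b][d] example from the initial definition.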

SLIDE 20

Bounding Boxes

We use the empty box [ ], a string of length 1 (not to be confused with the empty string ε, which has length 0), to bound events, allowing us to represent the fact that they are finite. Asynchronous superposition allows us to generate the 13 strings in [ ][e][ ] &∗ [ ][e′][ ], each of which corresponds to one of the unique Allen Relations, and also to one of the relation types in ISO-TimeML’s TLINKs.
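The count of 13 can be checked mechanically. A sketch, assuming tuples of frozensets, the fluent names "e" and "e'" as stand-ins for e and e′, and the same hypothetical helpers as in a padded implementation of &∗:

```python
from itertools import combinations

EMPTY = frozenset()


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t via the padded definition."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


def show(s):
    """Render an event-string in box notation, e.g. [e, e'][ ]."""
    return "".join("[" + ", ".join(sorted(box)) + "]" if box else "[ ]" for box in s)


e = (EMPTY, frozenset({"e"}), EMPTY)    # [ ][e][ ]
e2 = (EMPTY, frozenset({"e'"}), EMPTY)  # [ ][e'][ ]

relations = async_superpose(e, e2)
print(len(relations))  # 13
for r in sorted(relations, key=lambda r: (len(r), show(r))):
    print(show(r))
```

Without the bounding boxes, "e before e′" and "e meets e′" would both compress to [e][e′]; the empty box between them is what keeps the two relations apart.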

SLIDE 21

Allen Relations as Event-Strings

e = e′     [ ][e, e′][ ]            equal
e s e′     [ ][e, e′][e′][ ]        starts
e si e′    [ ][e, e′][e][ ]         starts (inverse)
e f e′     [ ][e′][e, e′][ ]        finishes
e fi e′    [ ][e][e, e′][ ]         finishes (inverse)
e d e′     [ ][e′][e, e′][e′][ ]    during
e di e′    [ ][e][e, e′][e][ ]      during (inverse)
e o e′     [ ][e][e, e′][e′][ ]     overlaps
e oi e′    [ ][e′][e, e′][e][ ]     overlaps (inverse)
e m e′     [ ][e][e′][ ]            meets
e mi e′    [ ][e′][e][ ]            meets (inverse)
e < e′     [ ][e][ ][e′][ ]         before
e > e′     [ ][e′][ ][e][ ]         after

SLIDE 22

Three Unconstrained Bounded Events

[ ][e][ ] &∗ [ ][e′][ ] &∗ [ ][e′′][ ] =

{[ ][e, e′, e′′][ ], [ ][e′′][e, e′, e′′][ ], [ ][e′′][ ][e, e′][ ], [ ][e′′][e, e′][ ], [ ][e, e′][e, e′, e′′][ ], …}
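The size of this set can be computed rather than listed. A sketch, assuming that &∗ between a set of strings and a string is the union of &∗ over the members of the set (the slides do not spell this out), with "e", "e'" and "e''" standing in for e, e′ and e′′:

```python
from itertools import combinations

EMPTY = frozenset()


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t via the padded definition."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


e1 = (EMPTY, frozenset({"e"}), EMPTY)
e2 = (EMPTY, frozenset({"e'"}), EMPTY)
e3 = (EMPTY, frozenset({"e''"}), EMPTY)

# Assumption: (L &∗ t) is the union of (s &∗ t) over every s in L.
three = set()
for s in async_superpose(e1, e2):
    three |= async_superpose(s, e3)

print(len(three))  # 409
```

409 is the number of distinct qualitative arrangements of three intervals on a line (two intervals give Allen's 13), which shows how quickly unconstrained superposition grows with each added event.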

SLIDE 23

Constraints

How to prevent unnecessary over-generation?

SLIDE 24

Reduct

The reduct operation will help to identify well-formed event-strings:
Definition: The reduct ρ_X(s), for any X ⊆ A and event-string s, is the componentwise intersection of s with X:

$$\rho_X(\alpha_1 \cdots \alpha_n) := (\alpha_1 \cap X) \cdots (\alpha_n \cap X)$$

Example: with a, b ∈ A, bc(ρ_{a}([a][a, b][b])) = [a][ ].
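A sketch of the reduct, assuming tuples of frozensets (`reduct` is a hypothetical name). Note that the last box of the example becomes empty, because only b held there:

```python
def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def reduct(s, X):
    """ρ_X(s): intersect every box of s with the set of fluents X."""
    X = frozenset(X)
    return tuple(box & X for box in s)


a, ab, b = frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b"})
print(reduct((a, ab, b), {"a"}))
# (frozenset({'a'}), frozenset({'a'}), frozenset())
print(bc(reduct((a, ab, b), {"a"})) == (a, frozenset()))  # True, i.e. [a][ ]
```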

SLIDE 25

Well-formed Event-Strings

Fluents are interval-like: for any event-string s and any e ∈ A, bc(ρ_{e}(s)) = [ ][e][ ] (or [ ], if e does not appear in s).
Relations are consistent: for example, if the relations e > e′ and e′ > e′′ hold, then the relation e′′ > e cannot also hold.
We may discard any event-string which is not well-formed.
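The first condition is a per-fluent test on reducts. A sketch that checks only that condition, assuming events are bounded by empty boxes as on the bounding-box slide; `interval_like`, `well_formed` and the bracket parser `parse` are hypothetical helpers. Consistency of relations across several strings is not tested here.

```python
import re

EMPTY = frozenset()


def parse(text):
    """Read box notation such as '[ ][a][a, b]' into a tuple of frozensets."""
    return tuple(frozenset(f.strip() for f in body.split(",") if f.strip())
                 for body in re.findall(r"\[([^\]]*)\]", text))


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def interval_like(s, fluent):
    """bc(ρ_{fluent}(s)) must be [ ][fluent][ ], or [ ] if fluent is absent."""
    r = bc(tuple(box & {fluent} for box in s))
    return r in ((EMPTY, frozenset({fluent}), EMPTY), (EMPTY,))


def well_formed(s):
    """Check the interval-like condition for every fluent occurring in s."""
    return all(interval_like(s, f) for f in frozenset().union(*s))


print(well_formed(parse("[ ][a][a, b][b][ ]")))  # True
print(well_formed(parse("[ ][a][ ][a][ ]")))     # False: a holds over two separate stretches
```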

SLIDE 26

Constrained Superposition

When a fluent appears in two different strings s and s′ that are to be asynchronously superposed, the number of well-formed results is usually reduced.
Example: the fluent b appears in both strings, yielding only one well-formed result:
[ ][a][ ][b][ ] &∗ [ ][b][ ][c][ ] = [ ][a][ ][b][ ][c][ ]
Without the constraint of being well-formed, the above example would generate 270 strings, rather than 1.
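One way to enforce well-formedness during superposition, assumed here rather than taken from the slides, is to keep only results that project back onto each input: reducting a result to an input's fluents and block compressing must give that input again. A sketch, reading the example's strings as [ ][a][ ][b][ ] and [ ][b][ ][c][ ] (a before b, b before c); all function names are hypothetical:

```python
import re
from itertools import combinations


def parse(text):
    """Read box notation such as '[ ][a][a, b]' into a tuple of frozensets."""
    return tuple(frozenset(f.strip() for f in body.split(",") if f.strip())
                 for body in re.findall(r"\[([^\]]*)\]", text))


def show(s):
    """Render an event-string in box notation."""
    return "".join("[" + ", ".join(sorted(box)) + "]" if box else "[ ]" for box in s)


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t via the padded definition."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


def reduct(s, X):
    """ρ_X(s): intersect every box with X."""
    return tuple(box & X for box in s)


def projects_onto(result, s):
    """Reducting result to s's fluents and compressing gives back bc(s)."""
    vocabulary = frozenset().union(*s)
    return bc(reduct(result, vocabulary)) == bc(s)


def constrained_superpose(s, t):
    """The results of s &∗ t that are consistent with both inputs."""
    return {r for r in async_superpose(s, t)
            if projects_onto(r, s) and projects_onto(r, t)}


left = parse("[ ][a][ ][b][ ]")
right = parse("[ ][b][ ][c][ ]")
for r in constrained_superpose(left, right):
    print(show(r))  # [ ][a][ ][b][ ][c][ ]
```

The projection test subsumes the interval-like condition whenever the inputs are themselves well-formed, since each fluent's reduct is inherited from an input.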

SLIDE 27

Transitivity Table Fragment

[Table: transitivity table fragment for the relations “before”, “during” and “meets”, with event-strings over a, b and c as entries]
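Entries of such a table can be recomputed from event-strings instead of being stored. A sketch, assuming the bounded strings of the Allen-relations slide, the slide's relation symbols, and a projection-based consistency filter (an assumption, not the authors' stated check): superpose a string for "a r1 b" with one for "b r2 c", keep the consistent results, and read off the relation between a and c by reduct and block compression. All names are hypothetical.

```python
from itertools import combinations

EMPTY = frozenset()
RELATIONS = ["=", "s", "si", "f", "fi", "d", "di", "o", "oi", "m", "mi", "<", ">"]


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t via the padded definition."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


def reduct(s, X):
    """ρ_X(s): intersect every box with X."""
    return tuple(box & X for box in s)


def constrained_superpose(s, t):
    """Results of s &∗ t that project back onto both s and t."""
    vs, vt = frozenset().union(*s), frozenset().union(*t)
    return {r for r in async_superpose(s, t)
            if bc(reduct(r, vs)) == bc(s) and bc(reduct(r, vt)) == bc(t)}


def allen_string(rel, x, y):
    """Bounded event-string for `x rel y`, as in the Allen-relations table."""
    X, Y, XY = frozenset({x}), frozenset({y}), frozenset({x, y})
    return {
        "=": (EMPTY, XY, EMPTY),
        "s": (EMPTY, XY, Y, EMPTY), "si": (EMPTY, XY, X, EMPTY),
        "f": (EMPTY, Y, XY, EMPTY), "fi": (EMPTY, X, XY, EMPTY),
        "d": (EMPTY, Y, XY, Y, EMPTY), "di": (EMPTY, X, XY, X, EMPTY),
        "o": (EMPTY, X, XY, Y, EMPTY), "oi": (EMPTY, Y, XY, X, EMPTY),
        "m": (EMPTY, X, Y, EMPTY), "mi": (EMPTY, Y, X, EMPTY),
        "<": (EMPTY, X, EMPTY, Y, EMPTY), ">": (EMPTY, Y, EMPTY, X, EMPTY),
    }[rel]


def relation_between(s, x, y):
    """Read off the Allen relation between x and y: reduct, then compress."""
    r = bc(reduct(s, frozenset({x, y})))
    return next(rel for rel in RELATIONS if allen_string(rel, x, y) == r)


def compose(r1, r2):
    """Possible relations between a and c, given a r1 b and b r2 c."""
    results = constrained_superpose(allen_string(r1, "a", "b"), allen_string(r2, "b", "c"))
    return {relation_between(s, "a", "c") for s in results}


for r1 in ("<", "d", "m"):
    for r2 in ("<", "d", "m"):
        print(f"a {r1} b, b {r2} c  =>  a {sorted(compose(r1, r2))} c")

print(sorted(compose("<", "d")))  # ['<', 'd', 'm', 'o', 's']
```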

SLIDE 28

Arbitrary Events

No matter how many events feature in an event-string, applying the reduct ρ_{e,e′} and block compressing (where e and e′ are the events we are interested in) gives the event-string that corresponds to the Allen Relation between e and e′. For example, given a < b, b < c and c < d, we can deduce a < d:

SLIDE 29

Arbitrary Events

1. bc(ρ_{a,d}([ ][a][ ][b][ ] &∗ [ ][b][ ][c][ ] &∗ [ ][c][ ][d][ ]))
2. bc(ρ_{a,d}([ ][a][ ][b][ ][c][ ][d][ ]))
3. bc([ ][a][ ][ ][ ][ ][ ][d][ ])
4. [ ][a][ ][d][ ]
5. a < d
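The derivation can be replayed in code. A sketch, assuming the three inputs are the bounded "before" strings [ ][a][ ][b][ ], [ ][b][ ][c][ ] and [ ][c][ ][d][ ], that superposition proceeds left to right over sets of strings, and that well-formedness is enforced by requiring each result to project back onto both inputs; all function names are hypothetical:

```python
import re
from itertools import combinations


def parse(text):
    """Read box notation such as '[ ][a][ ]' into a tuple of frozensets."""
    return tuple(frozenset(f.strip() for f in body.split(",") if f.strip())
                 for body in re.findall(r"\[([^\]]*)\]", text))


def show(s):
    """Render an event-string in box notation."""
    return "".join("[" + ", ".join(sorted(box)) + "]" if box else "[ ]" for box in s)


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t via the padded definition."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


def reduct(s, X):
    """ρ_X(s): intersect every box with X."""
    return tuple(box & X for box in s)


def constrained_superpose(s, t):
    """Results of s &∗ t that project back onto both s and t."""
    vs, vt = frozenset().union(*s), frozenset().union(*t)
    return {r for r in async_superpose(s, t)
            if bc(reduct(r, vs)) == bc(s) and bc(reduct(r, vt)) == bc(t)}


s_ab = parse("[ ][a][ ][b][ ]")  # a < b
s_bc = parse("[ ][b][ ][c][ ]")  # b < c
s_cd = parse("[ ][c][ ][d][ ]")  # c < d

combined = set()
for s in constrained_superpose(s_ab, s_bc):
    combined |= constrained_superpose(s, s_cd)

for s in combined:
    print(show(s))                                     # [ ][a][ ][b][ ][c][ ][d][ ]
    print(show(bc(reduct(s, frozenset({"a", "d"})))))  # [ ][a][ ][d][ ], i.e. a < d
```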

SLIDE 30

Example TLINKs

1. <TLINK relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>

2. <TLINK relType="IS_INCLUDED" timeID="t1" relatedToEventInstance="ei9"/>

3. <TLINK relType="BEFORE" eventInstanceID="ei9" relatedToEventInstance="ei10"/>

SLIDE 31

TLINKs as Allen Relations

1. ei1 d t1
2. t1 d ei9
3. ei9 < ei10
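The translation from TLINKs to Allen Relations, and from there to bounded event-strings, can be sketched with Python's standard `xml.etree.ElementTree`. The relType mapping below is a hypothetical, partial table covering only the two relTypes in this example (IS_INCLUDED as d, BEFORE as <); a full translation would need an entry for every ISO-TimeML relType:

```python
import xml.etree.ElementTree as ET

EMPTY = frozenset()

TLINKS = [
    '<TLINK relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>',
    '<TLINK relType="IS_INCLUDED" timeID="t1" relatedToEventInstance="ei9"/>',
    '<TLINK relType="BEFORE" eventInstanceID="ei9" relatedToEventInstance="ei10"/>',
]

# Hypothetical, partial mapping: only the relTypes used in this example.
RELTYPE_TO_ALLEN = {"IS_INCLUDED": "d", "BEFORE": "<"}


def tlink_to_relation(xml_text):
    """Return (source, Allen symbol, target) for one TLINK element."""
    link = ET.fromstring(xml_text)
    source = link.get("eventInstanceID") or link.get("timeID")
    target = link.get("relatedToEventInstance") or link.get("relatedToTime")
    return source, RELTYPE_TO_ALLEN[link.get("relType")], target


def to_event_string(x, rel, y):
    """Bounded event-string for `x rel y` (only d and < are covered here)."""
    X, Y, XY = frozenset({x}), frozenset({y}), frozenset({x, y})
    if rel == "d":
        return (EMPTY, Y, XY, Y, EMPTY)      # [ ][y][x, y][y][ ]
    if rel == "<":
        return (EMPTY, X, EMPTY, Y, EMPTY)   # [ ][x][ ][y][ ]
    raise ValueError(f"no event-string template for {rel!r}")


def show(s):
    """Render an event-string in box notation."""
    return "".join("[" + ", ".join(sorted(box)) + "]" if box else "[ ]" for box in s)


for xml_text in TLINKS:
    x, rel, y = tlink_to_relation(xml_text)
    print(f"{x} {rel} {y}: {show(to_event_string(x, rel, y))}")
# ei1 d t1: [ ][t1][ei1, t1][t1][ ]
# t1 d ei9: [ ][ei9][ei9, t1][ei9][ ]
# ei9 < ei10: [ ][ei9][ ][ei10][ ]
```

The source and target are taken from whichever of the event-instance or time attributes is present, so event–time, time–event and event–event links share one code path.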

SLIDE 32

TLINKs as Event-Strings

1. [ ][t1][ei1, t1][t1][ ]

2. [ ][ei9][t1, ei9][ei9][ ]

3. [ ][ei9][ ][ei10][ ]

SLIDE 33

Combining Information

[ ][t1][ei1, t1][t1][ ] &∗ [ ][ei9][t1, ei9][ei9][ ] &∗ [ ][ei9][ ][ei10][ ] = [ ][ei9][t1, ei9][ei1, t1, ei9][t1, ei9][ei9][ ][ei10][ ]

SLIDE 34

Extracting New Information

1. bc(ρ_{ei1,ei10}([ ][ei9][t1, ei9][ei1, t1, ei9][t1, ei9][ei9][ ][ei10][ ]))
2. bc([ ][ ][ ][ei1][ ][ ][ ][ei10][ ])
3. [ ][ei1][ ][ei10][ ]
4. ei1 < ei10
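The combination and extraction steps can be replayed end to end. A sketch, assuming the bounded event-strings of the three TLINKs, left-to-right superposition over sets of strings, and a projection-based consistency filter (each result must reduct and compress back to each input); this filter and all function names are assumptions, not the authors' published code:

```python
import re
from itertools import combinations


def parse(text):
    """Read box notation such as '[ ][t1][ei1, t1]' into a tuple of frozensets."""
    return tuple(frozenset(f.strip() for f in body.split(",") if f.strip())
                 for body in re.findall(r"\[([^\]]*)\]", text))


def show(s):
    """Render an event-string in box notation."""
    return "".join("[" + ", ".join(sorted(box)) + "]" if box else "[ ]" for box in s)


def bc(s):
    """Block compression: collapse each run of identical adjacent boxes."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)


def pad(s, k):
    """pad_k(s): all length-k strings that block-compress to bc(s)."""
    t = bc(s)
    result = set()
    for cuts in combinations(range(1, k), len(t) - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(box for box, lo, hi in zip(t, bounds, bounds[1:])
                         for _ in range(hi - lo)))
    return result


def async_superpose(s, t):
    """s &∗ t via the padded definition."""
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(p, q)))
            for p in pad(s, k) for q in pad(t, k)}


def reduct(s, X):
    """ρ_X(s): intersect every box with X."""
    return tuple(box & X for box in s)


def constrained_superpose(s, t):
    """Results of s &∗ t that project back onto both s and t."""
    vs, vt = frozenset().union(*s), frozenset().union(*t)
    return {r for r in async_superpose(s, t)
            if bc(reduct(r, vs)) == bc(s) and bc(reduct(r, vt)) == bc(t)}


s1 = parse("[ ][t1][ei1, t1][t1][ ]")    # ei1 d t1
s2 = parse("[ ][ei9][t1, ei9][ei9][ ]")  # t1 d ei9
s3 = parse("[ ][ei9][ ][ei10][ ]")       # ei9 < ei10

combined = set()
for s in constrained_superpose(s1, s2):
    combined |= constrained_superpose(s, s3)

for s in combined:
    print(show(s))
    # [ ][ei9][ei9, t1][ei1, ei9, t1][ei9, t1][ei9][ ][ei10][ ]
    print(show(bc(reduct(s, frozenset({"ei1", "ei10"})))))
    # [ ][ei1][ ][ei10][ ], i.e. ei1 < ei10
```

The relation between ei1 and ei10 is stated in no TLINK; it falls out of the single consistent combined string.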

SLIDE 35

Further Work

Deciding when to use asynchronous superposition (generating too many strings may not be worth the cost).
Developing the framework to treat event types and to include more information (durations, etc.).

SLIDE 36

Acknowledgements

This research is supported by Science Foundation Ireland (SFI) through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre (https://www.adaptcentre.ie) at Trinity College Dublin. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund. Thank you for listening!