 
Towards Efficient String Processing of Annotated Events
David Woods (1), Tim Fernando (2), Carl Vogel (2)
(1) ADAPT Centre, Trinity College Dublin, Ireland
(2) Computational Linguistics Group, Trinity Centre for Computing and Language Studies, School of Computer Science, Trinity College Dublin, Ireland
ISA-13, 2017
Motivation: ISO-TimeML
[Figure: ISO-TimeML fragment]
Motivation: TLINKs
[Figure: TLINKs in an ISO-TimeML document]
Motivation: TLINK Examples
1. <TLINK relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
2. <TLINK relType="IS_INCLUDED" timeID="t1" relatedToEventInstance="ei9"/>
3. <TLINK relType="BEFORE" eventInstanceID="ei9" relatedToEventInstance="ei10"/>
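To make the structure of these links concrete, here is a minimal Python sketch that reads TLINK elements like the three above into (source, relation, target) triples. It assumes the attribute names shown above and uses only the standard-library xml.etree.ElementTree module; the <TLINKS> wrapper element and the tlink_triples helper are hypothetical and are not part of ISO-TimeML.

```python
import xml.etree.ElementTree as ET

# The three TLINK examples, wrapped in a hypothetical <TLINKS> root element
# so that ElementTree can parse them as a single document.
SAMPLE = """<TLINKS>
<TLINK relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
<TLINK relType="IS_INCLUDED" timeID="t1" relatedToEventInstance="ei9"/>
<TLINK relType="BEFORE" eventInstanceID="ei9" relatedToEventInstance="ei10"/>
</TLINKS>"""


def tlink_triples(xml_text):
    """Return a (source, relType, target) triple for every TLINK element."""
    triples = []
    for link in ET.fromstring(xml_text).iter("TLINK"):
        # The source is an event instance or a time.
        source = link.get("eventInstanceID") or link.get("timeID")
        # Likewise, the target is a time or an event instance.
        target = link.get("relatedToTime") or link.get("relatedToEventInstance")
        triples.append((source, link.get("relType"), target))
    return triples


for triple in tlink_triples(SAMPLE):
    print(triple)
```

The first line printed is ('ei1', 'IS_INCLUDED', 't1'); each triple is a binary temporal relation between two annotated entities, which is the shape of information the Allen relations below capture.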
Motivation: Allen Relations
[Figure: Allen Relations, from Allen (1983, p. 835, Fig. 2)]
Motivation: Allen Relations Example
Example: "John slept through the fire alarm last Tuesday."
This sentence gives us two events and one time period:
1. js = "John slept" (event)
2. fa = "a fire alarm occurred" (event)
3. lt = "last Tuesday" (time period)
We can represent this information with binary Allen Relations: js di fa (the sleep contains the fire alarm) and js d lt (the sleep falls during last Tuesday).
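For comparison with the string-based approach developed below, Allen relations can also be computed directly from interval endpoints. This is a sketch of that standard endpoint method, not the method of these slides; it assumes intervals are numeric (start, end) pairs with start < end, and the clock times for the example are hypothetical.

```python
def allen(x, y):
    """Return the Allen relation symbol R such that x R y.

    Intervals are (start, end) pairs of numbers with start < end.
    """
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "<"   # before
    if ye < xs:
        return ">"   # after
    if xe == ys:
        return "m"   # meets
    if ye == xs:
        return "mi"  # meets (inverse)
    if (xs, xe) == (ys, ye):
        return "="   # equal
    if xs == ys:
        return "s" if xe < ye else "si"   # starts / starts (inverse)
    if xe == ye:
        return "f" if xs > ys else "fi"   # finishes / finishes (inverse)
    if ys < xs and xe < ye:
        return "d"   # during
    if xs < ys and ye < xe:
        return "di"  # during (inverse)
    return "o" if xs < ys else "oi"       # overlaps / overlaps (inverse)


# Hypothetical clock times (hours) for the example sentence.
lt, js, fa = (0, 24), (1, 8), (3, 4)
print(allen(js, fa), allen(js, lt))  # di d
```

Any assignment of times that makes the sentence true yields the same two relations; the numbers only serve to exercise the classifier.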
Introduction: Strings as Models
We can use strings as models to represent this event data effectively.
Example: "John slept through the fire alarm last Tuesday."
[lt][lt, js][lt, js, fa][lt, js][lt]
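One way to read the string above is as a record of which fluents hold at each of five moments. The sketch below rebuilds it from hypothetical moment spans (lt over moments 1 to 5, js over 2 to 4, fa at 3), representing each box as a Python frozenset and the string as a tuple; the helper name event_string is hypothetical.

```python
def event_string(spans, n):
    """Build an n-moment event-string from fluent spans.

    spans maps each fluent to an inclusive (first, last) pair of moments in 1..n.
    Each box is the frozenset of fluents whose span covers that moment.
    """
    return tuple(
        frozenset(f for f, (first, last) in spans.items() if first <= i <= last)
        for i in range(1, n + 1)
    )


# Hypothetical moment spans for "John slept through the fire alarm last Tuesday."
example = event_string({"lt": (1, 5), "js": (2, 4), "fa": (3, 3)}, 5)
print([sorted(box) for box in example])
# [['lt'], ['js', 'lt'], ['fa', 'js', 'lt'], ['js', 'lt'], ['lt']]
```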
Introduction: Sets as Symbols
Fix a finite set A of fluents. Fluents will be understood as naming an event instance (or a time) in ISO-TimeML. We encode finite sets of these fluents as symbols, which may appear in a string.
Introduction: Event-Strings
A string $s = \alpha_1 \cdots \alpha_n$ of subsets $\alpha_i$ of $A$ can be construed as a finite model consisting of $n$ moments of time $i \in \{1, \ldots, n\}$. Each $\alpha_i$ specifies all fluents in $A$ that hold simultaneously at $i$. Each $\alpha_i$ is understood to occur chronologically before $\alpha_j$ if and only if $i < j$. The powerset $2^A$ of $A$ serves as the alphabet $\Sigma = 2^A$ of an event-string $s \in \Sigma^+$.
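A direct encoding of these definitions, as a sketch: each moment α_i is a Python frozenset of fluent names, a string is a tuple of such sets, and the alphabet Σ = 2^A is built as the powerset of A. The names alphabet and is_event_string are hypothetical helpers. Frozensets are chosen because they are hashable and compare by content, so boxes can be tested for equality and stored in sets.

```python
from itertools import chain, combinations


def alphabet(fluents):
    """Return Sigma = 2^A: every subset of the fluent set A, as a frozenset."""
    items = sorted(fluents)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}


def is_event_string(s, fluents):
    """True iff s is a non-empty sequence of subsets of A, i.e. s is in Sigma^+."""
    return len(s) > 0 and all(frozenset(box) <= fluents for box in s)


A = frozenset({"js", "fa", "lt"})
sigma = alphabet(A)
print(len(sigma))  # 8, one symbol per subset of A
print(is_event_string((frozenset({"lt"}), frozenset({"lt", "js"})), A))  # True
print(is_event_string((), A))  # False: the empty string is not in Sigma^+
```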
Introduction: No Time Without Change
“But neither does time exist without change” – Aristotle, Physics IV
Introduction: No Time Without Change
The precise real-time duration of each symbol is disregarded (for now). Event-strings model a kind of inertial world: change is the only marker of progression from one moment to the next.
Superposition and Block Compression: Superposition
To collect information from multiple strings into a single string, we define the operation of superposition:
Definition. For two strings $s$ and $s'$ of equal length, their superposition $s \,\&\, s'$ is their componentwise union:
$$\alpha_1 \cdots \alpha_n \,\&\, \alpha'_1 \cdots \alpha'_n := (\alpha_1 \cup \alpha'_1) \cdots (\alpha_n \cup \alpha'_n)$$
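A minimal sketch of superposition, assuming boxes are Python frozensets and strings are tuples of them (a representation chosen for illustration). Raising ValueError on unequal lengths is a design choice of this sketch, since the definition only covers strings of equal length.

```python
def superpose(s, t):
    """Superposition s & t: componentwise union of two equal-length event-strings."""
    if len(s) != len(t):
        raise ValueError("superposition is only defined for strings of equal length")
    return tuple(a | b for a, b in zip(s, t))


# [a][c] & [b][d] = [a, b][c, d]
s = (frozenset({"a"}), frozenset({"c"}))
t = (frozenset({"b"}), frozenset({"d"}))
print(superpose(s, t) == (frozenset({"a", "b"}), frozenset({"c", "d"})))  # True
```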
Superposition and Block Compression: Box Notation
For notational convenience, we draw boxes (rendered here as square brackets) rather than curly braces { } to represent sets of fluents in an event-string.
Example. With $a, b, c, d \in A$: [a][c] & [b][d] = [a, b][c, d]
Superposition and Block Compression: Stutter
A string $s = \alpha_1 \cdots \alpha_n$ stutters if $\alpha_i = \alpha_{i+1}$ for some integer $0 < i < n$; we can make any string stutter by repeating one of its boxes. For example, [a][a][a][c][c] is a stuttering version of [a][c]. Since the real-time duration of each box is not taken into account, stuttering leaves the interpretation of the string unaffected.
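A sketch of the two notions in this paragraph, assuming tuple-of-frozenset strings: stutters tests the condition α_i = α_{i+1}, and the hypothetical stutter_at helper makes a string stutter by repeating its i-th box (1-indexed, to match the α_i numbering).

```python
def stutters(s):
    """True iff some box is immediately repeated, i.e. alpha_i == alpha_{i+1}."""
    return any(a == b for a, b in zip(s, s[1:]))


def stutter_at(s, i):
    """Repeat the i-th box of s (1-indexed), yielding a stuttering version of s."""
    return s[:i] + s[i - 1:]


a, c = frozenset({"a"}), frozenset({"c"})
print(stutters((a, c)))                       # False
print(stutters((a, a, a, c, c)))              # True
print(stutter_at((a, c), 2) == (a, c, c))     # True
```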
Superposition and Block Compression: Block Compression
We can transform a stuttering string into a stutterless string through block compression:
Definition.
$$\mathrm{bc}(s) := \begin{cases} s & \text{if } \mathrm{length}(s) \le 1 \\ \mathrm{bc}(\alpha s') & \text{if } s = \alpha\alpha s' \\ \alpha\,\mathrm{bc}(\alpha' s') & \text{if } s = \alpha\alpha' s' \text{ with } \alpha \ne \alpha' \end{cases}$$
Thus, bc([a][a][a][c][c]) = [a][c].
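Block compression can be sketched in Python in two ways, under the same tuple-of-frozensets representation assumed for illustration: bc_rec follows the three cases of the definition literally, while bc uses itertools.groupby, which groups consecutive equal elements, to keep one box from each run in a single pass. The iterative form avoids Python's recursion limit on long strings.

```python
from itertools import groupby


def bc_rec(s):
    """Block compression, transcribed case by case from the definition."""
    if len(s) <= 1:
        return s                        # length(s) <= 1
    if s[0] == s[1]:
        return bc_rec(s[1:])            # s = alpha alpha s'
    return s[:1] + bc_rec(s[1:])        # s = alpha alpha' s' with alpha != alpha'


def bc(s):
    """Iterative equivalent: keep one box from each run of identical boxes."""
    return tuple(box for box, _run in groupby(s))


a, c = frozenset({"a"}), frozenset({"c"})
print(bc((a, a, a, c, c)) == (a, c))                     # True
print(bc_rec((a, a, a, c, c)) == bc((a, a, a, c, c)))    # True
```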
Superposition and Block Compression: Inverse Block Compression
We can generate infinitely many stuttering strings, all of which are bc-equivalent:
Example.
$$\mathrm{bc}^{-1}([a][c]) = \{[a][c],\ [a][a][c],\ [a][c][c],\ \ldots\} = [a]^+[c]^+$$
Precisely, a string $s'$ is bc-equivalent to a string $s$ iff $s' \in \mathrm{bc}^{-1}\mathrm{bc}(s)$, and $s' \in \mathrm{bc}^{-1}\mathrm{bc}(s)$ iff $\mathrm{bc}(s) = \mathrm{bc}(s')$.
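Since bc⁻¹ produces an infinite language, a program can only enumerate it up to some length. The sketch below checks bc-equivalence by comparing compressed forms and lists stuttering versions of a string up to an explicit max_len cap, which the definition itself does not have; bc_equivalent and inverse_bc_upto are hypothetical helpers. A stutterless string of length n has one stuttering version of length k for each way of splitting k positions into n non-empty runs.

```python
from itertools import combinations, groupby


def bc(s):
    """Block compression: keep one box from each run of identical boxes."""
    return tuple(box for box, _run in groupby(s))


def bc_equivalent(s, t):
    """s and t are bc-equivalent iff they block-compress to the same string."""
    return bc(s) == bc(t)


def inverse_bc_upto(s, max_len):
    """Yield the members of bc^{-1}(bc(s)) of length at most max_len.

    The full set is infinite, so an explicit length cap is required.
    """
    core = bc(s)
    n = len(core)
    if n == 0:
        yield ()                        # only the empty string compresses to itself
        return
    for k in range(n, max_len + 1):
        # Choose n - 1 cut points in 1..k-1; box j fills positions bounds[j]..bounds[j+1]-1.
        for cuts in combinations(range(1, k), n - 1):
            bounds = (0, *cuts, k)
            yield tuple(core[j] for j in range(n) for _ in range(bounds[j + 1] - bounds[j]))


a, c = frozenset({"a"}), frozenset({"c"})
print(len(list(inverse_bc_upto((a, c), 3))))   # 3: [a][c], [a][a][c], [a][c][c]
print(bc_equivalent((a, a, c), (a, c, c)))     # True
```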
Superposition and Block Compression: Asynchronous Superposition
This gives our initial definition of asynchronous superposition:
Definition (Initial). The asynchronous superposition of two strings $s$ and $s'$ is the set of strings obtained by block compressing the results of superposing the strings which are bc-equivalent to $s$ and $s'$:
$$s \,\&_*\, s' := \{\mathrm{bc}(s'') \mid s'' \in \mathrm{bc}^{-1}\mathrm{bc}(s) \,\&\, \mathrm{bc}^{-1}\mathrm{bc}(s')\}$$
Example. [a][c] &* [b][d] = {[a, b][c, d], [a, b][a, d][c, d], [a, b][b, c][c, d]}
Superposition and Block Compression: Upper Bound on Asynchronous Superposition
We can improve this definition. It can be shown that, for two strings of lengths $n$ and $n'$, the longest block-compressed string produced by their asynchronous superposition has length at most $n + n' - 1$.
Thus, for any integer $k > 0$ and string $s$, we introduce a new operation $\mathrm{pad}_k(s)$, which generates the set of strings of length $k$ that are bc-equivalent to $s$:
Definition.
$$\mathrm{pad}_k(s) = \mathrm{bc}^{-1}(s) \cap \Sigma^k$$
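A sketch of pad_k under the same illustrative tuple-of-frozensets representation. Following the definition literally, it returns the empty set when s itself stutters, because no string block-compresses to a stuttering string. Building the members from cut points is an implementation choice of this sketch, not something the definition prescribes.

```python
from itertools import combinations, groupby


def bc(s):
    """Block compression: keep one box from each run of identical boxes."""
    return tuple(box for box, _run in groupby(s))


def pad(k, s):
    """pad_k(s): bc^{-1}(s) intersected with Sigma^k, as a set of tuples.

    bc^{-1}(s) is empty when s itself stutters, since bc never outputs
    a stuttering string; otherwise each member of length k comes from
    splitting the k positions into len(s) non-empty runs.
    """
    n = len(s)
    if n == 0 or k < n or bc(s) != s:
        return set()
    result = set()
    for cuts in combinations(range(1, k), n - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(s[j] for j in range(n) for _ in range(bounds[j + 1] - bounds[j])))
    return result


a, c = frozenset({"a"}), frozenset({"c"})
print(pad(3, (a, c)) == {(a, a, c), (a, c, c)})   # True
print(pad(3, (a, a)))                            # set(): (a, a) is not an output of bc
```

The size of pad_k(s) is the binomial coefficient C(k − 1, n − 1), so the bound k = n + n′ − 1 keeps every set finite.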
Superposition and Block Compression: Upper Bound on Asynchronous Superposition
An improved definition of asynchronous superposition puts a clear finite bound on the infinite language generated by inverse block compression:
Definition (Improved). For any $s, s' \in \Sigma^+$ with nonzero lengths $n$ and $n'$ respectively,
$$s \,\&_*\, s' = \{\mathrm{bc}(s'') \mid s'' \in \mathrm{pad}_{n+n'-1}(s) \,\&\, \mathrm{pad}_{n+n'-1}(s')\}$$
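The improved definition is directly executable, since each pad set is finite. The sketch below (hypothetical helper names, same frozenset-tuple representation) pads both strings to length n + n′ − 1, superposes every pair, and block-compresses the results; applied to [a][c] and [b][d] it yields the three strings of the initial example. One design choice is labelled in the code: inputs are block-compressed first, so a stuttering input is treated like its stutterless form instead of producing an empty pad set.

```python
from itertools import combinations, groupby


def bc(s):
    """Block compression: keep one box from each run of identical boxes."""
    return tuple(box for box, _run in groupby(s))


def pad(k, s):
    """All length-k strings that block-compress to the stutterless string s."""
    n = len(s)
    result = set()
    for cuts in combinations(range(1, k), n - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(s[j] for j in range(n) for _ in range(bounds[j + 1] - bounds[j])))
    return result


def async_superpose(s, t):
    """s &* t, computed with pad_{n+n'-1} as in the improved definition.

    Assumption of this sketch: inputs are non-empty and are block-compressed
    first; for stutterless inputs this changes nothing.
    """
    s, t = bc(s), bc(t)
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(u, v))) for u in pad(k, s) for v in pad(k, t)}


a, b, c, d = (frozenset({x}) for x in "abcd")
for r in async_superpose((a, c), (b, d)):
    print([sorted(box) for box in r])
```

This prints the three strings of the example in arbitrary set order. The number of superposed pairs is C(k − 1, n − 1) · C(k − 1, n′ − 1), so this brute-force sketch is only practical for short strings.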
Event Representation: Bounding Boxes
We use the empty box [ ] as a string of length 1 (not to be confused with the empty string ε, which has length 0) to bound events, allowing us to represent the fact that they are finite. Asynchronous superposition allows us to generate the 13 strings in [ ][e][ ] &* [ ][e′][ ], each of which corresponds to one of the unique Allen Relations, and also to one of the relation types in ISO-TimeML's TLINKs.
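The count of 13 can be checked mechanically. Reusing illustrative bc, pad, and async_superpose helpers (repeated so the block runs on its own, and assuming stutterless, non-empty inputs), the sketch bounds each event with empty boxes and superposes [ ][e][ ] with [ ][e′][ ]; since both strings have length 3, every candidate is padded to length 5.

```python
from itertools import combinations, groupby


def bc(s):
    """Block compression: keep one box from each run of identical boxes."""
    return tuple(box for box, _run in groupby(s))


def pad(k, s):
    """All length-k strings that block-compress to the stutterless string s."""
    n = len(s)
    result = set()
    for cuts in combinations(range(1, k), n - 1):
        bounds = (0, *cuts, k)
        result.add(tuple(s[j] for j in range(n) for _ in range(bounds[j + 1] - bounds[j])))
    return result


def async_superpose(s, t):
    """s &* t via pad_{n+n'-1}, assuming stutterless, non-empty inputs."""
    k = len(s) + len(t) - 1
    return {bc(tuple(x | y for x, y in zip(u, v))) for u in pad(k, s) for v in pad(k, t)}


EMPTY = frozenset()                       # the empty box [ ]
e = (EMPTY, frozenset({"e"}), EMPTY)      # [ ][e][ ]
e2 = (EMPTY, frozenset({"e'"}), EMPTY)    # [ ][e'][ ]

relations = async_superpose(e, e2)
print(len(relations))  # 13
```

Every result begins and ends with the empty box, which is what lets each string distinguish, for example, "meets" from "before".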
Event Representation: Allen Relations as Event-Strings
Relation  Event-string            Name
e = e′    [ ][e, e′][ ]           equal
e s e′    [ ][e, e′][e′][ ]       starts
e si e′   [ ][e, e′][e][ ]        starts (inverse)
e f e′    [ ][e′][e, e′][ ]       finishes
e fi e′   [ ][e][e, e′][ ]        finishes (inverse)
e d e′    [ ][e′][e, e′][e′][ ]   during
e di e′   [ ][e][e, e′][e][ ]     during (inverse)
e o e′    [ ][e][e, e′][e′][ ]    overlaps
e oi e′   [ ][e′][e, e′][e][ ]    overlaps (inverse)
e m e′    [ ][e][e′][ ]           meets
e mi e′   [ ][e′][e][ ]           meets (inverse)
e < e′    [ ][e][ ][e′][ ]        before
e > e′    [ ][e′][ ][e][ ]        after
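Read in the other direction, the table is a lookup from event-strings to Allen relations. A sketch, assuming the frozenset-tuple encoding used for illustration and a hypothetical relation_of helper. Block-compressing first means a stuttering string, whose box durations are ignored, is classified like its compressed form.

```python
from itertools import groupby


def bc(s):
    """Block compression: keep one box from each run of identical boxes."""
    return tuple(box for box, _run in groupby(s))


def box(*fluents):
    """A box holding the given fluents."""
    return frozenset(fluents)


E, X, Y, XY = box(), box("e"), box("e'"), box("e", "e'")

# One entry per row of the table: event-string -> Allen relation e R e'.
ALLEN_STRINGS = {
    (E, XY, E): "=",
    (E, XY, Y, E): "s",
    (E, XY, X, E): "si",
    (E, Y, XY, E): "f",
    (E, X, XY, E): "fi",
    (E, Y, XY, Y, E): "d",
    (E, X, XY, X, E): "di",
    (E, X, XY, Y, E): "o",
    (E, Y, XY, X, E): "oi",
    (E, X, Y, E): "m",
    (E, Y, X, E): "mi",
    (E, X, E, Y, E): "<",
    (E, Y, E, X, E): ">",
}


def relation_of(s):
    """Allen relation for a bounded string over {e, e'}, or None if not in the table."""
    return ALLEN_STRINGS.get(bc(s))


print(relation_of((E, E, X, X, E, Y, E)))  # <
```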