Reverse engineering using computational algebra Elena Dimitrova - - PowerPoint PPT Presentation

reverse engineering using computational algebra
SMART_READER_LITE
LIVE PREVIEW

Reverse engineering using computational algebra Elena Dimitrova - - PowerPoint PPT Presentation

Reverse engineering using computational algebra Elena Dimitrova School of Mathematical and Statistical Sciences Clemson University http://edimit.people.clemson.edu/ Algebraic Biology E. Dimitrova (Clemson) Reverse engineering using


slide-1
SLIDE 1

Reverse engineering using computational algebra

Elena Dimitrova School of Mathematical and Statistical Sciences Clemson University http://edimit.people.clemson.edu/ Algebraic Biology

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 1 / 57

slide-2
SLIDE 2

What is reverse engineering?

Sometimes, complex biological systems can seem a bit like this: (click here!). Systems biology is the study of systems of biological components. A central problem in systems biology is to use experimental data to infer the structure of a system such as a gene regulatory network.

Modeling approaches

Bottom-up: Build a network from the known local information about every single

  • bject.

Top-down (“Reverse-engineering”): View the system as a black box, then use the available data to make a model. Previously, we’ve mostly studied the first approach to modeling. In this lecture, we’ll focus

  • n the second approach.

Many problems in statistics (e.g., linear regression) deal with the second approach.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 2 / 57

slide-3
SLIDE 3

The blind men and the elephant

An old parable from India tells of several blind men who try to determine what an elephant looks like just by touch. The blind men are trying to reverse engineer an elephant from just a few data points.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 3 / 57

slide-4
SLIDE 4

Inferring a Boolean model (elephant) from data (observations)

Consider a Boolean network on n nodes, with update function f : Fn

2 → Fn

  • 2. There are 2n

input states. Suppose we don’t know the actual function f , but through experimental data, we are able to

  • bserve several transitions:

s1 = (s11, s12, . . . , s1n) s2 = (s21, . . . , s2n) sm = (sm1, . . . , smn) t1 = (t11, t12, . . . , t1n) t2 = (t21, . . . , t2n) tm = (tm1, . . . , tmn)

· · · · · ·

Reverse engineering

Start with experimental data (observations) and reconstruct the model (elephant). The two main features are: (i) the network topology, or wiring diagram, (ii) the Boolean functions at each node: f = (f1, . . . , fn). This problem is not just limited to models over F2 = {0, 1}; it works for models over larger finite fields F. We will call such models local models.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 4 / 57

slide-5
SLIDE 5

Inferring a Boolean network (elephant) from data (observations)

Consider the following Boolean network: f1(x1, x2, x3) = x1 ∧ x2 = x1x2 f2(x1, x2, x3) = x1 ∧ x2 ∧ x3 = x1x2x3 f3(x1, x2, x3) = x1 ∧ x2 = x1x2 . The state space of f = (f1, f2, f3) is the following graph:

001 010 011 100 101 000 110 111

Question

What if we only knew part of this state space, e.g., (1, 1, 0) − → (1, 0, 1) − → (0, 0, 0) − → (0, 0, 0) . Could we recover the individual functions? How many possible models could yield this “fragment”?

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 5 / 57

slide-6
SLIDE 6

Reverse engineering the model space

Broad goal

Find “the best” local model f = (f1, . . . , fn) that fits the data: Input states: s1, . . . , sm ∈ Fn Output states: t1, . . . , tm ∈ Fn with f (si) = ti Note that: f (si) = (f1(si), f2(si), . . . , fn(si)) = (ti1, ti2, . . . , tin) = ti.

Question

What if no models fit the data? (This is actually impossible.) What if many models fit the data? First, we’ll find all local models that fit the data. This is called the model space: F1 × · · · × Fn =

  • (f1, . . . , fn) | fj(si) = tij for all i and j
  • .

Once we do this, the new problem becomes choosing the “best” one. This is called model selection.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 6 / 57

slide-7
SLIDE 7

Similar problems in other areas of mathematics

  • 1. Parametrize a line in Rn.
  • 2. Parametrize a plane in Rn.
  • 3. Solve the underdetermined system Ax = b.
  • 4. Solve the differential equation x′′ + x = 2.
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 7 / 57

slide-8
SLIDE 8

Parametrize a line in Rn

Suppose we want to write the equation for a line that contains a vector v ∈ Rn: x y z

v t v w v+w t v+w

This line, which contains the zero vector, is tv = {tv : t ∈ R}. Now, what if we want to write the equation for a line parallel to v? This line, which does not contain the zero vector, is tv + w = {tv + w : t ∈ R} . Note that ANY particular w on the line will work!!!

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 8 / 57

slide-9
SLIDE 9

Solve an underdetermined system Ax = b

Suppose we have a system of equations that has “too many variables,” so there are infinitely many solutions. For example: 2x + y + 3z = 4 3x − 5y − 2z = 6 “Ax = b form”: 2 1 3 3 −5 −2   x y z   = 4 6

  • .

How to solve:

  • 1. Solve the related homogeneous equation Ax = 0 (this is null space, NS(A));
  • 2. Find any particular solution xp to Ax = b;
  • 3. Add these together to get the general solution: x = NS(A) + xp.

This works because geometrically, the solution space is just a line, plane, etc. Here are two possible ways to write the solution: C   1 1 −1   +   2  , C   1 1 −1   +   10 8 −8  .

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 9 / 57

slide-10
SLIDE 10

Linear differential equations

Solve the differential equation x′′ + x = 2. How to solve:

  • 1. Solve the related homogeneous equation x′′ + x = 0. The solutions are

xh(t) = a cos t + b sin t.

  • 2. Find any particular solution xp(t) to x′′ + x = 2. By inspection, we see that xp(t) = 2

works.

  • 3. Add these together to get the general solution:

x(t) = xh(t) + xp(t) = a cos t + b sin t + 2. Note that while the general solution above is unique, its presentation need not be. For example, we could write it this way: x(t) = xh(t) + xp(t) = a(2 cos t − 3 sin t) + b sin t + (2 − cos t + 8 sin t). Here, the particular solution has (unnecessary) “extra terms” that vanish on the homogeneous part, x′′ + x = 0.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 10 / 57

slide-11
SLIDE 11

Reverse engineering: Problem statement

Recall that a local model over F is an n-tuple f = (f1, . . . , fn) of functions fi : Fn → F. The associated finite dynamical system (FDS) map is f : Fn − → Fn, f : x − → (f1(x), . . . , fn(x)). If F = Fp then each fi : Fn

p → Fp is a polynomial in Fp[x1, . . . , xn]/xp 1 − x1, . . . , xp n − xn.

Goal

Given a set of data: Input states: s1, . . . , sm ∈ Fn Output states: t1, . . . , tm ∈ Fn with f (si) = ti Construct the model space F1 × · · · × Fn of all local models f = (f1, . . . , fn) that fit the data: f (si) = (f1(si), . . . , fn(si)) = (ti1, . . . , tin) = ti. We’ll find each F1, . . . , Fn separately.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 11 / 57

slide-12
SLIDE 12

Reverse engineering: How to find Fj

We wish to find the set Fj of all local functions (polynomials!) fj that fit the data: Fj = {fj : fj(s1) = t1j, . . . , fj(sm) = tmj} . Define the set I (it is actually an “ideal” of the polynomial ring F[x1, . . . , xn]) I = {h : h(si) = 0 for all i = 1, . . . , m} = {all polynomials that vanish on the data}.

Theorem

The set of polynomials that fit the data at node j is Fj = fj + I = {fj + h : h ∈ I} , where fj is any one particular polynomial that fits the data. Thus, to find Fj, we need to do two things:

  • 1. Find the ideal I; (all solutions to {fj(si) = 0, ∀i})
  • 2. Find any polynomial fj that fits the data. (one solution to {fj(si) = tij, ∀i})
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 12 / 57

slide-13
SLIDE 13

Reverse engineering: How to find I and fj

  • 1. Finding I: Define I(si) to be the set of polynomials that vanish on si:

I(si) = {all polynomials hi such that hi(si) = 0} = {(x1 − si1)g1(x) + (x2 − si2)g2(x) + · · · + (xn − sin)gn(x)} = x1 − si1, x2 − si2, . . . , xn − sin Clearly, the set I of polynomials that vanish on all si (for i = 1, . . . , m) is I =

m

  • i=1

I(si) .

  • 2. Finding fj: There are many algorithms. Lagrange interpolation is one of them:

f (x1, . . . , xn) =

  • (c1,...,cn)∈V

[f (c1, . . . , cn)

n

  • i=1

1 − (xi − ci)p−1]. In this lecture, we will learn another method which has the Chinese remainder theorem lurking behind the scenes.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 13 / 57

slide-14
SLIDE 14

Finding fj (one method)

For each data point si (i = 1, . . . , m), we’ll construct an r-polynomial that has the following property: ri(x) =

  • 1

x = si x = si Once we have these, the polynomial fj(x) we seek will be fj(x) = t1jr1(x) + t2jr2(x) + · · · + tmjrm(x) . One way to construct the r-polynomials: ri(x) =

m

  • k=1

k=i

bik(x) , where bik(x) = (siℓ − skℓ)p−2(xℓ − skℓ) and ℓ is any coordinate in which si and sk differ. (We’ll take the first coordinate for simplicity.)

Remark

When our systems are Boolean (over F2), then this reduces to bik(x) = xℓ − skℓ.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 14 / 57

slide-15
SLIDE 15

A Boolean example

Consider the following model of the lac operon, which implicitly assumes that A degrades slower than M or B.      fM = xA fB = xM fA = L ∨ (B ∧ Lm) ∨ (A ∧ B). If lactose levels are low, then L = Lm = 0, and this model reduces to the one at left (left) with state space (right):      f1 = x3 f2 = x1 f3 = (x2 + 1)x3.

001 101 111 110 010 000 100 011

Exercise

Let’s reverse engineer the Boolean network from just knowing the 6 red nodes and 5 transitions. In other words, find all triples of polynomials f = (h1, h2, h3) such that:

f (0, 0, 1) = (h1(0, 0, 1), h2(0, 0, 1), h3(0, 0, 1)) = (1, 0, 1), f (1, 0, 1) = (h1(1, 0, 1), h2(1, 0, 1), h3(1, 0, 1)) = (1, 1, 1), f (1, 1, 1) = (h1(1, 1, 1), h2(1, 1, 1), h3(1, 1, 1)) = (1, 1, 0), f (1, 1, 0) = (h1(1, 1, 0), h2(1, 1, 0), h3(1, 1, 0)) = (0, 1, 0), f (0, 1, 0) = (h1(0, 1, 0), h2(0, 1, 0), h3(0, 1, 0)) = (0, 0, 0),

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 15 / 57

slide-16
SLIDE 16

A Boolean example (cont.)

Since we know the original functions a priori, we secretly know the answer to this. (f1, f2, f3) = (x3, x1, (x2 + 1)x3).

001 101 111 110 010 000 100 011

The ideal of polynomials that vanish on each sk is: I1 = ideal(x1, x2, x3-1); I2 = ideal(x1-1, x2, x3-1); I3 = ideal(x1-1, x2-1, x3-1); I4 = ideal(x1-1, x2-1, x3); I5 = ideal(x1, x2-1, x3); The ideal of polynomials that vanish on every sk is: I = intersect{I1,I2,I3,I4,I5}; Thus, the complete model space is F1 × F2 × F3, Fj = fj + I . If we hadn’t known f1, f2, f3 a priori, then we’d have to find our own particular solution that fits the data. We’ll do that now.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 16 / 57

slide-17
SLIDE 17

A Boolean example (cont.)

We’re looking for a single solution f = (f1, f2, f3) that fits the data. We know how to do this. For example: f1(x) = t11r1(x) + t21r2(x) + t31r3(x) + t41r4(x) + t51r5(x) = 1r1(x) + 1r2(x) + 1r3(x) + 0r4(x) + 0r5(x) = r1(x) + r2(x) + r3(x) , where r1(x) =

5

  • k=1

k=1

b1k(x) = b12(x)b13(x)b14(x)b15(x) .

s1 = (0, 0, 1)= t0 s2 = (1, 0, 1) = t1 s3 = (1, 1, 1) = t2 s4 = (1, 1, 0) = t3 s5 = (0, 1, 0) = t4 s6 =(0, 0, 0) = t5

Recall that b1k(x) = xℓ − skℓ, where ℓ is the first coordinate that s1 differs from sk. skip k = 1 b12(x) = (x1 − s21) = x1 + 1 b13(x) = (x1 − s31) = x1 + 1 b14(x) = (x1 − s41) = x1 + 1 b15(x) = (x2 − s52) = x2 + 1 Now, r1(x) = (x1 + 1)3(x2 + 1) = (x1 + 1)(x2 + 1).

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 17 / 57

slide-18
SLIDE 18

A Boolean example (cont.)

Recall that bik(x) = xℓ − skℓ, where ℓ is the first coordinate that si differs from sk.

s1 = (0, 0, 1)= t0 s2 = (1, 0, 1) = t1 s3 = (1, 1, 1) = t2 s4 = (1, 1, 0) = t3 s5 = (0, 1, 0) = t4 s6 = (0, 0, 0) = t5

b21(x) = (x1 − s11) = x1 skip k = 2 b23(x) = (x2 − s32) = x2 + 1 b24(x) = (x2 − s42) = x2 + 1 b25(x) = (x1 − s51) = x1 b31(x) = (x1 − s11) = x1 b32(x) = (x2 − s22) = x2 skip k = 3 b34(x) = (x3 − s43) = x3 b35(x) = (x1 − s51) = x1 b41(x) = (x1 − s11) = x1 b42(x) = (x2 − s22) = x2 b43(x) = (x3 − s33) = x3 + 1 skip k = 4 b45(x) = (x1 − s51) = x1 b51(x) = (x2 − s12) = x2 b52(x) = (x1 − s21) = x1 + 1 b53(x) = (x1 − s31) = x1 + 1 b54(x) = (x1 − s42) = x1 + 1 skip k = 5 Recall that xk

i = xi, and (xj + 1)k = xj + 1, so the “r-polynomials” are

r1(x) = (x1 + 1)(x2 + 1) r2(x) = x1(x2 + 1) r3(x) = x1x2x3 r4(x) = x1x2(x3 + 1) r5(x) = (x1 + 1)x2

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 18 / 57

slide-19
SLIDE 19

A Boolean example (cont.)

We can now compute our particular solution (f1, f2, f3) that fits the data, using: fj(x) = t1jr1(x) + t2jr2(x) + · · · + tmjrm(x) .

s1 = (0, 0, 1)= t0 s2 = (1, 0, 1) = t1 s3 = (1, 1, 1) = t2 s4 = (1, 1, 0) = t3 s5 = (0, 1, 0) = t4 s6 = (0, 0, 0) = t5

f1(x) = t11r1(x) + t21r2(x) + t31r3(x) + t41r4(x) + t51r5(x) = r1(x) + r2(x) + r3(x) = 1 + x2 + x1x2x3 f2(x) = t12r1(x) + t22r2(x) + t32r3(x) + t42r4(x) + t52r5(x) = r2(x) + r3(x) + r4(x) = x1 f3(x) = t13r1(x) + t23r2(x) + t33r3(x) + t43r4(x) + t53r5(x) = r1(x) + r2(x) = 1 + x2. Our original FDS was (f1, f2, f3) = (x3, x1, x3 + x2x3), but our algorithm yielded (f1, f2, f3) = (1 + x2 + x1x2x3, x1, 1 + x2) = (x3, x1, x3 + x2x3) + (1 + x2 + x3 + x1x2x3, 0, 1 + x2 + x3 + x2x3)

Remark

Each polynomial in the 2nd term above is in the vanishing ideal I. (Why?)

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 19 / 57

slide-20
SLIDE 20

A Boolean example (cont.)

Figure: The original phase space (left), and the reverse-engineered phase space (right).

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 20 / 57

slide-21
SLIDE 21

A Boolean example (cont.)

Now that we found a particular solution f = (f1, f2, f3) that fits the data, we need to (re)compute the ideal I of polynomials that vanish on the data. We’ll use Macaulay2 in Sage: %default_mode macaulay2 R=ZZ/2[x1,x2,x3] / ideal(x1^2-x1, x2^2-x2, x3^2-x3);

s1 = (0, 0, 1)= t0 s2 = (1, 0, 1) = t1 s3 = (1, 1, 1) = t2 s4 = (1, 1, 0) = t3 s5 = (0, 1, 0) = t4 s6 =(0, 0, 0) = t5

The ideal of polynomials that vanish on each sk is: I1 = ideal(x1, x2, x3-1); I2 = ideal(x1-1, x2, x3-1); I3 = ideal(x1-1, x2-1, x3-1); I4 = ideal(x1-1, x2-1, x3); I5 = ideal(x1, x2-1, x3); The ideal of polynomials that vanish on every sk is: I = intersect{I1,I2,I3,I4,I5} To compute a Gr¨

  • bner basis:

G = gens gb I The output is: | x2x3+x2+x3+1 x1x2+x1x3+x1+x2+x3+1 |

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 21 / 57

slide-22
SLIDE 22

A Boolean example (cont.)

In conclusion, the set of all Boolean models that fit the data

001 101 111 110 010 000

i.e., the model space, is the set F1 × F2 × F3, Fj = fj + I where I is the vanishing ideal I = g1, g2 = 1 + x2 + x3 + x2x3, 1 + x1 + x2 + x3 + x1x2 + x1x3 . Our reverse-engineered BN is slighly different than the “true model”: (f1, f2, f3) = (1 + x2 + x1x2x3, x1, 1 + x2) = (x3 + x1g1 + g2, x1, (x2 + 1)x3 + g1) Note that x1g1, 0, and g1 must be in the vanishing ideal I.

Next goal (“model selection”)

We would like to be able to recover functions in Fj = fj + I that have no “extra terms” in I.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 22 / 57

slide-23
SLIDE 23

A Boolean example (cont.)

Goal (“model selection”)

We would like to be able to be able to recover functions in Fj = fj + I that have no “extra terms” in I.

001 101 111 110 010 000

We can do this with Macaulay2. It’s called finding the remainder of fj modulo I, and we use the % symbol. f1 = 1+x2+x1*x2*x3; f2 = x1; f3 = 1+x2; f1%I; f2%I; f3%I; The output is: x3, x1, x2+1. Almost the original Boolean model!

Question

What would happen if we: added the (original) self-loop at 000 to the data? removed the data point 010 − → 000?

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 23 / 57

slide-24
SLIDE 24

An example over F5

Consider the following time series in a 3-node local model over F5:

s1 = (2, 0, 0)= t0 s2 = (4, 3, 1) = t1 s3 = (3, 1, 4) = t2 s4 =(0, 4, 3) = t3

For reference, here are the input vectors si and output vectors ti: s1 = (s11, s12, s13) = (2, 0, 0) , t1 = (t11, t12, t13) = (4, 3, 1) , s2 = (s21, s22, s23) = (4, 3, 1) , t2 = (t21, t22, t23) = (3, 1, 4) , s3 = (s31, s32, s33) = (3, 1, 4) , t3 = (t31, t32, t33) = (0, 4, 3) . Note that s1 differs from s2 and s3 in the ℓ = 1 coodinate, so this ℓ will work for each of r1, r2, and r3.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 24 / 57

slide-25
SLIDE 25

An example over F5 (cont.)

Since we are working in F5, we are taking the remainder of everything modulo 5. Particularly useful identities are: 0 = 5, −1 = 4, −2 = 3, −3 = 2, and −4 = 1. Using our formulas for bij(x), we compute: b12(x) = (s11 − s21)3(x1 − s21) = (2 − 4)3(x1 − 4) = −8(x1 + 1) = 2x1 + 2 b13(x) = (s11 − s31)3(x1 − s31) = (2 − 3)3(x1 − 3) = −x1 + 3 = 4x1 + 3 . Therefore, the first r-polynomial is r1(x) = b12(x)b13(x) = (2x1 + 2)(4x1 + 3) = 8x2

1 + 14x1 + 6 = 3x2 1 + 4x1 + 1 .

In-class Exercise

Compute the other two r-polynomials in this example: r2(x) and r3(x). Solution: r2(x) = 3x2

1 + 3 ,

r3(x) = 4x2

1 + x1 + 2.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 25 / 57

slide-26
SLIDE 26

An example over F5 (cont.)

In summary, we computed the r-polynomials to be: r1(x) = b12(x)b13(x) = (2x1 + 2)(4x1 + 3) = 8x2

1 + 14x1 + 6 = 3x2 1 + 4x1 + 1

r2(x) = b21(x)b23(x) = (3x1 + 4)(x1 + 2) = 3x2

1 + 10x1 + 8 = 3x2 1 + 3

r2(x) = b31(x)b32(x) = (x1 + 3)(4x1 + 4) = 4x2

1 + 16x1 + 12 = 4x2 1 + x1 + 2

Thus, the following functions fit the data: f1(x) = t11r1(x) + t21r2(x) + t31r3(x) = 4(3x2

1 + 4x1 + 1) + 3(3x2 1 + 3) + 0(4x2 1 + x1 + 2)

= x2

1 + x1 + 3

f2(x) = t12r1(x) + t22r2(x) + t32r3(x) = 3(3x2

1 + 4x1 + 1) + 1(3x2 1 + 3) + 4(4x2 1 + x1 + 2)

= 3x2

1 + x1 + 4

f3(x) = t13r1(x) + t23r2(x) + t33r3(x) = 1(3x2

1 + 4x1 + 1) + 4(3x2 1 + 3) + 3(4x2 1 + x1 + 2)

= 2x2

1 + 2x1 + 4

Comments on this? [Note that only the variable x1 appears.]

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 26 / 57

slide-27
SLIDE 27

An example over F5 (cont.)

Recall that the ideal I is the set of polynomials that vanish on all si: I = I(s1) ∩ I(s2) ∩ I(s3) s1 = (2, 0, 0) , s2 = (4, 3, 1) , s3 = (3, 1, 4) . These are precisely the sets I(s1) = x1 − 2, x2, x3 = {(x1 − 2)g1(x) + x2g2(x) + x3g3(x)} I(s2) = x1 − 4, x2 − 3, x3 − 1 = {(x1 − 4)g1(x) + (x2 − 3)g2(x) + (x3 − 1)g3(x)} I(s3) = x1 − 3, x2 − 1, x3 − 4 = {(x1 − 3)g1(x) + (x2 − 1)g2(x) + (x3 − 4)g3(x)}. As before, we compute the vanishing ideal I in Macaulay2: R=ZZ/5[x1,x2,x3] / ideal(x1^5-x1, x2^5-x2, x3^5-x3); I1 = ideal(x1-2, x2, x3); I2 = ideal(x1-4, x2-3, x3-1); I3 = ideal(x1-3, x2-1, x3-4); I = intersect{I1,I2,I3}; G = gens gb I This says that a Gr¨

  • bner basis for I under the default monomial ordering GrRevLex is

G = {x1 − 2x2 − x3 − 2, x2

3 + 2x2 − 2x3, x2x3 + 2x2 + x3, x2 2 + x3}.

For a slightly different format, try the command: flatten entries gens gb I

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 27 / 57

slide-28
SLIDE 28

An example over F5 (cont.)

Let us enter the functions that we reverse-engineered into Macaulay2: f1=x1*x1+x1+3; f2=3*x1^2+x1+4; f3=2*x1^2+2*x1+4; The following command will reduce each function modulo I: p1=f1%I; p2=f2%I; p3=f3%I; (p1,p2,p3) The output is (p1, p2, p3) = (−x3 − 1, x2 − 2, −2x3 + 1). The polynomial pj is called the normal form of fj with respect to the Gr¨

  • bner basis G.

Using a different monomial order will (likely) give a different Gr¨

  • bner basis, and thus a

different reduction modulo I. Note: To use the Lex monomial ordering, repeat the above commands but using R=ZZ/5[x1,x2,x3,MonomialOrder=>Lex] / ideal(x1^5-x1, x2^5-x2, x3^5-x3);

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 28 / 57

slide-29
SLIDE 29

Motivation

Problem: Multiple models result from the reverse-engineering process. Hope: Suggest experiments that will allow for unique model identification.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 29 / 57

slide-30
SLIDE 30

Multivariate polynomial rings

n ≥ 2 k[x1, . . . , xn] is not a PID.

Theorem (Hilbert Basis Theorem)

Every ideal I ⊆ k[x1, . . . , xn] is finitely generated, i.e. I =< f1, . . . , ft > for some fi ∈ I.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 30 / 57

slide-31
SLIDE 31

Ideal Membership Problem

Example

k = Z3, k[x, y] f1 = y + 1, f2 = y2 + xy, I =< f1, f2 > Is f = y2 − x ∈ I? Divide f first by f1: y y + 1 |y2 − x y2 + y −x − y y2 | − x. Does this mean that −x − y is the remainder? I.e., f / ∈ I?? (BTW, LT(f2) = y2 or xy? LT(−x − y) = −x or −y?)

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 31 / 57

slide-32
SLIDE 32

Ideal Membership Problem

Example

k = Z3, k[x, y] f1 = y + 1, f2 = y2 + xy, I =< f1, f2 > Is f = y2 − x ∈ I? Divide f first by f2: 1 −x y2 + xy |y2 − x y + 1 |−xy − x y2 + xy −xy − x −xy − x f = −xf1 + f2, so f ∈ I!

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 32 / 57

slide-33
SLIDE 33

Generating sets

Problem: {f1, f2} is not a “nice” generating set for I. “Definition” A nice generating set for an ideal will not do what {f1, f2} did for I, i.e. The remainder of division will be the same regardless of the order of division. Remainder is 0 iff f ∈ I.

Theorem (Buchberger, 1965)

Nice generating sets exist for any polynomial ideal I = 0 and are called Gr¨

  • bner bases (GBs).

Notes: I can have multiple GBs because there is no unique way to order the monomials of a multivariate polynomial (x2 ≺ xy or x2 ≻ xy?) Each I = 0 has finitely many GBs even for k infinite.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 33 / 57

slide-34
SLIDE 34

Monomial orders

Definition

A monomial order on k[x1, . . . , xn] is any relation ≺ on Zn

≥0, or equivalently, on the set of

monomials xα, α ∈ Zn

≥0, satisfying:

  • 1. ≺ is a total order on Zn

≥0.

  • 2. If α ≺ β and γ ∈ Zn

≥0, then α + γ ≺ β + γ.

  • 3. ≺ is a well-ordering on Zn

≥0 (every nonempty subset of Zn ≥0 has a smallest element

under ≺). Examples: The lexicographic monomial order (lex) is analogous to ordering of words in dictionaries, e.g. x2y = xxy ≻ xyz. The graded lexicographic order (grlex) uses total degree for comparison and uses lex to breaks ties, e.g. xy2z4 ≻grlex xyz5 since both have total degree 7 and xy2z4 ≻lex xyz5.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 34 / 57

slide-35
SLIDE 35

Gr¨

  • bner bases

Definition

Let I ⊆ k[x1, . . . , xn] be an ideal other than 0. LT(I) = {cxα : there exists f ∈ I with LT(f ) = cxα} and < LT(I) >, called the initial ideal of I, is the ideal generated by the elements of LT(I).

Definition

Fix a monomial order. G = {g1, . . . , gt} is a Gr¨

  • bner basis (GB) of I if

< LT(g1), . . . , LT(gt) >=< LT(I) >.

Example

Recall the example over Z3: f1 = y + 1, f2 = xy + y2, f = −x + y2, I =< f1, f2 >. Let’s use lex. We saw that −xf1 + f2 = −x + y2 = f ∈ I, so LT(f ) = −x ∈< LT(I) >. However, neither y = LT(f1) nor xy = LT(f2) divide −x, so LT(f ) / ∈< LT(f1), LT(f2) >.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 35 / 57

slide-36
SLIDE 36

Gr¨

  • bner bases

No uniqueness is guaranteed by the GB existence theorem, not even for a fixed monomial

  • rder. To remedy this for a fixed monomial order...

Definition

Fix a monomial order. A reduced GB for a polynomial ideal I is a GB G for I such that:

  • 1. For all g ∈ G, g is monic.
  • 2. LT(g) does not divide any term of any h ∈ G − {g}.

Theorem

Let I = 0. For a given monomial order, I has a unique reduced GB. Note: An ideal can still have multiple reduced GBs for different monomial orders.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 36 / 57

slide-37
SLIDE 37

Ideals of a single point

Definition

Let V ⊆ kn. The ideal I = I(V ) = {f ∈ k[x1, . . . , xn] : f (a) = 0 for all a ∈ V } is the ideal

  • f points (or vanishing ideal) of V .

Example

k = Z3, R = k[x, y], V = {(2, 0), (0, 1)} Vanishing ideal of {(2, 0)}: I1 =< x − 2, y − 0 > Vanishing ideal of {(0, 1)}: I1 =< x − 0, y − 1 > Ideals of a single point are maximal. They have the same reduced GB wrt any monomial

  • rder.
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 37 / 57

slide-38
SLIDE 38

Ideals of multiple points

Example

k = Z3, R = k[x, y], V = {(2, 0), (0, 1)} Compute the ideal of both points: I(V ) = I1 ∩ I2 = I1I2 =< x2 − 2x, xy, xy − 2y − x + 2, y2 − y > (Generators don’t form a GB.) Reduced GB wrt grevlex: G1 = {x − y + 1, y2 − y} (Not unique; there is one more.)

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 38 / 57

slide-39
SLIDE 39

Unique reduced GBs

Question: When is a reduced GB for an ideal unique wrt any monomial order?

Theorem (D, Gao, Stigler, 2016)

A reduced GB G = {g1, . . . , gt} is a GB wrt any monomial order iff for each gi, each of its non-leading monomials divides LT(gi).

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 39 / 57

slide-40
SLIDE 40

Unique reduced GBs for ideals of points

Question: When is a reduced GB for an ideal of points I(V ) unique wrt any monomial

  • rder? Properties of V ?
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 40 / 57

slide-41
SLIDE 41

Gr¨

  • bner fan

Generally, we can’t tell how many reduced GBs an ideal has until we compute them (finitely many). The information is contained in the Gr¨

  • bner fan of the ideal.

Introduced by

Mora and Robbiano (1988) Bayer and Morrison (1988)

Combinatorial structure that contains information about all reduced Gr¨

  • bner bases of an

ideal A polyhedral complex of cones: maximal cone ↔ reduced marked Gr¨

  • bner basis ↔ initial ideal
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 41 / 57

slide-42
SLIDE 42

The Gr¨

  • bner fan of a polynomial ideal in 3 variables intersected with hyperplane wi = 1:
  • A. Jensen. Gfan, a software system for Gr¨
  • bner fans, http://home.imf.au.dk/ajensen/software/gfan/gfan.html, 2005.
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 42 / 57

slide-43
SLIDE 43

Initial ideals

Definition

Let I be a polynomial ideal. The initial ideal of I, in(I), is generated by the leading terms (wrt a monomial order) of its generators.

Example

k = Z3, R = k[x, y], V = {(2, 0), (0, 1)} I(V ) has two reduced GBs: G1 =

  • x − y + 1, y2 − y
  • and G2 =
  • x2 + x, y − x + 2
  • .

Initial ideals: in1(I) =

  • x, y2

and in2(I) =

  • x2, y
  • .
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 43 / 57

slide-44
SLIDE 44

Staircases of the initial ideals

Monomial staircases of the initial ideals. A monomial is depicted via its exponent vector: (m, n) ↔ xmyn. (A) LT1(I) =

  • x, y2

(B) LT2(I) =

  • x2, y
  • .

Monomials “under” the staircase are called standard monomials (SMs) of the initial ideal: the monomials that are not in the initial ideal. SM1(I) = {1, y} and SM2(I) = {1, x}.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 44 / 57

slide-45
SLIDE 45

Polynomial normal forms

Example

k = Z3, R = k[x, y], V = {(2, 0), (0, 1)} G1 =

  • x − y + 1, y2 − y
  • and G2 =
  • x2 + x, y − x + 2
  • .

LT1(I) =

  • x, y2

and LT2(I) =

  • x2, y
  • .

SM1(I) = {1, y} and SM2(I) = {1, x}. Let f = x2 + y + 1. f /I(V ) = f

G1 = −y − 1, the normal form of f wrt G1.

Notice: f

G1 is a linear combination of the standard monomials in SM1(I).

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 45 / 57

slide-46
SLIDE 46

Theorem

Let I be an ideal of R = k[x1, . . . , xn]. R/I is isomorphic as a k-vector space to the space spanned by the standard monomials of I. f /I(V ) is a linear combination over k of standard monomials. The set of standard monomials depends on the choice of GB.

Theorem

|V | = |SM(I(V ))|. Proof: Let V = {a1, . . . , at} ⊆ kn. By Chinese Remainder Theorem R/I(V ) ∼ = R/I1 × · · · × R/It, where Ii is the vanishing ideal of ai.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 46 / 57

slide-47
SLIDE 47

Interpolating polynomials

Problem: Given V ⊆ kn, find an interpolating polynomial f . If k = Zp, then this is easy: f (x1, . . . , xn) =

  • (c1,...,cn)∈V

[f (c1, . . . , cn)

n

  • i=1

1 − (xi − ci)p−1].

Example

Let k = Z3, V = {(2, 0), (0, 1)}, f ((2, 0)) = 1, and f ((0, 1)) = 2. Then f (x, y) = −x2y − xy2 − x2 + y2 + x + y. Problem: There are many other interpolating polynomials, e.g. f + x − y + 1 also vanishes

  • n V .
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 47 / 57

slide-48
SLIDE 48

Interpolating polynomials

Theorem

The set of interpolating polynomials is the coset f + I(V ), where f is any interpolating polynomial. Notice f = −x2y − xy2

  • 0 on V

−x2 + y2 + x + y

0 on V

. f is not “minimal” on V . Take instead f /I(V ) = f

G .

Example

I(V ) has two GBs: G1 = {x − y + 1, y2 − y} and G2 = {y − x + 2, x2 + x}, and f has two normal forms: f

G1 = −y + 1 and f G2 = −x.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 48 / 57

slide-49
SLIDE 49

Polynomial model identification

Definition

Let f (x1, . . . , xn) be a polynomial model and consider a set V = {p1, . . . , ps} of distinct points in kn where k is a field. f is called identifiable by V if, by evaluating it at the given points, we can uniquely determine its coefficients.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 49 / 57

slide-50
SLIDE 50

Robbiano’s Theorem

Theorem (L. Robbiano, 1998)

Let f (x1, . . . , xn) be a polynomial function with monomials S = {t1, . . . , tr} and consider a set V = {p1, . . . , ps} of distinct points in kn where k is a field. Let also X(S, V ) be the s × r-matrix whose element in position (i, j) is tj(pi), the evaluation of tj at pi. Then f is identifiable by V if and only if rank(X(S, V )) = r.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 50 / 57

slide-51
SLIDE 51

Example

Let S = {1, x, x2} ⊂ F3[x, y], V = {(1, 0), (0, 1), (2, 1)}. X(S, V ) =     1 x x2 (1, 0) 1 1 1 (0, 1) 1 (2, 1) 1 2 1     The columns are linearly independent, so V identifies any polynomial whose support is in S. Using this result, the polynomial that is identified by the data is f = x + xy. However, f is not minimal because it is not a normal form wrt a GB (the standard monomials are 1, x, y).

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 51 / 57

slide-52
SLIDE 52

Theorem (D & Stigler, 2014)

Let M be the set of monomials that can appear in f and V = {p1, . . . , ps} ⊆ kn. Let S = {xα : xα|xβ for some xβ ∈ M}. V identifies f iff rank(X(S, V )) = |S|. This is essentially Robbiano’s theorem for minimal polynomials. Problem: Even with normal forms, there is still dependence on monomial order.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 52 / 57

slide-53
SLIDE 53

Hope: Some data sets have vanishing ideals whose reduced GB is the same regardless of the monomial order. Examples: I({p}), I(kn), I(kn − {p}) all have a unique reduced GB. Observation: If I(V ) has a unique reduced GB, then so does I(kn − V ).

Theorem (D & Stigelr, 2016)

Let k be a finite field of characteristic p and V ⊆ kn. Fix a monomial order. xα ∈ SM(I(V )) iff xα / ∈ SM(I(kn − V )), where xα is the monomial such that xαxα = xp−1

1

· · · xp−1

n

.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 53 / 57

slide-54
SLIDE 54

Example

Let k = Z2, R = [x, y, z], V = {(1, 0, 0), (1, 1, 0), (1, 0, 1)}. Consider the unique GB for I(V ): {x + 1, yz, y2 + y, z2 + z}. SM(I(V )) = {1, y, z}, so xα : xyz, xz, xy. To find the standard monomials for I(kn − V ) : 1, x, y, z,✚

xy, yz,✚

xz,✟

  • xyz. So

SM(I(kn − V )) = {1, x, y, z, yz}. Collorary: I(V ) has the same number of reduced GBs as I(kn − V ).

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 54 / 57

slide-55
SLIDE 55

Finding sets whose ideals have unique GB

Theorem (D & Stigler, 2014)

Let k be a finite field and V ⊂ kn be a set of points for which I = I(V ) has initial ideals in1(I), . . . , inm(I) in R = k[x1, . . . , xn]. Let M be the set of monomials xαi = xαi1

1

· · · xαin

n

that are standard in m

i=1 ini(I) but not in inj(I) for some j. Finally, let

F = {f ∈ R | f (p) = 0 ∀ p ∈ V and in(f ) = xαi ∈ M for some monomial order}. A set of points W ⊂ kn \ V with |W | = #st. monomials in

m

  • i=1

ini(I) − #st. monomials in in(I) is such that I(V ∪ W ) has a unique Gr¨

  • bner basis if and only if F ∩ I(W ) = ∅.
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 55 / 57

slide-56
SLIDE 56

Definition

For V1, V2 ⊆ Zn

p with |V1| = |V2|, we say that V1 is a linear shift of V2, denoted V1 L

∼ V2, if there exists φ = (φ1, . . . , φn) : Zn

p → Zn p such that V1 = φ(V2) and φi : Zp → Zp is defined

coordinate-wise as φi(xi) = aixi + bi for some ai ∈ (Zp)× and bi ∈ Zp for i = 1, . . . , n. Linear shift is a bijection between two point sets and defines an equivalence relation on subsets of Zn

p of the same size.

Theorem (He, D, Stigler, 2016)

Let V ⊆ Zn

  • p. If V is a linear shift of some staircase, then I(V ) has a unique reduced GB.
  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 56 / 57

slide-57
SLIDE 57

Future work

Find a necessary and sufficient condition on V , so that I(V ) has a unique reduced GB. Construct all sets of given size over a finite field whose ideals have unique reduced GB.

  • E. Dimitrova (Clemson)

Reverse engineering using computational algebra Algebraic Biology 57 / 57