600.325/425 — Declarative Methods Assignment 5: Dynamic Programming

Spring 2006

Prof. Jason Eisner
TA: John Blatz
Due date: Wednesday, May 10, 2 pm

The questions in this assignment concern Dyna. Questions 3–4 are closely related to questions you did in the Prolog assignment. Policies and general submission instructions are the same as for the previous assignment.

1. To find out how to run the Dyna compiler and debugger, see the “CS undergrad machines” section of http://dyna.org/JHU. Now work through the start of the tutorial at http://dyna.org/Tutorial: “hello world,” Dijkstra’s algorithm, and the debugger. Also read http://dyna.org/Several_perspectives_on_Dyna. There is nothing to hand in for this question.

Important request: Please send feedback about Dyna’s usability to the cs325-staff email address. If the compiler or the visual debugger does something you don’t expect, or gives a confusing error message, please forward it to us. We need the feedback and are happy to help quickly.

Disclaimer: Dyna is both a language and an implementation. The prototype implementation that you are using is for an older version of the language, and has not been updated since May 2005. The same is true of the visual debugger, Dynasty. The new versions under development are cleaner, faster, and more powerful (as you saw in lecture) but are being rebuilt from the ground up and are unfortunately not ready for you yet. Email the professor if you would like to receive announcements of new versions in the future.

Hint: We apologize for any rough edges you encounter in this assignment. If you encounter a cryptic error message, something poorly explained in the documentation, or especially a bug, please don’t hesitate to inform us ASAP by email.
2. First, an easy problem to warm you up. A number of presidents of the United States have been blood relatives of one another:

  • George Bush, George W. Bush (father and son)
  • John Adams, John Q. Adams (father and son)
  • Theodore Roosevelt, Martin van Buren (third cousins twice removed)
  • Theodore Roosevelt, Franklin Roosevelt (fifth cousins)

It is natural to ask questions like “Who was the most recent common ancestor of Theodore and Martin, and how recent was he or she?”

(a) Write a short, well-commented Dyna program, ancestor.dyna, to find the most recent common ancestor of two people. You should be able to run the result as

    ./ancestor presidents.par queryA.par

These files (like others in this assignment) are available from either http://cs.jhu.edu/~jason/325/hw5 or /usr/local/data/cs325/hw5. The presidents.par file contains both Roosevelt family data and Adams family data: plenty for you to experiment with.

It’s traditional to call this a family tree, but actually it is a “family DAG” (directed acyclic graph). You are supposed to find the most recent common ancestor (not Adam or Eve). We define the “recency” of a common ancestor to be the total length of his or her shortest paths from the two descendants. For example, http://www.gwu.edu/~erpapers/abouteleanor/q-and-a/q6.htm shows that Franklin and his wife Eleanor had a common ancestor of recency 13: namely Nicholas, who was 6 generations above Franklin and 7 generations above Eleanor.[1] Eleanor’s last name was Roosevelt even before she married Franklin! The most recent common ancestor of Nicholas and Nicholas was Nicholas, with recency 0. (For our purposes, he was his own ancestor.)

Turn in your commented code in a file called ancestor.dyna. In your README, give the results of queryA, queryB, queryC, and queryD when you compile with --driver=backtrace. Explain how to interpret these results. You could also try --driver=dynasty bestonly.

It is okay (but unnecessary) to write your own driver program or alter the .par files. If you do these things, also turn in your changed files and alert us in your README.

[1] Assuming that Nicholas had one wife who bore both his sons, she was another common ancestor of recency 13.
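The definition of recency in part (a) can be illustrated outside Dyna. The following is a minimal Python sketch, not the Dyna solution the assignment asks for. It assumes the family DAG is given as a dictionary from each person to a list of parents; the function names and the toy family are hypothetical and are not taken from presidents.par.

```python
from collections import deque

def up_distances(start, parents):
    """Shortest number of generations from `start` up to each ancestor (BFS)."""
    dist = {start: 0}  # a person counts as their own ancestor, at distance 0
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for parent in parents.get(person, ()):
            if parent not in dist:
                dist[parent] = dist[person] + 1
                queue.append(parent)
    return dist

def most_recent_common_ancestor(a, b, parents):
    """Return (ancestor, recency), or None if a and b share no ancestor."""
    da, db = up_distances(a, parents), up_distances(b, parents)
    common = set(da) & set(db)
    if not common:
        return None
    # Recency = sum of the two shortest upward path lengths; ties broken by name.
    best = min(common, key=lambda x: (da[x] + db[x], x))
    return best, da[best] + db[best]

# Hypothetical toy family, for illustration only.
parents = {
    "kid1": ["mom", "dad"],
    "kid2": ["mom", "dad"],
    "mom": ["grandma"],
    "cousin": ["aunt"],
    "aunt": ["grandma"],
}

print(most_recent_common_ancestor("kid1", "cousin", parents))
```

The sketch takes a minimum over common ancestors of a sum of two shortest-path lengths, which is the same shape of computation that a min= program over path items would express.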


Note: In the current version of Dyna (v0.3), you will probably need to include the following to avoid a runtime type error when you read the .par file.[2]

    :- structure(child(string,string)).

Hint: An earlier version of queryA.par read

    person1("FRANKLIN DELANO ROOSEVELT 1882-1945") := 0.
    person2("ANNA ELEANOR ROOSEVELT 1884-1962") := 0.

You may want to try solving the problem that way first. A hint:

    path_from_1_up_to(X) + path_from_2_up_to(X)

But you will notice that your code for handling person1 and path_from_1_up_to is basically identical to the code for handling person2 and path_from_2_up_to. Duplicate code is inelegant and hard to maintain. The fix is to have a variable that ranges over 1,2. To do this, you’ll need to change path_from_1_up_to(Name) to path(1,Name), etc. This will allow you to eliminate your duplicate code, while changing queryA.par back to the version provided:

    person(1,"FRANKLIN DELANO ROOSEVELT 1882-1945") := 0.
    person(2,"ANNA ELEANOR ROOSEVELT 1884-1962") := 0.

(b) We now turn to a related problem that has similar structure. The solution is fairly similar to ancestor.dyna, but a little trickier.

You may know that on average, two siblings share 1/2 of their genes. But what fraction of their genes did Franklin and Eleanor share? Modify your .dyna and/or .par files slightly to answer this and similar questions. You want to consider all the paths that relate Franklin and Eleanor. Because this is part b of the problem, call the new files ancestorb.dyna and presidentsb.par.

For this question (i.e., 2b), you can assume that x and y’s common ancestry involves no inbreeding (where a child’s parents are genetically related, as in incest or cousin marriages). Then the expected fraction of genes that they share can be found as

$$F(x, y) = \sum_{p \in P(x,y)} \left(\frac{1}{2}\right)^{\mathrm{length}(p)} \tag{1}$$

[2] If you don’t include this line, the compiler concludes that the arguments of child can be arbitrary terms. That’s great, except that, alas, native types like strings do not yet count as terms. They will in the near future. (Java faced a similar situation until Java 1.5 introduced automatic upcast from int to Integer, known as “autoboxing.”) This was not a problem in the tutorial because path.dyna used literal strings in the body of the program, in a way that allowed the compiler to guess the correct type declaration :- structure(child(string,string)). So the compiled path program was able to read flights.par.


where P(x, y) is the set of paths in the family DAG that run from x up to a common ancestor and back down to y. This is closely related to the previous problem, where you were looking for the shortest path in P(x, y) instead of summing over paths.

Under this assumption, what is the expected fraction of genes shared by

  i. a parent and child?
  ii. two siblings? (i.e., same mother and same father: sometimes called “full siblings”)
  iii. two half-siblings? (e.g., same mother, different fathers)
  iv. two half-siblings whose fathers are brothers? (This might happen if a dead man’s brother marries his widow, as actually required by the Bible. It is not a case of inbreeding, but the children have a stronger genetic relationship than in the previous case.)
  v. an aunt and nephew? (e.g., the aunt’s full sister is the nephew’s mother)
  vi. two full first cousins (e.g., their mothers are full sisters)?
  vii. two first cousins whose mothers are full sisters and their fathers are half-brothers? (Again, this is not a case of inbreeding, just a stronger genetic relationship than the previous case.)
  viii. Franklin and Eleanor?
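Definition (1) can be checked by brute force on a small DAG. The sketch below is an assumption-laden Python illustration, not the intended Dyna program. It enumerates every upward path from x and from y, pairs those that end at the same apex, and keeps only pairs that share no vertex other than the apex (so that the combined route is a true path). The dictionary layout, the function names, and the toy family are hypothetical.

```python
def up_paths(start, parents):
    """Yield every upward path (as a list of vertices) that begins at `start`."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        for parent in parents.get(path[-1], ()):
            stack.append(path + [parent])

def shared_fraction(x, y, parents):
    """Sum (1/2)**length(p) over true paths p in P(x, y), as in definition (1)."""
    total = 0.0
    paths_from_y = list(up_paths(y, parents))
    for px in up_paths(x, parents):
        for py in paths_from_y:
            if px[-1] != py[-1]:
                continue  # the two halves must meet at the same apex
            if set(px[:-1]) & set(py[:-1]):
                continue  # a vertex below the apex repeats: a false path
            total += 0.5 ** ((len(px) - 1) + (len(py) - 1))
    return total

# Hypothetical toy family without inbreeding: a and b are full siblings,
# h is a half-sibling of both through m, and g is m's parent.
family = {"a": ["m", "f"], "b": ["m", "f"], "h": ["m", "z"], "m": ["g"]}

print(shared_fraction("a", "b", family))  # full siblings
```

For the full siblings in this toy family, the two true paths (through m and through f) each contribute 1/4, which agrees with the figure of 1/2 quoted in the text. Enumerating paths is exponential in general; it is meant only as a reference against which a dynamic program can be compared.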

Hand in ancestorb.dyna and any .par files. Explain your strategy in your README.

You can answer the above questions either directly from the definition (1) above, or by using Dyna with the tangled DAG in tangle.par. If you do both, you can check that your Dyna program gets the same answer as the formula; that’s what the graders will do.

The most natural solution involves using += in your Dyna program. Then your presidentsb.par should use := 0.5 instead of := 1.[3]

Note: For this += program, you will want to use --driver=goal to see the answer, or --driver=dynasty in order to see the complete computation. (--driver=backtrace and --driver=dynasty bestonly are intended for min= or max= programs; they show only the best way of deriving each item.)

[3] Alternatively, you may be able to get away without changing the .par files! The trick is to “work in the log domain.” You are trying to multiply probabilities $0.5 \cdot 0.5 \cdot 0.5 \cdots$ along each path, instead of summing edge lengths $1 + 1 + 1 + \cdots$ along each path as in the previous problem. However, if you are willing to output the negative logarithm of the probability instead of the probability itself, you can compute $-\log_2(0.5 \cdot 0.5 \cdot 0.5 \cdots)$ as $(-\log_2 0.5) + (-\log_2 0.5) + (-\log_2 0.5) + \cdots$, which is $1 + 1 + 1 + \cdots$! So it is really the same computation. Ordinary multiplication is accomplished by addition if you’re using logs. The question now is how to add up the probabilities over paths. Dyna provides an operator log+= to let you do this when your probabilities are expressed in the log domain. Suppose lx and ly are the (negated) logarithms of x and y. Then the rule lz log+= lx+ly increases lz from the (negated) logarithm of z to the (negated) logarithm of z+(x*y), being careful to avoid underflow. So it has the same effect on negated logs that z += x*y would have had on the original values.
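The arithmetic behind the log-domain trick of footnote 3 can be sketched in Python. This is only an illustration of the identity; how Dyna's log+= is actually implemented is not described in this document, and the function name neglog2_add is hypothetical. Values are negated base-2 logarithms, so math.inf stands for probability 0.

```python
import math

def neglog2_add(lx, ly):
    """Given lx = -log2(x) and ly = -log2(y), return -log2(x + y)."""
    if lx == math.inf:
        return ly
    if ly == math.inf:
        return lx
    lo, hi = min(lx, ly), max(lx, ly)  # lo belongs to the larger probability
    # x + y = 2**-lo * (1 + 2**(lo - hi)); the exponent lo - hi is <= 0,
    # so the power lies in (0, 1] and cannot overflow.
    return lo - math.log2(1.0 + 2.0 ** (lo - hi))

# Accumulate two paths of length 2 (each with probability 0.5 * 0.5),
# mimicking "lz log+= lx + ly" twice, starting from probability 0.
lz = math.inf
for lx, ly in [(1.0, 1.0), (1.0, 1.0)]:
    lz = neglog2_add(lz, lx + ly)

print(lz, 2.0 ** -lz)
```

Multiplication along a path becomes addition of edge lengths, and summation over paths becomes neglog2_add, so very small probabilities never have to be represented directly.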

Hint: The main difficulty in solving this problem correctly is to consider only true paths in P(x, y). If x and y are half-siblings with common parent p, you should consider the path x → p ← y. But if g and h are grandparents, f is a great-grandparent, etc., you should not also consider the “paths” x → p → g ← p ← y, x → p → h ← p ← y, x → p → g → f ← g ← p ← y, etc. These are not technically paths because they repeat nodes, even though the two halves (e.g., x → p → g and g ← p ← y) are individually paths.

To see that adding in the false paths would get an incorrect result, note that you can get more and more false paths by genealogical research that expands the family DAG with additional known ancestors of p. But surely turning up more ancestors should not make x and y any more related![4]

There are several possible ways to exclude these false paths. One approach observes that any false path we find must have the form x · · · b → a ← b · · · y.[5] In other words, the apex of the path (a) comes down through the same child (b) on both sides. You want to exclude paths where the two b vertices are the same.

Unfortunately, the current implementation of Dyna won’t compile rules with inequalities, like

    foo(X,Y) += bar(X,Y) whenever X != Y.

(The whenever SOMETHING clause can’t yet evaluate arbitrary boolean expressions; it is only powerful enough to ask whether a value has been proved for SOMETHING at all.) But there is a workaround using subtraction:

    unequal(X,Y) += 1.
    unequal(X,X) += -1.
    foo(X,Y) += bar(X,Y) * unequal(X,Y).   % multiply by 1 or 0

Of course, in the current implementation, even that will not quite compile. You need to introduce axioms one and minusone.[6] And you need to ensure that variables in the head of the rule also appear in the body, e.g.,

    unequal(X,Y) += 1 whenever interesting(X) whenever interesting(Y).

so that unequal(X,Y) is only computed for a finite number of “interesting” vertices for which you actually might care about unequal(X,Y). (It is okay to mark all vertices in the DAG as interesting, though you could be more efficient.)

[4] Unless it discovers inbreeding, as you can explore in the extra credit part of the question.

[5] Only because of our assumption (in question 2b) that the family DAG contains no inbreeding. Otherwise we could have a false path that merged, split, and merged again: x → p → b → a ← c ← p ← y.

[6] Actually, alas, you may have to set minusone to -0.99999 to avoid an error message about how the current implementation won’t allow unequal(X,Y) to revert to 0 . . . :-(
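The subtraction workaround for inequality can be mimicked in Python to see why it yields a 0/1 mask. This is a sketch of the arithmetic only; it says nothing about Dyna v0.3 syntax beyond what the text shows, and the names unequal_table and masked_sum are hypothetical. Each += rule becomes an accumulation into a dictionary, restricted to a finite set of “interesting” vertices.

```python
from collections import defaultdict

def unequal_table(interesting):
    """Accumulate +1 for every pair and -1 for every identical pair."""
    unequal = defaultdict(float)
    for x in interesting:
        for y in interesting:
            unequal[(x, y)] += 1.0   # mirrors: unequal(X,Y) += 1
    for x in interesting:
        unequal[(x, x)] += -1.0      # mirrors: unequal(X,X) += -1
    return unequal

def masked_sum(bar, unequal):
    """Mirror foo(X,Y) += bar(X,Y) * unequal(X,Y): keep only pairs with X != Y."""
    foo = defaultdict(float)
    for (x, y), value in bar.items():
        foo[(x, y)] += value * unequal[(x, y)]
    return foo

vertices = ["a", "b", "c"]                   # hypothetical interesting vertices
bar = {("a", "a"): 5.0, ("a", "b"): 7.0}     # hypothetical values to be masked
foo = masked_sum(bar, unequal_table(vertices))
print(dict(foo))
```

After both accumulations, a pair of distinct vertices holds 1 and an identical pair holds 0, so multiplying by the table has the effect that the unsupported `whenever X != Y` clause was meant to have.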


This kind of trick gives you at least two ways to exclude false paths x · · · b → a ← b · · · y:

  • As you build a path, the Dyna item could keep track not just of the path’s bottom and top vertices, but also the vertex just below the top vertex. Then when you put the two halves of the path together, you can check using the subtraction trick above that they will not form a false path.

  • You could alternatively use subtraction to exclude false paths directly. First write rules that ignore the false-path issue and sum freely over both true and false paths. Then write rules that sum over only false paths, where the two halves reach the apex a by way of the same child b. Subtract the second sum from the first sum, i.e.,

        goal += true_and_false_sum.
        goal += -1 * false_sum.

(c) [extra credit, recommended!] Now let’s get more sophisticated about the above problem by considering relatedness in arbitrary family DAGs, which may have inbreeding.

I wrote that on average, two siblings share 1/2 of their genes. However, Anna Roosevelt and James Roosevelt, meaning the ones born in 1906 and 1907, were not ordinary siblings, because their parents (Franklin and Eleanor) were distant cousins. What is the expected fraction of genes that they share?
  i. First let’s consider the simplest possible case of inbreeding. A hermaphrodite mates with itself.[7] What is the expected fraction of genes that it shares with its offspring? See discussion below for how to compute this and the following.
  ii. A parent and child mate incestuously with one another. What fraction of genes does their offspring share with each of them, on average?
  iii. What is the expected fraction of genes that Anna and James share?
  iv. What fraction of Anna’s genes does she share with herself? Explain what it means that this answer is greater than 1. Should Anna worry about having a rare genetic disease because she might have inherited two copies of the same recessive mutant gene?
  v. If Anna and James had a child, what fraction of its genes would it share with itself?
  vi. If a hermaphrodite h1 mates with itself, and its offspring h2 mates with itself, and its offspring h3 mates with itself, etc., what can you say about hn as n → ∞?

[7] You may assume that the hermaphrodite is diploid (two alleles for each gene, just like people) and is not itself inbred (the two alleles are always different, making the organism heterozygous).


Let’s be precise about all this. In fact, even a random human and a random chimp share 98.4% of their genes. We are mainly concerned with “rare” genes (mutations conferring a rare disease or extraordinary musical ability) that entered the family DAG through only one of the roots (parentless people),[8] so that if Anna and James both have it, they inherited it from the same source.

For the sake of explanation, let’s focus on a particular gene: say, some hypothetical single gene that controls body odor.[9] Each of the r roots of the DAG has two alleles for this gene (since Homo sapiens is a diploid species: two copies of each chromosome). Let’s suppose for the sake of argument that those 2r alleles at the roots are all different from one another, so that each of the founding ancestors contributes his or her own special pair of distinct smells. (Even if some of the 2r alleles are really the same in practice, pretending that they are all different means that we are tracking not whether Anna and James share an allele, but whether they inherited the same copy of that allele.)

Let’s formally define F(anna, james), the expected fraction of alleles that Anna and James share.[10] We want to define this as the probability, averaged over all of Anna’s alleles g, that James also has g. But wait: thanks to inbreeding, James could have g twice, and we’d like this to count more than if James had g only once. So let’s define F(anna, james) as the expected number of copies of g that James has: some number between 0 and 2.

(We’d better check that this definition is symmetric, i.e., that F(anna, james) = F(james, anna). If Anna’s body odor alleles are gm and gf (inherited from her mother and father respectively), and James’s are hm and hf, then F(anna, james) can be regarded as the expected average of (gm == hm) + (gm == hf) and (gf == hm) + (gf == hf). In other words, it is half of the expectation of (gm == hm) + (gm == hf) + (gf == hm) + (gf == hf), which is indeed symmetric.)

Based on this definition, here are some rules for computing F(x, y), which your Dyna program should reflect.

[8] We ignore the possibility of rare genes that were created once (or repeatedly) by recent mutations or recombinations within the family DAG.

[9] It is fine for our purposes to consider each gene in isolation. It is true that genes on the same chromosome tend to be inherited together, so the inheritance patterns are correlated. That is very important to genetic sleuths analyzing particular pedigrees where they already have some evidence of which family members share genes (a very interesting problem in graphical modeling, which is the probabilistic version of constraint programming). But we are only concerned with the expected total number of Anna’s genes that James shares, and for this purpose we can consider the genes separately (formally, because the expectation E[X+Y] equals E[X]+E[Y] even when we do not have independence between the random variables X and Y, corresponding in this case to different genes).

[10] Anyone know what geneticists call this number? They must have a name for it; I’m just giving my own analysis of the problem here.

  • Inheritance to the next generation: If we know relatedness in one generation, we can compute relatedness in the next generation. Suppose x and y mate to produce z. Then for any w other than z itself or a descendant of z,

    $$F(w, z) = \tfrac{1}{2}F(w, x) + \tfrac{1}{2}F(w, y).$$ [11]

    Why? Consider a random allele g from w (if you like, one of w’s two body odor alleles). How many copies does z have on average? Well, the mother x has F(w, x) copies on average, of which z inherits half on average.[12] And the father y has F(w, y) copies on average, of which z also inherits half on average.

    As an example of particular interest, consider w = x, which tells us how much x shares with her own child z: $F(x, z) = \tfrac{1}{2}F(x, x) + \tfrac{1}{2}F(x, y)$. Without inbreeding, F(x, x) = 1 and F(x, y) = 0, so this works out to 1/2, as assumed in question 2b. But the formula rightly expects it to show up more often (i.e., F(x, z) > 1/2) if an allele g from x is likely to appear more than once in x (i.e., F(x, x) > 1), or is also likely to appear in y (i.e., F(x, y) > 0). Note that this example requires us to know F(x, x), which measures how much x is inbred. We now turn to computing that.

[11] As a sanity check, note that if F(w, x) and F(w, y) are both in the range [0, 2] as they’re supposed to be, then so is F(w, z). As another sanity check, suppose w’s parents are u and v. Then there are two ways to compute F(w, z) (provided that all “no descendant” conditions are met), and fortunately they give equal results:

$$F(w, z) = \tfrac{1}{2}F(w, x) + \tfrac{1}{2}F(w, y) = \tfrac{1}{2}\left(\tfrac{1}{2}F(u, x) + \tfrac{1}{2}F(v, x)\right) + \tfrac{1}{2}\left(\tfrac{1}{2}F(u, y) + \tfrac{1}{2}F(v, y)\right)$$

$$F(w, z) = \tfrac{1}{2}F(u, z) + \tfrac{1}{2}F(v, z) = \tfrac{1}{2}\left(\tfrac{1}{2}F(u, x) + \tfrac{1}{2}F(u, y)\right) + \tfrac{1}{2}\left(\tfrac{1}{2}F(v, x) + \tfrac{1}{2}F(v, y)\right)$$

[12] In other words, z inherits each of x’s copies of g with probability 1/2. We are assuming that the fact that g is known to appear in w does not make z any more or less likely to inherit that gene. This assumption fails in the case where w happens to be z, in which case z inherits g with probability 1. That is why w is not allowed to be z; we will handle the case w = z separately. More generally, the assumption fails if w is any descendant of z, even if w is not z itself. In that case, when we are trying to guess whether z inherited from x a given one of x’s copies of g, the probability is > 1/2 if we know that g also made it down to z’s descendant w. The assumption fails in the same way if w happens to be z’s identical twin, in which case z again inherits g with probability 1, or a descendant of z’s identical twin. The easiest way to handle identical twins is to pretend they are the same person, i.e., give them a single node in the family DAG, though this node may have roughly twice the typical number of mates and offspring.

Biologically speaking, are we otherwise on safe ground? Almost. It is true as far as I know that inheritance events are independent, i.e., if you have allele g, you really are 1/2 likely to pass it on to each of your gametes, independent of who you inherited it from or which of your relatives got it too. However, it may be that you are not 1/2 likely to pass it on to each of your children, since perhaps g is maladaptive (either by itself or in combination with certain other alleles of the same gene or other genes) in such a way that the gametes with g are less likely to lead to actual births. Thus a child z that actually shows up in the family DAG as a birth may have < 1/2 chance of having g (and > 1/2 chance of having your other allele). We ignore this “selection effect.”

  • Self-relatedness: The above rule explicitly does not let us compute F(z, z), for reasons explained in footnote 12. Now let’s handle that case. Again suppose that z’s parents are x and y. Then

    $$F(z, z) = 1 + \tfrac{1}{2}F(x, y).$$ [13]

    Why? Consider a random allele g from z. How many copies does z have on average? Well, it definitely has the 1 copy that we chose, plus perhaps a copy inherited from the other parent. If g is the copy inherited from x, then we know that x had it, so y on average will have F(x, y) copies, of which z inherits half on average. If g is the copy inherited from y, then we know that y had it, so x on average will have F(y, x) copies (which we saw above = F(x, y)), of which z again inherits half on average. So either way, $F(z, z) = 1 + \tfrac{1}{2}F(x, y)$.

  • Base cases: Suppose we don’t know z’s parents because z is a root of the family DAG. Then we set F(z, z) = 1, recalling our assumption that each root is heterozygous: it has only one copy of each of its two alleles (e.g., smells). For w ≠ z, F(w, z) can usually be found by considering w’s parents u and v and applying our earlier rules: $F(w, z) = F(z, w) = \tfrac{1}{2}F(z, u) + \tfrac{1}{2}F(z, v)$. But this too has a base case: what if w is also a root? Then we set F(w, z) = 0, recalling our assumption that distinct roots contribute distinct alleles (e.g., smells).
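The inheritance, self-relatedness, and base-case rules above can be combined into a short memoized recursion. The following Python sketch is one possible reading, not the requested ancestorc.dyna. It assumes every non-root maps to exactly two parents (so any unknown parent has already been added as a new root). Instead of the subtraction-style corrections suggested for Dyna, it swaps in a different technique: it always expands whichever of the two individuals lies deeper in the DAG, which guarantees that the other one is not its descendant. The names make_relatedness, depth, and the toy pedigree are hypothetical.

```python
from functools import lru_cache

def make_relatedness(parents):
    """Build F(w, z) for a family DAG given as {child: (parent1, parent2)}."""

    @lru_cache(maxsize=None)
    def depth(person):
        # Roots have depth 0; a descendant is always strictly deeper than
        # each of its ancestors.
        if person not in parents:
            return 0
        return 1 + max(depth(p) for p in parents[person])

    @lru_cache(maxsize=None)
    def F(w, z):
        if w == z:
            if z not in parents:
                return 1.0                    # base case: a root is heterozygous
            x, y = parents[z]
            return 1.0 + 0.5 * F(x, y)        # self-relatedness rule
        if depth(w) > depth(z):
            w, z = z, w                       # F is symmetric; expand the deeper one
        if z not in parents:
            return 0.0                        # base case: two distinct roots
        x, y = parents[z]
        return 0.5 * F(w, x) + 0.5 * F(w, y)  # inheritance rule (w not below z)

    return F

# Hypothetical pedigree: a and b are full siblings; c is their (inbred) child.
F = make_relatedness({"a": ("m", "f"), "b": ("m", "f"), "c": ("a", "b")})
print(F("a", "b"), F("c", "c"), F("a", "c"))
```

In this toy pedigree the full siblings a and b come out at 1/2, matching the non-inbred figure in the text, while their child c has self-relatedness above 1 because its parents are related.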

  • Awkward case: We have dealt with the case where z’s parents are both known, and the case where z is a root with no parents. But what if only one of z’s parents is known? We can avoid this case by first adding z’s other parent to the family DAG (as a new root node with no other children). The presidents.par file is already set up that way for you, e.g.,

        child("Jacobus Roosevelt 1692-1776", "Nicholas Roosevelt 1658-1742") := 0.5.
        child("Jacobus Roosevelt 1692-1776", "not shown 5") := 0.5.

Your extra credit challenge, beyond answering the questions earlier in this section, is to write a Dyna program ancestorc.dyna that can answer all such questions correctly. Hand in your answers, your program, etc. Also comment on what good could be achieved by changing the numbers 1 and 0 in the base cases F(z, z) = 1 and F(w, z) = 0 (for all roots w, z).

Hint: You may find it easiest to compute F(w, z) for all pairs of vertices w, z, not just those related to the two individuals of interest.[14] There is a simple recursive formula given above that handles most cases, but it has to be “corrected” when w is a strict descendant of z, and “corrected” in a different way when w is z itself. These corrections can be handled by the same kinds of subtraction tricks you used in question 2b.

Hint: It’s okay if your Dyna program can’t handle the hermaphrodite questions. These are tricky only because we didn’t prepare to encode the input for such cases. If you write

    child("offspring","hermaphrodite") := 0.5.
    child("offspring","hermaphrodite") := 0.5.

then you have not added two edges to the DAG, but rather added a single edge of weight 0.5 and then changed its weight to 0.5 (not even really a change). This wouldn’t be a problem if we had distinguished maternal and paternal edges in the first place, which would have allowed us to write

    mother("hermaphrodite","offspring") := 0.5.
    father("hermaphrodite","offspring") := 0.5.

or perhaps more simply

    child("offspring",mom,"hermaphrodite") := 0.5.
    child("offspring",dad,"hermaphrodite") := 0.5.

which could be derived automatically if you like from

    child("offspring","hermaphrodite","hermaphrodite") := 1.   % for example

You are welcome to re-encode all the input this way and phrase your program accordingly.

[13] Again, as a sanity check, note that if F(x, y) is in the range [0, 2] as it’s supposed to be, then so is F(z, z).

[14] If you like, you could then use a “magic templates” transformation as shown in class to restrict to only those computations that are strictly necessary.

3. In assignment 4, you wrote Prolog code to find the longest increasing subsequence of a given list. (Strictly increasing, i.e., no duplicates.) Suppose the input is [3,5,2,6,7,4,9,1,8,0].

We pointed out in assignment 4 that it would be a bad idea to generate all subsequences, such as [3,5,6,4,9], and then keep only the ones that were increasing. There would be $2^n$ subsequences to generate and test. Instead, you did the < checks as you went along. You never even built [3,5,6,4,9], because you wouldn’t have been willing to stick 6 at the front of [4,9] at an earlier step. So your Prolog code generated only the increasing subsequences, and then picked the longest.
(a) Though better, why was that still inefficient? Give an example of a length-10 list where even generating all increasing subsequences would be slow.

(b) Someone suggests the following “greedy” recursive solution: “To build the longest increasing subsequence of [3,5,2,6,7,4,9,1,8,0], first build the longest increasing subsequence of [5,2,6,7,4,9,1,8,0], then glue 3 on the front if you can.”

    lis(list):
        if (list.empty())
            return []
        else
            subproblem = list.rest()
            (* find just the best solution to subproblem, not all solutions as in the Prolog version! *)
            subsolution = lis(subproblem)
            if (subsolution.empty() or list.first() < subsolution.first())
                return cons(list.first(), subsolution)
            else
                return subsolution

Try this function by hand on the list [3,5,2,6,7,4,9,1,8,0]. What is the big-O runtime? What answer does the function get?

(c) Now you’ll correct the above solutions, using dynamic programming.

As preparation, remember the in-class problem of maximum-weight independent set in a tree. (For simplicity, let’s restrict to just a simplified binary-tree version.) The algorithm had to solve 4 recursive subproblems rather than 2. It wasn’t enough just to get the max-weight independent set of the left subtree and also of the right subtree! Rather, in each of the two subtrees, we had to get the max-weight independent set and the max-weight unrooted independent set. Knowing that a partial solution was unrooted allowed us to determine that we could legally combine it with other partial solutions in certain ways.

In other words, we decided to solve a slightly harder problem than the original. In effect, we wrote mis(tree, must_be_rooted). Then we called both mis(tree, false) and mis(tree, true), at different times. So mis had a slightly more general problem to solve, but was able to solve it by relying on recursive copies of itself that could also solve more general problems!

(This trick is known as “strengthening the inductive hypothesis.” You’ve done proofs by induction, which are basically proofs that call themselves recursively. Sometimes the only way to write one is to decide to prove a stronger theorem than you were assigned, because otherwise the recursive call (the “inductive hypothesis”) won’t establish all the results that you need in order to solve the original problem.)

Now go back to the lis problem. What do you have to know about an increasing subsequence of the subproblem [5,2,6,7,4,9,1,8,0] in order to know whether you are allowed to glue 3 onto the front?

(d) Given your answer to 3c, improve the pseudocode from 3b so that it will get the correct answer. Like mis, your lis will have an extra argument that is used in those recursive calls. And like mis, it will have to call itself more than once.

(e)
  i. How should you call your new two-argument lis function if you want the longest increasing subsequence of [3,5,2,6,7,4,9,1,8,0]?
  ii. The fact that your lis is multiply recursive suggests that it might benefit from dynamic programming (i.e., reuse of subsolutions). Give an example of a subproblem lis(x, y) that must be solved at least twice during the call you just proposed.
  iii. What specific technique would avoid the duplicate computation?
  iv. What happens to your runtime if you don’t avoid the duplicate computation? (For extra credit, prove your answer.)

(f) Your lis function worked by backward chaining, starting with the original list and calling simpler subproblems as needed. Write a Dyna program that does essentially the same computation by forward chaining, starting with simpler lists and building the solution up from there. Hints:

  • Roughly speaking, the Dyna program will solve various lis(x, y) problems, starting with x = [] and working up to bigger lists x.

  • To ensure that it doesn’t run forever, your program should confine that computation to “interesting” lists that are tails of the original input. We saw how to do this in class:

        interesting(Xs) :- input(Xs).
        interesting(Xs) :- interesting([X|Xs]).   % replace :- by the accumulation operator used in rest of program

    Then add “whenever interesting(...)” to some of your program’s other rules, to constrain their use.

  • Hint: The basic idea is to compute items of the form lis(x), where the value of lis(x) should be the length of the longest increasing subsequence of x. (Use --driver=backtrace to reconstruct an actual subsequence of that length.) But you will have to strengthen the inductive hypothesis. So you should also, or instead, state recursive rules for computing items of the form lis(x,y), where y tells you something about the actual best subsequence. y should not actually specify a full subsequence, since then you’d face an exponential proliferation of possible items. The idea is to keep the number of items pretty small, by having each item lis(x,y) summarize a lot of possible subsequences of x and record only the length of the best one.

  • The current Dyna compiler, alas, does not yet let you write A < B in a program. We have written a little hack for you in lessthan.dyna. Read that file carefully. Your lis.dyna can include its material as follows:

      #include "lessthan.dyna"

    Note: If you get a compiler error about “inconsistent type declarations,” email us your code and we’ll help (see footnote 15).

Turn in your commented code as lis.dyna. Also turn in a parameter file, lis.par, that specifies the [3,5,2,6,7,4,9,1,8,0] problem.

(g) In your README, explain how to decode your program’s output (e.g., on lis.par) when it is compiled with --driver=backtrace. In other words, how could you figure out what the best subsequence is? You might also want to use --driver=dynasty bestonly, with the option “Tools / Show Node Values.” (Note that in practice, you wouldn’t decode the backtrace text. The C++ chart object provides programmer-friendly methods that let you extract the same information. So it’s cleaner to write your own C++ driver program.)

(h) What does “strengthening the inductive hypothesis” correspond to in Dyna?
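
As a cross-check for problem 3 outside Dyna, the Python sketch below fills in a table of lis(x,y) values bottom-up over the tails of the input. It assumes one particular reading of y, namely the first element of the subsequence; the hints above leave that choice open, so treat this as an illustration rather than the intended solution. The names lis_length and table are invented for the example.

```python
def lis_length(xs):
    """Length of the longest strictly increasing subsequence of xs."""
    n = len(xs)
    # table[i] plays the role of the items lis(x, y) for the tail x = xs[i:].
    # It maps y -> length of the best increasing subsequence of that tail
    # whose first element is y (ASSUMED meaning of y, see the lead-in).
    table = [dict() for _ in range(n + 1)]  # table[n] is the empty tail []
    for i in range(n - 1, -1, -1):
        head = xs[i]
        # Every subsequence of the shorter tail is still available.
        current = dict(table[i + 1])
        # Or put head in front of the best subsequence that starts above it.
        longest_after = max(
            (length for y, length in table[i + 1].items() if head < y),
            default=0,
        )
        current[head] = max(current.get(head, 0), 1 + longest_after)
        table[i] = current
    return max(table[0].values(), default=0)


if __name__ == "__main__":
    print(lis_length([3, 5, 2, 6, 7, 4, 9, 1, 8, 0]))
```

Only tails of the input are ever indexed, which mirrors the “interesting” restriction, and each tail holds at most one entry per distinct element, so the number of entries stays quadratic in the input length.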

  • 4. [425] In assignment 4, you wrote Prolog/ECLiPSe code to generate balanced binary search trees. Balanced trees might not always be what we want in practice, though. If some of the keys in the tree will be searched for much more often than the rest, then it will be more efficient in the long run to store those keys closer to the root, even if this means pushing other keys further down. Suppose that we have a fixed set of keys that we want to store in a search tree, and we know (or can guess) exactly how many times per day we will want to search for each key. We want to construct a search tree so that the total number of nodes visited per day is minimized. Because of the property that a subtree of an optimal search tree is itself an optimal search tree (for the keys in the subtree), constructing optimal search trees can be solved efficiently by dynamic programming.

15 Dyna is a typed language, for reasons of safety and efficiency. But by default, the compiler will try to guess whatever type declarations you left out. If it can find equally strong arguments for different types, it will give up and ask you to include more declarations. (The current error message is misleading, since it sometimes means “insufficient” rather than “inconsistent.”) I ran into this when solving this question myself. We were already considering strengthening the type inference algorithm in such a way that it would have succeeded on my program. Meanwhile, adding the following line may be enough (it was for me):

      :- structure(cons(int,list)).


See http://www.cs.auckland.ac.nz/software/AlgAnim/opt_bin.html for a full exposition of the problem, a traditional dynamic programming algorithm for solving it, and a very helpful Java animation of the algorithm.

(a) Translate the above algorithm into a running Dyna program. Turn in your commented code as optBST.dyna. Your program should look like a very concise statement of the algorithm. Sample inputs abcde.par, words1.par, words2.par, words3.par, and words4.par can be found in the usual directory. These problems are taken from real data and are progressively larger. Since the algorithm has O(n^3) runtime, the final input file might require half an hour to find the final answer.

(b) In your README, explain how to decode your program’s output (e.g., on abcde.par) when it is compiled with --driver=backtrace. In other words, how could you figure out what the optimal tree is? For the small input files, you might want to try --driver=dynasty bestonly, with the option “Tools / Show Node Values,” or perhaps even --driver=dynasty, which shows all the ways of building each item, not only the best way.

(c) The root of the optimal tree is the same word for all four of the words*.par files. What is this word and why is it so great? Is it the most frequent word? The most central word? If not, then what?

(d) For this program, it is safe to add the declaration

      :- converges(goal, first pop).

That says that one has found the true value of goal as soon as one has processed the first update to it. If one only wants to know goal, then it is not necessary to process further updates from the agenda until it empties out. How much does this speed things up? (Note: One could get considerable further speedup by designing a better ordering for the agenda. The default is to process the lowest-value items first. A better class of solutions includes an A* heuristic. At the moment, this can be done only by writing a C++ function to define the priorities. Eventually, though, the priority of an item foo(123) will be represented by another item priority(foo(123)), whose value can be defined and updated directly in Dyna like any other item.)

Hints:

  • In the input files, a line like key("d",3,4) := 15 encodes the statement that "d" is the 4th key in sorted order, and that you expect to search for it 15 times per day. You might expect just key("d",4) := 15. We include 3 in the item name merely as a trick to make it easy to combine this item with the item key("c",2,3) that represents the previous key.

  • Your job is to compute the value of items like cost(2,5). This should represent the total cost of the best search subtree that spans all 3 consecutive keys key("c",2,3), key("d",3,4), and key("e",4,5). (Note that 5 − 2 = 3.) The cost of a subtree is the total number of visits per day to nodes in that subtree. In general, cost(I,J) is the cost of the best subtree T spanning the range I to J. You may want to try writing a mathematical formula expressing this as a minimum over the different possible choices of breakpoint in T. Dyna notation is rather close to such formulas.

  • Ultimately, you want to find the cost of the best search tree spanning all n keys. So write a rule that defines goal accordingly. It should use a clause like “whenever numkeys(N).”

  • You’ll want the declaration

:- structure(key(string,int,int)).

  • Note that when you use a subtree as the left or right child of a larger tree, visiting any node in the subtree will also mean visiting the root node of the larger tree. How does that affect the cost of the larger tree? You’ll need to strengthen your inductive hypothesis so that you have enough information to compute this new cost. In this problem (unlike problem 3), strengthening the inductive hypothesis doesn’t mean solving more-detailed subproblems, but rather returning more-detailed subsolutions. What does this correspond to in Dyna? This should give you a new way to answer question 3h.

  • Hint: Avoid having too many items on the right-hand side of a rule, as in a += b*c*d*e. The way that you arrange and parenthesize them can have major consequences for speed, e.g., a += b*(c*(d*e)) versus a += (b*e)*(c*d). Writing things out with temporary items will help you see which versions are more efficient. For example,

      a(X) += (b(W,X)*c(X,Y))*d(W)

    is equivalent to

      temp(W,X,Y) += b(W,X)*c(X,Y).   % creates a cubic number of temp items
      a(X) += temp(W,X,Y)*d(W).       % sums over W and Y

    whereas

      a(X) += (d(W)*b(W,X))*c(X,Y)

    gets the same answer at only quadratic cost, being equivalent to

      temp(X) += d(W)*b(W,X).         % sums over W
      a(X) += temp(X)*c(X,Y).         % sums over Y
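
To see the difference between the two parenthesizations concretely, here is a small Python sketch that evaluates both versions of the a(X) rule on dense n-by-n tables and counts multiplications. It only imitates the arithmetic and makes no claim about how the Dyna compiler actually schedules work; the function names and the operation counter are invented for the illustration.

```python
def a_with_cubic_temp(b, c, d):
    """a(X) += (b(W,X)*c(X,Y))*d(W), via temp(W,X,Y)."""
    n = len(d)
    ops = 0
    temp = {}
    for w in range(n):
        for x in range(n):
            for y in range(n):
                temp[w, x, y] = b[w][x] * c[x][y]  # one temp item per (W,X,Y)
                ops += 1
    a = [0] * n
    for (w, x, y), value in temp.items():
        a[x] += value * d[w]  # sums over W and Y
        ops += 1
    return a, ops


def a_with_quadratic_temp(b, c, d):
    """a(X) += (d(W)*b(W,X))*c(X,Y), via temp(X)."""
    n = len(d)
    ops = 0
    temp = [0] * n
    for w in range(n):
        for x in range(n):
            temp[x] += d[w] * b[w][x]  # sums over W
            ops += 1
    a = [0] * n
    for x in range(n):
        for y in range(n):
            a[x] += temp[x] * c[x][y]  # sums over Y
            ops += 1
    return a, ops


if __name__ == "__main__":
    n = 4
    b = [[w + 2 * x + 1 for x in range(n)] for w in range(n)]
    c = [[3 * x - y for y in range(n)] for x in range(n)]
    d = [w + 1 for w in range(n)]
    print(a_with_cubic_temp(b, c, d))
    print(a_with_quadratic_temp(b, c, d))
```

Both functions return the same values for a; only the operation counts differ, growing as 2n^3 for the first version and 2n^2 for the second.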

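
Returning to the cost(I,J) hint for problem 4: the Python sketch below is one way to check small optimal-tree costs by hand. It assumes positions 0..n with key K spanning K-1 to K, as in the key("d",3,4) encoding, and it takes a subtree's cost to be the total visits per day to its nodes. The function name and the prefix-sum helper are invented here, and this is a reference calculation, not the Dyna program the assignment asks for.

```python
from functools import lru_cache
from itertools import accumulate


def optimal_bst_cost(freqs):
    """Minimum total node visits per day for keys with the given search counts.

    freqs[k-1] is the number of searches per day for the k-th key in sorted order.
    """
    n = len(freqs)
    # prefix[j] - prefix[i] = total searches for the keys spanning i..j.
    prefix = [0] + list(accumulate(freqs))

    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == j:
            return 0  # empty span, no nodes to visit
        # Every search that lands in this subtree visits its root once.
        total = prefix[j] - prefix[i]
        # Key k (spanning k-1..k) is the root; children span i..k-1 and k..j.
        return total + min(cost(i, k - 1) + cost(k, j) for k in range(i + 1, j + 1))

    return cost(0, n)


if __name__ == "__main__":
    print(optimal_bst_cost([34, 8, 50]))
```

The total term is the extra information each subproblem needs beyond its own best cost. The recursion is memoized but not otherwise tuned, so it suits small inputs only.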

A summary of what to turn in for this assignment:

  • The dynac.log.gz file that was created in the directory where you ran dynac. This will not be used when grading, but will help us improve Dyna by counting how often different error messages were encountered by novice users. (If you did different problems in different directories, please submit all your log files.)

  • Your README file, including clear, well-thought-out answers to all of the questions. You may want to include general comments about Dyna.

  • Turn in well-commented Dyna code for problems 2, 3, and 4. Explain how your code works! (You can give explanation in your README if you prefer.)

  • DO NOT submit binaries, C++ driver programs (see footnote 16), or any of the auxiliary files created by dynac. Just submit the stuff you wrote yourself.

Good luck!

p.s. One of the goals of the Dyna project is to create a language that is easy to use, even for people not familiar with logic programming or dynamic programming. Any feedback you can offer as to how to achieve that better in the new version will be greatly appreciated.

16 Unless, of course, you wrote your own. But using standard drivers like --driver=backtrace should be sufficient for this assignment.
