Hi. Can you answer all three questions in the attached file and explain your answers?
hw2.pdf

4. (20 points) Suppose that while waiting for Joe Biden to speak, the audience members
start to get a little boisterous. You realize that it’s because there are Republicans in the
audience and they are sprinkled throughout the mostly Democrat audience. To make
things more peaceable you decide that you will rearrange people by political affiliation.
However, to avoid creating even more chaos, you want to do this quickly and in a way
that doesn’t create a “free for all” with everyone standing up at once and moving about.
Assume the seats are numbered 1 to n. All seats are occupied. You can move to any
seat and inquire about the political affiliation of the person sitting there, which must
be either D or R (independents stayed home). You can remove someone from a seat,
creating an empty seat. You can also move someone from one seat to an empty seat.
Finally, you can “swap” two people. Each of these operations has a “cost” of 1 unit. At
any given time only a small number of people should be without a seat.
(a) Describe the most efficient algorithm (i.e. lowest cost) for moving all of the Republicans to the seats in the back (meaning the highest numbered seats). Please
write your algorithm in the same style as the algorithms described in Algorithms
Unlocked and on the handouts.
(b) Analyze the runtime of your algorithm. You can give a rough approximation using
big Oh notation, such as O(lg n), or O(n), etc.
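One possible approach to the rearrangement, shown only as a minimal sketch: walk one pointer up from seat 1 and another down from seat n, and swap whenever the front pointer rests on an R and the back pointer rests on a D. The function name `move_republicans_to_back`, the list-of-characters representation, and the 0-based indexing are assumptions made for illustration; the assignment itself asks for pseudocode in the style of Algorithms Unlocked.

```python
def move_republicans_to_back(seats):
    """Rearrange seats in place so every 'R' sits behind every 'D'.

    seats is a list of 'D'/'R' characters (hypothetical representation).
    Returns the total cost, counting each inquiry and each swap as 1 unit.
    """
    cost = 0
    front, back = 0, len(seats) - 1  # 0-based: seat k in the text is index k - 1
    while front < back:
        # Advance the front pointer until it rests on a Republican.
        while front < back:
            cost += 1  # one inquiry
            if seats[front] == "R":
                break
            front += 1
        # Retreat the back pointer until it rests on a Democrat.
        while front < back:
            cost += 1  # one inquiry
            if seats[back] == "D":
                break
            back -= 1
        if front < back:
            # Swap the misplaced pair; nobody is ever left without a seat.
            seats[front], seats[back] = seats[back], seats[front]
            cost += 1
            front += 1
            back -= 1
    return cost


seats = list("RDRDD")
total_cost = move_republicans_to_back(seats)
print("".join(seats), total_cost)
```

In this sketch each seat is inspected at most once and each swap settles two people, so the cost grows linearly with n. Using only the swap operation also keeps everyone seated throughout.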
5. (16 points) Your task in this question is to compute the edit distance between two strings
(arrays of characters). In addition, you must identify the sequence of edit commands
that achieves the edit distance. Here is an example. Suppose we want to calculate the
edit distance between s = 'ABC' and t = 'AxCD'. First, we calculate the edit distance
matrix. The first cell in the matrix, the one corresponding to (_, _), is always equal to
zero.
      _   A   x   C   D
  _   0,  1,  2,  3,  4
  A   1,  0,  1,  2,  3
  B   2,  1,  1,  2,  3
  C   3,  2,  2,  1,  2
The sequence of edit commands that achieves this edit distance:
COPY A  -> A (cost 0)
SUB B   -> x (cost 1)
COPY C  -> C (cost 0)
INSERT     D (cost 1)
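The worked example can be reproduced with the standard Levenshtein recurrence followed by a traceback from the bottom-right cell. This is a sketch under stated assumptions: the function names `edit_distance_matrix` and `edit_commands` are hypothetical, rows are indexed by s and columns by t, and the `DELETE` command is an assumed name, since the example only shows COPY, SUB, and INSERT. When several optimal command sequences exist, the traceback returns just one of them.

```python
def edit_distance_matrix(s, t):
    """Return d where d[i][j] is the edit distance between s[:i] and t[:j]."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of s
    for j in range(cols):
        d[0][j] = j  # insert all j characters of t
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost,  # copy or substitute
                d[i - 1][j] + 1,             # delete from s
                d[i][j - 1] + 1,             # insert from t
            )
    return d


def edit_commands(s, t):
    """Trace back through the matrix and return one optimal command list."""
    d = edit_distance_matrix(s, t)
    i, j = len(s), len(t)
    commands = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1] and d[i][j] == d[i - 1][j - 1]:
            commands.append(f"COPY {s[i - 1]} -> {t[j - 1]} (cost 0)")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and s[i - 1] != t[j - 1] and d[i][j] == d[i - 1][j - 1] + 1:
            commands.append(f"SUB {s[i - 1]} -> {t[j - 1]} (cost 1)")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            commands.append(f"DELETE {s[i - 1]} (cost 1)")  # assumed command name
            i -= 1
        else:
            commands.append(f"INSERT {t[j - 1]} (cost 1)")
            j -= 1
    commands.reverse()  # the traceback runs from the end to the start
    return commands


for row in edit_distance_matrix("ABC", "AxCD"):
    print(row)
for command in edit_commands("ABC", "AxCD"):
    print(command)
```

For s = 'ABC' and t = 'AxCD' this sketch yields the same matrix and the same four commands as the example above.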
Complete the following problems using the style shown in the example above. In addition
to lecture notes, you may wish to consult the Wikipedia page
https://en.wikipedia.org/wiki/Levenshtein_distance. (Please cite any additional sources.)
(a) Compute the edit distance matrix between strings s = 'zAzB' and t = 'AB'.
(b) Compute the sequence of edit commands that achieves the above edit distance for
strings s = 'zAzB' and t = 'AB'. (Please follow the template above.)
(c) Compute the edit distance matrix between strings s = 'aAB' and t = 'ABzz'.
(d) Compute the sequence of edit commands that achieves the above edit distance for
strings s = 'aAB' and t = 'ABzz'. (Please follow the template above.)
Algorithm 1 An algorithm for seeking the location of an entry in a sorted array
1: procedure Seek(A, x)  ▷ Find x in sorted array A of length n or return the location where x would be inserted if it’s not there.
2:   Set p to 1 and r to n.
3:   while p ≤ r do
4:     Set q to ⌊(p + r)/2⌋.
5:     If A[q] = x, then return q.
6:     Otherwise, if A[q] > x, then set r to q − 1.
7:     Otherwise, it must be that A[q] < x, so set p to q + 1.
8:   end while
9:   We know x is not there. Return p.
10: end procedure

6. (20 points) In this question, I’m going to ask you to analyze the runtime of an algorithm. But first, let me set up the problem. Recall from Lecture 12 and Ch. 2 of Nine Algorithms the problem of finding web pages that contain certain search terms. For this problem, we assume we have the following structure. Array W is a sorted array. An entry of the array W is a pair: a word and an array of document ids. The array of document ids represents the ids of documents where that word occurs. Each array of document ids is also sorted. Here is an example that matches the example on p. 15 of Nine Algorithms.

index   word    document id array
1       a       3
2       cat     1 3
3       dog     2 3
4       mat     1 2
5       on      1 2
6       sat     1 3
7       stood   2 3
8       the     1 2 3
9       while   3

Let’s assume that binary search can be extended to handle arrays where the entries of the array are pairs. So if I do binary search on W for word “dog”, I would get back 3. Then I could retrieve the document id array for “dog.” Let’s call this B_dog. The array B_dog has two entries, 2 and 3. So, if I index into B_dog at position 2, as in B_dog[2], I would get back 3.

The Seek algorithm (Algorithm 1) is identical to BinarySearch except for one key difference: if x is not in the sorted array, then it returns the index where x could be inserted while still maintaining the array in sorted order. For example, Seek(W, “colgate”) would return 3, indicating that “colgate” could be inserted after “cat” but before “dog.”

The ZigZagJoin algorithm solves the following task: given two words such as “cat” and “dog”, it returns an array that contains the document ids of those documents that contain both words. In this example, it would return an array containing a single entry, document 3.
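The Seek procedure of Algorithm 1 and the index W can be sketched in Python as follows. Two details are assumptions made for illustration: the `key` parameter, which is one way to "extend" the search to entries that are pairs, and the conversion between the pseudocode's 1-based positions and Python's 0-based lists.

```python
def seek(A, x, key=lambda entry: entry):
    """Sketch of Algorithm 1 (Seek), returning 1-based positions.

    key extracts the value to compare from each entry (an assumed
    extension so the same routine works on (word, doc ids) pairs).
    """
    p, r = 1, len(A)
    while p <= r:
        q = (p + r) // 2          # floor of (p + r) / 2
        current = key(A[q - 1])   # Python lists are 0-based
        if current == x:
            return q
        if current > x:
            r = q - 1
        else:
            p = q + 1
    return p  # x is not there; p is where it would be inserted


# The index from p. 15 of Nine Algorithms, as (word, document id array) pairs.
W = [
    ("a", [3]), ("cat", [1, 3]), ("dog", [2, 3]), ("mat", [1, 2]),
    ("on", [1, 2]), ("sat", [1, 3]), ("stood", [2, 3]),
    ("the", [1, 2, 3]), ("while", [3]),
]


def word_of(pair):
    """Return the word part of a (word, document ids) pair."""
    return pair[0]


position = seek(W, "dog", key=word_of)  # 3
B_dog = W[position - 1][1]              # document id array for "dog": [2, 3]
print(position, B_dog, B_dog[2 - 1])    # position 2 of the array holds 3
print(seek(W, "colgate", key=word_of))  # 3: between "cat" and "dog"
```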
The basic idea of ZigZagJoin is the following: since the document id arrays are in sorted order, we can repeatedly use binary search to find the next matching document id. This allows us to potentially skip large numbers of non-matching documents.

Algorithm 2 An algorithm for finding documents that contain two given words.
1: procedure ZigZagJoin(W, v, w)  ▷ W is an array as described above and v and w are two words.
2:   Initialize results to be a sufficiently large empty array. Initialize r to 1.
3:   Run BinarySearch on W to find the entry for v.
4:   If v is not in W return the empty results.
5:   Otherwise, let B_v be the array of document ids for v.
6:   Run BinarySearch on W to find the entry for w.
7:   If w is not in W return the empty results.
8:   Otherwise, let B_w be the array of document ids for w.
9:   Let n1 be the number of entries in B_v; n2 the number in B_w.
10:  Initialize i = 1 and j = 1.
11:  while i ≤ n1 and j ≤ n2 do  ▷ We’re not yet at end of either array
12:    Let docid_v be the document id B_v[i].
13:    Let docid_w be the document id B_w[j].
14:    If docid_v = docid_w, then both words appear in this document.
15:      Put docid_v in results[r] and increment r. Increment i and j as well.
16:    Otherwise, if docid_v < docid_w, then set i equal to the result of Seek(B_v, docid_w).
17:    Otherwise, we know docid_w < docid_v. Set j equal to Seek(B_w, docid_v).
18:  end while
19:  Return results.
20: end procedure

Analyze the runtime of ZigZagJoin using big Oh notation. Assume the following:
• There are m words in W.
• There are n1 documents that contain word v and n2 documents that contain word w.
• There are k documents that contain both v and w.
Hint: the correct answer includes m, n1, n2, and k. For full credit, give the smallest big Oh expression possible. For example, while it’s true that the runtime is O(2^(m + n1 + n2 + k)), the runtime is considerably smaller than the function in this big-Oh expression.

This space is blank for you to analyze the runtime of ZigZagJoin.
At the top of this page, give your big-Oh expression. Then include a brief justification for it.

I would like to clarify a few points:
• For this question, please consider worst case runtime.
• For line 2 of the algorithm, creating the result array, assume that takes 1 unit of time.
• Your answer should include m, n1, n2 but may not necessarily include k.

Finally, if you find the question difficult, my suggestion is to write out answers to the following questions.
• In your own words, describe what this algorithm does. You might include a short example illustrating the algorithm.
• Suppose n1 = 1, what is the runtime?
• Suppose n2 = 1, what is the runtime?
• Suppose that n1 = k and all k documents contain both words. What is the runtime?
• What is a general expression for the (worst-case) runtime?
Answering some but not all of these questions will earn you partial credit.

My goal with this question is that (a) you can read an algorithm description, work through it, and make sense of it at a "higher level" and (b) that you can use that "higher level" understanding to analyze its runtime.
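To experiment with the main loop of Algorithm 2 (lines 9 to 19), a minimal Python sketch is given here. It makes several assumptions: the lookups of v and w in W (lines 3 to 8) are omitted and the two document id arrays are passed in directly; the standard-library `bisect.bisect_left` stands in for Seek, which matches Seek's return value when the ids in each array are distinct, up to the shift from 1-based to 0-based positions; and a growing Python list replaces the pre-allocated results array and its counter r. The name `zigzag_join` is hypothetical.

```python
from bisect import bisect_left


def zigzag_join(B_v, B_w):
    """Return the document ids that appear in both sorted id lists."""
    results = []
    i, j = 0, 0  # 0-based counterparts of i = 1 and j = 1
    while i < len(B_v) and j < len(B_w):
        docid_v, docid_w = B_v[i], B_w[j]
        if docid_v == docid_w:
            # Both words appear in this document.
            results.append(docid_v)
            i += 1
            j += 1
        elif docid_v < docid_w:
            # Jump i to where docid_w is, or would be inserted, in B_v.
            i = bisect_left(B_v, docid_w)
        else:
            # Jump j to where docid_v is, or would be inserted, in B_w.
            j = bisect_left(B_w, docid_v)
    return results


print(zigzag_join([1, 3], [2, 3]))  # "cat" and "dog" from the example index
```

Trying inputs such as a one-element list against a long list, or two identical lists, may help when working through the suggested cases n1 = 1, n2 = 1, and n1 = k.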