15-110: Principles of Computing

15-110 Spring 2011 Homework 3
Due Sunday, 6-Feb, at 6pm

Special Note: Due to the Super Bowl, this week's hw is due Sunday at 6pm. Also, we will provide extra coverage during the Sunday office hours prior to 6pm. Go Steelers!!!!

Same basic directions as hw2, so read those carefully (replacing "hw2" with "hw3", where appropriate, of course).

Additional notes:

Start with hw3.zip.
This week's hw has a special emphasis on applications in Science. Future hw's will focus on other disciplines. Our aim is to provide at least two weeks of special emphasis for each college at CMU.
Reminder: portions marked SOLO must be solved entirely on your own (you may discuss with course staff, but not with each other or any outside sources). See syllabus for details.
This week you may use "if" statements (conditionals), and "for" or "while" statements (loops), but not recursion, or collections (lists, dictionaries, etc).
Unless indicated otherwise, you are not responsible for malformed input.

Practice:

To help you get started, here are some hw3 practice problems (with solutions and short video tutorials)

SOLO Problem-Solving (with Conditionals and Loops) [35 pts]

SOLO DNA Complementary Strands Verification [10 pts]
Background: DNA strands are composed of sequences of 4 nucleotides (Adenine, Cytosine, Guanine, and Thymine). Adenine and Thymine form A-T bond pairs, and Guanine and Cytosine form G-C bond pairs. Two DNA strands are complementary, and so can form a double helix, if each corresponding nucleotide can form a bond pair. For example, each Adenine in one strand must correspond to a Thymine in the other.

Your task: Write the function areComplementaryStrands (in the file hw3-solo.py). This function takes two DNA strands, represented as Python strings of the letters "A", "C", "G", and "T" (for the 4 nucleotides), and returns True if the two strands are complementary and False otherwise.   Here are some test cases for you:

    assert(areComplementaryStrands("AAGCATCA", "TTCGTAGT") == True)
    assert(areComplementaryStrands("AAG", "TTC") == True)
    assert(areComplementaryStrands("AAG", "TTT") == False)
    assert(areComplementaryStrands("AAG", "TTCC") == False)
SOLO Hardy-Weinberg Principle + Mendelian Genetics [10 pts]
Background: Read the wikipedia page on the Hardy-Weinberg Principle. Note: this is not a course in genetics, so you really do not have to understand the Hardy-Weinberg Principle, nor is it a course in statistics, so you do not have to understand a chi-square test. But you do have to understand how to take a page of fairly clear text describing straightforward algorithms and to translate them into working Python code!

Your tasks: First: Write the function hardyWeinbergChiSquared (in the file hw3-solo.py). This function takes three values -- the observed populations with dominant (AA), heterozygous (Aa), and recessive (aa) genes -- and returns the value of chi-squared resulting from testing the "null hypothesis" that these values are in Hardy-Weinberg Frequencies. Again, you do not have to really understand what that means to do this problem (though it would be nice, it's not required) -- you just have to follow the logic in the "Example chi-squared test for deviation" section of the article). For example, that page includes a table of observed populations of Scarlet Tiger moths:
   Genotype: White-spotted(AA) Intermediate(Aa) Little spotting(aa)
   Number:          1469               138                 5
Following this table, the article provides a fully worked-out example showing how to derive the chi-squared value for this data. Note that in this section, obs(Q) means the observed number with property Q, and exp(Q) means the expected number with property Q (the closer these two are, the better the model fits the data). For this data, chi-squared = 0.83, and so:
     hardyWeinbergChiSquared(1469, 138, 5)
should return:
     0.83
Hint: this function may take integers as parameters -- beware of inadvertent errors caused by integer division! In any case, here is a test case for you (the one given above, but coded in Python):

    assert(almostEqual(hardyWeinbergChiSquared(1469, 138, 5), 0.83))

Note that this uses our almostEqual function because we cannot safely compare floating-point numbers for equality using "==" (why not?). Here is our almostEqual function:

def almostEqual(d1, d2):
    return abs(d1 - d2) < 0.001 # adjust this value as appropriate

Second: Write the function inHardyWeinbergFrequencies (in the file hw3-solo.py). This function takes three values -- the observed populations with dominant (AA), heterozygous (Aa), and recessive (aa) genes -- and returns True if these numbers are consistent with Hardy-Weinberg Frequencies and False otherwise. From the article, we see that this is true if the chi-squared test is below the 5% significance level, which for 1-degree of freedom (as we have here) is True if chi-squared is less than 3.84. Thus, this function can be written in one line of code, using hardyWeinbergChiSquared as a helper function. In the given example, chi-squared is 0.83, so your function would return True. What does this True mean? It means that, absent excessive forces from inbreeding, selection, mutation, or other "deviations", the observed ratios of Scarlet Tiger moths should be self-perpetuating. Fascinating! Here is a test case for you (the one given above, but coded in Python):

assert(inHardyWeinbergFrequencies(1469, 138, 5) == True)
SOLO Hardcoded 4-Person Knights and Knaves [10 pts]
Background: for this problem, we revisit Knights and Knaves. Our goal is to write a general-purpose solver for any N-person Knights and Knaves problem. For that, we need to use some techniques we've not yet covered. So instead, this week we'll take a step in that direction, and write a solver for a specific 4-person problem (though these techniques can be used to solve any specific Knights and Knaves problem). We will solve Puzzle #145:
Abe tells you, `I know that I am a knight and that Dave is a knave.'
Dave claims, `Abe and I are the same.'
     Sally says that neither Abe nor Dave are knaves.
   Zippy claims that Abe is a knave.

Note that you will receive zero credit if you manually solve the problem and then hardcode the answer. To receive credit, you must follow the algorithm described here, in which your code closely follows the solution strategy we covered in the course notes on Knights and Knaves.

Your tasks: First, write the helper function psColumn4 (in the file hw3-solo.py) that takes 8 boolean (true/false) parameters: the truth values for each of the four inhabitants (and here we are hardcoding for 4 inhabitants -- that's the 4 in psColumn4) -- call these A, B, C, and D for this explanation -- followed by the truth values of their statements (SA, SB, SC, SD). Notice that this corresponds to one row of our truth table. The function completes the truth table for that row, computing IA, IB, IC, and ID, and finally computing PS. The function returns the value of PS for that row.   Here are some test cases for you:

    A = True
    B = True
    C = False
    D = False
    SA = True
    SB = True
    SC = False
    SD = False
    assert(psColumn4(A, B, C, D, SA, SB, SC, SD) == True)
    SD = True
    assert(psColumn4(A, B, C, D, SA, SB, SC, SD) == False) # because D != SD

(Note: when we assign required helper functions such as psColumn4, we will grade these -- robotically and/or manually -- as well as the main function in the exercise.)

Second, write the function solveKnightsAndKnaves145 (in the file hw3-solo.py). This function takes no parameters and returns a string encoding the solution to the puzzle, where the participants are listed in alphabetic order, and uppercase means Knight and lowercase means Knave. So, for example, "AdsZ" means that Abe and Zippy are knights, and Dave and Sally are knaves. How do we write this function? Simple: we just call psColumn4 over and over again, once for each row -- if that function returns True, then we found the winning row, so we can return a string representing the truth values of the inhabitants on that row. And to call psColumn4 once for each row, the function will use FOUR NESTED "for" LOOPS, with each loop assigning a value from 0 to 1, inclusively, to a variable representing one of the inhabitants. Inside the innermost loop, assign truth values to A, D, S, and Z based on those 0-or-1 values in the loop variables, then assign values to SA, SD, SS, and SZ based on the problem statement, and finally call psColumn4 to finish the row. Note that test cases are not really interesting here, since this function takes no inputs and always returns the same value; even so, here is the one-and-only test case for you:

    assert(solveKnightsAndKnaves145() == "Adsz")

Yes, this includes the final answer to the problem (Abe is a knight, the rest are knaves). You will be graded here not on what answer your function returns, but how it computes that answer.

Collaborative Problem-Solving (with Conditionals and Loops) [45 pts]

DNA Strand Pattern Matching [10 pts]
Background: Many questions in genetics involve searching for various patterns in DNA sequences. Some of these searches require advanced algorithms, but others can be implemented with simpler algorithms. Here we will explore a fairly straightforward DNA string pattern matcher.

Your task: Using the helper functions described below (so finish reading this entire question before starting!), write the function findPattern (in the file hw3-collab.py) that takes two strings and an integer, where the first string represents a DNA sequence (composed of the symbols "A", "C", "G", and "T"), the second string represents a pattern to find, and the integer is the index in the DNA sequence from which to start the search. The pattern is also composed of "A", "C", "G", and "T", but can also contain these characters:
character meaning
     "."      match any single letter ("A", "C", "G", or "T")
     "x"      match either "A" or "T"
     "y"      match either "C" or "G"
Your function should return the index of the start of the first match (starting from the given start index), or -1 if there are no matches.   Here are some test cases for you:

    assert(findPattern("CGACGATACGTTCT", "ACG", 0) == 2)
    assert(findPattern("CGACGATACGTTCT", "ACG", 3) == 7)
    assert(findPattern("CGACGATACGTTCT", "Cyx.C", 0) == 8)

To encourage you to use functional composition, your findPattern function must use the helper function findPatternAtIndex. This helper function takes three parameters -- the DNA string, the pattern string, and a specific targetIndex -- and returns True if the given pattern occurs in the given DNA string starting only at that given index, and False otherwise. Your findPattern function will call this helper function repeatedly (presumably with different starting indexes). Here are some test cases for you:

    assert(findPatternAtIndex("CGACGATACGTTCTC", "ACG", 0) == False)
    assert(findPatternAtIndex("CGACGATACGTTCTC", "ACG", 2) == True)
    assert(findPatternAtIndex("CGACGATACGTTCTC", "ACG", 7) == True)
    assert(findPatternAtIndex("CGACGATACGTTCTC", "Cyx.C", 0) == False)
    assert(findPatternAtIndex("CGACGATACGTTCTC", "Cyx.C", 3) == False)
    assert(findPatternAtIndex("CGACGATACGTTCTC", "Cyx.C", 8) == True)
    assert(findPatternAtIndex("CGACGATACGTTCTC", "Cyx.C", 14) == False)

Continuing with the theme of functional composition, your findPatternAtIndex function must use the helper function symbolsMatch. This function takes two symbols, one from a DNA string and one from a pattern string, and returns True if the two match as described above, and False otherwise. Here are some test cases for you:

    assert(symbolsMatch("A", "A") == True)
    assert(symbolsMatch("A", "C") == False)
    assert(symbolsMatch("A", "x") == True)
    assert(symbolsMatch("T", "x") == True)
    assert(symbolsMatch("A", "y") == False)
    assert(symbolsMatch("T", "y") == False)
    assert(symbolsMatch("A", ".") == True)
    assert(symbolsMatch("C", ".") == True)
    assert(symbolsMatch("C", "y") == True)
    assert(symbolsMatch("G", "y") == True)
Pendulum Motion [10 pts]
So you know: In this problem you will be translating some complicated math into Python. But don't fret! This is not a math course, and you do not have to understand the underlying math (for the most part). But you do have to understand how to translate it into Python. This is an invaluable skill: in many application areas, you will come across really complicated math expressions, the kind that are devilishly hard to reason over. But if you can manage to code them into Python functions, then you can gain control over them and reason over them by running experiments on those Python functions. What's more, even if the math itself is complicated, translating it into Python doesn't have to be. That is a key point of this exercise.

Background: in high school physics, you may have learned that the period of a pendulum is the time it takes for the pendulum to swing all the way from one side to the other and back again. What's more, this period is basically constant for a pendulum of a given length, which is what makes pendulum clocks work so well. Specifically, high school classes usually teach that a pendulum of length L (in meters) has the following period (T, in seconds):

Here, g is the force of gravity on Earth, or about 9.8 m/s². This not-so-complicated formula is a simplification that assumes that the period is independent of the initial angle at which you start the pendulum. This assumption is false, as it turns out. But for small angles, it works well enough, and it keeps the math much simpler, which is why the formula is usually taught like that in high school.

Now, we'll use the "real" formula that depends both on L, the length of the pendulum (in meters), and theta0, the initial angle (in radians, where 0 radians is pointing downwards to start (so the pendulum doesn't move!), and pi/2 radians is holding the pendulum horizontally to start):

Ok, now the math isn't so easy. But, unfortunately, this time the math is real. This is how pendulums really work. So we need to reason over this math. But how? It's so complicated! Answer: code it up in a Python function, then use that function to reason over the math. This may still be a little challenging, but it's definitely more doable for many people than directly analyzing the math.

Your task: Using the helper functions described below (so finish reading this entire question before starting!), write the function T(L, theta0) (in the file hw3-collab.py), which takes a length L in meters and an initial angle theta0 in radians, and returns the period T of the given pendulum according to the complicated formula above. Then, answer the free-response questions at the end of this exercise.

To solve this, we'll work the problem from the inside out. Way on the inside we see n! and also (2n)!, where ! means factorial. So we'll start here. Write the function factorial that takes a non-negative integer n and returns n!, which is the product 1*2*...*n (or just 1 if n=0). Here are some test cases for you:

    assert(factorial(1) == 1)
    assert(factorial(5) == 5*4*3*2*1)
    assert(factorial(10) == 10*9*8*7*6*5*4*3*2*1)
    assert(factorial(0) == 1)

Next, we'll take on the part of the equation inside the big brackets:

This expression uses two variables, n and theta0, so we'll model it as a Python function seriesTerm (the name should become clear soon), that takes two parameters, n and theta0. Naturally, this function will use the factorial function we just wrote. It will also have to use the function math.sin to compute the sine function (and for that you will have to "import math"). It will also have to use the ** operator for exponentiation. Also, note that sin²ⁿ(x) means the same thing as (sin(x))²ⁿ. Armed with this information, write the function seriesTerm(n, theta0). The file hw3-collab.py includes a few test cases for you, and while they are helpful for testing purposes, they are not very illustrative, so we omit them here.

At this point, we can replace the expression in the brackets with the function seriesTerm(n, theta0), so we have:

Next, we need to take on the sigma (the angular S-like (or perhaps E-like) symbol with the n=0 below and the infinity sign above). This sigma indicates that we should add up all the values of seriesTerm(n, theta0) where n goes from 0 to infinity. Now, it's hard to add an infinite number of terms in Python, so we'll just go up to n=k (inclusively), for some finite k. So we have:
seriesSum(k, theta0) = seriesTerm(0,theta0) + seriesTerm(1,theta0) + ... + seriesTerm(k,theta0)
Now this doesn't look so bad! Write the function seriesSum that takes an integer k (the largest value of n in the series) and the angle theta0, and returns the sum given above. As before, a test function is included in the file hw3-collab.py for this function.

So now we have:

And now you can take this all the way home. Write the function T(L,theta0) that computes the previous expression (use math.pi for the value of pi). Note that we need to choose a value for k -- how many terms to use from the seriesSum. For now, let's go with k=20. Again, a (modest) test function is included for this function.

And there you have it! You have modeled that complicated math expression as a Python function. Congratulations! And so now we ask a few questions about that function. Include your answers in the file hw3.doc (or whatever legal extension you prefer), which should be placed in the hw3.zip file you submit. For each question, provide both an answer to the question and a clear explanation as to how you used the T(L,theta0) Python function to answer the question.

First, confirm that the simple formula (the one that does not depend on the starting angle theta0) and our more complicated formula, T(L,theta0), return the same value when theta0 is 0, regardless of the length L. Try a few different L's to confirm. Be sure to use almostEqual and not == in your test, since we are dealing with floating-point numbers here, so some small error is acceptable and expected.
Looking at the formula for T, it appears that as theta0 increases, T increases. That is, as our starting angle increases, the actual period is longer than the period predicted by the simple formula. For a 1-meter pendulum, how many times longer is the actual period than the period predicted by the simple formula when theta0 is 5 degrees (which equals pi/36 radians)? (That is, what is the ratio of T(1,pi/36) to SF(1), where SF is the simple formula that only depends on L and not theta0?)
Repeat the previous question where theta0 is 90 degrees (which equals pi/2 radians, or when we start with the pendulum extended horizontally).
Repeat the previous question using a length L of 100 meters. Does the result depend on the length of the pendulum?
Do you think the predicted error in the simple formula is large enough that you could observe it in a real, physical pendulum? Why or why not? In any case, what starting angle between 0 and 90 degrees appears to give the best chance of observing the predicted error?
We somewhat arbitrarily chose a value of 20 for k. Try different values for k (the number of terms we used when computing the seriesSum), then suggest if 20 is a good value to use, or if you think there is a better value to use. Either way, briefly explain your answer. Why not use a value of, say, 1 million to ensure super-high accuracy?
Bonus/Optional (1 pt): Use your Python function to determine the angle at which the observed period will be 2% longer than the time predicted by the simple formula. Hint: you can use binary search for this. You know that a theta0 of 0 is too low, and as a hint, a theta0 of pi/2 is too high (though you should verify this). So keep dividing in half until you get fairly close to the answer (remember not to use == with floating-point numbers).
Bonus/Optional (2 pts): Construct an actual physical pendulum (we used some string and a light weight for ours). For credit, submit a jpg picture (no larger than 50k) in your hw3.zip file of you with your pendulum. Next, do the physical experiments necessary to try to confirm what your Python models predict. That is, start your pendulum at some small angle and measure its period. Does the simple formula predict the behavior? How about the more complicated formula? Then increase the theta0 to, say, 90 degrees (pi/2 radians, or starting at the horizontal), and answer those same questions.

Blackjack [10 pts]
Note: We'll wrap up this week with something a bit more lighthearted, but still important. Not only does blackjack stress loops and conditionals along with problem-solving (our topics of the week), but it's a complete (if simple) program. While most of what we write in this stage of the course are standalone functions, we should be moving towards writing entire applications that are reasonably usable. This is our first one. Have fun!

Background: Here we will write a simplified version of Blackjack, without most of the complexities (betting, splitting, doubling down, and so on). In our simple version, we start each round with a new shuffled deck. The dealer (the computer) is dealt one card and the player (the human) is dealt two cards. The player can see all cards dealt. The score of a hand (dealer's or player's) is the sum of the face values of the cards, where aces count as 1, except an ace counts as 11 if this would not make the hand "bust" (exceed 21). The round proceeds with the player repeatedly facing the choice of "hitting" (taking a card) or "standing" (ending this part of the round). If the player "busts" (scores over 21), the dealer wins the round. Otherwise, once the player stands, the dealer starts taking cards until either the dealer scores 17 or higher, or the dealer "busts" (scores over 21, in which case the player wins the round). If neither party busts and the round ends, then the hand with the higher score wins the round, with ties decided in the dealer's favor.

As for implementing blackjack, seeing as we have not yet covered lists, we have modeled a deck of cards using a single string of hyphen-separated cards, like so:
3H-4D-AH-7D-TH-KH-KC-QD-8S-QH-QC-4H-6H-9H-9C-7H-AS-8H-TD-6C-7S-4S-2S-TS-JH-5C-JD-8C-TC-6D-2H-AC-AD-2C-3D-3S-JS-JC-7C-3C-9D-QS-9S-KS-4C-5D-8D-6S-2D-5S-5H-KD
We have also provided you with the function getRandomDeck that takes no parameters and returns a new shuffled deck each time you call it. You are not responsible for how the getRandomDeck function works, but you should understand how to use it.

In fact, we have provided more than that for this problem. We have written most of the game already! Your task is to finish the writing the game, adding two key functions missing from our implementation.

Your task: First, write the function blackjackScore (in the file hw3-collab.py). This function takes a hand of cards (represented as a string, just like the deck), and returns the score of those cards in blackjack. As noted above, the score is the sum of the face values of the cards, but with special treatment for aces (which are 1 unless they can be 11 without busting). Here are some test cases for you:

    assert(blackjackScore("2H") == 2)
    assert(blackjackScore("TD") == 10)
    assert(blackjackScore("QC") == 10)
    assert(blackjackScore("AS") == 11)
    assert(blackjackScore("2H-5C-TD-4S") == 21)
    assert(blackjackScore("2H-5C-TD-4S-2C") == 23) # bust!
    assert(blackjackScore("JS-KD-AC") == 21)
    assert(blackjackScore("9S-AC") == 20)
    assert(blackjackScore("9S-AC-AD") == 21)
    assert(blackjackScore("9S-AC-AD-AS") == 12)

Next, study the function playBlackjackRound that we have provided for you (and which uses the blackjackScore function that you just wrote!). You should understand this function thoroughly for quiz3. For now, though, you mainly need to understand its high-level behavior: it plays one full round of blackjack, and returns True if the player (the human) wins and False otherwise (if the computer dealer wins). There is nothing to submit for this step, just study the code for a while.

Next, write the function playBlackjack. This function takes no parameters and returns nothing, but it's far from useless! It plays a game of blackjack by repeatedly calling the playBlackjackRound function that we have given you. It also keeps track of the wins and losses, and asks the user whether to keep playing or not. To specify the behavior of this function, we will provide a transcript of an actual game. Make your function match the behavior of this transcript (though, of course, the actual cards and player decisions will vary each time you play):

--------------------------------
Welcome to blackjack!
--------------------------------
Your turn...
Dealer is showing: AS
Your hand: 3H-8D
Your score: 11
Hit or Stand? h
Dealer is showing: AS
Your hand: 3H-8D-9D
Your score: 20
Hit or Stand? s
Now it's my turn...
Dealer draws: AD   (press Enter to continue)
   Dealer hand:   AS-AD
   Dealer score: 12
   Player score: 20
Dealer draws: 5S   (press Enter to continue)
   Dealer hand:   AS-AD-5S
   Dealer score: 17
   Player score: 20
You won this round!
Play again?   y
--------------------------------
Player rounds won:   1
Dealer rounds won:   0
Your turn...
Dealer is showing: 6S
Your hand: AD-5C
Your score: 16
Hit or Stand? h
Dealer is showing: 6S
Your hand: AD-5C-4H
Your score: 20
Hit or Stand? h
Dealer is showing: 6S
Your hand: AD-5C-4H-2H
Your score: 12
Hit or Stand? h
Dealer is showing: 6S
Your hand: AD-5C-4H-2H-3S
Your score: 15
Hit or Stand? h
Dealer is showing: 6S
Your hand: AD-5C-4H-2H-3S-8S
Your score: 23
You bust! (your score is over 21)
Play again?   y
--------------------------------
Player rounds won:   1
Dealer rounds won:   1
Your turn...
Dealer is showing: 4D
Your hand: KD-5D
Your score: 15
Hit or Stand? s
Now it's my turn...
Dealer draws: 7C   (press Enter to continue)
   Dealer hand:   4D-7C
   Dealer score: 11
   Player score: 15
Dealer draws: KS   (press Enter to continue)
   Dealer hand:   4D-7C-KS
   Dealer score: 21
   Player score: 15
I have a higher score. I win this round!
Play again?   n
Goodbye!

Readings in Computational Thinking [20 pts]

Watson [10 pts]
Watch this short video on Watson, the impressive Jeopardy-playing program written by IBM (in collaboration with CMU!) that has been garnering much press of late. At the end of the video, there is a reference to how this is not just about game-playing, but how it could revolutionize certain aspects of business. Briefly reflect on this point. Is this a game-changer? Will this radically change your life (maybe not when it is playing Jeopardy, but when it is applied in other ways)? If so, how (and if not, why not)? And even if it won't be a radical game-changer, will it be a money-maker? That is, will it make some business processes (whether for consumers or not) sufficiently more efficient to justify the expense of this approach? Speculate, but explain your reasoning.
Human Genome Sequencing and String Matching [10 pts]
Watch this short video on human genome sequencing. Briefly explain how string matching, like we did in this hw (only with a bit more sophistication), played a critical role in sequencing the human genome.

Bonus/Optional: Week #3 Bonus/Honors Topic [5 pts]
See the Hw3 bonus writeup.