15-112 Fall 2011 Homework 3
Due Thursday, 29-Sep, at 10pm
Note: Due to the one-day extension (to Thursday), there is no additional 24-hour late period. The last time to submit hw3 is on Thursday at 10pm.

Read these instructions first!

With the exception of the contest1-related bonus, which is explained below, this hw is SOLO. See hw1 and the syllabus for details.
Except as noted below, follow the steps in hw1, except (of course) replace hw1 with hw3.
You will not be provided with a starter function nor with a public autograder for this assignment. Place your solutions in a file named hw3.py .
As we discussed in class this week, write your own well-chosen helper functions, and also write test functions for all your functions.
You may now use loops and conditionals and strings, including using import string (at the top of your file), but you still may not use the other constructs we've not yet covered.
In particular, you may also use the string.splitlines method (even though this technically returns a list, we only use that list implicitly in a for loop, so it's ok here).
You may only submit 6 times total!

encodeText
decodeText
crackCode
hasBalancedParentheses
justifyText
bonus/optional: findFirstRecursiveFunctionCall
bonus/optional: contest1

encodeText [20 pts]
Background: A Caesar's Cipher is a simple way to encrypt text (and is so-called because a version of it was in fact used by Caesar and his generals). To encode some text, you are given a shift (a positive integer), and you shift each letter in turn by that amount. For example, shifting "A" by 1 produces "B", and shifting "A" by 3 produces "D". Shifting wraps around, so shifting "Y" by 1 produces "Z", but shifting "Y" by 2 produces "A". Also, shifting is case-sensitive, so shifting "y" by 2 produces "a". Non-letter characters are left intact (not shifted). In this way, shifting the plaintext "We attack at dawn!" by 5 produces the ciphertext "Bj fyyfhp fy ifbs!".

With this in mind, write the function encodeText, that takes a string of plaintext and a non-negative integer shift, and returns the ciphertext resulting from encrypting the given plaintext with the given shift as just described.
decodeText [20 pts]
Write the function decodeText, that takes a string of ciphertext encoded as just described, and a non-negative integer shift, and returns the original plaintext that was used to create the given ciphertext. Thus, decodeText("Bj fyyfhp fy ifbs!", 5) returns "We attack at dawn!".
crackCode [20 pts]
It turns out that the Caesar's cipher you just wrote is very easy to crack. One way is to just try every shift, and for each one look up every word in a dictionary, figuring the shift that produces the most real words is in fact the real shift. Generally, only the correct shift will produce much more than gobbledygook (a technical term), so this approach works very well in practice (well enough to make the Caesar cipher of little more than historical value, except for its common use in introductory programming courses!).

Fascinating! But here we ask: can we crack the code without even using a dictionary? In fact, it is such a weak code that yes, we can (at least in many cases). How? Here is one way: in ordinary English text, different letters have different frequencies. For example, "E" (the most common letter) has a frequency of 12.702% (that is, 12.702% of all letters are the letter "E" or "e" (we are case-insensitive)). Similarly, "L" has a frequency of 4.025%, and "K" has a frequency of 0.772%. Now, given some ciphertext, we will consider each of the 26 possible shifts in turn. For each shift, find the decoded plaintext and count the frequencies of "E", "L", and "K". We will assume that the real shift is the one where these three frequencies best match their usual frequencies noted above.

How do we decide how good a match any given shift's frequencies are? For each shift, we'll use the chi-squared test, which measures how good a fit multiple observed values are compared to their expected values. Here is the formula:
     chiSquared = sum [ (observed_i-expected_i)² / expected_i ]

Restating this, in terms of E, L, and K:
     chiSquared(shift) = [(observed_"E"-expected_"E")² / expected_"E"] +
                          [(observed_"L"-expected_"L")² / expected_"L"] +
                          [(observed_"K"-expected_"K")² / expected_"K"]

And substituting the expected values listed above, we have:
     chiSquared(shift) = [(observed_"E"- 12.702%)² / 12.702%] +
                          [(observed_"L"- 4.025%)² / 4.025%] +
                          [(observed_"K"- 0.772%)² / 0.772%]

How do we find the observed frequencies? For "E", count the total number of uppercase or lowercase E's, and divide this by the total number of uppercase or lowercase letters (ignoring all non-letters). Do this for "E", "L", and "K" for each shift, find the chiSquared value for each shift, and return the shift that has the smallest such chiSquared value (and hence is the best fit by this metric).

Finally: write the function crackCode that takes a string that has been encoded by a Caesar cipher, and applies the technique just described to try and crack the code. Your function should return the shift that it predicts was used to encode the given ciphertext.

Does this work? For short strings of ciphertext, no. But for even moderately long strings, yes. Here is a sample test function which shows the test working on every possible shift value for a paragraph-sized chunk of ciphertext:
```
def testCrackCode():
    print "testing crackCode()...",
    plaintext = (
"""
CHAPTER I. Down the Rabbit-Hole
Alice was beginning to get very tired of sitting by her sister on the bank,
and of having nothing to do: once or twice she had peeped into the book her
sister was reading, but it had no pictures or conversations in it, 'and what
is the use of a book,' thought Alice 'without pictures or conversation?'
So she was considering in her own mind (as well as she could, for the hot day
made her feel very sleepy and stupid), whether the pleasure of making a
daisy-chain would be worth the trouble of getting up and picking the daisies,
when suddenly a White Rabbit with pink eyes ran close by her.
""")
    for shift in xrange(26):
        ciphertext = encodeText(plaintext, shift)
        crackedShift = crackCode(ciphertext)
        assert(shift == crackedShift)
    print "passed!"
```
hasBalancedParentheses [20 pts]
It is possible to write a function that takes a string containing Python code and does some sort of analysis on it. For example, this function is a simplistic way to count the number of functions defined in a piece of code:
```
def countDefs(code):
    defs = 0
    for line in code.splitlines():
        if line.startswith("def"):
            # found another one
            defs += 1
    return defs

code = """\
# this is sample code to submit to our countDefs function
def f(x):
    return x/5

def g(x):
    if (x > 0):
        return 5
    else:
        return 6

def h(x):
    return "wow"

for n in xrange(50):
    print f(n), g(n), h(n)
"""

print countDefs(code)  # prints 3 (f, g, and h)
```
In a similar fashion, write the function hasBalancedParentheses, which takes a string of Python code and returns True if that code has balanced parentheses and False otherwise. We say that parentheses are balanced in a block of text if each right parenthesis closes (matches) an open (unmatched) left parenthesis, and no left parentheses are left unclosed (unmatched) at the end of the text. So, for example, "( ( ( ) ( ) ) ( ) )" is balanced, but "( ) )" is not balanced, and "( ) ) (" is also not balanced.

To keep things simple, we will ignore the fact that parentheses in Python cannot span different function definitions. In fact we'll pretty much ignore that the code is Python and just focus on the parentheses, but with two exceptions: comments and strings. Do not count parentheses that occur in comments (that is, after a hash mark (#) on a line). Also, do not count parentheses that occur in strings -- whether those strings start with single-quote ('), double-quote ("), or three-double-quotes ("""). You also must deal with escaped quotes that do not close strings. For example, the string "ab\"(c" contains a left parenthesis that should not be counted by your function.

Here are some specifics to keep in mind:
- Do not worry about syntax errors.
- Thus, for example, you may assume all non-triple-quoted strings end on the line they began on.
- Do not worry about backslashes at ends of lines (which technically continue the line on the next line)
- In general, match parentheses across newlines, even if this would accept some syntactically incorrect Python. Remember, we are assuming the Python is syntactically correct!
- Do not match braces { }, brackets [ ], or other delimiters. Just parentheses.
- Here is how you do and do not include escaped strings in code samples you might provide to your own test function:
```
wrongCode = """\
print "This code does not escape the double-quote:  \"  (Oh no!)"
"""

print "wrongCode=",wrongCode

rightCode = """\
print "This code does escape the double-quote:  \\"  Whew!"
"""

print "rightCode=",rightCode

print "Now we'll watch the wrongCode fail and the rightCode work."

try:
    print "\n****** running wrongCode *****\n"
    exec(wrongCode)
except Exception as err:
    print "Oh dear, the wrong code failed!"
    print "  Error:", err

try:
    print "\n****** running rightCode *****\n"
    exec(rightCode)
except Exception as err:
    print "Gadzooks, the 'right' code failed, too!"
 
```
justifyText [20 pts]
Write the function justifyText that takes a string (which may be multi-line, or just one potentially very long line) and a width (which you may assume is a positive integer), and returns that same text as a multi-line string that is left- and right- justified to the given line width. For example, consider the following text:
```
text = """\
We hold these truths to be self-evident:  that all men are created equal;
that they are endowed by their Creator with certain unalienable rights;
that among these are life, liberty, and the pursuit of happiness."""
```
WIth this text, a call to justifyText(text, 30) would return this text left- and right-justified with 30 characters per line, as such:
```
"""\
We  hold  these  truths  to be
self-evident: that all men are
created  equal;  that they are
endowed  by their Creator with
certain   unalienable  rights;
that  among  these  are  life,
liberty,  and  the  pursuit of
happiness."""
```
Here are some specifics to keep in mind:
- You should first replace all sequences of any length of whitespace (spaces, tabs, newlines) with a single space.
- Then, for each line except the last line, find the space as far to the right as possible while still allowing the line to remain under the given width restriction. That space will be replaced with a newline ("\n").
- What if a line has no such space, and instead it is solid non-spaces? This is unlikely, but we still have to deal with it. In this case, just insert a newline at the given line width. Do not add a hyphen in this case.
- Now, unless the line is a perfect fit in the given width (as in the second line above ), you will have to add some extra spaces to make it fit exactly. Consider the first line in the example above ("We hold these truths to be"). There are 5 total spaces (one between each word), and since the line is 26 characters long, it requires 4 more spaces to pad out to a width of 30. This is done by adding an extra space to each space from left-to-right. In this case the first 4 spaces are now 2 spaces, and the last space is still one space. This process works in general, so you may have to add more than one extra space of padding to each gap (such as the 3 total spaces between "certain" and "unalienable"), and there cannot be more than a 1-space difference between any two gaps on a line, and the extra spaces must all be on the left side of the line.
- The last row is not justified.
bonus/optional: findFirstRecursiveFunctionCall [2.5 pts]
Write the function findFirstRecursiveFunctionCall that takes a string of Python code (similar to the previous problem), and returns the name of the first occurrence in the code of a recursive function -- that is, a function that calls itself. If there are no recursive functions, your function should return the value None. You are responsible for properly handling comments and strings. You are not responsible for mutually-recursive functions (for example, a function f that calls g, which in turn calls f). And don't worry about unnamed lambda functions or other inobvious cases (such as functions as parameters). If you don't understand that previous sentence, just understand the first few words ("don't worry").
bonus/optional: contest1 [5 pts -- 1/2 point per problem]
Students may complete the problems from our first programming contest (contest1) for bonus on hw3. Do not submit this with hw3, however. Instead, submit directly with contest1. Also, on this part and only this part of the assignment, you may work in teams of 2 rather than SOLO. You may not change teams, though, so choose partners wisely. If you choose to work on a team, each student must make their own submission, though it can be of the same file, and the file must indicate at the top the names of both students. These are unusual rules, especially for homework assignments, so be sure to understand they apply exclusively to the contest1 problems for bonus and definitely not in general. If you actually participated in contest1, you may reuse your code from there (though now you have to clean it up, since style does count in homeworks, even in bonus!). Also, if you were on a team in contest1, you may continue working as a team on this bonus, or as individuals, but not as a team with anyone other than who you started with.

carpe diem - carpe diem - carpe diem - carpe diem - carpe diem - carpe diem - carpe diem - carpe diem - carpe diem