15-112 Fall 2011 Homework 3
Due Thursday, 29-Sep, at 10pm
Note:  Due to the one-day extension (to Thursday), there is no additional 24-hour late period.  The last time to submit hw3 is on Thursday at 10pm.


Read these instructions first!
  1. encodeText
  2. decodeText
  3. crackCode
  4. hasBalancedParentheses
  5. justifyText
  6. bonus/optional: findFirstRecursiveFunctionCall
  7. bonus/optional: contest1

  1. encodeText  [20 pts]
    Background:  A Caesar's Cipher is a simple way to encrypt text (and is so-called because a version of it was in fact used by Caesar and his generals).  To encode some text, you are given a shift (a positive integer), and you shift each letter in turn by that amount.  For example, shifting "A" by 1 produces "B", and shifting "A" by 3 produces "D".  Shifting wraps around, so shifting "Y" by 1 produces "Z", but shifting "Y" by 2 produces "A".  Also, shifting is case-sensitive, so shifting "y" by 2 produces "a".  Non-letter characters are left intact (not shifted).  In this way, shifting the plaintext "We attack at dawn!" by 5 produces the ciphertext "Bj fyyfhp fy ifbs!".

    With this in mind, write the function encodeText, that takes a string of plaintext and a non-negative integer shift, and returns the ciphertext resulting from encrypting the given plaintext with the given shift as just described.
     
  2. decodeText  [20 pts]
    Write the function decodeText, that takes a string of ciphertext encoded as just described, and a non-negative integer shift, and returns the original plaintext that was used to create the given ciphertext.  Thus, decodeText("Bj fyyfhp fy ifbs!", 5) returns "We attack at dawn!".
     
  3. crackCode  [20 pts]
    It turns out that the Caesar's cipher you just wrote is very easy to crack.  One way is to just try every shift, and for each one look up every word in a dictionary, figuring the shift that produces the most real words is in fact the real shift.  Generally, only the correct shift will produce much more than gobbledygook (a technical term), so this approach works very well in practice (well enough to make the Caesar cipher of little more than historical value, except for its common use in introductory programming courses!).

    Fascinating!  But here we ask:  can we crack the code without even using a dictionary?  In fact, it is such a weak code that yes, we can (at least in many cases).  How?  Here is one way:  in ordinary English text, different letters have different frequencies.  For example, "E" (the most common letter) has a frequency of 12.702% (that is, 12.702% of all letters are the letter "E" or "e" (we are case-insensitive)).  Similarly, "L" has a frequency of 4.025%, and "K" has a frequency of 0.772%.  Now, given some ciphertext, we will consider each of the 26 possible shifts in turn.  For each shift, find the decoded plaintext and count the frequencies of "E", "L", and "K".  We will assume that the real shift is the one where these three frequencies best match their usual frequencies noted above.

    How do we decide how good a match any given shift's frequencies are?  For each shift, we'll use the chi-squared test, which measures how good a fit multiple observed values are compared to their expected values.  Here is the formula:
         chiSquared =  sum [ (observedi-expectedi)2 / expectedi ]

    Restating this, in terms of E, L, and K:
         chiSquared(shift) =  [(observed"E"-expected"E")2 / expected"E"] +
                              [(observed"L"-expected"L")2 / expected"L"] +
                              [(observed"K"-expected"K")2 / expected"K"]


    And substituting the expected values listed above, we have:
         chiSquared(shift) =  [(observed"E" - 12.702%)2 / 12.702%] +
                              [(observed"L" -  4.025%)2 /  4.025%] +
                              [(observed"K" -  0.772%)2 /  0.772%]

    How do we find the observed frequencies?  For "E", count the total number of uppercase or lowercase E's, and divide this by the total number of uppercase or lowercase letters (ignoring all non-letters).  Do this for "E", "L", and "K" for each shift, find the chiSquared value for each shift, and return the shift that has the smallest such chiSquared value (and hence is the best fit by this metric).

    Finally:  write the function crackCode that takes a string that has been encoded by a Caesar cipher, and applies the technique just described to try and crack the code.  Your function should return the shift that it predicts was used to encode the given ciphertext.

    Does this work?  For short strings of ciphertext, no.  But for even moderately long strings, yes.  Here is a sample test function which shows the test working on every possible shift value for a paragraph-sized chunk of ciphertext:
    def testCrackCode():
        print "testing crackCode()...",
        plaintext = (
    """
    CHAPTER I. Down the Rabbit-Hole
    Alice was beginning to get very tired of sitting by her sister on the bank,
    and of having nothing to do: once or twice she had peeped into the book her
    sister was reading, but it had no pictures or conversations in it, 'and what
    is the use of a book,' thought Alice 'without pictures or conversation?'
    So she was considering in her own mind (as well as she could, for the hot day
    made her feel very sleepy and stupid), whether the pleasure of making a
    daisy-chain would be worth the trouble of getting up and picking the daisies,
    when suddenly a White Rabbit with pink eyes ran close by her.
    """)
        for shift in xrange(26):
            ciphertext = encodeText(plaintext, shift)
            crackedShift = crackCode(ciphertext)
            assert(shift == crackedShift)
        print "passed!"
  4. hasBalancedParentheses  [20 pts]
    It is possible to write a function that takes a string containing Python code and does some sort of analysis on it.  For example, this function is a simplistic way to count the number of functions defined in a piece of code:
    def countDefs(code):
        defs = 0
        for line in code.splitlines():
            if line.startswith("def"):
                # found another one
                defs += 1
        return defs
    
    code = """\
    # this is sample code to submit to our countDefs function
    def f(x):
        return x/5
    
    def g(x):
        if (x > 0):
            return 5
        else:
            return 6
    
    def h(x):
        return "wow"
    
    for n in xrange(50):
        print f(n), g(n), h(n)
    """
    
    print countDefs(code)  # prints 3 (f, g, and h)

    In a similar fashion, write the function hasBalancedParentheses, which takes a string of Python code and returns True if that code has balanced parentheses and False otherwise.  We say that parentheses are balanced in a block of text if each right parenthesis closes (matches) an open (unmatched) left parenthesis, and no left parentheses are left unclosed (unmatched) at the end of the text.  So, for example, "( ( ( ) ( ) ) ( ) )" is balanced, but "( ) )" is not balanced, and "( ) ) (" is also not balanced.

    To keep things simple, we will ignore the fact that parentheses in Python cannot span different function definitions.  In fact we'll pretty much ignore that the code is Python and just focus on the parentheses, but with two exceptions:  comments and strings.  Do not count parentheses that occur in comments (that is, after a hash mark (#) on a line).  Also, do not count parentheses that occur in strings -- whether those strings start with single-quote ('), double-quote ("), or three-double-quotes (""").  You also must deal with escaped quotes that do not close strings.  For example, the string "ab\"(c" contains a left parenthesis that should not be counted by your function.

    Here are some specifics to keep in mind:

  5. justifyText  [20 pts]
    Write the function justifyText that takes a string (which may be multi-line, or just one potentially very long line) and a width (which you may assume is a positive integer), and returns that same text as a multi-line string that is left- and right- justified to the given line width.  For example, consider the following text:
    text = """\
    We hold these truths to be self-evident:  that all men are created equal;
    that they are endowed by their Creator with certain unalienable rights;
    that among these are life, liberty, and the pursuit of happiness."""

    WIth this text, a call to justifyText(text, 30) would return this text left- and right-justified with 30 characters per line, as such:

    """\
    We  hold  these  truths  to be
    self-evident: that all men are
    created  equal;  that they are
    endowed  by their Creator with
    certain   unalienable  rights;
    that  among  these  are  life,
    liberty,  and  the  pursuit of
    happiness."""
    Here are some specifics to keep in mind:
  6. bonus/optional: findFirstRecursiveFunctionCall  [2.5 pts]
    Write the function findFirstRecursiveFunctionCall that takes a string of Python code (similar to the previous problem), and returns the name of the first occurrence in the code of a recursive function -- that is, a function that calls itself.  If there are no recursive functions, your function should return the value None.  You are responsible for properly handling comments and strings.  You are not responsible for mutually-recursive functions (for example, a function f that calls g, which in turn calls f).  And don't worry about unnamed lambda functions or other inobvious cases (such as functions as parameters).  If you don't understand that previous sentence, just understand the first few words ("don't worry").
     
  7. bonus/optional: contest1  [5 pts -- 1/2 point per problem]
    Students may complete the problems from our first programming contest (contest1) for bonus on hw3.  Do not submit this with hw3, however.  Instead, submit directly with contest1.  Also, on this part and only this part of the assignment, you may work in teams of 2 rather than SOLO.  You may not change teams, though, so choose partners wisely.  If you choose to work on a team, each student must make their own submission, though it can be of the same file, and the file must indicate at the top the names of both students.  These are unusual rules, especially for homework assignments, so be sure to understand they apply exclusively to the contest1 problems for bonus and definitely not in general.  If you actually participated in contest1, you may reuse your code from there (though now you have to clean it up, since style does count in homeworks, even in bonus!).  Also, if you were on a team in contest1, you may continue working as a team on this bonus, or as individuals, but not as a team with anyone other than who you started with.

carpe diem   -   carpe diem   -   carpe diem   -   carpe diem   -   carpe diem   -   carpe diem   -   carpe diem   -   carpe diem   -   carpe diem