- encodeText
[20 pts]
Background: A Caesar's Cipher is a simple way to encrypt text (and is
so-called because a version of it was in fact used by Caesar and his generals).
To encode some text, you are given a shift (a positive integer), and you shift
each letter in turn by that amount. For example, shifting "A" by 1
produces "B", and shifting "A" by 3 produces "D". Shifting wraps around,
so shifting "Y" by 1 produces "Z", but shifting "Y" by 2 produces "A".
Also, shifting is case-sensitive, so shifting "y" by 2 produces "a".
Non-letter characters are left intact (not shifted). In this way, shifting
the plaintext "We attack at dawn!" by 5 produces the ciphertext "Bj fyyfhp fy
ifbs!".
With this in mind, write the function encodeText, that takes a string of
plaintext and a non-negative integer shift, and returns the ciphertext resulting
from encrypting the given plaintext with the given shift as just described.
- decodeText
[20 pts]
Write the function decodeText, that takes a string of ciphertext encoded as just
described, and a non-negative integer shift, and returns the original plaintext
that was used to create the given ciphertext. Thus, decodeText("Bj fyyfhp
fy ifbs!", 5) returns "We attack at dawn!".
- crackCode
[20 pts]
It turns out that the Caesar's cipher you just wrote is very easy to crack.
One way is to just try every shift, and for each one look up every word in a
dictionary, figuring the shift that produces the most real words is in
fact the real shift. Generally, only the correct shift will produce much
more than gobbledygook (a technical term), so this approach works very well in
practice (well enough to make the Caesar cipher of little more than historical
value, except for its common use in introductory programming courses!).
Fascinating! But here we ask: can we crack the code without even
using a dictionary? In fact, it is such a weak code that yes, we can (at
least in many cases). How? Here is one way: in ordinary
English text, different letters have different frequencies. For example,
"E" (the most common letter) has a frequency of 12.702% (that is, 12.702% of all
letters are the letter "E" or "e" (we are case-insensitive)). Similarly,
"L" has a frequency of 4.025%, and "K" has a frequency of 0.772%. Now,
given some ciphertext, we will consider each of the 26 possible shifts in turn.
For each shift, find the decoded plaintext and count the frequencies of "E",
"L", and "K". We will assume that the real shift is the one where these
three frequencies best match their usual frequencies noted above.
How do we decide how good a match any given shift's frequencies are? For
each shift, we'll use the
chi-squared test, which measures how good a fit multiple observed values
are compared to their expected values. Here is the formula:
chiSquared =
sum [ (observedi-expectedi)2 / expectedi
]
Restating this, in terms of E, L, and K:
chiSquared(shift) = [(observed"E"-expected"E")2
/ expected"E"] +
[(observed"L"-expected"L")2 / expected"L"]
+
[(observed"K"-expected"K")2 / expected"K"]
And substituting the expected values listed above, we have:
chiSquared(shift) = [(observed"E" - 12.702%)2 /
12.702%] +
[(observed"L" - 4.025%)2 / 4.025%] +
[(observed"K" - 0.772%)2 / 0.772%]
How do we find the observed frequencies? For "E", count the total number
of uppercase or lowercase E's, and divide this by the total number of uppercase
or lowercase letters (ignoring all non-letters). Do this for "E", "L", and
"K" for each shift, find the chiSquared value for each shift, and return the
shift that has the smallest such chiSquared value (and hence is the best fit by
this metric).
Finally: write the function crackCode that takes a string that has been
encoded by a Caesar cipher, and applies the technique just described to try and
crack the code. Your function should return the shift that it predicts was
used to encode the given ciphertext.
Does this work? For short strings of ciphertext, no. But for even
moderately long strings, yes. Here is a sample test function which shows
the test working on every possible shift value for a paragraph-sized chunk of
ciphertext:def testCrackCode():
print "testing crackCode()...",
plaintext = (
"""
CHAPTER I. Down the Rabbit-Hole
Alice was beginning to get very tired of sitting by her sister on the bank,
and of having nothing to do: once or twice she had peeped into the book her
sister was reading, but it had no pictures or conversations in it, 'and what
is the use of a book,' thought Alice 'without pictures or conversation?'
So she was considering in her own mind (as well as she could, for the hot day
made her feel very sleepy and stupid), whether the pleasure of making a
daisy-chain would be worth the trouble of getting up and picking the daisies,
when suddenly a White Rabbit with pink eyes ran close by her.
""")
for shift in xrange(26):
ciphertext = encodeText(plaintext, shift)
crackedShift = crackCode(ciphertext)
assert(shift == crackedShift)
print "passed!"
- hasBalancedParentheses [20
pts]
It is possible to write a function that takes a string containing Python code
and does some sort of analysis on it. For example, this function is a
simplistic way to count the number of functions defined in a piece of code:def countDefs(code):
defs = 0
for line in code.splitlines():
if line.startswith("def"):
# found another one
defs += 1
return defs
code = """\
# this is sample code to submit to our countDefs function
def f(x):
return x/5
def g(x):
if (x > 0):
return 5
else:
return 6
def h(x):
return "wow"
for n in xrange(50):
print f(n), g(n), h(n)
"""
print countDefs(code) # prints 3 (f, g, and h)
In a similar fashion, write the function
hasBalancedParentheses, which takes a string of Python code and returns True if
that code has balanced parentheses and False otherwise. We say that
parentheses are balanced in a block of text if each right parenthesis closes
(matches) an open (unmatched) left parenthesis, and no left parentheses are left
unclosed (unmatched) at the end of the text. So, for example, "( ( ( ) ( )
) ( ) )" is balanced, but "( ) )" is not balanced, and "( ) ) (" is also not
balanced.
To keep things simple, we will ignore the fact that parentheses in Python cannot
span different function definitions. In fact we'll pretty much ignore that
the code is Python and just focus on the parentheses, but with two exceptions:
comments and strings. Do not count parentheses that occur in comments
(that is, after a hash mark (#) on a line). Also, do not count parentheses
that occur in strings -- whether those strings start with single-quote ('),
double-quote ("), or three-double-quotes ("""). You also must deal with
escaped quotes that do not close strings. For example, the string "ab\"(c"
contains a left parenthesis that should not be counted by your function.
Here are some specifics to keep in mind:
- Do not worry about syntax errors.
- Thus, for example, you may assume all
non-triple-quoted strings end on the line they began on.
- Do not worry about backslashes at ends of lines
(which technically continue the line on the next line)
- In general, match parentheses across newlines, even if this would accept
some syntactically incorrect Python. Remember, we are
assuming the Python is syntactically correct!
- Do not match braces { }, brackets [ ], or other
delimiters. Just parentheses.
- Here is how you do and do not include escaped
strings in code samples you might provide to your own test function:
wrongCode = """\
print "This code does not escape the double-quote: \" (Oh no!)"
"""
print "wrongCode=",wrongCode
rightCode = """\
print "This code does escape the double-quote: \\" Whew!"
"""
print "rightCode=",rightCode
print "Now we'll watch the wrongCode fail and the rightCode work."
try:
print "\n****** running wrongCode *****\n"
exec(wrongCode)
except Exception as err:
print "Oh dear, the wrong code failed!"
print " Error:", err
try:
print "\n****** running rightCode *****\n"
exec(rightCode)
except Exception as err:
print "Gadzooks, the 'right' code failed, too!"
- justifyText [20
pts]
Write the function justifyText that takes a string (which may be multi-line, or
just one potentially very long line) and a width (which you may assume is
a positive integer), and returns that same text as a multi-line string that is
left- and right- justified to the given line width. For example, consider
the following text:text = """\
We hold these truths to be self-evident: that all men are created equal;
that they are endowed by their Creator with certain unalienable rights;
that among these are life, liberty, and the pursuit of happiness."""
WIth this text, a call to justifyText(text, 30) would
return this text left- and right-justified with 30 characters per line, as such:
"""\
We hold these truths to be
self-evident: that all men are
created equal; that they are
endowed by their Creator with
certain unalienable rights;
that among these are life,
liberty, and the pursuit of
happiness."""
Here are some specifics to keep in mind:
- You should first replace all sequences of any length of whitespace
(spaces, tabs, newlines) with a single space.
- Then, for each line except the last line, find the space as far to the
right as possible while still allowing the line to remain under the given
width restriction. That space will be replaced with a newline ("\n").
- What if a line has no such space, and instead it is solid non-spaces?
This is unlikely, but we still have to deal with it. In this case,
just insert a newline at the given line width. Do not add a hyphen in
this case.
- Now, unless the line is a perfect fit in the given width (as in the
second line above ), you will have
to add some extra spaces to make it fit exactly. Consider the first
line in the example above ("We hold these truths to be"). There are 5
total spaces (one between each word), and since the line is 26 characters
long, it requires 4 more spaces to pad out to a width of 30. This is
done by adding an extra space to each space from left-to-right. In
this case the first 4 spaces are now 2 spaces, and the last space is still
one space. This process works in general, so you may have to add more
than one extra space of padding to each gap (such as the 3 total spaces
between "certain" and "unalienable"), and there cannot be more than a
1-space difference between any two gaps on a line, and the extra spaces must all be on the
left side of the line.
- The last row is not justified.
-
bonus/optional: findFirstRecursiveFunctionCall
[2.5 pts]
Write the function findFirstRecursiveFunctionCall that takes a string of Python
code (similar to the previous problem), and returns the name of the first
occurrence in the code of a recursive function -- that is, a function that calls
itself. If there are no recursive functions, your function should return
the value None. You are responsible for properly handling comments and
strings. You are not responsible for mutually-recursive functions (for
example, a function f that calls g, which in turn calls f). And don't
worry about unnamed lambda functions or other inobvious cases (such as functions
as parameters). If you don't understand that previous sentence, just
understand the first few words ("don't worry").
- bonus/optional:
contest1 [5 pts -- 1/2 point per
problem]
Students may complete the problems from our first programming contest (contest1) for bonus
on hw3. Do not submit this with hw3, however. Instead, submit
directly with contest1. Also, on this part and only this part of the
assignment, you may work in teams of 2 rather than SOLO. You may not
change teams, though, so choose partners wisely. If you choose to work on
a team, each student must make their own submission, though it can be of the
same file, and the file must indicate at the top the names of both students.
These are unusual rules, especially for homework assignments, so be sure to
understand they apply exclusively to the contest1 problems for bonus and
definitely not in general. If you actually participated in contest1, you
may reuse your code from there (though now you have to clean it up, since style
does count in homeworks, even in bonus!). Also, if you were on a
team in contest1, you may continue working as a team on this bonus, or as
individuals, but not as a team with anyone other than who you started with.