15-112 Fall 2015 Homework 4
Due Monday, 28-Sep, at 10pm
No late or grace days this week

Read these instructions first!
  1. lookAndSay(a) [15 pts]
    First, read about look-and-say numbers here. Then, write the function lookAndSay(a) that takes a list of numbers and returns a list of numbers that results from "reading off" the initial list using the look-and-say method, using tuples for each (count, value) pair. For example:
      lookAndSay([]) == []
      lookAndSay([1,1,1]) == [(3,1)]
      lookAndSay([-1,2,7]) == [(1,-1),(1,2),(1,7)]
      lookAndSay([3,3,8,-10,-10,-10]) == [(2,3),(1,8),(3,-10)]
    

  2. inverseLookAndSay(a) [15 pts]
    Write the function inverseLookAndSay(a) that does the inverse of the previous problem, so that, in general:
      inverseLookAndSay(lookAndSay(a)) == a
    
    Or, in particular:
      inverseLookAndSay([(2,3),(1,8),(3,-10)]) == [3,3,8,-10,-10,-10]
    

  3. solvesCryptarithm(puzzle, solution) [20 pts]
    Background: a cryptarithm is a puzzle where we start with a simple arithmetic statement but then we replace all the digits with letters (where the same digit is replaced by the same letter each time). We will limit such puzzles to strings the form "A+B=C" (no spaces), where A, B, and C are positive integers. For example, "SEND+MORE=MONEY" is such a puzzle. The goal of the puzzle is to find an assignment of digits to the letters to make the math work out properly. For example, if we assign 0 to "O", 1 to "M", 2 to "Y", 5 to "E", 6 to "N", 7 to "D", 8 to "R", and 9 to "S" we get:
        S E N D       9 5 6 7
      + M O R E     + 1 0 8 5
      ---------     ---------
      M O N E Y     1 0 6 5 2
    
    And so we see that this assignment does in fact solve the problem! Now, we need a way to encode a possible solution. For that, we will use a single string where the index of the letter corresponds to the digit it represents. Thus, the string "OMY--ENDRS" represents the assignments listed above (the dashes are for unassigned digits).

    With this in mind, write the function solvesCryptarithm(puzzle, solution) that takes two strings, a puzzle (such as "SEND+MORE=MONEY") and a proposed solution (such as "OMY--ENDRS"). Your function should return True if substituting the digits from the solution back into the puzzle results in a mathematically correct addition problem, and False otherwise. You do not have to check whether a letter occurs more than once in the proposed solution, but you do have to verify that all the letters in the puzzle occur somewhere in the solution (of course). You may not use the eval() function. Also, you almost surely will want to write at least one well-chosen helper function.

  4. bestScrabbleScore(dictionary, letterScores, hand) [20 pts]
    Background: In a Scrabble-like game, players each have a hand, which is a list of lowercase letters. There is also a dictionary, which is a list of legal words (all in lowercase letters). And there is a list of letterScores, which is length 26, where letterScores[i] contains the point value for the ith character in the alphabet (so letterScores[0] contains the point value for 'a'). Players can use some or all of the tiles in their hand and arrange them in any order to form words. The point value for a word is 0 if it is not in the dictionary, otherwise it is the sum of the point values of each letter in the word, according to the letterScores list (pretty much as it works in actual Scrabble).

    In case you are interested, here is a list of the actual letterScores for Scrabble:
       letterScores = [
       #  a, b, c, d, e, f, g, h, i, j, k, l, m
          1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
       #  n, o, p, q, r, s, t, u, v, w, x, y, z
          1, 1, 3,10, 1, 1, 1, 1, 4, 4, 8, 4,10
       ]
    
    Note that your function must work for any list of letterScores as is provided by the caller.

    With this in mind, write the function bestScrabbleScore(dictionary, letterScores, hand) that takes 3 lists -- dictionary (a list of lowercase words), letterScores (a list of 26 integers), and hand (a list of lowercase characters) -- and returns a tuple of the highest-scoring word in the dictionary that can be formed by some arrangement of some subset of letters in the hand, followed by its score. In the case of a tie, the first element of the tuple should instead be a list of all such words in the order they appear in the dictionary. If no such words exist, return None.

    Note: there is no fixed dictionary here. Each time we call the function, we may provide a different dictionary! It may contain 100 words or perhaps 100,00 words.

  5. runSimpleProgram(program, args) [30 pts]
    First, carefully watch this video that describes this problem:

    Then, write the function runSimpleProgram(program, args) that works as described in the video, taking a legal program (do not worry about syntax or runtime errors, as we will not test for those cases) and runs it with the given args, and returns the result.

    Here are the legal expressions in this language:

    • [Non-negative Integer]
      Any non-negative integer, such as 0 or 123, is a legal expression.

    • A[N]
      The letter A followed by a non-negative integer, such as A0 or A123, is a legal expression, and refers to the given argument. A0 is the value at index 0 of the supplied args list. It is an error to set arg values, and it is an error to get arg values that are not supplied. And you may ignore these errors, as we will not test for them!

    • L[N]
      The letter L followed by a non-negative integer, such as L0 or L123, is a legal expression, and refers to the given local variable. It is ok to get an unassigned local variable, in which case its value should be 0.

    • [operator] [operand1] [operand2]
      This language allows so-called prefix expressions, where the operator precedes the operands. The operator can be either + or -, and the operands must each be one of the legal expression types listed above (non-negative integer, A[N] or L[N]).

    And here are the legal statements in this language (noting that statements occur one per line, and leading and trailing whitespace is ignored):

    • ! comment
      Lines that start with an exclamation (!), after the ignored whitespace, are comments and are ignored.

    • L[N] [expr]
      Lines that start with L[N] are assignment statements, and are followed by the expression (as described above) to be stored into the given local variable. For example: L5 - L2 42 This line assigns (L2 - 42) into L5.

    • [label]:
      Lines that contain only a lowercase word followed by a colon are labels, which are ignored except for when they are targets of jump statements.

    • JMP [label]
      This is a jump statement, and control is transferred to the line number where the given label is located. It is an error for such a label to not exist, and you may ignore that error.

    • JMP+ [expr] [label]
      This is a conditional jump, and control is transferred to the line number where the given label is located only if the given expression is positive. Otherwise, the statement is ignored.

    • JMP0 [expr] [label]
      This is another kind of conditional jump, and control is transferred only if the given expression is 0.

    • RTN [expr]
      This is a return statement, and the given expression is returned.

    Hints:
    1. Do not try to translate the program into Python! Even if you could do so, it is not allowed here. Instead, you are going to write a so-called interpreter. Just keep track of the local variables, and move line-by-line through the program, simulating the execution of the line as appropriate.
    2. You will find it useful to keep track of the current line number.
    3. How long do you run the program? Until you hit a RTN statement! You may assume that will always eventually happen.
    4. We used strip, split, and splitlines in our sample solution, though you of course may solve this how you wish.

    Finally, here is a sample test function for you. You surely will want to add some addition test cases. In fact, a hint would be to build your function incrementally, starting with the simplest test cases you can think up, which use the fewest expression and statement syntax rules. Then add more test cases as you implement more of the language.
    def testRunSimpleProgram(): print("Testing runSimpleProgram()...", end="") largest = """! largest: Returns max(A0, A1) L0 - A0 A1 JMP+ L0 a0 RTN A1 a0: RTN A0""" assert(runSimpleProgram(largest, [5, 6]) == 6) assert(runSimpleProgram(largest, [6, 5]) == 6) sumToN = """! SumToN: Returns 1 + ... + A0 ! L0 is a counter, L1 is the result L0 0 L1 0 loop: L2 - L0 A0 JMP0 L2 done L0 + L0 1 L1 + L1 L0 JMP loop done: RTN L1""" assert(runSimpleProgram(sumToN, [5]) == 1+2+3+4+5) assert(runSimpleProgram(sumToN, [10]) == 10*11//2) print("Passed!")

  6. Bonus/Optional: linearRegression(a) [2.5 pts]
    Write the function linearRegression(a) that takes a list of (x,y) points and finds the line of best fit through those points. Specifically, your function should return a triple of floats (a,b,r) such that y = ax+b is the line of best fit through the given points and r is the correlation coefficient as explained here (and yes, you must follow this exact approach). For example (taken from the text), linearRegression([(1,3), (2,5), (4,8)]) should return a triple of 3 values approximately equal to (1.6429, 1.5, 0.9972), indicating that the line of best fit is y = 1.6429x + 1.5, with a correlation coefficient of 0.9972 (so it's a really good fit, too). Note that the result is approximately equal to these values. Also note: the notes we refer you to do not discuss the meaning of the sign of r, so just return the absolute value |r|. And: you may ignore the case of a vertical line in your linearRegression code.

  7. Bonus/Optional: bogusDataFinder(csvData) [2.5 pts]
  8. Background: for this problem, we will consider the data in this file, which is a modified data file obtained from http://baseballguru.com/MLB2014.xls. Be sure to use our version of the file. This file contains real comma-separated batting data for 2014 Major League Baseball. Cool. We have deleted most of the lesser-known statistics. The columns that are included are all straightforward, so for those of you who do not know much about baseball, you could ask most anyone who does and they could easily explain their meaning. In fact, however, you can do this problem without ever knowing their meaning, so that's not required.

    The problem: we are going to take this file, delete some lines, and then shuffle the remaining lines (except the first line, with the column names, which will remain intact at the top). So far, no problem. But then we will deliberately change some data values. Your task is to identify which values we changed.

    Of course, there is a way to make this problem trivial: just copy the entire table into your code. Then you can just check each value against its original, and be certain to find the changes. And, of course, we won't allow this!

    More specifically: while you may include summary statistics derived from this data, you may not include any numbers directly from the data. Also, you may not include any lists of data in your code (or encoded variants, say in strings) with more than 100 total values. Finally, you may not have more than 300 lines total in your solution, including any such data. The point is: don't just record the data and uncleverly compare the original to the changed copy, but instead cleverly analyze the data to detect the changes.

    With this in mind, write the function bogusDataFinder(csvData) that takes a multiline string that contains the csv data as described above, and returns a list of tuples indicating the changed items as the tuples (playerID, columnTitle). The list should be sorted alphabetically first by playerID and then by columnTitle.

    So you know: At first, we'll make very large changes, to ease in their detection. But then the changes will get increasingly smaller, until nobody can detect them.

    Also: if we see especially remarkable solutions here, we may award some additional bonus as appropriate. We'll see. But don't do this for the points, do it for the joy of learning. Truly.

    Here is an example. Note that Bobby Abreu actually had 12 runs (which we edited to 120), and Jose Abreu actually had 556 atBats (which we edited to 5560). We did not make any edits to Zoilo Almonte.
    bogusData = """\
    playerID,nameFirst,nameLast,teamID,games,atBats,runs,hits,doubles,triples,homeRuns,runsBattedIn,stolenBases,caughtStealing,Walks,StrikeOuts,battingAverage
    abreujo02,Jose,Abreu,CHA,145,5560,80,176,35,2,36,107,3,1,51,131,0.317
    almonzo01,Zoilo,Almonte,NYA,13,36,2,5,0,0,1,3,1,0,0,14,0.139
    abreubo01,Bobby,Abreu,NYN,78,133,120,33,9,0,1,14,1,0,20,21,0.248"""
    
    assert(bogusDataFinder(bogusData) == [("abreubo01", "runs"),
                                          ("abreujo02, "atBats")
                                         ]
          )
    
    Also note that you may use 2d lists for this problem (and only this problem) if you wish, though that is not required.

    Have fun!