15-112 Spring 2016 Homework 5
Due Sunday, 14-Feb, at 10pm

Read these instructions first!

This hw is SOLO. See the syllabus for details.
Starting this week, you may use sets and maps, but (as usual) you may not use constructs we have not yet covered (recursion, lambda functions, etc), unless explicitly allowed below.
This week you may only make up to 5 submissions max to Autolab. As usual, only your last one counts. Also, as usual, we will add 50% more submissions (3 this week) for late/grace days.

Better Big Oh [20 pts] [manually graded]
In a triple-quoted string at the top of your file (just below your name), include solutions to this exercise. For each of the following functions:

State in just a few words what it does in general.
State and prove (in a few words) its worst-case big-oh runtime.
Provide an equivalent Python function that is more than a constant-factor faster (so it's worst-case big-oh runtime is in a different function family). The better your solution's big-oh runtime, the more points you get!
State and prove (in a few words) your better solution's worst-case big-oh runtime.

You should assume that lists all contain at least 5 values, and only integer values. Also, if a function takes two lists, you should assume they are the same length N.

import copy

def slow1(a):
    (b, c) = (copy.copy(a), 0)
    while (b != [ ]):
        b.pop()
        c += 1
    return c

def slow2(a):
    n = len(a)
    count = 0
    for i in range(n):
        for j in range(n):
            if (a[i] == a[j]):
                count += 1
    return (count == n)

def slow3(a, b):
    # assume a and b are the same length n
    n = len(a)
    assert(n == len(b))
    result = 0
    for c in b:
        if c not in a:
            result += 1
    return result 

def slow4(a, b):
    # assume a and b are the same length n
    n = len(a)
    assert(n == len(b))
    result = abs(a[0] - b[0])
    for c in a:
        for d in b:
            delta = abs(c - d)
            if (delta > result):
                result = delta
    return result

def slow5(a, b):
    # Hint: this is a tricky one!  Even though it looks syntactically
    # almost identical to the previous problem, in fact the solution
    # is very different and more complicated.
    # You'll want to sort one of the lists,
    # and then use binary search over that sorted list (for each value in
    # the other list).  In fact, you should use bisect.bisect for this
    # (you can read about this function in the online Python documentation).
    # The bisect function returns the index i at which you would insert the
    # value to keep the list sorted (with a couple edge cases to consider, such
    # as if the value is less than or greater than all the values in the list, 
    # or if the value actually occurs in the list).
    # The rest is left to you...
    #
    # assume a and b are the same length n
    n = len(a)
    assert(n == len(b))
    result = abs(a[0] - b[0])
    for c in a:
        for d in b:
            delta = abs(c - d)
            if (delta < result):
                result = delta
    return result

invertDictionary(d) [20 pts] [autograded]
Write the function invertDictionary(d) that takes a dictionary d that maps keys to values and returns a dictionary of its inverse, that maps the original values back to their keys. One complication: there can be duplicate values in the original dictionary. That is, there can be keys k1 and k2 such that (d[k1] == v) and (d[k2] == v) for the same value v. For this reason, we will in fact map values back to the set of keys that originally mapped to them. So, for example:
```
assert(invertDictionary({1:2, 2:3, 3:4, 5:3}) == 
       {2:set([1]), 3:set([2,5]), 4:set([3])})
```
Also, you may assume that the values in the original dictionary are all immutable, so that they are legal keys in the resulting inverted dictionary.

friendsOfFriends(d) [20 pts] [autograded]
Background: we can create a dictionary mapping people to sets of their friends. For example, we might say:
```
    d["fred"] = set(["wilma", "betty", "barney", "bam-bam"])
    d["wilma"] = set(["fred", "betty", "dino"])
```
With this in mind, write the function friendsOfFriends(d) that takes such a dictionary mapping people to sets of friends and returns a new dictionary mapping all the same people to sets of their friends of friends. For example, since wilma is a friend of fred, and dino is a friend of wilma, dino is a friend-of-friend of fred. This set should exclude any direct friends, so even though betty is also a friend of wilma, betty does not count as a friend-of-friend of fred (since she is simply a friend of fred). Thus, in this example, if fof = friendsOfFriends(d), then fof["fred"] is a set containing just "dino" and fof["wilma"] is a set containing both "barney" and "bam-bam". Also, do not include anyone either in their own set of friends or their own set of friends-of-friends.

Note: you may assume that everyone listed in any of the friend sets also is included as a key in the dictionary. Also, do not worry about friends-of-friends-of-friends.

largestSumOfPairs(a) [10 pts] [autograded]
Write the function largestSumOfPairs(a) that takes a list of integers, and returns the largest sum of any two elements in that list, or None if the list is of size 1 or smaller. So, largestSumOfPairs([8,4,2,8]) returns the largest of (8+4), (8+2), (8+8), (4+2), (4+8), and (2+8), or 16.

The naive solution is to try every possible pair of numbers in the list. This runs in O(n**2) time and is much too slow.

containsPythagoreanTriple(a) [10 pts] [autograded]
Write the function containsPythagoreanTriple(a) that takes a list of positive integers and returns True if there are 3 values (a,b,c) anywhere in the list such that (a,b,c) form a Pythagorean Triple (where a**2 + b**2 == c**2). So [1,3,6,2,5,1,4] returns True because of (3,4,5).

A naive solution would be to check every possible triple (a,b,c) in the list. That runs in O(n**3). You'll have to do better than that.

mergeSortWithOneAuxList(a) [10 pts] [manually graded]
Write the function mergeSortWithOneAuxList(a) that works just like mergeSort from the notes, only here you can only create a single aux list just one time, rather than once for each call to merge. To do this, you will need to create the aux list in the outer function (mergeSortWithOneAuxList) and then pass it as a parameter into the merge function. The rest is left to you. In a comment at the top of this function, include some timing measurements comparing this function to the original mergeSort, and a brief reflection on whether or not this change was worthwhile.

3-Way Mergesort [10 pts] [manually graded]
First, write the function threeWayMergesort(L) that takes a list L and does a modified version of mergesort on L, where instead of combining two sorted sublists at a time, you should combine three sorted sublists at a time (so after one pass, you have sublists of size 3, and after two passes, sublists of size 9, and so on). You may not assume that the size of the list is a power of 3.

Next, in a triple-quoted string just below your threeWayMergesort function, write a short proof that the function runs in O(nlogn) -- that is, in the same big-oh worst-case runtime as normal two-way mergesort. Then, write the function threeWayMergesortTiming() that empirically demonstrates that threeWayMergesort runs in the same big-oh as normal (two-way) mergesort (no more than a constant time faster or slower, even as n grows large). So all that extra work to write threeWayMergesort was for naught. Sigh. When you are done with this function, run it, get the timing data that proves your point, and include it with a very brief explanation at the end of that triple-quoted string just below threeWayMergesort.

Bonus/Optional: linearRegression(a) [2.5 pts] [autograded]
Write the function linearRegression(a) that takes a list of (x,y) points and finds the line of best fit through those points. Specifically, your function should return a triple of floats (a,b,r) such that y = ax+b is the line of best fit through the given points and r is the correlation coefficient as explained here (and yes, you must follow this exact approach). For example (taken from the text), linearRegression([(1,3), (2,5), (4,8)]) should return a triple of 3 values approximately equal to (1.6429, 1.5, 0.9972), indicating that the line of best fit is y = 1.6429x + 1.5, with a correlation coefficient of 0.9972 (so it's a really good fit, too). Note that the result is approximately equal to these values. Also note: the notes we refer you to do not discuss the meaning of the sign of r, so just return the absolute value |r|. And: you may ignore the case of a vertical line in your linearRegression code.

Bonus/Optional: bogusDataFinder(csvData) [2.5 pts] [manually graded]

this file

http://baseballguru.com/MLB2014.xls

bogusData = """\
playerID,nameFirst,nameLast,teamID,games,atBats,runs,hits,doubles,triples,homeRuns,runsBattedIn,stolenBases,caughtStealing,Walks,StrikeOuts,battingAverage
abreujo02,Jose,Abreu,CHA,145,5560,80,176,35,2,36,107,3,1,51,131,0.317
almonzo01,Zoilo,Almonte,NYA,13,36,2,5,0,0,1,3,1,0,0,14,0.139
abreubo01,Bobby,Abreu,NYN,78,133,120,33,9,0,1,14,1,0,20,21,0.248"""

assert(bogusDataFinder(bogusData) == [("abreubo01", "runs"),
                                      ("abreujo02, "atBats")
                                     ]
      )

15-112 Spring 2016 Homework 5 Due Sunday, 14-Feb, at 10pm

15-112 Spring 2016 Homework 5
Due Sunday, 14-Feb, at 10pm