CMU 15-112: Fundamentals of Programming and Computer Science
Homework 8 (Due Sunday 22-Mar at 8pm EDT)


Note: this is the first homework to be completed under our new fully-remote situation. Please read all the course communications carefully (starting with piazza post @1464) about how policies have changed. And please post any questions you may have to piazza, and we will gladly answer them!

Also note:
Required Problems

  1. Midsemester Survey [5 pts] [manually graded]
    We said you would get some points on hw8 for filling out the midsemester survey. Well, here are those points!

  2. TA-Led Mini-Lecture [15 pts] [manually graded]
    Each semester, our amazing TA's put together a wonderful series of mini-lectures on interesting topics, many of which may be helpful for your upcoming term projects. They have done the same this semester, and the TA's will forge ahead with these mini-lectures, even though they will all be done fully remotely. Thanks, TA's! Some notes:
    1. The schedule of TA-led mini-lectures is available on the course schedule.
    2. You must attend at least one of these, though you may attend more.
    3. You are very strongly encouraged to attend in real-time via livestream. If that is truly undoable, then you may watch the recorded video within 2 days of the lecture.

  3. movieAwards(oscarResults) [10 pts]
    Note: we will discuss this problem and provide strong hints (maybe very strong hints) in recitation this week (on Friday).
    Write the function movieAwards(oscarResults) that takes a set of tuples, where each tuple holds the name of a category and the name of the winning movie, then returns a dictionary mapping each movie to the number of the awards that it won. For example, if we provide the set:
      { 
        ("Best Picture", "Green Book"), 
        ("Best Actor", "Bohemian Rhapsody"),
        ("Best Actress", "The Favourite"),
        ("Film Editing", "Bohemian Rhapsody"),
        ("Best Original Score", "Black Panther"),
        ("Costume Design", "Black Panther"),
        ("Sound Editing", "Bohemian Rhapsody"),
        ("Best Director", "Roma")
      }
    
    the program should return:
      { 
        "Black Panther" : 2,
        "Bohemian Rhapsody" : 3,
        "The Favourite" : 1,
        "Green Book" : 1,
        "Roma" : 1
      }
    

    Note 1: Remember that sets and dictionaries are unordered! For the example above, the returned dictionary may be in a different order than what we have shown, and that is ok.

    Note 2: Regarding efficiency, your solution needs to run faster than the naive solution using lists instead of some other appropriate, more-efficient data structures.

  4. friendsOfFriends(friends) [10 pts]
    Note: we will also discuss this problem and provide strong hints (maybe very strong hints) in recitation this week (on Friday).
    Background: we can create a dictionary mapping people to sets of their friends. For example, we might say:
    d = { } d["jon"] = set(["arya", "tyrion"]) d["tyrion"] = set(["jon", "jaime", "pod"]) d["arya"] = set(["jon"]) d["jaime"] = set(["tyrion", "brienne"]) d["brienne"] = set(["jaime", "pod"]) d["pod"] = set(["tyrion", "brienne", "jaime"]) d["ramsay"] = set()

    With this in mind, write the nondestructive function friendsOfFriends(friends) that takes such a dictionary mapping people to sets of friends and returns a new dictionary mapping all the same people to sets of their friends-of-friends. For example, since Tyrion is a friend of Pod, and Jon is a friend of Tyrion, Jon is a friend-of-friend of Pod. This set should exclude any direct friends, so Jaime does not count as a friend-of-friend of Pod (since he is simply a friend of Pod) despite also being a friend of Tyrion's. Additionally, a person cannot be a friend or a friend-of-friend of themself.

    Thus, in this example, friendsOfFriends should return:
    { 'tyrion': {'arya', 'brienne'}, 'pod': {'jon'}, 'brienne': {'tyrion'}, 'arya': {'tyrion'}, 'jon': {'pod', 'jaime'}, 'jaime': {'pod', 'jon'}, 'ramsay': set() }

    Note 1: you may assume that everyone listed in any of the friend sets also is included as a key in the dictionary.

    Note 2: you may not assume that if Person1 lists Person2 as a friend, Person2 will list Person1 as a friend! Sometimes friendships are only one-way. =(

    Note 3: regarding efficiency, your solution needs to run faster than the naive solution using lists instead of some other appropriate, more-efficient data structures.

  5. invertDictionary(d) [10 pts] [autograded]
    Write the function invertDictionary(d) that takes a dictionary d that maps keys to values and returns a dictionary of its inverse, that maps the original values back to their keys. One complication: there can be duplicate values in the original dictionary. That is, there can be keys k1 and k2 such that (d[k1] == v) and (d[k2] == v) for the same value v. For this reason, we will in fact map values back to the set of keys that originally mapped to them. So, for example:
    assert(invertDictionary({1:2, 2:3, 3:4, 5:3}) == 
           {2:set([1]), 3:set([2,5]), 4:set([3])})
    
    Also, you may assume that the values in the original dictionary are all immutable, so that they are legal keys in the resulting inverted dictionary.

  6. Data Analysis, Sets and Dictionaries, and Dog Licenses
    In this problem, we will practice using dictionaries and sets, as well as practicing some data analysis, by exploring a real-world data set for dog licenses issued in Allegheny County (where CMU is). The raw data files, in CSV format, are available here. An explanation of what the fields mean is available here. Also, in order to allow for faster testing, we have created the file dog-licenses-2018-abridged.csv, which contains only the first 100 dogs from the 2018 data (so the file has 101 lines, since the first line contains the column labels).

    Notes:
    • This exercise involves writing 8 (relatively short) functions (plus perhaps one or two additional helper functions), and each function uses the previous functions, so you should definitely solve this in the order given!
    • Be sure to look very carefully at the test functions in hw8.py to help you fully understand the problems.

    1. readDogDataCsvFileInto2dList(year) [5 pts] [autograded]
      As a first step, we will read the raw csv data into a 2d list, where each row in the 2d list corresponds to a row in the csv data. To be specific, write the function readDogDataCsvFileInto2dList(year) that takes a year, which can be either an int like 2018 or a string like '2018' (so you have to handle both cases), and returns a 2d list of the data in the corresponding year's csv file (for 2018, that would be 'dog-licenses-2018.csv'). Note that you will want to use the helper function readFile, which we covered in the unit on strings, and which is already included for you in the hw8.py file.

    2. convert2dListToTable(data) [5 pts] [autograded]
      It's not very convenient accessing data in the 2d list. For example, to find the breed of a given dog, say in row 18, we would have to do this:
      data = readDogDataCsvFileInto2dList(2018)
      dog = data[18]
      breed = dog[1] # we just have to know that the breed is at index 1 (ugh)
      
      Instead, let's write the function convert2dListToTable(data) to convert the 2d list into a list of dictionaries, one dictionary for each row. We'll use the labels in row 0 (data[0]) as the keys. So then we can do this:
      data = readDogDataCsvFileInto2dList(2018)
      table = convert2dListToTable(data)
      dog = table[18]
      breed = dog['breed'] # so much nicer!  :-)
      
      Note: when you do this conversion, keep row 0 (the labels) in the result, so that the indexes of the original data and the resulting table match.

    3. cleanData(table) [10 pts] [autograded]
      Now that we have a table, let's write the function cleanData(table) to clean the data up a bit. Here are the rules to follow:
      1. Ignore all characters in the raw data that are neither alphanumeric nor whitespace. Hint: str.isalnum and str.isspace may be helpful here.
      2. Ignore table[0] (the labels)
      3. Convert LicenseType to only be one of: 'Male', 'Female', or 'Unknown'
      4. Make the Breed be so-called Title Case, where the first letter of each word is uppercase, and all others are lowercase. You may find str.title to be especially useful here. :-)
      5. Make the Color a list of lowercase colors. So if the raw color is 'WHITE', the Color becomes ['white'], and if the raw color is 'WHITE/BROWN', the Color becomes ['white', 'brown'].
      6. The DogName should be Title Case.
      7. The ExpYear should be an int (not a string)
      8. The ValidDate should only be the year, as an int. So for example, you would convert '2017-11-27T10:42:11' to just 2017.

    4. getCleanedTable(year) [5 pts] [autograded]
      This short step just combines the previous three steps. So it takes a year as an int, and uses the previous 3 functions to read in the csv file, convert it to a table, clean that table, and return the cleaned table.

    5. getValueSets(year) [5 pts] [autograded]
      This function takes a given year, gets the cleaned data for that year, and returns a dictionary mapping each column label to a set of all the unique values in that column. For colors, you need to add each color from the list (do not add the list to the set, since you cannot do that!). Again, see the test functions in hw8.py for more details.

    6. getValueCounts(year) [5 pts] [autograded]
      For the given year, return a dictionary that maps each label to another dictionary which maps each value to the count of the number of times that value occurs. Once again, see the test functions for details.

    7. getMostPopularValues(year) [10 pts] [autograded]
      Now write getMostPopularValues(year), that uses the getValueCounts function we just wrote to return a dictionary that maps each label to the most popular value in the table for that lable. If more than one value ties, then include a set of all the most popular values for that label.

    8. someMoreDataAnalysis() [5 pts] [autograded]
      In your hw8.py file, you will find the function someMoreDataAnalysis(). Edit that function to answer the questions as follows: Using or perhaps adapting the code you wrote above, replace all the 42's in the function someMoreDataAnalysis() in hw8.py with the actual answers. You must find all the answers using Python, and in no other way. And you must include all the Python code you wrote in your submitted hw8.py file, even though this function can contain the actual answers hardcoded rather than computing the answer each time (since that would make the autograder take a very long time and it may timeout).
      • You should only edit the answer lines, which are the lines that start with the label A1, A2, ...
      • On each answer line, you should make a single edit, replacing each 42 with a single word answer. Do not place quotes around your answers, even if they are strings.
      • Do not make any other edits! For example, do not add or remove any whitespace!

    That's it! Good luck, and have fun!

  7. Spicy Condensed Hw8
    Note: the spicy problem for this hw is different from previous hw's. Here, before starting the spicy portion, you should first complete the entire autograded portion of hw8, and you need to score full points on the autograder. Then... You should rewrite hw8.py using as few characters as possible -- the fewer the characters, the higher your spicy score (though of course your condensed solution must still pass all the autograded tests to get any spicy points). You also need to condense by a reasonable amount to get any spicy points (so just submitting your hw8.py file would of course not get any points). To begin with, you'd want to remove all comments and extra whitespace, and use shorter function and variable names. Also, do not include the function someMoreDataAnalysis(), nor any helper functions you wrote to solve that problem. But you can perhaps do even better than that!!!! So long as your code passes the autograder without obvious hardcoding the test cases in the autograder, the shorter the better. If you do this exercise, you will submit your spicy condensed hw8 solution to etc/spicy-hw8 in Autolab, which is different from hw8. Be sure you also separately submit a fully-working, normal, non-condensed hw8 to Autolab! Have fun!!!