Computer Science 15-110, Lecture 9 (Sections M-Q), Fall 2009
Homework 9
Due:  Thu 19-Nov-2009 at 10pm (email copy to your CA)
(no late submissions accepted).


Read these instructions first!



Programming guidelines:

Read this first!


  1. Review
    1. addLineNumbers
    Core
    1. Graphing Run Times
    2. The ListOfStrings Class (Custom ArrayList)
  2. Extension
    1. The SortableArrays Class
  3. Challenge
    1. Automated Run Time Graphing
    2. Quadratic Regression (Even Better Run Time Graphing)
    3. Shell Sort
    4. Polynomial Time Approximate Subset Sum

  1. Review
    1. addLineNumbers
      Write the following method:
         public static boolean addLineNumbers(String srcFilename, String destFilename)
      [ This method is inspired by an actual need I had this semester:  to write quiz5, I needed to add line numbers to Snake.java, and I did so by writing a program just like the one assigned here!  --DK ]
      This method takes two filenames, a srcFilename and a destFilename.  The method copies the source file into the destination file while adding line numbers to each line (with the first line numbered as 1, not 0), and returns true if the operation succeeded and false otherwise (if it cannot find the source file or cannot open the destination file for writing).  To be precise:  each destination line should begin with a line number, followed by a colon, a single space, and then the corresponding source line.  The only complication is that all the colons must be vertically aligned -- that is, all the line numbers must have leading spaces as required so that the line numbers are right-aligned.  Also, the largest line numbers must have no leading spaces before them.  Thus, before you can print anything to the destination file, you first need to know the total number of lines in the source file (why?).  For this, you may consider reading the source file twice -- once to determine the number of lines and a second time to actually generate the destination file.  Alternatively, you could read the source file lines into an array or ArrayList of strings, but that's much less efficient (for example, what might go wrong with this approach for a file with, say, 100 million lines?).
       
  2. Core
    1. Graphing Run Times
      Using the timing code from the class notes, neatly draw (by hand) graphs of the run times for selection sort and merge sort.  The x axis of each graph should represent N, the number of elements in the random arrays.  The y axis should represent T, the amount of time required for that algorithm to sort random arrays with N elements.  You may need to run the algorithm several times and average the results to get reliable data, especially for smaller N.  The graphs should include large enough values of N so that they clearly demonstrate the quadratic nature of selection sort (that is, you should see a parabola emerging), and the sub-quadratic nature of merge sort (that is, you should see that it is "flatter" than any parabola).

      Do not submit your graphs on Thursday night with the emailed submission of the rest of the assignment!  Instead, hand in the physical copies of your graphs on Friday in recitation.  Everything else is due Thursday night via email, as usual.

      Also: this should not be a very time-consuming task.  If you're spending more than 10 minutes on it, you're on the wrong track!
       
    2. The ListOfStrings Class (Custom ArrayList)
      Start with this file:     Hw9ListOfStrings.java
      Do not modify the Hw9ListOfStrings main class. Make this code work by adding the appropriate classes with the appropriate methods as described by the test methods called by this main method. Note that you do not have to add any code to the test cases, though you do have to solve them with general-purpose solutions (and not just hard-code the example test cases!).

      Discussion:  The ListOfStrings class is a simple version of the ArrayList class (at least ArrayList<String>).  It shows how you can make arrays "grow automatically", and much more efficiently, by allocating larger arrays than you need and then only using part of the allocated array.  For example, the strings "Let's", "Go", and "Pens" might be stored in the following array:
         { "Let's", "Go", "Pens", null }
      The array is of length 4, but only the first 3 elements are actually used to store the strings.  If we were to add another string to this list, we would not have to increase the size of the array -- we could just store it in that last unused location (in place of the null reference).  This is the same general technique that ArrayList uses!

      To accomplish this, our ListOfStrings not only has an array that stores the strings, but also a "size" instance variable that keeps track of how much of the array is currently in use.  To be safe, and also to let the garbage collector do its thing, we'll be sure to set all the unused elements in the array to null.  Also, for clarity, we will store all the unused elements in the array at the top (larger indexes) of the array.  And for further clarity, we will use the terms "size" for the number of used elements and "capacity" for the total number of used and unused elements, and we will avoid the term "length" entirely.

      When we add a string to a "full" array (with no unused/null elements, so its "size" equals its "capacity"), then we must finally allocate a new, larger array.  However, instead of making the array just one element larger, we will double the array's capacity.  In this way, we minimize the amount of allocating while still not excessively over-allocating.  So we'll double the capacity of the array, and copy all the strings from the first array into the same locations of the new array, and then finally add the new string (knowing that now there is room for it).
       
  3. Extension
    1. The SortableArrays Class
      Start with this file:     Hw9SortableArrays.java
      Do not modify the Hw9SortableArrays main class. Make this code work by adding the appropriate classes with the appropriate methods as described by the test methods called by this main method. Note that you do not have to add any code to the test cases, though you do have to solve them with general-purpose solutions (and not just hard-code the example test cases!).

      Be sure to study the examples in the course notes on Sortable Objects and also look at the Examples.  Also, read the comments in the test code carefully to understand how the SortableArray instances should be compared to each other.
       
  4. Challenge
    1. Automated Run Time Graphing
      In the file Hw9BonusRunTimeGraphing.java:  First, automate the process of graphing run times so that all the data is gathered automatically.  Second, place the data into an array.  Third, plot this array graphically (just draw lines between each successive pair of points).  Place plots of competing algorithms (clearly labeled and in different colors) on the same graph, to clearly show the difference between them.
       
    2. Quadratic Regression (Even Better Run Time Graphing)
      In the file Hw9BonusQuadraticRegression.java:  Expanding on the previous problem, use whatever resources you can to learn about quadratic regression, where you find the best-fitting parabola for a given data set.  Plot the best-fitting parabola for each curve in your "improved run time graphing" program from the previous problem.  This alone should visually demonstrate that, say, selection sort is indeed quadratic and merge sort is sub-quadratic.  Finally, for each graph, compute and display some measure of confidence or goodness of fit, which should further confirm our conclusions.
       
    3. Shell Sort
      In the file Hw9BonusShellSort.java:  Carefully read the Wikipedia page on Shell Sort.  Then, implement shell sort (you may refer to the pseudocode at the end of that page, but do not refer to any actual Java implementations, which are readily available, of course!).  Finally, adapt the timing code from the class notes to demonstrate that Shell Sort is indeed faster than the quadratic sorts but, as the data size increases, eventually loses out to the O(nlogn) sorts (eg, mergesort and/or quicksort).  If you did either part of the previous bonus on improving/automating the run time graphing, include shell sort is your automated analysis.
       
    4. Polynomial Time Approximate Subset Sum
      To complete this bonus problem, you must first have completed the previously-assigned "Subset Sum" bonus problem (which you may complete now, for some small bonus credit, if you have not already done so).  Next, read the Wikipedia page on Subset Sum, particularly the last section on a "Polynomial Time Approximate Algorithm".  Then, in the file Hw9BonusApproximateSubsetSum.java, implement an approximate solution to the subset sum problem.  Perform some analysis on your solution to demonstrate that it basically works as expected.

      Brief discussion:  As previously noted, the Subset Sum problem is NP-complete, which means (in part) that there is no known polynomial-time (that is, "fast") algorithm for solving the problem exactly.  Thus, for even modestly large arrays, finding the exact solution may require vast amounts of time  However, as this problem demonstrates, approximate solutions of NP-complete problems often may be found in much less time.

Carpe diem!