The purpose of this programming assignment is to test your ability to use implementation timings to estimate/confirm algorithm time complexity.
First make sure you've read sections 9.1, 2, & 5 and section 11.16 (the 11.15 material is optional). Also check out this page, which gives another perspective on timing code. It might also be helpful if you've read through sections 6.2, 3, 5, 8, 16, & 18, section 7.5, and sections 11.1 & 3 (about sprintf).
A crucial part of programming well is making sure you've chosen the fastest algorithm for the job. Comparing how fast different approaches to solving a problem run is called algorithm analysis. Although much of it is beyond the scope of this class, it should be clear that an algorithm that takes n² operations to solve a problem with an input of 'size' n is going to be slower than one that takes only log(n) operations. (We are mainly concerned with the relative growth rates of the graphs as n approaches infinity. Also, 'log' is generic here and not necessarily the common logarithm. In fact, it is usually log₂.)
If you know the platform you'll be deploying on, you could run empirical tests -- time each algorithm on larger than reasonable inputs on the target platform and pick the quicker. But, this isn't necessarily going to work out. It might be that an algorithm will simply take too long (like 200 years) to complete. Or you may not be able to simulate deployment conditions accurately enough to get good readings (multiple users, network traffic, etc.). At any rate, just timing isn't enough. Luckily for us, it is a place to start so we can get a grasp on some of these loftier concepts early.
First we must be able to time how long code takes to execute. This can be done with a stop-watch, but not very accurately. So, in 11.16 and the supplemental notes, we discuss ways to time things within our program. (Obviously you'd want to remove timing code from a final production version...*smile*) The basic idea is to ask the system what time it is before we begin, repeat our code many thousands/millions of times, get the ending time, subtract, and divide by the number of repetitions. The means of getting the time from the system is what may vary from one platform to another.
The routines given both in the book and on the web are okay to start, but there are more interesting things you can do with this system. First, consider the book's prn_time() function. What if we don't want to print the elapsed time but just retrieve it? And as written, all we get back is the clock-based figure -- not both the clock and time results. We can do better if we make a results structure to hold both a clock and a time result. Create a static local variable of this type. Now return a pointer to it.
As to the printing issue, we can simply pass an enumeration value of PrintResults or NoPrintResults to indicate our preference.
And what's with the 'begin_' members of the time_keeper? They are never used again! Maybe we could pass a 'boolean' asking to report elapsed time from the beginning instead of from the last saved time.
And while we're adding arguments, why not just accept a pointer to a results structure. If it is NULL, we'll return the address of our static local as before. If not, we'll fill in their members instead and return their pointer.
*whew* So, we should be able to call the prn_time() function in several different ways now:
    TimeResults t, *p;

    prn_time(NoPrintResults, FromBeginning, &t);
    prn_time(PrintResults, FromSaved, NULL);      /* behaves just as in book */
    p = prn_time(PrintResults, FromSaved, NULL);  /* and so on and so forth... */
Another approach to timing is to encapsulate the timing routine itself (the begin, loop, end, calculate) in a function. To this function we would pass the number of loop iterations (maybe 0 to indicate using the default?) and the function we desire to time. The drawback to this in C is that you must specify the function pointer type in the prototype. We could write many versions of our timing function that took various common types of functions to be timed, but that would be error prone, tedious, and unlikely to be complete. (If only we had a template mechanism like in C++!!! *sigh*)
At any rate, your task is to make the suggested adjustments to the prn_time() function. Write drivers to test using your function to time various things: rand() (like in problem 2.23 but without the printing of the random values), pow (math.h; use a power of 0.5), sqrt (math.h), modulo (you write), fmod (math.h), iterative fibonacci (section 4.13), recursive fibonacci (section 5.14), qsort (section 8.5), bubble sort (section 6.7), merge sort (section 6.9), sort_words (section 6.13), and case insensitive sort_words (you write).
As to the estimating/confirming analyses part of this, collect timings of the fibonaccis and the sorts with different 'sizes' of input. Input size for a sort is the length of the array. Input size for the fibonacci is the value you pass. When you divide these timings by a function of the input size, you can discover whether that function is representative of the algorithm's timing performance. For instance, dividing the bubble sort timings by the square of the size should converge to a positive value. This convergence shows that the proper function for bubble sort's time complexity is n². On the other hand, if you were to divide the timings by the input size itself, the values would continue to diverge. This indicates that the function n is an underestimate of the time complexity of bubble sort. Finally, dividing the timings by the cube of the input size would generate values that converge to zero (0). This indicates that n³ is an overestimate of the time complexity of bubble sort.
Try each collection of timings against the functions: 1, log(n), n, n*log(n), n², and n³. Good input sizes for the sorting would be doublings: 100, 200, 400, 800, etc. The number of zeros here would depend on the number of loops you are timing through. The more loops, the shorter the arrays can be. The fewer loops, the longer the arrays should be.
You might also consider calling the 'size' for the sort_words variants the number of elements multiplied by the average length of the words in the list. Try it both ways and see if this changes your results.
Input size for the fibonacci runs is more interesting. Investigate to see what gives you a better time spread. (This will be most difficult with the iterative fibonacci.)
So, which algorithm should be associated with which function?
This assignment is (Level 6).
Add (Level 1) to alter your prn_time() to include getrusage and/or gettimeofday information in addition to time and clock. (This will work on the HP/UX, on Linux, and even Cygwin -- at least it is present...it might not be as accurate since it will have to map through the Windows timing API...*shrug*)
Add (Level 2) to make a few examples of encapsulated timing loop functions (as mentioned above) -- just to show it can be done. Maybe the function pointer types 'long(*)(short)' (for the fibonaccis), 'int(*)(void)' (for rand), and 'void(*)(void*,size_t,size_t,int(*)(const void*,const void*))' (for the sorts -- yes, you can redo merge and bubble to follow qsort's example) might prove useful? You can reuse the prn_time() function(s) from within your timing encapsulation functions. This would be best done by passing the three arguments for prn_time() into your timing encapsulation functions.
Add (Level 1) to make graphs of your collected timings vs. input size. Do the shapes of these graphs coincide in any way with the shapes of the functions you found to be representative? (Maple can graph points and functions. Excel can only graph points and some functions -- luckily all the ones needed here. Both of these programs are available in Harper labs.)