Everything By The Numbers...

Topical Information

The purpose of this programming assignment is to test your ability to apply mathematical techniques of summing sequences to approximate numerical answers to difficult questions. In this case, integration (the area between the curve and the x-axis) will be considered.

Program Background

First make sure you've read sections 6.2, 3, & 16 and the notes about generating (pseudo) random numbers. It will also be helpful if you've read through a calculus text about the following: Simpson's Rule, the Trapezoidal Rule, series and sequences, and Taylor series approximation.

Program Information

A basic approach to finding a numeric approximation of an integral is to apply Simpson's Rule or the Trapezoidal Rule with more and more segments until you seem to get a convergence upon an answer. (The Trapezoidal Rule will require more segments to converge than will Simpson's Rule.) In fact, those ideas are presented in this sample program. In the sample, 40 segments are used and Simpson's has already come to a convergence (it probably did before 40, but I was just experimenting). Also at 40, Trapezoidal has yet to converge.

To refine this program, you'll try two things. First you'll change segments from a constant to a variable and read it from the user. Note that Simpson's requires an even number of segments -- warn the user about this and then fix it if they enter an odd number of segments (you should probably add 1 instead of subtracting 1 to give a better approximation). Keep the rest of the program the same. Now you can run your program several times and test higher and higher segment counts to see when each method converges. (If you like, you can place a yes/no loop around these two rules so you don't have to re-run the program each time. This would actually be nicer for you and any prospective users...*HINT*)

Next, write a second version of the program (yes, a separate .c file!) that will continue each technique until the successive approximations are within a certain tolerance value of each other -- sometimes it just needs to be exact enough.

To do this, you'll first try with 1 segment (just the sum of the end-points and no interior points). Next try with 2 segments (add an internal point). If the difference between the old sum and the new sum is within tolerance, we're done. If not, we continue to 4 segments. Pictorially we have:

    |------------------------------------|    1 segment

    |-----------------|------------------|    2 segments

    |-------|---------|--------|---------|    4 segments

Notice, though, that at 2 segments we are only calculating 1 new value. Why not just calculate this new value? In fact, the difference between this sum and the previous one is just that new value (adjusted, of course, by the (b-a)/n/?? multiplier -- ?? is 2 for Trapezoidal and 3 for Simpson's). That automates our difference finding and reduces the number of points that need calculating. Does it continue to work? Yes, for 4 segments, we are adding two new points but keeping the other three the same! Wonderful. So, we want to add to our sum only the new point(s) needed by this round of segmenting. Which are they? n==1 is special and is the end-points. After that, though, we have odd multiples of the inverse of the segments:

    segment count    |    points added
   ------------------+---------------------
          2          |      1/2 * (b-a)
          4          |  1/4 * (b-a), 3/4 * (b-a)
          8          |  1/8 * (b-a), 3/8 * (b-a), 5/8 * (b-a), 7/8 * (b-a)

And so on, and so forth. (Recall that the odd numbers are given by 2n+1 for n=0, 1, 2, ... Here we'll stop when the odd number is greater than the number of segments.)

This is a little tricky for the Simpson's since the coefficients alternate through the sequence's internal terms (4, 2, 4, 2, 4, 2...). (Luckily the Trapezoidal coefficients are all 2.) To see what goes weird here, note the coefficients that would be applied:

    |------------------------------------|    1 segment

                      4
    |-----------------|------------------|    2 segments

            4         2        4
    |-------|---------|--------|---------|    4 segments

So on the 2 segments one, the '1/2' point has coefficient 4, but in the 4 segment one, that point has coefficient 2. *shiver* Seems icky. But, that's why you earn the big levels. *grin*

Another method of numerical approximation that has nothing to do with the underlying mathematics involved but does involve simple geometric principles is known as Monte Carlo Methods. This area of approximation is based on random numbers (well, as random as we can get them). Because of this randomness (and because gambling was one of its first areas of application) it is named after Monte Carlo -- the famous European casino town.

The idea is simple: bound the function in a box within the desired area. To simplify things, the box is required to include the x-axis as a border or completely contained. Next we randomly generate pairs of values within this box. If the y value of the pair is between the function and the x-axis at the x value of the pair, then we've found a point within our function's area. If not, it doesn't count and we try again. After we've generated many pairs, we calculate a percentage of the pairs that were within the function's area and multiply this by the area of the bounding box (which is a simple rectangle). In general, a larger number of pairs will give a better approximation, but this is not always the case. (Run the given sample program several times and see that the default trials is sometimes closer to the correct value than the 'higher resolution' 50,000 trials.)

For this method, you'll make the same first change: a loop to ask for number of trial pairs until they decide that's enough. But the separate program for approximating to a tolerance is insane! You'd like to think that just throwing more trial pairs in there would actually give you more accuracy, but you'd be wrong. Just now I ran this and got the better accuracy with the default 10,000 pairs as with 1,000,000 pairs! So don't just not try this at home, don't try it at all!

A word about tolerances. Typically 1e-6 is a wonderful approximation tolerance. However, you could also make a second yes/no loop to allow them to specify tolerances until they are done. Of course, don't forget to watch for negative differences as well as positive ones (it could happen!)

And, to make things more useful, have the program report the number of iterations required to reach tolerance for each test. Now you can compare tolerance to specified segments in an easy way.

This assignment is (Level 6).

Options

Add (Level 2) to accept arguments from the command line that invoked your program to set certain conditions. (New background for this would be sections 6.11 & 14. Also see the strto...() family of string conversion functions in Appendix A -- pp666-8.) The '-n' option will set the number of segments to be used in the first test of Trapezoidal and Simpson's. The '-p' option will set the number of pairs to generate for the first Monte Carlo. These two options are for the first (counted) version of the program.

The '-t' option will set the tolerance for the first test of each method. This option is for the second (toleranced) version of the program.

The '-y' option will turn off the yes/no loops so that only one test of each method is made. The '-h' option will print a help screen of what options do what and skip other processing. These options are for both versions of the program.

Some examples:

   $ num_integ
                   /* all interactive */

   $ num_integ -h
                   /* prints help and quits */

   $ num_integ -y -t 1e-4
                   /* runs each test once at tolerance 1e-4 */

   $ num_integ -n 11
                   /* runs first T and S test with 12 segments */

   $ num_integ -n 10 -p 400000 -h
                   /* prints help and quits -- oops... */

Add (Level 1) to write up your findings about tolerance vs segments/trials. How many segments were required to reach a certain tolerance? How many trials were required to reach a certain tolerance? How many segments were needed to reach convergence for Trapezoidal vs. Simpson's? How many Monte Carlo trials were needed to get really close to your Simpson's result?

Add (Level 2) to make a simple memoization of previously calculated x values to help avoid redundancy in the Monte Carlo routine. (This isn't the best way to use memoization, but it is an occasion so we'll take it!) You'll need a pair of local arrays: old_x and old_f. When you generate a new pair, before calling f(x), look for the x value in old_x. If found, simply use the parallel value in old_f. If not, calculate and insert this new x, f(x) pair in your arrays. To make things as simple as possible, just cache 5000 pairs. Keep the array old_x sorted by always placing the new values in the proper place to maintain order (ala insertion sort). That way you can binary search the old_x list for the value (p664 of Appendix A. Remember that when comparing floating point, == and != aren't recommended, try a different technique of determining equality -- there are several alternatives.)

Add another (Level 1) to make only one array in your memoization: an array of struct's which contain both an x and an associated f members. This will ease the pain of sorting the parallel arrays and only slightly complicate the binary search comparison function. (See various parts of sections 9.1-6 for more on structures.)

Add another (Level 2) to make the memoization a linked list of x, f(x) pairs instead of parallel arrays or an array of struct's. Linked lists are discussed in more detail in Chapter 10. See also the online notes about searching sorted linked lists.

Add another (Level 1) to write a 1-3 page idea of how complicated the f(x) needs to be before the memoization really helps. If the function is really fast to evaluate, then it might just be faster to do that than to keep a sorted array/list and search it. Try some other functions like: sec(x)*(tan(x)-sec(x)), (e^4)*(4^x)/(x!), or other mildly complicated functions (yes, they get worse). Do some simple timings to back up your idea (or dispute it).

Add another (Level 2) to fix the Monte Carlo function. That's right...FIX it. Right now it doesn't really calculate the integral between a and b of f(x). Instead, it calculates the integral between a and b of |f(x)|. For instance, the integral from -pi to pi of sin(x) is 0, but you'll calculate it as 2 (i.e. 1+1 instead of -1+1). So, you'll need to take into account that parts of the function below the x axis are part of the integral, but are negative!