Sunday, August 16, 2009

Memoize

I recently came across the computer science concept of memoization.  What's cool is that I had made some suggestions on designs at work that amount to the same thing.

In general, memoization refers to having a computer function remember the problems it's already solved, and just return the answer again if it recognizes the problem, rather than compute it again, as most software does.  The common example is the factorial function.  Since the factorial is the product of all the numbers up to the argument, on the way to, say 6!, you will compute all the factorials less than 6.  Usually the function would loop over all the numbers and multiply them together.  Instead, a memoized version would have stored all the results up to the highest argument it had previously seen, and then only compute those beyond that (storing the new results) if necessary.

One type of object-oriented class that I had proposed is a smart Angle class.  Since angles can be represented in many different ways, I wanted to be able to get any version the calling software wanted, without knowing what version it had been stored in.  In addition to radians and degrees, an angle can be (and often is in our work) stored as the combination of sine and cosine values.  Besides being able to use an angle without knowing what format it was in originally, I suggested that the other representations should be left uncomputed, but once they were asked for, the result should be stored to avoid unnecessary further computations.  Clearly this can been seen as memoization of the conversion.

In the general case, the "hit rate" of a memoized function may not be that high, depending on the number and valid range of the inputs.  However, for an object whose data never changes once it is created, essentially one of the inputs has been fixed, raising the chance that the same result will be requested repeatedly.  (In my design, each Angle object would be for a particular angle.  If the value of the variable changed, a new Angle object would be created to store it.)

By the way, I'm fascinated with the idea of "smart numbers" in computer programs - numbers as objects (or other collections of data, if not object-oriented) which store more than just their value, or that store exact representations in alternate formats.  The Angle example given, quadratic irrationals, storing numbers as rationals or continued fractions instead of decimal, and numbers that store their own error bounds, are all concepts that I've thought about over the years.  If anyone knows where one of these concepts has been used in a real program, I'd be interested to hear about it.

2 comments:

  1. With disk space no longer being a premium as it was in the past, memoization makes perfect sense; processing time is accelerated the next time a value is needed and the trade-off in disk space seems very justifiable!

    Numbers storing their own error bounds is interesting as statistical results that have error bounds are computed values by definition, right?

    Statistical results, if saved such as in memoization, should be allowed associated metadata such as confidence intervals. If any of the initial parameters upon which the statistical results is based were changed, one would want both the results and their metadata to be updated.

    I usually see applications where tests are run and the results and CIs not saved and updated in the same database; rather, researchers save the routines and re-calculate results in their usual workflow, saving results only in output files rather than within the database as new calculated fields. I like the discussion as I see the potential application to health science databases.

    ReplyDelete
  2. One of the cool things about techniques that track the range of a number, or the error bound, is that you can add the error due to the calculation itself.

    Even if the inputs are known exactly, if the function is an approximation to the true result, and knows its error bounds, that can be reflected in the range of the returned result.

    It seems pretty slick to handle errors in input value and errors due to approximations all at once. Some techniques even can let you know how much of the error in the output value is due to each error source.

    ReplyDelete