Optimizing Software From 20,000 Feet

“The First Rule of Program Optimization: Don’t do it.
The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”

- Michael A. Jackson

If you spend much time around people involved in software development, you’re going to run across a conversation about optimizing software. The program is too big, or too slow, or too chatty on the network, or too something, and somebody wants to do a little optimization to improve on that.

It usually starts with the natural impulse of geeks to tinker with their systems, but project leads, marketing departments, and managers are often drawn in by the promise of a better product.

Should you find yourself in that position, contemplating either undertaking an optimization project or authorizing one, stop and consider what the proposed optimization is and how much it will actually improve things.

Optimizing Line-By-Line Is Rarely Worth It

Micro-optimizations are generally not worth the time it takes to make them. Shaving a couple of milliseconds off a function that only runs once is entirely pointless - it will produce no perceptible improvement, yet it comes at a potentially high cost in development time. Simple optimizations of this type are often already done automatically by the compiler or interpreter anyhow, while more complex attempts may backfire and leave the code slower or harder to maintain.

The primary exception to this is when the code in question will be repeated many, many times in rapid succession. (Programmers often call this a “tight loop”.) If the code will run 10,000 times, then making it take a millisecond less each time will save a total of 10 seconds. If your user is sitting and waiting for the task to complete, that 10 seconds is an eternity. On the other hand, if it’s part of a 6-hour non-interactive process to close out your monthly books, the 10 seconds saved is meaningless despite the repetition.
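The arithmetic behind that comparison is easy to check. The sketch below is only an illustration; the function names are made up for this example, and the numbers are the ones used above.

```python
def total_time_saved_seconds(runs: int, saved_ms_per_run: float) -> float:
    """Total time saved across all runs, converted from milliseconds to seconds."""
    return runs * saved_ms_per_run / 1000.0


def fraction_of_job(saved_seconds: float, job_seconds: float) -> float:
    """How large the saving is relative to the whole job's run time."""
    return saved_seconds / job_seconds


# 10,000 repetitions, each one millisecond faster.
saved = total_time_saved_seconds(runs=10_000, saved_ms_per_run=1.0)

# The 6-hour non-interactive job, in seconds.
six_hours = 6 * 60 * 60

print(f"Saved: {saved:.0f} seconds")
print(f"Share of a 6-hour job: {fraction_of_job(saved, six_hours):.3%}")
```

The same 10 seconds is the entire wait for an interactive user, but less than a twentieth of one percent of the 6-hour run.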

Optimizing Algorithms Works Much Better

Several years ago, as I was just starting my programming career, I had a job doing data entry on a system which needed to run a check for duplicate records before posting completed tasks into its archival database. For performance reasons, it checked only those active records which were marked complete - and it still took nearly half an hour to run.

Eventually, the programmer who wrote the system left the company and I inherited his responsibilities for it. One of the first things I did was take a hard look at the duplicate checking code.

The duplicate check, as originally written, looked at every single record in the archival database for each active record that was being checked. Testing 10 active records for duplication against a 10,000-record archive database required 100,000 record-level comparisons, and that number grew with the product of the two record counts. No wonder it was slow. It used an extremely inefficient algorithm.
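A rough sketch of that kind of check, with a counter added so the cost is visible, might look like the following. This is not the original system's code; the function name and the use of plain integers as stand-ins for records are assumptions made for illustration.

```python
def naive_duplicates(active, archive):
    """Compare every active record against every archived record.

    Hypothetical reconstruction: like the check described above, it scans
    the whole archive for each active record, with no early exit.
    """
    comparisons = 0
    duplicates = []
    for record in active:
        found = False
        for archived in archive:
            comparisons += 1  # one record-level comparison
            if record == archived:
                found = True
        if found:
            duplicates.append(record)
    return duplicates, comparisons


# Integers stand in for records: 10 active, 10,000 archived.
archive = list(range(10_000))
active = list(range(9_995, 10_005))  # 5 overlap with the archive, 5 are new

duplicates, comparisons = naive_duplicates(active, archive)
print(len(duplicates), "duplicates found with", comparisons, "comparisons")
```

Every extra active record adds another full pass over the archive, which is why the check had to be limited to completed records in the first place.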

Within a day, I had rewritten it with a better algorithm which only needed to make one pass over each database. With 1,000 active records and 10,000 archived, it could test every active record (not just the completed ones) with only around 11,000 record-level comparisons. It also used the databases more efficiently, bringing total run time down to roughly 15 seconds. Vastly improved performance, plus a more thorough check. A truly worthwhile optimization!
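I won't reproduce the actual rewrite here, but one common way to get a single pass over each data set is to collect the archived records into a set and then look each active record up in it. The sketch below assumes that approach; the function name and the integer stand-ins for records are hypothetical.

```python
def one_pass_duplicates(active, archive):
    """Find active records that already exist in the archive.

    Illustrative only: a set lookup is assumed here, which is one of
    several ways to touch each record just once.
    """
    records_visited = 0

    # Pass 1: read every archived record once and remember it.
    seen = set()
    for archived in archive:
        records_visited += 1
        seen.add(archived)

    # Pass 2: read every active record once and look it up.
    duplicates = []
    for record in active:
        records_visited += 1
        if record in seen:
            duplicates.append(record)

    return duplicates, records_visited


# Integers stand in for records: 1,000 active, 10,000 archived.
archive = list(range(10_000))
active = list(range(9_500, 10_500))  # 500 overlap with the archive

duplicates, records_visited = one_pass_duplicates(active, archive)
print(len(duplicates), "duplicates found after visiting", records_visited, "records")
```

Here the work grows with the sum of the two record counts rather than their product, which is where the figure of around 11,000 comes from.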

The Most Important Optimization

If I were to revise the original version of that duplicate check today, I could do even better. I’m confident that I could get it under 5 seconds and probably down into the 1-2 second range, while still running on the same, now horribly outdated, hardware and software.

Why are such extreme improvements possible? Because, at the time, I was able to devise a better algorithm than the original programmer had, and because I now have several more years of experience behind me than I did then.

If presented with my revised version, though, I would argue against further optimization of that code. Since it ran only once a week, it wouldn’t be worth the effort involved in bringing it down from 15 seconds to 5, maybe not even if it could get down to 1. Rewriting to eliminate the need to run the check at all might be worthwhile, but making the check faster would not.
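A back-of-the-envelope calculation shows why. The sketch below assumes, purely for illustration, that the extra optimization would take one eight-hour working day, and it treats an hour of developer time and an hour of waiting as equal; neither assumption comes from the actual project.

```python
def runs_to_break_even(effort_seconds: float, saved_seconds_per_run: float) -> float:
    """Number of runs needed before the time saved equals the time invested."""
    return effort_seconds / saved_seconds_per_run


effort = 8 * 60 * 60        # assumed: one working day of development, in seconds
saved_per_run = 15 - 5      # bringing the check from 15 seconds down to 5
runs_per_year = 52          # the check ran once a week

runs = runs_to_break_even(effort, saved_per_run)
years = runs / runs_per_year
print(f"Break-even after {runs:.0f} runs, or about {years:.0f} years")
```

Under those assumptions the optimization would need decades of weekly runs to pay for itself.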

The most important optimization is to optimize the skills and experience of your developers. Software development is not a commodity product. Getting someone with the skill to choose the right algorithms and the experience to know when an optimization wouldn’t produce sufficient improvement to justify the time invested will get you better results, in less time, and often for a lower overall cost.
