Validation Vexation

Be liberal in what you accept, and conservative in what you send.
- Postel’s Prescription

Previous posts here have discussed reducing the burden of data entry on your application’s users: cut down on the number of items they are required to provide and, even when something is required, allow it to remain incomplete for as long as possible, enforcing its “required” status only once the value is actually needed.

But what about the values that are entered, whether required or not?

The next step in minimizing the burden placed upon your users is to “be liberal in what you accept”. Do not impose validation rules which must be satisfied unless they are unavoidable. If you really must require them, make sure that the rules themselves are complete and correct.

Don’t Validate Formatting

The most common, and most annoying, pointless validation rules are those which restrict how the user must format his entry. Any programming language worth its salt can effortlessly remove spaces from a string or insert a space after every fourth digit, so there is absolutely no good reason for credit card validations to care whether you enter the card number as “1234 5678 9012 3456”, “1234567890123456”, or even “12 3 45678901 234 56”.
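To show how little code this takes, here is a minimal Python sketch. The function names `normalize_card_number` and `format_card_number` are made up for this example, and a real form would still need to check that the resulting number is a valid card.

```python
import re


def normalize_card_number(raw: str) -> str:
    """Drop spaces and hyphens so only the card's digits remain."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError("card number may contain only digits, spaces, and hyphens")
    return digits


def format_card_number(digits: str) -> str:
    """Insert a space after every fourth digit, for display."""
    return " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))
```

Store the normalized digits and apply the grouping only when showing the number back to the user.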

Phone numbers can trip you up here, too. Does the program want them entered as (555)555-1212 or 555-555-1212 or +15555551212 or what? The correct answer, again, is that it should accept any of these formats (and more), which is easily implemented by simply removing any non-numeric characters and then adding or removing a leading “1”, depending on whether you want the country code in your database. This will also accommodate international phone numbers, which may have their digits grouped in patterns you don’t expect.
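A rough Python sketch of that approach might look like the following. The function name and the `keep_country_code` parameter are hypothetical, and the handling of the leading “1” assumes US-style ten-digit numbers; anything that doesn't fit that shape is kept as plain digits rather than rejected.

```python
import re


def normalize_phone(raw: str, keep_country_code: bool = True) -> str:
    """Reduce a phone number to digits, then add or drop the US leading 1."""
    digits = re.sub(r"[^0-9]", "", raw)

    # Assumption: a "+" followed by anything other than 1 marks a non-US
    # number, which is stored as its digits without further changes.
    if raw.strip().startswith("+") and not digits.startswith("1"):
        return digits

    if len(digits) == 11 and digits.startswith("1"):
        national = digits[1:]
    elif len(digits) == 10:
        national = digits
    else:
        # Not a recognizable US number; keep the digits rather than reject.
        return digits

    return "1" + national if keep_country_code else national
```

With the default setting, (555)555-1212, 555-555-1212, and +15555551212 all end up stored as the same string.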

As long as it’s the right set of digits, let the user group them however he wants to.

Remember The Rest Of The World

When I moved from the US to Sweden, I paid my bank a visit to change my address with them. It went reasonably well and their system was smart enough to recognize that, since my new address was outside the US, the “state” could be left blank.

But it still required a ZIP code. Beyond that, it required the ZIP code to be formatted as a string of five digits with no intervening characters. While it does just so happen that my Swedish postal code is 5 digits, they’re grouped “226 47”. Also, the postal code goes before the city here, so we ended up having to enter the city as “226 47 Lund”, with a ZIP code of “00000”. This is obviously incorrect, but it was the only option for satisfying the US-centric validation rules.
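One way to avoid this trap is to apply format rules only for countries you actually know about and accept everything else as entered. The Python sketch below assumes a hypothetical `POSTAL_RULES` table with just two entries; it is an illustration, not a complete list of the world's postal formats.

```python
import re

# Hypothetical rule table: only countries whose format we are sure of.
POSTAL_RULES = {
    "US": re.compile(r"[0-9]{5}([0-9]{4})?"),  # ZIP or ZIP+4
    "SE": re.compile(r"[0-9]{5}"),             # usually written "226 47"
}


def normalize_postal_code(raw: str, country: str) -> str:
    """Check a postal code against its own country's rule, if we have one."""
    rule = POSTAL_RULES.get(country.upper())
    if rule is None:
        # Unknown country: trust the user and keep the entry as typed.
        return raw.strip()

    # Known country: ignore spaces and hyphens before checking the digits.
    compact = re.sub(r"[\s-]", "", raw)
    if not rule.fullmatch(compact):
        raise ValueError(f"{raw!r} is not a valid postal code for {country}")
    return compact
```

Under these rules “226 47” is accepted for Sweden, and a country missing from the table never blocks the user.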

Trust the User

Speaking of ZIP codes, ZIP code-to-city databases are pretty nice. They can be great for letting users enter their ZIP code and then automatically filling in a default city for them.

Note the word “default”. The last US city I lived in was Eagan, MN, ZIP code 55122. About half the ZIP code databases out there correctly list 55122 as Eagan. The other half list it as St. Paul, which is a good 10 miles north of Eagan. I knew I lived in Eagan, and the post office knew I lived in Eagan, but several websites insisted I was in St. Paul and provided no way for me to correct their mistake.

Looking up data instead of making the user enter it is good, but your database will be wrong sometimes. Allow the user to correct the lookup results when this happens.
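In code, this comes down to treating the lookup result as a suggestion that the user's own entry always overrides. Here is a minimal Python sketch; the `ZIP_TO_CITY` table and the `resolve_city` function are hypothetical stand-ins for whatever lookup you use, and the table is deliberately wrong in the same way as the databases described above.

```python
from typing import Optional

# Hypothetical lookup table, wrong on purpose to mirror the Eagan example.
ZIP_TO_CITY = {"55122": "St. Paul"}


def resolve_city(zip_code: str, user_city: Optional[str] = None) -> str:
    """Return the user's city if one was entered, else the lookup default."""
    if user_city and user_city.strip():
        return user_city.strip()
    # No entry from the user: fall back to the lookup, or leave it blank.
    return ZIP_TO_CITY.get(zip_code, "")
```

The form pre-fills the city with `resolve_city("55122")`, and whatever the user types over it wins when the form is saved.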

 
I do have a bit more to say on ways that data validation can go wrong, but that’s a topic for another post. Until next time, what issues have you run into where too-strict or too-lax data validation has caused you problems?

     

Open Source or No?

Free/Open Source Software (FOSS) has taken the world by storm. Startups everywhere have built their businesses upon a foundation of FOSS products, with the LAMP (Linux-Apache-MySQL-Perl/PHP/Python) platform being the best-known among them, and some have gone beyond using FOSS to producing it. Even some of the more traditional, established companies are contributing to existing FOSS projects and releasing their own products as FOSS.

Going open source can bring major benefits for your software and your company, but it can also invite disaster if you’re not familiar with both the concepts and the culture behind it. So let’s clear up some of the common misconceptions surrounding open source:

You Still Own Your Code

Whatever code you write (or pay someone to write for you as a “work for hire”) belongs to you, unless other arrangements are made. As the owner of the copyright on that code, you will never be legally bound by the terms of the code’s license, so you will still be able to use it in whatever non-open way you might desire.

Until you accept someone else’s patch.

If you release a FOSS project and I submit code for it, then I still own the copyright on my submission. By incorporating my changes into your project, you become bound by the project’s licensing terms because you no longer own the whole thing. The one exception to this is if I assign rights over my contribution to you above and beyond those in the FOSS license, most often through the use of a Contributor License Agreement.

You Can Sell FOSS

FOSS licenses place no restrictions on who can distribute the software covered by the license or how. Their restrictions apply to those who make changes to it. Depending on the license, anyone making modifications may be required to do such things as:

  • make the source code available to anyone they distribute the software to
  • provide attribution to the original author
  • submit the changes back to the original author

 
Even if you are subject to the FOSS license, you can still sell or give the software to whoever you like. But they can also sell or give it to whoever they like, so the cost will tend to drop to zero rather quickly unless you can provide some additional value to encourage people to pay you for your version when they can get it free (and legally) from someone else.

One common way of doing this is to make use of a Contributor License Agreement, as mentioned above, which allows the original copyright holder to sell the project under a commercial license in addition to free distribution under a FOSS license.

Open Source Is Sharing

Open source is not a gimmick for getting programmers around the world to write a program for free which you can then turn around and sell. Unless the programmers gain some benefit from contributing to your project, they are unlikely to do so and, worse, you may get bad press in the FOSS community for trying to take advantage of them.

For this reason, FOSS licensing tends to work better for software which potential developers are likely to run on their own computers, not for the custom portions of an online service or other “unique” offering that’s intended to only run in one place.

I recently advised someone who wanted to set up a web-based service and sell subscriptions for access, but also to make it open source so that outside developers would contribute and improve on it. Although basing the commodity portions of such a system on existing FOSS projects works extremely well - that’s the heart of LAMP, after all - trying to present the service itself as FOSS would have gone quite poorly.

I explained that developers would have no reason to make free contributions to his for-pay service and, even if they did, they would need to set up their own working installations of the service to develop against, at which point they would be in a position to compete against him by offering the same service at no cost. This was a case for which FOSS was simply not well-suited to his objectives.

 
Have you considered using FOSS licenses for any of your projects or do you have further questions about FOSS which have not been addressed here?

     

Looking Beyond the Obvious

Near its end, the CyberPenguin case study mentions the discovery of “some small accounting inaccuracies”. To be exact, users were occasionally being double-charged for sessions.

If you’ve done much software development, that statement alone should be enough to have you thinking “concurrency issue” or, more specifically, “race condition”. Given that the application involved both a web-based management interface and a once-a-minute scheduled task to handle accounting, the obvious suspect was a conflict arising when an operator submitted an update through the web interface while the accounting task was running.

So off I went to ensure that such a conflict wouldn’t cause issues, then sent the revised code off to the client. He reported back the next day that it hadn’t had any effect.

After looking harder at the code, I came up with a way that this conflict could conceivably have still occurred under some contrived circumstance, eliminated that possibility, and sent it off. Still no improvement.

We went through maybe half a dozen rounds of this before I ran out of ways that the web application and accounting task could conflict, no matter how far I stretched the bounds of probability, and finally took a closer look at the application’s logs. Once I started looking at them as a whole, instead of just records of single users, I quickly noticed that these mischarges came in clusters, affecting several users at once, not random individuals. These clusters also tended to hit right on the hour.

Checking in with the client, it turned out that the specific times when the problems occurred matched up with the times when the per-minute rates changed.

To keep the accounting simple, the application deals with rate changes by closing out each active user’s session, charging it at the old rate, and then opening a new session at the new rate. With larger numbers of users logged in, this process was taking more than a minute to complete, so the next once-a-minute run started before the previous one had finished. The conflict wasn’t between the web interface and the scheduled accounting task; it was between two (or more…) copies of the accounting task! This also explained why I was never able to reproduce the problem in my own development and testing environment, as I had never simulated enough concurrent users in my tests to trigger it.

Once the correct culprit had been identified, it was a simple enough matter to resolve by adding a check to prevent multiple instances of the accounting task from running concurrently.
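I won't reproduce the actual fix here, but one common way to implement such a check on a Unix-like system is a non-blocking lock on a lock file. The Python sketch below is only an illustration under that assumption; `run_exclusively` and the lock file path are hypothetical names, not the code from this project.

```python
import fcntl


def run_exclusively(lock_path, task):
    """Run task() unless another run already holds the lock.

    Returns True if the task ran, False if it was skipped.
    """
    with open(lock_path, "a") as lock_file:
        try:
            # Non-blocking: fail at once instead of queueing behind a slow run.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            task()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
        return True
```

A scheduled job would call something like `run_exclusively("/tmp/accounting.lock", do_accounting)` each minute. If the previous run is still closing out sessions, the new one simply exits and leaves the work to the run already in progress.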

The moral of the story:

Even if the cause of an issue seems obvious, don’t forget to fully examine the system to verify that you’re solving the right problem before you spend too much time solving the wrong one.

     

Customize or Build to Order?

You have a project that you need to have developed and you know why you’re going to hire a programmer. Don’t forget to discuss with your developer how you want to have your project built.

Whether your project is the latest addition to the world’s surplus of data-entry systems or the next killer application which will revolutionize our lives, it’s going to involve a lot of common functions, which have already been implemented dozens if not hundreds of times over. How much of that can and should you make use of?

Existing frameworks handle these common functions in broadly generic ways, and they can generally be customized to get an application up and running relatively quickly and at a low cost. Most good frameworks also allow plug-ins to expand on their base feature set just as easily.

The other major option is to build a fully-custom application. This route is more expensive and takes longer to complete, but it ensures that you can get any desired features without being restricted by a framework’s underlying design or the availability of plug-ins. In the long term, it also can provide greater flexibility for your application if it is done correctly.

In some cases, it can be appropriate to combine both approaches, first building a framework-based prototype, then replacing it with a fully-custom final version. Due to the additional time and expense involved, this is generally only an option for large or business-defining projects, but the improved quality of the final result can justify those costs in such cases.

Characteristics of Framework-Based Development

  • Faster initial implementation
  • Lower cost
  • Provides a solid, well-tested foundation for basic functionality
  • Potentially more secure, due to lessons learned from attacks against other applications based on the same framework
  • Many good frameworks are available for free under Open Source licenses, but some licenses may place requirements on how you use framework-based code
  • Some frameworks include a “standard” look and feel, which is good for getting something that looks decent built quickly, but may ultimately limit design options

Characteristics of Fully-Custom Development

  • Can produce whatever you desire
  • Higher performance, as it will contain only what you need
  • Potentially more secure, due to not being a well-known and thoroughly-analyzed target, provided the developer pays attention to security
  • You will fully own the end result and can set your own licensing terms

 
What other advantages or disadvantages of either approach are there beyond what I’ve listed? Which do you tend to prefer?

     

AJAX Gone Awry

If you’re developing or deploying software, then it should have a purpose. Unless you’re doing cutting-edge research, that purpose is probably not to provide a tech demo, nor to show off how many buzzword-laden features you can pack into it.

I’m often surprised at how readily people will forget this.

In late January, I ran across a question on LinkedIn which related to a website the asker had developed which was 100% AJAX-driven. The site itself looked nice enough, but it was a disaster technically. Unless JavaScript was enabled, it didn’t even display its front-page content, just a (non-functional) navigation bar.

With JavaScript on, the content did appear, but it turned out to be nothing more than a standard six-page site which could have been done statically with plain HTML pages and regular links to navigate among them without using JavaScript at all. The only thing lost would have been having the old page’s content fade out, then the new page fade in, when you navigate from one page to another.

Yes, the fade in/out is a nice effect. Yes, it looks cool. And, yes, AJAX is a very buzzwordy “Web 2.0” technology. But it was used all wrong on this site. It added nothing to the functionality or usability of the site, at the cost of making it inaccessible to a substantial segment of potential users¹. If you’re into SEO, this choice of technology also killed that by making the site essentially invisible to search engines.

I’m sure you’ve also seen sites which use Flash, Silverlight, or another “more interactive” technology in a way which does nothing beyond what traditional HTML can provide and makes no use of interactivity at all. Once again, such designs limit the ability of users to access the site and its content, shut out search engines from indexing the site, and add no actual value.

This is not to say that AJAX or Flash or Silverlight or whatever are intrinsically evil, or even suspect, but rather that you need to know what you’re trying to accomplish with your site or software and then choose appropriate technology - and an appropriate use of that technology - to support that goal. Just using technology for its own sake or because it’s the hottest thing at the moment will, in 99% of cases, do nothing to support the software’s purpose and is likely to detract from it.

What’s the most useless (mis-)use of technology you’ve seen lately?

 


¹ w3schools.com estimates that 5% of users currently have JavaScript disabled, but most figures I’ve found from web server admins who have checked put it at 13-20% or more of their users. Even if it is only 5%, though, do you really want to shut out one of every twenty potential visitors to your site?