Validation Vexation

Be liberal in what you accept, and conservative in what you send.
- Postel’s Prescription

Previous posts here have discussed reducing the burden of data entry on your application’s users by cutting down on the number of items they are required to provide. Even when something is required, let it remain incomplete for as long as possible, and enforce that “required” status only when the value is actually needed.

But what about the values that are entered, whether required or not?

The next step in minimizing the burden placed upon your users is to “be liberal in what you accept”. Do not impose validation rules which must be satisfied unless they are unavoidable. If you really must require them, make sure that the rules themselves are complete and correct.

Don’t Validate Formatting

The biggest, most common, and most annoying pointless validation rules are those which restrict how the user must format his entry. Any programming language worth its salt can effortlessly remove spaces from a string or insert a space after every fourth digit, so there is absolutely no good reason for credit card validations to care whether you enter the card number as “1234 5678 9012 3456”, as “1234567890123456”, or even as “12 3 45678901 234 56”.
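As a minimal sketch of that idea in Python (the function names `normalize_card_number` and `group_in_fours` are made up for illustration, and the choice to tolerate hyphens as well as spaces is my assumption), the cleanup could look something like this:

```python
def normalize_card_number(raw: str) -> str:
    """Return only the digits of a card number, however the user grouped them.

    Assumption: spaces and hyphens are harmless separators; any other
    non-digit character is treated as a genuine typo and rejected.
    """
    digits = raw.replace(" ", "").replace("-", "")
    # isascii() guards against non-ASCII digit characters that isdigit() accepts
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("card number may contain only digits, spaces, and hyphens")
    return digits


def group_in_fours(digits: str) -> str:
    """Insert a space after every fourth digit, for display purposes."""
    return " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))
```

All three spellings of the number above normalize to the same sixteen-digit string, and `group_in_fours` can put the spaces back whenever you want to show the number to a human. Checking that the digits form a plausible card number is a separate step that should happen after this one.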

Phone numbers can trip you up here, too. Does the program want them entered as (555)555-1212 or 555-555-1212 or +15555551212 or what? The correct answer, again, is that it should accept any of these formats (and more), which is easily implemented by simply removing any non-numeric characters and then, for US numbers, adding or removing a leading “1”, depending on whether you want the country code in your database. Stripping the punctuation also accommodates international phone numbers, which may have their digits grouped in patterns you don’t expect.
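Here is a rough Python sketch of that approach. The name `normalize_phone` and the `keep_country_code` flag are hypothetical, and the sketch assumes that a 10-digit number, or an 11-digit number starting with “1”, is a US-style number. Anything else is handed back as bare digits rather than rejected, since this code knows nothing about other countries’ rules.

```python
import re


def normalize_phone(raw: str, keep_country_code: bool = True) -> str:
    """Strip all formatting from a phone number and normalize the leading 1.

    Assumption: 10 digits, or 11 digits starting with "1", means a US-style
    number. Other lengths are returned as digits only, untouched.
    """
    digits = re.sub(r"[^0-9]", "", raw)  # drop parentheses, dashes, spaces, "+"
    if len(digits) == 11 and digits.startswith("1"):
        national = digits[1:]
    elif len(digits) == 10:
        national = digits
    else:
        return digits  # not recognizably US-style; leave the digits alone
    return "1" + national if keep_country_code else national
```

With this, (555)555-1212, 555-555-1212, and +15555551212 all end up stored identically. One limitation worth stating: a foreign number that happens to be 10 digits long would be mistaken for a US one, so a real system would want to know the country before touching the country code.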

As long as it’s the right set of digits, let the user group them however he wants to.

Remember The Rest Of The World

When I moved from the US to Sweden, I paid my bank a visit to change my address with them. It went reasonably well and their system was smart enough to recognize that, since my new address was outside the US, the “state” could be left blank.

But it still required a ZIP code. Beyond that, it required the ZIP code to be formatted as a string of five digits with no intervening characters. While it does just so happen that my Swedish postal code is five digits, they’re grouped “226 47”. Also, the postal code goes before the city here, so we ended up having to enter the city as “226 47 Lund”, with a ZIP code of “00000”. This is obviously incorrect, but it was the only option for satisfying the US-centric validation rules.
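One way to avoid that trap, sketched in Python under some loud assumptions: the `POSTAL_RULES` table and the `normalize_postal_code` function are invented for this example, the table covers only two countries, and any country without a rule is simply trusted.

```python
import re

# Hypothetical rule table: only countries whose format we actually know.
POSTAL_RULES = {
    "US": re.compile(r"[0-9]{5}([0-9]{4})?"),  # ZIP or ZIP+4, separators removed
    "SE": re.compile(r"[0-9]{5}"),             # five digits, usually written "226 47"
}


def normalize_postal_code(country: str, raw: str) -> str:
    """Validate a postal code only against the rules of its own country."""
    cleaned = raw.strip()
    rule = POSTAL_RULES.get(country.upper())
    if rule is None:
        # No rule for this country: accept what the user typed.
        return cleaned
    compact = re.sub(r"[\s-]", "", cleaned)  # the grouping is the user's business
    if not rule.fullmatch(compact):
        raise ValueError(f"{raw!r} does not look like a postal code for {country}")
    return compact
```

The point of the design is that the US rule is applied only to US addresses, and the grouping of the digits is never part of the check.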

Trust the User

Speaking of ZIP codes, ZIP code-to-city databases are pretty nice. They can be great for letting users enter their ZIP code and then automatically filling in a default city for them.

Note the word “default”. The last US city I lived in was Eagan, MN, ZIP code 55122. About half the ZIP code databases out there correctly list 55122 as Eagan. The other half list it as St. Paul, which is a good 10 miles north of Eagan. I knew I lived in Eagan, and the post office knew I lived in Eagan, but several websites insisted I was in St. Paul and provided no way for me to correct their mistake.

Looking up data instead of making the user enter it is good, but your database will be wrong sometimes. Allow the user to correct the lookup results when this happens.
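In code, treating the lookup as a default rather than a verdict can be as simple as the following Python sketch. The `ZIP_TO_CITY` table and the `resolve_city` function are hypothetical stand-ins for whatever lookup you actually use, and the table deliberately contains the wrong entry described above.

```python
from typing import Optional

# Stand-in for a real ZIP database, including its mistake.
ZIP_TO_CITY = {"55122": "St. Paul"}


def resolve_city(zip_code: str, user_city: Optional[str] = None) -> str:
    """Return the city to store: the user's entry wins over the lookup."""
    if user_city is not None and user_city.strip():
        return user_city.strip()
    # Nothing typed by the user: fall back to the lookup as a suggestion.
    return ZIP_TO_CITY.get(zip_code, "")
```

Called with only the ZIP code, this suggests St. Paul. Called with the user’s correction, `resolve_city("55122", "Eagan")`, it stores Eagan and the database’s opinion is ignored.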

 
I do have a bit more to say on ways that data validation can go wrong, but that’s a topic for another post. Until next time, what issues have you run into where too-strict or too-lax data validation has caused you problems?
