Snipping the top bit of Nicholas C. Zakas’s Top of the Month newsletter (go sign up!), with permission.
One of my favorite things in the world is painters tape (also called masking tape). It seems like something silly: some tape you put on a wall when you’re painting to avoid getting paint on the wall. The tape doesn’t have a strong adhesive, so it can be pulled back off the wall without damaging it. What I love about painters tape is the philosophy behind it: painting is messy, and rather than trying to avoid making a mess, painters tape allows you to make a mess initially and then clean it up easily. Even the best, most talented painter is going to splatter some paint here and there, get distracted, or otherwise end up with paint going where it shouldn’t. It’s a lot faster, easier, and less frustrating to use painters tape to cover up areas where paint is likely to go and then remove the tape to create a nice, clean, finished area. What does this have to do with software engineering?
Painters tape is all about a concept called fault tolerance. Instead of expecting everything to go well, you expect that there will be mistakes. When you expect there to be mistakes, you make decisions not to avoid all mistakes but rather to easily recover when a mistake occurs. Got paint where it shouldn’t be? It doesn’t matter if that spot was covered by painters tape. Forgot to put on the painters tape? Now that mistake is a bigger deal. As software engineers, we can think the same way with the code we write.
Making your code fault tolerant is about asking yourself the question: how will this fail? Not if it will fail, but, assuming that it will fail, in which ways will it fail?
Great analogy, I like it.
But having painted for several years professionally, no commercial house painter uses tape.
You get good enough with the tools you have (the brush) and technique (angling and paint volume) to paint very fast and very accurately without extraneous tools.
Another analogy there? You decide.
Oh yeah!!! Masking tape is like a ‘garbage collector’: a feature invented to clean up the mess some developers make in the system when they’re too lazy to think about better planning and solution architecture.
Maybe not a commercial painter, but it’s heavily used when painting intricate patterns on cars, motorcycles, what have you. Maybe then think of it this way: if you’re building something static/less complex (like painting a huge bland canvas), you’ll need to account for fewer mistakes. But as the project gets trickier, your fault tolerance should grow along with it.
Garbage collection doesn’t exist because of “lazy developers”… it exists because manual memory management is tedious and error-prone, and the errors have security implications that can be resolved automatically. Painters tape is not like garbage collection.
Saying a commercial painter doesn’t need “extraneous tools” is like saying “code needs no comments”.
My uncle has been a painter for about 40 years, has his own company in Germany, and uses tape a lot. Windows, sockets, etc.
But probably he’s just not as awesome as you are.
The problem with this analogy is that users often see the sloppy result of the overspray or drips. Trapping a problem and hiding it from the user doesn’t make it go away, it just makes the behavior more perplexing to the user.
Unanticipated problems are HARD. The key is to have monitoring in place to detect them so you can fix them. Trying to predict what will fail, why, how to fix it, and what to tell the user is not only a fool’s errand, it’s a waste of time. Or, in short:
“Never test for an error condition you don’t know how to handle.” — Daniel Keys Moran
Not throwing software gore at the user is basically UX and branding 101. Anticipating failure modes is not only NOT futile, but necessary to write remotely-decent software. Logging and handling exceptional state are both separate non-mutually-exclusive tools we have available. As for “making the problem more perplexing to the user”, I don’t think failing to control the presentation of a problem is going to help a single thing.
I think of a thing that happened at CodePen this year. We display grids of items, based on a query. There were rare edge cases where something was off in the data, it caused a client-side error, and the entire grid would fail/disappear. We built an error-handling boundary around individual items so that an error like this would kill an individual item but not the whole grid. Painters tape. We still log and fix the data issues.
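To make that idea concrete, here is a minimal sketch of a per-item boundary in plain TypeScript. It is not CodePen's actual code (the post doesn't show it, or say which framework is involved); the `GridItem` shape, `renderItem`, `renderGrid`, and the `onError` callback are all hypothetical names made up for illustration.

```typescript
// Hypothetical item shape; the real data model is not described in the post.
interface GridItem {
  id: string;
  title?: string;
}

// Hypothetical renderer that throws on bad data, standing in for any
// per-item rendering step that can fail.
function renderItem(item: GridItem): string {
  if (typeof item.title !== "string") {
    throw new Error(`item ${item.id} has no title`);
  }
  return `<li>${item.title.toUpperCase()}</li>`;
}

// Render every item inside its own try/catch so one bad record
// costs a single cell rather than the whole grid.
function renderGrid(
  items: GridItem[],
  onError: (item: GridItem, error: unknown) => void
): string {
  const cells = items.map((item) => {
    try {
      return renderItem(item);
    } catch (error) {
      onError(item, error); // still report it so the data can be fixed
      return ""; // drop only the failing item
    }
  });
  return `<ul>${cells.join("")}</ul>`;
}

// Usage: the middle item is malformed, the other two still render.
const failures: string[] = [];
const html = renderGrid(
  [{ id: "a", title: "one" }, { id: "b" }, { id: "c", title: "three" }],
  (item) => failures.push(item.id)
);
```

The tape is the `try`/`catch` around each item rather than around the whole grid, and the `onError` callback is the "we still log and fix the data issues" part: the failure is contained, not hidden.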