Fail fast

When failure is a possibility, design to fail fast rather than slowly. Doing so reduces the cost and impact of failure. Equally important, failing fast makes further attempts feasible, and learning from previous failures makes those attempts more likely to succeed. This principle is widely applicable in software development:
  1. Methodologies that have fail fast mechanisms baked in are more likely to generate greater ROI. More on this later.
  2. Guerrilla SOA is arguably a fail fast take on big up-front SOA.
  3. Code that is written to fail fast is likely to be more reliable in production.
  4. Small, frequent check-ins are likely to cause less overall rework than big, infrequent ones.
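
Point 3 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `load_settings`, `REQUIRED_KEYS` and the setting names are hypothetical and not taken from any real codebase. The idea is to reject a bad configuration at startup, with a clear message, rather than let a missing value surface hours later as an obscure error deep inside a request.

```python
# Hypothetical setting names, chosen only for illustration.
REQUIRED_KEYS = ("db_url", "smtp_host")


def load_settings(raw: dict) -> dict:
    """Return a copy of the settings, or raise at once if any required key is absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
    if missing:
        # Fail fast: stop here, at the boundary, and name exactly what is wrong.
        raise ValueError("missing required settings: " + ", ".join(missing))
    return dict(raw)


settings = load_settings({"db_url": "postgres://localhost/app", "smtp_host": "localhost"})
```

The slow-failing alternative would hand out `None` for the missing value and leave whoever uses it first to discover the problem.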

Verification: When and How?

But how do you decide at any given checkpoint whether you have a success or a failure at hand? The quality of verification is crucial. Verification by peer review, while valuable, is prone to oversight. The proof of the pudding is in the eating. The more times you get to eat, the better. The analog of eating here is testing functionality. A truly iterative process of software development, where functionality gets tested iteratively, is likely to achieve better ROI (everything else remaining constant).



Okay, so no one uses waterfall anymore. But we still have projects where big up-front analysis and design are the norm and continuous integration means a weekly build. In such cases, we only have limited verification (peer reviews of requirements, design and code) until the very end. Failures (if any) are slow and horrible.

Incremental agile is what almost all XP and Scrum teams follow. They run through the stories for a release doing just-enough, just-in-time analysis, design and coding per story. The boundaries between design and code are often blurred, but that is not material to this illustration. Truer verification now becomes possible at the end of every story (QA/customer testing/sign-off). However, each story still gets only one attempt. Any changes (learnings?) after that go back into the backlog to be prioritized and taken up with everything else.

But as Jeff Patton points out, it is possible to view each story as a series of progressive enhancements:
  1. Necessity - core functionality (e.g. user registration)
  2. Safety - validations etc. (e.g. confirm via email, add a captcha)
  3. Flexibility (e.g. support OpenID)
  4. Luxury (e.g. add ajaxy feedback on available userids, password strength)
Yeah, the example is a bit contrived (OpenID would mostly be another story even in incremental mode) but you get my drift. We can now design a release plan that allows the team to iterate on a story, progressively enhancing it. The story sponsor reviews (tests) each story multiple times. Failures (if any) are faster and cheaper. The team learns better. I like this line from a Werner Vogels interview:

With a new radical service, you try to go into prototype mode pretty quickly, and then you start iterating on that until you feel that you understand your business problem.

Stories in regular business applications may not qualify as radically new but very often the team is new to the application in question. "You only understand it when you do it" is a much under-appreciated truth of all knowledge activity (if not all activity).

Expected cost/time

It may be argued that for a given chunk of functionality, iterating increases overall cost/time as compared to doing it in one go. This is only true when the risk of failure is zero. For situations of non-zero risk, the expected cost/time can often be lower with an iterative approach. The table below shows this for a hypothetical but realistic risk profile where risk decreases as learning increases. Your mileage will vary depending on the risk profile of your team and functionality.
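
The argument can also be checked with a small calculation. What follows is my own simplified model with made-up numbers, not a reproduction of any published figures: `expected_cost`, the `RISKS` profile and the 10% per-iteration overhead are all assumptions. Work is split into equal steps, each step is verified when it is done, a failed step is redone, and the chance of failure drops with every failure already learned from.

```python
from functools import lru_cache


def expected_cost(total_cost, iterations, risks, overhead=0.0):
    """Expected cost of delivering `total_cost` of work in `iterations` verified steps.

    risks[f] is the chance that a step fails verification after f earlier
    failures. The last entry must be 0 so that the work eventually succeeds.
    `overhead` is the extra fraction of cost each step pays for being a separate step.
    """
    if risks[-1] != 0:
        raise ValueError("last risk must be 0 so the recursion terminates")
    step_cost = total_cost / iterations * (1 + overhead)

    @lru_cache(maxsize=None)
    def cost(remaining, failures):
        if remaining == 0:
            return 0.0
        risk = risks[min(failures, len(risks) - 1)]
        # On failure the same step is redone, but with one more lesson learned.
        redo = risk * cost(remaining, failures + 1) if risk else 0.0
        # On success the team moves on to the next step.
        proceed = (1 - risk) * cost(remaining - 1, failures)
        return step_cost + redo + proceed

    return cost(iterations, 0)


# Hypothetical risk profile: 50% on the first try, falling as the team learns.
RISKS = (0.5, 0.25, 0.1, 0.0)

one_shot = expected_cost(100, 1, RISKS)
iterative = expected_cost(100, 4, RISKS, overhead=0.10)
no_risk_one_shot = expected_cost(100, 1, (0.0,))
no_risk_iterative = expected_cost(100, 4, (0.0,), overhead=0.10)
```

With this profile the four-step plan comes out cheaper in expectation than the single attempt, even though every step pays the overhead. With zero risk the overhead is pure cost and doing it in one go wins. Change the numbers and the conclusion can flip, which is the point about your mileage varying.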

Failure is not an option

Sometimes you hear sponsors say they don't care about downside risk because failure is not an option. Some of these projects run into a death march, followed by a blowout, followed by the departure of key people. Then a new team and a new IT partner get to do it all over again.
