
  Legacy Modernization: This is fine!

    [Image: Timeline of the Universe. Courtesy: Wikimedia Commons]

    Let’s say you have a legacy piece of software, and something about it isn’t working. Maybe it’s too expensive to maintain, or nobody knows how, or it’s impossible to hire developers who know COBOL. Maybe it’s unreliable or unscalable.

    • The Big Bang approach is where you build a whole new piece of software, doing your best to replicate the old one, and then on one terrifying day you disable the old software and turn on the new.
    • The Strangler Fig pattern is the generally accepted way to modernize legacy software into a microservices architecture. It involves breaking services off of the legacy software and replacing them one by one with new microservices (there’s a minimal routing sketch just below). This has the advantage of making smaller, incremental changes, with correspondingly smaller and more frequent risks.
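
    To make that routing step concrete, here’s a minimal sketch in Python. The route prefixes and backend hosts are hypothetical placeholders, not anything from a real system; the point is only that traffic for services that have already been carved off goes to the new code, while everything else still reaches the legacy system.

    # A minimal sketch of a Strangler Fig routing façade. The backends and
    # migrated route prefixes are made-up placeholders.
    MIGRATED_PREFIXES = {
        "/billing",   # hypothetical first service carved off the legacy system
        "/invoices",  # hypothetical second service
    }

    LEGACY_URL = "http://legacy.internal"     # placeholder hosts
    NEW_URL = "http://services.internal"

    def backend_for(path: str) -> str:
        """Return the backend that should handle a given request path."""
        if any(path.startswith(prefix) for prefix in MIGRATED_PREFIXES):
            return NEW_URL
        return LEGACY_URL

    if __name__ == "__main__":
        for path in ("/billing/123", "/reports/summary"):
            print(path, "->", backend_for(path))

    As more services are broken off, more prefixes move into the migrated set, until the legacy backend handles nothing and can be retired.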

    We’ve talked a bit about why it’s tempting to start off with the easy-looking Big Bang approach. It seems simpler, and it’s appealing to avoid messing with your legacy system; after all, why make changes to the thing you’re trying to get rid of?

    So, you’re attempting a Big Bang start-over project. What now? Typically, the project starts down the risky path and continues that way for a while. You’re probably even making progress: getting things off the ground, gathering requirements, starting to re-implement everything. If you’re very, very lucky, you complete your project under budget and on time, flip the switch, and everyone migrates to the new system with a minimum of fuss. I’m sure this does happen sometimes; the Big Bang is risky, but it’s not impossible.

    How do you know, then, if you’re in trouble? What are the risks, and what do they look like before they sink the project?

    Are we there yet?

    Big Bang projects often take years to get to their Minimum Viable Product (MVP). During that time, the organization is putting a lot of resources into the project, and there are no results to speak of. If you’re clever, you can claim small wins and keep stakeholders interested, but even so, if you’re pinned down and asked “Are we there yet?” your answer is “no.”

    Part of the problem here is that it’s really hard to estimate the time frame of a modernization project. Often you’re modernizing because nobody understands the existing system, and gathering your current requirements is a large part of the project. What’s more, figuring out your requirements can often be a matter of archaeology: “Hey, what’s this section of code?” followed by “Hey, is anybody still using this feature I discovered?” And without firm requirements, how can you guess how long it will take to complete the rewrite?

    So projects are left with either a really wide time frame (“It could take anywhere from one week to six years”) or they take the Engineer Scotty Method and estimate the top end (“It’ll take six years, cap’n!”). In either case, an astute stakeholder would be extremely skeptical of such a long-running modernization project, but sometimes they just have no other choice.

    A lot can happen over the course of a multi-year project. Budgets tighten, and unfinished projects are often among the first to be cut. Personnel change, and new managers might not buy into a years-long project with no tangible results. Attitudes change; people who agreed to the project in the past may no longer believe in it.

    The real danger here, though, is when the project exceeds its time estimates. The original estimate might have been too optimistic. New requirements could be discovered in the course of code-spelunking. Requirements can change: there could be new legal or security requirements, or workflows might have shifted. Any of these can push the project past its original schedule.

    And when the project extends beyond its schedule, that’s when patience really gets thin.

    So, if you’re part of a legacy modernization project, how do you identify and head off these sorts of problems? Unfortunately, I don’t have much of a solution. If you have multi-year time estimates, I’d say your risk of losing momentum is pretty high. But the main thing is to keep an eye on stakeholders’ temperature with regard to the project: Are people getting impatient? Are budgets in doubt? Do requirements change frequently?

    What if it also did this?

    Another big risk is scope creep. With any project, you run the risk of stakeholders asking for new features before you’re even done with your MVP. Ideally, you’d have a product manager who can push back on scope changes, but even then it can be difficult to say no.

    Obviously, these types of scope changes can extend your time frame and stretch your developer resources. They’re challenging, and they’re not unique to legacy modernization projects. However, the extremely long time to MVP on a Big Bang project brings a correspondingly high risk of scope creep: the longer you go before delivering, the more chances stakeholders have to ask for more.

    What do you mean it’s supposed to do that?

    Your Big Bang project has carefully gathered requirements, worked with users to verify that every use case is accounted for, and combed through the old code to find everything the legacy system used to do. You’ve painstakingly recreated all of the old features and maybe discarded a few that nobody needs anymore. You’ve maintained stakeholder momentum for months or years and managed to last all the way to the end of your long timeline. You’re ready to switch over from the legacy system to the replacement!

    Now comes perhaps the biggest risk of the project: Day One. There are a million things that can go wrong: mistaken requirements, mis-implemented features, unmigrated or incorrectly migrated data, incorrect integrations with outside systems, networking problems, scaling problems, features marked for deletion that – oops! – it turns out Accounting is actually using. Many of these can be discovered with careful testing, but not all of them. Flipping the switch is the moment of truth: the moment when all of the changes first encounter a real user.
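
    Many of the data problems in particular can be caught with a reconciliation pass before the cutover. Here’s a minimal sketch in Python; the fetch_* functions are hypothetical stand-ins for whatever actually reads from the legacy and new data stores, and in a real migration they would query the two systems rather than return hard-coded rows.

    # A minimal sketch of a pre-cutover data reconciliation check. The two
    # fetch_* functions are placeholders for reads against the real stores.
    def fetch_legacy_rows():
        return {"42": {"name": "ACME Corp", "balance": 1200}}

    def fetch_new_rows():
        return {"42": {"name": "ACME Corp", "balance": 1500}}

    def reconcile(legacy, new):
        """Report rows that are missing from, or differ between, the two systems."""
        problems = []
        for key, legacy_row in legacy.items():
            if key not in new:
                problems.append(f"row {key} is missing from the new system")
            elif new[key] != legacy_row:
                problems.append(f"row {key} differs: {legacy_row} vs {new[key]}")
        for key in new.keys() - legacy.keys():
            problems.append(f"row {key} only exists in the new system")
        return problems

    if __name__ == "__main__":
        for problem in reconcile(fetch_legacy_rows(), fetch_new_rows()):
            print(problem)

    A check like this won’t catch mistaken requirements or integration surprises, but it can turn a Day One data incident into a pre-launch bug report.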

    It almost goes without saying that frequent deployments and short user-feedback loops are considered best practices for a reason. User feedback is the gold standard for measuring the success of a product, and the more changes you’ve made between rounds of feedback, the greater the risk of doing things that don’t align with expectations. With potentially years between feedback, you’re running an awfully high risk!

    Moreover, the blast radius of an all-at-once deployment is the entire application. If you release a single feature, you likely won’t cause other features to fail. If you release everything, anything can fail. And when the blast radius is huge, so is the area you have to search when troubleshooting whatever has failed.
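
    This is also why per-feature rollout keeps the blast radius small. Here’s a minimal feature-flag sketch in Python, with hypothetical feature names and percentages: each feature is released to a fraction of users independently, so a bad release affects one feature (and a slice of users) rather than the whole application.

    # A minimal feature-flag sketch. Flag names and rollout percentages are
    # hypothetical; users are bucketed deterministically by hashing.
    import hashlib

    ROLLOUT_PERCENT = {
        "new_invoice_screen": 10,   # only 10% of users see the rewritten screen
        "new_search": 100,          # fully rolled out
    }

    def is_enabled(feature: str, user_id: str) -> bool:
        """Decide whether a given user sees a given feature."""
        percent = ROLLOUT_PERCENT.get(feature, 0)
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < percent

    if __name__ == "__main__":
        print(is_enabled("new_invoice_screen", "user-123"))
        print(is_enabled("new_search", "user-123"))

    With an all-at-once cutover there’s no equivalent knob to turn: everything ships to everyone on Day One.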

    So, how do you know when you’re at risk of this type of Day One catastrophe? You’re at risk any time you’re making a lot of changes at once, but especially when you’re swapping out one application for another. Fixing it really just requires not doing that.


    This isn’t an exhaustive list of the ways you can get into hot water on a Big Bang-style rewrite, but these are some of the worst kinds of trouble you can get into. If you’re seeing these things on your project, then you probably need to change course. In the next post, I’ll touch on how to change course once you’ve gone a significant way in the wrong direction.


    Other posts in this series: