Managing the Uncertainty of Legacy Code: Part 1

On June 3, 2020, TechTalk hosted a meetup at which I spoke about managing the various kinds of uncertainty that we routinely encounter on projects that involve legacy code. I presented a handful of ideas for how we might improve our practices related to testing, design, planning, and collaboration. These ideas and practices help us with general software project work, but they help us even more when working with legacy code, since legacy code tends to add significant uncertainty and pressure to every bit of our work. Fortunately, we can build our skill while doing everyday work away from legacy code, then exploit that extra skill when we work with legacy code. Here are some questions that came up during this session and some answers to those questions.

You’ll find the remaining articles in this series here as they are released.

One of the issues is that the legacy code base consists of useful code and dead code and it’s hard to know which is which.

Indeed so. Working with legacy code tends to increase the likelihood of wasting time working with dead code before we feel confident to delete it. I don’t know how to avoid this risk, so I combine monitoring, testing, and microcommitting to mitigate the risk.

Microcommits make it easier to remove code safely because we can recover it more safely. Committing frequently helps, but also committing surgically (the smallest portion of code that we know is dead) and cohesively (portions of code that seem logically related to each other) helps. If our commits are more independent, then it’s easier to move them backward and forward in time, which makes it easier to recover some code that we mistakenly deleted earlier while disturbing the live code less. We will probably never do this perfectly, but smaller and more-cohesive commits make it more likely to succeed. This seems like a special case of the general principle that as I trust my ability to recover from mistakes more, I feel less worried about making mistakes, so I change things more aggressively. When I learned test-driven development in the early years of my career, I noticed that I become much more confident to change things, because I could change them back more safely. Practising test-driven development in general and microcommitting when working with legacy code combine to help the programmer feel more confident to delete code—not only code that seems dead.

Even with all this, you might still feel afraid to delete that code. In that case, you could add “Someone executed this code” logging statements, then monitor the system for those logging statements. You could track the length of time since you last saw each of these “heartbeat” logging messages, then make a guess when it becomes safe to delete that code. You might decide that if nothing has executed that code in 6 months, then you judge it as dead and plan to remove it. This could never give us perfect confidence, but at least it goes beyond guessing to gathering some amount of evidence to support our guesses.

More testing, especially microtesting, puts more positive pressure on the design to become simpler: less duplication, better names, healthier dependencies, more referential transparency. I have noticed a pattern: as I simplify the design, I find it easier to notice parts that look irrelevant and I find it clearer that those parts are indeed dead code. Moreover, sometimes obviously dead code simply appears before my eyes without trying! This makes it safer to delete that code, using the microcommitting and monitoring as a recovery strategy in case I get it wrong.

So not all legacy code adds value to the business… but it is hard to know which part does.

Indeed so. We have to spend time, energy, and money to figure this out. I accept responsibility as a programmer to give the business more options to decide when to keep the more-profitable parts running and to retire the less-profitable parts. As I improve the design of the system, I create more options by making it less expensive to separate and isolate parts of the system from each other, which reduces the cost of replacing or removing various parts. Remember: we refactor in order to reduce volatility in the marginal cost of features, but more-generally in the marginal cost of any changes, which might include strangling a troublesome subsystem or a less-profitable feature area.

The Strangler approach describes incrementally replacing something in place: adding the new thing alongside the old thing, then gradually sending traffic to the new thing until the old thing becomes dead. Refactoring the system to improve the health of the dependencies makes this strangling strategy more effective, which gives the business more options to replace parts of the legacy system as they determine that a replacement would likely generate more profit. As we improve the dependencies within the system, we give the business more options by reducing the size of the smallest part that we’d need to replace. If we make every part of the system easier to replace, then we increase the chances of investing less to replace less-profitable code with more-profitable code.

This illustrates a general principle of risk management: if we don’t know how to reduce the probability of failure, then we try reducing the cost of failure. If we can’t clearly see which parts of the legacy code generate more profit and which ones generate less, then we could instead work to reduce the cost of replacing anything, so that we waste less money trying to replace things. This uses the strategy outlined in Black Swan of accepting small losses more often in order to create the possibility of unplanned large wins.