Many software projects start off with the best of intentions, such as a clean architecture, clear goals, and stated objectives, but not all of them do. Moreover, of the ones that do, not all of them stay that way forever.
With time, feature requests, financial pressures, competing priorities, and changing developers, it is highly likely that what began as a shining example of code quality eventually becomes a monolith.
Monolithic codebases are, by their very nature, hard to maintain. This can be for any number of reasons, including functions that:
Do too much.
Know too much.
Have too many responsibilities.
Rely on (or rely too heavily on) global state; and
Aren't testable.
In these situations, if you change something, very often something else -- in a section of the application that has no apparent connection or association with the code that was changed -- breaks.
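To make the global-state problem a little more concrete, here is a minimal sketch. All of the names in it (`TAX_RATE`, `price_with_tax_global`, `price_with_tax`) are hypothetical and invented purely for illustration; they don't come from any real application.

```python
# Hypothetical example: every name here is invented for illustration.

TAX_RATE = 0.2  # module-level (global) state that any code can change


def price_with_tax_global(net: float) -> float:
    """Depends on hidden global state, so its result can change unexpectedly."""
    return round(net * (1 + TAX_RATE), 2)


def price_with_tax(net: float, tax_rate: float) -> float:
    """Takes everything it needs as arguments, so it is easy to test."""
    return round(net * (1 + tax_rate), 2)
```

If some distant part of the application reassigns `TAX_RATE`, every caller of the first function quietly gets a different answer. The second function can only be affected by the arguments it is given, which is what makes it straightforward to test in isolation.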
For these reasons and a host of others, the code becomes fragile, and people commonly believe that the best course of action is to rewrite applications from scratch. However, rewrites often end up as expensive failures.
Not sure why? Then ask yourself the following questions:
How long would it take to recreate the existing level of functionality?
Is this less or more time than removing the technical debt in the existing application?
Can you cope with a lack of visible forward progress for that amount of time?
Do you have the resources and time to fix critical and security bugs on the old system, while rewriting it?
Will a new system be better than the existing one, or will it reimplement the same mistakes?
Will the new system reach feature parity with the existing one?
Can you maintain two codebases and two development teams?
Will you carry over the lessons learned from the old system?
I'm not saying that refactoring an existing system is always the best choice. However, while not as flashy or attention grabbing, it can often be the less costly choice and the more sane approach.
With that said, I'm now going to step you through the essentials of how to refactor a monolithic codebase.
One: Do You Understand the Application?
Before you do anything, ask yourself what you know about the application. While it's often tempting to dive in and just get coding, that's the worst thing to do. The best thing you can do is to learn as much as you can about it first.
If your monolith is like so many others, there's likely no single, centrally organized knowledge store. Instead, information will be stored in a wide variety of very disparate locations. These will likely include the following:
In the minds of previous developers, business owners, managers, project managers, and other stakeholders
Code comments
Code commit messages
TODOs
One or more README files
Code documentation
One or more wikis
A bug-reporting tool
Find as much of this information as you can and bring it together in one central location. As you're doing this, here is a series of questions to ask to help uncover as much as you can:
Why was the application created?
Who wanted it built?
Who worked on it?
What is it meant to do?
What are its key features?
What are its additional features?
What are its top bugs?
What are its supplementary bugs?
Hopefully, this list will inspire you to ask a host of follow-up questions, which will let you find out all that there is to know.
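Some of that scattered knowledge, such as TODOs and code comments, can be collected mechanically. The following is a rough sketch of one way to do that, not a finished tool. The function names (`find_markers`, `scan_tree`), the list of markers, and the file suffixes are all assumptions that you would adapt to your own codebase.

```python
import re
from pathlib import Path

# Markers to look for; adjust to whatever conventions your codebase uses.
MARKER = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b(.*)")


def find_markers(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, marker, note) for each line containing a marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2).strip(" :-")))
    return hits


def scan_tree(root: str, suffixes: tuple[str, ...] = (".py", ".php", ".js")) -> None:
    """Print every marker found in files under root with a matching suffix."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, marker, note in find_markers(text):
                print(f"{path}:{number}: {marker}: {note}")
```

Calling `scan_tree(".")` from the project root prints one line per marker, which you can paste into your central knowledge store alongside everything else you've gathered.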
Two: Is It Under Version Control?
With knowledge of the application acquired, ask if its source code is stored under version control. If not, then get it under version control straightaway! The last thing you want to do is to make any changes and not be able to revert them.
I strongly encourage you to use Git, but Mercurial is another excellent choice if you have an aversion to Git. I'd also encourage you to store it in a remote repository as well, whether that's GitHub, Bitbucket, GitLab, or one of the myriad other code-hosting services.
Three: What Is the State of the Test Suite?
Next, what level of test coverage is in place? Depending on the age of the application, the number of developers who've worked on it (and their skill levels), the way those developers were employed, etc., there may be no test suite in place.
If this is the case, you're going to have to put a basic test suite in place before you can begin. If you don't, you'll never be sure about the impact of the changes that you make. If a test suite is already in place, ask yourself the following questions:
What level of coverage is available?
How long does the test suite take to complete?
Do the tests complete or do they exhaust available memory?
How many tests fail?
How many tests are skipped?
How many tests are out of date?
Is there a mix of unit, integration, and functional tests?
Are there sections of the codebase that have no tests?
Are there comments such as "Quick Hack to be fixed later"?
Are there comments such as "Whatever You Do DO NOT TOUCH!"?
What comments exist in the tests?
Is the test suite run with full error reporting enabled?
If done well, your test suite should help you get an understanding of how the code works far quicker than diving into every class file. Take the time to read through and thoroughly understand the tests.
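If there are no tests at all, one common starting point is a characterization test, which simply records what the code does today so that you notice when a change alters it. Here is a minimal sketch using Python's built-in `unittest` module. The function `legacy_shipping_cost` is a hypothetical stand-in for a piece of untested legacy code, and the expected values were obtained by running it, not from any specification.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical stand-in for an untested legacy function."""
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5
    if weight_kg > 20:
        cost = cost + 10
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down what the code does today, not what it ought to do."""

    def test_standard_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, False), 9)

    def test_express_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, True), 13.5)

    def test_heavy_parcel_surcharge(self):
        self.assertEqual(legacy_shipping_cost(25, False), 65)
```

Saved in a file, these tests can be run with `python -m unittest`. They make no claim that the current behavior is right; they only tell you when a refactoring has changed it.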
Four: What Static Analysis Is in Place?
Now that you've learned more about the application and have a handle on the test coverage, is static code analysis used? If you're not familiar with it, static code analysis is:
...the analysis of computer software that is performed without actually executing programs. In most cases, the analysis is performed on some version of the source code, and in the other cases, some form of the object code.
By regularly running a static-code analyzer over your code, such as Phan, you can help ensure that code quality is improving, not declining. You can also trace the source of bugs back to specific commits that introduced them.
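To show what "without actually executing programs" means in practice, here is a toy checker built on Python's standard `ast` module. It is only a sketch of the idea and no substitute for a real analyzer. The function name `check_source`, the two rules it applies, and the 40-line threshold are my own assumptions for the example.

```python
import ast


def check_source(source: str, max_lines: int = 40) -> list[str]:
    """Report long functions and uses of `global` without running the code."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                warnings.append(
                    f"line {node.lineno}: function '{node.name}' is {length} lines long"
                )
        elif isinstance(node, ast.Global):
            names = ", ".join(node.names)
            warnings.append(f"line {node.lineno}: global statement for {names}")
    return warnings


# A tiny sample to analyze; it is parsed, never executed.
SAMPLE = '''
counter = 0

def bump():
    global counter
    counter += 1
'''

if __name__ == "__main__":
    for warning in check_source(SAMPLE, max_lines=2):
        print(warning)
```

The sample source is only ever parsed into a syntax tree and inspected, which is the essence of static analysis. Dedicated tools apply far more rules, and far more sophisticated ones, in the same spirit.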
If your code isn't already using one, there are a host of third-party packages and online services you can use, regardless of your programming language(s).
Five: Start Refactoring
Now that your team has as much information to hand as can be expected, it is time to get started refactoring the application. So that you do it correctly, let's discuss some sage advice that I came across recently:
There is no perfect design, there is only a better design
Accept that your code will never be perfect. Refactoring will help you to improve it continuously, so that it is simpler, more readable, more maintainable, and more testable than it was before, but the task will never end.
You may always feel that you can do better, but there comes a time when you have to accept that, at least for the time being, it is as good as it can be.
At this point, you have to discipline yourself to leave it and move on to something else. Don't fall into the trap of "just making it a bit better." You've improved it. It is better than it was. Let it go and move on.
The key principle to cleaning up a complex codebase is to always refactor in the service of a feature
Refactoring can be perceived negatively if the changes being made are either trivial or purely aesthetic. Sometimes this kind of refactoring is necessary and, over time, helps improve the quality of the codebase.
However, if that's all that is being done, then the value is questionable. Instead, ensure that the changes being made primarily serve a clear and effective purpose, such as creating a new feature or fixing an outstanding bug or defect.
Six: Have a Refactoring Project Plan
Now that you're fully aware of how the application works, it’s time to start refactoring it. However, you have to have a plan! What goes into such a plan? This talk from Mozilla recommends five key considerations:
Break into a series of achievable tasks.
Come up with a realistic timeline and resource requirements.
Work on the pieces in isolation or in parallel with other projects.
Staff it seriously.
If working in parallel, account for dependencies.
Seven: Implement the Habit of Opportunistic Refactoring
Next, encourage your team to build a habit of opportunistic refactoring. If you've not heard of the term, it was coined by Martin Fowler. Here's how he describes it:
Any time someone sees some code that isn't as clear as it should be, they should take the opportunity to fix it right there and then -- or at least within a few minutes. This opportunistic refactoring is referred to by Uncle Bob as following the Boy Scout rule -- always leave the code behind in a better state than you found it.
While not a silver bullet, with these regular little cleanups, the code's quality should always improve, and there should be no need to dedicate sprints to code cleanup, as it will be in a continuous state of improvement.
Eight: Use Dedicated Refactoring Tools
One of the beauties of refactoring is that you don't need specific tools to do it. This is because, as Martin Fowler says, if you "take small steps and test frequently," then you should be fine. That said, refactoring manually will be a slower process and take more diligence, yet it's still achievable.
However, if you’re already experienced with refactoring, why not save yourself time and effort and make use of the tools that are built into, or available for, the major IDEs and text editors? Regardless of your approach, however, remember to take it slow and test, test, test.
Also, as you make each change, review your test suite. Does it need new tests? Have you uncovered a bug in another part of the system that is related to what you were working on? Then add one or more tests for it. Are some of your existing tests no longer relevant? Then remove them. Always ensure that your test suite stays up to date.
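As an illustration of taking small steps and testing frequently, here is a sketch of a single "extract function" refactoring done by hand. The functions and sample data are hypothetical. Both the old and new versions are shown side by side only so that the comparison can run; in a real codebase the new version would replace the old one.

```python
def report_total_before(lines):
    """Original version: parsing and summing are tangled together."""
    total = 0.0
    for line in lines:
        parts = line.split(",")
        if len(parts) == 2 and parts[1].strip():
            total += float(parts[1])
    return f"Total: {total:.2f}"


def parse_amount(line):
    """Step 1: the parsing logic, extracted unchanged into its own function."""
    parts = line.split(",")
    if len(parts) == 2 and parts[1].strip():
        return float(parts[1])
    return 0.0


def report_total_after(lines):
    """Step 2: the original function now delegates to the extracted helper."""
    total = sum(parse_amount(line) for line in lines)
    return f"Total: {total:.2f}"


# Re-run the same check after every small step; behavior must not change.
SAMPLE_LINES = ["coffee, 3.50", "tea, 2.25", "malformed line", "water,"]
assert report_total_before(SAMPLE_LINES) == report_total_after(SAMPLE_LINES)
```

Each step is small enough to review at a glance, and the assertion at the end plays the role of the test suite: if it fails, you know the last step is the one that broke something.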
Where To From Here?
While this article hasn't delved too deeply into the specifics of refactoring, it has shown a series of eight principles that you can follow to approach the task correctly. It may be very tempting to rewrite your application from scratch, but I’d caution you to advance slowly to that conclusion.
Yes, it’s the trendy solution, but it’s not always the correct one. Take your time, learn about your application, and make sure that a rewrite is worth doing first. If you believe that it is, then go ahead.
Otherwise, step through the advice that I’ve presented and make the most of that treasure trove of an application, maintain continuity of service, and remove the technical debt, one line at a time.