Yesterday we held our first Legacy Coderetreat in Budapest. While it is very similar in concept to a regular Coderetreat, there are some very important differences. We start the day by introducing an existing legacy code base - a rather ugly one - and let the pairs figure out what it does. We used the trivia code base on GitHub, so that participants could choose from a large variety of languages and still work on approximately the same code.
Michael Feathers defines legacy code as "code without tests". So after the first session of getting to know the code, the next session was about adding characterization tests that would help us during the remaining sessions. Since the only UI of the application is its log messages, even though they are scattered all around the code base, a relatively safe test suite is to generate runs with random inputs and save the output for reference. Such tests won't help in figuring out where an error is, but they will catch most of the mistakes one makes during refactoring. They can also help when adding behaviour, but when fixing bugs they will break and need to be replaced.
There is an important gotcha related to characterization tests worth noting: with the same language and the same seed you may end up with different random numbers on different computers. Actually, just updating your compiler to a newer version can be enough to get you in trouble. So always store the input data, and do not rely on seeding the random generator.
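To make the advice about storing the input data concrete, here is a minimal sketch in Python. It is not taken from the trivia code base: `play_game` is a hypothetical stand-in for legacy code whose only output is printed log lines, and the file layout and function names are my own assumptions. The point is only that the randomly generated inputs are written to disk next to the captured output, so replaying them never depends on a seed.

```python
import contextlib
import io
import json
import random
from pathlib import Path


def play_game(rolls):
    """Hypothetical stand-in for the legacy code: it only prints log lines."""
    position = 0
    for roll in rolls:
        position = (position + roll) % 12
        print(f"Rolled a {roll}, new location is {position}")


def capture_output(rolls):
    """Run the legacy code and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        play_game(rolls)
    return buffer.getvalue()


def record_golden_master(path, runs=20, rolls_per_run=30):
    """Generate random inputs once and store them together with the output."""
    cases = []
    for _ in range(runs):
        rolls = [random.randint(1, 6) for _ in range(rolls_per_run)]
        cases.append({"rolls": rolls, "expected": capture_output(rolls)})
    Path(path).write_text(json.dumps(cases))


def check_against_golden_master(path):
    """Replay the stored inputs; return the indices of runs whose output changed."""
    cases = json.loads(Path(path).read_text())
    return [
        index
        for index, case in enumerate(cases)
        if capture_output(case["rolls"]) != case["expected"]
    ]
```

You would call `record_golden_master` once, before touching the code, and commit the resulting file. After every refactoring step `check_against_golden_master` should return an empty list. When a bug fix changes the output on purpose, the file has to be recorded again.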
The remaining sessions focused on adding behaviour and fixing bugs. Our aim was to create a realistic environment, while letting the participants experiment with techniques that would increase their confidence when changing code. Each requirement was designed to force some kind of refactoring in order to adhere to the SOLID principles, most importantly the Open-Closed principle. Unfortunately, though, most participants decided to hack the new behaviour into the existing code base.
The lesson we took home from this as facilitators is that during the first few sessions it's better to directly ask the participants to clean up a certain part of the code by extracting certain classes. The reason we tried to avoid directly requesting refactorings was that we didn't want to give the impression that refactoring existing code without a clear goal is a good idea. On the other hand, I do believe that before adding code, one should think at least twice about whether the design can be adjusted and brought to a state where the new behaviour is added as a new method or as a new class developed purely with TDD. Well... at least if you already have good characterization tests you can lean on.
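As an illustration of what "new behaviour as a new class" can look like, here is a small Python sketch. The scoring rules and all class names are invented for this example and are not part of the trivia code base. It assumes the existing scoring has first been extracted behind a small interface, under the protection of the characterization tests.

```python
from typing import Protocol


class ScoringRule(Protocol):
    """Hypothetical seam extracted from the legacy game during refactoring."""

    def points_for(self, streak: int) -> int: ...


class FlatScoring:
    """The existing behaviour after extraction: one point per correct answer."""

    def points_for(self, streak: int) -> int:
        return 1


class StreakBonusScoring:
    """The new requirement, written as a new class and developed test-first."""

    def points_for(self, streak: int) -> int:
        return 2 if streak >= 3 else 1


class Game:
    """The game only knows the interface, so new rules need no change here."""

    def __init__(self, scoring: ScoringRule):
        self.scoring = scoring
        self.score = 0
        self.streak = 0

    def correct_answer(self) -> None:
        self.streak += 1
        self.score += self.scoring.points_for(self.streak)

    def wrong_answer(self) -> None:
        self.streak = 0
```

Once the seam exists, `Game` stays closed for modification: the streak bonus lives entirely in `StreakBonusScoring`, which can be unit tested on its own without running the legacy code at all.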
All in all, I think that our first stab at this format was successful: all of us learned a lot from it. As I'm working at a company where we have more legacy code than code that is already covered with automated tests, I experience the horror of touching that kind of code on a day-to-day basis. When I started working at the company, it was daunting - even horrifying - just to fix bugs in the legacy parts, and gaining the confidence took some time. However, I also know that there are already larger and larger islands of good code, and we tend to make most of our changes in these good parts. But it takes time to create these islands and to gain that confidence, and a Legacy Coderetreat is a really great way to let people experience this within a small 8-hour time frame. This way they can go back to work on Monday with a clear goal: we will be patient, but we will slowly make things better. It won't happen overnight, but by doing yourself one small favour and making a part of the code just a little cleaner every time you change it, you can make a huge difference in the long run.