
This year I have spoken several times at various Hungarian venues about software development topics. Although these presentations were captured on video and are available on the internet, they are in Hungarian, so I decided that the next few posts will explore some of them in English, and in more depth.

I gave the Test Driven Mockery talk at the April event of PHP Meetup Budapest. Although the code examples and frameworks are in PHP, the talk is essentially language agnostic: it’s just as useful for Java, C#, Ruby or even C++ coders as it is for PHP programmers. After all, it’s about test doubles in general.

At the time of writing, more and more software developers and companies are deciding to adopt test driven development as one of their primary practices to avoid code rot and fear of change. Of course there are crowds who still haven’t even tried TDD, either because they haven’t heard about it or because it sounded too weird to try. However, there is another, far more interesting group of educated professionals who, after practicing TDD for a considerable amount of time, decide to abandon it. As I talked with a group of these people I recognized that most of the time their disappointment in the technique is easily traced back to a misunderstanding of a very important related topic: test doubles and mocking. If you completely avoid using them you end up with a slow and fragile test suite that’s hard to maintain, but if you use too many of them the promise of ease of change, well tested and flexible code becomes a lie. Finding a good balance is something of an art.


A world we wished upon

Before we dig deep into the topic let’s recap what test driven development is, and why we like it so much. Test driven development consists of a really short cycle of three steps: red, green, refactor. Red means that you write only as much of a test as is sufficient to fail, and not compiling counts as failing. Green means that you write only as much production code as is enough to pass the currently failing test. You should be done with the first two steps of the cycle within 30-90 seconds. If you exceed this time frame, chances are that you are taking too large steps. (Some suggest that you should revert in such a case.)

The third step – refactor – is the most important part of test driven development: you clean up your code while keeping all your tests green. One refactoring step should not take you more than about 30-90 seconds either, and after each one you should be able to run your tests and confirm that – while your code got more readable – everything still works. Once your code is as clean as you can make it, you go back to the first step: writing the simplest possible test that would still fail.
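To make the cycle concrete, here is a minimal sketch of a single red-green round in PHPUnit. (Calculator and its add method are made up for illustration; they are not part of the talk’s examples.)

class TestCalculator extends PHPUnit_Framework_TestCase
{
    // Red: the simplest test that fails – Calculator does not even exist yet.
    public function testAddingZeroToZeroGivesZero()
    {
        $calculator = new Calculator();
        $this->assertEquals(0, $calculator->add(0, 0));
    }
}

class Calculator
{
    // Green: just enough production code to pass the failing test.
    public function add($a, $b)
    {
        return 0;
    }
}

// Refactor: with the tests green, clean up. The next test – say
// add(1, 2) – is what forces a real implementation like "return $a + $b;".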

Does that sound crazy to you? Well… I strongly disagree. The very beauty of test driven development is that it takes you back to the old days when you first tried writing a simple program as an exercise. What you probably did was implement the simplest case, then test it out. After that you added the next interesting case and took it for another spin. If it didn’t work you fiddled around with it and tried again. What you probably did not do was clean your code, since after a few rounds testing out all the features got time consuming, and you did not want to break anything.

Test driven development formalizes this very natural process of iteratively adding the next simplest thing you can think of, and if you did not cheat it also provides a reliable automated test suite that makes it really easy to check that everything is still working, thus eliminating the fear of change. Instead of hacking new features into a design that does not support them, you can mold the design to fit the features. In fact you can have your tests drive the design: just wish for the perfect interface in your test, and it shall become reality. Your design will also be modular, since at least two applications will be using every single line of code: the production app and the test suite. Your tests will be example usages of your classes, providing living technical documentation of your interfaces that cannot get out of date without failing the build. But most importantly it puts you into a very tight feedback loop.

The loop gets loose

That last part about the short feedback loop is a pretty fragile statement though. The problem is that as your application grows, the number of your tests will get out of hand pretty quickly. A sizable application will have several hundred thousand tests. If each one of those takes 0.01 seconds to run, running the whole suite will take about an hour. Of course you can run some tests in parallel on today’s multi-core machines, but that is still pretty far from the few-second feedback loop that we strive for. The good news is that if your tests take this much time to run, you are probably doing something wasteful, like accessing the hard drive, drawing on the screen to simulate a button click or testing a web application through the server it runs on. In short: you do not want your tests to depend on slow things. In fact you do not want your tests to depend on anything apart from the production code that they test, and speed is only one of the reasons for that.

We want to avoid as many of these dependencies as we possibly can! But how do we do that? How do we decouple the tests of an application from the plethora of libraries, frameworks and APIs that we have come to rely on so heavily? How do we avoid hitting the hard disk when the production code we test needs to reach the database at some point to function properly? The answer sounds simple: decouple the application from them by inverting the outgoing dependencies, and then replace the dependent modules with test doubles.

Now I can hear some of you say that you learned at university that everything should be tested in the environment it is used in. To some extent this might be true for the relatively small number of integration tests that we write, but for the most part it’s horrible advice. Any test – no matter how well you replicate the real environment – is flawed, and can only prove that a program is incorrect. Our aim thus is not to prove correctness but to provide a safety net that works most of the time, and is fast enough that we can run it any time we wish.

Obviously doing all that requires careful layering of your application, and that is where good architectural design comes into play. Your aim should be to minimize the size of the interfaces between the DB layer, the business layer, the presentation layer, and peripheries like the web or the window manager. By doing that you make sure that you can change your mind about tools even years into the project, and that you do not need to replace too many objects in your tests, mitigating the risk of not testing the code in its original context.

I won’t go into the topic of architecture in more depth – at least not in this blog post – but it’s worth watching Bob Martin’s “Architecture, the Lost Years” on YouTube or reading his “Screaming Architecture” and “Clean Architecture” blog entries.

So what the heck is a test double?

When I was new to test driven development my first tactic was not to replace anything, and my tests did have the problems I talked about before. I realized this was not good, so what I started doing instead was passing everything up and down the call stack to the bottom of it, where my business logic passed data to my persistence objects. That concept failed miserably: apart from forcing a strong “tell, don’t ask” violation, it also pushed logic down into the plumbing code, making its tests nontrivial and numerous. In the end it solved nothing and made a huge mess.

A better method is to create an interface that the business logic can call, and then inject the database layer object in the production version and a simple surrogate in the tests. These surrogates are called test doubles: dumb implementations of an interface that the code you are testing depends on. The complexity of the implementation may vary from a simple return statement to a full blown in-memory version of your persistence layer. We will explore the possibilities later on.

Simple test doubles

For the code examples I will stick with PHP, but as a Java, C#, Ruby, Python, C++ or any other kind of developer you shouldn’t have a problem understanding them. Let’s suppose we would like to write a simple “Hello World” application. We do not want to depend on the console output, so we will wrap it up in a nice little object and even create an “Output” interface the wrapper object can implement. This is how it looks:

interface Output
{
    public function writeLine($line);
}
 
class ConsoleOut implements Output
{
    public function writeLine($line)
    {
        echo "$line\n";
    }
}
 
class Greeter
{
    private $output;
 
    public function __construct(Output $output)
    {
        $this->output = $output;
    }
 
    public function hello($user)
    {
 
    }
}

I already added the Greeter class with the output injected through the constructor, and we wish to add the following line to the hello function:

$this->output->writeLine("Hello $user!");

But to do that we will first have to write a test, which should look something like this:

class TestGreeter extends PHPUnit_Framework_TestCase
{
    /**
     * @test
     */
    public function testGreeter()
    {
        $greeter = new Greeter(/* Some test double should go in here */);
 
        $greeter->hello("User");
 
        // Some assertion should also go here.
    }
}

Now we have to find some object to pass into the constructor of Greeter. Obviously passing ConsoleOut in would make no sense: actually writing stuff to the console is a slow operation, not to mention that it would be kind of hard to write a meaningful assertion. (Unless of course you are willing to read the video card’s buffer or install a camera with character recognition. Neither seems a very appealing option to me. In PHP you could also use ob_start(), but there are rumors of a special place in hell for those who actually do that…) So here is the simplest thing we can do: create a FakeOut class that implements the writeLine function by pushing the argument onto an array that’s a public member of the class:

class FakeOut implements Output
{
    public $calledWith = array();
 
    public function writeLine($line)
    {
        $this->calledWith[] = $line;
    }
}

Now we can finish our test:

...
    public function testGreeter()
    {
        $fakeOut = new FakeOut();
        $greeter = new Greeter($fakeOut);
 
        $greeter->hello("User");
 
        $this->assertEquals("Hello User!", $fakeOut->calledWith[0]);
    }
...

That’s a little too much code to test a single line, isn’t it? In the slide deck for the talk I show a few different approaches to this problem, but here I will skip right to the solution. Most unit test frameworks either come bundled with a mocking framework or have one available. PHPUnit has its own mocking framework, but there are also alternative libraries. The aim of these libraries is to simplify the process of creating test doubles like the one above. Here is an example of how the built in mocking functionality of PHPUnit works:

...
    public function testGreeter()
    {
        $mockOut = $this->getMock('Output');
        $mockOut->expects($this->once())->method('writeLine')->with("Hello User!");
 
        $greeter = new Greeter($mockOut);
 
        $greeter->hello("User");
    }
...

This is not only a lot shorter, but it reads better too. What happens is exactly what’s written there: we expect the method writeLine to be called exactly once with the parameter “Hello User!”. The only way it could read better is if we extracted the assertion like this:

...
    public function expectStringWrittenTo($mockOut, $string)
    {
        $mockOut->expects($this->once())->method('writeLine')->with($string);
    }
 
    public function testGreeter()
    {
        $mockOut = $this->getMock('Output');
        $this->expectStringWrittenTo($mockOut, "Hello User!");
 
        $greeter = new Greeter($mockOut);
 
        $greeter->hello("User");
    }
...

Mocking frameworks in other languages usually strongly resemble the one I just showed you. There is however another interesting group, for which Java’s EasyMock framework was the model. These frameworks have a record and playback attitude:

...
    public function testGreeterPhpShmock()
    {
        $shmockOut = \Shmock\Shmock::create($this,'ConsoleOut',
            function($shmock) {
               $shmock->writeLine("Hello User!");
            });
 
        $greeter = new Greeter($shmockOut);
        $greeter->hello("User");
    }
...

In the example code above we create a mock object for ConsoleOut. (Shmock does not support mocking interfaces, so I had to use an implementation class instead.) The lambda function receives an object that has the same set of function names, but on it function calls set up expectations. Functions on the mock object are expected to be called the exact same number of times, in the exact same order and with the exact same parameters as they are called on the helper object in the lambda function.

Mocks! Mocks everywhere!

So that’s great, right? Now we can write tests that are truly unit level: they test one single class without any other class disturbing the test. If we break a class we only break the tests for that single class, and we will know exactly what to fix!

There are people who think that the advice above is good. I’m not one of them. I think it’s a terrible idea, especially the part where we assert on the exact number of calls. In fact, not only do I believe that mocks should be used judiciously, I also believe that in general one should be pragmatic about the size of the “unit” in unit tests: instead of testing one class at a time, one should test small subsets of interrelated classes whenever the ecosystem created by their interaction is more expressive of the intent than the behaviour of the individual objects.

I will go into more depth on why I think that, but before I do let me make it clear that there are serious and smart software professionals who do buy into the “mockist” version of test driven development. Martin Fowler, in his wonderful blog entry “Mocks Aren’t Stubs”, categorizes attitudes towards test driven development into “mockist” and “classicist”. I will keep using the “mockist” expression the way Fowler uses it, but I’m arrogant enough to relabel “classicist” as “pragmatic”. There are also folks who – in complete contrast to the “mockists” – avoid test doubles altogether in order to make sure that everything is tested in its original place. I will call them “naturalists”.

My personal experience with test driven development indicates that the “pragmatic” attitude is the best way to go. Others have different experiences, and I encourage everybody to listen to their thoughts and arguments just as closely; then you should be able to decide what works best for you.

I already made my point against “naturalism” earlier. To show you how mocks can go wrong, let’s revisit the record and playback style mock example, after some refactoring:

class Greeter
{
   ...
    public function hello($user)
    {
        $this->output->writeLine("Hello $user!");
    }
}
 
class TestGreeterShmock extends PHPUnit_Framework_TestCase
{
    ...
    public function testGreeterPhpShmock()
    {
        $user = "User";
 
        $shmockOut = \Shmock\Shmock::create($this,'ConsoleOut',
            function($output) use ($user){
               $output->writeLine("Hello $user!");
            });
 
        $greeter = new Greeter($shmockOut);
        $greeter->hello($user);
    }
}

If you look closely you can see that the lambda function that sets up the expectations is practically exactly the same as the function under test. Actually I can replace that one liner with this:

...
        $shmockOut = \Shmock\Shmock::create($this,'ConsoleOut',
            function($output) use ($user){
                $greeter = new Greeter($output);
                $greeter->hello($user);
            });
...

It looks a bit off, since by removing the duplication we increased the number of lines. But imagine if there were several function calls mocked on this object: then this change would actually simplify the code.

All we test with this is that the hello function does the same thing to one object as it does to the other. The problem is clearly visible when we somewhat deliberately misuse the record and playback style library, but it’s there even when we don’t see it. Instead of expressing what we expect the function to accomplish, we specify the exact implementation, effectively saying the same thing twice.

Duplication is always bad, and this one is no exception. First of all, changing the interface of a class will trigger a change in all of its mocks. What’s even worse is that an interface change can go unnoticed, since the classes using this one have tests that mock it away. Let’s say for example that at some point you decide to throw an exception instead of returning null. You fix the test of that function, but since all calls to this function from other classes are mocked, none of those tests will fail. You will inevitably leave behind an unnecessary null check, and you may also end up with an unhandled exception. Ironically, with this technique you can easily reach 100% test coverage and still have basically no protection even against trivial errors.
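Here is a hypothetical sketch of how that happens. UserRepository, GreetingService and their methods are made up names; the stub uses the PHPUnit syntax from before:

// The repository used to return null for a missing user; by now the
// real implementation throws an exception instead. This test still
// passes, because the stub pins down the old behaviour:
$repository = $this->getMock('UserRepository');
$repository->expects($this->any())
           ->method('findUser')
           ->will($this->returnValue(null));

$service = new GreetingService($repository);
$service->greet("ghost"); // The now dead null check in greet() stays "covered".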

The point is that refactoring gets a lot less fun: you have to make changes in a lot more places, and your safety net will only catch trivial errors. Bugs that involve the interaction of only two classes will go unnoticed. Most errors involve at least two classes, and we don’t wish to have too many full integration tests either, since they can be slow. So allowing our unit tests to exercise a small group of classes is desirable.

Since refactoring is harder, test driven development no longer drives the design. Every time you find a problem with the code that does not involve a bug, you will walk away from the refactoring, since it’s hard, time consuming and unsafe. If you do not flex your code it becomes rigid; if it becomes rigid it becomes risky to add behaviour; that will slow you down, and even at a slow pace changes will start to cause more and more bugs. In fact your code can get so rigid that it becomes way easier to mock away your own code than to fix those interfaces to make them easier to use. You may find yourself in a situation where the complex dependencies between your classes are completely hidden by the mockist tests, and you don’t actually realize the problem until you have to reuse a class in a different context, where those dependencies can turn out to be really hard to satisfy.

Earlier I also mentioned living documentation as an important advantage of test driven development. The problem with using too many mocks is that after a certain point your tests become littered with meaningless mock generating code that obscures the usage example we hoped to find in there. In fact, when you write tests that contain copious amounts of mocks you are not just mirroring the production code, you are thinking in terms of the production code. So it all really boils down to this question: how do we honestly call our practice “test driven development” when, instead of the tests influencing the way we write our production code, the not yet written but already thought out production code drives the way we write the tests? Even if the test is written prior to the production code, we actually cheated, since our mind had already come up with the implementation; it had all been written at a mental level.

Types of test doubles

Now it may seem that test doubles are pure evil, but not using them is just as bad as overusing them. To understand how we can use them to our advantage while avoiding as many problems as we can, we have to categorize test doubles and the situations in which we use each.

Up until now I have talked about test doubles in general and about mocks in particular, but I haven’t defined them, nor have I made it very clear which one I’m talking about and what the most important difference between the two is. Some of the problems I described above are very specific to mocks, which are only one kind of test double, and can be avoided by using more relaxed alternatives. Other problems can be mitigated by carefully choosing the type of test double that best fits the situation at hand.

The definitions I use below are mostly the same as the ones Gerard Meszaros gives in his book “xUnit Test Patterns”.

I use the expression test double to refer to any kind of object that replaces another production object in a test. This is a very broad category, and all the other expressions below define smaller subcategories of test doubles.

Test doubles come in one of two forms: generated or native. I use the expression generated when the test double is created by some framework, either through code generation or through metaprogramming. In other cases – when the test double is created by explicitly defining a new class in the code – I use the word native.

Probably the simplest kind of test double is the dummy object. The hallmark of a dummy is that it does not interact with anything during the execution of the test. In most languages passing a NULL constant is sufficient in these cases.
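With a type hinted constructor in PHP a NULL won’t always do, but a native dummy is still trivial to write. (Logger and PriceCalculator below are hypothetical names, made up for the example.)

class DummyLogger implements Logger
{
    public function log($message)
    {
        // A dummy is never actually called during the test.
    }
}

$calculator = new PriceCalculator($priceTable, new DummyLogger());
$this->assertEquals(42, $calculator->priceOf("rubber duck"));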

Fakes are usually native test doubles that are reused by several tests. They are dumb implementations of the interface, but smart enough to make the tests pass. I sometimes use the expression complete fake when a fake is smart enough to be used in almost every unit test. Complete fakes are usually created to replace slow interface implementations, like databases and hardware controllers. Fakes are the most prevalent form of native test doubles.
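As a sketch, a complete fake of a persistence interface can be as simple as an in-memory array. (UserStore and its two methods are assumed names, not from the talk.)

class InMemoryUserStore implements UserStore
{
    private $users = array();

    public function save($id, $user)
    {
        $this->users[$id] = $user;
    }

    public function load($id)
    {
        // Simulate the real store: a missing user yields null, just like the DB layer would.
        return isset($this->users[$id]) ? $this->users[$id] : null;
    }
}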

Test specific subclasses are also native test doubles, but rather than reimplementing the entire interface they derive from the real implementation and override specific functions, or extend the object with data accessors and injectors.
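A test specific subclass might look something like this; ShoppingCart and its internals are made up for the illustration:

class TestableShoppingCart extends ShoppingCart
{
    // Extend the object with an accessor so the test can inspect internals.
    public function exposedTotal()
    {
        return $this->total;
    }

    // Override the slow tax rate lookup with a constant.
    protected function taxRate()
    {
        return 0.25;
    }
}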

Stubs are objects that can only respond to each function call with one specific value. Although it is easy to write such stub objects natively, most people prefer to generate them with a framework. Stubs are probably one of the simplest, yet most important kinds of test doubles.

A mock may also return one specific value for each function call, but it additionally checks the input arguments, and may even check the order in which functions are called on the object. Mocks are a very strict form of replacement objects and can be dangerous. On the other hand, existing mocking frameworks make it really easy to generate mocks, the way we have seen before.

A similar concept is the spy. The difference is that a spy only records the interactions and exposes the results, but it does not assert on them.
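In fact the FakeOut class from the introduction is really a hand written spy: the double only records the calls, and the test decides what to check.

$spyOut = new FakeOut();
$greeter = new Greeter($spyOut);

$greeter->hello("User");

// The spy recorded the interaction; the assertions live in the test.
$this->assertEquals(1, count($spyOut->calledWith));
$this->assertEquals("Hello User!", $spyOut->calledWith[0]);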

A delegating mock or delegating spy is similar to a mock or spy respectively in that it asserts on or records the number of calls and the arguments passed to the functions, but it also delegates all calls to the actual production object. Delegating mocks and spies are rarely used in practice.

Finally, capturing mocks are specifically designed for testing legacy systems. These objects have a record and a playback mode. When our test is ready, we run the capturing mock in record mode, which delegates calls to the production object and stores all interface interactions in a file. After that we can switch the capturing mock to playback mode: instead of delegating the calls it will just assert that the same arguments are passed as before, and will return the same results as the original class did.
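I’m not aware of a mainstream PHP library for this, so here is a rough sketch of the idea for the single method Output interface from before; the class below is my own illustration, not an existing tool:

class CapturingOutput implements Output
{
    private $real;      // The production object; only set in record mode.
    private $recording; // The captured calls; in practice persisted to a file between runs.

    public function __construct($realOrNull, array $recording = array())
    {
        $this->real = $realOrNull;
        $this->recording = $recording;
    }

    public function writeLine($line)
    {
        if ($this->real !== null) {
            // Record mode: remember the interaction, then delegate.
            $this->recording[] = $line;
            return $this->real->writeLine($line);
        }
        // Playback mode: assert that history repeats itself.
        $expected = array_shift($this->recording);
        if ($expected !== $line) {
            throw new Exception("Expected writeLine('$expected'), got writeLine('$line')");
        }
    }

    public function recording()
    {
        return $this->recording; // Saved to disk after a record run.
    }
}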

At this point I’d like to call your attention to the visually subtle yet conceptually huge difference between a stub and a mock. Most mocking frameworks will let you generate stubs with almost the same syntax as mocks. For example, this is a mock:

$this->expects($this->once())->method('someMethodName')->with(arg1, arg2)->will($this->returnValue(5));

While this is a stub:

$this->expects($this->any())->method('someMethodName')->will($this->returnValue(5));

Notice that in the second version I expect any number of calls (which means we do not really assert on anything) and completely omitted the “with” part, which would have asserted on the argument values. If we add the with part, or replace any with once, we have a mock; otherwise we have a stub. While this is a tiny difference in code, the more relaxed nature of stubs can prove to be quite an advantage. Conceptually, stubs are just helper objects that replace something, while mocks are about checking the exact way the object under test interacts with the mocked interface.

Guidelines for choosing the right test double

As with most things in software development there are no hard rules for choosing the right kind of test double. It’s you who has to make those decisions. I do have some guidelines that I will discuss here, but they are pretty general and won’t give you much detail. After them I will list a number of situations where test doubles are necessary, and describe which ones I would first think of using and why. This second part should help you make your own wise decisions in the future, maybe even ones that go against my guidelines.

The first rule of thumb is to choose the production object over a test double unless there is a pretty good reason not to. There are actually several general principles that imply this one. Replacing an object too early for performance reasons, for example, is premature optimization. When your build starts to get slow and you find out that a specific object is slowing it down, you still have two options: replacing it with a test double or making that object faster. The second choice is a lot more appealing, since it may fix some of your production issues too. In this sense your build getting slow can act as a canary in a coal mine: it will notify you early about performance issues. Another benefit is that since our unit tests become tiny integration tests of small subsystems, we can catch some integration problems a lot earlier. So in short: try to exercise as much of your code base in a test as you can without running into problems that would indicate the necessity of a test double.

Canaries are very sensitive to toxic gases, and coal miners used them until the 20th century as an early warning.


Since it’s a lot easier to replace a direct dependency, you may be tempted to do so even when some other object is what makes you consider a test double. If you do this, you lose information about the interaction between your class and the class it directly depends on. Try to find the object that causes the trouble, and replace that one. Sometimes you may find that this leads to really awkward tests, especially if you are using a mock. Many times this implies that you have a way too long chain of dependencies (usually also involving a deep call stack) and you should think of ways to break the chain at some point. However, this is not always the case, and it might be perfectly fine to replace an object between the one you test and the one that causes the trouble, if you feel that the test is more expressive that way. The important thing is to make the choice based on what constitutes a more informative and useful test, not based on ease of replacement.

The second guideline is to choose the most relaxed kind of test double that could work. A more general version of this is that you should never assert on more in a test than is necessary to ensure correct behaviour. When you assert on the actual function calls you are solidifying the structure, totally defeating the purpose of unit tests. There are situations when you have to use mocks, but this guideline tells you to choose a stub over a mock whenever possible.

Even when you decide to go with a mock you still have options to relax the assertions on the function call arguments: most mocking frameworks provide matchers like smallerThan, greaterThan or matchesRegularExpression that you can use in the “with arguments” section of your mock’s setup. Another important aspect is the order of consecutive calls. Most of the time it’s best not to have more than one function call mocked in a test. This is not always possible, and in those cases some mocking frameworks will force you to assert on the order of calls, while others will let you specify the calls and let the implementation decide on the order for itself. This latter option is very useful, and you may want to choose a mocking framework that supports it over one that does not.
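In PHPUnit for example, the earlier Greeter expectation can be relaxed with a built in matcher, so that only the part that matters is pinned down:

$mockOut = $this->getMock('Output');
// The greeting must mention the user; the exact wording stays free to change.
$mockOut->expects($this->once())
        ->method('writeLine')
        ->with($this->stringContains("User"));

$greeter = new Greeter($mockOut);
$greeter->hello("User");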

Finally, I prefer a more declarative style in my tests. Try to make sure that your assertions express the outcome of each action rather than the way it is achieved. If the outcome is that another component performs an action, then asserting on the fact that the right method representing that action was called is fine. The writeLine function in the introductory examples is such a method: its very name expresses that the action we wish to achieve is to take a string and display it in some way. However, this is the exception rather than the rule. In most situations a list of assertions on function calls will not express the intent well.

The natural habitat of test doubles

I have already mentioned performance issues as a possible reason for using a test double. This mostly relates to accessing physical interfaces outside of the CPU and memory. The most appealing solution in this case can be a complete fake. The only problem is that it might not be feasible to write a complete fake, either because it would have to be so smart that it becomes too slow itself, or because it’s too much effort to write compared to the number of times we will use it.

A common argument against complete fakes is that they might have a different set of bugs and quirks than the original. Although this is true, writing many specialized stubs is even more error prone than writing one somewhat generic fake. The bottom line is that you need to be pragmatic about this: amortize the cost and risk of creating a partial or complete fake over the tests using it. One other thing to keep in mind is that a fake should simulate the actual behavior of the replaced object. Mindlessly adding if statements to satisfy the needs of the next test case, instead of creating a separate stub for each, is a well paved road to maintenance hell.

When creating a complete fake is not practical, a partial fake is still a viable option in some cases. If neither option seems cost effective enough, then stubs and mocks are a way to take care of these problems on a test by test basis. If you have to replace too many instances of the same class in different existing tests, or you are writing characterization tests for legacy code, then capturing mocks can be a good choice too. Be careful however, since capturing mocks (just like characterization tests) are fragile, tend to fail after bug fixes, and make it hard to evolve the interfaces they mock.

A somewhat more general case is when the object has an undesirable side effect, like changing global state, issuing monetary transactions online or launching weapons. In these situations – just like with the special case of performance issues – complete or partial fakes, stubs and mocks are the right choice. As always, try to choose the more relaxed fakes or stubs over mocks when possible.

Another common reason for using a test double is when the real object is fragile in some sense, either because it relies on something unstable or because it changes too much. The most frequent examples of this kind of interface are user interface components. In particular, you should always try to avoid asserting on substrings of HTML code. Fakes, stubs and mocks can all be used, depending on the situation at hand.

Somewhat related to fragility is the case of unpredictable behaviour, like that of a random generator or a date/time object that returns the current time. When we face this problem, most of the time we are only interested in ensuring that the return value stays the same across all test runs. Stubs are usually the best choice. Some people use mocks for this, but in my experience asserting on the actual call is unnecessary, and can be harmful.
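A typical way out is to hide the clock behind a small interface and use a trivial native stub in the tests. (Clock and FixedClock are illustrative names of my own, not a standard API.)

interface Clock
{
    public function now();
}

class FixedClock implements Clock
{
    private $time;

    public function __construct($time)
    {
        $this->time = $time;
    }

    public function now()
    {
        return $this->time; // The same moment on every test run.
    }
}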

For certain objects it is hard or impossible to sense the result of a function call. Typical examples are displaying and printing things, or playing and generating sounds. When there is no undesired side effect, a delegating mock can be used. Most of the time, however, the thing we cannot easily assert on is also something we want to avoid calling. In that case a stub or a mock is a better choice.

Framework writers often face the problem of depending on interfaces that have no implementation yet. In this situation a complete fake is probably the best choice. Not only will it serve as a universal test double for a large number of tests, it’s also an example implementation of the interface. Users of your framework will appreciate that example.

I usually don’t advocate overly defensive coding. Still, sometimes the current code base makes it impossible for an error to happen, yet we see a chance of someone making an unintended mistake that would cause that error, and we wish to send a meaningful message when that happens. In that case substituting the object that could cause the error with a stub simulating the error can be a useful technique.
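For example, a stub can force an error path that the current code base can never trigger on its own. (Connection, ConnectionLost and SalesReporter are made up names; the stubbing syntax is PHPUnit’s.)

$connection = $this->getMock('Connection');
$connection->expects($this->any())
           ->method('query')
           ->will($this->throwException(new ConnectionLost()));

$reporter = new SalesReporter($connection);

// What we really test: the error surfaces as a meaningful message.
$this->assertEquals("The report is temporarily unavailable.",
                    $reporter->dailyReport());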

                                          Dummy  (Complete) Fake  Stub  Delegating mock  Mock  Capturing mock
Unused dependency                           x
Slow operations (usually IO)                       x               x                      x     x
Method call has undesired effect                   x               x                      x
Fragile, frequently changing behaviour             x               x                      x
Function has unpredictable result                                  x
Effect of function call is hard to sense                                 x                x
No implementation available yet                    x
Generate an impossible error                                       x

The table above sums it all up, once we add that you should try to choose the doubles to the left over the ones to the right when possible, and that complete fakes are preferred over plain fakes.

Test doubles and design smells

Many times the proliferation of certain kinds of test doubles can indicate design smells in the production code. One of the more obvious ones: the appearance of dummy objects (or passed nulls) in several tests is a clear sign of low cohesion in the class being tested. Getting rid of these unnecessary dependencies by splitting the class into smaller, more cohesive ones is usually not hard and well worth the effort.

Similarly, a test specific subclass can imply that the class we are subclassing is doing too much. Test specific subclasses expose the internals of an object to a test, and promote asserting on implementation details instead of the results of an action. Before creating such an object, try to think of alternative ways to assert on the effect of the function you are testing. If you cannot find a simple way to do that without exposing object internals, then those values or functions probably do not belong in that class, and you have a single responsibility violation, i.e. the class has at least two different purposes. Again: splitting up the class might be a good idea at this point.

A mock is called partial when only some of the object’s functions are mocked, and others are exercised by the test. In extreme cases you may have a function that calls all the other methods of the object in sequence, and you decide to mock every other function to test this one. This is obviously the same situation as the case of a test specific subclass. But while test specific subclasses do have valid usages, partial mocks should almost always be avoided.

One of the most annoying ways test doubles can plague a test is when you need to create a stub returning a stub that returns a mock object. This kind of onion structure of test doubles is a sign of a Law of Demeter violation.
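The smell looks something like this in PHPUnit syntax (the ORM-ish names are made up for the illustration):

// A stub returning a stub that returns a mock – the test mirrors a
// call chain like $orm->getConnection()->getDriver()->execute(...).
$driver = $this->getMock('Driver');
$driver->expects($this->once())->method('execute');

$connection = $this->getMock('Connection');
$connection->expects($this->any())
           ->method('getDriver')
           ->will($this->returnValue($driver));

$orm = $this->getMock('Orm');
$orm->expects($this->any())
    ->method('getConnection')
    ->will($this->returnValue($connection));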

The single assert rule states that one should have only one assert per unit test. This is misinterpreted by many to mean that one should only call one function that has assert in its name. That’s a pretty easy rule to circumvent in JavaScript if you use the should.js assertion library, isn’t it?

What we really mean by the rule is that we should assert on one outcome; that does not mean we cannot write more than one assert if the outcome has several important aspects. Nonetheless, having more than one mock object in a single test is almost always a smell. If you are tempted to add another mock, ask yourself whether that test is really still testing one thing. Note that in this context I’m specifically talking about mocks, not about other, non-asserting test doubles. It’s perfectly fine to have a stub and a mock in the same test, for example, as long as that’s the exception rather than the rule.

Crash test dummy

I have already described why I think complete fakes are superior to generated stubs and mocks. I also pointed out that although they are more expensive to create the first time around, when amortized over a large number of tests they become a lot cheaper than mocks. The problem is that creating a complete fake involves an upfront investment, and at any given moment during development it is cheaper to add yet another generated test double than to create the reusable fake. Although this can sometimes be mitigated by growing the fake in an evolutionary manner, usually that is not enough to break the deadlock. For that reason you should consciously make the investment in a reusable fake when you find yourself generating simple stubs and mocks for the same class all the time.

On the other hand, if you do find yourself considering this investment regularly but repeatedly decide not to go ahead with it, then most probably there is a serious issue with the interface you are trying to replace with these test doubles. It’s too revealing, too broad, or too verbose; there is some inherent design flaw lurking there. If that interface is within your project, you should probably think about how to improve it. Try to make it smaller, and make sure that you do not leak responsibilities that should be part of your module to the users of your module.

If an interface like this falls outside of your project, then most probably you will keep writing generated mocks. One of the most obvious examples of such an interface is SQL. I will write about this in more detail in an upcoming blog entry, but for now let’s just accept that SQL gets too intimate with the code that uses it by misplacing responsibilities, and recognize that ORMs are nothing but wrappers that try to fix this. This is actually a quite useful pattern: when you are faced with an inconvenient interface, wrap it in something better, test the wrapper, and implement a complete fake of the wrapper for your tests.

Conclusion

When I started writing this blog post version of the talk I never thought it would take me several days to finish, or that it would end up a whole book chapter in length. But that shows well how important and complicated this topic is. Many times when I saw people argue against test driven development, it turned out that their disappointment was easily traced back to either not using test doubles at all or overusing mocks. Learning a mocking framework and mindlessly adding a mock to each and every one of your tests is irresponsible, and it will hurt you. Choose your weapons wisely, and understand the trade-offs of that choice.

