Software developer blog

How to find maximal cliques

Facebook used to have an awesome list of freely available programming puzzles. One of them was peak-traffic, which can easily be rephrased as the maximal cliques problem. Unfortunately a graph can have an exponential number of maximal cliques, so it's not surprising that we cannot find a polynomial-time algorithm for listing all of them. However, there are output-sensitive algorithms: as long as the number of maximal cliques is relatively low, each can be reported in a short time.

A complete sub-graph of a graph is called a clique, much like we call a group of friends a clique if they all know each other.

I knew that I was looking for such an algorithm, but a detailed enough description was hard to find on the Internet. At the time of writing, Wikipedia does provide a short description of the algorithm devised by Tsukiyama et al., but it omits a number of details, which makes it hard to come up with a correct implementation.

My aim is to provide an easy-to-understand description of the algorithm (without a proof of correctness) and to share my implementation in C++. (Type `make smoke` to compile and test.) The source I've been using - apart from Wikipedia - was a 2004 paper by Kazuhisa Makino and Takeaki Uno: New Algorithms for Enumerating All Maximal Cliques.
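As a baseline for comparison, here is a sketch of the classic Bron-Kerbosch algorithm (without pivoting), which also enumerates all maximal cliques, although it is not output sensitive like Tsukiyama's. This is my own illustration, not the implementation from the download; vertices are bits in a 32-bit mask.

```cpp
#include <cstdint>
#include <vector>

// Classic Bron-Kerbosch without pivoting. R is the clique built so far,
// P the candidate vertices that extend R, X the vertices already tried.
// adj[v] is the neighbour mask of vertex v. When both P and X are empty,
// R cannot be extended and no earlier branch covers it: R is maximal.
void bronKerbosch(const std::vector<uint32_t>& adj,
                  uint32_t R, uint32_t P, uint32_t X,
                  std::vector<uint32_t>& cliques)
{
    if (P == 0 && X == 0) {
        cliques.push_back(R);
        return;
    }
    uint32_t candidates = P;            // iterate over a snapshot of P
    for (int v = 0; candidates != 0; ++v, candidates >>= 1) {
        if (!(candidates & 1)) continue;
        uint32_t bit = uint32_t(1) << v;
        bronKerbosch(adj, R | bit, P & adj[v], X & adj[v], cliques);
        P &= ~bit;                      // v is done: move it from P to X
        X |= bit;
    }
}
```

For example, on the graph with edges 0-1, 0-2, 1-2 and 2-3 the two maximal cliques are {0,1,2} and {2,3}, i.e. the masks 7 and 12.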

Unit test framework for C++ in about 100 lines

Those who know me well are also aware that I'm a strong proponent of test automation and test driven development. Unfortunately there are times when installing an existing framework is not an option. So when I was challenged to write a test framework so simple that one can memorize it and reproduce it from memory whenever needed, I was up to the task. The entire framework is 113 lines of C++ code, and using it is just as simple as using Google Test. Of course it doesn't come with all the features, it's fragile, and it has a good number of limitations, but it's good enough if you have absolutely no other option.

Here is how to use it, along with a sample output:

#include "TestBase.hpp"
 
class TestGroup : public TestBase {
};
 
TEST_F(TestGroup,PassingTest)
{
	EXPECT_TRUE(true);
	EXPECT_TRUE(true);
}
 
TEST_F(TestGroup,FailingTest)
{
	EXPECT_TRUE(true);
	EXPECT_TRUE(false);
	EXPECT_TRUE(true);
}
 
TEST_F(TestGroup,SecondPassingTest)
{
	EXPECT_TRUE(true);
	EXPECT_TRUE(true);
}
Sample output

Sample output of a simple unit test framework
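The real TestBase.hpp is in the download; to give an idea of how such a framework can fit in so few lines, here is my own guessed-at sketch of its core: a registry of test functions plus two macros. All names here are illustrative, not the actual ones.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// A test case is just a name and a function that reports pass/fail.
struct TestCase { std::string name; void (*run)(bool&); };

inline std::vector<TestCase>& registry() {
    static std::vector<TestCase> tests;   // filled before main() by Registrars
    return tests;
}

struct Registrar {
    Registrar(const std::string& name, void (*run)(bool&)) {
        registry().push_back({name, run});
    }
};

// TEST_F declares the test function, registers it, then opens its body.
#define TEST_F(Group, Name) \
    static void Group##_##Name(bool& testPassed); \
    static Registrar reg_##Group##_##Name(#Group "." #Name, Group##_##Name); \
    static void Group##_##Name(bool& testPassed)

// EXPECT_TRUE records a failure but lets the test continue.
#define EXPECT_TRUE(cond) \
    do { if (!(cond)) { \
        std::printf("FAIL %s:%d: %s\n", __FILE__, __LINE__, #cond); \
        testPassed = false; } } while (0)

// Run every registered test and return the number of failed tests.
inline int runAllTests() {
    int failures = 0;
    for (const TestCase& t : registry()) {
        bool passed = true;
        t.run(passed);
        std::printf("%s %s\n", passed ? "[ OK ]" : "[FAIL]", t.name.c_str());
        if (!passed) ++failures;
    }
    return failures;
}

// Usage, mirroring the sample above:
TEST_F(Demo, Passing) { EXPECT_TRUE(2 + 2 == 4); }
TEST_F(Demo, Failing) { EXPECT_TRUE(2 + 2 == 5); }
```

Unlike the real framework, this sketch has no fixture class, so `TEST_F(Group, ...)` only uses the group for naming; it's just enough to show how static registration makes the macros work.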

Game of Life on the GPU

Conway's Game of Life is one of the most popular code katas, and the one we will exercise at our next code retreat this Saturday. I have done this kata quite a number of times, but this time there was a twist: I used the GPU of my Nvidia GTX 560 to calculate the steps of evolution, and to draw them as well. Here is what the result looks like:

Puffer train in Game of Life

Puffer train in Game of Life. Living cells are green; they increase the blue channel by 1 in each generation, and leave a fading red trail after they die.

The igniter of this exercise was that I read a book on CUDA programming, and it was quite a disappointment. The examples were really far from anything I would call clean code, or at least easy to understand: it was spaghetti with huge meatballs in it. Examples like that require a lot of explanation, which took up most of the book's volume, leaving little room for the useful stuff. So I got curious: is it CUDA that promotes bad code, or did the writers of the book just not care enough? I had a preconception that there was nothing wrong with CUDA - and actually it's a pretty neat language - so I set out on a quest to code Game of Life in CUDA, and to make it as clean as possible. I don't claim that my code is perfect, but I hope it's an easy read for anyone who tries.

You can download the result of my quest by clicking here, or just browse it as HTML by clicking here. The rest of the blog entry - basically a retrospective of the project, and a few remarks on CUDA features - might not make much sense without looking at parts of the code. If you'd like to compile it, then you will need the nvcc compiler for CUDA, the CUDA Toolkit, GLUT, and GoogleTest to run my unit tests. Please note that the code is not production quality: it may well throw exceptions at you, it's only tested on Ubuntu, and it is not performance optimized in any way. (Even the algorithm is the most naive one.)
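For readers who don't want to open the download, the per-cell rule a Game of Life kernel evaluates (one GPU thread per cell) can be sketched in plain C++. This is my own illustration; whether the real code wraps at the borders, as this sketch does, is an assumption.

```cpp
#include <vector>

// One step of Conway's Game of Life on a width*height grid, wrapping at
// the edges (toroidal board). Each cell counts its 8 neighbours, then
// survives with 2 or 3 of them, and a dead cell is born with exactly 3.
std::vector<char> step(const std::vector<char>& cells, int width, int height)
{
    std::vector<char> next(cells.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int neighbours = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    int nx = (x + dx + width) % width;    // wrap around
                    int ny = (y + dy + height) % height;
                    neighbours += cells[ny * width + nx];
                }
            bool alive = cells[y * width + x] != 0;
            next[y * width + x] = (neighbours == 3 || (alive && neighbours == 2));
        }
    }
    return next;
}
```

In CUDA the double loop over x and y disappears: each thread computes one `(x, y)` from its block and thread indices and evaluates just the inner rule.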


How to generate a random binary tree and visualize it with Graphviz

Yesterday I was testing a small piece of code that finds the nth item of the in-order traversal of a binary tree. It was a template function taking the root node and n as arguments, and it only required that child nodes be reachable through public members "left" and "right", with unused child pointers set to NULL. (Pretty much the old school way of doing it.)
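The function under test might look something like this sketch (the real signature may differ; here n is zero-based and is decremented in place as nodes are visited):

```cpp
#include <cstddef>

// Walk the tree in order and stop at the nth visited node. The node type
// only needs public "left" and "right" pointers, NULL when a child is
// missing - exactly the requirement described above.
template <typename Node>
Node* nthInOrder(Node* root, std::size_t& n)
{
    if (root == NULL) return NULL;
    Node* found = nthInOrder(root->left, n);   // visit the left subtree first
    if (found != NULL) return found;
    if (n == 0) return root;                   // this is the nth node
    --n;
    return nthInOrder(root->right, n);         // keep counting on the right
}

// A minimal node type for trying it out (illustrative only):
struct DemoNode {
    int value;
    DemoNode* left;
    DemoNode* right;
};
```

Passing n by reference keeps the bookkeeping simple: whatever the left subtree didn't consume is left for the current node and the right subtree.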

Of course I didn't want to generate test cases manually. Instead I wrote a little function that can generate a large tree on demand, and used that. Since the boost random library uses a constant seed by default, I can rely on the tree being the same each time the test is run.
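A possible shape for such a generator, sketched with the standard `<random>` header instead of boost (a fixed seed gives the same reproducibility), plus a dump in Graphviz DOT format. All names here are illustrative:

```cpp
#include <random>
#include <string>

// Insert random values into an unbalanced binary search tree, then dump
// the tree as a Graphviz digraph. With a constant seed the tree is
// identical on every run, so tests can rely on its shape.
struct TreeNode {
    int value;
    TreeNode* left;
    TreeNode* right;
};

TreeNode* insert(TreeNode* root, int value) {
    if (root == NULL) return new TreeNode{value, NULL, NULL};
    if (value < root->value) root->left = insert(root->left, value);
    else                     root->right = insert(root->right, value);
    return root;
}

TreeNode* randomTree(int size) {
    std::mt19937 gen(42);                        // constant seed: same tree every time
    std::uniform_int_distribution<int> dist(0, 9999);
    TreeNode* root = NULL;
    for (int i = 0; i < size; ++i) root = insert(root, dist(gen));
    return root;
}

void dumpEdges(const TreeNode* node, std::string& out) {
    if (node == NULL) return;
    if (node->left != NULL)
        out += "  " + std::to_string(node->value) + " -> "
             + std::to_string(node->left->value) + ";\n";
    if (node->right != NULL)
        out += "  " + std::to_string(node->value) + " -> "
             + std::to_string(node->right->value) + ";\n";
    dumpEdges(node->left, out);
    dumpEdges(node->right, out);
}

std::string toDot(const TreeNode* root) {
    std::string out = "digraph tree {\n";
    dumpEdges(root, out);
    out += "}\n";
    return out;
}
```

Piping the output through `dot -Tpng` renders the tree; note that this naive version labels nodes by value, so duplicate values would need unique node ids in a more careful dump.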


Snake cube

A few weeks ago one of my colleagues showed me this puzzle:

Snake cube

Snake cube puzzle

It's basically composed of 27 small cubes linked together by their faces. Each adjacent pair can be rotated around its common face. The aim is to fold the snake into a cube like this:

Solved snake cube

The snake cube once it is solved

Needless to say, my first thought was to write a backtracker that solves the cube for me. After all, it's more fun to write the program than to play around with the puzzle for hours. It didn't take long. Once I was done with that, I started to ask myself questions like:

  • Are there other combinations with a single solution?
  • Are there combinations with more than one solution?
  • Are there palindromic snakes that are solvable?
  • How many different solvable snakes are there?

I quickly found the answers to those on Jaap's Puzzle Page. I won't repeat the results that are available there; go and check it out if you are interested. On the other hand, it did not contain the source code for the backtracker, so I decided to show you how something like that could be done.
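One way such a backtracker could look (all names are my own, not from any published solver): describe the snake by run lengths, i.e. how many cubes lie in a straight line before the next joint, with consecutive runs sharing their corner cube, and try every perpendicular turn at every joint.

```cpp
#include <vector>

// Fold a snake, given as run lengths, into an N*N*N box by backtracking.
// solve() returns true as soon as one folding works.
class SnakeSolver {
public:
    SnakeSolver(int n, const std::vector<int>& runLengths)
        : N(n), runs(runLengths), occupied(n * n * n, 0) {}

    bool solve() {
        int total = runs.empty() ? 0 : runs[0];
        for (std::size_t i = 1; i < runs.size(); ++i) total += runs[i] - 1;
        if (total != N * N * N) return false;    // snake must fill the box exactly
        for (int x = 0; x < N; ++x)              // try every starting cell
            for (int y = 0; y < N; ++y)
                for (int z = 0; z < N; ++z) {
                    cell(x, y, z) = 1;
                    if (fold(0, x, y, z, -1)) return true;
                    cell(x, y, z) = 0;
                }
        return false;
    }

private:
    // Lay down run #i starting from the already placed cube at (x,y,z);
    // a joint must turn, so the new axis differs from prevAxis.
    bool fold(std::size_t i, int x, int y, int z, int prevAxis) {
        if (i == runs.size()) return true;
        static const int dirs[6][3] = {{1,0,0},{-1,0,0},{0,1,0},
                                       {0,-1,0},{0,0,1},{0,0,-1}};
        for (int d = 0; d < 6; ++d) {
            if (d / 2 == prevAxis) continue;     // 90 degree turn only
            int steps = runs[i] - 1;
            int placed = 0, cx = x, cy = y, cz = z;
            bool blocked = false;
            while (placed < steps && !blocked) {
                cx += dirs[d][0]; cy += dirs[d][1]; cz += dirs[d][2];
                if (!inside(cx, cy, cz) || cell(cx, cy, cz)) {
                    cx -= dirs[d][0]; cy -= dirs[d][1]; cz -= dirs[d][2];
                    blocked = true;
                } else {
                    cell(cx, cy, cz) = 1;
                    ++placed;
                }
            }
            if (!blocked && fold(i + 1, cx, cy, cz, d / 2)) return true;
            while (placed > 0) {                 // undo this run, try next direction
                cell(cx, cy, cz) = 0;
                cx -= dirs[d][0]; cy -= dirs[d][1]; cz -= dirs[d][2];
                --placed;
            }
        }
        return false;
    }

    bool inside(int x, int y, int z) const {
        return 0 <= x && x < N && 0 <= y && y < N && 0 <= z && z < N;
    }
    char& cell(int x, int y, int z) { return occupied[(x * N + y) * N + z]; }

    int N;
    std::vector<int> runs;
    std::vector<char> occupied;
};
```

For a quick sanity check, the 2x2x2 "snake" that turns at every joint (seven runs of length 2) is foldable, while a straight bar of 8 cubes obviously is not.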


I have to sort that out

By the time a computer scientist gets through university, (s)he has heard about sorting algorithms a dozen times. Probably (s)he will also have learned that a good sorting algorithm - unlike bubble sort - should run in O(n log n) time. And yet once in a while one still runs into a hard-coded bubble sort in production code. When that happens to me, I just silently and modestly raise my eyebrows, while the physiological equivalent of a painful and violent shriek crosses my mind. And I'm not the only one who feels that way! It gets even more annoying when the language itself has an efficient sort implementation.

Anyway... I had some free time on my hands, and I thought it would be nice to see exactly how bad is bad. I implemented 9 different popular and less popular sorting algorithms and tested them on the same dataset. I also added the STL sort implementation as a reference. I used g++ as the compiler (with -O3) on an Ubuntu 11.10 box. The CPU is an AMD Athlon 64 X2 Dual Core Processor 4200+ @ 1 GHz. Data reading time is not included; running times were measured using the boost::timer library. You can download the sort implementations here: sort.h. The functions are similar in design to the STL sort, but < is hard-coded into them. I didn't care much about optimizing the code by hand, so don't expect miracles.
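To show the kind of interface I mean, here is my sketch of one of the contestants (not the exact code from sort.h): bubble sort with the same iterator-pair shape as std::sort, and with operator< hard-coded.

```cpp
#include <algorithm>
#include <iterator>

// Bubble sort, O(n^2): repeatedly sweep the range, swapping adjacent
// out-of-order elements, and stop after a sweep with no swaps. After
// each sweep the largest remaining element has bubbled to the end, so
// the range to examine shrinks by one.
template <typename Iterator>
void bubbleSort(Iterator first, Iterator last)
{
    if (first == last) return;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (Iterator it = first; std::next(it) != last; ++it) {
            if (*std::next(it) < *it) {
                std::iter_swap(it, std::next(it));
                swapped = true;
            }
        }
        --last;     // the tail is already sorted
    }
}
```

The early-exit on a swap-free sweep makes it O(n) on already sorted input, which flatters bubble sort in benchmarks more than it deserves.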

To see what these asymptotic behaviors look like, I repeated the experiments on increasingly large datasets. Each algorithm had 1 second to finish the task, and 1,000,000 random integers was the largest dataset I fed them. Here is a nice plot as a summary:


Test driven development and unit testing frameworks

In the last few years I've been using test driven development, and I've grown really accustomed to it. TDD not only helps in avoiding strange errors, it also aids software design and implementation, and enforces a proactive approach to debugging. At first it may seem that writing tests for each and every small part of the code is wasted time, but in fact it is definitely worth the effort, especially if an automated unit test framework is used. As not everyone may be convinced at first, here is a quick summary of how TDD works, how it improves productivity, and what its pros and cons are.

The work-flow starts with writing the test using an automated unit test framework, and seeing it fail. There are several reasons for writing the test first and implementing the feature afterwards, but I'll get to that later. After the test fails as it should, implement the new feature, and test again. If the test still fails, fix the code, and test again. If your new feature works, but tests of earlier features fail, then see how the broken code relies on the functionality you have changed, and make sure that you fix all bugs. Once all tests pass, commit your code to the version control system, and continue by writing the next test for your next feature. It is important that you always write only as much code as is necessary to pass the test.
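The cycle in miniature, with a made-up example using plain assert instead of a framework: each comment marks which test forced that line of the implementation into existence.

```cpp
#include <cassert>

// Step 1: write assert(leapYear(2012)) first - it doesn't even compile.
// Step 2: write just enough code to pass. Step 3: add the next test,
// watch it fail, extend the implementation, and repeat.
bool leapYear(int year)
{
    if (year % 400 == 0) return true;    // added for the third test
    if (year % 100 == 0) return false;   // added for the second test
    return year % 4 == 0;                // the first, minimal implementation
}

void testLeapYear()
{
    assert(leapYear(2012));      // the very first test written
    assert(!leapYear(1900));     // forced the century rule
    assert(leapYear(2000));      // forced the 400-year exception
    assert(!leapYear(2011));
}
```

Each rule exists only because a failing test demanded it, which is exactly the "only as much code as necessary" discipline described above.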
