Software developer blog

How to find maximal cliques

Facebook used to have this awesome list of programming puzzles that were freely available. One of them was peak-traffic, which can be rephrased as the maximal cliques problem pretty easily. Unfortunately a graph can have exponential number of maximal cliques, so it's not surprising that we can not find a polynomial algorithm for listing all possible maximal cliques in a graph. However there are output sensitive algorithms: as long as the number of maximal cliques is relatively low each can be reported in a short time.

A complete sub-graph of a graph is called a clique, much like we call a group of friends a clique if they all know each other.

I knew that I was looking for such an algorithm, but it was kind of hard to find a detailed enough description on the Internet.  At the time of writing Wikipedia does provide a short description of the algorithm devised by Tsukiyama et al. but there are a number of details omitted, making it kind of hard to come up with the right implementation.

My aim is to provide a simple to understand description of the algorithm (without proof that it works correctly) and share my implementation in C++. (Type `make smoke` to compile and test.) The source I've been using - apart from Wikipedia - was a 2004 paper written by Kazuhisa Makino and Takeaki Uno: New Algorithms for Enumerating All Maximal Cliques.

How to generate a random binary tree and visualize it with Graphviz

Yesterday I was testing a small piece of code, that was capable of finding the nth item in the in-order traversal of a binary tree. It was a template function with the root node and n as arguments, and only required that the children nodes should be pointed by public members "left" and "right", while unused children pointers are NULL. (Pretty much the old school way of doing it.)

Of course I didn't want to manually generate test cases. Instead I wrote a little function that can generate a large tree on demand, and used that. Since the boost random library uses a constant seed by default I can rely on the tree to be the same each time the test is ran.

@ //

Snake cube

Few weeks ago one of my colleagues showed me this puzzle:

Snake cube

Snake cube puzzle

It's basically composed of 27 cubes, which are linked together by their faces. Each pair can be rotated around their common face. The aim is to fold the snake into a cube like this:

Solved snake cube

The snake cube once it is solved

Needless to say, my first thought was to write a backtracker that solves the cube for me. After all it's more fun to write the program, than to play around with it for hours. It didn't take long. After I was done with that, I started to ask myself questions like:

  • Are there other combinations with a single solution?
  • Are there combinations with more than one solution?
  • Are there palindromic snakes, that are solvable?
  • How many different solvable snakes are there?

I quickly found the answers to those on Jaap's Puzzle Page. I won't repeat the results that are available there, go and check it out if you are interested. On the other hand, it did not contain the source code for the backtracker, so I decided to show you how something like that could be done.

@ //

I have to sort that out

By the time a computer scientist gets through university (s)he hears about sorting algorithms like a dozen times. Probably (s)he will also learn that a good sorting algorithm - unlike bubble sort - should run in O(n log n) time. And yet once in a while one still runs into a hard coded bubble sort in production code. When that happens to me, I just silently and modestly raise my eyebrows,  while the physiological equivalent of a painful and violent shriek crosses my mind. And I'm not the only one who feels like that! It gets even more annoying when the language itself has an efficient sort implementation.

Anyway... I had some free time on my hands, and I thought it would be nice to see: exactly how bad is bad. I implemented 9 different popular and less popular algorithms for sorting, and tested them on the same dataset. I also added the STL sort implementation as a reference. I used g++ as a compiler (with -O3) on an Ubuntu 11.10 box. The cpu is an AMD Athlon 64 X2 Dual Core Processor 4200+@ 1Ghz. Data reading time is not included, running times were measured using the boost::timer library. You can download the sort implementations here: sort.h The functions are similar in design to the STL sort, but < is hard coded into them. I didn't care much about optimizing the code by hand, so don't expect miracles.

To see what these asymptotic behaviors look like, I repeated the experiments on increasingly larger data sets. Each algorithm had 1 second to finish the task, and 1,000,000 random integers was the largest dataset I feed them with. Here is a nice plot as a summary:

@ //

Event based template method

The last few weeks were more than hectic for me. We are approaching the end of a bigger project, and starting a new one, so I didn't have much time for my blog. Anyhow... during the design phase of the newer project I came across a very interesting design problem. We are developing an algorithm that has a pretty rigid structure at a higher level with steps that are invariable, and other (more heuristic) steps that can vary independently. Since we want to be able to quickly replace the heuristic steps of the algorithm (maybe even at runtime), and have a clean design with as few code duplication as possible I had to come up with a way to decouple these steps from the higher level structure of the algorithm.

The first solution that came to my mind was using the Template Method design pattern as described in the excellent book on Design Patterns by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. While their version comes with many advantages, in this particular situation it has a quite obvious drawback: it uses inheritance, which would make it less convenient to adjust the algorithm, and vary the different heuristic steps independently, and downright impossible to do this at runtime.

I decided to use an event based approach. Instead of defining pure virtual functions for the hooks I defined callbacks similar to events. When the algorithm runs on a callback during execution, it calls the function(s) registered for it. There is however a significant difference between these callbacks and traditional events: in some cases you may want to make sure, that one and only one function is registered for a specific callback, and you may also want to use the return value of the function for something more complex. (Some implementations of event handling require the return value to be a boolean, and use it to prevent further listeners from getting executed when one of the listeners returns false.)

@ //

Gomoku perfect play

The other day I had a very boring job to do. I had to listen to a sound recording of a workshop I've been attending few weeks ago and sum up the ideas that appeared during brain storming. I couldn't come up with any other solution, but to listen to it while playing Gomoku. It's not a time critical game so I could switch back to writing LaTeX code whenever I needed, and since it's quite an easy game, I still had the brain computing power to multitask. (By the way I use GNU screen to manage my terminal sessions, and vim to code so it just seemed convenient to use the BSD Games implementation of Gomoku... pretty old school, huh?)

Anyhow... as most of you probably know Gomoku is a pretty unfair game, as the first player has a winning strategy... but how hard is it to memorize the winning strategy? Is it even possible? Well... I set out on a journey to find out about that. I spent about half an hour to research Google for any existing perfect play algorithms, but didn't have much luck. Next I tried to find out how computer plays Gomoku... The key concept seems to be Minimax search and Alpha-Beta Pruning, but that is a highly heuristic approach, and since I could beat the algorithm as second player, it does not result in a perfect play either. (Although I could guess that even without winning against the computer.)

Well... I'm stuck at this point. I think I'll read what L. Victor Allis has written on the topic in his thesis. But before doing so, I'll have to get back to coding for food. (-;

@ //