Software developer blog

How to generate a random binary tree and visualize it with Graphviz

Yesterday I was testing a small piece of code, that was capable of finding the nth item in the in-order traversal of a binary tree. It was a template function with the root node and n as arguments, and only required that the children nodes should be pointed by public members "left" and "right", while unused children pointers are NULL. (Pretty much the old school way of doing it.)

Of course I didn't want to manually generate test cases. Instead I wrote a little function that can generate a large tree on demand, and used that. Since the boost random library uses a constant seed by default I can rely on the tree to be the same each time the test is ran.

@ //

Snake cube

Few weeks ago one of my colleagues showed me this puzzle:

Snake cube

Snake cube puzzle

It's basically composed of 27 cubes, which are linked together by their faces. Each pair can be rotated around their common face. The aim is to fold the snake into a cube like this:

Solved snake cube

The snake cube once it is solved

Needless to say, my first thought was to write a backtracker that solves the cube for me. After all it's more fun to write the program, than to play around with it for hours. It didn't take long. After I was done with that, I started to ask myself questions like:

  • Are there other combinations with a single solution?
  • Are there combinations with more than one solution?
  • Are there palindromic snakes, that are solvable?
  • How many different solvable snakes are there?

I quickly found the answers to those on Jaap's Puzzle Page. I won't repeat the results that are available there, go and check it out if you are interested. On the other hand, it did not contain the source code for the backtracker, so I decided to show you how something like that could be done.

@ //

I have to sort that out

By the time a computer scientist gets through university (s)he hears about sorting algorithms like a dozen times. Probably (s)he will also learn that a good sorting algorithm - unlike bubble sort - should run in O(n log n) time. And yet once in a while one still runs into a hard coded bubble sort in production code. When that happens to me, I just silently and modestly raise my eyebrows,  while the physiological equivalent of a painful and violent shriek crosses my mind. And I'm not the only one who feels like that! It gets even more annoying when the language itself has an efficient sort implementation.

Anyway... I had some free time on my hands, and I thought it would be nice to see: exactly how bad is bad. I implemented 9 different popular and less popular algorithms for sorting, and tested them on the same dataset. I also added the STL sort implementation as a reference. I used g++ as a compiler (with -O3) on an Ubuntu 11.10 box. The cpu is an AMD Athlon 64 X2 Dual Core Processor 4200+@ 1Ghz. Data reading time is not included, running times were measured using the boost::timer library. You can download the sort implementations here: sort.h The functions are similar in design to the STL sort, but < is hard coded into them. I didn't care much about optimizing the code by hand, so don't expect miracles.

To see what these asymptotic behaviors look like, I repeated the experiments on increasingly larger data sets. Each algorithm had 1 second to finish the task, and 1,000,000 random integers was the largest dataset I feed them with. Here is a nice plot as a summary:

@ //

Test driven development and unit testing frameworks

In the last few years I've been using test driven development, and grew really accustomed to it. TDD not only helps in avoiding strange errors, it also helps in software design, implementation and enforces a pro-active approach towards debugging. At first it may seem that writing tests for each and every small part of the code is time well wasted, but in fact it is definitely worth the effort, especially if an automated unit test framework is used. As not everyone may be convinced at first, here is a quick summary about how TDD works, how it improves productivity, and what are it's pros and cons.

The work-flow starts by writing the test using an automated unit test framework, and see it fail. There are several reasons for first writing the test, and implementing the feature afterwards, but I'll get to that later. After the test fails as it should, implement the new feature, and test again. If the test still fails, fix the code, and test again. If your new feature works, but tests of earlier features fail, than see how the broken code relies on the functionality you have changed, and make sure that you fix all bugs. Once all test pass, commit your code to the version control system, and continue by writing the next test, for your next feature. It is important, that you always write only as much code, as necessary to pass the test.

@ //

JavaScript based slide show

Although I think ActionScript is a well designed language one has to admit it's numerous drawbacks: it's a proprietary format, the interpreter is a poorly written black hole of resources and it is banned from Apple devices from time to time. Few years back one of my colleagues went as far as claiming that he only needs a dual core processor, so that one core is free for working while the other one handles YouTube videos. So here is an advice from me: try to avoid using flash as long as possible. JavaScript has it's own set of drawbacks, but at least it is widely available, and with the appearance of Google's Chrome the race between browsers shifted towards improving their JS platforms significantly in terms of speed, compatibility and reliability.

There are more than a handful of websites that include a slide show of images. However most of them uses a flash based solution. As fate would have it, slide shows are a perfect example of unnecessary usage of flash. In what follows I'd like to show you how easy it is to write one in JavaScript. I will use the Prototype and script.aculo.us frameworks for this example, but it should not take you more than a few minutes to make this work with jQuery. (I may even publish a jQuery version soon.) This is how it will look like:

The images are taken randomly from the website
of the Gallery Diabolus artist community.

@ //

We are crawling at last...

The last few weeks were really busy, and exciting. Until now I could not reveal any of the details of what we were up to, but a few weeks ago the project went public, and on-line: please feel free to visit the Erdős WebGraph project.

About two month ago we were asked to create a graph of domains where the directed edges represent hyper links. Our first attempt was to install several of the publicly available crawlers, but those all had indexing features that we did not need, and neither seemed lightweight enough for our extremely limited hardware capacities. Another inconvenience arouse from the almost complete lack of documentation for these software. Finally we decided to implement our own crawler.

@ //

Drop in, Drop out

Well... I couldn't come up with any better name for this:

My original aim was to create a website element, that provides an intuitive way to add, and remove certain things from the users inventory. I didn't really find any good use for it yet, so I decided to make it open source. (Well... you don't have many other choices with JS anyway, since obfuscators are basically useless in protecting code...) It's released under a GPL license, but if you use it be nice and drop me a comment with the link, so that I can check it out.

@ //

Postponable iterator

When I started this blog I had no idea how hard it will be to post useful things without revealing too much about the "top secret" projects we are involved in. 🙂 This time though I've been allowed to release an entire class that turned out to be useful in an application we were developing a while back. The piece of software below was created with a financial support by Uratim Ltd. and can be used under the terms of the GPL license. YAY for free software.

Postponable Iterator source

The iterator pattern is one of the most frequently occurring patterns used in software design. Even before one learns about design patterns inevitably implements an iterator without actually realizing. The problem I came across not so long ago was that I had an iterator that was returning  jobs from job packets that would have to be processed but there were a number of constrains I had to keep in mind:

  • For each job there is a resource assigned for the job. It can only be executed using the resource assigned to it.
  • As long as possible certain resources should not be accessed too frequently. (But they can be accessed more frequently when it is necessary.)
  • Packages are processed semi asynchronously: all jobs in one package are returned at once, but there may be more that one pending package. The maximal number of packages under simultaneous processing is a parameter.
  • When there are too many jobs waiting for a single resource than access times should be gradually decreased to avoid too long waiting times for jobs.
@ //
« Previous Page