Tuesday, January 4, 2011

Drawing stemmas in php

Tree view is part of the MVD-GUI. It shows a phylogenetic tree generated from the MVD variant data. The problem with this is that it uses the Phylip package and so uses two commandline C-programs that have to be compiled for the architecture of the server. This makes building an installer for MVD-GUI very difficult. Either I have to ensure that the user has a C-compiler installed, and that it works on their platform with that code, or I have to add a lot of prebuilt binaries to the download, fattening it out nicely. So I gave up on that route.

Building a Newick Tree

An alternative is to draw the tree directly in php. No one seems to have done this before. The first stage is to draw a text version of the tree. This uses a format called a Newick Tree. It assumes a hierarchy like a true stemma even for an unrooted tree. Here's an example:

((A2a:0.025555,(A2b:0.006838,A2c:0.014863)I5:0.023295)I3:0.078086,(Ba:0.00787,(Bb:0.041552,(D:0.002623,(Ea:0.032508,Eb:0.063992)I15:0.011577)I13:0.025273)I11:0.006917)I9:0.05633,A1:0.027864)I7;

The Phylip package I was using before uses Fitch to make the text-version of the tree and Fitch is about the slowest algorithm out there for this kind of work. Neighbour Join is a much faster technique but occasionally it produces edges of negative length, and there is no nice way around that. Methods like maximum parsimony are supposed to produce the best trees but they are expensive to run and are character-based (not difference-based like Neighbour Join). They are also designed for genetic sequences, not plain text data. In the end I settled on a technique called FastME, which doesn't produce negative branch-lengths. I translated their program into Java and am about to incorporate it into nmerge.

Drawing a tree in php

Drawing can be done directly in php using GD. This uses primitive drawing functions to draw into a bitmap image. Since the branch-lengths hold much of the information about the stemma I elected not to use standard force-directed algorithms that obliterate the lengths. But I did adapt force-directed layout to improve the roughly drawn tree. Results are still preliminary but it is starting to look reasonable:

The idea is to incorporate this view into a revised version of MVD-GUI that I will soon release, that is:

  1. Fully debugged and works on several popular browsers
  2. Installable in Joomla as a single component

My ambition when that is finished and all the other views have been added (i.e. compare, view, edit, list and variants) is to port it onto the iPad.