evolutionary tree

tqDist - a triplet and quartet distance library

Distance measures between trees are useful for comparing trees in a systematic manner and several different distance measures have been proposed. The triplet and quartet distances, for rooted and unrooted trees, are defined as the number of subsets of three or four leaves, respectively, where the topologies of the induced sub-trees differ. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to time complexities at least of the order n3 or n4 just for enumerating the sets. The different topologies can be counted implicitly, however, and using this tqDist computes the triplet distance between rooted trees in O(n log n) time and the quartet distance between unrooted trees in O(dn log n) time, where d degree of the tree with the smallest degree.

  1. Source files
  2. Building and installing
  3. Executables
  4. File format
  5. Literature
  6. Contact

If you use tqDist in your research, please cite it in the following way:

You can e.g. use this BibTex entry.

The datasets used for the performance experiments in this Applications Note are available for download at http://birc.au.dk/~cstorm/software/tqDist/data/.

Source files

Building and installing

To install tqDist from source (on Linux or MacOS) download and unpack the source code, and execute the following commands in a terminal:

$ cd <path-to-file>
$ tar -xvf tqDist-1.0.2.tar.gz
$ cd tqDist-1.0.2/
$ cmake .
$ make
$ make test
$ make install

If you do not have permisions to install the package in the default location, you need to replace the fourth command above by something on this form:

$ cmake -DCMAKE_INSTALL_PREFIX=<install prefix

Finally, if CMake is not installed on your system, you can download the newest version from http://cmake.org/.

Executables

The following executables are installed with tqDist:

triplet_dist

Usage: triplet_dist [-v] <filename1> <filename2>

Where <filename1> and <filename2> point to two files each containing one tree in Newick format. In both trees all leaves should be labeled and the two trees should have the same set of leaf labels. The triplet distance between the two trees will be printed to stdout.

If the -v option is used, the following numbers will be reported (in this order):

quartet_dist

Usage: quartet_dist [-v] <filename1> <filename2>

Where: <filename1> and <filename2> point to two files each containing one tree in Newick format. In both trees all leaves should be labeled and the two trees should have the same set of leaf labels. The quartet distance between the two trees will be printed to stdout.

If the -v option is used, the following numbers will be reported (in this order):

pairs_triplet_dist

Usage: bin/pairs_triplet_dist [-v] <filename1> <filename2> [<output filename>]

Where: <filename1> and <filename2> point to two files each containing the same number of trees in Newick format. The two trees on line i in the two files must have the same set of leaf labels. The output is a list of numbers, where the i'th number is the triplet distance between the two trees on line i in the two files. If [output filename] is specified the output is written to the file pointed to (if the file already exists the current content is deleted first), otherwise the output is written to stdout.

If the -v option is used, the following numbers will be reported for each pair of trees (in this order):

pairs_quartet_dist

Usage: bin/pairs_quartet_dist [-v] <filename1> <filename2> [<output filename>]

Where: <filename1> and <filename2> point to two files each containing the same number of trees in Newick format. The two trees on line i in the two files must have the same set of leaf labels. The output is a list of numbers, where the i'th number is the quartet distance between the two trees on line i in the two files. If [output filename] is specified the output is written to the file pointed to (if the file already exists the current content is deleted first), otherwise the output is written to stdout.

If the -v option is used, the following numbers will be reported for each pair of trees (in this order):

all_pairs_triplet_dist

Usage: all_pairs_triplet_dist <input filename> [output filename]

Where: <input filename> is the name of a file containing multiple trees in Newick format. Each tree should be on a seperate line. In each tree all leaves should be labeled and all trees should have the same set of leaf labels. If [output filename] is specified the output is written to the file pointed to (if the file already exists the current content is deleted first), otherwise the output is written to stdout. The output is a lower triangular matrix in which the i, j'th entry is the pairwise triplet distance between the tree on line i and the tree on line j in <input filename>.

all_pairt_quartet_dist

Usage: all_pairs_quartet_dist <input filename> [output filename]

Where: <input filename> is the name of a file containing multiple trees in Newick format. Each tree should be on a seperate line. In each tree all leaves should be labeled and all trees should have the same set of leaf labels. If [output filename] is specified the output is written to the file pointed to (if the file already exists the current content is deleted first), otherwise the output is written to stdout. The output is a lower triangular matrix in which the i, j'th entry is the pairwise quartet distance between the tree on line i and the tree on line j in <input filename>.

File format

All input files for tqDist should contain trees encoded in Newick format. Here is couple of examples that can be used with the quartet_distance and triplet_distance executables:

And here is an example of a file containing multiple trees, which can be used with the all_pairs_quartet_distance, all_pairs_triplet_distance, pairs_triplet_distance and pairs_triplet_distance executables:

For details on how to encode trees in Newick format please see http://en.wikipedia.org/wiki/Newick_format.

Literature

The algorithms implemented in tqDist was developed by Andreas Sand, Gerth Stølting Brodal, Rolf Fagerberg, Thomas Mailund, Christian N. S. Pedersen, Jens Johansen and Morten K. Holt, and they are explained in the following two papers:

The implementation is desribed in:

Contact

If you encounter any problems or have questions about using tqDist, please contact Christian N. S. Pedersen.