A Toolbox for Phylogenetic Supertree Construction and Analysis
D. Chen, O. Eulenstein, and D. Fernández-Baca
Department of Computer Science; Iowa State University; Ames, IA 50011, USA
Table of Contents
Introduction to Rainbow
Creating an MRP matrix
Bugs / Updates
Supertrees are phylogenetic trees built from sets of input trees that share some, but not necessarily all, of their terminal taxa. Supertree methods may be useful for reconstructing large phylogenies and combining phylogenetic hypotheses made from disparate types of data. Though there has been much recent interest in building supertrees and developing supertree methods, there are few available programs that allow researchers to utilize new supertree methods. The goal of Rainbow is to provide a user-friendly environment in which scientists can utilize tools for building and analyzing supertrees. Rainbow provides a graphic user interface (GUI) to construct supertrees using several different methods. Currently this includes matrix representation with flipping (MRF; Chen et al., 2001; Burleigh et al., 2004; Eulenstein et al., 2004), matrix representation with parsimony (MRP; e.g., Baum, 1992; Ragan, 1992), and the modified Mincut algorithm (MMC; Page, 2002). Rainbow also provides tools to analyze the quality of the supertree. Rainbow is available for MacOSX, MS-Windows, and Unix.
For further reading about supertrees and supertree methods, there areeview by Sanderson et al. (1998), Bininda-Emonds et al. (2002), and Bininda-Emonds (2004). Also, the recent book "Phylogenetic supertrees: combining information to reveal the tree of life" from Kluwer Academic Press contains chapters on many aspects of supertree research.
Downloading Rainbow: Macintosh, Windows, and Linux/Unix versions of Rainbow may be downloaded for free at the following website: http://genome.cs.iastate.edu/CBL/download/.
System Requirements: Rainbow is a GUI shell, and it uses other external command line applications to construct supertrees. The following programs are required to access all features ofRainbow:
NOTE: A command-line version of PAUP* is not publicly available for the Macintosh system at this time. Therefore, the Mac version of Rainbow cannot build MRP supertrees or compute some of the tree distance measures. To build MRP supertrees on a Mac, one can create and save a nexus file with an MRP matrix using Rainbow (see "Creating an MRP matrix" below), and then use the GUI version of PAUP* to do a parsimony search using this matrix, outside of the Rainbow environment.
Macintosh: The Mac version currently requires Mac OS X Panther V10.3. To download the Mac version of Rainbow, simply place the Rainbow application in the desired folder. NOTE: Just double-clicking on the "Windows OS X disk image" link may not install Rainbow. Try dragging the link into the desired folder on the Mac.
Windows: Rainbow is available in 32-bit (Win32) version only. An installation program will automatically install Rainbow; simply click on the link and follow the instructions. NOTE: Rainbow supports the "Uninstall" feature of Windows 95/98/2000/XP so that Rainbow can be removed from the computer using the "Add/Remove Programs" control panel application.
Linux/Unix: A binary executable is available for Linux x86. For other UNIX platforms, the user will need to download the source file and rebuild the target. Consult your system administrator if this does not mean anything to you.
Rainbow is a multiple document interface (MDI) application. There are two types of documents: tree documents, displayed as graphs, and report documents, displayed as text. Several files can be open at the same time. Each file corresponds to one window, and will be closed if its window is closed.
Below is a screen-shot of Rainbow (Mac OSX). The right frame is the tree navigation bar, showing the currently open files and their trees. At the bottom is a log window that displays the progress of a supertree heuristic. At the center is workspace that can have many windows. To view different trees, simply click the tree name on the tree navigation bar at the right of the screen.
The Open menu command opens tree files. A tree window will be created for the opened tree file, and the filename as well as the tree names will be displayed in the tree navigation bar. Trees within the opened files can then be used to build supertrees or create an MRP matrix. Rainbow reads most NEXUS format tree files. In the Windows version of Rainbow, the Open dialog box has a range of filters corresponding to commonly used extensions for tree files.
The Save As command enables the user to save tree(s) to new NEXUS tree files.
This command closes the current tree window and the opened tree file.
Prints the current tree.
NOTE: This menu will be disabled (gray) when the active window is not a tree window.
Draw Slant Tree / Draw Rectangular Tree
These commands allow the tree(s) to be displayed as either slanted or rectangular.
Tree Font... / Order Tree
The Tree Font command can change the font used to label the leaves (taxa) in the trees in the current active window. The Order Tree command changes the tree display so that "heavier" nodes (i.e., those with more descendants) are drawn either to the left or to the right. It can also restore the original order.
Zoom In / Zoom Out / Zoom to Fit
The Zoom In command allows the user to examine portions of a large tree in more detail. A vertical scrollbar is displayed after selecting this command. The Zoom Out command shrinks the tree window in half, and the Zoom To Fit command returns the tree to its original size.
Create MRP matrix ...
Create a weighted MRP matrix. For details, see the "Creating an MRP matrix" section below.
Construct Supertree ...
A dialog box will appear to guide the user in building a supertree step by step. For details, see the "Building Supertrees" section below.
Terminate current running supertree calculation ...
A dialog will pop up to ask you whether you like to terminate current running supertree calculation. This menu item is disabled unless a supertree program is running under background. The supertree run can also be terminated by clicking on the
Show/Hide the log window. This can also be accomplished by clicking the "show log" button on the right side of the status bar.
Creating an MRP matrix
This feature allows the user to convert one or more nexus tree files into a nexus file with a binary "matrix representation", or an "MRP matrix" (e.g., Ragan, 1992; Baum, 1992). In the MRP matrix, each character (or column in the matrix) represents a clade from an input tree. Taxa in the clade have a character state "1", taxa not in the clade have character state "0", and taxa not sampled for the tree have a "?". MRP matrices are used to build MRP and MRF supertrees.
To build an MRP matrix, first click the menu item "Create MRP Matrix ..." from the supertree menu. A dialog box will pop up (as shown below). Tree files can be added/removed from the current tree list using the "Add…" and "Remove" features. Also, the weight for each tree can be adjusted by selecting a tree and using the spin button at the bottom of tree drawing window. If the weight of a tree is w, the generated matrix will contain w copies of each character corresponding to that tree. The weight must be an integer between 1 and 99, and the default weight for each tree is 1. Clicking the OK button will generate the MRP matrix. The MRP matrix will contain an outgroup node whose sequence is composed of all 0s unless the user unchecks the "with outgroup node" box in the bottom left corner.
NOTE: Though building an MRP matrix is necessary to build MRP or MRF supertrees, it is not necessary to construct an MRP matrix before constructing a supertree with Rainbow. The "Construct Supertree…" feature of Rainbow automatically builds an MRP matrix for MRP or MRF analyses.
NOTE: All columns in the MRP matrix represent clades. Single taxa (or tips on trees) are not represented in the MRP matrix and are not used to calculate the flip scores for supertrees. There are no autapomorphies in any MRP matrices.
NOTE: Rainbow only implements the binary coding scheme followed by Baum (1992). The Purvis (1995) coding method and other coding methods are not implemented in this version.
There are three supertree methods currently available in Rainbow: MRP and MRF heuristic algorithms and the modified Mincut algorithm. Before building a supertree, it will be helpful to understand some of the details of these methods. This manual only briefly describes the methods, and therefore, users of Rainbow are strongly encouraged to read about these methods further.
Matrix representation with parsimony (MRP) is the most commonly used method for building supertrees. MRP takes a binary matrix representation of a set of input trees (the MRP matrix) and seeks the topology that requires the fewest number of state changes based on the MRP matrix. Thus, MRP can use the same tree search heuristics as a conventional maximum parsimony tree search.
Matrix representation with flipping (MRF) also starts with an MRP matrix. The binary MRP matrix from a rooted input tree may be transformed into a subset of the columns of a rooted supertree by changing some of the "0"s to "1"s and "1"s to "0"s. Each change in the character state is a flip, and the minimum number of flips needed to transform the input tree into a supertree is the flip distance. The MRF heuristic seeks the rooted supertree(s) that minimizes the total flip distance from all input trees. The MRF heuristic used in Rainbow is similar to conventional hill-climbing tree search heuristics used for phylogenetics (including the MRP heuristic), but it searches rooted trees, not unrooted trees. These rooted branch swaps are similar to the unrooted versions, but the tree modifications are applied to the nodes (in the rooted version) rather than the branches. For further discussion of the MRF supertree problem see Chen et al. (2003) or Burleigh et al. (2004). Details regarding the MRF heuristic algorithm may be found in Eulenstein et al. (2004).
Finally, the modified minCut (MMC) algorithm is a polynomial time algorithm for building supertrees that does not use a matrix representation of the trees. Page (2002) developed the MMC algorithm, which is based on the original minCut algorithm of Semple and Steel (2000). This algorithm is fast and may be useful for building very large supertrees. However, in simulation studies MMC supertrees generally are not as accurate as MRF or MRP supertrees (Eulenstein et al., 2004).
Building a Supertree
When you click the menu item "Construct Supertree ..." in the supertree menu or the tool button in the toolbar, a wizard dialog will pop up. If the active window is a tree window with input trees, the controls for building a supertree will be enabled (as in the figure below). If there are no trees in the tree window, most controls will be disabled.
There are five steps to constructing a supertree. These steps correspond to the five sections of the supertree control window (see below).
NOTE: The quartet supertree method (Piaggio-Talice et al., 2004) is not implemented in this version of Rainbow, but it will be added in future versions. If you would like to use the quartet supertree method, you can download a free program at the following web site: http://genome.cs.iastate.edu/CBL/download/.
NOTE: The MRF heuristic generally is slower than an equivalent MRP heuristic. It may be best to run a short search first to get some idea of the running time.
NOTE: In numerous cases, we found that the MRF heuristic with SPR branch swapping finds supertrees with a lower total flip distance to the input trees than TBR branch swapping, even when they start from the same starting tree. The tree search using NNI branch swapping generally is fast, but almost always finds supertrees with worse flip scores than SPR or TBR. While TBR occasionally finds more supertrees than SPR with the same optimal flip distance, it appears to rarely or ever find trees with lower flip scores than SPR. At this time, we cannot rule out that a "bug" may be responsible for the poor relative performance of the TBR branch swapping in the MRF heuristic.
NOTE: It has been reported
that the MRF heuristic
NOTE: This feature currently DOES NOT WORK in the Mac version! If any distance measures are checked, Rainbow will crash at the completion of the supertree run. To prevent this problem, you should run any supertree calculations without calculating any measures of supertree quality.
Terminating the supertree calculation
There are two ways to terminate the supertree calculation. One is to use the menu command "Terminate current running supertree calculation ..." in the menu. The other is to click the green stop sign icon in the rightmost status bar. In either case, a message box will pop up asking for confirmation. If the answer is "Yes", the background calculation will be terminated. Although it is possible to terminate the background job using the "kill" command in a Unix/Mac system, this is NOT recommended.
Bugs / Updates
In an earlier version of the program, there was a problem with the MRF heuristic (kindly pointed out by P. Goloboff). One symptom of this problem was that when all input trees were identical, the MRF supertree search occasionally identified a supertree that differed from the identical input trees. This is fixed in the current versions. However, we are still examining the performance of the TBR branch-swapping for the MRF heuristic. We do not recommend solely relying on TBR branch-swapping when running the MRF heuristic. In our experience, multiple runs using both SPR and TBR branch swapping often identify supertrees with the better flip scores than a single run using TBR branch swapping.
If you find a bug or have questions or comments regarding Rainbow please contact Duhong Chen (firstname.lastname@example.org).
Baum, B. R. 1992. Combining trees as a way of combining data sets for phylogenentic inference, and the desirability of combining gene trees. Taxon 41:3-10.
Bininda-Emonds,O.R.P. 2004. The evolution of supertrees. Trends Ecol. Evol. 19:315-322.
Bininda-Emonds,O.R.P., Gittleman,J.L. and Steel,M.A. 2002. The (super)tree of life: Procedures, problems, and prospects. Annu. Rev. Ecol. Syst. 33:265-289.
Burleigh, J. G., O. Eulenstein, D. Fernández-Baca, and M. J. Sanderson. 2004. MRF supertrees. Pp. 65-85. In "Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life" (O. R. P. Bininda-Emonds, Ed). Kluwer Academic Press, The Nethlerlands.
Chen, D., L. Diao, O. Eulenstein, D. Fernández-Baca, and M. J. Sanderson. 2003. Flipping: A supertree construction method. Pages 135-160 in DIMACS series in discrete mathematics and theoretical computer science, Volume 61. Bioconsensus (M. F. Janowitz, F.-J. Lapointe, F. R. McMorris, B. Mirkin, and F. S. Roberts, eds.). American Mathematical Society, Providence, Rhode Island.
Eulenstein, O., D. Chen, J. G. Burleigh, D. Fernández-Baca. and M. J. Sanderson. 2004. Performance of flip-supertree construction with a heuristic algorithm. Syst. Biol., 53:299-308.
Page, R. D. M. 2002. Modified mincut supertrees. Pp. 537-551 in R. Guigo and D. Gusfield (eds.), WABI 2002, LNCS 2452.
Piaggio-Talice, R., J. G. Burleigh, and O. Eulenstein. 2004. Quartet supertrees. Pp. 173-191. In Phylogenetic supertrees: combining information to reveal the tree of life. (O. R. P. Bininda-Emonds, ed.). Kluwer Academic Press, The Netherlands.
Purvis, A. 1995. A modification to Baum and Ragan's method for combining phylogenetic trees. Syst. Biol. 44: 251-255.
Ragan, M. A. 1992. Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol. 1:53-58.
Robinson, D. F., and L. R. Foulds. 1981. Comparison of phylogenetic trees. Mathematical Biosciences 53: 131-147.
Sanderson, M. J., A. Purvis, and C. Henze. 1998. Phylogenetic supertrees: assembling the trees of life. Trends Ecol. Evol. 13:105-109.
Semple, C., and M. Steel. 2000. A supertree method for rooted trees. Disc. Appl. Math. 105:147-158.
Swofford D. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 10. Sinauer, Sunderland, MA.
Rod Page generously provided his program for calculating modified mincut supertrees and TreeLib for displaying and handling trees. Gordon Burleigh and Mike Sanderson helped test the software and write the manual. This work was supported by NSF grants 1053164 (Sanderson and Eulenstein), CCR-9988348 (Fernández-Baca), and EF-0334832 (Eulenstein, Fernández-Baca).