Rainbow:

A Toolbox for Phylogenetic Supertree Construction and Analysis

 

D. Chen, O. Eulenstein, and D. Fernández-Baca

Department of Computer Science; Iowa State University; Ames, IA 50011, USA

 

 


 

Table of Contents

 

Introduction to Rainbow

Installing Rainbow

Windows

Menus

Creating an MRP matrix

Building Supertrees

Bugs / Updates

References

Acknowledgments

 

 

Introduction

      Supertrees are phylogenetic trees built from sets of input trees that share some, but not necessarily all, of their terminal taxa.  Supertree methods may be useful for reconstructing large phylogenies and combining phylogenetic hypotheses made from disparate types of data.  Though there has been much recent interest in building supertrees and developing supertree methods, there are few available programs that allow researchers to utilize new supertree methods. The goal of Rainbow is to provide a user-friendly environment in which scientists can utilize tools for building and analyzing supertrees. Rainbow provides a graphic user interface (GUI) to construct supertrees using several different methods.  Currently this includes matrix representation with flipping (MRF; Chen et al., 2001; Burleigh et al., 2004; Eulenstein et al., 2004), matrix representation with parsimony (MRP; e.g., Baum, 1992; Ragan, 1992), and the modified Mincut algorithm (MMC; Page, 2002). Rainbow also provides tools to analyze the quality of the inferred supertrees.  Rainbow is available for MacOSX, MS-Windows, and Unix.

For further reading about supertrees and supertree methods, there are reviews by Sanderson et al. (1998), Bininda-Emonds et al. (2002), and Bininda-Emonds (2004).  Also, the recent book "Phylogenetic supertrees: combining information to reveal the tree of life" from Kluwer Academic Press contains chapters on many aspects of supertree research.

 


Installing Rainbow

 

Downloading Rainbow: Macintosh, Windows, and Linux/Unix versions of Rainbow may be downloaded for free at the following website: http://genome.cs.iastate.edu/CBL/download/.

 

System Requirements: Rainbow is a GUI shell, and it uses other external command line applications to construct supertrees. The following programs are required to access all features ofRainbow:

NOTE: A command-line version of PAUP* is not publicly available for the Macintosh system at this time.  Therefore, the Mac version of Rainbow cannot build MRP supertrees or compute some of the tree distance measures.  To build MRP supertrees on a Mac, one can create and save a nexus file with an MRP matrix using Rainbow (see "Creating an MRP matrix" below), and then use the GUI version of PAUP* to do a parsimony search using this matrix, outside of the Rainbow environment.

 

 

          System PreferencesMacintosh: The Mac version currently requires Mac OS X Panther V10.3. To download the Mac version of Rainbow, simply place the Rainbow application in the desired folder. NOTE: Just double-clicking on the "Windows OS X disk image" link may not install Rainbow.  Try dragging the link into the desired folder on the Mac.

 

Windows: Rainbow is available in 32-bit (Win32) version only.  An installation program will automatically install Rainbow; simply click on the link and follow the instructions.  NOTE: Rainbow supports the "Uninstall" feature of Windows 95/98/2000/XP so that Rainbow can be removed from the computer using the "Add/Remove Programs" control panel application.

 

Linux/Unix: A binary executable is available for Linux x86.  For other UNIX platforms, the user will need to download the source file and rebuild the target. Consult your system administrator if this does not mean anything to you.


 

Windows

Rainbow is a multiple document interface (MDI) application. There are two types of documents: tree documents, displayed as graphs, and report documents, displayed as text.  Several files can be open at the same time.  Each file corresponds to one window, and will be closed if its window is closed. 

Below is a screen-shot of Rainbow (Mac OSX). The right frame is the tree navigation bar, showing the currently open files and their trees.  At the bottom is a log window that displays the progress of a supertree heuristic. At the center is workspace that can have many windows. To view different trees, simply click the tree name on the tree navigation bar at the right of the screen.

 

 

 

 


Menus

File menu

 

 

Open

The Open menu command opens tree files. A tree window will be created for the opened tree file, and the filename as well as the tree names will be displayed in the tree navigation bar.  Trees within the opened files can then be used to build supertrees or create an MRP matrix.  Rainbow reads most NEXUS format tree files. In the Windows version of Rainbow, the Open dialog box has a range of filters corresponding to commonly used extensions for tree files. 

 

Save As...

The Save As command enables the user to save tree(s) to new NEXUS tree files.

 

Close

This command closes the current tree window and the opened tree file.

 

Print

Prints the current tree.

 

Trees Menu

NOTE: This menu will be disabled (gray) when the active window is not a tree window.

 

 

Draw Slant Tree / Draw Rectangular Tree

These commands allow the tree(s) to be displayed as either slanted or rectangular. 

 

Tree Font... / Order Tree

The Tree Font command can change the font used to label the leaves (taxa) in the trees in the current active window.  The Order Tree command changes the tree display so that "heavier" nodes (i.e., those with more descendants) are drawn either to the left or to the right.  It can also restore the original order.

 

Zoom In / Zoom Out / Zoom to Fit

The Zoom In command allows the user to examine portions of a large tree in more detail. A vertical scrollbar is displayed after selecting this command.  The Zoom Out command shrinks the tree window in half, and the Zoom To Fit command returns the tree to its original size.

 


Supertree Menu

 

 

Create MRP matrix ...

Create a weighted MRP matrix.  For details, see the "Creating an MRP matrix" section below.

 

Construct Supertree ...

A dialog box will appear to guide the user in building a supertree step by step.  For details, see the "Building Supertrees" section below.

  

Terminate current running supertree calculation ...

A dialog will pop up to ask you whether you like to terminate current running supertree calculation. This menu item is disabled unless a supertree program is running under background.  The supertree run can also be terminated by clicking on the

 

Show Log

Show/Hide the log window.  This can also be accomplished by clicking the "show log" button on the right side of the status bar.


 

Creating an MRP matrix

            This feature allows the user to convert one or more nexus tree files into a nexus file with a binary "matrix representation", or an "MRP matrix" (e.g., Ragan, 1992; Baum, 1992).  In the MRP matrix, each character (or column in the matrix) represents a clade from an input tree.  Taxa in the clade have a character state "1", taxa not in the clade have character state "0", and taxa not sampled for the tree have a "?".  MRP matrices are used to build MRP and MRF supertrees. 

To build an MRP matrix, first click the menu item "Create MRP Matrix ..." from the supertree menu.  A dialog box will pop up (as shown below). Tree files can be added/removed from the current tree list using the "Add…" and "Remove" features. Also, the weight for each tree can be adjusted by selecting a tree and using the spin button at the bottom of tree drawing window.  If the weight of a tree is w, the generated matrix will contain w copies of each character corresponding to that tree.  The weight must be an integer between 1 and 99, and the default weight for each tree is 1. Clicking the OK button will generate the MRP matrix. The MRP matrix will contain an outgroup node whose sequence is composed of all 0s unless the user unchecks the "with outgroup node" box in the bottom left corner.

 

NOTE: Though building an MRP matrix is necessary to build MRP or MRF supertrees, it is not necessary to construct an MRP matrix before constructing a supertree with Rainbow.  The "Construct Supertree…" feature of Rainbow automatically builds an MRP matrix for MRP or MRF analyses. 

 

NOTE: All columns in the MRP matrix represent clades.  Single taxa (or tips on trees) are not represented in the MRP matrix and are not used to calculate the flip scores for supertrees.  There are no autapomorphies in any MRP matrices.

 

NOTE: Rainbow only implements the binary coding scheme followed by Baum (1992).  The Purvis (1995) coding method and other coding methods are not implemented in this version.

 

 


 

 

Building Supertrees

Introduction

            There are three supertree methods currently available in Rainbow: MRP and MRF heuristic algorithms and the modified Mincut algorithm.  Before building a supertree, it will be helpful to understand some of the details of these methods.  This manual only briefly describes the methods, and therefore, users of Rainbow are strongly encouraged to read about these methods further. 

Matrix representation with parsimony (MRP) is the most commonly used method for building supertrees.  MRP takes a binary matrix representation of a set of input trees (the MRP matrix) and seeks the topology that requires the fewest number of state changes based on the MRP matrix.  Thus, MRP can use the same tree search heuristics as a conventional maximum parsimony tree search. 

Matrix representation with flipping (MRF) also starts with an MRP matrix.  The binary MRP matrix from a rooted input tree may be transformed into a subset of the columns of a rooted supertree by changing some of the "0"s to "1"s and "1"s to "0"s.  Each change in the character state is a flip, and the minimum number of flips needed to transform the input tree into a supertree is the flip distance. The MRF heuristic seeks the rooted supertree(s) that minimizes the total flip distance from all input trees.   The MRF heuristic used in Rainbow is similar to conventional hill-climbing tree search heuristics used for phylogenetics (including the MRP heuristic), but it searches rooted trees, not unrooted trees.  These rooted branch swaps are similar to the unrooted versions, but the tree modifications are applied to the nodes (in the rooted version) rather than the branches.  For further discussion of the MRF supertree problem see Chen et al. (2003) or Burleigh et al. (2004).  Details regarding the MRF heuristic algorithm may be found in Eulenstein et al. (2004). 

Finally, the modified minCut (MMC) algorithm is a polynomial time algorithm for building supertrees that does not use a matrix representation of the trees.  Page (2002) developed the MMC algorithm, which is based on the original minCut algorithm of Semple and Steel (2000).   This algorithm is fast and may be useful for building very large supertrees.  However, in simulation studies MMC supertrees generally are not as accurate as MRF or MRP supertrees (Eulenstein et al., 2004). 

 

Building a Supertree

When you click the menu item "Construct Supertree ..." in the supertree menu or the construct tool button in the toolbar, a wizard dialog will pop up. If the active window is a tree window with input trees, the controls for building a supertree will be enabled (as in the figure below). If there are no trees in the tree window, most controls will be disabled.

 

There are five steps to constructing a supertree.  These steps correspond to the five sections of the supertree control window (see below). 

  1. Select Input Trees and Set Weights.  Clicking on the "Select Trees" option opens a window in which input trees can be added or deleted and weights for input trees can be set (the default weight for all input trees is 1).  If input trees were previously selected using the "Open" command in the File menu, this step does not have to be repeated.
  2. Choose a Supertree Method.  Currently, three methods are available: Matrix Representation with Parsimony (MRP), Matrix Representation with Flipping (MRF), and the modified MinCut algorithm (MMC). 

NOTE: The quartet supertree method (Piaggio-Talice et al., 2004) is not implemented in this version of Rainbow, but it will be added in future versions.  If you would like to use the quartet supertree method, you can download a free program at the following web site: http://genome.cs.iastate.edu/CBL/download/.

  1. Set parameters for the tree search heuristic (MRF and MRP only).  Allows one to specify the number of addition sequence replicates (1-99), the maximum number of supertrees saved (1-99) and the type of branch swapping for the MRF or MRF heuristic (none, SPR, TBR, or NNI). 

NOTE:  The MRF heuristic generally is slower than an equivalent MRP heuristic. It may be best to run a short search first to get some idea of the running time.  A faster algorithm for calculating minimum flip distance will soon be implemented in Rainbow.

NOTE: In numerous cases, we found that the MRF heuristic with SPR branch swapping finds supertrees with a lower total flip distance to the input trees than TBR branch swapping, even when they start from the same starting tree.  The tree search using NNI branch swapping generally is fast, but almost always finds supertrees with worse flip scores than SPR or TBR.  While TBR occasionally finds more supertrees than SPR with the same optimal flip distance, it appears to rarely or ever find trees with lower flip scores than SPR.  At this time, we cannot rule out that a "bug" may be responsible for the poor relative performance of the TBR branch swapping in the MRF heuristic. 

NOTE: It has been reported that the MRF heuristic will inaccurately calculate the flip score when the set of input trees contains only two identical topologies.  In this case, the minimum flip score should be 0.  This problem has been fixed.

  1. Measuring supertree quality. This option chooses the distance metrics used to compare the input trees with the resulting supertree(s).  The Windows version can calculate the Normalized Symmetric Distance (Robinson and Foulds, 1981). The Mac and Linux versions additionally offer the Normalized MAST.  The normalized MAST is the number of taxa in the maximum agreement subtree between an input tree and its resulting supertree divided by the number of total taxa in the input tree. 

NOTE: This feature currently DOES NOT WORK in the Mac version!  If any distance measures are checked, Rainbow will crash at the completion of the supertree run.  To prevent this problem, you should run any supertree calculations without calculating any measures of supertree quality. 

  1. Save the results to a file.  This saves the resulting supertree(s) as well as any associated reports.

 

Known Problems:

  1. Program cannot read nexus tree file (e.g., message  "This file doesn't contain taxa-character matrix" in the log window).  Do any taxon names in the tree file contain underscore characters (e.g., "Arabidopsis_thaliana")?  Try removing the underscores.  
  2. MRP heuristic is not progressing (e.g., no progress reported in log window).  Is this the Mac version of Rainbow?  It cannot execute the MRP heuristic.  You can build MRP supertrees with the Mac version of PAUP*(Swofford, 2003).
  3. Program crashes after at the completion of the tree heuristic.   Is this the Mac version of Rainbow?  Try executing the heuristic withoutchoosing any options for measuring supertree quality.  The Mac version will crash if you try to calculate the normalized MAST or SymDist.

 

Terminating the supertree calculation

There are two ways to terminate the supertree calculation. One is to use the menu command "Terminate current running supertree calculation ..." in the menu.  The other is to click the green stop sign icon in the rightmost status bar. In either case, a message box will pop up asking for confirmation. If the answer is "Yes", the background calculation will be terminated.  Although it is possible to terminate the background job using the "kill" command in a Unix/Mac system, this is NOT recommended.


 

Bugs / Updates

In an earlier version of the program, there was a problem with the MRF heuristic (kindly pointed out by P. Goloboff). One symptom of this problem was that when all input trees were identical, the MRF supertree search occasionally identified a supertree that differed from the identical input trees.  This is fixed in the current versions.  However, we are still examining the performance of the TBR branch-swapping for the MRF heuristic.  We do not recommend solely relying on TBR branch-swapping when running the MRF heuristic.  In our experience, multiple runs using both SPR and TBR branch swapping often identify supertrees with the better flip scores than a single run using TBR branch swapping.

 

Reporting Bugs

If you find a bug or have questions or comments regarding Rainbow please contact Duhong Chen (duhong@iastate.edu). 

 


 

References

 

Baum, B. R. 1992. Combining trees as a way of combining data sets for phylogenentic inference, and the desirability of combining gene trees.  Taxon 41:3-10.

Bininda-Emonds,O.R.P.  2004.  The evolution of supertrees.  Trends Ecol. Evol. 19:315-322.

Bininda-Emonds,O.R.P., Gittleman,J.L. and Steel,M.A.  2002. The (super)tree of life: Procedures, problems, and prospects.  Annu. Rev. Ecol. Syst. 33:265-289.

Burleigh, J. G., O. Eulenstein, D. Fernández-Baca, and M. J. Sanderson. 2004. MRF supertrees. Pp. 65-85. In "Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life" (O. R. P. Bininda-Emonds, Ed). Kluwer Academic Press, The Nethlerlands.

Chen, D., L. Diao, O. Eulenstein, D. Fernández-Baca, and M. J. Sanderson.  2003.  Flipping: A supertree construction method.  Pages 135-160 in DIMACS series in discrete mathematics and theoretical computer science, Volume 61. Bioconsensus (M. F. Janowitz, F.-J. Lapointe, F. R. McMorris, B. Mirkin, and F. S. Roberts, eds.). American Mathematical Society, Providence, Rhode Island.

Eulenstein, O., D. Chen, J. G. Burleigh, D. Fernández-Baca. and M. J. Sanderson. 2004.  Performance of flip-supertree construction with a heuristic algorithm.  Syst. Biol., 53:299-308.

Page, R. D. M.  2002.  Modified mincut supertrees.  Pp. 537-551 in R. Guigo and D. Gusfield (eds.), WABI 2002, LNCS 2452.

Piaggio-Talice, R., J. G. Burleigh, and O. Eulenstein.  2004.  Quartet supertrees.  Pp. 173-191. In Phylogenetic supertrees: combining information to reveal the tree of life. (O. R. P. Bininda-Emonds, ed.).  Kluwer Academic Press, The Netherlands.

Purvis, A.  1995.  A modification to Baum and Ragan's method for combining phylogenetic trees.  Syst. Biol. 44: 251-255.

Ragan, M. A. 1992. Phylogenetic inference based on matrix representation of trees.  Mol. Phylogenet. Evol. 1:53-58.

Robinson, D. F., and L. R. Foulds.  1981.  Comparison of phylogenetic trees.  Mathematical Biosciences 53: 131-147.

Sanderson, M. J., A. Purvis, and C. Henze.  1998.  Phylogenetic supertrees: assembling the trees of life.  Trends Ecol. Evol. 13:105-109.

Semple, C., and M. Steel.  2000.  A supertree method for rooted trees.  Disc. Appl. Math. 105:147-158.

Swofford D. 2003. PAUP*.  Phylogenetic analysis using parsimony (*and other methods).  Version 10.  Sinauer, Sunderland, MA.

 
 

Acknowledgments

Rod Page generously provided his program for calculating modified mincut supertrees and TreeLib for displaying and handling trees.  Gordon Burleigh and Mike Sanderson helped test the software and write the manual. This work was supported by NSF grants 1053164 (Sanderson and Eulenstein), CCR-9988348 (Fernández-Baca), and EF-0334832 (Eulenstein, Fernández-Baca).