COAL Description

COAL -- Program for computing gene tree probabilities and simulating gene trees in species trees under the coalescent model.

Basic COAL features and description:

- Computes probabilities of gene tree topologies given a fixed species tree with branch lengths specified in coalescent units (number of generations divided by twice the effective population size for a diploid population).

- Both trees are assumed to be bifurcating, although a multifurcating species tree can be represented by setting the relevant internal branch lengths to zero.

- Multiple (or zero) individuals sampled within a species are allowed, but the labeling convention is strict: the label for each individual should be the species name followed by a hyphen and then a number. For example, if A is a species name, then genes sampled from A should be called A-1, A-2, etc.

- There are three input files: a species tree file with species trees in Newick format; a gene tree file, with gene trees in Newick format; and an input file with commands including the names of species and gene tree files, the number of taxa, and options for output.

- Output options control whether coalescent histories are listed. If they are listed, symbolic formulas for gene tree probabilities can be given in terms of p_{uv} terms (see paper). Other options include printing the gene tree topology, computing cumulative probabilities for gene trees within each species tree, and counting the number of coalescent histories. See the help file for the full list of options.

- Gene trees can be simulated within species trees. Currently this is limited to the case of one gene per species and requires constant population sizes in the species tree. Gene trees are generated so that each subtree appears in only one rotation. For example, if ((A,B),C) is a subtree, (C,(A,B)) will not be generated.
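
The conventions in the feature list above (coalescent units and labels for sampled individuals) can be illustrated with a small Python sketch. This is not part of COAL and does not reproduce its input or output formats; the function names are hypothetical. The three-taxon probabilities use the standard textbook result for one gene per species and species tree ((A,B),C), where t is the internal branch length in coalescent units: the matching topology has probability 1 - (2/3)exp(-t) and each of the other two has probability (1/3)exp(-t).

```python
import math


def coalescent_units(generations, effective_pop_size):
    """Convert a branch length in generations to coalescent units.

    Uses the diploid convention from the description above:
    generations divided by twice the effective population size.
    """
    if effective_pop_size <= 0:
        raise ValueError("effective population size must be positive")
    return generations / (2.0 * effective_pop_size)


def individual_labels(species, n_sampled):
    """Build labels such as A-1, A-2, ... for genes sampled from one species.

    Zero sampled individuals gives an empty list.
    """
    return [f"{species}-{i}" for i in range(1, n_sampled + 1)]


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    Assumes one gene per species; t is the internal branch length
    in coalescent units (hypothetical helper, standard formula).
    """
    p_discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_discordant,  # matches the species tree
        "((A,C),B)": p_discordant,
        "((B,C),A)": p_discordant,
    }


if __name__ == "__main__":
    t = coalescent_units(generations=20000, effective_pop_size=10000)
    print("branch length in coalescent units:", t)
    print("labels for species A:", individual_labels("A", 3))
    for topology, prob in three_taxon_gene_tree_probs(t).items():
        print(topology, prob)
```

With an internal branch of length zero, all three topologies come out equally likely, which is the multifurcating case mentioned above.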

Known bugs:

* COAL can crash if taxon names are too long.

* There have been problems with a few of the sample input files in the Mac version. I frequently (but not always) get "bus error" messages on the Mac.

* For cases with intraspecific sampling, there can be errors if the order of species names in the infile does not match the order in which COAL encounters them in the species tree file. The list of taxon names in the infile should be in the same order in which the names are encountered in the species tree in NEXUS format, written so that each node has at least as many left descendants as right descendants.

* Because of the problem described in the previous bullet, when intraspecific sampling is used, only one species tree topology should be used in any COAL run (multiple species trees with the same topology but different branch lengths should be OK).

* Due to numerical precision limitations in COAL, gene trees with very low probability can be reported as having a probability of 0.0. This is more of a limitation than a bug.
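
The taxon ordering rule mentioned in the bullets above (each node has at least as many left descendants as right descendants) can be sketched in Python as follows. This is only an illustration, not COAL's own code: trees are written as nested tuples rather than read from a tree file, the function names are hypothetical, and ties between equally sized subtrees are left in their input order because the rule as stated does not say how they are broken.

```python
def count_leaves(tree):
    """Number of tips under a node; a tip is a plain string."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    return count_leaves(left) + count_leaves(right)


def rotate(tree):
    """Rotate every node so the left subtree has at least as many tips
    as the right subtree.

    For bifurcating trees this is the same as comparing the number of
    descendant nodes. Equal-sized subtrees keep their input order
    (an assumption of this sketch).
    """
    if isinstance(tree, str):
        return tree
    left, right = rotate(tree[0]), rotate(tree[1])
    if count_leaves(left) < count_leaves(right):
        left, right = right, left
    return (left, right)


def taxon_order(tree):
    """Tip names in the order they are encountered, left to right."""
    if isinstance(tree, str):
        return [tree]
    return taxon_order(tree[0]) + taxon_order(tree[1])


if __name__ == "__main__":
    species_tree = ("C", ("A", "B"))      # (C,(A,B))
    rotated = rotate(species_tree)        # ((A,B),C)
    print(rotated)
    print(taxon_order(rotated))           # order to use for the taxon list
```

Listing the taxon names in the order returned by taxon_order for the rotated tree is one way to check an infile by hand before a run with intraspecific sampling.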

Thanks to Erik Erhardt, Laura Salter Kubatko, Mike Hickerson, and Bryan Carstens for testing the software and to Bryan Carstens for sending the Mac executable.

