# R Hackathon 1/Trait Evolution SG

From Phyloinformatics

Revision as of 16:37, 24 January 2008 by Lukeh@uidaho.edu (talk)

- Participants: Harmon, Hipp, Hunt

## Targets

- Compare various implementations of the same methods (ape, geiger, OUCH, Mesquite)
- Improve functionality of character fitting in R
- Identify gaps in current implementation

## Accomplishments

- Evaluated the results of continuous character analyses in different packages
  - Packages are mostly consistent
  - Discrepancies come from two sources:
    - Different approaches (e.g. marginal versus joint likelihood)
    - Difficulties in finding the ML solution

- For continuous characters:
  - geiger and OUCH tend to return the same parameter estimates
  - But they return different likelihoods

- For discrete characters:
  - geiger and Mesquite are consistent, returning the same parameter estimates and likelihoods
  - geiger and ape differ
    - ape reports joint likelihoods for ancestral states. These are based on the single set of ancestral states that together give the highest likelihood on the whole tree.
    - Mesquite and geiger use marginal likelihoods for ancestral states. These represent the likelihood averaged over all possible ancestral character state values.
    - As a result, ape and Mesquite can also return different ancestral state reconstructions
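
The joint versus marginal distinction can be made concrete on a toy example. The following is a minimal sketch, not the code used by ape, geiger, or Mesquite: it assumes a symmetric two-state Markov model, a flat prior on the root state, and a hypothetical three-tip tree `((A, B)node, C)root` with made-up tip states, branch lengths, and rate. It enumerates every combination of states at the two internal nodes, then reports both the likelihood of the single best combination (joint) and the likelihood summed over all combinations (marginal).

```python
import itertools
import math


def mk2_transition(rate, t):
    """Transition probabilities for a symmetric two-state Markov model."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def joint_table(rate, tips, branch_lengths):
    """Likelihood of the tip data for every (root, node) state pair.

    Hypothetical tree: ((A, B)node, C)root. Branch lengths are keyed by
    the name of the node at the lower end of each branch.
    """
    p = {name: mk2_transition(rate, t) for name, t in branch_lengths.items()}
    table = {}
    for root, node in itertools.product((0, 1), repeat=2):
        table[(root, node)] = (
            0.5  # assumed flat prior on the root state
            * p["node"][root][node]
            * p["A"][node][tips["A"]]
            * p["B"][node][tips["B"]]
            * p["C"][root][tips["C"]]
        )
    return table


# Made-up data for illustration only.
tips = {"A": 0, "B": 0, "C": 1}
branch_lengths = {"A": 1.0, "B": 1.0, "node": 1.0, "C": 2.0}
table = joint_table(0.3, tips, branch_lengths)

# Marginal likelihood: sum over all ancestral state combinations.
marginal_likelihood = sum(table.values())

# Joint likelihood: the single best combination of ancestral states.
best_joint_states = max(table, key=table.get)
joint_likelihood = table[best_joint_states]

# Marginal reconstruction at "node": share of the total likelihood
# contributed by each state at that node, summed over root states.
node_marginals = [
    sum(v for (r, n), v in table.items() if n == s) / marginal_likelihood
    for s in (0, 1)
]

print("joint likelihood:   ", joint_likelihood, "states", best_joint_states)
print("marginal likelihood:", marginal_likelihood)
print("marginal probabilities at node:", node_marginals)
```

Because the joint likelihood is one term of the sum that makes up the marginal likelihood, the two numbers are not directly comparable across packages, even when the fitted model is the same.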

- Improved functionality
  - geiger's fitContinuous now searches the likelihood surface more thoroughly, which gives more reliable results
  - geiger's fitDiscrete can now deal with a more general set of discrete character models
  - geiger's tree transformations now work for non-ultrametric trees
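
One common way to search a likelihood surface more thoroughly is to restart a local optimizer from several starting points and keep the best result. The sketch below illustrates that general idea only; it is not geiger's implementation, and the page does not say which strategy fitContinuous uses. The function `toy_surface` is a hypothetical stand-in for a negative log-likelihood with two basins, and `local_minimize` is a deliberately crude derivative-free descent.

```python
def local_minimize(f, x0, step=0.5, tol=1e-8):
    """Crude 1-D descent: try a step each way, halve the step on failure."""
    x, fx = x0, f(x0)
    while step > tol:
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            # Neither direction improved, so refine the step size.
            step /= 2.0
    return x, fx


def multi_start(f, starts):
    """Run the local search from every start and keep the lowest value."""
    return min((local_minimize(f, s) for s in starts), key=lambda r: r[1])


def toy_surface(x):
    """Hypothetical negative log-likelihood with two local minima."""
    return (x ** 2 - 4.0) ** 2 + x


# A single start on the right-hand side settles in the shallower basin.
x_single, f_single = local_minimize(toy_surface, 3.0)

# Several starts spread over the parameter range find the deeper basin.
x_best, f_best = multi_start(toy_surface, [-3.0, -1.0, 1.0, 3.0])

print("single start:", x_single, f_single)
print("multi-start: ", x_best, f_best)
```

The cost grows linearly with the number of starting points, so in practice the number of restarts is a trade-off between run time and the risk of reporting a local optimum as the ML solution.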

- Identified gaps in current implementation
  - The main gap, from an end-user perspective, is obtaining estimates of ancestral character states in R
    - ape does this, but only for joint likelihoods, and the function sometimes has trouble finding the ML solution
    - There is no way to get marginal ancestral character states for discrete characters in R other than interfacing with Mesquite

## To do

- Implement "white noise" model in geiger's fitContinuous
- Investigate statistical properties of these methods
  - Which models can we tell apart?
  - How much data is needed?
  - Are parameter estimates biased?