R Hackathon 1/Trait Evolution SG

From Phyloinformatics
Revision as of 16:36, 24 January 2008 by Lukeh@uidaho.edu
  • Participants: Harmon, Hipp, Hunt

Targets.

  1. Compare various implementations of the same methods (ape, geiger, OUCH, Mesquite)
  2. Improve the character-fitting functionality in R
  3. Identify gaps in the current implementations

Accomplishments.

  1. Evaluated the results of continuous and discrete character analyses in different packages
    • Packages are mostly consistent
    • Discrepancies come from two sources:
      • Different approaches (e.g. marginal versus joint likelihood)
      • Difficulty finding the maximum-likelihood (ML) solution
    • For continuous characters:
      • geiger and OUCH tend to return the same parameter estimates
      • But they return different likelihoods
    • For discrete characters:
      • geiger and Mesquite are consistent, returning the same parameter estimates and likelihoods
      • geiger and ape give different results
      • ape reports joint likelihoods for ancestral states: the single set of ancestral states that together gives the highest likelihood on the whole tree.
      • Mesquite and geiger use marginal likelihoods for ancestral states: for each node, the likelihood of each state is summed over all possible states at the other nodes.
      • As a result, ape and Mesquite can give different ancestral state reconstructions
  2. Improved functionality
    • geiger's fitContinuous was modified to search the likelihood surface more thoroughly, giving more reliable results
    • geiger's fitDiscrete can now handle a more general set of discrete character models
    • geiger's tree transformations now work for non-ultrametric trees
  3. Identified gaps in the current implementations
    • The main gap, from an end-user perspective, is obtaining estimates of ancestral character states in R
    • ape provides this, but only for joint likelihoods, and its function sometimes has trouble finding the ML solution
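The difference between joint and marginal ancestral state reconstruction can be illustrated with a small brute-force sketch. This is written in Python and is not the code used by ape, geiger or Mesquite. It assumes a symmetric two-state Mk model with a fixed, made-up rate, an equal prior on the two root states, and a hypothetical four-taxon tree with invented tip states. The joint reconstruction picks the single best combination of internal-node states. The marginal reconstruction, for each node separately, sums the likelihood over all states at the other nodes. Enumerating every combination only works for tiny trees; practical implementations use Felsenstein's pruning algorithm instead.

```python
import itertools
import math

# Hypothetical tree ((A:1,B:1)n1:1,(C:0.5,D:2)n2:1)root.
# Each internal node maps to a list of (child, branch_length) pairs.
TREE = {
    "root": [("n1", 1.0), ("n2", 1.0)],
    "n1": [("A", 1.0), ("B", 1.0)],
    "n2": [("C", 0.5), ("D", 2.0)],
}
TIPS = {"A": 0, "B": 0, "C": 1, "D": 0}  # made-up observed binary states
RATE = 0.3  # assumed symmetric transition rate q


def p_transition(i, j, t, q=RATE):
    """Symmetric two-state Mk model: P(end in state j | start in state i, time t)."""
    same = 0.5 + 0.5 * math.exp(-2.0 * q * t)
    return same if i == j else 1.0 - same


def assignment_likelihood(states):
    """Likelihood of the tip data together with one full set of internal states."""
    lik = 0.5  # equal prior on the two root states
    for parent, children in TREE.items():
        for child, t in children:
            lik *= p_transition(states[parent], states[child], t)
    return lik


def enumerate_assignments():
    """Yield (states, likelihood) for every combination of internal-node states."""
    internal = list(TREE)
    for combo in itertools.product((0, 1), repeat=len(internal)):
        states = dict(TIPS)
        states.update(zip(internal, combo))
        yield states, assignment_likelihood(states)


def joint_reconstruction():
    """Joint ML: the single best set of internal states and its likelihood."""
    best_states, best_lik = max(enumerate_assignments(), key=lambda pair: pair[1])
    return {n: best_states[n] for n in TREE}, best_lik


def marginal_reconstruction():
    """Marginal: per-node probability of state 1, summed over all other nodes' states.
    Also returns the total probability of the tip data."""
    pairs = list(enumerate_assignments())
    total = sum(lik for _, lik in pairs)
    probs = {n: sum(lik for s, lik in pairs if s[n] == 1) / total for n in TREE}
    return probs, total


if __name__ == "__main__":
    joint_states, joint_lik = joint_reconstruction()
    marg_probs, data_lik = marginal_reconstruction()
    print("joint states:", joint_states, "likelihood:", joint_lik)
    print("marginal P(state 1):", marg_probs, "P(data):", data_lik)
```

The two approaches also report different likelihood values: the joint reconstruction reports the likelihood of one best combination of states, while the marginal approach works with the total probability of the data summed over all combinations. The joint value can therefore never exceed the marginal total.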

To do

  1. Implement "white noise" model in geiger's fitContinuous
  2. Investigate statistical properties of these methods
    • Which models can we tell apart?
    • How much data is needed?
    • Are parameter estimates biased?
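A "white noise" model treats trait values as independent draws from one normal distribution, ignoring the tree. Under Brownian motion, the trait vector is multivariate normal with covariance sigma2 * C, where C[i][j] is the branch length shared by tips i and j. The sketch below is a minimal pure-Python illustration of the closed-form ML fits for both models, with AIC for comparison. It is not geiger's fitContinuous, and it does not show how the white noise model will actually be added there. The tree, its covariance matrix (supplied by hand) and the trait values are all hypothetical.

```python
import math


def white_noise_loglik(x):
    """ML fit of a white noise model: x_i ~ Normal(mu, sigma2) i.i.d.
    Returns (logL, mu_hat, sigma2_hat)."""
    n = len(x)
    mu = sum(x) / n
    sigma2 = sum((xi - mu) ** 2 for xi in x) / n  # ML estimate divides by n
    loglik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return loglik, mu, sigma2


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_spd(low, b):
    """Solve (L L^T) y = b by forward, then backward, substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (z[i] - sum(low[k][i] * y[k] for k in range(i + 1, n))) / low[i][i]
    return y


def brownian_loglik(x, cov):
    """ML fit of Brownian motion: x ~ MVN(mu * 1, sigma2 * C).
    Returns (logL, mu_hat, sigma2_hat)."""
    n = len(x)
    low = cholesky(cov)
    cinv_x = solve_spd(low, x)
    cinv_1 = solve_spd(low, [1.0] * n)
    mu = sum(cinv_x) / sum(cinv_1)  # GLS estimate of the root state
    resid = [xi - mu for xi in x]
    sigma2 = sum(r * c for r, c in zip(resid, solve_spd(low, resid))) / n
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    loglik = -0.5 * (n * math.log(2.0 * math.pi * sigma2) + log_det + n)
    return loglik, mu, sigma2


def aic(loglik, n_params=2):
    """Akaike information criterion; both models here estimate mu and sigma2."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) and made-up trait values.
C = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
x = [1.0, 1.2, 3.0, 2.7]

if __name__ == "__main__":
    for name, (lnl, mu, s2) in [("white noise", white_noise_loglik(x)),
                                ("Brownian motion", brownian_loglik(x, C))]:
        print(f"{name}: lnL={lnl:.3f} mu={mu:.3f} sigma2={s2:.3f} AIC={aic(lnl):.3f}")
```

Because both models estimate two parameters, their AIC difference here reduces to a difference in log-likelihoods. On a star tree with equal branch lengths (C proportional to the identity matrix), the two models give identical fits, which is one simple case where they cannot be told apart. Repeating the fit on data simulated under each model would be one way to approach the questions listed above.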