R Hackathon 1/Trait Evolution SG

* Participants: Harmon, Hipp, Hunt

== Targets ==

# Compare various implementations of the same methods (ape, geiger, OUCH, Mesquite)
# Improve functionality of character fitting in R
# Identify gaps in current implementation

== Accomplishments ==

# Evaluated the results of continuous character analyses in different packages
#* Packages are mostly consistent
#* Discrepancies come from two sources:
#** Different approaches (e.g. marginal versus joint likelihood)
#** Difficulties in finding the ML solution
#* For continuous characters (see the first sketch below this list):
#** geiger and OUCH tend to return the same parameter estimates, but different likelihoods
#*** The Brownian motion model is relatively easy to fit, with results that appear to be compatible across packages
#*** The Ornstein-Uhlenbeck model with a single optimum is more difficult to fit. There is often a large, nearly flat ridge in the log-likelihood surface, and different optimization routines and settings may stop at different places along this ridge. This can yield rather different parameter estimates, although the log-likelihoods should not vary as much.
#* For discrete characters (see the second sketch below this list):
#** geiger and Mesquite are consistent, returning the same parameter estimates and likelihoods
#** geiger and ape differ
#** ape reports joint likelihoods for ancestral states, i.e. it uses the single set of ancestral states that together yield the highest likelihood on the whole tree
#** Mesquite and geiger use marginal likelihoods for ancestral states, i.e. the likelihood of each state at a node, summed over all possible states at the other nodes
#** As a result, ape and Mesquite also return different ancestral state reconstructions
# Improved functionality
#* geiger was modified to give more reliable results through a more thorough search of the likelihood surface (fitContinuous)
#* geiger can now handle a more general set of discrete character models (fitDiscrete)
#* geiger's tree transformations now work for non-ultrametric trees
# Identified gaps in the current implementations
#* The main gap, from an end-user perspective, is obtaining estimates of ancestral character states in R
#* ape does this, but only for joint likelihoods, and the function sometimes has trouble finding the ML solution
#* There is currently no way to get marginal ancestral character states for discrete characters in R other than by interfacing with Mesquite
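
The first sketch below illustrates the continuous-character comparison described above, using only geiger's <code>fitContinuous</code> on simulated data (the OUCH side of the comparison is omitted here). It assumes reasonably recent versions of ape and geiger, where estimates are stored in the <code>$opt</code> element of the fitted object; the tree, trait, and bounds are illustrative, not hackathon data.

<pre>
library(ape)
library(geiger)

set.seed(1)
phy <- rcoal(60)                                  # random ultrametric 60-tip tree
dat <- rTraitCont(phy, model = "BM", sigma = 1)   # BM-simulated trait, named by tip label

## Brownian motion: usually easy to fit and stable across packages
bm <- fitContinuous(phy, dat, model = "BM")
bm$opt$sigsq    # estimated rate
bm$opt$lnL      # log-likelihood

## Single-optimum OU often sits on a nearly flat likelihood ridge:
## refitting with different bounds on alpha can shift the parameter
## estimates noticeably while the log-likelihood barely moves.
ou1 <- fitContinuous(phy, dat, model = "OU")
ou2 <- fitContinuous(phy, dat, model = "OU",
                     bounds = list(alpha = c(1e-6, 1)))
c(ou1$opt$alpha, ou2$opt$alpha)   # parameter estimates may differ
c(ou1$opt$lnL, ou2$opt$lnL)       # log-likelihoods should be close
</pre>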
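A companion sketch for the discrete-character comparison, again on simulated data and assuming recent versions of ape and geiger. It fits the same symmetric one-rate (ER) Mk model with geiger's <code>fitDiscrete</code> and ape's <code>ace</code>; the element names used to pull out results (<code>$opt</code>, <code>$rates</code>, <code>$loglik</code>, <code>$lik.anc</code>) are those of current releases.

<pre>
library(ape)
library(geiger)

set.seed(2)
phy <- rcoal(60)
dat <- rTraitDisc(phy, model = "ER", k = 2, rate = 0.3)   # binary character, named by tip label

## geiger: transition-rate model fit; log-likelihood and rates in $opt
gfit <- fitDiscrete(phy, dat, model = "ER")
gfit$opt$lnL

## ape: same one-rate model; also returns per-node ancestral state likelihoods
afit <- ace(dat, phy, type = "discrete", model = "ER")
afit$rates
afit$loglik
head(afit$lik.anc)

## The rate estimates should agree closely, but the ancestral-state output
## from ape need not match marginal reconstructions from Mesquite/geiger,
## for the joint-versus-marginal reasons noted in the list above.
</pre>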

== To do ==

# Implement "white noise" and Brownian-motion-with-a-trend models in geiger's fitContinuous
# Investigate the statistical properties of these methods (see the simulation sketch below this list)
#* Which models can we tell apart?
#* How much data are needed?
#* Are parameter estimates biased?
#* Performance of different model selection criteria (LRT, information criteria)
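
The sketch below shows one way the simulation study proposed above could be set up, assuming recent versions of ape and geiger; the tree size, true rate, and number of replicates are arbitrary placeholders. It simulates under Brownian motion, refits BM and single-optimum OU, and tallies how often AIC prefers the over-parameterized OU model and how close the estimated rate is to the truth.

<pre>
library(ape)
library(geiger)

set.seed(3)
phy  <- rcoal(50)   # one fixed 50-tip tree for all replicates
nsim <- 20          # small number of replicates; each fit takes a few seconds
sig2 <- 1           # true BM rate

res <- t(replicate(nsim, {
  dat <- rTraitCont(phy, model = "BM", sigma = sqrt(sig2))
  bm  <- fitContinuous(phy, dat, model = "BM")
  ou  <- fitContinuous(phy, dat, model = "OU")
  c(sigsq_hat = bm$opt$sigsq,             # BM rate estimate (bias check)
    ou_wins   = ou$opt$aic < bm$opt$aic)  # does AIC wrongly favor OU?
}))

colMeans(res)   # mean rate estimate and proportion of replicates picking OU
</pre>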
  
  
 
[[Category:R Hackathon 1]]
 