R Hackathon 1/Trait Evolution SG

From Phyloinformatics
Latest revision as of 12:04, 28 January 2008

  • Participants: Harmon, Hipp, Hunt

Targets.

  1. Compare various implementations of the same methods (ape, geiger, OUCH, Mesquite)
  2. Improve functionality of character fitting in R
  3. Identify gaps in current implementation

Accomplishments.

  1. Evaluated the results of character analyses (continuous and discrete) in different packages
    • Packages are mostly consistent
    • Discrepancies come from two sources:
      • Different approaches (e.g. marginal versus joint likelihood)
      • Difficulties in finding the ML solution
    • For continuous characters:
      • geiger and OUCH tend to return the same parameter estimates, but different likelihoods
        • The Brownian motion model is relatively easy to fit, with results that seem to be compatible across packages
        • The Ornstein-Uhlenbeck model with one optimum is more difficult to fit. The log-likelihood surface often has a large, nearly flat ridge, and different optimization routines and settings may stop at different places along it. This can yield rather different parameter estimates, although the log-likelihoods should not vary as much.
    • For discrete characters:
      • geiger and Mesquite are consistent, returning the same parameter estimates and likelihoods
      • geiger and ape return different results
      • ape reports the joint likelihood for ancestral states, that is, the likelihood under the single set of ancestral states that together give the highest likelihood on the whole tree
      • Mesquite and geiger use marginal likelihoods for ancestral states, that is, the likelihood averaged over all possible ancestral character state values
      • As a consequence, ape and Mesquite also give different ancestral state reconstructions
  2. Improved functionality
    • geiger was modified to give more reliable results by a more thorough search of the likelihood surface (fitContinuous)
    • geiger can deal with a more general set of discrete character models (fitDiscrete)
    • geiger's tree transformations now work for non-ultrametric trees
  3. Identified gaps in current implementation
    • The main gap, from an end-user perspective, is obtaining estimates of ancestral character states in R
    • ape does this, but only for joint likelihoods, and the function sometimes has trouble finding the ML solution
    • There is no way to get marginal ancestral character states for discrete characters in R other than interfacing with Mesquite
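
The joint versus marginal distinction noted above can be illustrated on a toy example. The following is a minimal sketch, not the code used by ape, geiger, or Mesquite: it assumes a symmetric two-state Markov model with rate `q`, a flat prior on the root state, and a hypothetical three-tip tree `((t1,t2)anc,t3)root` with made-up branch lengths. It enumerates every assignment of states to the two internal nodes, then reports the single best assignment (joint) and the per-node probabilities obtained by summing over the other node (marginal).

```python
import math
from itertools import product

STATES = (0, 1)

def p_trans(i, j, t, q):
    """Transition probability i -> j over time t, symmetric two-state model."""
    decay = math.exp(-2.0 * q * t)
    return 0.5 + 0.5 * decay if i == j else 0.5 - 0.5 * decay

def assignment_likelihood(root, anc, tips, brlen, q):
    """Likelihood of the tip data for one fixed (root, anc) assignment."""
    t1, t2, t3 = tips
    lik = 0.5  # assumption: flat prior on the root state
    lik *= p_trans(root, anc, brlen["anc"], q)
    lik *= p_trans(anc, t1, brlen["t1"], q)
    lik *= p_trans(anc, t2, brlen["t2"], q)
    lik *= p_trans(root, t3, brlen["t3"], q)
    return lik

def joint_and_marginal(tips, brlen, q):
    """Enumerate all internal-node assignments and summarize them both ways."""
    table = {
        (r, a): assignment_likelihood(r, a, tips, brlen, q)
        for r, a in product(STATES, STATES)
    }
    total = sum(table.values())          # likelihood summed over all assignments
    best = max(table, key=table.get)     # joint reconstruction: best single assignment
    marg_root = {s: sum(v for (r, _), v in table.items() if r == s) / total
                 for s in STATES}
    marg_anc = {s: sum(v for (_, a), v in table.items() if a == s) / total
                for s in STATES}
    return {
        "tree_likelihood": total,
        "joint_assignment": best,
        "joint_likelihood": table[best],
        "marginal_root": marg_root,
        "marginal_anc": marg_anc,
    }

# Hypothetical data: tip states for (t1, t2, t3) and illustrative branch lengths.
tips = (0, 1, 1)
brlen = {"anc": 1.0, "t1": 1.0, "t2": 1.0, "t3": 2.0}
result = joint_and_marginal(tips, brlen, q=0.5)
print(result)
```

The likelihood of the best single assignment is always smaller than the likelihood summed over all assignments, so a program reporting the former and one reporting the latter will disagree on the likelihood even when they agree on the model and the rate.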

To do

  1. Implement "white noise" and "Brownian motion with a trend" models in geiger's fitContinuous
  2. Investigate statistical properties of these methods
    • Which models can we tell apart?
    • How much data are needed?
    • Are parameter estimates biased?
    • Performance of different model selection criteria (LRT, information criteria)
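
The model selection criteria listed in the last item can be sketched as follows. This is an illustrative sketch only: the log-likelihoods, the sample size, and the parameter counts (2 for Brownian motion, 3 for a single-optimum Ornstein-Uhlenbeck model) are assumptions chosen for the example, not results from the hackathon. The likelihood ratio test here assumes the two models are nested and differ by one parameter, so the statistic is compared against a chi-square distribution with one degree of freedom.

```python
import math

def lrt_statistic(lnl_simple, lnl_complex):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (lnl_complex - lnl_simple)

def lrt_pvalue(stat):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def aic(lnl, k):
    """Akaike information criterion for k free parameters."""
    return 2.0 * k - 2.0 * lnl

def aicc(lnl, k, n):
    """Small-sample corrected AIC for n observations (requires n > k + 1)."""
    return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical fits to the same data set of n tip values.
n_tips = 30
lnl_bm, k_bm = -42.7, 2   # assumed: rate and root state
lnl_ou, k_ou = -40.1, 3   # assumed: rate, strength of attraction, optimum

stat = lrt_statistic(lnl_bm, lnl_ou)
print("LRT statistic:", stat, "p-value:", lrt_pvalue(stat))
print("AICc BM:", aicc(lnl_bm, k_bm, n_tips))
print("AICc OU:", aicc(lnl_ou, k_ou, n_tips))
```

Because these criteria depend only on the maximized log-likelihoods and parameter counts, they are sensitive to the discrepancies noted above: log-likelihoods from different packages should be compared only when they are computed on the same scale.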