# R Hackathon 1/Trait Evolution SG

From Phyloinformatics


## Latest revision as of 12:04, 28 January 2008

- Participants: Harmon, Hipp, Hunt

## Targets.

- Compare various implementations of the same methods (ape, geiger, OUCH, Mesquite)
- Improve functionality of character fitting in R
- Identify gaps in current implementation

## Accomplishments.

- Evaluated the results of continuous character analyses in different packages
  - Packages are mostly consistent
  - Discrepancies come from two sources:
    - Different approaches (e.g. marginal versus joint likelihood)
    - Difficulties in finding the ML solution

  - For continuous characters:
    - geiger and OUCH tend to return the same parameter estimates, but different likelihoods.
      - The Brownian motion model is relatively easy to fit, and the results appear to be compatible across packages.
      - The Ornstein-Uhlenbeck model with one optimum is more difficult to fit. There is often a large, nearly flat ridge in the log-likelihood surface, and different optimization routines and settings may stop at different places along this ridge. This can yield rather different parameter estimates, although the log-likelihoods should not vary as much.
  - For discrete characters:
    - geiger and Mesquite are consistent, returning the same parameter estimates and likelihoods.
    - geiger and ape differ.
    - ape reports joint likelihoods for ancestral states. This uses the single set of ancestral states that together result in the highest likelihood on the whole tree.
    - Mesquite and geiger use marginal likelihoods for ancestral states. This represents the likelihood averaged over all possible ancestral character state values.
    - As a result, ape and Mesquite give different ancestral state reconstructions.
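The difference between joint and marginal reconstructions described above can be made concrete with a small brute-force sketch. It is written in Python with the standard library only and does not call ape, geiger, or Mesquite. The three-tip tree, the branch lengths, the tip states, the rate value, and the symmetric two-state model are all hypothetical choices made for illustration.

```python
import itertools
import math


def p_same_diff(rate, t):
    """Transition probabilities of a symmetric two-state Markov model."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return same, 1.0 - same


# Hypothetical tree: root -> (A, t3); A -> (t1, t2).
# Each entry is (child, parent, branch length).
EDGES = [("A", "root", 1.0), ("t3", "root", 2.0),
         ("t1", "A", 1.0), ("t2", "A", 1.0)]
TIPS = {"t1": 0, "t2": 0, "t3": 1}   # hypothetical observed tip states
INTERNAL = ["root", "A"]


def assignment_likelihood(states, rate):
    """Likelihood of the tip data together with one full set of node states."""
    lik = 0.5  # flat prior on the root state
    for child, parent, t in EDGES:
        same, diff = p_same_diff(rate, t)
        lik *= same if states[child] == states[parent] else diff
    return lik


def reconstruct(rate=0.3):
    """Return the joint reconstruction, marginal probabilities, and total likelihood."""
    table = {}
    for combo in itertools.product((0, 1), repeat=len(INTERNAL)):
        states = dict(TIPS)
        states.update(zip(INTERNAL, combo))
        table[combo] = assignment_likelihood(states, rate)

    total = sum(table.values())        # likelihood summed over all ancestral states
    joint = max(table, key=table.get)  # single best set of ancestral states

    marginal = {}
    for i, node in enumerate(INTERNAL):
        marginal[node] = [
            sum(v for k, v in table.items() if k[i] == s) / total
            for s in (0, 1)
        ]
    return joint, marginal, total


joint, marginal, total = reconstruct()
print("joint (root, A):", joint)
print("marginal:", marginal)
```

The joint result is one assignment of states to all internal nodes at once, while the marginal result gives each node its own probability distribution. Enumeration is only feasible for tiny trees; real implementations use dynamic programming over the tree instead.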

- Improved functionality
  - geiger's fitContinuous now searches the likelihood surface more thoroughly and gives more reliable results
  - geiger's fitDiscrete can handle a more general set of discrete character models
  - geiger's tree transformations now work for non-ultrametric trees
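One common way to search a likelihood surface more thoroughly is to restart a local optimizer from several starting points and keep the best result. The sketch below illustrates that idea only; it is not geiger's actual code. The surface `toy_loglik` is a hypothetical stand-in with a nearly flat ridge (it depends strongly on the ratio sigma2 / (2 * alpha) and only weakly on alpha itself), not a real Ornstein-Uhlenbeck likelihood, and `pattern_search` is a deliberately crude optimizer.

```python
import math


def toy_loglik(alpha, sigma2):
    """Hypothetical ridge-shaped surface (NOT a real OU likelihood)."""
    v = sigma2 / (2.0 * alpha)  # strongly constrained combination
    return -50.0 * math.log(v) ** 2 - 0.01 * math.log(alpha / 0.5) ** 2


def pattern_search(f, start, step=0.5, tol=1e-3, max_iter=200):
    """Crude derivative-free hill climb on log-transformed parameters."""
    x = [math.log(s) for s in start]
    best = f(*(math.exp(v) for v in x))
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                val = f(*(math.exp(v) for v in trial))
                if val > best:
                    x, best, improved = trial, val, True
        if not improved:
            step /= 2.0          # shrink the step when no move helps
            if step < tol:
                break
    return [math.exp(v) for v in x], best


def multi_start(f, starts):
    """Run the local search from every start; return the best and all results."""
    results = [pattern_search(f, s) for s in starts]
    return max(results, key=lambda r: r[1]), results


# Hypothetical starting values for (alpha, sigma2).
STARTS = [(0.1, 0.1), (1.0, 5.0), (5.0, 1.0), (0.5, 1.0)]
best, results = multi_start(toy_loglik, STARTS)
for (alpha, sigma2), ll in results:
    print(f"alpha={alpha:.3f} sigma2={sigma2:.3f} logL={ll:.5f}")
```

On a surface like this, runs from different starts can end at different points along the ridge, so comparing the parameter estimates as well as the log-likelihoods across restarts is a useful diagnostic.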

- Identified gaps in current implementation
  - The main gap, from an end-user perspective, is obtaining estimates of ancestral character states in R
  - ape does this, but only for joint likelihoods, and the function sometimes has trouble finding the ML solution
  - There is no way to get marginal ancestral character states for discrete characters in R other than interfacing with Mesquite

## To do

- Implement "white noise" and Brownian-motion-with-a-trend models in geiger's fitContinuous
- Investigate statistical properties of these methods
  - Which models can we tell apart?
  - How much data are needed?
  - Are parameter estimates biased?
  - How do different model selection criteria (LRT, information criteria) perform?
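As a reminder of what the model selection criteria in the last item compute, here is a minimal Python sketch using only the standard library. The log-likelihoods, parameter counts, and sample size are hypothetical numbers, not results from the hackathon. The likelihood ratio test assumes nested models that differ by one parameter and uses the usual chi-square approximation with one degree of freedom, which may be inaccurate when the extra parameter lies on a boundary.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2.0 * k - 2.0 * loglik


def aicc(loglik, k, n):
    """Small-sample corrected AIC for n observations (requires n > k + 1)."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def lrt_df1(loglik_simple, loglik_complex):
    """Likelihood ratio statistic and chi-square p-value for 1 degree of freedom."""
    stat = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    # For one degree of freedom the chi-square tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical fits: Brownian motion (2 parameters) versus
# single-optimum Ornstein-Uhlenbeck (3 parameters) on 30 tips.
LNL_BM, K_BM = -42.1, 2
LNL_OU, K_OU = -40.3, 3
N_TIPS = 30

stat, p_value = lrt_df1(LNL_BM, LNL_OU)
print("LRT statistic:", stat, "p-value:", p_value)
print("AIC  BM / OU:", aic(LNL_BM, K_BM), aic(LNL_OU, K_OU))
print("AICc BM / OU:", aicc(LNL_BM, K_BM, N_TIPS), aicc(LNL_OU, K_OU, N_TIPS))
```

Running the same calculations on data simulated under known models is one way to approach the questions listed above, for example how often each criterion picks the generating model at a given number of tips.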