Difference between revisions of "R Hackathon 1/End User Goals"

From Phyloinformatics
Line 27: Line 27:
* Choosing/scaling branch lengths
** Branch scaling algorithms (e.g. Grafen, ACDC, lambda)
***ljh: You can do ACDC, delta, and kappa in GEIGER
** Incorporating/estimating divergence dates
* Ability to use large datasets and trees

Line 44: Line 43:
** CAIC BRUNCH algorithm
** Pagel's Discrete
***ljh: I have code to do this as part of GEIGER
** Multiple discrete and continuous traits in a single analysis
** Non-linear trait distributions and analyses

Revision as of 21:08, 15 November 2007

Goals from a programmer perspective are on a separate page.

Goals from an end-user perspective

Common methodological/logistical challenges for end-users

  • Phylogenetic uncertainty
    • Multiple trees
    • Polytomies
      • Resolving them automatically and re-analyzing or averaging over the resolved trees
      • Explicitly analyzing
    • Incorporating bootstrap support or posterior probabilities for branches
  • Phylogeny format and structure
    • Reading and writing Newick and Nexus tree formats
      • e.g. Nexus tree notes and Nexus data blocks currently cannot be read or written
      • Reading and writing MrBayes, Mesquite, PAUP and other external package formats
    • Converting among tree and data formats used by different R packages
      • e.g. how do I evolve trees and traits in packages X and Y and then analyze them in package Z?
  • Tree manipulation
    • Include or exclude taxa based on the data available in your dataset
    • Grafting and pruning subtrees
  • Easy implementation of error/data checking methods
    • Ensure trait data are linked to the correct tip on the tree
    • Diagnostics for data checking
      • Diagnostics for phylogenetically independent contrasts (PICs)
      • Linearity of relationships between traits
  • Choosing/scaling branch lengths
    • Branch scaling algorithms (e.g. Grafen, ACDC, lambda)
    • Incorporating/estimating divergence dates
  • Ability to use large datasets and trees
    • Memory
    • Speed
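The format, tree-manipulation and data-checking items above often reduce to one small workflow: read a tree, compare its tip labels with the taxa in a trait dataset, and prune the tree to the taxa that have data. The sketch below shows that workflow in Python so it stays self-contained; it is a minimal illustration under stated assumptions, not how ape, GEIGER or any other R package does it. It assumes plain Newick with unquoted labels, no spaces inside labels, no comments, and optional branch lengths; the `Node` class and the functions `parse_newick`, `check_names`, `prune` and `to_newick` are hypothetical names. When pruning leaves a node with a single child, that node is collapsed and its branch length is added to the child's, so root-to-tip distances are preserved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if given
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse plain Newick (assumed: unquoted labels, no comments)."""
    text = "".join(text.split())  # drop all whitespace
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        node.name = text[start:pos]
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return parse_node()


def tip_labels(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [label for child in node.children for label in tip_labels(child)]


def check_names(tree: Node, data: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (tips with no data, data rows with no tip)."""
    tips = set(tip_labels(tree))
    return sorted(tips - set(data)), sorted(set(data) - tips)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a pruned copy keeping only tips in `keep`; None if nothing is left."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kids = [k for k in (prune(c, keep) for c in node.children) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        # Collapse the now-unbranched node into its only child.
        only = kids[0]
        if node.length is not None or only.length is not None:
            only.length = (only.length or 0.0) + (node.length or 0.0)
        return only
    return Node(node.name, node.length, kids)


def to_newick(node: Node) -> str:
    def fmt(n: Node) -> str:
        s = "(" + ",".join(fmt(c) for c in n.children) + ")" if n.children else ""
        s += n.name
        if n.length is not None:
            s += f":{n.length:g}"
        return s

    return fmt(node) + ";"


tree = parse_newick("((A:1,B:1):2,(C:1.5,D:1.5):1.5);")
traits = {"A": 0.3, "B": 1.2, "D": 0.8, "E": 2.0}  # made-up trait values
print(check_names(tree, traits))            # (['C'], ['E'])
print(to_newick(prune(tree, set(traits))))  # ((A:1,B:1):2,D:3);
```

Here tip C has no trait value and taxon E is not on the tree, so C is dropped and its sister D absorbs the 1.5 branch of their former parent node.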

Filling in the gaps: methods that are not easily accessible in R but would be useful

  • Methods for visualising trees and plotting traits along them
  • Linking phylogenies to geographic maps
  • Implementations of methods for studying co-evolution and co-speciation
    • Event-based methods, reconciliation methods
    • New methods for testing congruence between trees
  • Quantifying phylogenetic signal
  • Discrete traits
    • CAIC BRUNCH algorithm
    • Pagel's Discrete
    • Multiple discrete and continuous traits in a single analysis
    • Non-linear trait distributions and analyses
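Pagel's lambda, also listed above among the branch-scaling algorithms, is one common way of quantifying phylogenetic signal. Under Brownian motion the covariance between two tips equals the length of the path they share from the root; the lambda transform multiplies every off-diagonal (shared) entry of that matrix by λ and leaves the diagonal unchanged, so λ = 1 keeps the original tree and λ = 0 gives a star phylogeny with no signal. The Python sketch below builds the covariance matrix and applies the transform; the nested-tuple tree encoding and the names `phylo_vcv` and `pagel_lambda` are assumptions for illustration only. Estimating λ itself (typically by maximum likelihood, usually over [0, 1]) is not shown.

```python
from itertools import count
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a tip is (label, branch_length) and an internal node is
# (list_of_children, branch_length). The root's own branch length is ignored.
Tree = Tuple[Union[str, list], float]


def phylo_vcv(tree: Tree) -> Tuple[List[str], List[List[float]]]:
    """Brownian-motion covariance matrix: C[i][j] = root-to-MRCA path length."""
    paths: Dict[str, List[Tuple[int, float]]] = {}
    node_ids = count()

    def walk(node: Tree, depth: float, path: List[Tuple[int, float]]) -> None:
        content, length = node
        depth += length
        # Record (unique node id, distance from the root to the end of this branch).
        path = path + [(next(node_ids), depth)]
        if isinstance(content, str):
            paths[content] = path
        else:
            for child in content:
                walk(child, depth, path)

    children, _root_edge = tree
    for child in children:
        walk(child, 0.0, [])

    def shared(a: str, b: str) -> float:
        # Depth of the deepest node on both root-to-tip paths (their MRCA).
        depth = 0.0
        for (id_a, d), (id_b, _) in zip(paths[a], paths[b]):
            if id_a != id_b:
                break
            depth = d
        return depth

    labels = list(paths)
    return labels, [[shared(a, b) for b in labels] for a in labels]


def pagel_lambda(C: List[List[float]], lam: float) -> List[List[float]]:
    """Scale off-diagonal (shared-history) covariances by lambda; keep the diagonal."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]


tree = ([([("A", 1.0), ("B", 1.0)], 2.0), ([("C", 1.5), ("D", 1.5)], 1.5)], 0.0)
labels, C = phylo_vcv(tree)
print(labels)                   # ['A', 'B', 'C', 'D']
print(C[0])                     # [3.0, 2.0, 0.0, 0.0]
print(pagel_lambda(C, 0.5)[0])  # [3.0, 1.0, 0.0, 0.0]
```

Because only the shared-history terms shrink, the variance at each tip is untouched while tips become progressively less correlated as λ moves from 1 towards 0.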

Improved documentation and an easy way to find out what methods are available

  • Summaries of methods available to answer different questions
    • Matrix listing all available functions
  • Improve documentation for existing functions, and write it for those that are undocumented
  • Code vignettes
  • Common datasets that can be analyzed to illustrate different methods/approaches