R Hackathon 1/End User Goals

From Phyloinformatics
Revision as of 20:56, 18 November 2007 by Sds21

Goals from a programmer perspective are on a separate page.

Goals from an end-user perspective

Common methodological/logistical challenges for end-users

  • Phylogenetic uncertainty
    • Multiple trees
    • Polytomies
      • Resolving and re-analyzing/averaging automatically
      • Explicitly analyzing
    • Incorporating bootstrap support or posterior probabilities for branches
    • Model averaging over sets of trees
  • Phylogeny format and structure
    • Reading and writing Newick and Nexus tree formats
      • e.g. Nexus tree notes and Nexus data blocks currently cannot be read or written
      • Reading and writing MrBayes, Mesquite, PAUP and other external package formats
    • Converting among tree and data formats used by different R packages
      • e.g. How do I evolve trees and traits in packages X and Y and then analyze them in package Z?
  • Tree manipulation
    • Include or exclude taxa based on the data available in your dataset
      • I have a way to do this in the new version of GEIGER, but it's not elegant--Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Grafting and pruning subtrees (while maintaining branch lengths)
    • Rooting trees
  • Exploration of trait data
    • Testing for phylogenetic signal/choosing the appropriate transformation if needed
    • Conducting analyses with traits exhibiting different levels of phylogenetic signal
  • Easy implementation of error/data checking methods
    • Ensure trait data are linked to the correct tip on the tree
    • Diagnostics for data checking
      • Diagnostics for phylogenetically independent contrasts (PICs)
      • Linearity of relationships between traits
  • Choosing/scaling branch lengths
    • Branch scaling algorithms (e.g. Grafen, ACDC, lambda)
      • You can do ACDC ("exponentialchange.tree") and some other things like that in GEIGER, others are possible --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Incorporating/estimating divergence dates
  • Ability to use large datasets and trees
    • Memory
    • Speed
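
The data-checking goals above (making sure trait data are linked to the correct tip, and including or excluding taxa based on the data available) largely reduce to comparing tip labels against the names in the dataset. Below is a minimal sketch in Python, for illustration only. The function name `match_traits_to_tips` and its input format (a list of tip labels plus a dictionary of trait values keyed by taxon name) are hypothetical and do not belong to any R package.

```python
def match_traits_to_tips(tip_labels, traits):
    """Hypothetical helper: check that trait data line up with tree tips.

    tip_labels -- list of tip names, in the tree's tip order
    traits     -- dict mapping taxon name -> trait value
    Returns (matched, tips_without_data, data_without_tips).
    """
    # Duplicate tip labels make name-based matching ambiguous, so refuse them.
    duplicates = sorted({t for t in tip_labels if tip_labels.count(t) > 1})
    if duplicates:
        raise ValueError(f"duplicate tip labels: {duplicates}")

    tips, taxa = set(tip_labels), set(traits)
    # On the tree but not in the data: candidates for pruning from the tree.
    tips_without_data = sorted(tips - taxa)
    # In the data but not on the tree: often typos or synonymous names.
    data_without_tips = sorted(taxa - tips)
    # Trait values reordered to follow the tree's tip order, unmatched taxa dropped,
    # so position i in `matched` refers to the i-th retained tip.
    matched = [(name, traits[name]) for name in tip_labels if name in traits]
    return matched, tips_without_data, data_without_tips


matched, missing, extra = match_traits_to_tips(
    ["A", "B", "C"], {"B": 0.8, "A": 1.2, "D": 2.5}
)
print(matched)  # [('A', 1.2), ('B', 0.8)]
print(missing)  # ['C']
print(extra)    # ['D']
```

The sketch reports both kinds of mismatch instead of silently dropping rows, which is what makes it useful as a diagnostic. A taxon that appears only in the data often points to a naming error rather than genuinely missing data.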

Filling in the gaps: methods that are not easily accessible in R but would be useful

  • Methods for visualising trees and plotting traits along them
  • Linking phylogenies to geographic maps
  • Implementation of methods for looking at co-evolution and co-speciation
    • Event-based methods, reconciliation methods
    • New methods for testing congruence between trees
  • Quantifying phylogenetic signal
  • Discrete traits
    • CAIC BRUNCH algorithm
    • Pagel's Discrete
      • This is a part of the new GEIGER --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Multiple discrete and continuous traits in a single analysis
    • Non-linear trait distributions and analyses
  • Stochastic character mapping
    • Ree's (2005) key innovation test
  • Correlating trait evolution with speciation and extinction rates
  • Extending the set of character models that can be fit using likelihood
    • For example, realistic population-genetic models (e.g. Estes and Arnold)
  • Model averaging across different models of character evolution
  • Better methods for reconstructing ancestral character states and plotting them on trees
  • MacroCAIC-type analyses
  • Simulating trees and characters under models beyond those currently available
  • Fitting Felsenstein's threshold models to comparative data
  • Ree's biogeography method using likelihood
  • Interface or implementation of penalized likelihood from r8s
  • Creating input files with constraints that are acceptable for BEAST and MrBayes
  • Supertree or other tree-combining methods
  • Topology-based tests of diversification à la Moore et al.
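
Pagel's lambda is one common way to quantify phylogenetic signal, and it also appears among the branch-scaling transformations listed earlier. Below is a minimal sketch of the transformation itself, in Python for illustration only. It assumes the tree has already been converted to a variance-covariance matrix `vcv`, where `vcv[i][j]` is the branch length that tips i and j share from the root. The function name `pagel_lambda` is hypothetical. Lambda multiplies the off-diagonal (shared-history) entries and leaves each tip's root-to-tip variance unchanged.

```python
def pagel_lambda(vcv, lam):
    """Hypothetical helper: apply Pagel's lambda to a phylogenetic VCV matrix.

    vcv[i][j] -- shared root-to-MRCA path length of tips i and j
                 (vcv[i][i] is the root-to-tip distance of tip i)
    lam       -- lambda; 1 leaves the tree unchanged, 0 yields a star phylogeny
    """
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    n = len(vcv)
    # Scale only the covariances (shared history); keep the variances.
    return [[vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
            for i in range(n)]


# VCV for the ultrametric tree ((A:1,B:1):2,C:3), tips ordered A, B, C.
vcv = [[3.0, 2.0, 0.0],
       [2.0, 3.0, 0.0],
       [0.0, 0.0, 3.0]]
print(pagel_lambda(vcv, 0.5))  # [[3.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
```

In a real analysis, lambda is usually estimated by maximum likelihood under Brownian motion, with the transformed matrix used as the covariance of a multivariate normal. Estimates near 0 suggest little phylogenetic signal, and estimates near 1 suggest signal as expected under Brownian motion. The sketch does not enforce an upper bound, because the largest admissible lambda depends on the tree: the scaled matrix must remain a valid covariance matrix.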

Improved documentation and an easy way to find out what methods are available

  • Summaries of methods available to answer different questions
    • Matrix listing all available functions
  • Write documentation for undocumented functions and improve existing documentation
  • Code vignettes
  • Common datasets that can be analyzed to illustrate different methods/approaches

To discuss: documentation standards, vignettes, etc.