R Hackathon 1/End User Goals

From Phyloinformatics
Revision as of 22:19, 4 December 2007 by Dlr32 (talk) (Goals from an end-user perspective)

Goals from a programmer perspective are on a separate page.

Goals from an end-user perspective

Common methodological/logistical challenges for end-users

  • Phylogenetic uncertainty
    • Multiple trees
    • Polytomies
      • Resolving and re-analyzing/averaging automatically
      • Explicitly analyzing
    • Incorporating bootstrap support or posterior probabilities for branches
    • Model averaging over sets of trees
  • Phylogeny format and structure
    • Reading and writing Newick, Nexus tree format
      • e.g. it is currently not possible to read or write Nexus tree notes or Nexus data blocks
      • Major problems reading large files with lots of trees in R. Coding this in C would help dramatically.
      • Reading and writing MrBayes, Mesquite, PAUP and other external package formats
    • Converting among tree and data formats used by different R packages
      • e.g. How do I evolve trees and traits in packages X and Y and analyze in package Z?
  • Tree manipulation
    • Include or exclude taxa based on the data available in your dataset
      • I have a way to do this in the new version of GEIGER, but it's not elegant --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Grafting and pruning subtrees (while maintaining branch lengths)
    • Rooting trees
  • Tree simulation methods (with or without accompanying trait evolution)
    • Needed for methods testing and hypothesis testing. --DavidOrme 05:40, 22 November 2007 (EST)
      • Need for speed and ability to simulate large phylogenetic trees in R - I think this should be coded in C and called from R.
  • Exploration of trait data
    • Testing for phylogenetic signal/choosing the appropriate transformation if needed
    • Conducting analyses with traits exhibiting different levels of phylogenetic signal
  • Easy implementation of error/data checking methods
    • Ensure trait data are linked to the correct tip on the tree
    • Diagnostics for data checking
      • PIC diagnostics
      • Linearity in trait relations
  • Choosing/scaling branch lengths
    • Branch scaling algorithms (e.g. Grafen, ACDC, lambda)
      • You can do ACDC ("exponentialchange.tree") and some other things like that in GEIGER, others are possible --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Incorporating/estimating divergence dates
  • Ability to use large datasets and trees
    • Memory
    • Speed
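Two of the items above (including or excluding taxa based on the available data, and ensuring trait data are linked to the correct tip) come down to comparing the tip labels of a tree with the taxon names in a dataset. The following is a minimal sketch of that check, written in Python purely for illustration rather than as an R implementation. The function names `newick_tip_labels` and `match_tips_to_data` are hypothetical and do not come from any existing package, the trait values are made up, and the label extraction assumes unquoted Newick labels with no spaces or comments.

```python
import re


def newick_tip_labels(newick):
    """Return the tip labels of a simple Newick string, in order.

    Assumption: labels are unquoted and contain no spaces or comments.
    A tip label is a name that directly follows "(" or ","; names that
    follow ")" are internal node labels and are skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def match_tips_to_data(tip_labels, traits):
    """Compare tree tips with the taxa present in a trait table.

    Returns three sorted lists: taxa found in both, taxa only in the
    tree (candidates for pruning), and taxa only in the data.
    """
    tips = set(tip_labels)
    taxa = set(traits)
    return {
        "shared": sorted(tips & taxa),
        "tree_only": sorted(tips - taxa),
        "data_only": sorted(taxa - tips),
    }


# Toy example with made-up trait values.
tree = "((Homo:1.0,Pan:1.0):2.0,(Mus:2.5,Rattus:2.5):0.5);"
traits = {"Homo": 65.0, "Pan": 45.0, "Mus": 0.02, "Gorilla": 120.0}

report = match_tips_to_data(newick_tip_labels(tree), traits)
for key, names in report.items():
    print(key, names)
```

Reporting the mismatches in both directions, rather than silently dropping them, is a deliberate choice here: a name that appears only in the data is often a spelling difference rather than a genuinely missing taxon.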

Filling in the gaps: methods that are not easily accessible in R but would be useful

  • Methods for visualising trees and plotting traits along them
  • Linking phylogenies to geographic maps
  • Implementation of methods for looking at co-evolution and co-speciation
    • Event-based methods, reconciliation methods
    • New methods for testing congruence between trees
    • Related: Gene tree - species tree analyses (or anything with a contained and a containing tree)
  • Quantifying phylogenetic signal
  • Discrete traits
    • CAIC BRUNCH algorithm
      • This is included in the CAIC package I am currently building, which reimplements CAIC in R --DavidOrme 04:58, 22 November 2007 (EST)
    • Pagel's Discrete
      • This is a part of the new GEIGER --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Multiple discrete and continuous traits in a single analysis
    • Non-linear trait distributions and analyses
  • Stochastic character mapping
    • Ree's (2005) key innovation test
  • Correlating trait evolution with speciation and extinction rates
  • Extending the set of character models that can be fit using likelihood
    • For example, real population genetic models e.g. Estes and Arnold
  • Model averaging across different models of character evolution
  • Better methods for reconstructing ancestral character states and plotting them on trees
  • MacroCAIC-type analyses
    • The CAIC package I'm putting together includes this --DavidOrme 04:58, 22 November 2007 (EST)
  • Simulating trees and characters for other models beyond those available now
  • Fitting Felsenstein's threshold models to comparative data
  • Ree's biogeography method using likelihood
  • Interface or implementation of penalized likelihood from r8s
  • Creating input files with constraints that are acceptable for BEAST and MrBayes
  • Supertree or other tree-combining methods
  • Topology-based tests of diversification à la Moore et al.
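As a reference point for the items on simulating characters, the simplest case is a single continuous trait evolving by Brownian motion: the change along each branch is drawn from a normal distribution with mean zero and variance equal to the rate times the branch length. The sketch below shows this in Python for illustration only. The tree encoding (nested `(name, branch_length, children)` tuples) and the function name `simulate_bm` are assumptions made for this example, not an existing package's data structure or API.

```python
import random


def simulate_bm(node, parent_value=0.0, sigma2=1.0, rng=random, out=None):
    """Simulate one continuous trait under Brownian motion.

    node is (name, branch_length, children), a hypothetical minimal
    tree encoding. The change along a branch is normal with mean 0
    and variance sigma2 * branch_length. Returns {tip name: value}.
    """
    if out is None:
        out = {}
    name, length, children = node
    # Value at the end of this branch: parent value plus a normal step.
    value = parent_value + rng.gauss(0.0, (sigma2 * length) ** 0.5)
    if not children:
        out[name] = value  # only tips are recorded
    for child in children:
        simulate_bm(child, value, sigma2, rng, out)
    return out


# Toy three-taxon tree; the root has a zero-length branch.
toy_tree = ("root", 0.0, [
    ("AB", 1.0, [("A", 1.0, []), ("B", 1.0, [])]),
    ("C", 2.0, []),
])

# A seeded generator makes the simulation reproducible.
tips = simulate_bm(toy_tree, parent_value=10.0, sigma2=0.5,
                   rng=random.Random(1))
print(tips)
```

Other character models would replace only the line that draws the step along a branch, which is why the traversal is kept separate from the draw.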

Improved documentation and an easy way to find out what methods are available

  • Summaries of methods available to answer different questions
    • Matrix listing all available functions
    • My Treetapper.org NESCent project will include the ability to find which R package(s) or other software programs implement a particular method. For example, a user tells the website she has two continuous characters to correlate, is presented with a list of methods for this, chooses independent contrasts, and then gets a list of all packages and software programs implementing it (and can limit by platform, implementation type, etc.). It won't list the specific function, but hopefully people can at least look this up. The site won't really be up for another month or so, and won't be really useful until I can add more data to the database. --Bco 13:47, 30 November 2007 (EST)
  • Improve or write documentation for existing functions that are not documented
  • Code Vignettes
  • Common datasets that can be analyzed to illustrate different methods/approaches

To discuss: documentation standards, vignettes, etc.