Difference between revisions of "R Hackathon 1/End User Goals"

From Phyloinformatics

Latest revision as of 09:19, 10 December 2007

Goals from a programmer perspective are on a separate page.

Goals from an end-user perspective

Common methodological/logistical challenges for end-users

  • Phylogenetic uncertainty
    • Multiple trees
    • Polytomies
      • Resolving and re-analyzing/averaging automatically
      • Explicitly analyzing
    • Incorporating bootstrap support or posterior probabilities for branches
    • Model averaging over sets of trees
  • Phylogeny format and structure
    • Reading and writing Newick, Nexus tree format
      • e.g. Nexus tree notes and Nexus data blocks currently can't be read or written
      • Major problems reading large files with lots of trees in R. Coding this in C would help dramatically.
      • Reading and writing MrBayes, Mesquite, PAUP and other external package formats
    • Converting among tree and data formats used by different R packages
      • e.g. How do I evolve trees and traits in packages X and Y and analyze in package Z?
  • Tree manipulation
    • Include or exclude taxa based on the data available in your dataset
      • I have a way to do this in the new version of GEIGER, but it's not elegant --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Grafting and pruning subtrees (while maintaining branch lengths)
    • Rooting trees
  • Tree simulation methods (+/- accompanying trait evolution)
    • Needed for methods testing and hypothesis testing. --DavidOrme 05:40, 22 November 2007 (EST)
      • Need for speed and ability to simulate large phylogenetic trees in R - I think this should be coded in C and called from R.
  • Exploration of trait data
    • Testing for phylogenetic signal/choosing the appropriate transformation if needed
    • Conducting analyses with traits exhibiting different levels of phylogenetic signal
  • Easy implementation of error/data checking methods
    • Ensure trait data are linked to the correct tip on the tree
    • Diagnostics for data checking
      • PIC diagnostics
      • Linearity in trait relations
  • Choosing/scaling branch lengths
    • Branch scaling algorithms (e.g. Grafen, ACDC, lambda)
      • You can do ACDC ("exponentialchange.tree") and some other things like that in GEIGER, others are possible --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Incorporating/estimating divergence dates
  • Ability to use large datasets and trees
    • Memory
    • Speed
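
Several of the items above (reading Newick, checking that trait data match the tips, and pruning taxa while maintaining branch lengths) come down to a few tree operations. The following is a minimal sketch of that logic in plain Python, not R and not the API of GEIGER, ape or any other package; the function names are hypothetical. It assumes a simple Newick string ending in ";" with no quoted labels or comments.

```python
import re

PUNCT = set("(),;:")

def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    Assumes the string ends with ";" and has no quoted labels or [comments].
    """
    tokens = re.findall(r"[(),;:]|[^(),;:\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
        name = ""
        if tokens[pos] not in PUNCT:
            name = tokens[pos]
            pos += 1
        length = None
        if tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return {"name": name, "length": length, "children": children}

    return node()

def tip_names(node):
    """Tip labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in tip_names(child)]

def check_tips(tree, traits):
    """Return (tips without trait data, trait rows without a tip)."""
    tips = set(tip_names(tree))
    return sorted(tips - set(traits)), sorted(set(traits) - tips)

def prune(node, keep):
    """Return a copy of the tree restricted to tips in `keep`, or None.

    A node left with a single child is merged into that child and the two
    branch lengths are summed, so path lengths between the remaining tips
    are preserved.
    """
    if not node["children"]:
        return dict(node) if node["name"] in keep else None
    kids = [k for k in (prune(c, keep) for c in node["children"]) if k]
    if not kids:
        return None
    if len(kids) == 1:
        only = kids[0]
        if node["length"] is not None:
            only["length"] = (only["length"] or 0.0) + node["length"]
        return only
    return {"name": node["name"], "length": node["length"], "children": kids}

def to_newick(node):
    """Serialise a node (without the trailing ';')."""
    s = ""
    if node["children"]:
        s = "(" + ",".join(to_newick(c) for c in node["children"]) + ")"
    s += node["name"]
    if node["length"] is not None:
        s += ":" + format(node["length"], "g")
    return s

# Example: taxon C has no trait data, and trait row E has no tip.
tree = parse_newick("((A:1,B:1):2,(C:1.5,D:1.5):1.5);")
traits = {"A": 0.3, "B": 1.2, "D": -0.4, "E": 2.0}
no_data, no_tip = check_tips(tree, traits)
pruned = prune(tree, set(traits))
print(no_data, no_tip, to_newick(pruned) + ";")
```

In the example, dropping C leaves D on a single branch of length 1.5 + 1.5 = 3, so D stays the same distance from A and B as before.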

Filling in the gaps: methods that are not easily accessible in R but would be useful

  • Methods for visualising trees and plotting traits along them
  • Linking phylogenies to geographic maps
  • Implementation of methods for looking at co-evolution and co-speciation
    • Event-based methods, reconciliation methods
    • New methods for testing congruence between trees
    • Related: Gene tree - species tree analyses (or anything with a contained and a containing tree)
  • Quantifying phylogenetic signal
  • Discrete traits
    • CAIC BRUNCH algorithm
      • This is included in the CAIC package I am just building that reimplements CAIC in R --DavidOrme 04:58, 22 November 2007 (EST)
    • Pagel's Discrete
      • This is a part of the new GEIGER --Lukeh@uidaho.edu 20:11, 15 November 2007 (EST)
    • Multiple discrete and continuous traits in a single analysis
    • Non-linear trait distributions and analyses
  • Stochastic character mapping
    • Ree's (2005) key innovation test
  • Correlating trait evolution with speciation and extinction rates
  • Extending the set of character models that can be fit using likelihood
    • For example, real population genetic models (e.g. Estes and Arnold)
  • Model averaging across different models of character evolution
  • Better methods for reconstructing ancestral character states and plotting them on trees
  • MacroCAIC-type analyses
    • The CAIC package I'm putting together includes this --DavidOrme 04:58, 22 November 2007 (EST)
  • Simulating trees and characters for other models beyond those available now
  • Fitting Felsenstein's threshold models to comparative data
  • Ree's biogeography method using likelihood
  • Interface or implementation of penalized likelihood from r8s
  • Creating input files with constraints that are acceptable for BEAST and MrBayes
  • Supertree or other tree-combining methods
  • Topology-based tests of diversification à la Moore et al.
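
As a concrete picture of what "simulating trees and characters" involves, here is a minimal sketch of the simplest case: a pure-birth (Yule) tree with a Brownian-motion trait evolved along it. It is plain Python for illustration only, not R and not any existing package's interface; the function names, the edge representation (dicts keyed by integer node id, with node 0 as the root) and the stopping rule (stop at the end of the waiting time with n_tips lineages) are all assumptions of this sketch.

```python
import math
import random

def simulate_yule(n_tips, birth_rate=1.0, rng=random):
    """Grow a pure-birth tree until n_tips lineages exist (n_tips >= 2).

    Returns (parent, length, tips): parent[i] is the parent of node i,
    length[i] is the branch leading to node i, tips lists the tip ids.
    All tips end at the same time, so the tree is ultrametric.
    """
    parent = {1: 0, 2: 0}              # the root (node 0) splits at time 0
    start = {1: 0.0, 2: 0.0}           # live lineage -> time its branch began
    length = {}
    now, next_id = 0.0, 3
    while True:
        # Waiting time to the next split with k live lineages ~ Exp(rate * k).
        now += rng.expovariate(birth_rate * len(start))
        if len(start) == n_tips:
            break
        split = rng.choice(sorted(start))
        length[split] = now - start.pop(split)
        for _ in range(2):             # the chosen lineage leaves two daughters
            parent[next_id] = split
            start[next_id] = now
            next_id += 1
    tips = sorted(start)
    for tip in tips:
        length[tip] = now - start[tip]
    return parent, length, tips

def simulate_bm(parent, length, sigma2=1.0, root_value=0.0, rng=random):
    """Brownian motion: each branch adds a Normal(0, sigma2 * length) change."""
    value = {0: root_value}
    for node in sorted(parent):        # ids increase from the root to the tips
        step = rng.gauss(0.0, math.sqrt(sigma2 * length[node]))
        value[node] = value[parent[node]] + step
    return value

def root_to_tip(node, parent, length):
    """Sum of branch lengths from the root to `node`."""
    total = 0.0
    while node != 0:
        total += length[node]
        node = parent[node]
    return total

rng = random.Random(2007)              # seeded so a run can be repeated
parent, length, tips = simulate_yule(8, rng=rng)
values = simulate_bm(parent, length, rng=rng)
print(len(tips), [round(values[t], 2) for t in tips])
```

Passing the random generator in explicitly keeps simulations repeatable, which matters when the output is used for methods testing. The sketch says nothing about speed on large trees, which is the concern raised in the comments above.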

Improved documentation and an easy way to find out what methods are available

  • Summaries of methods available to answer different questions
    • Matrix (http://www.brianomeara.info/allpackages3.html) listing all available functions
    • My Treetapper.org NESCent project will include the ability to find which R package(s) or other software program/package implements a particular method (i.e., a user tells the website she has two continuous chars to correlate, is presented with a list of methods for this, chooses independent contrasts, and then gets a list of all packages and software programs implementing this (and can limit by platform, implementation type, etc.)). It won't list the specific function, but hopefully people can at least look this up. The site won't be really up for another month or so, and won't be really useful until I can add more data to the database. --Bco 13:47, 30 November 2007 (EST)
  • Improve or write documentation for existing functions that are not documented
  • Code Vignettes
  • Common datasets that can be analyzed to illustrate different methods/approaches

To discuss documentation standards, vignettes etc., see the R Hackathon 1 documentation page.

EvolDir poll

We sent EvolDir a link to a poll (http://snipurl.com/rhack) on what users want to see added. Here are the results so far (after one day). We had 53 poll responses and two emailed responses: one was a request for a polymorphism hackathon to implement methods for dealing with polymorphic data; the other appears after the poll. The poll was not intended to mandate which methods are worked on at the hackathon, just to provide some additional information about community interest. Each user was asked to select three items.

 Please select three options for inclusion in R:

  Rank  Votes  Share  Option
     1     22     8%  Improved documentation and an easy way to find out what methods are available
     2     21     8%  Linking phylogenies to geographic maps
     3     17     6%  Incorporating bootstrap support or posterior probabilities for branches
     4     14     5%  Multiple discrete and continuous traits in a single analysis
     5     13     5%  Converting among tree and data formats used by different R packages
     6     12     4%  Dealing with multiple trees
     7     11     4%  Gene tree - species tree analyses
     8     10     4%  Dealing properly with polytomies
     8     10     4%  Reading and writing Newick, Nexus tree format
     8     10     4%  Methods for visualising trees and plotting traits along them
     8     10     4%  Creating input files with constraints that are acceptable for BEAST and MrBayes
    12      9     3%  Ability to use large datasets and trees
    13      8     3%  Tree simulation methods (+/- accompanying trait evolution)
    14      7     3%  Testing for phylogenetic signal/choosing the appropriate transformation if needed
    14      7     3%  New methods for testing congruence between trees
    14      7     3%  Extending the set of character models that can be fit using likelihood
    14      7     3%  Interface or implementation of penalized likelihood from r8s
    18      6     2%  Model averaging over sets of trees
    18      6     2%  Incorporating/estimating divergence dates
    18      6     2%  Model averaging across different models of character evolution
    21      5     2%  Include or exclude taxa based on the data available in your dataset
    21      5     2%  Diagnostics for data checking
    21      5     2%  Stochastic character mapping
    21      5     2%  Correlating trait evolution with speciation and extinction rates
    21      5     2%  Fitting Felsenstein's threshold models to comparative data
    21      5     2%  Ree's biogeography method using likelihood
    21      5     2%  Supertree or other tree-combining methods
    28      4     1%  Event-based methods for looking at co-evolution, reconciliation methods
    28      4     1%  Non-linear trait distributions and analyses
    28      4     1%  Better methods for reconstructing & plotting ancestral character states
    31      3     1%  Simulating trees and characters for other models beyond those available now
    31      3     1%  Topology-based tests of diversification ala Moore et al.
    33      2     1%  localizing diversification rate
    33      2     1%  population genetics and quantita
    33      2     1%  Grafting and pruning subtrees (while maintaining branch lengths)
    33      2     1%  Conducting analyses with traits exhibiting different levels of phylogenetic signal
    33      2     1%  Ensure trait data are linked to the correct tip on the tree
    38      1     0%  Multivariate Phylogenetic Mixed
    38      1     0%  Branch scaling algorithms (e.g. Grafen, ACDC, lambda)
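
The poll output does not say how its rank and percentage figures were derived. They are consistent with tied options sharing the best rank and with each percentage being the option's share of all votes cast (278 in total), rounded to a whole percent; that reading is an inference from the numbers, not something the poll page states. A short Python check of the arithmetic, using the vote counts from the poll:

```python
# Vote counts for the 39 poll options, in the order the poll lists them.
votes = [22, 21, 17, 14, 13, 12, 11, 10, 10, 10, 10, 9, 8, 7, 7, 7, 7,
         6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 2, 2, 2, 2, 2, 1, 1]

def competition_ranks(counts):
    """Rank counts in descending order; ties share the best rank (1, 2, 2, 4, ...)."""
    ordered = sorted(counts, reverse=True)
    return [ordered.index(c) + 1 for c in counts]

def shares(counts):
    """Each count as a percentage of all votes cast, rounded to a whole percent."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

for rank, count, share in zip(competition_ranks(votes), votes, shares(votes)):
    print(f"{rank:>4} {count:>5} {share:>4}%")
```

Under this reading the top option's 8% means 22 of 278 votes, not 8% of respondents.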

Free response:

  • Since R does GLMs so well (though it isn't so easy to do ANOVAs with molecular or character data, especially with phylogenetic data), these types of methods are really appropriate (PGLS, PMM). My one big frustration was not having PMM [phylogenetic mixed models, http://dx.doi.org/10.1086/380570 -- bco] well implemented. Paradis et al. have put it in, but it is only univariate and doesn't properly analyze all types of tree length inputs. I think someone else may also have implemented it, but as Felsenstein's lambda (maybe Garland?). Anyway, R would be a great platform for implementing PMM fully, particularly with multivariate data (see the appendix of Housworth et al. 2002 for the math on this). That would make R capable of doing all the most common comparative trait methods (independent contrasts in PDAP, PGLS in APE, and PMM in ?).