R Hackathon 1/Programming Goals

From Phyloinformatics
Revision as of 15:32, 13 November 2007 by Bco (talk) (New functions)

Creating a standard

There are many ways to represent trees in R, and not all of them can be converted from one to another (see the diagram of possible conversions, Treeformats.png).

One major goal of the hackathon is to establish stable standards for trees, networks, data, and perhaps model descriptions and analysis settings. When we are done, we should be able to load a tree and character data from a NEXUS file (see Supporting_NEXUS_Documentation and NEXUS Specification) and then use functions from various packages without converting the objects that hold the tree(s) and data. New R packages for comparative methods will use this standard. Existing packages will ideally be modified to use it directly; failing that, they should at least be able to convert easily between the standard and their internal representations. If even this is not possible, we could create a package, or modify an existing one, that converts to and from all the currently implemented tree and data representations for comparative methods. Desired features [subject to discussion, as is almost everything--Bco 15:31, 12 November 2007 (EST)] include rooting, branch lengths, tree weights, information about ancestral state reconstructions/assignments, and labels.
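To make the "convert to and from the standard" idea concrete, here is a minimal base-R sketch. Everything in it is an assumption for illustration: the "standard" is just a list with `edge`, `edge.length`, `tip.label`, and `rooted`, and "package A" with its parent-table format is made up. It is not the representation the hackathon will settle on.

```r
# Hypothetical "package A" format: one row per node, giving its parent
# (NA marks the root), the length of the branch above it, and a tip label.
# The tree is ((t1:1,t2:1):1,t3:2); tips are nodes 1-3, the root is node 5.
tree_a <- data.frame(
  node   = 1:5,
  parent = c(4, 4, 5, 5, NA),
  length = c(1, 1, 2, 1, NA),
  label  = c("t1", "t2", "t3", NA, NA),
  stringsAsFactors = FALSE
)

# Package A -> hypothetical standard (a plain list).
as_standard <- function(x) {
  keep <- !is.na(x$parent)                  # the root has no edge above it
  list(
    edge        = cbind(parent = x$parent[keep], child = x$node[keep]),
    edge.length = x$length[keep],
    tip.label   = x$label[!is.na(x$label)], # assumes tips are numbered first
    rooted      = TRUE                      # assumed: format A holds rooted trees only
  )
}

# Hypothetical standard -> package A.
from_standard <- function(s) {
  n_node <- max(s$edge)
  out <- data.frame(
    node   = seq_len(n_node),
    parent = NA_real_,
    length = NA_real_,
    label  = NA_character_,
    stringsAsFactors = FALSE
  )
  child <- s$edge[, "child"]
  out$parent[child] <- s$edge[, "parent"]
  out$length[child] <- s$edge.length
  out$label[seq_along(s$tip.label)] <- s$tip.label
  out
}

std        <- as_standard(tree_a)
round_trip <- from_standard(std)
isTRUE(all.equal(round_trip, tree_a))  # checks that nothing was lost
```

With one converter pair per package, any two packages can exchange trees through the standard, so n packages need 2n converters rather than one for every pair.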

We have a test set of sample trees and datasets (RHackTrees.zip), which contain various formats for input (trees with and without branch lengths, rooting, labels, etc.). To get a better idea of how each package structures trees and data, we'd like to collect the internal representation of these files from each package, along with notes on the procedure for loading the files and on what was lost (for example, is it clear which trees are rooted and which are unrooted?). For an example of a format description, see Paradis' description of the phylo class (pdf) or the scheme for coding nucleotides (pdf) in APE.
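As a sketch of the kind of report we have in mind, the code below builds a small tree by hand and summarizes what it stores. The object imitates the general layout of APE's phylo class (an edge matrix, `edge.length`, `tip.label`, `Nnode`) as we understand it; check Paradis' description for the authoritative definition. The helper `describe_tree` is hypothetical, and it assumes that tips are numbered 1..n and that the root is node n + 1.

```r
# ((t1:1,t2:1):1,t3:2); written out by hand; no package is needed.
tr <- structure(
  list(
    edge = matrix(c(4L, 5L,
                    5L, 1L,
                    5L, 2L,
                    4L, 3L), ncol = 2, byrow = TRUE),
    edge.length = c(1, 1, 1, 2),
    tip.label   = c("t1", "t2", "t3"),
    Nnode       = 2L
  ),
  class = "phylo"
)

# Hypothetical helper: record which features survived loading.
describe_tree <- function(x) {
  n_tip <- length(x$tip.label)
  root  <- n_tip + 1L                        # assumed numbering convention
  list(
    n_tip              = n_tip,
    n_node             = x$Nnode,
    has_branch_lengths = !is.null(x$edge.length),
    has_node_labels    = !is.null(x$node.label),
    root_degree        = sum(x$edge[, 1] == root)  # 2 suggests a rooted tree
  )
}

str(unclass(tr))          # the raw internal representation
info <- describe_tree(tr)
str(info)
```

Running the same two calls on each file in the test set, once per package, would give comparable notes on what each loader keeps and drops.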

We have also compiled all the help documents for functions related to comparative methods in R into a single searchable, sortable table, with all the functions categorized into major and minor types. This may help, for example, to find all the functions for plotting trees across the various packages.


Interaction

R can call external programs and scripts (see more info here). Currently, those comparative methods and related approaches that are not present in R are generally implemented in standalone programs (commonly written in C or C++; see Felsenstein's list of programs using/making phylogenies) or as modules for the Java program Mesquite. Letting users run such analyses from within R will reduce the need to re-code those methods in R.

Particular goals:

   * Mesquite calling R modules.
   * R calling Mesquite modules.
   * R calling existing software (PAUP? MrBayes? others?)
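For the "R calling existing software" goal, base R's `system2()` can already launch an external program and capture its output. The sketch below wraps it in a hypothetical helper, `run_external` (not part of any package), and uses the `Rscript` binary shipped with R as a stand-in for a program such as PAUP or MrBayes, since their command-line interfaces are not described here.

```r
# Hypothetical helper: run an external program and return its standard
# output as a character vector, one element per line of output.
run_external <- function(cmd, args = character()) {
  if (!nzchar(Sys.which(cmd)) && !file.exists(cmd)) {
    stop("external program not found: ", cmd)
  }
  out <- system2(cmd, args = args, stdout = TRUE)
  status <- attr(out, "status")    # set only when the exit status is non-zero
  if (!is.null(status) && status != 0) {
    stop(cmd, " exited with status ", status)
  }
  out
}

# Stand-in for a real phylogenetics program: ask Rscript to print 1 + 1.
rscript <- file.path(R.home("bin"), "Rscript")
res <- run_external(rscript, c("--vanilla", "-e", shQuote("cat(1 + 1)")))
print(res)
```

A real wrapper would also need to write the tree and data to a file the external program can read (NEXUS, for instance) and parse the program's output back into R objects.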

New functions

See end user goals for a discussion of functionality that needs to be added.