Phylohackathon 1/Documentation

From Phyloinformatics


This page outlines the documentation that will arise from the Phyloinformatics Hackathon. This documentation focuses on our critical or high-priority use cases and the efforts related to them. Most of these use cases are concerned with performing a set of analyses in order to obtain a particular set of results, so the documentation can be viewed as instructions or HOWTOs. A set of instructions is typically highly specific, directing the user to run particular applications in a particular way. In phylogenetics, however, one nearly always has a variety of applications and approaches to choose from.

Additional efforts are under way to enable researchers to integrate the best applications into their own customized workflows. A relevant example is wrapper code being developed by the BioPerl, BioPython, and HyPhy groups that will allow users to run HyPhy from Perl or Python and parse the results. Other wrappers, such as code for PAUP and T_Coffee, are being developed by the BioRuby group.
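Wrappers of this kind generally follow one pattern: build a command line, run the external program, and parse its text output into native data structures. The Python sketch below illustrates only that pattern. It is not the BioPerl, BioPython, or HyPhy wrapper API; the helpers `run_and_parse` and `parse_key_values` and the `key: value` output format they expect are hypothetical, and the demo uses the Python interpreter itself as a stand-in for a real analysis program.

```python
import subprocess
import sys


def parse_key_values(text):
    """Parse hypothetical 'key: value' lines into a dict of strings."""
    results = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        key, _, value = line.partition(":")  # split on the first colon only
        results[key.strip()] = value.strip()
    return results


def run_and_parse(command):
    """Run an external program (hypothetical wrapper) and parse its stdout.

    Raises subprocess.CalledProcessError if the program exits with non-zero status.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_key_values(completed.stdout)


if __name__ == "__main__":
    # Stand-in for a real analysis program: a Python one-liner that prints
    # results in the assumed 'key: value' format.
    fake_program = [sys.executable, "-c", "print('lnL: -1234.5'); print('model: HKY85')"]
    print(run_and_parse(fake_program))
```

Keeping execution and parsing in separate functions lets the parser be tested on saved output files without running the external program.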

A significant amount of attention is also being paid to standards in phyloinformatics, both existing and emerging. The NEXUS format is used by a number of phylogenetics applications, but it has been independently extended by those same applications to accommodate new types of data and new methods. The resulting variation among NEXUS files causes difficulties for both users and developers. A NEXUS validator is being developed, along with formal definitions of compliance, that will alert users and developers to problems in the NEXUS files they use. The PhyloModeler group is concentrating on the development of a new standard that defines phyloinformatic models. The most typical use of this proposed standard would be as an XML format for data interchange.
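To illustrate the kind of checks a NEXUS validator might perform, the sketch below tests two basic structural rules: the file must begin with the `#NEXUS` token, and every `BEGIN name;` must be closed by `END;` (or `ENDBLOCK;`) before the next block opens. It is a minimal sketch, not the validator being developed; the function names and messages are hypothetical, and it ignores quoting, so a semicolon or bracket inside a quoted label would confuse it.

```python
def strip_comments(text):
    """Remove NEXUS [bracketed] comments; NEXUS allows them to be nested."""
    out, depth = [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep characters that are outside any comment
    return "".join(out)


def check_nexus(text):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    stripped = text.lstrip()
    if stripped[:6].upper() == "#NEXUS":
        stripped = stripped[6:]  # drop the header token before splitting commands
    else:
        problems.append("file does not start with #NEXUS")

    open_block = None  # name of the block currently open, if any
    # NEXUS commands are terminated by semicolons (quoting is ignored here).
    for command in strip_comments(stripped).split(";"):
        words = command.split()
        if not words:
            continue
        keyword = words[0].upper()  # NEXUS keywords are case-insensitive
        if keyword == "BEGIN":
            name = words[1].upper() if len(words) > 1 else "?"
            if open_block is not None:
                problems.append(f"block {open_block} not closed before BEGIN {name}")
            open_block = name
        elif keyword in ("END", "ENDBLOCK"):
            if open_block is None:
                problems.append("END without a matching BEGIN")
            open_block = None
    if open_block is not None:
        problems.append(f"block {open_block} is never closed")
    return problems


sample = "#NEXUS\nBEGIN TAXA;\n  DIMENSIONS NTAX=2;\n  TAXLABELS A B;\nEND;\n"
print(check_nexus(sample))  # []
# Prints ['file does not start with #NEXUS', 'block TREES is never closed']
print(check_nexus("BEGIN TREES; TREE t = (A,B);"))
```

A full validator would also need a proper tokenizer for quoted labels and per-block checks, which is where formal compliance levels would come in.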

Summary of outputs

The Lapp et al. (2007) paper describing the hackathon summarizes, in Table 1, the targets and the outcomes achieved for each:

Table 1 from Lapp et al. (2007), Evolutionary Bioinformatics Online 3: 287–296

The following table was used to collect input for the hackathon manuscript. Except for a few URLs, it is redundant with, or superseded by, Table 1 in the manuscript.

Synopsis of outputs, First NESCent Phyloinformatics Hackathon (December, 2006)
Use case | Brief description of output | Link or contact
Sequence families | Designed, implemented, and tested two new Bio::SimpleAlign (BioPerl) methods that protect sequence names in workflows that use PHYLIP, whose format limits taxon names to 10 characters. | OTU naming issue
Sequence families | Implemented elements of the Character-Data-and-Tree (CDAT) model in BioPerl (Bio::AlignI as Bio::AnnotatableI; new Bio::Annotation::TreeI to associate a tree with aligned data). | In BioPerl
Reconcile trees | Implemented a Python script wrapper for the Java application softparsmap. The wrapper allows the user to input arbitrary trees instead of relying on the NCBI taxonomy. The script is now being used by the authors of softparsmap to implement a web interface. | Media:Reconcile.txt
Phylogenetic footprinting | Implemented a naive footprinting method in BioPerl's ClustalW wrapper, as well as a new wrapper around PhastCons and related programs. |
NEXUS compliance | Identified and (superficially) evaluated major APIs (NCL, Bio::NEXUS, Mesquite). | NEXUS (NESCent)
NEXUS compliance | Identified common errors in files, gathered test files, and proposed a scheme for levels of compliance. |
NEXUS compliance | Created a parser for BioJava that can parse the majority of NEXUS blocks and passes the remaining ones to the user as unparsed String blocks. Unparseable blocks remain unedited in output files unless explicitly removed. |
NEXUS compliance | Created a parser for BioRuby. |
Tree modelling | Created interfaces for tree modelling in BioJava. NOTE: these have since been removed, because the open-source toolkit JGraphT could be reused to perform the equivalent function. |
Phylip parser | Created a parser for Phylip in BioJava and BioRuby. |
NESCent PhyloSOC | BioJava gained a PhyloSOC student as a direct result of participating in the hackathon. The student has since worked on improving the parsers written at the hackathon (including adopting JGraphT and making the NEXUS trees-block parser work with it) and on writing useful tree-manipulation tools such as multiple-hit correction. |
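The sequence-name problem in the first row arises because strict PHYLIP format allows only 10 characters per taxon name, so long names can be truncated or collide. A common workaround, sketched below in Python, is to replace each name with a short unique placeholder before running a PHYLIP program and to map the placeholders back in its output. This is not the BioPerl Bio::SimpleAlign implementation; the helpers `make_safe_names` and `restore_names` and the `S000000001` placeholder scheme are hypothetical.

```python
import re


def make_safe_names(names, prefix="S"):
    """Map unique placeholders (at most 10 characters) to the original names."""
    if len(set(names)) != len(names):
        raise ValueError("original names must be unique so they can be restored")
    width = 10 - len(prefix)  # PHYLIP allows 10 characters per name
    if width < 1 or len(names) >= 10 ** width:
        raise ValueError("too many names for this placeholder scheme")
    # Dicts keep insertion order, so placeholders follow the input order.
    return {f"{prefix}{i:0{width}d}": name for i, name in enumerate(names, start=1)}


def restore_names(text, safe_to_original):
    """Replace whole-word placeholders in program output with the original names."""
    if not safe_to_original:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, safe_to_original)) + r")\b")
    return pattern.sub(lambda match: safe_to_original[match.group(0)], text)


mapping = make_safe_names(["Homo_sapiens_mitochondrion", "Pan_troglodytes"])
original_to_safe = {name: safe for safe, name in mapping.items()}  # used when writing input
tree_from_program = "(S000000001:0.1,S000000002:0.2);"
print(restore_names(tree_from_program, mapping))
# (Homo_sapiens_mitochondrion:0.1,Pan_troglodytes:0.2);
```

Fixed-width placeholders cannot be substrings of one another, and the word-boundary regex keeps them from matching inside other tokens of the output.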

Critical Use Cases

  1. Family alignment: identify homologues, generate family alignment, evaluate models (3.2, 3.18)
  2. Reconcile trees (3.19) and Determine concordance between two or more phylogenies (3.15)
  3. Phylogenetic footprinting and shadowing (3.10)
  4. Morphological characters: infer tree (3.12) and calculate support values (3.13)
  5. Estimate divergence times (3.17)
  6. NEXUS: lack of support for (and adherence to) standards
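For the tree-reconciliation use case, one standard technique is least-common-ancestor (LCA) mapping: each gene-tree node maps to the LCA, in the species tree, of the species found beneath it, and a node is inferred to be a duplication if it maps to the same species-tree node as one of its children. The Python sketch below illustrates that technique only; it is not the algorithm used by softparsmap or the hackathon wrapper. The nested-tuple tree representation, the assumption that gene-tree leaves are already labelled by species, and the helper names are all hypothetical.

```python
def species_clusters(species_tree):
    """Return the leaf set (cluster) of every node in a nested-tuple species tree."""
    clusters = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is a species name
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clusters.add(leaves)
        return leaves

    walk(species_tree)
    return clusters


def lca(species, clusters):
    """LCA of a set of species: the smallest species-tree cluster containing them all.

    min() raises ValueError if a species is absent from the species tree.
    """
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that LCA mapping infers to be duplications."""
    clusters = species_clusters(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([node]), clusters)
        child_maps = [walk(child) for child in node]
        mapped = lca(frozenset().union(*child_maps), clusters)
        # A node is a duplication if it maps to the same species node as a child.
        if mapped in child_maps:
            duplications += 1
        return mapped

    walk(gene_tree)
    return duplications


species = (("human", "chimp"), "mouse")
print(count_duplications((("human", "chimp"), "mouse"), species))  # 0
print(count_duplications((("human", "mouse"), ("human", "chimp")), species))  # 1
```

Representing species-tree nodes by their leaf sets keeps the sketch short; a real implementation would use node objects and a faster LCA query.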

Development Targets

The high-priority use cases and the corresponding development targets for each subgroup (toolkit) are summarized on the Targets page.