Phylohackathon 1/Documentation

From NESCent Informatics Wiki
Revision as of 11:33, 5 September 2007 by Hlapp

Introduction

This page outlines the documentation that will arise from the Phyloinformatics Hackathon. The documentation focuses on our critical or high-priority use cases and on efforts related to them. Most of these use cases are concerned with performing some set of analyses to obtain some set of results, so the documentation can be viewed as instructions or HOWTOs. A set of instructions is typically highly specific, suggesting that the user run given applications in a specified way. However, it is quite clear that in the phylogenetics field one always has a variety of applications and approaches to choose from.

Additional efforts are being made to enable researchers to integrate the best applications into their customized workflows. A relevant example is wrapper code being developed by the BioPerl, BioPython, and HyPhy groups that will allow the user to execute HyPhy from Perl or Python and parse the results. Other wrappers, such as code for PAUP and T_Coffee, are being developed by the BioRuby group.
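The wrapper idea described above can be sketched in a few lines of Python. This is only an illustration of the pattern (launch an external program, capture its output, parse it into a data structure); it is not the BioPerl, BioPython, or HyPhy wrapper code. The function name run_and_parse, the "key = value" output layout, and the stand-in program are all assumptions made for the example.

```python
import subprocess
import sys


def run_and_parse(command, stdin_text=""):
    """Run an external program and parse 'key = value' lines from its stdout.

    The 'key = value' layout is an assumption made for this sketch; a real
    wrapper would parse whatever format the wrapped program actually emits.
    """
    completed = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    results = {}
    for line in completed.stdout.splitlines():
        if "=" not in line:
            continue  # skip banners, progress messages, and blank lines
        key, _, value = line.partition("=")
        results[key.strip()] = value.strip()
    return results


# Stand-in for a real analysis program: a child Python process that prints
# one banner line and two made-up result lines. Swap in the real executable
# and its arguments when wrapping an actual application.
fake_program = [
    sys.executable,
    "-c",
    "print('Starting analysis'); print('lnL = -1234.5'); print('kappa = 2.1')",
]
parsed = run_and_parse(fake_program)
print(parsed)
```

Keeping the launch step and the parsing step in one function makes the wrapped program look like an ordinary function call to the rest of a workflow, which is the main point of such wrappers.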

A significant amount of attention is also being paid to standards in phyloinformatics, existing or emerging. The NEXUS format is used by a number of different phylogenetics applications but has been independently altered by those same applications in order to accommodate new types of data and new methods in the field. The resulting variation in the NEXUS format is causing difficulties for both users and developers. A NEXUS validator is being developed, along with formal definitions of compliance, that will alert users and developers to issues in the NEXUS files that they use. The PhyloModeler group is concentrating on the development of a new standard, one that defines phyloinformatic models. The most typical use of this proposed standard would be as an XML format, allowing data interchange.
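To make the idea of a validator concrete, here is a minimal sketch of a structural check on a NEXUS file. It is not the validator being developed at the hackathon and does not model any levels of compliance. It only checks two well-known features of the format: the file starts with #NEXUS, and every BEGIN name; is closed by END; (or ENDBLOCK;). The function name check_nexus is hypothetical, and the sketch assumes that BEGIN and END commands each start their own line and ignores bracketed comments. Blocks with unrecognized names are kept verbatim rather than rejected.

```python
import re

BLOCK_START = re.compile(r"^\s*begin\s+(\w+)\s*;", re.IGNORECASE)
BLOCK_END = re.compile(r"^\s*end(block)?\s*;", re.IGNORECASE)


def check_nexus(text):
    """Return (blocks, problems) for a NEXUS document given as a string.

    blocks:   list of (BLOCK_NAME, raw_text) for every properly closed block,
              in file order, with the raw text kept verbatim.
    problems: list of human-readable messages about structural issues.
    """
    lines = text.splitlines()
    problems = []
    blocks = []
    if not lines or not lines[0].strip().upper().startswith("#NEXUS"):
        problems.append("file does not start with #NEXUS")

    current_name = None
    current_lines = []
    for number, line in enumerate(lines, start=1):
        start = BLOCK_START.match(line)
        if start:
            if current_name is not None:
                problems.append(
                    f"line {number}: block '{current_name}' not closed before new BEGIN"
                )
            current_name = start.group(1).upper()
            current_lines = [line]
        elif current_name is not None:
            current_lines.append(line)
            if BLOCK_END.match(line):
                blocks.append((current_name, "\n".join(current_lines)))
                current_name = None
    if current_name is not None:
        problems.append(f"block '{current_name}' is never closed")
    return blocks, problems


# Made-up example file: one standard block, one private block, and a
# TREES block that is missing its END; command.
sample = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=2;
  TAXLABELS A B;
END;
BEGIN MYPRIVATE;
  anything goes here;
END;
BEGIN TREES;
  TREE t1 = (A,B);
"""
blocks, problems = check_nexus(sample)
print([name for name, _ in blocks])
print(problems)
```

A real validator would go on to check the commands inside each block; the sketch stops at block boundaries because that is where application-specific extensions are easiest to isolate.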

Summary of outputs

Let's keep this table small enough to put in the hackathon manuscript, so keep your descriptions brief! Use two separate entries if your work generated two distinct outputs that need to be described separately.

Synopsis of outputs, First NESCent Phyloinformatics Hackathon (December, 2006)
Use case | Brief description of output | Link or contact
Sequence families | Designed, implemented, and tested two new Bio::SimpleAlign (BioPerl) methods that protect sequence names in workflows that use PHYLIP. | OTU naming issue
Sequence families | Implemented elements of the Character-Data-and-Tree (CDAT) model in BioPerl (Bio::AlignI as Bio::AnnotatableI; new Bio::Annotation::TreeI to associate a tree with aligned data). | is this in bioperl live?
Reconcile trees | Implemented a Python script wrapper for the Java application softparsmap. The wrapper allows the user to input arbitrary trees instead of relying on the NCBI taxonomy. The script is now being used by the authors of softparsmap to implement a web interface. | Media:Reconcile.txt
Phylogenetic footprinting | Implemented a naive footprinting method in BioPerl's ClustalW wrapper, as well as a new wrapper around PhastCons and related programs. | link1
Morphological characters | desc1 | link1
Divergence time estimates | desc1 | link1
NEXUS compliance | Identified and (superficially) evaluated major APIs (NCL, Bio::NEXUS, Mesquite). | NEXUS (NESCent)
NEXUS compliance | Identified common errors in files, gathered test files, and proposed a scheme for levels of compliance. | NEXUS (NESCent)
NEXUS compliance | Created a parser for BioJava that parses the majority of NEXUS blocks and provides the remaining ones to the user as unparsed String blocks. Unparseable blocks remain unedited in output files unless explicitly removed. | NEXUS (NESCent)
NEXUS compliance | Created a parser for BioRuby. | NEXUS (NESCent)
Tree modelling | Created interfaces for tree modelling in BioJava. NOTE: these have since been removed because the open-source toolkit JGraphT could be reused to perform the equivalent function. | link1
PHYLIP parser | Created a parser for PHYLIP in BioJava and BioRuby. | link1
NESCent PhyloSOC | BioJava gained a PhyloSOC student as a direct result of participating in the hackathon. The student has since improved the parsers written at the hackathon (including adopting JGraphT and making the NEXUS trees block parser work with it) and written some useful tree manipulation tools, such as multiple hit correction. | link1
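The name-protection output listed in the table addresses a known constraint: the classic PHYLIP input format limits taxon names to 10 characters, so long sequence names can be truncated or collide. A minimal sketch of the general technique follows: replace each name with a short unique identifier before running the program, then map the identifiers back in the output. This is not the Bio::SimpleAlign implementation; the function names make_safe_names and restore_names, the "S" prefix, and the example sequence names are all hypothetical. The sketch assumes the input names are unique.

```python
def make_safe_names(names, prefix="S", width=10):
    """Map each original name to a short, unique, fixed-width identifier.

    Returns two dictionaries: original -> safe and safe -> original.
    Assumes the names in 'names' are unique.
    """
    digits = width - len(prefix)
    original_to_safe = {}
    safe_to_original = {}
    for index, name in enumerate(names, start=1):
        safe = f"{prefix}{index:0{digits}d}"  # e.g. S000000001
        original_to_safe[name] = safe
        safe_to_original[safe] = name
    return original_to_safe, safe_to_original


def restore_names(text, safe_to_original):
    """Replace every safe identifier found in program output with its original name."""
    for safe, original in safe_to_original.items():
        text = text.replace(safe, original)
    return text


# Made-up sequence names that exceed the 10-character limit.
names = ["Homo_sapiens_family_member_1", "Mus_musculus_family_member_1"]
forward, backward = make_safe_names(names)

# Pretend a PHYLIP program returned this tree using the safe identifiers.
program_output = "(S000000001:0.1,S000000002:0.2);"
restored = restore_names(program_output, backward)
print(restored)
```

Using fixed-width identifiers means no identifier is a prefix of another, so the plain string replacement in restore_names cannot rewrite part of a longer identifier by mistake.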

Critical Use Cases

  1. Family alignment: identify homologues, generate family alignment, evaluate models (3.2, 3.18)
  2. Reconcile trees (3.19) and determine concordance between two or more phylogenies (3.15)
  3. Phylogenetic footprinting and shadowing (3.10)
  4. Morphological characters: infer tree (3.12) and calculate support values (3.13)
  5. Estimate divergence times (3.17)
  6. NEXUS: lack of support for (and adherence to) standards

Development Targets

The high-priority use cases and the corresponding development targets for each subgroup (toolkit) are summarized on the Targets page.