PhyloSoC:Extend the Nexus Class Library to parse NeXML and PhyloXML

From Phyloinformatics

Author

Michael G. Elliot

mentor: Mark T. Holder

Abstract

Nexus is a flat text file format widely used to store bioinformatics data. The Nexus Class Library (NCL) is an integrated collection of C++ classes that parses files written in Nexus. The NCL currently lacks extensive XML support, which makes it incompatible with newly emerging formats such as NeXML and PhyloXML. This project will add the ability to read and write the NeXML and PhyloXML formats. In addition, the project will involve designing and implementing an API for querying arbitrary XML-formatted metadata associated with NCL structures.

Code

https://sourceforge.net/projects/ncl/

https://ncl.svn.sourceforge.net/svnroot/ncl/branches/xml

Blog

http://circumfluentwaves.wordpress.com

Project Plan

Bonding period

Tasks: Gather sample files, become familiar with the library, and discuss plans with mentors.

Week 1

Tasks: Write a parser for NeXML that handles:

  1. taxa elements;
  2. simple trees elements that contain trees with edge lengths (but nothing more complex than that);
  3. characters elements with discrete character types.

Week 2

Tasks:

  1. Parse continuous character data in characters elements.
  2. Start building a generic annotation or metadata API into NCL.

Deliverables: NCL can read and write the core NeXML elements.

Week 3-4

Tasks: Implement code that uses the annotation/metadata API to store NeXML annotations. We will prioritize which NeXML instance documents to target during these two weeks.

Deliverables: All annotations in a NeXML instance document will be accessible through a generic API, and commonly needed annotations (such as branch support) will be accessible through a more convenient syntax (which will also be used for NHX properties, for instance).
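One way to picture the two access levels described above is the minimal C++ sketch below. It is purely illustrative: the class name `AnnotationStore`, its methods, and the `branch_support` key are assumptions made for this example and are not part of NCL or of the API this project will actually deliver. Annotations are held as plain strings under string keys (the generic level), and a thin wrapper offers typed access to branch support (the convenient level).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: none of these names come from NCL.
class AnnotationStore {
public:
    // Generic level: store any annotation as a string under a string key.
    void set(const std::string& key, const std::string& value) {
        annotations_[key] = value;
    }

    // Generic level: copy the value into `out` and return true if the key
    // exists; return false and leave `out` untouched otherwise.
    bool get(const std::string& key, std::string& out) const {
        std::map<std::string, std::string>::const_iterator it =
            annotations_.find(key);
        if (it == annotations_.end()) return false;
        out = it->second;
        return true;
    }

    // Convenience level: typed access to a commonly needed annotation.
    void setBranchSupport(double support) {
        set(supportKey(), std::to_string(support));
    }

    // Returns false if no support value is stored or it is not numeric.
    bool branchSupport(double& out) const {
        std::string raw;
        if (!get(supportKey(), raw)) return false;
        try {
            out = std::stod(raw);
        } catch (const std::logic_error&) {
            // std::stod throws invalid_argument or out_of_range,
            // both of which derive from std::logic_error.
            return false;
        }
        return true;
    }

private:
    // Assumed key name, chosen only for this example.
    static std::string supportKey() { return "branch_support"; }

    std::map<std::string, std::string> annotations_;
};

// Usage sketch: one store per annotated object (here, a tree edge).
inline void annotationExample() {
    AnnotationStore edge;
    edge.set("dc:description", "clade of interest");
    edge.setBranchSupport(0.95);

    double support = 0.0;
    assert(edge.branchSupport(support));
    assert(support > 0.94 && support < 0.96);
}
```

Keeping the convenience accessors as wrappers over the generic map means that an annotation read from NeXML and one read from NHX could land in the same store, which is the reuse the deliverable hints at.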

Week 5

Tasks: Write a parser for PhyloXML trees and taxa.

Week 6-7

Tasks: Complete the parser for PhyloXML. Clean up and document the code.

Deliverables: A functioning PhyloXML reader (and possibly a writer).

Week 8

Midterm evaluation.

Tasks: Start parsing of the "NOTES" block...


Week 9

Tasks: Adapt NCL's hacky support for the New Hampshire eXtended (NHX) format to use the metadata API.
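As a rough illustration of what feeding NHX data into a key/value metadata layer could involve, the sketch below splits an NHX comment of the form `[&&NHX:B=100:S=human]` into its tag/value pairs. The function `parseNhxComment` is a hypothetical helper written for this page, not an existing NCL function, and it assumes that values contain no `:` characters.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not part of NCL: split an NHX comment such as
// "[&&NHX:B=100:S=human]" into tag/value pairs.
// Assumption: values never contain ':'.
inline std::map<std::string, std::string>
parseNhxComment(const std::string& comment) {
    std::map<std::string, std::string> fields;
    const std::string prefix = "[&&NHX";

    // Reject anything that is not "[&&NHX ... ]".
    if (comment.size() < prefix.size() + 1 ||
        comment.compare(0, prefix.size(), prefix) != 0 ||
        comment[comment.size() - 1] != ']') {
        return fields;
    }

    // Text between the prefix and the closing bracket, e.g. ":B=100:S=human".
    const std::string body =
        comment.substr(prefix.size(), comment.size() - prefix.size() - 1);

    std::string::size_type pos = 0;
    while (pos < body.size()) {
        if (body[pos] == ':') {  // skip the separator before each field
            ++pos;
            continue;
        }
        std::string::size_type end = body.find(':', pos);
        if (end == std::string::npos) end = body.size();

        const std::string field = body.substr(pos, end - pos);
        const std::string::size_type eq = field.find('=');
        if (eq != std::string::npos && eq > 0) {
            fields[field.substr(0, eq)] = field.substr(eq + 1);
        }
        pos = end;
    }
    return fields;
}

// Usage sketch: the resulting pairs could be copied into whatever
// metadata container the library ends up providing.
inline void nhxExample() {
    std::map<std::string, std::string> f =
        parseNhxComment("[&&NHX:B=100:S=human]");
    assert(f["B"] == "100");
    assert(f["S"] == "human");
}
```

Fields without an `=` are silently skipped here; a real implementation would need to decide whether to report them instead.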


Week 10

Tasks: Optimize the code and develop some simple testing clients that "show off" the API and will serve as good "cookbook" examples for other programmers interested in using NCL. This should be fun (or at least more fun than writing a parser), and should reveal which parts of the API are difficult to use.


Week 11

Tasks: Correct deficiencies in the API noticed during the development of clients in week 10.


Week 12

(Suggested pencils-down date, to firm up docs etc.) Tasks: Clean up the documentation and wrap up the project.