PhyloSoC:Extend the Nexus Class Library to parse NeXML and PhyloXML

From Phyloinformatics
Revision as of 13:10, 25 May 2010 by (talk) (Week 1)


Michael G. Elliot


Nexus is a flat text file format widely used to store bioinformatics data. The Nexus Class Library (NCL) is an integrated collection of C++ classes for parsing Nexus files. The NCL currently lacks extensive XML support, which makes it incompatible with emerging formats such as NeXML and PhyloXML. This project will add the ability to read and write both formats. In addition, the project will involve designing and implementing an API for querying arbitrary XML-formatted metadata associated with NCL structures.



Project Plan

Bonding period

Tasks: Gather sample files, familiarize myself with the library, discuss plans with mentors

Week 1

Tasks: Write a parser for the following NeXML elements:

                      1. taxa elements;
                      2. simple trees elements that contain trees with edge lengths (but nothing more complex than that);
                      3. characters elements with discrete character types.
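As a rough illustration of the first two items, the sketch below pulls taxa and edge lengths out of a small hand-written NeXML fragment. It is only a sketch under stated assumptions: the types Otu and Edge, the helper functions, and the sample document are all hypothetical and are not part of the NCL, and regular expressions stand in for the real XML parser that an actual reader would need.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical holder types for this sketch; these are not NCL classes.
struct Otu  { std::string id; std::string label; };
struct Edge { std::string source; std::string target; double length; };

// Hand-written, simplified sample (namespaces and most attributes omitted).
const char* const kSampleNexml = R"(
<nexml>
  <otus id="tax1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="tax1">
    <tree id="tree1">
      <node id="n0" root="true"/>
      <node id="n1" otu="t1"/>
      <node id="n2" otu="t2"/>
      <edge id="e1" source="n0" target="n1" length="0.5"/>
      <edge id="e2" source="n0" target="n2" length="0.25"/>
    </tree>
  </trees>
</nexml>
)";

// Return the value of attribute `name` inside one start tag, or "" if absent.
std::string attribute(const std::string& tag, const std::string& name) {
    const std::regex re("\\s" + name + "\\s*=\\s*\"([^\"]*)\"");
    std::smatch m;
    return std::regex_search(tag, m, re) ? m[1].str() : std::string();
}

// Collect every start tag of the given element that carries attributes,
// e.g. all <otu ...> tags. "<otus" does not match when element is "otu".
std::vector<std::string> startTags(const std::string& xml,
                                   const std::string& element) {
    const std::regex re("<" + element + "\\s[^>]*>");
    std::vector<std::string> tags;
    for (std::sregex_iterator it(xml.begin(), xml.end(), re), end;
         it != end; ++it) {
        tags.push_back(it->str());
    }
    return tags;
}

// Item 1: read the taxa (otu elements).
std::vector<Otu> readOtus(const std::string& xml) {
    std::vector<Otu> otus;
    for (const std::string& tag : startTags(xml, "otu")) {
        otus.push_back({attribute(tag, "id"), attribute(tag, "label")});
    }
    return otus;
}

// Item 2: read tree edges with their lengths (0.0 when no length is given).
std::vector<Edge> readEdges(const std::string& xml) {
    std::vector<Edge> edges;
    for (const std::string& tag : startTags(xml, "edge")) {
        const std::string len = attribute(tag, "length");
        edges.push_back({attribute(tag, "source"), attribute(tag, "target"),
                         len.empty() ? 0.0 : std::stod(len)});
    }
    return edges;
}
```

Calling readOtus(kSampleNexml) and readEdges(kSampleNexml) returns the two taxa and the two edges of the sample. The sketch ignores comments, CDATA, namespaces and entity escapes, which is why a real implementation would sit on top of a proper XML parser.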

Week 2

Tasks: Complete the parser for NeXML "minor blocks" such as assumptions, sets, etc. Deliverables: Functioning NeXML reader (and possibly a writer)

Week 3

Tasks: Write a parser for PhyloXML trees and taxa
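PhyloXML expresses tree topology by nesting clade elements, so the tree parser is essentially a stack walk over opening and closing tags. The sketch below shows that idea on a hand-written sample and prints the result in Newick notation. The Clade struct, the function names, and the sample are hypothetical and not taken from the NCL; regular expressions again stand in for a real XML parser, and only the name and branch_length child elements are handled.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node for this sketch; not an NCL class.
struct Clade {
    std::string name;
    double branchLength = 0.0;
    bool hasLength = false;
    std::vector<Clade> children;
};

// Hand-written, simplified sample with three named tips.
const char* const kSamplePhyloxml = R"xml(
<phyloxml>
  <phylogeny rooted="true">
    <clade>
      <clade><branch_length>0.5</branch_length>
        <clade><name>A</name><branch_length>0.1</branch_length></clade>
        <clade><name>B</name><branch_length>0.2</branch_length></clade>
      </clade>
      <clade><name>C</name><branch_length>0.7</branch_length></clade>
    </clade>
  </phylogeny>
</phyloxml>
)xml";

// Build the first clade tree found in `xml`. Each <clade> pushes a node on
// the stack; each </clade> pops it and attaches it to its parent.
// Returns an empty Clade if no complete top-level clade is found.
Clade readFirstClade(const std::string& xml) {
    // Group 1: "/" for a closing tag; group 2: element name;
    // group 3: the text that follows the tag, up to the next '<'.
    const std::regex re(R"re(<(/?)(clade|name|branch_length)\b[^>]*>([^<]*))re");
    std::vector<Clade> stack;
    for (std::sregex_iterator it(xml.begin(), xml.end(), re), end;
         it != end; ++it) {
        const bool closing = (*it)[1].length() > 0;
        const std::string element = (*it)[2].str();
        const std::string text = (*it)[3].str();
        if (element == "clade") {
            if (!closing) {
                stack.emplace_back();
            } else if (!stack.empty()) {
                Clade done = std::move(stack.back());
                stack.pop_back();
                if (stack.empty()) return done;  // closed the root clade
                stack.back().children.push_back(std::move(done));
            }
        } else if (!closing && !stack.empty()) {
            if (element == "name") {
                stack.back().name = text;
            } else {  // branch_length
                stack.back().branchLength = std::stod(text);
                stack.back().hasLength = true;
            }
        }
    }
    return Clade{};
}

// Serialise a clade as Newick, without the trailing semicolon.
std::string toNewick(const Clade& c) {
    std::string out;
    if (!c.children.empty()) {
        out += "(";
        for (std::size_t i = 0; i < c.children.size(); ++i) {
            if (i > 0) out += ",";
            out += toNewick(c.children[i]);
        }
        out += ")";
    }
    out += c.name;
    if (c.hasLength) {
        std::ostringstream s;
        s << c.branchLength;
        out += ":" + s.str();
    }
    return out;
}
```

For the sample above, toNewick(readFirstClade(kSamplePhyloxml)) gives ((A:0.1,B:0.2):0.5,C:0.7). The sketch does not handle self-closing clade tags, branch lengths given as attributes, or name elements nested inside other children of a clade.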

Week 4

Tasks: Complete the parser for PhyloXML "minor blocks". Clean up and document the code. Deliverables: Functioning PhyloXML reader (and possibly a writer)

Week 5

Tasks: Metadata "notes" block... More coming soon