Difference between revisions of "PhyloSoC:Extend the Nexus Class Library to parse NeXML and PhyloXML"

Latest revision as of 13:21, 25 May 2010

PhyloSoC:Extend the Nexus Class Library to parse NeXML and PhyloXML

Author

Michael G. Elliot

mentor: Mark T. Holder

Abstract

Nexus is a flat text file format widely used to store bioinformatics data. The Nexus Class Library (NCL) is an integrated collection of C++ classes that parses files written in Nexus. The NCL currently does not have extensive XML support, which means it is incompatible with newly emerging formats such as NeXML and PhyloXML. This project will add the facility to read and write the NeXML and PhyloXML formats. In addition, the project will involve designing and implementing an API for querying arbitrary XML-formatted metadata associated with NCL structures.

Code

https://sourceforge.net/projects/ncl/

https://ncl.svn.sourceforge.net/svnroot/ncl/branches/xml

Blog

http://circumfluentwaves.wordpress.com

Project Plan

Bonding period

Tasks: Gather sample files, familiarize myself with the library, discuss plans with mentors

Week 1

Tasks: Write a parser for NeXML that handles:

  1. taxa elements;
  2. simple trees elements that contain trees with edge lengths (but nothing more complex than that);
  3. characters elements with discrete character types.

Week 2

Tasks:

  1. Parse continuous character data in characters elements.
  2. Start working a generic annotation or metadata API into NCL.

Deliverables: NCL can read and write the core NeXML elements.

Week 3-4

Tasks: Implement code that uses the annotation/metadata API to store NeXML annotations. We will prioritize which NeXML instance documents to target during these two weeks.

Deliverables: All annotations in an NeXML instance document will be accessible through a generic API, and commonly-needed annotations (such as branch support) will be accessible through a more convenient syntax (that will also be used for NHX properties, for instance).
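As a rough illustration of the two access levels described in this deliverable, the sketch below pairs a generic key/value lookup with a convenience accessor for branch support. It is only a sketch under stated assumptions: `Annotation`, `AnnotationStore`, `branchSupport`, and the `"support"` key are hypothetical names chosen for this example and are not part of NCL or of the NeXML schema.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration only; none of these names exist in NCL.
struct Annotation {
    std::string predicate;  // key of the annotation, e.g. "support"
    std::string value;      // stored as text; callers convert as needed
};

class AnnotationStore {
public:
    // Assumed key under which branch support is stored in this sketch.
    static constexpr const char *kSupportKey = "support";

    // Attach an annotation to the object identified by objectId (e.g. an edge id).
    void add(const std::string &objectId, const std::string &predicate,
             const std::string &value) {
        assert(!objectId.empty() && !predicate.empty());
        byObject_[objectId].push_back(Annotation{predicate, value});
    }

    // Generic API: every annotation on an object, in insertion order.
    std::vector<Annotation> all(const std::string &objectId) const {
        const auto it = byObject_.find(objectId);
        if (it == byObject_.end()) {
            return {};
        }
        return it->second;
    }

    // Generic API: the first value recorded under a predicate, if any.
    std::optional<std::string> first(const std::string &objectId,
                                     const std::string &predicate) const {
        const auto it = byObject_.find(objectId);
        if (it == byObject_.end()) {
            return std::nullopt;
        }
        for (const Annotation &a : it->second) {
            if (a.predicate == predicate) {
                return a.value;
            }
        }
        return std::nullopt;
    }

    // Convenience syntax for a commonly needed annotation: branch support
    // as a number, or nothing if it is absent or not numeric.
    std::optional<double> branchSupport(const std::string &edgeId) const {
        const std::optional<std::string> v = first(edgeId, kSupportKey);
        if (!v) {
            return std::nullopt;
        }
        try {
            return std::stod(*v);
        } catch (const std::exception &) {
            return std::nullopt;
        }
    }

private:
    std::map<std::string, std::vector<Annotation>> byObject_;
};
```

A client would call `store.add("e1", "support", "0.95")` while reading a file and later ask for `store.branchSupport("e1")`, without needing to know which file format the value came from. Keeping values as text in the generic layer is one possible design choice; it lets arbitrary annotations round-trip unchanged, at the cost of a conversion step in the convenience accessors.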

Week 5

Tasks: Write a parser for PhyloXML trees and taxa.

Week 6-7

Tasks: Complete the parser for PhyloXML. Clean up and document the code.

Deliverables: A functioning PhyloXML reader (and possibly a writer).

Week 8

Midterm evaluation

Tasks: Start parsing of the "NOTES" block...


Week 9

Tasks: Adapt NCL's hacky support for the New Hampshire eXtended (NHX) format to use the metadata API.
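To make the NHX side of this task concrete, the sketch below splits one NHX comment of the form `[&&NHX:key=value:key=value]` into key/value pairs, which is the shape of data a metadata API would then store. The function name `parseNhxComment` is hypothetical and this is not how NCL currently handles NHX; it is a minimal sketch that assumes the comment has already been isolated from the surrounding Newick string.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper for illustration; not part of NCL.
// Parses one NHX comment such as "[&&NHX:B=95:S=human]" into key/value pairs.
// Returns an empty map if the text does not look like an NHX comment.
std::map<std::string, std::string> parseNhxComment(const std::string &comment) {
    std::map<std::string, std::string> tags;
    const std::string prefix = "[&&NHX";
    if (comment.size() < prefix.size() + 1 ||
        comment.compare(0, prefix.size(), prefix) != 0 ||
        comment.back() != ']') {
        return tags;
    }

    // The body is everything between the prefix and the closing bracket,
    // e.g. ":B=95:S=human".
    const std::string body =
        comment.substr(prefix.size(), comment.size() - prefix.size() - 1);

    std::size_t pos = 0;
    while (pos < body.size()) {
        // Fields are separated by ':'; the last field runs to the end.
        std::size_t end = body.find(':', pos);
        if (end == std::string::npos) {
            end = body.size();
        }
        assert(end >= pos);
        const std::string field = body.substr(pos, end - pos);

        // Keep only well-formed "key=value" fields with a non-empty key.
        const std::size_t eq = field.find('=');
        if (eq != std::string::npos && eq > 0) {
            tags[field.substr(0, eq)] = field.substr(eq + 1);
        }
        pos = end + 1;
    }
    return tags;
}
```

Each resulting pair could then be handed to the same generic annotation storage used for NeXML, so that an NHX bootstrap tag and a NeXML branch-support annotation end up behind one accessor.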


Week 10

Tasks: Optimize the code and develop some simple testing clients that show off the API and will serve as good "cookbook" examples for other programmers interested in using NCL. This should be fun (or at least more fun than writing a parser), and informative about what is difficult to use.


Week 11

Tasks: Correct deficiencies in the API noticed during the development of clients in week 10.


Week 12

Suggested "pencils down" date, leaving time to firm up the docs, etc. Clean up documentation and wrap it up.