Difference between revisions of "Project Plan for NeXML and RDF API in BioRuby"

From Phyloinformatics
m (Week 7 ( July 5 - July 11 ))
 
(20 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Week 1 ==
+
== Week 1 ( May 24 - May 30 ) ==
Development of the NeXML parser.
+
Planned:<br/>
* Design classes to encapsulate, and parse and return corresponding objects for:
+
I will start the development of the NeXML parser this week. The parser should accept NeXML input in several forms: file, IO, string, or URI. The target is to be able to parse <code>otus</code>, <code>otu</code>, simple <code>trees</code> (just a tree with some nodes and edges), and <code>class</code>. The focus will be on designing classes to encapsulate these NeXML elements, the actual parsing, and unit tests, not on documentation.
** Taxa( <code>otu</code> )
 
** Taxa block( <code>otus</code> ) and
 
** Sets( <code>class</code> )
 
  
== Week 2 ==
+
Notes:<br/>
* Design classes to encapsulate, and parse and return corresponding objects for:
+
* Since support for <code>class</code> elements is not complete in the schema, they will not be implemented in the parser.
** Trees( <code>trees</code> )
 
** Tree( <code>tree</code> ), Network( <code>network</code> ), Node( <code>node</code> ) and Edge( <code>edge</code> )
 
  
== Week 3 ==
+
== Week 2 ( May 31 - June 6 ) ==
* Design classes to encapsulate, and parse and return corresponding objects for:
+
Planned:
** Character block( <code>characters</code> )
+
* Completely implement <code>trees</code> and <code>networks</code> including both their int and float variants.
** <code>format</code>, <code>states</code>, <code>state</code>, <code>char</code>
 
** <code>matrix</code>, <code>row</code>
 
  
== Week 4 ==
+
Start working on the <code>characters</code> element. NeXML allows for two broad categories of data (sequence and granular observation), each with six subcategories. Regardless of the type, the parser should be able to recognize:
Make sure that the API for the parser is in place, with software development iterations, tests and documents.
+
* state definitions - <code>states</code> and its child elements (maybe leave ambiguous definition - discuss with Rutger)
 +
* character definitions - <code>char</code>
 +
* matrix - <code>row</code>, raw character sequences (<code>seq</code>), and granular observations (<code>cell</code>)
 +
In parsing <code>characters</code>, the focus will be on designing classes to abstract <code>characters</code> and its child elements, the actual parsing, and unit tests, not on documentation.
  
== Week 5 ==
+
Notes:
Development of the NeXML serializer
+
* Bootstrap values are stored in NeXML as semantic annotations with the <code>meta</code> tag, so they will be implemented after semantic annotations are done.
* Extend the already designed classes to serialize:
 
** Taxa( <code>otu</code> )
 
** Taxa block( <code>otus</code> ) and
 
** Sets( <code>class</code> )
 
  
== Week 6 ==
+
== Week 3 ( June 7 - June 13 ) ==
* Extend the already designed classes to serialize:
+
Planned:
** Trees( <code>trees</code> )
+
* Completely implement <code>characters</code> with the supported types.
** Tree( <code>tree</code> ), Network( <code>network</code> ), Node( <code>node</code> ) and Edge( <code>edge</code> )
+
* Document the code base and make sure that the parsing API is in place complete with tests and documentation.
 +
* Request feedback from the BioRuby community (this will be done at the end of the week).
  
== Week 7 ==
+
Notes:
* Extend the already designed classes to serialize:
+
* Due to certain issues, the <code>characters</code> implementation could not be completed or properly tested. It will be done next week.
** Character block( <code>characters</code> )
 
** <code>format</code>, <code>states</code>, <code>state</code>, <code>char</code>
 
** <code>matrix</code>, <code>row</code>
 
  
== Week 8 ==
+
== Week 4 ( June 14 - June 20 ) ==
Make sure that the API for the NeXML serializer is in place, with software development iterations, tests and documents.
+
Planned:<br/>
 +
* Finalize the work on NeXML parser:
 +
** Code cleanup.
 +
** Write documentation.
 +
** Add more tests.
 +
* Start working on the NeXML serializer:
 +
** Root
 +
** Taxa block (<code>otus</code>) and taxa (<code>otu</code>).
 +
Focus will be on adding to_nexml methods to the classes and generating valid NeXML.
  
== Week 9 ==
+
Notes:<br/>
Design classes for semantic annotation in BioRuby.
+
* Existing classes do not need <tt>to_nexml</tt> methods; they sit independently of the serializer implementation. Serializer methods are part of the Bio::NeXML::Writer class.
  
== Week 10 ==
+
== Week 5 ( June 21 - June 27 ) ==
* Parse <code>meta</code> NeXML element and return the corresponding object.
+
Planned:
* Serialize annotations into <code>meta</code> tag.
+
* Completely implement serialization of:
 +
** Trees
 +
** Characters
 +
Focus will be on adding to_nexml methods to the classes and generating valid NeXML.
  
== Week 11 ==
+
Notes:
Make sure that the RDF API is in place, with software development iterations, tests and documents.
+
* Check out the serializer API in the [[NeXML and RDF API for BioRuby]].
 +
* Started getting feedback from BioRuby developers (more details in the mailing list [http://lists.open-bio.org/pipermail/bioruby/2010-June/001280.html Archive]):
 +
** Refactoring the Matrix class.
 +
** Need for regression tests and benchmarking.
 +
** Use of doc tests for document validation and expansion.
 +
** Streaming over memory based parsing.
 +
** Use of RSpec for RDF API
  
== Week 12 ==
+
== Week 6 ( June 28 - July 4 ) ==
Tests and documentations.
+
* Finalize the work on NeXML serializer:
 +
** Code cleanup
 +
** Write documentation
 +
* Start work on the RDF API: work out the specs of the API (with RSpec).
  
== References ==  
+
== Week 7 ( July 5 - July 11 ) ==
A discussion on API can be found here - [[NeXML and RDF API for BioRuby]]
+
Taking this week off.
 +
 
 +
== Week 8 ( July 12 - July 18 ) ==
 +
* Midterm evaluations
 +
* Continue with the RDF specs and design classes to realize them.
 +
 
 +
== Week 9 ( July 19 - July 25 ) and Week 10 ( July 26 - August 1 ) ==
 +
* Implement the NeXML/RDF parser and serializer. This means parsing the <code>meta</code> tag and serializing RDF graphs to <code>meta</code> elements.
 +
 
 +
== Week 11 ( August 2 - August 8 ) ==
 +
* Code profiling and benchmarking
 +
* Regression tests
 +
* Make up for any missing tests and documentation.
 +
 
 +
== Week 12 ( August 9 - August 15 ) ==
 +
* Feedback
 +
* Iteration
 +
 
 +
== Technicalities ==
 +
[http://github.com/yeban/bioruby GitHub] is being used for code collaboration. Any NeXML file read during parser development is validated against the current NeXML schema to ensure correctness. The code is unit tested with Ruby's unit testing framework, and documentation is generated using RDoc. All NeXML elements are documented at [[NeXML Elements]], and an API discussion can be found at [[NeXML and RDF API for BioRuby]].
  
 
[[Category:NeXML and RDF API for BioRuby]]
 

Latest revision as of 09:15, 3 July 2010

Week 1 ( May 24 - May 30 )

Planned:
I will start the development of the NeXML parser this week. The parser should accept NeXML input in several forms: file, IO, string, or URI. The target is to be able to parse otus, otu, simple trees (just a tree with some nodes and edges), and class. The focus will be on designing classes to encapsulate these NeXML elements, the actual parsing, and unit tests, not on documentation.
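
One way the four input forms could be funnelled into a single XML string before parsing is sketched below. This is only an illustration: the helper name read_nexml_source is hypothetical and is not part of BioRuby, and the heuristics for telling a path, a URI, and raw XML apart are assumptions.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper (not a BioRuby API): normalize the four input kinds
# named in the plan (file, IO, string, URI) into one XML string.
def read_nexml_source(source)
  if source.respond_to?(:read)                        # IO, File, StringIO, ...
    source.read
  elsif source.is_a?(String) && source.lstrip.start_with?('<')
    source                                            # already an XML string
  elsif source.is_a?(String) && source =~ %r{\Ahttps?://}
    require 'open-uri'
    URI.open(source, &:read)                          # network fetch; not exercised here
  elsif source.is_a?(String) && File.file?(source)
    File.read(source)                                 # path to a file on disk
  else
    raise ArgumentError, "cannot read NeXML from #{source.inspect}"
  end
end

xml = '<otus id="taxa1"><otu id="t1"/></otus>'

from_string = read_nexml_source(xml)
from_io     = read_nexml_source(StringIO.new(xml))
from_file   = Tempfile.create('nexml') do |f|
  f.write(xml)
  f.flush
  read_nexml_source(f.path)
end
```

With the input reduced to a string, the rest of the parser only has to deal with one form.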

Notes:

  • Since support for class elements is not complete in the schema, they will not be implemented in the parser.

Week 2 ( May 31 - June 6 )

Planned:

  • Completely implement trees and networks including both their int and float variants.

Start working on the characters element. NeXML allows for two broad categories of data (sequence and granular observation), each with six subcategories. Regardless of the type, the parser should be able to recognize:

  • state definitions - states and its child elements (maybe leave ambiguous definition - discuss with Rutger)
  • character definitions - char
  • matrix - row, raw character sequences (seq), and granular observations (cell)

In parsing characters, the focus will be on designing classes to abstract characters and its child elements, the actual parsing, and unit tests, not on documentation.
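
A rough sketch of what type-agnostic recognition of these elements could look like, using REXML from Ruby's standard distribution. The State, Char, and Row structs and the parse_characters function are hypothetical names for illustration, not the planned BioRuby classes. The fragment is simplified: it omits namespaces and xsi:type, and it shows a seq row and a cell row side by side only for brevity.

```ruby
require 'rexml/document'

# Hypothetical value classes; the names are illustrative only.
State = Struct.new(:id, :symbol)
Char  = Struct.new(:id, :states_id)
Row   = Struct.new(:id, :otu_id, :seq, :cells)

# Simplified fragment: no namespaces, no xsi:type.
CHARACTERS_XML = <<~XML
  <characters id="c1" otus="taxa1">
    <format>
      <states id="ss1">
        <state id="s1" symbol="0"/>
        <state id="s2" symbol="1"/>
      </states>
      <char id="ch1" states="ss1"/>
      <char id="ch2" states="ss1"/>
    </format>
    <matrix>
      <row id="r1" otu="t1"><seq>0 1</seq></row>
      <row id="r2" otu="t2">
        <cell char="ch1" state="s2"/>
        <cell char="ch2" state="s1"/>
      </row>
    </matrix>
  </characters>
XML

def parse_characters(xml)
  root = REXML::Document.new(xml).root

  # state definitions
  states = REXML::XPath.match(root, 'format/states/state').map do |e|
    State.new(e.attributes['id'], e.attributes['symbol'])
  end

  # character definitions
  chars = REXML::XPath.match(root, 'format/char').map do |e|
    Char.new(e.attributes['id'], e.attributes['states'])
  end

  # matrix rows: keep the raw sequence (if any) and the granular cells (if any)
  rows = REXML::XPath.match(root, 'matrix/row').map do |e|
    seq_el = e.elements['seq']
    cells  = e.get_elements('cell').map do |c|
      [c.attributes['char'], c.attributes['state']]
    end.to_h
    Row.new(e.attributes['id'], e.attributes['otu'], seq_el && seq_el.text, cells)
  end

  { states: states, chars: chars, rows: rows }
end

parsed = parse_characters(CHARACTERS_XML)
```

Here a row simply carries whichever of seq or cells was present, which is one way to defer the type decision to a later stage.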

Notes:

  • Bootstrap values are stored in NeXML as semantic annotations with the meta tag, so they will be implemented after semantic annotations are done.

Week 3 ( June 7 - June 13 )

Planned:

  • Completely implement characters with the supported types.
  • Document the code base and make sure that the parsing API is in place, complete with tests and documentation.
  • Request feedback from the BioRuby community (this will be done at the end of the week).

Notes:

  • Due to certain issues, the characters implementation could not be completed or properly tested. It will be done next week.

Week 4 ( June 14 - June 20 )

Planned:

  • Finalize the work on NeXML parser:
    • Code cleanup.
    • Write documentation.
    • Add more tests.
  • Start working on the NeXML serializer:
    • Root
    • Taxa block (otus) and taxa (otu).

Focus will be on adding to_nexml methods to the classes and generating valid NeXML.

Notes:

  • Existing classes do not need to_nexml methods; they sit independently of the serializer implementation. Serializer methods are part of the Bio::NeXML::Writer class.
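
The separation described in this note can be sketched as follows. SketchWriter, Otu, and Otus are hypothetical stand-ins; the actual interface of Bio::NeXML::Writer is not shown on this page, so nothing below should be read as its API. The point is only that the model objects stay plain while a separate writer knows how to turn them into XML.

```ruby
require 'rexml/document'

# Plain model objects with no serialization methods of their own
# (hypothetical stand-ins for the real classes).
Otu  = Struct.new(:id, :label)
Otus = Struct.new(:id, :otus)

# Hypothetical writer: all XML knowledge lives here, not in the models.
class SketchWriter
  def serialize_otus(taxa)
    el = REXML::Element.new('otus')
    el.add_attribute('id', taxa.id)
    taxa.otus.each do |otu|
      child = el.add_element('otu', 'id' => otu.id)
      child.add_attribute('label', otu.label) if otu.label
    end
    el
  end
end

taxa     = Otus.new('taxa1', [Otu.new('t1', 'Homo sapiens'), Otu.new('t2', nil)])
otus_xml = SketchWriter.new.serialize_otus(taxa).to_s
```

Keeping the writer separate means the output format can change without touching the classes the parser returns.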

Week 5 ( June 21 - June 27 )

Planned:

  • Completely implement serialization of:
    • Trees
    • Characters

Focus will be on adding to_nexml methods to the classes and generating valid NeXML.

Notes:

  • Check out the serializer API on the NeXML and RDF API for BioRuby page.
  • Started getting feedback from BioRuby developers (more details in the mailing list archive: http://lists.open-bio.org/pipermail/bioruby/2010-June/001280.html):
    • Refactoring the Matrix class.
    • Need for regression tests and benchmarking.
    • Use of doc tests for document validation and expansion.
    • Streaming over memory-based parsing.
    • Use of RSpec for the RDF API.

Week 6 ( June 28 - July 4 )

  • Finalize the work on NeXML serializer:
    • Code cleanup
    • Write documentation
  • Start work on the RDF API: work out the specs of the API (with RSpec).

Week 7 ( July 5 - July 11 )

Taking this week off.

Week 8 ( July 12 - July 18 )

  • Midterm evaluations
  • Continue with the RDF specs and design classes to realize them.

Week 9 ( July 19 - July 25 ) and Week 10 ( July 26 - August 1 )

  • Implement the NeXML/RDF parser and serializer. This means parsing the meta tag and serializing RDF graphs to meta elements.
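
To make the meta round trip concrete, here is a small sketch that reads meta children into subject/predicate/object triples and writes them back. Triple, meta_to_triples, and add_meta are hypothetical names, not the planned RDF API. The fragment assumes that literal annotations use property/content and resource annotations use rel/href, that the subject comes from the parent's about attribute, and it leaves out namespaces and xsi:type.

```ruby
require 'rexml/document'

# Hypothetical triple type; literal marks whether the object is a literal value.
Triple = Struct.new(:subject, :predicate, :object, :literal)

# Read the <meta> children of an element into triples.
def meta_to_triples(element)
  subject = element.attributes['about'] || "##{element.attributes['id']}"
  element.get_elements('meta').map do |m|
    if m.attributes['property']
      Triple.new(subject, m.attributes['property'], m.attributes['content'], true)
    else
      Triple.new(subject, m.attributes['rel'], m.attributes['href'], false)
    end
  end
end

# Write one triple back as a <meta> child of the element.
def add_meta(element, triple)
  attrs = if triple.literal
            { 'property' => triple.predicate, 'content' => triple.object }
          else
            { 'rel' => triple.predicate, 'href' => triple.object }
          end
  element.add_element('meta', attrs)
end

META_XML = <<~XML
  <otu id="t1" about="#t1">
    <meta property="dc:title" content="Homo sapiens"/>
    <meta rel="dc:relation" href="http://example.org/taxon/9606"/>
  </otu>
XML

triples = meta_to_triples(REXML::Document.new(META_XML).root)

# Round trip: rebuild an element from the triples and read it again.
rebuilt = REXML::Element.new('otu')
rebuilt.add_attribute('about', '#t1')
triples.each { |t| add_meta(rebuilt, t) }
round_trip = meta_to_triples(rebuilt)
```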

Week 11 ( August 2 - August 8 )

  • Code profiling and benchmarking
  • Regression tests
  • Make up for any missing tests and documentation.

Week 12 ( August 9 - August 15 )

  • Feedback
  • Iteration

Technicalities

GitHub (http://github.com/yeban/bioruby) is being used for code collaboration. Any NeXML file read during parser development is validated against the current NeXML schema to ensure correctness. The code is unit tested with Ruby's unit testing framework, and documentation is generated using RDoc. All NeXML elements are documented at NeXML Elements, and an API discussion can be found at NeXML and RDF API for BioRuby.