NeXML and RDF API for BioRuby

From Phyloinformatics
Revision as of 12:19, 9 June 2010 by Yeban (talk)


The following document discusses the implementation of an NeXML parser and serializer and an RDF API for BioRuby. Note that this document is not final yet.


Currently all parsing is done up front (i.e. there is no streaming). This is likely to change later. To parse an NeXML file:

 doc = Bio::NeXML::Parser.new( "trees.xml" ) #assumes the parser class is Bio::NeXML::Parser
 nexml = doc.parse
 nexml.class #Bio::NeXML::Nexml

Otus and Otu

Taxa blocks are stored internally as a Ruby hash for faster 'id' based lookup.

 nexml.otus_set #a hash of otus objects indexed with 'id'
 nexml.otus #an array of otus objects
 #iterate over each otus object
 nexml.each_otus do |taxa|
   puts taxa.label
 end
 #find an otus by id
 taxa1 = nexml.get_otus_by_id "taxa1"
 taxa1.class #Bio::NeXML::Otus
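
The hash-backed storage described above can be illustrated with a minimal, self-contained sketch. The class and method names below (OtusSketch, DocumentSketch, add_otus) are hypothetical stand-ins chosen to mirror the calls shown above; they are not the Bio::NeXML implementation.

```ruby
# Hypothetical stand-ins; these are not the Bio::NeXML classes.
OtusSketch = Struct.new(:id, :label)

class DocumentSketch
  attr_reader :otus_set

  def initialize
    @otus_set = {} # id => otus object
  end

  # Store an otus object under its id.
  def add_otus(otus)
    @otus_set[otus.id] = otus
    self
  end

  # Array view of the stored objects, in insertion order.
  def otus
    @otus_set.values
  end

  # Hash lookup by id; returns nil for an unknown id.
  def get_otus_by_id(id)
    @otus_set[id]
  end

  # Yield each stored otus object in turn.
  def each_otus(&block)
    @otus_set.each_value(&block)
  end
end

doc = DocumentSketch.new
doc.add_otus(OtusSketch.new("taxa1", "Primates"))
doc.add_otus(OtusSketch.new("taxa2", "Rodents"))

puts doc.get_otus_by_id("taxa1").label
doc.each_otus { |taxa| puts taxa.id }
```

Keeping a single hash and deriving the array from it avoids having two collections to keep in sync, at the cost of building a new array on every call to otus.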

Similarly, individual taxa (otu objects) are stored internally as a Ruby hash indexed by 'id'. To work with otu objects:

 taxa1.otu_set #a hash of otu objects indexed with 'id'
 taxa1.otus #an array of otu objects
 #get an individual otu object given its id
 taxon1 = taxa1[ 'taxon1' ]
 #or iterate over each otu object
 taxa1.each do |taxon|
   puts taxon.label
 end

Each otus object is an Enumerable, so Enumerable methods can be called on it, for example with &:id to collect the ids of its otu objects.
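
As a rough illustration of what being an Enumerable buys, the sketch below defines a hypothetical container (OtuContainerSketch, not the real Bio::NeXML::Otus class) that includes Ruby's Enumerable module and implements only each. Methods such as map and find then become available without further code.

```ruby
# Hypothetical stand-ins; these are not the Bio::NeXML classes.
OtuSketch = Struct.new(:id, :label)

class OtuContainerSketch
  include Enumerable

  def initialize(otu_list)
    # Index the otu objects by id, as the text above describes.
    @otu_set = otu_list.each_with_object({}) { |otu, set| set[otu.id] = otu }
  end

  # Fetch a single otu object by its id.
  def [](id)
    @otu_set[id]
  end

  # Enumerable only requires #each; map, select, find etc. build on it.
  def each(&block)
    return enum_for(:each) unless block
    @otu_set.each_value(&block)
    self
  end
end

taxa = OtuContainerSketch.new([
  OtuSketch.new("taxon1", "Homo sapiens"),
  OtuSketch.new("taxon2", "Pan troglodytes")
])

ids = taxa.map(&:id)
chimp = taxa.find { |otu| otu.label.start_with?("Pan") }
puts ids.inspect
puts chimp.id
```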

Trees and Tree

Get a trees object:

 nexml.trees #return an array of trees objects.
 trees1 = nexml.trees[0]
 trees1.class #Bio::NeXML::Trees
 #get the taxa to which the trees is linked to

Currently a tree can have only one root node. To work with an individual tree:

 #get a tree object with its 'id'
 tree1 = trees1[ 'tree1' ]
 tree1.class #Bio::NeXML::IntTree or Bio::NeXML::FloatTree
 #or iterate over each tree object
 trees1.each do |tree|
   puts tree.label
 end
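
One plausible way to end up with either an IntTree or a FloatTree is to pick the class from the tree element's xsi:type attribute, which in NeXML is "nex:IntTree" or "nex:FloatTree". The sketch below assumes that approach; the class names, the TREE_CLASSES table and build_tree are hypothetical and do not describe how the BioRuby parser actually decides.

```ruby
# Hypothetical stand-ins for Bio::NeXML::IntTree and Bio::NeXML::FloatTree.
class IntTreeSketch
  # Edge lengths of an integer tree are parsed as integers.
  def parse_length(text)
    Integer(text)
  end
end

class FloatTreeSketch
  # Edge lengths of a float tree are parsed as floats.
  def parse_length(text)
    Float(text)
  end
end

# Assumed mapping from the xsi:type attribute value to a tree class.
TREE_CLASSES = {
  "nex:IntTree"   => IntTreeSketch,
  "nex:FloatTree" => FloatTreeSketch
}.freeze

# Instantiate the class matching the given xsi:type, or fail loudly.
def build_tree(xsi_type)
  klass = TREE_CLASSES.fetch(xsi_type) do
    raise ArgumentError, "unknown tree type: #{xsi_type}"
  end
  klass.new
end

tree = build_tree("nex:FloatTree")
length = tree.parse_length("0.34")
puts tree.class
puts length
```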

All the available methods from Bio::Tree class can be called on a tree object.

 node1 = tree1.get_node_by_name "n3" #note name is same as id
 tree1.parents node1

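A trees object is an Enumerable, so Enumerable methods can be called on it, for example with &:id to collect the ids of its tree objects.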