BioRuby PhyloXML HowTo documentation

From Phyloinformatics
Revision as of 15:21, 14 August 2009 by Dianaj (talk) (How to retrieve 'other' data)


PhyloXML is an XML language for saving, analyzing and exchanging data of annotated phylogenetic trees. In BioRuby, the PhyloXML parser is implemented in Bio::PhyloXML::Parser and the writer in Bio::PhyloXML::Writer. More information at


In addition to the BioRuby library, you need the libxml Ruby bindings. To install them:

gem install -r libxml-ruby

How to parse a file

require 'bio'
# Create new phyloxml parser
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
# Print the names of all trees in the file
phyloxml.each do |tree|
  puts tree.name
end

If there are several trees in the file, you can access the one you want by its index:

tree = phyloxml[3]

You can use all Bio::Tree methods on the tree. For example,

tree.leaves.each do |node|
  puts node.name
end

PhyloXML files can hold additional information besides phylogenies at the end of the file. This information can be accessed through the 'other' array of the parser object once the trees have been read.

phyloxml = Bio::PhyloXML::Parser.open('example.xml')
while tree = phyloxml.next_tree
  # do stuff with trees
end
puts phyloxml.other
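
To make the file layout concrete, here is a minimal sketch that uses REXML from the Ruby standard library rather than BioRuby. It is not how Bio::PhyloXML::Parser works internally; it only shows that the phylogeny elements come first and anything else at the top level is what ends up in 'other'. The sample document, the namespace URI for the align prefix, and the method name split_top_level are made up for illustration.

```ruby
require 'rexml/document'

# Tiny inline PhyloXML document: two phylogenies followed by one
# non-phylogeny element. All data here is invented for illustration.
PHYLOXML_SAMPLE = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>tree one</name>
      <clade><clade><name>A</name></clade><clade><name>B</name></clade></clade>
    </phylogeny>
    <phylogeny rooted="false">
      <name>tree two</name>
      <clade><name>C</name></clade>
    </phylogeny>
    <align:alignment xmlns:align="http://example.org/align">
      <seq name="A">acgt</seq>
    </align:alignment>
  </phyloxml>
XML

# Hypothetical helper: returns [names of phylogenies, names of other elements].
def split_top_level(xml)
  root = REXML::Document.new(xml).root
  # Keep only element children (skip whitespace text nodes).
  elements = root.children.grep(REXML::Element)
  trees, others = elements.partition { |el| el.name == 'phylogeny' }
  tree_names = trees.map do |tree|
    name_el = tree.children.grep(REXML::Element).find { |c| c.name == 'name' }
    name_el && name_el.text
  end
  [tree_names, others.map(&:expanded_name)]
end

tree_names, other_names = split_top_level(PHYLOXML_SAMPLE)
puts "trees: #{tree_names.inspect}"
puts "other: #{other_names.inspect}"
```

With BioRuby itself, the second list corresponds to the elements you would find in phyloxml.other after the loop over next_tree has finished.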

How to write a file

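# Create new phyloxml writer
writer = Bio::PhyloXML::Writer.new('tree.xml')
# tree1 and tree2 are placeholder names for Bio::PhyloXML::Tree objects you already have
# Write tree to the file tree.xml
writer.write(tree1)
# Add another tree to the file
writer.write(tree2)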

How to retrieve data

Here is an example of how to retrieve the scientific name of the clades.

require 'bio'
phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
phyloxml.each do |tree|
  tree.each_node do |node|
    print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
  end
end
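
For readers who want to see which XML elements that lookup corresponds to, the following sketch walks the same clade/taxonomy/scientific_name structure with REXML from the standard library. It is an illustration of the data layout, not BioRuby's implementation. The sample document and the helper names child_elements, scientific_names and all_scientific_names are assumptions made for this example. Note that it skips clades that carry no taxonomy element; in the BioRuby loop you would likely want a similar check before calling scientific_name.

```ruby
require 'rexml/document'

# Invented sample: one annotated root clade, one annotated child,
# and one child clade without any taxonomy.
TAXONOMY_SAMPLE = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <clade>
        <taxonomy><scientific_name>Mollusca</scientific_name></taxonomy>
        <clade>
          <taxonomy><scientific_name>Gastropoda</scientific_name></taxonomy>
        </clade>
        <clade>
          <name>unannotated clade</name>
        </clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

# Direct child elements of el with the given local name.
def child_elements(el, name)
  el.children.grep(REXML::Element).select { |c| c.name == name }
end

# Depth-first walk over nested <clade> elements, collecting the
# scientific name of the first <taxonomy> of each clade, if present.
def scientific_names(clade, found = [])
  taxonomy = child_elements(clade, 'taxonomy').first
  if taxonomy
    sci = child_elements(taxonomy, 'scientific_name').first
    found << sci.text if sci
  end
  child_elements(clade, 'clade').each { |sub| scientific_names(sub, found) }
  found
end

# Collect names across every phylogeny in the document.
def all_scientific_names(xml)
  root = REXML::Document.new(xml).root
  child_elements(root, 'phylogeny').flat_map do |phylogeny|
    child_elements(phylogeny, 'clade').flat_map { |clade| scientific_names(clade) }
  end
end

all_scientific_names(TAXONOMY_SAMPLE).each do |name|
  puts "Scientific name: #{name}"
end
```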

How to retrieve 'other' data

require 'bio'
phyloxml = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
while tree = phyloxml.next_tree
  # do something with the trees
end
p phyloxml.other
puts "\n"
#=> output is an object representation
# Print in a readable way
puts phyloxml.other[0].to_xml, "\n"
#<align:alignment xmlns:align="">
#  <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
#  <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
#  <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
#</align:alignment>
# Once we know what's there, let's output just the sequences
phyloxml.other[0].children.each do |node|
  puts node.value
end
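
If you need the sequences keyed by name rather than just printed, the idea can be sketched with REXML from the standard library as below. This does not use the Bio::PhyloXML API. The sequences are the ones shown in the output above, while the surrounding document, the namespace URI for the align prefix, and the method name sequences_by_name are assumptions made for illustration.

```ruby
require 'rexml/document'

# Trimmed-down document holding one small tree and the alignment block.
# The namespace URI for "align" is a placeholder.
ALIGNMENT_SAMPLE = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <clade><name>root</name></clade>
    </phylogeny>
    <align:alignment xmlns:align="http://example.org/align">
      <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
      <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
      <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
    </align:alignment>
  </phyloxml>
XML

# Hypothetical helper: maps each <seq> name attribute to its sequence text.
def sequences_by_name(xml)
  root = REXML::Document.new(xml).root
  alignment = root.children.grep(REXML::Element).find { |el| el.name == 'alignment' }
  return {} unless alignment

  alignment.children.grep(REXML::Element).each_with_object({}) do |seq, result|
    result[seq.attributes['name']] = seq.text
  end
end

sequences_by_name(ALIGNMENT_SAMPLE).each do |name, sequence|
  puts "#{name}: #{sequence}"
end
```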