BioRuby PhyloXML HowTo documentation

From Phyloinformatics
Latest revision as of 15:21, 14 August 2009

Intro

PhyloXML is an XML language for saving, analyzing and exchanging data of annotated phylogenetic trees. In BioRuby, the PhyloXML parser is implemented in Bio::PhyloXML::Parser and the writer in Bio::PhyloXML::Writer. More information is available at www.phyloxml.org.

Requirements

In addition to the BioRuby library, you need the libxml-ruby bindings. To install them:

gem install -r libxml-ruby
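BioRuby's PhyloXML support depends on these bindings, so it can help to check that they load before parsing anything. The following is a minimal sketch. `libxml_available?` is a hypothetical helper, and `require 'libxml'` assumes the gem's usual entry point.

```ruby
# Hypothetical helper: true if the libxml-ruby bindings can be loaded.
def libxml_available?
  require 'libxml'
  true
rescue LoadError
  false
end

if libxml_available?
  puts 'libxml-ruby is installed'
else
  puts 'libxml-ruby is missing; install it with: gem install -r libxml-ruby'
end
```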

How to parse a file

require 'bio'
# Create new phyloxml parser
phyloxml = Bio::PhyloXML::Parser.new('example.xml')
# Print the names of all trees in the file
phyloxml.each do |tree|
  puts tree.name
end
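Because PhyloXML is plain XML, the tree names can also be read without BioRuby. The sketch below uses Ruby's standard REXML library instead of Bio::PhyloXML::Parser. It assumes the PhyloXML namespace http://www.phyloxml.org and its `phylogeny` and `name` elements. The helper `phylogeny_names` and the inline document are illustrative only. Unlike a streaming parser, this loads the whole document into memory, so it suits small files.

```ruby
require 'rexml/document'

PHYLOXML_NS = { 'px' => 'http://www.phyloxml.org' }

# Returns the <name> of every <phylogeny> in a PhyloXML string (nil if unnamed).
def phylogeny_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//px:phylogeny', PHYLOXML_NS).map do |phylogeny|
    REXML::XPath.first(phylogeny, 'px:name', PHYLOXML_NS)&.text
  end
end

xml = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true"><name>tree A</name><clade/></phylogeny>
    <phylogeny rooted="false"><name>tree B</name><clade/></phylogeny>
  </phyloxml>
XML

phylogeny_names(xml).each { |name| puts name }
```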

If there are several trees in the file, you can access a particular one by its index:

tree = phyloxml[3]
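In plain REXML terms, picking one tree out of several is just an array index over the matched `phylogeny` elements. Ruby arrays count from 0, so index 3 is the fourth tree. Whether Bio::PhyloXML::Parser#[] counts the same way should be checked against the BioRuby API docs. `phylogeny_at` is a hypothetical helper, and this sketch does not use BioRuby.

```ruby
require 'rexml/document'

NS = { 'px' => 'http://www.phyloxml.org' }

# Hypothetical helper: the phylogeny element at a zero-based index, or nil if absent.
def phylogeny_at(xml, index)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//px:phylogeny', NS)[index]
end

xml = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny><name>t0</name></phylogeny>
    <phylogeny><name>t1</name></phylogeny>
    <phylogeny><name>t2</name></phylogeny>
    <phylogeny><name>t3</name></phylogeny>
  </phyloxml>
XML

tree = phylogeny_at(xml, 3)
puts REXML::XPath.first(tree, 'px:name', NS).text
```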

You can use all Bio::Tree methods on the tree. For example, to print the names of all leaves:

tree.leaves.each do |node|
  puts node.name
end
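In the XML itself, the leaves are the `clade` elements that contain no nested `clade`. The following REXML sketch (not BioRuby) collects their names from a small hand-written tree. `leaf_names` is a hypothetical helper.

```ruby
require 'rexml/document'

NS = { 'px' => 'http://www.phyloxml.org' }

# Leaf clades are <clade> elements with no child <clade>; returns their names.
def leaf_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//px:clade', NS)
    .select { |clade| REXML::XPath.first(clade, 'px:clade', NS).nil? }
    .map { |clade| REXML::XPath.first(clade, 'px:name', NS)&.text }
end

xml = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <clade>
        <clade><name>A</name></clade>
        <clade>
          <clade><name>B</name></clade>
          <clade><name>C</name></clade>
        </clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

leaf_names(xml).each { |name| puts name }
```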

Besides phylogenies, PhyloXML files can hold additional information at the end of the file. This information is accessible through the 'other' array of the parser object. Since it follows the trees, read through all the trees before accessing it:

phyloxml = Bio::PhyloXML::Parser.new('example.xml')
while tree = phyloxml.next_tree
  # do stuff with trees
end
  
puts phyloxml.other
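What ends up in 'other' is, roughly, the content of the root element that lies outside the PhyloXML namespace. The sketch below approximates this with REXML by keeping the root's child elements whose namespace differs from http://www.phyloxml.org. It illustrates the file layout only; it is not how BioRuby builds its `other` array. The alignment element mirrors the example output in this document, and `other_elements` is a hypothetical helper.

```ruby
require 'rexml/document'

PHYLOXML_URI = 'http://www.phyloxml.org'

# Child elements of <phyloxml> that live outside the PhyloXML namespace.
def other_elements(xml)
  root = REXML::Document.new(xml).root
  root.elements.to_a.reject { |el| el.namespace == PHYLOXML_URI }
end

xml = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true"><clade><name>A</name></clade></phylogeny>
    <align:alignment xmlns:align="http://example.org/align">
      <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
    </align:alignment>
  </phyloxml>
XML

other_elements(xml).each { |el| puts el.expanded_name }
```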

How to write a file

require 'bio'
# Create a new PhyloXML writer
writer = Bio::PhyloXML::Writer.new('tree.xml')
# Write a tree to the file tree.xml (tree1 and tree2 are trees,
# for example obtained from Bio::PhyloXML::Parser)
writer.write(tree1)
# Add another tree to the file
writer.write(tree2)
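To show what a writer has to produce, here is a REXML sketch (not Bio::PhyloXML::Writer) that builds a minimal PhyloXML document with one `phylogeny` per tree. Two things are assumptions made for illustration: the input format (a hash from tree name to leaf names, each forming a star-shaped tree) and the helper `build_phyloxml`. A real writer also serializes branch lengths, taxonomy and other annotations.

```ruby
require 'rexml/document'

# Hypothetical helper: builds a PhyloXML document in which each tree is a root
# clade whose children are the given leaves (a star tree).
def build_phyloxml(trees)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element('phyloxml', 'xmlns' => 'http://www.phyloxml.org')
  trees.each do |tree_name, leaves|
    phylogeny = root.add_element('phylogeny', 'rooted' => 'true')
    phylogeny.add_element('name').text = tree_name
    root_clade = phylogeny.add_element('clade')
    leaves.each do |leaf|
      root_clade.add_element('clade').add_element('name').text = leaf
    end
  end
  doc
end

# Pretty-print, keeping text-only elements such as <name> on one line.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true
out = +''
formatter.write(build_phyloxml('tree1' => %w[A B], 'tree2' => %w[C D]), out)
puts out
```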

How to retrieve data

The following example prints the scientific name of each clade in every tree:

require 'bio'
phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
phyloxml.each do |tree|
  tree.each_node do |node|
    # Skip nodes that carry no taxonomy data
    next if node.taxonomies.empty?
    print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
  end
end
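In the XML, each clade's scientific name sits at `clade/taxonomy/scientific_name`. The following REXML sketch (not BioRuby) collects those names and skips clades without taxonomy. The two-species document and the `scientific_names` helper are illustrative assumptions.

```ruby
require 'rexml/document'

NS = { 'px' => 'http://www.phyloxml.org' }

# Scientific names of all clades that carry <taxonomy><scientific_name>.
def scientific_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//px:clade/px:taxonomy/px:scientific_name', NS).map(&:text)
end

xml = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <clade>
        <clade><taxonomy><scientific_name>Octopus vulgaris</scientific_name></taxonomy></clade>
        <clade><taxonomy><scientific_name>Loligo vulgaris</scientific_name></taxonomy></clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

scientific_names(xml).each { |name| puts "Scientific name: #{name}" }
```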

How to retrieve 'other' data

require 'bio'
phyloxml = Bio::PhyloXML::Parser.new('phyloxml_examples.xml')
while tree = phyloxml.next_tree
  # do something with the trees
end
p phyloxml.other
puts "\n"
#=> output is an object representation
# Print it in a readable way
puts phyloxml.other[0].to_xml, "\n"
#=>
#
#<align:alignment xmlns:align="http://example.org/align">
#  <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
#  <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
#  <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
#</align:alignment>
# Once we know what's there, let's output just the sequences
phyloxml.other[0].children.each do |node|
  puts node.value
end
#=>
#
#acgtcgcggcccgtggaagtcctctcct
#aggtcgcggcctgtggaagtcctctcct
#taaatcgc--cccgtgg-agtccc-cct
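The alignment shown above can also be pulled apart with REXML alone, mapping each `seq` element's `name` attribute to its sequence. This sketch uses a hypothetical `alignment_sequences` helper and is not part of BioRuby. The unprefixed `seq` elements inherit the document's default (PhyloXML) namespace. For that reason, the sketch walks the alignment's child elements directly instead of matching them by namespace.

```ruby
require 'rexml/document'

ALIGN_NS = { 'align' => 'http://example.org/align' }

# Maps each <seq name="..."> inside <align:alignment> to its sequence text.
def alignment_sequences(xml)
  doc = REXML::Document.new(xml)
  alignment = REXML::XPath.first(doc, '//align:alignment', ALIGN_NS)
  return {} unless alignment

  alignment.elements.to_a
           .select { |el| el.name == 'seq' }
           .each_with_object({}) { |seq, seqs| seqs[seq.attributes['name']] = seq.text }
end

xml = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <align:alignment xmlns:align="http://example.org/align">
      <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
      <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
      <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
    </align:alignment>
  </phyloxml>
XML

alignment_sequences(xml).each { |name, seq| puts "#{name}: #{seq}" }
```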