Difference between revisions of "PhyloSoC:BioPerl integration of the NeXML exchange standard and Bio::Phylo toolkit/nexml module design"

From Phyloinformatics
m (Nexml Item Module Design)
m (Interesting Points)
 
(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Nexml Item Module Design ==
+
== Nexml Item Module Design (work in progress) ==
  
Currently, parsing a nexml file into Bioperl makes use of the *IO modules (AlignIO, SeqIO, and TreeIO) to load nexml represented data into bioperl objects.  These modules will handle each data type separately, but the ultimate goal is to make this easier on the user and allow the full parsing of a nexml file with one pass of the Bio::Phylo parser without forcing the user to parse the file three times - once each for sequences, alignments, and trees. To accomplish this, a Bio::nexml module will be created that will handle an entire nexml file including the three different data types it contains.  A powerful feature of nexml is it's ability to relate data to each other in relevant ways. This module will also handle (as much as possible) the relationships between the different data types.
+
==== Overview ====
  
=== Data Relationships to Maintain ===
+
Currently, parsing a nexml file into Bioperl makes use of the *IO modules (AlignIO, SeqIO, and TreeIO) to load nexml represented data into Bioperl objects.  These modules will handle each data type separately, but the ultimate goal is to make this easier on the user and allow the full parsing of a nexml document into a bioperl object that can contain each data type and maintain the relationships between them.  To accomplish this, a Bio::Nexml module will be created that will act as a representation of an entire Nexml document and will hold seq, aln, and tree objects.  Creating a new class will provide a way to maintain relationships between the data and will also make it more clear to the user that this represents an entire nexml document.  It will be a simple container that lets the majority of the work be done by the object classes it contains making use of the modules/methods previously written during this project.
*Sequences <=> Nodes
 
** How to do this?
 
*Alignments <=> Trees
 
** How to do this?
 
*Taxa <=> Trees
 
** How to do this?
 
*Taxa <=> Alignments
 
** How to do this?
 
*Taxon <=> Node
 
** How to do this?
 
  
=== This design plan is a work in progress ===
+
==== Synopsis ====
 +
#something like this
 +
 +
#create and cache an xml parser linked to the stream
 +
$nexml_doc = Bio::NeXML->new(-file=>'a_nexml_file.xml');
 +
 +
#allow reading of different object types
 +
#by using previously implemented method Bio::TreeIO::Nexml->next_tree()
 +
$tree = $nexml_doc->next_tree()
 +
 
 +
==== Interesting Points ====
 +
*To maintain the ability to parse individual datatypes (i.e. $tree_doc = Bio::TreeIO->new( -format => 'nexml')) as well as an entire document (i.e $nexml_doc = Bio::NeXML->new(-file 'thefile.xml')), the code that currently resides in the next_seq, next_tree, and next_aln functions will be moved to a Bio::Nexml::Util module.  This allows the code to live in a single place and still be used by both the Bio::*IO::nexml (AlignIO, SeqIO, and TreeIO) modules as well as the Bio::Nexml module.
 +
 
 +
==== Data Relationships to Maintain ====
 +
*Sequences    <=>      Nodes
 +
*Alignments  <=>      Trees
 +
*Taxa              <=>      Trees
 +
*Taxa              <=>      Alignments
 +
*Taxon            <=>      Node

Latest revision as of 07:10, 22 June 2009

Nexml Item Module Design (work in progress)

Overview

Currently, parsing a NeXML file into BioPerl relies on the *IO modules (AlignIO, SeqIO, and TreeIO) to load NeXML-represented data into BioPerl objects. These modules handle each data type separately, but the ultimate goal is to make this easier for the user by parsing a full NeXML document into a single BioPerl object that can contain each data type and maintain the relationships between them. To accomplish this, a Bio::Nexml module will be created that represents an entire NeXML document and holds seq, aln, and tree objects. Creating a new class provides a way to maintain relationships between the data and also makes it clearer to the user that the object represents an entire NeXML document. It will be a simple container that leaves the majority of the work to the object classes it contains, making use of the modules and methods previously written during this project.

Synopsis

# something like this (the interface is still a proposal)

# create and cache an XML parser linked to the stream
my $nexml_doc = Bio::NeXML->new(-file => 'a_nexml_file.xml');

# allow reading of the different object types
# by reusing the previously implemented method Bio::TreeIO::nexml->next_tree()
my $tree = $nexml_doc->next_tree();

Interesting Points

  • To maintain the ability to parse individual data types (e.g. $tree_doc = Bio::TreeIO->new( -format => 'nexml')) as well as an entire document (e.g. $nexml_doc = Bio::NeXML->new(-file => 'thefile.xml')), the code that currently resides in the next_seq, next_tree, and next_aln functions will be moved to a Bio::Nexml::Util module. This lets the code live in a single place and still be used by both the Bio::*IO::nexml modules (AlignIO, SeqIO, and TreeIO) and the Bio::Nexml module.
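
The refactoring described above could look roughly like the following sketch. It is pure Perl with no BioPerl dependency, and every package name in it (My::Nexml::Util, My::TreeIO, My::NexmlDoc) is a hypothetical stand-in rather than the planned module. The `-data` argument and the plain strings used in place of tree, seq, and aln objects are also assumptions made only for illustration. The point is that both the single-type reader and the whole-document container delegate to one shared routine.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the planned Bio::Nexml::Util module:
# the one place that knows how to hand out the next item of a kind.
package My::Nexml::Util;

# Return the next unread item of $kind ('tree', 'seq' or 'aln') for
# the given reader, or nothing once that kind is exhausted.
sub next_item {
    my ($reader, $kind) = @_;
    my $items = $reader->{data}{$kind} || [];
    my $pos   = $reader->{pos}{$kind} // 0;
    return if $pos >= @$items;
    $reader->{pos}{$kind} = $pos + 1;
    return $items->[$pos];
}

# Hypothetical stand-in for a single-type reader such as the nexml TreeIO module.
package My::TreeIO;

sub new {
    my ($class, %args) = @_;
    return bless { data => $args{-data}, pos => {} }, $class;
}

sub next_tree { return My::Nexml::Util::next_item($_[0], 'tree') }

# Hypothetical stand-in for the whole-document container.
package My::NexmlDoc;

sub new {
    my ($class, %args) = @_;
    return bless { data => $args{-data}, pos => {} }, $class;
}

sub next_tree { return My::Nexml::Util::next_item($_[0], 'tree') }
sub next_seq  { return My::Nexml::Util::next_item($_[0], 'seq')  }
sub next_aln  { return My::Nexml::Util::next_item($_[0], 'aln')  }

package main;

# Plain strings stand in for already-parsed objects.
my %parsed = (
    tree => [ 'tree_A', 'tree_B' ],
    seq  => [ 'seq_1' ],
    aln  => [ 'aln_1' ],
);

my $tree_io = My::TreeIO->new(-data => \%parsed);
my $doc     = My::NexmlDoc->new(-data => \%parsed);

# Both readers walk the same trees through the same shared code.
my (@from_tree_io, @from_doc);
while (defined(my $t = $tree_io->next_tree)) { push @from_tree_io, $t }
while (defined(my $t = $doc->next_tree))     { push @from_doc, $t }

print "TreeIO:   @from_tree_io\n";
print "Document: @from_doc\n";
```

Each reader keeps its own read position, so iterating through one does not disturb the other even when both are built from the same parsed data.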

Data Relationships to Maintain

  • Sequences <=> Nodes
  • Alignments <=> Trees
  • Taxa <=> Trees
  • Taxa <=> Alignments
  • Taxon <=> Node
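
One possible way to maintain these relationships, offered only as a sketch and not as the design this project settled on, is to treat the taxon as the shared key: every tree node and every sequence records the id of its taxon, and the other links are derived through that index. The record layout, field names, and helper functions below (`seqs_for_node`, `alns_for_tree`) are all hypothetical, and plain hashes stand in for BioPerl objects.

```perl
use strict;
use warnings;

# Hypothetical flattened view of one parsed document: each tree node
# and each sequence carries the id of the taxon it belongs to.
my @nodes = (
    { id => 'n1', tree => 'tree1', taxon => 't1' },
    { id => 'n2', tree => 'tree1', taxon => 't2' },
);
my @seqs = (
    { id => 's1', aln => 'aln1', taxon => 't1', residues => 'ACGT' },
    { id => 's2', aln => 'aln1', taxon => 't2', residues => 'ACGA' },
);

# One index keyed by taxon id covers Taxon <=> Node and Taxon <=> Sequence.
my %by_taxon;
push @{ $by_taxon{ $_->{taxon} }{nodes} }, $_ for @nodes;
push @{ $by_taxon{ $_->{taxon} }{seqs}  }, $_ for @seqs;

# Sequences <=> Nodes: a node's sequences are those sharing its taxon.
sub seqs_for_node {
    my ($node) = @_;
    return @{ $by_taxon{ $node->{taxon} }{seqs} || [] };
}

# Alignments <=> Trees: the alignments that share at least one taxon
# with the nodes of the given tree.
sub alns_for_tree {
    my ($tree_id) = @_;
    my %alns;
    for my $node (grep { $_->{tree} eq $tree_id } @nodes) {
        $alns{ $_->{aln} } = 1 for seqs_for_node($node);
    }
    return sort keys %alns;
}

print "Sequences for n1: ",
    join(',', map { $_->{id} } seqs_for_node($nodes[0])), "\n";
print "Alignments for tree1: ", join(',', alns_for_tree('tree1')), "\n";
```

Keeping a single taxon-keyed index means the container only has to store one kind of link, and the pairwise relationships in the list above fall out as lookups.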