NeXML Elements

From Phyloinformatics
==Preface==
<strong>Note:</strong> This page has moved to: https://github.com/rvosa/bio-nexml/wiki/NeXML-API-for-BioRuby
The following document exhaustively covers the NeXML elements. The coverage is purely technical, and I have linked to the appropriate schema definitions from the NeXML site for completeness. Knowledge of XML Schema Definition (XSD) is not needed to read the document. A discussion of the schema design can be found here: [http://nexml.org/nexml/html/doc/schema-1/#discussion NeXML schema discussion]. Note that this documentation is not final yet.
 
 
 
==Root==
 
The root element of NeXML is called <code>nexml</code>. This element is an instance of the [http://nexml.org/nexml/html/doc/schema-1/nexml/#Nexml Nexml] class.
 
 
 
Attributes:
 
* <code>version</code> - a decimal number indicating the nexml schema version. At present this value is 0.8.
 
* <code>generator</code> - an optional attribute, which is used to identify the program that generated the file. The attribute's value is a free form string.
 
 
 
Namespaces (where the list below says "by convention", the convention applies to the three-letter prefixes, which are free to vary in most cases, not to the namespaces themselves):
 
* the XML Schema instance namespace prefix, which identifies XML Schema semantics that may be inlined in the file. By convention this is of the format <code>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"</code>, so that the parts of the schema language used inside nexml (e.g. where a concrete subclass must be specified) are identified by the <code>xsi</code> prefix.
 
* the nexml namespace prefix, by convention of the format <code>xmlns:nex="http://www.nexml.org/1.0"</code>, so that places where nexml-specific types are referenced (e.g. data type subclasses) are identified by the <code>nex</code> prefix.
 
* the default namespace, <code>xmlns="http://www.nexml.org/1.0"</code>, which is necessary for namespace-aware processors (such as the NeXML-to-CDAO XSLT stylesheet produced by an EvoInfo subgroup at the Spring 2009 hackathon).
 
* the xml namespace prefix, which must be of the format <code>xmlns:xml="http://www.w3.org/XML/1998/namespace"</code>. This may be used, for example, to specify the base address of the document (using the <code>xml:base</code> attribute).
 
 
 
Lastly, to associate the instance document with the nexml schema, the root element requires an attribute that specifies the nexml schema location and the namespace it applies to. This is of the format <code>xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd"</code>. Notice that this attribute is a schema language snippet (identified by the <code>xsi:</code> prefix) that identifies a namespace (http://www.nexml.org/1.0) and associates it with a physical schema location (http://www.nexml.org/1.0/nexml.xsd).
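To see how these declarations resolve, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (an assumption; this page does not prescribe any parser). It parses a skeleton root element carrying the declarations above; the generator value "example-script" is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Skeleton root element carrying the four declarations listed above.
# The generator value "example-script" is a hypothetical placeholder.
DOC = """<nexml xmlns="http://www.nexml.org/1.0"
       xmlns:nex="http://www.nexml.org/1.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xml="http://www.w3.org/XML/1998/namespace"
       xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd"
       version="0.8" generator="example-script"/>"""

root = ET.fromstring(DOC)

# The default namespace places the unprefixed <nexml> tag in the nexml
# namespace; ElementTree reports it in {namespace-uri}local-name form.
print(root.tag)        # {http://www.nexml.org/1.0}nexml

# xsi:schemaLocation holds a whitespace-separated "namespace location" pair.
namespace, location = root.get(XSI + "schemaLocation").split()
print(namespace)       # http://www.nexml.org/1.0
print(location)        # http://www.nexml.org/1.0/nexml.xsd
print(root.get("version"), root.get("generator"))   # 0.8 example-script
```

A namespace-aware parser reports the unprefixed root tag in the nexml namespace, which is why the default namespace declaration matters to processors such as the XSLT stylesheet mentioned above.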
 
 
 
The root element can be annotated.
 
 
 
==OTUS==
 
OTU stands for Operational Taxonomic Unit. The <code>otus</code> element defines a collection of taxa. An <code>otus</code> is an instance of the [http://nexml.org/nexml/html/doc/schema-1/taxa/taxa/#Taxa Taxa] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>label</code> - a human readable description of the otus.
 
 
 
===OTU===
 
An <code>otu</code> element defines a taxon. An <code>otu</code> is an instance of the [http://nexml.org/nexml/html/doc/schema-1/taxa/taxa/#Taxon Taxon] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>label</code> - a human readable description of the otu.
 
* <code>class</code> - takes the id of a class element. This attribute is optional.
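A minimal sketch, again assuming Python's xml.etree.ElementTree, that collects the otu ids and labels of an otus block. The ids, labels and taxon names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical otus block; ids and labels are made up for illustration.
OTUS = """<otus xmlns="http://www.nexml.org/1.0" id="taxa1" label="Primates">
  <otu id="t1" label="Homo sapiens"/>
  <otu id="t2" label="Pan paniscus"/>
  <otu id="t3"/>
</otus>"""

def otu_labels(otus_elem):
    """Map each otu id to its label (None when no label is given)."""
    return {otu.get("id"): otu.get("label") for otu in otus_elem.findall(NEX + "otu")}

otus = ET.fromstring(OTUS)
print(otus.get("id"), otu_labels(otus))
# taxa1 {'t1': 'Homo sapiens', 't2': 'Pan paniscus', 't3': None}
```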
 
 
 
==Class==
 
A set is defined in NeXML with a <code>class</code> element. A <code>class</code> is an instance of the [http://nexml.org/nexml/html/doc/schema-1/abstract/#Class Class] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>label</code> - a human readable description of the class. A label is optional.
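The sketch below groups otus by the class they reference through the optional <code>class</code> attribute. This page does not say where class elements are placed in a file, so the example puts one inside the otus block purely as an assumption, and the code searches the whole tree for them; all ids and labels are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical fragment: the placement of <class> here is an assumption,
# which is why the code below searches the whole tree for class elements.
DOC = """<otus xmlns="http://www.nexml.org/1.0" id="taxa1">
  <class id="c1" label="great apes"/>
  <otu id="t1" label="Homo sapiens" class="c1"/>
  <otu id="t2" label="Pan paniscus" class="c1"/>
  <otu id="t3" label="Macaca mulatta"/>
</otus>"""

def otus_by_class(root):
    """Group otu ids under the label (or, failing that, the id) of their class."""
    class_names = {c.get("id"): c.get("label") or c.get("id")
                   for c in root.iter(NEX + "class")}
    groups = defaultdict(list)
    for otu in root.iter(NEX + "otu"):
        class_id = otu.get("class")
        if class_id is not None:          # the class attribute is optional
            groups[class_names.get(class_id, class_id)].append(otu.get("id"))
    return dict(groups)

print(otus_by_class(ET.fromstring(DOC)))   # {'great apes': ['t1', 't2']}
```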
 
 
 
==Trees==
 
 
 
Trees in NeXML are described following the GraphML syntax, by defining nodes and edges with the <code>node</code> and <code>edge</code> elements. Phylogenetic trees and networks are defined with the <code>tree</code> and <code>network</code> elements respectively. A <code>trees</code> element is a container for both trees and networks. In a network the in-degree restriction on nodes is relaxed, so that a node can have multiple parents.
 
 
 
Attributes:

* <code>id</code> - a file level unique id

* <code>otus</code> - takes the id of an otus element defined previously

* <code>label</code> - a human readable description of the trees. A label is optional.
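Because the <code>otus</code> attribute must point at an otus element defined previously, a reader can check references in a single pass over the root's children. This is a hypothetical helper, not part of any NeXML library; it applies the same check to characters elements, which carry an <code>otus</code> attribute as well.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical skeleton: trees2 points at an otus id that is never defined.
DOC = """<nexml xmlns="http://www.nexml.org/1.0" version="0.8">
  <otus id="taxa1"/>
  <trees id="trees1" otus="taxa1" label="bootstrap replicates"/>
  <trees id="trees2" otus="taxa9"/>
</nexml>"""

def dangling_otus_refs(root):
    """Return ids of trees/characters elements whose otus attribute does not
    name an otus element defined earlier in the document."""
    seen, dangling = set(), []
    for child in root:                        # direct children, in document order
        if child.tag == NEX + "otus":
            seen.add(child.get("id"))
        elif child.tag in (NEX + "trees", NEX + "characters") \
                and child.get("otus") not in seen:
            dangling.append(child.get("id"))
    return dangling

print(dangling_otus_refs(ET.fromstring(DOC)))   # ['trees2']
```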
 
 
 
===Tree===
 
Phylogenetic trees are defined in NeXML with the <code>tree</code> tag. A <code>tree</code> is modeled by the [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractTree AbstractTree] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>xsi:type</code> - defines the type of the tree. A tree can be:
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#FloatTree FloatTree]
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#IntTree IntTree]
 
* <code>label</code> - a human readable description of the tree. A label is optional.
 
 
 
An IntTree can only have integer edge lengths, whereas a FloatTree uses IEEE 754-1985 compliant floating point numbers for its edge lengths.
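A sketch of how a reader might honour the IntTree/FloatTree distinction when parsing edge lengths; the mapping also covers the two network types described below, since they differ in the same way. The element layout (edge elements directly inside tree) follows the description on this page, and the ids are hypothetical.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Edge-length parser per concrete type; networks follow the same rule as trees.
LENGTH_TYPES = {"IntTree": int, "FloatTree": float,
                "IntNetwork": int, "FloatNetwork": float}

# Hypothetical trees block with one tree of each type.
DOC = """<trees xmlns="http://www.nexml.org/1.0"
       xmlns:nex="http://www.nexml.org/1.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       id="trees1" otus="taxa1">
  <tree id="tree1" xsi:type="nex:IntTree">
    <edge id="e1" source="n1" target="n2" length="3"/>
  </tree>
  <tree id="tree2" xsi:type="nex:FloatTree">
    <edge id="e2" source="n1" target="n2" length="0.25"/>
  </tree>
</trees>"""

def edge_lengths(tree_elem):
    """Parse edge lengths as int or float depending on the tree's xsi:type."""
    # ElementTree does not resolve prefixes inside attribute values, so
    # "nex:IntTree" arrives as a plain string and the prefix is stripped here.
    parse = LENGTH_TYPES[tree_elem.get(XSI_TYPE).split(":")[-1]]
    return {e.get("id"): parse(e.get("length"))
            for e in tree_elem.findall(NEX + "edge")
            if e.get("length") is not None}   # skip edges without a length

for tree in ET.fromstring(DOC).findall(NEX + "tree"):
    print(tree.get("id"), edge_lengths(tree))
# tree1 {'e1': 3}
# tree2 {'e2': 0.25}
```

With this mapping, <code>int("2.5")</code> raises ValueError, so a fractional length in an IntTree is rejected rather than silently truncated.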
 
 
 
===Network===
 
Phylogenetic networks are defined with the <code>network</code> tag. A network is modeled by the [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractNetwork AbstractNetwork] class and two concrete implementations exist.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>xsi:type</code> - defines the type of the network. A network can be:
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/network/#FloatNetwork FloatNetwork]
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/network/#IntNetwork IntNetwork]
 
* <code>label</code> - a human readable description of the network. A label is optional.
 
 
 
The difference between a float and an int network is the same as that between a float and an int tree.
 
 
 
===Nodes===
 
A <code>node</code> tag defines a node of a tree or a network. A tree node is modelled by the [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeNode TreeNode] class and a network node by the [http://nexml.org/nexml/html/doc/schema-1/trees/network/#NetworkNode NetworkNode] class; both of which inherit from the [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractNode AbstractNode] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>otu</code> - id of a previously defined otu element. This is optional.
 
* <code>root</code> - takes <code>true</code> if the node is a root node, <code>false</code> otherwise. A tree is considered rooted, multiply rooted or unrooted depending on how many root nodes it has (one, more than one, or none).
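Counting the nodes marked <code>root="true"</code> is enough to classify a tree as described above. A minimal sketch; the tree fragment and its ids are hypothetical.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical tree fragment with a single root node.
TREE = """<tree xmlns="http://www.nexml.org/1.0" id="tree1">
  <node id="n1" root="true"/>
  <node id="n2" otu="t1"/>
  <node id="n3" otu="t2"/>
</tree>"""

def rooting(tree_elem):
    """Classify a tree by how many of its nodes carry root="true"."""
    roots = sum(1 for n in tree_elem.findall(NEX + "node") if n.get("root") == "true")
    if roots == 0:
        return "unrooted"
    return "rooted" if roots == 1 else "multiply rooted"

print(rooting(ET.fromstring(TREE)))   # rooted
```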
 
 
 
===Edges===
 
An <code>edge</code> tag defines an edge in a tree or a network.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>source</code> - id of the node that serves as the source

* <code>target</code> - id of the node that serves as the target

* <code>length</code> - length of the edge
 
 
 
Edges can be of the following types, depending on whether they are defined for a tree (int or float) or a network (again, int or float):
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/network/#NetworkFloatEdge NetworkFloatEdge]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/network/#NetworkIntEdge NetworkIntEdge]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeFloatEdge TreeFloatEdge]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeIntEdge TreeIntEdge]
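Since a network differs from a tree in allowing nodes with several parents, one way to find such reticulate nodes is to count how often each node id appears as an edge target. A hypothetical sketch using <code>collections.Counter</code>; in a tree every count should be at most one, so the result would be empty.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical network in which node "h" has two parents, "a" and "b".
NETWORK = """<network xmlns="http://www.nexml.org/1.0" id="net1">
  <node id="r" root="true"/>
  <node id="a"/>
  <node id="b"/>
  <node id="h"/>
  <edge id="e1" source="r" target="a"/>
  <edge id="e2" source="r" target="b"/>
  <edge id="e3" source="a" target="h"/>
  <edge id="e4" source="b" target="h"/>
</network>"""

def reticulate_nodes(graph_elem):
    """Return ids of nodes that are the target of more than one edge."""
    in_degree = Counter(e.get("target") for e in graph_elem.findall(NEX + "edge"))
    return sorted(node for node, degree in in_degree.items() if degree > 1)

print(reticulate_nodes(ET.fromstring(NETWORK)))   # ['h']
```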
 
 
 
==Characters==
 
The <code>characters</code> element is used to define storage entities like molecular sequences, categorical data or continuous data.
 
<code>characters</code> is modeled by the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractBlock AbstractBlock] class and twelve concrete types.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>otus</code> - takes the id of an otus element defined previously.
 
* <code>xsi:type</code> - defines the type of the storage block. The NeXML standard defines twelve concrete character types:

** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DnaSeqs DnaSeqs] / [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DnaCells DnaCells]

** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#ProteinSeqs ProteinSeqs] / [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#ProteinCells ProteinCells]

** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RnaSeqs RnaSeqs] / [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RnaCells RnaCells]

** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionSeqs RestrictionSeqs] / [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionCells RestrictionCells]

** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardSeqs StandardSeqs] / [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardCells StandardCells]

** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousSeqs ContinuousSeqs] / [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousCells ContinuousCells]

* <code>label</code> - a human readable description of the character block. A label is optional.
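The twelve concrete types follow a regular naming pattern: one of six data types followed by Seqs or Cells. The sketch below splits an <code>xsi:type</code> value accordingly; the helper name is hypothetical, and the Seqs/Cells distinction is only parsed here, not interpreted.

```python
from itertools import product

DATA_TYPES = ("Dna", "Protein", "Rna", "Restriction", "Standard", "Continuous")
REPRESENTATIONS = ("Seqs", "Cells")

# Every concrete characters type is a data type followed by a representation.
ALL_TYPES = [d + r for d, r in product(DATA_TYPES, REPRESENTATIONS)]

def block_kind(xsi_type):
    """Split an xsi:type value such as 'nex:DnaSeqs' into ('Dna', 'Seqs')."""
    local = xsi_type.split(":")[-1]           # drop the namespace prefix
    for rep in REPRESENTATIONS:
        data_type = local[: -len(rep)]
        if local.endswith(rep) and data_type in DATA_TYPES:
            return data_type, rep
    raise ValueError(f"not a characters type: {xsi_type}")

print(len(ALL_TYPES))                        # 12
print(block_kind("nex:DnaSeqs"))             # ('Dna', 'Seqs')
print(block_kind("nex:ContinuousCells"))     # ('Continuous', 'Cells')
```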
 
 
 
The characters element is a bucket of observations and the allowed parameter space for those observations.
 
 
 
===Format===
 
The <code>format</code> element defines the allowed characters and states in a matrix, and their ambiguity mapping. <code>format</code> is an instance of the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractFormat AbstractFormat] type. Every <code>characters</code> element can have zero or one <code>format</code> child element.
 
 
 
Attributes: none.
 
 
 
On the basis of the type of its parent characters element, its type can be:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAFormat AAFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousFormat ContinuousFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAFormat DNAFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAFormat RNAFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionFormat RestrictionFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardFormat StandardFormat]
 
 
 
====states====
 
The <code>states</code> element is an instance of the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractStates AbstractStates] class and serves as the container of individual states (see the following section). Zero or more <code>states</code> elements can be nested inside the <code>format</code> element.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
 
 
States can be of the following kind depending on the type of the parent format element:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAStates AAStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAStates DNAStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAStates RNAStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionStates RestrictionStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardStates StandardStates]
 
 
 
Continuous data is stateless. Naturally there is no states type corresponding to it.
 
=====state=====
 
A state is defined with a <code>state</code>, <code>polymorphic_state_set</code> or <code>uncertain_state_set</code> element. These are respectively instances of [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractState AbstractState], [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractPolymorphicStateSet AbstractPolymorphicStateSet] and [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractUncertainStateSet AbstractUncertainStateSet].
 
 
 
A state can be of the following kind depending on the type of the parent states element:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAState AAState]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAState  DNAState]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAState  RNAState]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionState RestrictionState]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardState StandardState]
 
 
 
Attributes:

* <code>id</code> - a file level unique id

* <code>symbol</code> - the value of the state.
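A sketch that builds a symbol-to-id lookup from a states element, which a writer could use when encoding observations. It assumes that <code>polymorphic_state_set</code> and <code>uncertain_state_set</code> elements carry <code>id</code> and <code>symbol</code> attributes like <code>state</code> does; the example data is hypothetical.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"

# The three element names that can define a state (see above). This sketch
# assumes the two set elements carry id and symbol just like state does.
STATE_TAGS = {NEX + name for name in ("state", "polymorphic_state_set", "uncertain_state_set")}

# Hypothetical states block for binary data.
STATES = """<states xmlns="http://www.nexml.org/1.0" id="states1">
  <state id="s1" symbol="0"/>
  <state id="s2" symbol="1"/>
</states>"""

def symbol_table(states_elem):
    """Map each state symbol to the id of the element that defines it."""
    return {s.get("symbol"): s.get("id") for s in states_elem if s.tag in STATE_TAGS}

table = symbol_table(ET.fromstring(STATES))
print(table)   # {'0': 's1', '1': 's2'}
```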
 
 
 
====char====
 
A <code>char</code> element defines a column of the matrix. A char is an instance of the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractChar AbstractChar] class.
 
 
 
Depending on the type of the parent format element, a char can have the following types:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAChar AAChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousChar ContinuousChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAChar DNAChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAChar RNAChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionChar RestrictionChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardChar StandardChar]
 
 
 
Attributes:
 
Depending on its type, a char can take one or more of the following attributes:

* <code>id</code> - a file level unique id; required for every char element

* <code>states</code> - the id of a previously defined states element. All chars except those of type ContinuousChar take a states attribute.

* <code>codon</code> - the codon position. DNAChar and RNAChar optionally take the codon attribute.
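To see how char and states fit together inside format, this sketch resolves the <code>states</code> attribute of each char to the symbols it allows. The fragment is hypothetical; a ContinuousChar, having no states attribute, would map to an empty list.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical format element: one states block shared by two chars.
FORMAT = """<format xmlns="http://www.nexml.org/1.0">
  <states id="states1">
    <state id="s1" symbol="0"/>
    <state id="s2" symbol="1"/>
  </states>
  <char id="c1" states="states1"/>
  <char id="c2" states="states1"/>
</format>"""

def allowed_symbols(format_elem):
    """Map each char id to the symbols of the states block it references."""
    symbols = {st.get("id"): [s.get("symbol") for s in st.findall(NEX + "state")]
               for st in format_elem.findall(NEX + "states")}
    # chars without a states attribute (e.g. continuous) get an empty list
    return {c.get("id"): symbols.get(c.get("states"), [])
            for c in format_elem.findall(NEX + "char")}

print(allowed_symbols(ET.fromstring(FORMAT)))
# {'c1': ['0', '1'], 'c2': ['0', '1']}
```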
 
 
 
===Matrix===
 
A <code>matrix</code> element holds the state observations. Depending on the type of the characters element, a matrix can be of two abstract types, each with six concrete types:
 
 
 
*[http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeqMatrix AbstractSeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AASeqMatrix AASeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousSeqMatrix ContinuousSeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNASeqMatrix DNASeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNASeqMatrix RNASeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionSeqMatrix RestrictionSeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardSeqMatrix StandardSeqMatrix]
 
*[http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractObsMatrix AbstractObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAObsMatrix AAObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousObsMatrix ContinuousObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAObsMatrix DNAObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAObsMatrix RNAObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionObsMatrix RestrictionObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardObsMatrix StandardObsMatrix]
 
 
 
Attributes: none
 
 
 
====Row====
 
A matrix must have one or more <code>row</code> elements. Just like its parent matrix, a row can be of two abstract types, each with six concrete types:
 
 
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeqRow AbstractSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAMatrixSeqRow AAMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousMatrixSeqRow ContinuousMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAMatrixSeqRow DNAMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAMatrixSeqRow RNAMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionMatrixSeqRow RestrictionMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardMatrixSeqRow StandardMatrixSeqRow]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractObsRow AbstractObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAMatrixObsRow AAMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousMatrixObsRow ContinuousMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAMatrixObsRow DNAMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAMatrixObsRow RNAMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionMatrixObsRow RestrictionMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardMatrixObsRow StandardMatrixObsRow]
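This page does not describe the content of a row, so the following sketch relies on an assumption taken from the NeXML schema rather than from this text: each row carries an <code>otu</code> attribute and, in a sequence matrix, a <code>seq</code> child element holding the raw sequence. Treat it as illustrative only; the data are made up.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"

# Hypothetical sequence matrix; the otu attribute and <seq> child are assumptions.
MATRIX = """<matrix xmlns="http://www.nexml.org/1.0">
  <row id="r1" otu="t1"><seq>ACGT</seq></row>
  <row id="r2" otu="t2"><seq>AC-T</seq></row>
</matrix>"""

def sequences_by_otu(matrix_elem):
    """Map each row's otu id to its sequence string."""
    rows = matrix_elem.findall(NEX + "row")
    if not rows:   # the text above requires one or more rows
        raise ValueError("a matrix must have one or more row elements")
    return {r.get("otu"): (r.findtext(NEX + "seq") or "").strip() for r in rows}

print(sequences_by_otu(ET.fromstring(MATRIX)))   # {'t1': 'ACGT', 't2': 'AC-T'}
```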
 
 
 
==Semantic Annotation==
 
Fundamental data objects in NeXML can be annotated using RDFa. This way, the annotations are directly available to any off-the-shelf RDFa parser but are also simple to integrate into non-RDF-aware processing libraries. The annotations are expressed using recursively nested <code>meta</code> elements, and are essentially triples whose subjects are identified by the <code>about</code> attribute. To specify that a NeXML element such as a tree is the subject, the <code>about</code> attribute and the <code>id</code> attribute of the element must match. This way, core NeXML elements can be converted to RDF (for example by using an XSL stylesheet), RDFa annotations can be extracted from the NeXML (using another stylesheet), and the two resulting graphs can be aligned by the subjects of their respective triples.
 
 
 
* If the triple’s object value is a literal such as a string or a number, the exact data type is specified by the datatype attribute; these types are typically core XML schema types such as xsd:string for atomic types, or rdf:Literal for nested XML element structures that are to be parsed as opaque literals in an RDF graph. The object value is enclosed inside the meta element and the predicate is specified using the property attribute (whose value must be a CURIE). meta elements of this type are of the subclass nex:LiteralMeta.
 
* If the triple’s object value is a remote resource, its location is specified using the href attribute. The predicate for this class of triples is specified as a CURIE using the rel attribute. meta elements of this type are of the subclass nex:ResourceMeta.
 
* If the triple’s object value is a nested annotation, its predicate is specified as a CURIE using the rel attribute. Since this enclosing meta element is to be transformed into an anonymous RDF node, it needs to be identified as the subject of a reification by assigning it an about attribute (as per the RDFa rules for identifying triple subjects).
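A minimal sketch of reading the first two kinds of meta element into (subject, predicate, object) tuples with xml.etree.ElementTree. Nested annotations (the third case) are deliberately not handled. The example assumes the common URI-fragment form <code>about="#tree1"</code> for pointing at the element with id tree1; the Dublin Core predicates, author name and URL are hypothetical.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/1.0}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical annotated tree; about="#tree1" refers to the element with id "tree1".
TREE = """<tree xmlns="http://www.nexml.org/1.0"
      xmlns:nex="http://www.nexml.org/1.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      id="tree1" about="#tree1">
  <meta xsi:type="nex:LiteralMeta" property="dc:creator" datatype="xsd:string">A. Author</meta>
  <meta xsi:type="nex:ResourceMeta" rel="dc:source" href="http://example.org/study"/>
</tree>"""

def simple_triples(elem):
    """Turn the literal and resource meta children of elem into
    (subject, predicate, object) tuples; nested annotations are skipped."""
    subject = elem.get("about")
    triples = []
    for meta in elem.findall(NEX + "meta"):
        kind = meta.get(XSI_TYPE, "").split(":")[-1]
        if kind == "LiteralMeta":      # predicate in property, value as content
            triples.append((subject, meta.get("property"), meta.text))
        elif kind == "ResourceMeta":   # predicate in rel, object in href
            triples.append((subject, meta.get("rel"), meta.get("href")))
    return triples

for triple in simple_triples(ET.fromstring(TREE)):
    print(triple)
# ('#tree1', 'dc:creator', 'A. Author')
# ('#tree1', 'dc:source', 'http://example.org/study')
```

The CURIEs are returned unexpanded; resolving <code>dc:creator</code> to a full URI would require tracking the namespace declarations, which ElementTree does not expose for attribute values.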
 
  
 
[[Category:NeXML and RDF API for BioRuby]]
 

Latest revision as of 19:04, 22 September 2011
