Difference between revisions of "NeXML Elements"

==Preface==
<strong>Note:</strong> This page has moved to: https://github.com/rvosa/bio-nexml/wiki/NeXML-API-for-BioRuby
The following document exhaustively covers the NeXML elements. The coverage is purely technical, and I have linked to the appropriate schema definitions on the NeXML site for completeness. Knowledge of XML Schema Definition (XSD) is not required to read the document. A discussion of the schema design can be found here: [http://nexml.org/nexml/html/doc/schema-1/#discussion NeXML schema discussion]. Note that this documentation is not yet final.
 
 
 
==Root==
 
The root element of NeXML is called <code>nexml</code>. This element is an instance of the [http://nexml.org/nexml/html/doc/schema-1/nexml/#Nexml Nexml] class.
 
 
 
Attributes:
 
* <code>version</code> - a decimal number indicating the nexml schema version. At present this value is 0.9.
 
* <code>generator</code> - an optional attribute of type string, which is used to identify the program that generated the file.
 
 
 
Namespaces (where it says "by convention" in the list below, the convention applies to the three-letter prefixes, which are free to vary in most cases, not to the namespaces themselves):
 
* the XML Schema instance namespace prefix, which identifies XML Schema semantics that might be inlined in the file. By convention this is of the format <code>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"</code> so that parts of the schema language used inside nexml (e.g. where a concrete subclass must be specified) are identified by the <code>xsi</code> prefix.

* the nexml namespace prefix, by convention of the format <code>xmlns:nex="http://www.nexml.org/1.0"</code>, so that references to nexml-specific types (e.g. data type subclasses) are identified by the <code>nex</code> prefix.
 
* the default namespace, <code>xmlns="http://www.nexml.org/1.0"</code>, which is necessary for namespace aware processors (such as the NeXML-to-CDAO xslt stylesheet produced by an EvoInfo subgroup at the Spring 2009 hackathon).
 
* the xml namespace prefix, required to be of the format <code>xmlns:xml="http://www.w3.org/XML/1998/namespace"</code>. This may be used, for example, to specify the base address of the document (using the <code>xml:base</code> attribute).
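
As a rough illustration of the <code>xml:base</code> use mentioned in the last item, the fragment below sets a base address on the root element. The URL and the generator value are placeholders, not values prescribed by NeXML; the namespace declarations follow the conventions listed here.

<xml>
<nex:nexml
    version="0.9"
    generator="example-generator"
    xml:base="http://example.org/data/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:nex="http://www.nexml.org/1.0"
    xmlns="http://www.nexml.org/1.0">
    <!-- relative addresses in the document would be resolved against xml:base -->
    <!-- contents go here -->
</nex:nexml>
</xml>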
 
 
 
Lastly, to associate the instance document with the nexml schema, the root element requires an attribute that specifies the nexml schema location and the namespace it applies to. This is of the format <code>xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd"</code>. Notice that this attribute is a schema language snippet (identified by the <code>xsi:</code> prefix) that identifies a namespace (http://www.nexml.org/1.0) and associates it with a physical schema location (http://www.nexml.org/1.0/nexml.xsd).
 
 
 
An example of a root element:
 
 
 
<xml>
 
<?xml version="1.0" encoding="ISO-8859-1"?>
 
<nex:nexml
 
    version="0.9"
 
    generator="eclipse"
 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
 
    xmlns:nex="http://www.nexml.org/1.0"
 
    xmlns="http://www.nexml.org/1.0"
 
    xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd">
 
    <!-- contents go here -->
 
</nex:nexml>
 
</xml>
 
 
 
The root element can contain:
 
* zero or more semantic annotations
 
* one or more <code>otus</code> elements

* zero or more <code>characters</code> elements

* zero or more <code>trees</code> elements (in mixed order with <code>characters</code> elements).
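
The list above can be sketched as a document skeleton. The ids and labels are made up for illustration, and the contents of the <code>characters</code> and <code>trees</code> blocks are elided as comments.

<xml>
<nex:nexml
    version="0.9"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:nex="http://www.nexml.org/1.0"
    xmlns="http://www.nexml.org/1.0">
    <!-- optional semantic annotations -->
    <otus id="skeletontaxa1" label="Taxa block">
        <otu id="skeletont1" label="Homo sapiens"/>
    </otus>
    <!-- characters and trees blocks may appear in mixed order -->
    <characters otus="skeletontaxa1" id="skeletonchars1" xsi:type="nex:DnaSeqs">
        <!-- format and matrix go here -->
    </characters>
    <trees otus="skeletontaxa1" id="skeletontrees1">
        <!-- tree and network elements go here -->
    </trees>
</nex:nexml>
</xml>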
 
 
 
==OTUS==
 
OTUs stands for Operational Taxonomic Units. The <code>otus</code> element defines a collection of taxa. An <code>otus</code> element is an instance of the [http://nexml.org/nexml/html/doc/schema-1/taxa/taxa/#Taxa Taxa] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>label</code> - a human readable description of the otus.
 
 
 
An <code>otus</code> element can take:
 
* <code>otu</code> - zero or more.
 
* <code>meta</code> - zero or more.
 
 
 
===OTU===
 
An <code>otu</code> element defines a taxon. An <code>otu</code> is an instance of the [http://nexml.org/nexml/html/doc/schema-1/taxa/taxa/#Taxon Taxon] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>label</code> - a human readable description of the otu.
 
* <code>class</code> - takes the id of a <code>class</code> element. This attribute is optional.
 
 
 
An <code>otu</code> element can take:
 
* <code>meta</code> - zero or more.
 
 
 
===Example===
 
<xml>
 
<otus id="taxa1" label="Primary taxa block">
 
    <otu id="t1" label="Homo sapiens"/>
 
    <otu id="t2" label="Pan paniscus"/>
 
    <otu id="t3" label="Pan troglodytes"/>
 
    <otu id="t4" label="Gorilla gorilla"/>
 
    <otu id="t5" label="Pongo pygmaeus"/>
 
</otus>
 
</xml>
 
 
 
==Class==
 
A set is defined in NeXML with a <code>class</code> element. A <code>class</code> is an instance of the [http://nexml.org/nexml/html/doc/schema-1/abstract/#Class Class] class.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>label</code> - a human readable description of the class. A label is optional.
 
 
 
Note:
 
This attribute has not been fully implemented in the schema.
 
 
 
==Trees==
 
 
 
Trees in NeXML are described following the GraphML syntax. Nodes and edges of a tree are defined using the <code>node</code> and <code>edge</code> elements respectively. In NeXML, acyclic trees are defined with the <code>tree</code> tag, while cyclic trees or networks are defined with the <code>network</code> tag, so that in a network the indegree of a node can be more than one. A <code>trees</code> element is a container for both <code>tree</code> and <code>network</code> elements.



Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>otus</code> - takes the id of an otus element defined previously.
 
* <code>label</code> - a human readable description of the trees. A label is optional.
 
 
 
A <code>trees</code> element is of the <code>[http://nexml.org/nexml/html/doc/schema-1/trees/trees/#Trees Trees]</code> concrete type.
 
 
 
A trees element can have:
 
* one or more <code>tree</code>
 
* one or more <code>network</code> ( in mixed order with <code>tree</code> )
 
 
 
===Tree===
 
Phylogenetic trees are defined in NeXML with the <code>tree</code> tag. Two types of tree can be defined: IntTree and FloatTree. An IntTree can only have integer edge lengths, while a FloatTree has IEEE 754-1985 compliant floating point numbers for its edge lengths.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>xsi:type</code> - defines the type of the <code>tree</code>. It can be either <code>IntTree</code> or <code>FloatTree</code>.
 
* <code>label</code> - a human readable description of the tree. A label is optional.
 
 
 
With the [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractTree AbstractTree] class at the root of the hierarchy, the following subtypes are defined:
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#FloatTree FloatTree]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#IntTree IntTree]
 
 
 
A <code>tree</code> must contain the following in the same order:
 
* one or more <code>node</code>
 
* zero or one <code>rootedge</code> and
 
* one or more <code>edge</code>
 
 
 
===Network===
 
Phylogenetic networks are defined with the <code>network</code> tag.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>xsi:type</code> - defines the type of the <code>network</code>. It can take either of two values: <code>IntNetwork</code> or <code>FloatNetwork</code>.

* <code>label</code> - a human readable description of the network. A label is optional.



The difference between an int and a float network is the same as that between an int and a float tree.
 
 
 
A network is modeled by the [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractNetwork AbstractNetwork] class and two concrete implementations exist:
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/network/#FloatNetwork FloatNetwork]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/network/#IntNetwork IntNetwork]
 
 
 
A <code>network</code> must contain the following in the same order:
 
* one or more <code>node</code>
 
* one or more <code>edge</code>
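
A minimal sketch of what sets a network apart from a tree: below, node <code>x3</code> is the target of two edges, so its indegree is two. The ids and lengths are invented for illustration, and the fragment is assumed to sit inside a <code>trees</code> element.

<xml>
<network id="net1" xsi:type="nex:IntNetwork" label="minimal network">
    <node id="x1"/>
    <node id="x2"/>
    <node id="x3"/>
    <edge source="x1" target="x2" id="xe1" length="1"/>
    <edge source="x1" target="x3" id="xe2" length="1"/>
    <edge source="x2" target="x3" id="xe3" length="1"/> <!-- second edge into x3 -->
</network>
</xml>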
 
 
 
===Nodes===
 
A <code>node</code> tag defines a node of a tree or a network.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>otu</code> - id of a previously defined otu element. This is optional.
 
* <code>root</code> - takes <code>true</code> if the node is a root node, <code>false</code> otherwise.
 
 
 
The tree is considered rooted, multiply rooted or unrooted depending on how many root nodes it has (exactly one, more than one, or none, respectively).
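
As a sketch of that rule, the two fragments below differ in the <code>root</code> attribute: the first tree has one root node and would be read as rooted, while the second has none and would be read as unrooted. The ids are hypothetical, the <code>otu</code> values are assumed to match <code>otu</code> elements declared elsewhere in the file, and both trees would sit inside a <code>trees</code> element.

<xml>
<tree id="rooted1" xsi:type="nex:IntTree">
    <node id="r1" root="true"/>
    <node id="r2" otu="t1"/>
    <node id="r3" otu="t2"/>
    <edge source="r1" target="r2" id="r_e1" length="1"/>
    <edge source="r1" target="r3" id="r_e2" length="1"/>
</tree>

<tree id="unrooted1" xsi:type="nex:IntTree">
    <node id="u1"/>
    <node id="u2" otu="t1"/>
    <node id="u3" otu="t2"/>
    <node id="u4" otu="t3"/>
    <edge source="u1" target="u2" id="u_e1" length="1"/>
    <edge source="u1" target="u3" id="u_e2" length="1"/>
    <edge source="u1" target="u4" id="u_e3" length="1"/>
</tree>
</xml>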
 
 
 
Nodes have the following hierarchy:
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractNode AbstractNode]
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeNode TreeNode]
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/network/#NetworkNode NetworkNode]
 
 
 
===Edges===
 
An <code>edge</code> tag defines an edge in a tree or a network.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>source</code> - id of the node that serves as the source

* <code>target</code> - id of the node that serves as the target
 
* <code>length</code> - length of the edge
 
 
 
Inheriting from [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractEdge AbstractEdge], an edge can be one of the following concrete types, depending on whether it is defined for a tree (int or float) or a network (again, int or float):
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/network/#NetworkFloatEdge NetworkFloatEdge]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/network/#NetworkIntEdge NetworkIntEdge]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeFloatEdge TreeFloatEdge]
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeIntEdge TreeIntEdge]
 
 
 
===Rootedge===
 
A <code>rootedge</code> tag is used to indicate a time span leading up to the root (principally for coalescent trees).
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>target</code> - id of the root node
 
* <code>length</code> - length of the edge
 
 
 
Rootedges have the following hierarchy:
 
* [http://nexml.org/nexml/html/doc/schema-1/trees/abstracttrees/#AbstractRootEdge AbstractRootEdge]
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeIntRootEdge TreeIntRootEdge]
 
** [http://nexml.org/nexml/html/doc/schema-1/trees/tree/#TreeFloatRootEdge TreeFloatRootEdge]
 
 
 
===Example===
 
<xml>
 
<!-- Within the root element -->
 
<trees otus="taxa1" id="Trees" label="TreesBlockFromXML">
 
    <tree id="tree1" xsi:type="nex:FloatTree" label="tree1">
 
        <node id="n1" label="n1" root="true"/>
 
        <node id="n2" label="n2" otu="t1"/>
 
        <node id="n3" label="n3"/>
 
        <node id="n4" label="n4"/>
 
        <node id="n5" label="n5" otu="t3"/>
 
        <node id="n6" label="n6" otu="t2"/>
 
        <node id="n7" label="n7"/>
 
        <node id="n8" label="n8" otu="t5"/>
 
        <node id="n9" label="n9" otu="t4"/>
 
        <rootedge target="n1" id="re1" length="0.34765" />
 
        <edge source="n1" target="n3" id="e1" length="0.34534"/>
 
        <edge source="n1" target="n2" id="e2" length="0.4353"/>
 
        <edge source="n3" target="n4" id="e3" length="0.324"/>
 
        <edge source="n3" target="n7" id="e4" length="0.3247"/>
 
        <edge source="n4" target="n5" id="e5" length="0.234"/>
 
        <edge source="n4" target="n6" id="e6" length="0.3243"/>
 
        <edge source="n7" target="n8" id="e7" length="0.32443"/>
 
        <edge source="n7" target="n9" id="e8" length="0.2342"/>
 
    </tree>
 
    <tree id="tree2" xsi:type="nex:IntTree" label="tree2">

        <!-- ids carry a tree2 prefix so that they stay unique within the file -->

        <node id="tree2n1" label="n1"/>

        <node id="tree2n2" label="n2" otu="t1"/>

        <node id="tree2n3" label="n3"/>

        <node id="tree2n4" label="n4"/>

        <node id="tree2n5" label="n5" otu="t3"/>

        <node id="tree2n6" label="n6" otu="t2"/>

        <node id="tree2n7" label="n7"/>

        <node id="tree2n8" label="n8" otu="t5"/>

        <node id="tree2n9" label="n9" otu="t4"/>

        <edge source="tree2n1" target="tree2n3" id="tree2e1" length="1"/>

        <edge source="tree2n1" target="tree2n2" id="tree2e2" length="2"/>

        <edge source="tree2n3" target="tree2n4" id="tree2e3" length="3"/>

        <edge source="tree2n3" target="tree2n7" id="tree2e4" length="1"/>

        <edge source="tree2n4" target="tree2n5" id="tree2e5" length="2"/>

        <edge source="tree2n4" target="tree2n6" id="tree2e6" length="1"/>

        <edge source="tree2n7" target="tree2n8" id="tree2e7" length="1"/>

        <edge source="tree2n7" target="tree2n9" id="tree2e8" length="1"/>

    </tree>
 
    <network id="tree3" xsi:type="nex:IntNetwork" label="tree3">

        <!-- ids carry a tree3 prefix so that they stay unique within the file -->

        <node id="tree3n1" label="n1"/>

        <node id="tree3n2" label="n2" otu="t1"/>

        <node id="tree3n3" label="n3"/>

        <node id="tree3n4" label="n4"/>

        <node id="tree3n5" label="n5" otu="t3"/>

        <node id="tree3n6" label="n6" otu="t2"/>

        <node id="tree3n7" label="n7"/>

        <node id="tree3n8" label="n8" otu="t5"/>

        <node id="tree3n9" label="n9" otu="t4"/>

        <edge source="tree3n1" target="tree3n3" id="tree3e1" length="1"/>

        <edge source="tree3n1" target="tree3n2" id="tree3e2" length="2"/>

        <edge source="tree3n3" target="tree3n4" id="tree3e3" length="3"/>

        <edge source="tree3n3" target="tree3n7" id="tree3e4" length="1"/>

        <edge source="tree3n4" target="tree3n5" id="tree3e5" length="2"/>

        <edge source="tree3n4" target="tree3n6" id="tree3e6" length="1"/>

        <edge source="tree3n7" target="tree3n6" id="tree3e9" length="1"/> <!-- extra edge: n6 now has indegree two -->

        <edge source="tree3n7" target="tree3n8" id="tree3e7" length="1"/>

        <edge source="tree3n7" target="tree3n9" id="tree3e8" length="1"/>

    </network>
 
</trees>
 
</xml>
 
 
 
==Characters==
 
The <code>characters</code> element defines comparative data. NeXML requires that the allowed parameter space for the observations be declared with a <code>format</code> element and the actual observations with a <code>matrix</code> element, both nested within the <code>characters</code> element. Two broad observation categories are allowed: raw character sequences and granular character state observations. Under both categories NeXML provides means of describing observed DNA, RNA, Protein, Restriction, Standard, and Continuous data.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>otus</code> - takes the id of an otus element defined previously.
 
* <code>xsi:type</code> - can be one of the following: <code>DnaSeqs</code>, <code>DnaCells</code>, <code>RnaSeqs</code>, <code>RnaCells</code>, <code>ProteinSeqs</code>, <code>ProteinCells</code>, <code>RestrictionSeqs</code>, <code>RestrictionCells</code>, <code>StandardSeqs</code>, <code>StandardCells</code>, <code>ContinuousSeqs</code>, <code>ContinuousCells</code>.

* <code>label</code> - a human readable description of the character block. A label is optional.
 
 
 
The [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractBlock AbstractBlock] class is at the root of the <code>characters</code> hierarchy with the following subtypes:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeqs AbstractSeqs] - superclass for character blocks that consist of raw character sequences
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DnaSeqs DnaSeqs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#ProteinSeqs ProteinSeqs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RnaSeqs RnaSeqs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionSeqs RestrictionSeqs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardSeqs StandardSeqs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousSeqs ContinuousSeqs]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractCells AbstractCells] - superclass for character blocks that consist of granular character state observations
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DnaCells DnaCells]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#ProteinCells ProteinCells]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RnaCells RnaCells]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionCells  RestrictionCells]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardCells StandardCells]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousCells  ContinuousCells]
 
 
 
An NeXML <code>characters</code> block must have:
 
* <code>format</code> - one and only one for each concrete type.
 
* <code>matrix</code> - one and only one for each concrete type.
 
 
 
===Format===
 
The <code>format</code> element defines the allowed characters and states in a matrix, and their ambiguity mapping. The <code>states</code> and <code>char</code> definitions are enclosed within <code>format</code>.
 
 
 
Attributes: none.
 
 
 
<code>format</code> is modeled by the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractFormat AbstractFormat] type with the following concrete types:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAFormat AAFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousFormat ContinuousFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAFormat DNAFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAFormat RNAFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionFormat RestrictionFormat]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardFormat StandardFormat]
 
 
 
A <code>format</code> element can have:
 
* <code>states</code> - one or more for each concrete type except <code>ContinuousFormat</code>. A Continuous type is stateless.
 
* <code>char</code> - one or more for each concrete type.
 
 
 
====states====
 
<code>states</code> is a container for defined states and their mappings.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>label</code> - a human readable description. This attribute is optional.
 
 
 
With the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractStates AbstractStates] class at the root of the hierarchy, <code>states</code> can take one of the following concrete implementations, based on the type of <code>format</code>:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAStates AAStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAStates DNAStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAStates RNAStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionStates RestrictionStates]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardStates StandardStates]
 
 
 
The type of <code>states</code> depends on the type of its parent <code>format</code>. Continuous data is stateless, hence there is no <code>states</code> type corresponding to it. Restriction data implies only two states: presence (1) or absence (0). Accordingly, two (and only two) <code>state</code> elements are defined for the <code>RestrictionStates</code> type. All other types may have zero or more <code>state</code> elements defined for them.
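
For instance, a <code>states</code> element for restriction data might look like the sketch below, with exactly two <code>state</code> children. The ids and labels are illustrative, not prescribed by the schema.

<xml>
<states id="restrictionstateset1">
    <state id="restrictionstate0" symbol="0" label="absent"/>
    <state id="restrictionstate1" symbol="1" label="present"/>
</states>
</xml>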
 
 
 
A <code>states</code> element can have:

* <code>state</code>

* <code>uncertain_state_set</code>

* <code>polymorphic_state_set</code>
 
 
 
 
 
=====state=====
 
<code>state</code> defines a possible observation state with its <code>symbol</code> attribute.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id
 
* <code>symbol</code> - a simple type that restricts from [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSymbol AbstractSymbol]
 
* <code>label</code> - a human readable description. A <code>label</code> is optional.
 
 
 
<code>state</code> is of the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractState AbstractState] type. A <code>state</code> can be of the following kind depending on the type of the parent <code>states</code> element:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAState AAState] - takes a [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAToken AAToken] for <code>symbol</code>
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAState  DNAState] - takes a [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAToken DNAToken] for <code>symbol</code>
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAState  RNAState] - takes a [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAToken RNAToken] for <code>symbol</code>
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionState RestrictionState] - takes a [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionToken RestrictionToken] for <code>symbol</code>
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardState StandardState] - takes a [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardToken StandardToken] for <code>symbol</code>
 
 
 
=====polymorphic_state_set=====
 
<code>polymorphic_state_set</code> resolves ambiguous states in an "and" context. To define the possible ambiguities, <code>polymorphic_state_set</code> nests zero or more <code>member</code> elements.
 
 
 
Attributes: same as <code>state</code>.
 
 
 
<code>polymorphic_state_set</code> is modeled by the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractPolymorphicStateSet AbstractPolymorphicStateSet] type.
 
 
 
=====uncertain_state_set=====
 
<code>uncertain_state_set</code> resolves ambiguous states in an "or" context. To define the possible ambiguities, <code>uncertain_state_set</code> nests zero or more <code>member</code> elements.
 
 
 
Attributes: same as <code>state</code>.
 
 
 
<code>uncertain_state_set</code> is modeled by the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractUncertainStateSet AbstractUncertainStateSet] type.
 
 
 
=====member=====
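
In the examples on this page, a <code>member</code> element sits inside a state set and points at a previously defined <code>state</code> through its <code>state</code> attribute. The sketch below shows this for DNA data, assuming the IUPAC convention that <code>R</code> stands for "A or G". The ids are hypothetical.

<xml>
<states id="dnastateset1">
    <state id="dnaA" symbol="A"/>
    <state id="dnaC" symbol="C"/>
    <state id="dnaG" symbol="G"/>
    <state id="dnaT" symbol="T"/>
    <!-- R: the observation is either A or G -->
    <uncertain_state_set id="dnaR" symbol="R">
        <member state="dnaA"/>
        <member state="dnaG"/>
    </uncertain_state_set>
</states>
</xml>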
 
 
 
====char====
 
A <code>char</code> specifies which <code>states</code> apply to which matrix columns. A <code>char</code> is of the [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractChar AbstractChar] type.
 
 
 
Attributes:
 
With the type of the char in place, a char can take one or more of the following attributes:
 
* <code>id</code> - a file level unique id, required for all <code>char</code> elements

* <code>states</code> - the id of a previously defined <code>states</code> element. All chars except the <code>ContinuousChar</code> type take a <code>states</code> attribute.

* <code>codon</code> - specifies the codon position. <code>DNAChar</code> and <code>RNAChar</code> optionally take the <code>codon</code> attribute.
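
A sketch of how these attributes might combine for DNA data. It assumes that a <code>states</code> element with the hypothetical id <code>dnastateset1</code> has been defined earlier in the same <code>format</code>, and that codon positions are written as the integers 1 to 3, which this page does not spell out.

<xml>
<char id="dnac1" states="dnastateset1" codon="1"/>
<char id="dnac2" states="dnastateset1" codon="2"/>
<char id="dnac3" states="dnastateset1" codon="3"/>
</xml>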
 
 
 
Depending on the type of the parent <code>format</code> element, a <code>char</code> can be an instance of one of the following types:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAChar AAChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousChar ContinuousChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAChar DNAChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAChar RNAChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionChar RestrictionChar]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardChar StandardChar]
 
 
 
===Matrix===
 
A <code>matrix</code> element holds the state observations.
 
 
 
Attributes: none
 
 
 
A <code>matrix</code> can be one of twelve possible concrete types, six subclassed under each of two abstract types:

* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeqMatrix AbstractSeqMatrix] - contains rows which hold raw character sequences
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AASeqMatrix AASeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousSeqMatrix ContinuousSeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNASeqMatrix DNASeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNASeqMatrix RNASeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionSeqMatrix RestrictionSeqMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardSeqMatrix StandardSeqMatrix]
 
*[http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractObsMatrix AbstractObsMatrix] - contains rows which hold granular state observations
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAObsMatrix AAObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousObsMatrix ContinuousObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAObsMatrix DNAObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAObsMatrix RNAObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionObsMatrix RestrictionObsMatrix]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardObsMatrix StandardObsMatrix]
 
 
 
An NeXML <code>matrix</code> must have:
 
* <code>row</code> - one or more.
 
====Row====
 
The <code>row</code> element defines a single row of a <code>matrix</code>.
 
 
 
Attributes:
 
* <code>id</code> - a file level unique id.
 
* <code>label</code> - a human readable description of the row. A label is optional.
 
* <code>otu</code> - takes the id of an otu defined previously.
 
 
 
A <code>row</code> element has the following class hierarchy:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeqRow AbstractSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAMatrixSeqRow AAMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousMatrixSeqRow ContinuousMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAMatrixSeqRow DNAMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAMatrixSeqRow RNAMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionMatrixSeqRow RestrictionMatrixSeqRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardMatrixSeqRow StandardMatrixSeqRow]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractObsRow AbstractObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAMatrixObsRow AAMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousMatrixObsRow ContinuousMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAMatrixObsRow DNAMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAMatrixObsRow RNAMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionMatrixObsRow RestrictionMatrixObsRow]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardMatrixObsRow StandardMatrixObsRow]
 
 
 
A <code>row</code> element takes:
 
* <code>seq</code> - one and only one for each concrete subtype of [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeqRow AbstractSeqRow]
 
* <code>cell</code> - one or more for each concrete subtype of [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractObsRow AbstractObsRow]
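
The two row shapes can be sketched side by side. The first row would belong to a <code>DnaSeqs</code> block and carries a single <code>seq</code>; the second would belong to a <code>DnaCells</code> block and carries one <code>cell</code> per observation. All ids are hypothetical, and the <code>otu</code>, <code>char</code> and <code>state</code> values are assumed to match ids declared elsewhere in the file.

<xml>
<!-- row in a DnaSeqs matrix: a single raw sequence -->
<row id="dnaseqrow1" otu="t1">
    <seq>ACGT</seq>
</row>

<!-- row in a DnaCells matrix: one cell per observation -->
<row id="dnacellrow1" otu="t1">
    <cell char="dnac1" state="dnaA"/>
    <cell char="dnac2" state="dnaC"/>
</row>
</xml>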
 
 
 
=====seq=====
 
<code>seq</code> defines a raw character sequence. NeXML allows for six kinds of sequences.



<code>seq</code> has the following hierarchy:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractSeq AbstractSeq]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNASeq DNASeq]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNASeq RNASeq]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AASeq AASeq]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractTokenList AbstractTokenList]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousSeq ContinuousSeq]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardSeq StandardSeq]
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionSeq RestrictionSeq]
 
 
 
=====cell=====
 
A <code>cell</code> defines a single granular observation.
 
 
 
Attributes:
 
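* <code>state</code> - id of a <code>state</code> element defined previously. For continuous data, which has no <code>state</code> elements, it holds the raw value of the cell instead.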
 
* <code>char</code> - id of a <code>char</code> element defined previously.
 
 
 
<code>cell</code> has the following hierarchy:
 
* [http://nexml.org/nexml/html/doc/schema-1/characters/abstractcharacters/#AbstractObs AbstractObs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/protein/#AAObs AAObs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/continuous/#ContinuousObs ContinuousObs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/dna/#DNAObs DNAObs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/rna/#RNAObs RNAObs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/restriction/#RestrictionObs RestrictionObs]
 
** [http://nexml.org/nexml/html/doc/schema-1/characters/standard/#StandardObs StandardObs]
 
 
 
===Examples===
 
====Example 1====
 
<xml>
 
<characters otus="taxa1" id="standardchars6" xsi:type="nex:StandardSeqs" label="Standard sequences">
 
    <format>
 
        <!--
 
            The first elements inside a format element are stateset

            definitions. In this example, there is a set of three

            states plus two ambiguity sets, each tagged with an id.

            The symbol attribute is a shorthand token.
 
        -->
 
        <states id="standardstateset1">
 
            <state id="standardstates1" symbol="1"/>
 
            <state id="standardstates2" symbol="2"/>
 
            <state id="standardstates3" symbol="3"/>
 
            <polymorphic_state_set symbol="4" id="standardstates4">
 
                <member state="standardstates1"/>
 
                <member state="standardstates2"/>
 
            </polymorphic_state_set>
 
            <uncertain_state_set symbol="5" id="standardstates5">
 
                <member state="standardstates3"/>
 
                <member state="standardstates1"/>
 
            </uncertain_state_set>
 
        </states>
 
        <char states="standardstateset1" id="standardc1"/>
 
        <char states="standardstateset1" id="standardc2"/>
 
    </format>
 
    <!--
 
        The matrix in this example contains two columns, both

        referring to the same stateset - and so cells in both

        columns can hold any of the symbols defined there.
 
    -->
 
    <matrix>
 
        <row id="standardr1" otu="t1">
 
            <seq>1 2</seq>
 
        </row>
 
        <row id="standardr2" otu="t2">
 
            <seq>2 2</seq>
 
        </row>
 
        <row id="standardr3" otu="t3">
 
            <seq>3 4</seq>
 
        </row>
 
        <row id="standardr4" otu="t4">
 
            <seq>2 3</seq>
 
        </row>
 
        <row id="standardr5" otu="t5">
 
            <seq>4 1</seq>
 
        </row>
 
    </matrix>
 
</characters>
 
</xml>
 
====Example 2====
 
<xml>
 
    <characters otus="taxa1" id="m3" xsi:type="nex:ContinuousCells" label="Continuous characters">
 
        <format>
 
            <!--
 
                Because in this subclass, characters are marked up granularly -
 
                as cells - we must define the columns these cells belong to.
 
                Because this is continuous data, we don't (can't) define the
 
                states cells in these columns may occupy, hence there are no
 
                'states' elements in this subclass.
 
            -->
 
            <char id="ContinuousCharacter1" label="this is character 1"/>
 
            <char id="ContinuousCharacter2"/>
 
            <char id="ContinuousCharacter3"/>
 
            <char id="ContinuousCharacter4"/>
 
            <char id="ContinuousCharacter5"/>
 
        </format>
 
        <matrix>
 
            <row id="ContinuousCellsRow1" otu="t1">
 
                <!--
 
                    In this subclass, the 'state' attribute holds the raw
 
                    value of the cell (i.e. a floating point number), not
 
                    a reference to a state defined previously.
 
                -->
 
                <cell char="ContinuousCharacter1" state="-1.545414144070023"/>
 
                <cell char="ContinuousCharacter2" state="-2.3905621575431044"/>
 
                <cell char="ContinuousCharacter3" state="-2.9610221833467265"/>
 
                <cell char="ContinuousCharacter4" state="0.7868662069161243"/>
 
                <cell char="ContinuousCharacter5" state="0.22968509237534918"/>
 
            </row>
 
            <row id="ContinuousCellsRow2" otu="t2">
 
                <cell char="ContinuousCharacter1" state="-1.6259836379710066"/>
 
                <cell char="ContinuousCharacter2" state="3.649352410850134"/>
 
                <cell char="ContinuousCharacter3" state="1.778885099660406"/>
 
                <cell char="ContinuousCharacter4" state="-1.2580877968480846"/>
 
                <cell char="ContinuousCharacter5" state="0.22335354995610862"/>
 
            </row>
 
            <row id="ContinuousCellsRow3" otu="t3">
 
                <cell char="ContinuousCharacter1" state="-1.5798979984134964"/>
 
                <cell char="ContinuousCharacter2" state="2.9548251411133157"/>
 
                <cell char="ContinuousCharacter3" state="1.522005675256233"/>
 
                <cell char="ContinuousCharacter4" state="-0.8642016921755289"/>
 
                <cell char="ContinuousCharacter5" state="-0.938129801832388"/>
 
            </row>
 
            <row id="ContinuousCellsRow4" otu="t4">
 
                <cell char="ContinuousCharacter1" state="2.7436692306788086"/>
 
                <cell char="ContinuousCharacter2" state="-0.7151148143399818"/>
 
                <cell char="ContinuousCharacter3" state="4.592207937774776"/>
 
                <cell char="ContinuousCharacter4" state="-0.6898841440534845"/>
 
                <cell char="ContinuousCharacter5" state="0.5769509574453064"/>
 
            </row>
 
            <row id="ContinuousCellsRow5" otu="t5">
 
                <cell char="ContinuousCharacter1" state="3.1060827493657683"/>
 
                <cell char="ContinuousCharacter2" state="-1.0453787389160105"/>
 
                <cell char="ContinuousCharacter3" state="2.67416332763427"/>
 
                <cell char="ContinuousCharacter4" state="-1.4045634106692808"/>
 
                <cell char="ContinuousCharacter5" state="0.019890469925520196"/>
 
            </row>
 
        </matrix>
 
    </characters>
 
</xml>
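
For comparison, the same kind of data can be written more compactly in a sequence-oriented form, where each row holds one space-separated <code>seq</code> element instead of individual cells. The following is a minimal sketch, assuming the subclass <code>nex:ContinuousSeqs</code> and assuming that values are matched to the <code>char</code> columns by position. The ids are invented for illustration, and the values are the first three cells of the first two rows above.

<xml>
    <characters otus="taxa1" id="m4" xsi:type="nex:ContinuousSeqs" label="Continuous sequences">
        <format>
            <!-- hypothetical column ids; no 'states' element, as in the cell-based form -->
            <char id="ContinuousSeqCharacter1"/>
            <char id="ContinuousSeqCharacter2"/>
            <char id="ContinuousSeqCharacter3"/>
        </format>
        <matrix>
            <row id="ContinuousSeqsRow1" otu="t1">
                <!-- raw floating point values, one per column, in column order -->
                <seq>-1.545414144070023 -2.3905621575431044 -2.9610221833467265</seq>
            </row>
            <row id="ContinuousSeqsRow2" otu="t2">
                <seq>-1.6259836379710066 3.649352410850134 1.778885099660406</seq>
            </row>
        </matrix>
    </characters>
</xml>

The cell-based form is more verbose but lets each value name its column explicitly. The sequence form trades that granularity for compactness.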
 
====Example 3====
 
<xml>
 
    <characters otus="taxa1" id="characters3" xsi:type="nex:DnaSeqs" label="DNA sequences">
 
        <format>
            <states id="IUPACDNAStateSet1">
                <state id="NucA" symbol="A" />
                <state id="NucC" symbol="C" />
                <state id="NucG" symbol="G" />
                <state id="NucT" symbol="T" />
                <uncertain_state_set id="SymK" symbol="K">
                    <member state="NucG" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymM" symbol="M">
                    <member state="NucA" />
                    <member state="NucC" />
                </uncertain_state_set>
                <uncertain_state_set id="SymR" symbol="R">
                    <member state="NucA" />
                    <member state="NucG" />
                </uncertain_state_set>
                <uncertain_state_set id="SymS" symbol="S">
                    <member state="NucC" />
                    <member state="NucG" />
                </uncertain_state_set>
                <uncertain_state_set id="SymW" symbol="W">
                    <member state="NucA" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymY" symbol="Y">
                    <member state="NucC" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymB" symbol="B">
                    <member state="NucC" />
                    <member state="NucG" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymD" symbol="D">
                    <member state="NucA" />
                    <member state="NucG" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymH" symbol="H">
                    <member state="NucA" />
                    <member state="NucC" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymV" symbol="V">
                    <member state="NucA" />
                    <member state="NucC" />
                    <member state="NucG" />
                </uncertain_state_set>
                <uncertain_state_set id="SymN" symbol="N">
                    <member state="NucA" />
                    <member state="NucC" />
                    <member state="NucG" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymX" symbol="X">
                    <member state="NucA" />
                    <member state="NucC" />
                    <member state="NucG" />
                    <member state="NucT" />
                </uncertain_state_set>
                <uncertain_state_set id="SymGap" symbol="-" />
                <uncertain_state_set id="SymMiss" symbol="?">
                    <member state="NucA" />
                    <member state="NucC" />
                    <member state="NucG" />
                    <member state="NucT" />
                    <member state="SymK" />
                    <member state="SymM" />
                    <member state="SymR" />
                    <member state="SymS" />
                    <member state="SymW" />
                    <member state="SymY" />
                    <member state="SymB" />
                    <member state="SymD" />
                    <member state="SymH" />
                    <member state="SymV" />
                    <member state="SymN" />
                    <member state="SymX" />
                    <member state="SymGap" />
                </uncertain_state_set>
            </states>
            <char id="ResidueCol1" states="IUPACDNAStateSet1" codon="2" />
            <char id="ResidueCol2" states="IUPACDNAStateSet1" />
            <char id="ResidueCol3" states="IUPACDNAStateSet1" />
            <char id="ResidueCol4" states="IUPACDNAStateSet1" />
            <char id="ResidueCol5" states="IUPACDNAStateSet1" />
            <char id="ResidueCol6" states="IUPACDNAStateSet1" />
            <char id="ResidueCol7" states="IUPACDNAStateSet1" />
            <char id="ResidueCol8" states="IUPACDNAStateSet1" />
            <char id="ResidueCol9" states="IUPACDNAStateSet1" />
            <char id="ResidueCol10" states="IUPACDNAStateSet1" />
            <char id="ResidueCol11" states="IUPACDNAStateSet1" />
            <char id="ResidueCol12" states="IUPACDNAStateSet1" />
            <char id="ResidueCol13" states="IUPACDNAStateSet1" />
            <char id="ResidueCol14" states="IUPACDNAStateSet1" />
            <char id="ResidueCol15" states="IUPACDNAStateSet1" />
            <char id="ResidueCol16" states="IUPACDNAStateSet1" />
        </format>
        <matrix>
            <row otu="t1" id="DNASequence1"><seq>A C G C T C G C A T C G C A T C</seq></row>
            <row otu="t2" id="DNASequence2"><seq>A C G C T C G C A T C G C A T C</seq></row>
            <row otu="t3" id="DNASequence3"><seq>A C G C T C G C A T C G C A T C</seq></row>
        </matrix>
 
    </characters>
 
</xml>
 
 
 
==Semantic Annotation==
 
Fundamental data objects in NeXML can be annotated using RDFa. The annotations are therefore directly available to any off-the-shelf RDFa parser, yet remain simple to integrate into processing libraries that are not RDF-aware. Annotations are expressed as recursively nested <code>meta</code> elements; each is essentially a triple whose subject is identified by the <code>about</code> attribute. To make a NeXML element such as a tree the subject, its <code>about</code> attribute must match its <code>id</code> attribute. This way, core NeXML elements can be converted to RDF (for example with an XSL stylesheet), the RDFa annotations can be extracted from the NeXML (with another stylesheet), and the two resulting graphs can be aligned by the subjects of their respective triples.

How a <code>meta</code> element is written depends on the kind of object the triple has:
 
 
 
* If the triple’s object is a literal such as a string or a number, its exact data type is given by the <code>datatype</code> attribute. These are typically core XML Schema types such as <code>xsd:string</code> for atomic values, or <code>rdf:Literal</code> for nested XML element structures that are to be parsed as opaque literals in an RDF graph. The object value is enclosed in the <code>meta</code> element, and the predicate is given by the <code>property</code> attribute, whose value must be a CURIE. <code>meta</code> elements of this kind belong to the subclass <code>nex:LiteralMeta</code>.
 
* If the triple’s object is a remote resource, its location is given by the <code>href</code> attribute. The predicate for this class of triples is given as a CURIE in the <code>rel</code> attribute. <code>meta</code> elements of this kind belong to the subclass <code>nex:ResourceMeta</code>.
 
* If the triple’s object is a nested annotation, its predicate is given as a CURIE in the <code>rel</code> attribute. Because the enclosing <code>meta</code> element is to be transformed into an anonymous RDF node, it must be identified as the subject of a reification by giving it an <code>about</code> attribute (as per the RDFa rules for identifying triple subjects).
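
The three cases above can be sketched on a single tree as follows. This is a minimal illustration under stated assumptions, not a fragment of a validated file. The <code>dc</code> and <code>xsd</code> prefixes are assumed to be declared on the root element. The ids, the URL and the annotation values are invented. Writing the match between <code>about</code> and <code>id</code> as a fragment reference (<code>about="#tree1"</code>) is an assumption, as is typing the nested case as <code>nex:ResourceMeta</code>.

<xml>
    <tree id="tree1" about="#tree1" xsi:type="nex:FloatTree">
        <!-- literal object: predicate in 'property', type in 'datatype', value enclosed -->
        <meta id="meta1" xsi:type="nex:LiteralMeta" property="dc:title" datatype="xsd:string">Example tree</meta>
        <!-- remote resource object: predicate in 'rel', location in 'href' (hypothetical URL) -->
        <meta id="meta2" xsi:type="nex:ResourceMeta" rel="dc:source" href="http://example.org/study/1"/>
        <!-- nested annotation: the enclosing meta carries 'rel' and its own 'about' -->
        <meta id="meta3" about="#meta3" xsi:type="nex:ResourceMeta" rel="dc:relation">
            <meta id="meta4" xsi:type="nex:LiteralMeta" property="dc:description" datatype="xsd:string">A nested note</meta>
        </meta>
        <!-- nodes and edges of the tree omitted -->
    </tree>
</xml>

Read as triples, the first two <code>meta</code> elements have the tree as their subject, while the innermost one has the enclosing <code>meta</code> element as its subject.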
 
  
 
[[Category:NeXML and RDF API for BioRuby]]
 

Latest revision as of 19:04, 22 September 2011
