NEXUS Specification

From Phyloinformatics

Maddison, et al., 1997

The full-text PDF of Maddison et al. (1997) is available online in the Syst. Biol. archives (if you have access).

Formal BNF grammar of NEXUS

Iglesias et al. (2001) have written a formal BNF grammar for NEXUS. Don't treat it as gospel; it's just an attempt to capture the syntax rules. Some text processing is missing (for example, the case of keywords should be irrelevant), and there is at least one mistake ("average" is misspelled in the "items" subcommand of the "format" command of the characters block), which suggests that the BNF itself was not fully tested.
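To illustrate the missing text-processing rule, here is a minimal Python sketch (not taken from Iglesias et al.) that turns one hypothetical production for the DIMENSIONS command into a case-insensitive parser. The production in the comment and the name parse_dimensions are illustrative assumptions; only ntax and nchar are recognized, and any other sub-commands are silently ignored.

```python
import re

# Hypothetical production, for illustration only:
#   dimensions-command ::= "dimensions" { "ntax" "=" integer | "nchar" "=" integer } ";"
# Keywords are matched case-insensitively (re.IGNORECASE), the rule the BNF leaves implicit.
DIMENSIONS = re.compile(r"^\s*dimensions\b(?P<body>[^;]*);\s*$", re.IGNORECASE)
SUBCOMMAND = re.compile(r"(?P<key>ntax|nchar)\s*=\s*(?P<value>\d+)", re.IGNORECASE)


def parse_dimensions(command: str) -> dict[str, int]:
    """Return e.g. {'ntax': 4, 'nchar': 53} for a DIMENSIONS command string."""
    match = DIMENSIONS.match(command)
    if match is None:
        raise ValueError(f"not a DIMENSIONS command: {command!r}")
    result: dict[str, int] = {}
    for sub in SUBCOMMAND.finditer(match.group("body")):
        # Normalize keyword case so 'NCHAR', 'nchar' and 'NChar' are equivalent.
        result[sub.group("key").lower()] = int(sub.group("value"))
    return result


print(parse_dimensions("DIMENSIONS NTax=4 nchar=53;"))
```

The same normalization (lower-casing keywords before comparing them) would apply to every command and sub-command keyword in the table below.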

List of NEXUS keywords from the Bio::NEXUS documentation

Note that after the #NEXUS keyword, a NEXUS file is a list of sentences (commands), each of which begins with a command keyword and ends with a semicolon. Between the command keyword and the semicolon there may be sub-commands (with or without arguments) and symbols such as "=" (e.g., "begin taxa;", "dimensions nchar=53;", "taxset 'beans' = 1-5,7;"). Below is a table from the Bio::NEXUS documentation, compiled by Vivek Gopalan and Arlin Stoltzfus (8 Aug 2006), that lists all of the command and sub-command keywords. Notes:

  • used by permission of the authors (Astoltzfus 13:32, 22 December 2006 (EST))
  • some commands appear in multiple blocks
  • p. 594: "Programs that read NEXUS files do not have to be able to understand all aspects of the file format; In fact, no program at this time can understand more than about 60% of elements described in this document"
  • column 4 ("Is ObjDef") refers to 'object definition command' as defined on p. 619 of the standard.
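The sentence structure described above can be sketched as a small Python tokenizer that splits NEXUS text into commands. This is a minimal sketch under stated assumptions, not code from Bio::NEXUS or any other parser: the function name split_commands is hypothetical, square-bracket comments are discarded and treated like whitespace (so meaningful comments such as [&R] are lost), single-quoted names with doubled '' escapes are kept intact, and any text after the last semicolon is ignored.

```python
def split_commands(text: str) -> list[str]:
    """Split NEXUS text into commands, each terminated by ';'.

    Sketch assumptions: [comments] are dropped and treated like whitespace
    (so special comments such as [&R] are lost); single-quoted tokens,
    including '' escapes, are kept intact; runs of whitespace are collapsed,
    even inside quotes; text after the last ';' is ignored.
    """
    body = text.lstrip()
    if body[:6].upper() == "#NEXUS":
        body = body[6:]  # the #NEXUS header is not terminated by ';'

    commands: list[str] = []
    current: list[str] = []
    depth = 0          # square-bracket nesting depth; > 0 means inside a comment
    in_quote = False   # True while inside a single-quoted token
    i = 0
    while i < len(body):
        ch = body[i]
        if in_quote:
            current.append(ch)
            if ch == "'":
                if i + 1 < len(body) and body[i + 1] == "'":
                    current.append("'")  # '' is an escaped quote; stay in the token
                    i += 1
                else:
                    in_quote = False
        elif depth > 0:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
        elif ch == "[":
            depth = 1
            current.append(" ")  # let the comment separate tokens
        elif ch == "'":
            in_quote = True
            current.append(ch)
        elif ch == ";":
            command = " ".join("".join(current).split())
            if command:
                commands.append(command)
            current = []
        else:
            current.append(ch)
        i += 1
    return commands


sample = """#NEXUS
begin taxa; [a comment]
  dimensions ntax=3;
  taxlabels 'Vicia faba' 'it''s; odd' Pisum;
end;
"""
for command in split_commands(sample):
    print(command)
```

Running it prints four commands: "begin taxa", "dimensions ntax=3", the taxlabels command with both quoted names intact (including the ';' inside quotes), and "end". A real reader would keep the comments it cares about rather than dropping them all.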
Columns: Key word; Blocks; Is command; Is ObjDef; Modifies command; Arguments; # of blocks using command; # in Bio::NEXUS; Comment; Documentation; Testing
#NEXUS - FALSE FALSE - 1 1
begin all TRUE FALSE - 10 10
taxa taxa FALSE FALSE begin 1 1
dimensions "taxa, characters, unaligned, distances" TRUE FALSE - 4 4 "In taxa block, must appear before taxlabels; in characters block, must precede charlabels, charstatelabels, statelabels, matrix"
ntax "taxa, distances" FALSE FALSE "dimensions, newtaxa" number-of-taxa 3 3 use only in taxa block or with newtaxa command
taxlabels "taxa, characters, unaligned, distances" TRUE FALSE - taxon-name .. 4 4 "taxon names must not correspond to another taxon name or number"
end all TRUE FALSE 10 10
characters characters FALSE FALSE begin 1 1
newtaxa "characters, unaligned, distances" TRUE FALSE dimensions ntax 3 0 Must appear before ntax. Deprecated.
nchar "characters, distances" FALSE FALSE dimensions number-of-characters 2 2
format "characters, unaligned, distances" TRUE FALSE - 3 3 "in characters block, must precede charlabels, charstatelabels, statelabels, matrix"
datatype "characters, unaligned" TRUE FALSE format {standard | DNA | RNA | nucleotide | protein | continuous} 2 2
respectcase "characters, unaligned" TRUE FALSE format none 2 0
missing "characters, unaligned, distances" TRUE FALSE format symbol 3 3
gap characters TRUE FALSE format symbol 1 1
symbols "characters, unaligned" TRUE FALSE format "symbol .." 2 2
equate "characters, unaligned" TRUE FALSE format "symbol = entry .." 2 0
matchchar characters TRUE FALSE format symbol 1 0
labels "characters, unaligned, distances" TRUE FALSE format none 3 3 mutually exclusive with nolabels
nolabels "characters, unaligned, distances" TRUE FALSE format none 3 3 mutually exclusive with labels
transpose characters TRUE FALSE format none 1 0
interleave "characters, distances" TRUE FALSE format none 2 2
items characters TRUE FALSE format (items ..) items = {min | max | median | average | variance | stderror | samplesize | states} 1 0
statesformat characters TRUE FALSE format {statespresent | individuals | count | frequency} 1 1
tokens characters TRUE FALSE format none 1 1 mutually exclusive with notokens
notokens characters TRUE FALSE format none 1 1 mutually exclusive with tokens
eliminate characters TRUE FALSE - character-set 1 0 "in characters block, must precede charlabels, charstatelabels, statelabels, matrix"
charstatelabels characters TRUE FALSE - character-number character-name/ state-name .. 1 1
charlabels characters TRUE FALSE - character-name .. 1 1
statelabels characters TRUE FALSE character-number state-name .. 1 1
matrix "characters, unaligned, distances" TRUE FALSE - data-matrix 3 3
unaligned unaligned FALSE FALSE begin - 1 1
distances distances FALSE FALSE begin - 1 1
triangle distances TRUE FALSE format {lower | upper | both} 1 1
diagonal distances TRUE FALSE format 1 1 mutually exclusive with nodiagonal
nodiagonal distances TRUE FALSE format 1 1 mutually exclusive with diagonal
data data FALSE FALSE begin 1 1 Equivalent to characters block where the newtaxa subcommand is included in the dimensions command. Deprecated
codons codons FALSE FALSE begin - 1 0
codonposset codons TRUE TRUE "codonposset [*] name [({standard | vector})] = N: character-set, 1: character-set, 2: character-set, 3: character-set;" 1 0
geneticcode codons TRUE TRUE geneticcode code-name = genetic-code-description 1 0 "predefined code-names: universal, universal.ext, mtdna.dros, mtdna.dros.ext, mtdna.mam, mtdna.mam.ext, mtdna.yeast"
codeorder codons TRUE FALSE geneticcode 132 or other 1 0
nucorder codons TRUE FALSE geneticcode TCAG or other 1 0
tokens codons TRUE FALSE geneticcode 1 0 mutually exclusive with notokens
notokens codons TRUE FALSE geneticcode 1 0 mutually exclusive with tokens
extensions codons TRUE FALSE geneticcode symbol .. 1 0
codeset codons TRUE TRUE codeset [*] code-set-name [({characters | unaligned | taxa})] = code-name: character-set or taxon-set .. or all 1 0
sets sets FALSE FALSE begin 1 1
charset sets TRUE TRUE - charset charset-name [( {standard | vector})] = character-set 1 0
stateset sets TRUE TRUE - stateset stateset-name [({standard | vector})] = state-set 1 0
changeset sets TRUE TRUE - changeset changeset-name = state-set <-> state-set .. 1 0
taxset sets TRUE TRUE - taxset taxset-name [({standard | vector})] = taxon-set 1 1
treeset sets TRUE TRUE - "treeset treeset-name [({standard | vector})] = tree-set" 1 0
charpartition sets TRUE TRUE - charpartition partition-name [({standard | vector}) {tokens | notokens}] = subset-name : character-set 1 0
tokens sets TRUE FALSE "charpartition, taxpartition, treepartition" none 1 0 mutually exclusive with notokens
notokens sets TRUE FALSE "charpartition, taxpartition, treepartition" none 1 0 mutually exclusive with tokens
taxpartition sets TRUE TRUE - taxpartition partition-name [({standard | vector}) {tokens | notokens}] = subset-name : taxon-set 1 0
treepartition sets TRUE TRUE - treepartition partition-name [({standard | vector}) {tokens | notokens}] = subset-name : tree-set 1 0
assumptions assumptions FALSE FALSE begin 1 1
options assumptions TRUE FALSE - 1 0
deftype assumptions TRUE FALSE options deftype = type-name 1 0
polytcount assumptions TRUE FALSE options {minsteps | maxsteps} 1 0
gapmode assumptions TRUE FALSE options {missing | newstate} 1 0
usertype assumptions TRUE TRUE - usertype type-name [({stepmatrix | cstree})] = usertype-description 1 0
typeset assumptions TRUE TRUE - "typeset [*] typeset-name [({standard | vector})] = type-set-definition" 1 0
wtset assumptions TRUE TRUE - wtset [*] wtset-name [({standard | vector})] = wtset-definition 1 1
exset assumptions TRUE TRUE - exset [*] exset-name [({standard | vector})] = exset-definition 1 0
ancstates assumptions TRUE TRUE - ancstates [*] ancstates-name [({standard | vector})] = ancstates-definition 1 0
trees trees FALSE FALSE begin 1 1
translate trees TRUE FALSE - 1 1
tree trees TRUE TRUE tree [*] tree-name = tree-specification 1 1
[&R] trees FALSE FALSE tree 1 1
[&U] trees FALSE FALSE tree 1 1
notes notes FALSE FALSE begin text [ taxon = taxon-set] [character = character-set] [state = state-set][ tree = tree-set] source = {inline | file | resource} text = text-or-source descriptor 1 0
text notes TRUE FALSE picture [ taxon = taxon-set] [character = character-set] [state = state-set][ tree = tree-set] [format = {PICT | TIFF | EPS | JPEG | GIF}] source = {inline | file | resource} picture = picture-or-source descriptor 1 0
picture notes TRUE FALSE 1 0
taxon notes FALSE FALSE "text, picture" 1 0
character notes FALSE FALSE "text, picture" 1 0
state notes FALSE FALSE "text, picture" 1 0
tree notes FALSE FALSE "text, picture" 1 0
source notes FALSE FALSE "text, picture" 1 0
text notes FALSE FALSE text 1 0
encode notes FALSE FALSE picture 1 0
picture notes FALSE FALSE picture 1 0
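Many rows in the sets block (charset, taxset, and so on) take a set of item numbers, like the "1-5,7" in the taxset example earlier on this page. As a hedged illustration, the Python sketch below expands such a specification into sorted 1-based indices. The helper name expand_set is hypothetical; it accepts plain numbers, ranges, and ranges with a backslash stride (as in "1-10\3", every third item), treats both whitespace and commas as separators to match the example, and does not handle "." (last item), named sets, or "all".

```python
import re

# Hypothetical pattern for one set item: "3", "1-5" or "1-10\3" (range with stride).
ITEM = re.compile(r"^(\d+)(?:-(\d+)(?:\\(\d+))?)?$")


def expand_set(spec: str) -> list[int]:
    """Expand a numeric set spec like '1-5,7' or '1-10\\3' into sorted 1-based indices.

    Assumptions: items are separated by whitespace or commas; only plain
    numbers, ranges and ranges with a '\\step' stride are handled (no '.',
    no named sets, no 'all').
    """
    members: set[int] = set()
    for item in re.split(r"[\s,]+", spec.strip()):
        if not item:
            continue
        match = ITEM.match(item)
        if match is None:
            raise ValueError(f"unsupported set item: {item!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)   # a single number is a one-item range
        step = int(match.group(3) or 1)      # default stride is 1
        members.update(range(start, end + 1, step))
    return sorted(members)


print(expand_set("1-5,7"))    # [1, 2, 3, 4, 5, 7]
print(expand_set("1-10\\3"))  # [1, 4, 7, 10]
```

Returning a sorted list of unique indices keeps overlapping items (such as "1-5 3") from producing duplicates.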


References

David R. Maddison, David L. Swofford, and Wayne P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Systematic Biology 46: 590-621.