Morphological Characters Documentation
Introduction
This Phyloinformatic Hackathon page describes how one might create phylogenetic data from morphological data. Many phyloinformaticists are concerned exclusively with phylogenetic analysis of sequence data, but an equally important and vibrant approach to classification uses features, or characters, from developmental and morphological studies. One could argue further that these two seemingly distinct approaches will eventually unify as we begin to understand the details of the relationship between gene and physical manifestation.
Regardless of one's data, the result will be a tree, and this critical use case discusses how one might create that tree from observations of characters or phenotypes.
Inferring a Phylogenetic Tree from Morphological Characters
Based on the work of Brian Sidlauskas (http://www.duke.edu/~bls16).
Applications
Steps
1. Create a Character Matrix
Mesquite can be used to assemble a spreadsheet of characters. The result is a simple two-dimensional matrix in which each row is a species, each column is a character, and each cell holds the state of that character in that species. For example:
Species | Shape of anterior portion of mesethmoid | Overlap of sphenotic by sixth infraorbital in lateral view | Presence or absence of mesocoracoid | Number of teeth |
---|---|---|---|---|
Erethistes pusillus | 0 | 1 | 0 | 3 |
Gymnotus coatesi | 1 | 0 | 1 | 4 |
Gymnotus carapo | 1 | 1 | 0 | 3 |
Key
Shape of anterior portion of mesethmoid: 0 = hooked, 1 = straight
Overlap of sphenotic by sixth infraorbital in lateral view: 0 = bones overlap, 1 = bones do not overlap
Presence or absence of mesocoracoid: 0 = absent, 1 = present
Number of teeth: 3 = three, 4 = four
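The matrix and its key can also be held in a small data structure, which makes it easy to check that every cell uses a state defined in the key. The following is a minimal Python sketch, not Mesquite's own representation; the names `CHARACTERS`, `STATE_LABELS`, `MATRIX`, and `describe` are hypothetical, and the data are the three example rows above.

```python
# Hypothetical names; data copied from the example table and key above.
CHARACTERS = [
    "Shape of anterior portion of mesethmoid",
    "Overlap of sphenotic by sixth infraorbital in lateral view",
    "Presence or absence of mesocoracoid",
    "Number of teeth",
]

# State labels from the key: character -> state symbol -> meaning.
STATE_LABELS = {
    CHARACTERS[0]: {"0": "hooked", "1": "straight"},
    CHARACTERS[1]: {"0": "bones overlap", "1": "bones do not overlap"},
    CHARACTERS[2]: {"0": "absent", "1": "present"},
    CHARACTERS[3]: {"3": "three", "4": "four"},
}

# One row per species, one state symbol per character, in CHARACTERS order.
MATRIX = {
    "Erethistes pusillus": ["0", "1", "0", "3"],
    "Gymnotus coatesi": ["1", "0", "1", "4"],
    "Gymnotus carapo": ["1", "1", "0", "3"],
}


def describe(species):
    """Return {character: state label} for one species.

    Raises ValueError if the row has the wrong length or uses a state
    symbol that the key does not define.
    """
    row = MATRIX[species]
    if len(row) != len(CHARACTERS):
        raise ValueError(f"{species}: expected {len(CHARACTERS)} states, got {len(row)}")
    described = {}
    for character, state in zip(CHARACTERS, row):
        labels = STATE_LABELS[character]
        if state not in labels:
            raise ValueError(f"{species}: unknown state {state!r} for {character!r}")
        described[character] = labels[state]
    return described


if __name__ == "__main__":
    for character, label in describe("Gymnotus coatesi").items():
        print(f"{character}: {label}")
```

Keeping the state symbols as strings rather than integers is a deliberate choice in this sketch: morphological matrices commonly include symbols such as `?` for missing data, which would not fit an integer type.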
2. Create a NEXUS File
- Given a character matrix, one can create a NEXUS file using Mesquite.
- Manually add analysis switches to the NEXUS file created by Mesquite, such as:
  - Heuristic search options
  - Outgroup assignments
  - Branch and bound
  - Number of iterations
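To make the shape of such a file concrete, here is a minimal Python sketch that writes a NEXUS DATA block from a character matrix and appends an analysis block. It is not what Mesquite emits, and it assumes the analysis program is PAUP*; the function name `to_nexus`, the choice of outgroup, and the specific commands passed in are illustrative assumptions that should be checked against the documentation of the program actually used.

```python
def to_nexus(matrix, analysis_commands=()):
    """Render {species: [state symbols]} as NEXUS text.

    `analysis_commands` is a sequence of command strings (without the
    trailing semicolon) placed in a PAUP block. The block name and the
    commands are assumptions for illustration.
    """
    rows = {name.replace(" ", "_"): "".join(states) for name, states in matrix.items()}
    lengths = {len(states) for states in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of characters")
    nchar = lengths.pop()

    # Collect every state symbol in use, excluding missing/gap markers.
    symbols = sorted({s for states in rows.values() for s in states} - {"?", "-"})
    symbol_list = " ".join(symbols)
    width = max(len(name) for name in rows)

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(rows)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD SYMBOLS="{symbol_list}" MISSING=? GAP=-;',
        "  MATRIX",
    ]
    lines += [f"  {name.ljust(width)}  {states}" for name, states in rows.items()]
    lines += ["  ;", "END;"]

    if analysis_commands:
        lines.append("BEGIN PAUP;")
        lines += [f"  {command};" for command in analysis_commands]
        lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "Erethistes pusillus": ["0", "1", "0", "3"],
        "Gymnotus coatesi": ["1", "0", "1", "4"],
        "Gymnotus carapo": ["1", "1", "0", "3"],
    }
    # The outgroup and search settings below are arbitrary placeholders.
    print(to_nexus(example, [
        "OUTGROUP Erethistes_pusillus",
        "HSEARCH ADDSEQ=RANDOM NREPS=100",
    ]))
```

Spaces in species names are replaced with underscores because an unquoted NEXUS taxon name cannot contain whitespace.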
Calculating Support Values for a Phylogenetic Tree
Hackathon Contributions
BioPerl and Bio::Phylo
The challenge here is to make matrix objects compatible between BioPerl and Bio::Phylo. BioPerl has an alignment object (i.e. a molecular matrix), but it currently has no matrix object for categorical data. However, objects implementing AlignI could be modified to allow non-molecular data by changing the alphabets used to check for valid characters.
BioPerl
The idea is that Bio::SimpleAlign objects are composed of Bio::LocatableSeq objects, which themselves follow the usual Bio::Seq concept of containing protein, DNA, or RNA alphabets. That concept is hard-coded in the regular expressions that sanity-check each sequence.
If these regular expressions were made gettable and settable (and/or definable in the constructor), Bio::Seq objects could contain more generic "sequences" over arbitrary single-character alphabets. SimpleAlign objects could then be reused to represent matrices of morphological characters. Note that a SimpleAlign object is already allowed to contain such an unordered matrix; the object is not required to hold an alignment. This is not yet implemented.
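The proposal of replacing a hard-coded validation pattern with one that can be supplied by the caller can be illustrated outside Perl. The Python sketch below is not BioPerl code and does not reproduce BioPerl's actual regular expressions; the class `Seq` and the patterns `DNA_PATTERN` and `MORPH_PATTERN` are hypothetical stand-ins for the idea only.

```python
import re

# Illustrative default; NOT the pattern BioPerl actually uses.
DNA_PATTERN = re.compile(r"[ACGTUN?\-]*", re.IGNORECASE)

# A pattern for single-character morphological states (digits, missing, gap).
MORPH_PATTERN = re.compile(r"[0-9?\-]*")


class Seq:
    """Toy sequence object whose validation pattern is configurable."""

    def __init__(self, seq_id, symbols, pattern=DNA_PATTERN):
        self.seq_id = seq_id
        self.pattern = pattern    # settable later, like a get/set accessor
        self.symbols = symbols    # routed through the validating setter

    @property
    def symbols(self):
        return self._symbols

    @symbols.setter
    def symbols(self, value):
        # fullmatch ensures every symbol belongs to the current alphabet.
        if not self.pattern.fullmatch(value):
            raise ValueError(f"{self.seq_id}: {value!r} contains invalid symbols")
        self._symbols = value


if __name__ == "__main__":
    row = Seq("Gymnotus coatesi", "1014", pattern=MORPH_PATTERN)
    print(row.seq_id, row.symbols)
    try:
        Seq("Gymnotus coatesi", "1014")  # default molecular alphabet rejects digits
    except ValueError as error:
        print("rejected:", error)
```

With the pattern passed in at construction time, the same object type can carry either molecular sequences or rows of morphological states, which is the reuse described above.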
This section depends on work that is in progress. Please help by contributing.
Bio::Phylo
The Bio::Phylo matrix object's API is being adapted to conform to the interface between BioPerl and Bio::Phylo.
This section depends on work that is in progress. Please help by contributing.
BioRuby
This use case has been addressed by developing a BioRuby NEXUS model and a NEXUS parser, as well as parsers for PAUP results and PAUP/TNT results.
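For readers unfamiliar with what a NEXUS parser has to do, the following Python sketch extracts the character matrix from a simple NEXUS file. It is not the BioRuby parser and does not reflect its API; the function name `parse_nexus_matrix` is hypothetical, and the sketch assumes a non-interleaved matrix with unquoted taxon names and no comments.

```python
import re


def parse_nexus_matrix(text):
    """Return {taxon: [state symbols]} from the first MATRIX command.

    Assumes one taxon per line, unquoted names, and single-character
    states; real NEXUS files allow much more than this.
    """
    match = re.search(r"\bMATRIX\b(.*?);", text, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    matrix = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines inside the command
        taxon, *states = parts
        if not states:
            raise ValueError(f"no states given for taxon {taxon!r}")
        # States may be written with or without spaces between them.
        matrix[taxon.replace("_", " ")] = list("".join(states))
    return matrix


EXAMPLE = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=4;
  FORMAT DATATYPE=STANDARD SYMBOLS="0 1 3 4" MISSING=? GAP=-;
  MATRIX
  Erethistes_pusillus  0103
  Gymnotus_coatesi     1014
  Gymnotus_carapo      1103
  ;
END;
"""

if __name__ == "__main__":
    for taxon, states in parse_nexus_matrix(EXAMPLE).items():
        print(taxon, states)
```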