Morphological Characters Documentation

From Phyloinformatics

Introduction

This Phyloinformatics Hackathon page describes how one might infer a phylogeny from morphological data. Many phyloinformaticists work exclusively with phylogenies inferred from molecular sequences, but an equally important and vibrant means of classification uses features, or characters, drawn from developmental and morphological studies. One could argue further that these two seemingly distinct approaches will eventually unify as we come to understand the relationship between genes and their physical manifestation in detail.

Regardless of the kind of data, the result will be a tree, and this critical use case discusses how one might build that tree from observations of characters or phenotypes.

Inferring a Phylogenetic Tree from Morphological Characters

Based on the work of Brian Sidlauskas (http://www.duke.edu/~bls16).

Applications

Steps

1. Create a Character Matrix

Mesquite can be used to assemble a spreadsheet of characters. The matrix is a simple two-dimensional table: each row is a species, each column is a character, and each cell holds the state of that character in that species. For example:


Species | Shape of anterior portion of mesethmoid | Overlap of sphenotic by sixth infraorbital in lateral view | Presence or absence of mesocoracoid | Number of teeth
Erethistes pusillus | 0 | 1 | 0 | 3
Gymnotus coatesi | 1 | 0 | 1 | 4
Gymnotus carapo | 1 | 1 | 0 | 3


Key

Shape of anterior portion of mesethmoid: 0 = hooked, 1 = straight

Overlap of sphenotic by sixth infraorbital in lateral view: 0 = bones overlap, 1 = bones do not overlap

Presence or absence of mesocoracoid: 0 = absent, 1 = present

Number of teeth: 3 = three, 4 = four
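The matrix and key above can be captured directly as data. The minimal Python sketch below stores each species' states as a string of single-character codes in column order (the same row layout a NEXUS matrix uses) and decodes a row back into descriptions through the key. The names `CHARACTERS`, `STATE_KEY`, `MATRIX`, and `describe` are hypothetical, chosen only for illustration.

```python
# Characters in column order, as in the example table.
CHARACTERS = [
    "Shape of anterior portion of mesethmoid",
    "Overlap of sphenotic by sixth infraorbital in lateral view",
    "Presence or absence of mesocoracoid",
    "Number of teeth",
]

# State key: one dict per character, mapping state code -> description.
STATE_KEY = [
    {"0": "hooked", "1": "straight"},
    {"0": "bones overlap", "1": "bones do not overlap"},
    {"0": "absent", "1": "present"},
    {"3": "three", "4": "four"},
]

# One row per species; one single-character state code per character.
MATRIX = {
    "Erethistes pusillus": "0103",
    "Gymnotus coatesi": "1014",
    "Gymnotus carapo": "1103",
}


def describe(taxon: str) -> dict[str, str]:
    """Decode a species' row into a character -> state description mapping."""
    row = MATRIX[taxon]
    if len(row) != len(CHARACTERS):
        raise ValueError(f"{taxon}: expected {len(CHARACTERS)} states, got {len(row)}")
    return {CHARACTERS[i]: STATE_KEY[i][state] for i, state in enumerate(row)}


if __name__ == "__main__":
    for name in MATRIX:
        print(name, describe(name))
```

Keeping the key separate from the codes mirrors how the matrix itself only ever stores state symbols.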

2. Create a NEXUS File
  • Given a character matrix, one can create a NEXUS file using Mesquite.
  • Then manually add analysis switches to the NEXUS file created by Mesquite, such as:
    • Heuristic search options
    • Outgroup assignments
    • Branch and bound
    • Number of iterations
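The steps above can be sketched in code. Assuming the analysis program is PAUP* (the document later mentions PAUP results), the following Python writes the example matrix as a NEXUS DATA block and appends a hand-written PAUP block with an outgroup, a heuristic search whose number of random-addition replicates stands in for "number of iterations", and a commented-out branch-and-bound alternative. The function names and all parameter values are hypothetical placeholders; Mesquite's actual export will differ in detail.

```python
MATRIX = {
    "Erethistes pusillus": "0103",
    "Gymnotus coatesi": "1014",
    "Gymnotus carapo": "1103",
}


def nexus_name(name: str) -> str:
    """Unquoted NEXUS tokens cannot contain spaces; underscores stand in for blanks."""
    return name.replace(" ", "_")


def to_nexus(matrix: dict[str, str], outgroup: str, nreps: int = 100) -> str:
    """Render a DATA block plus an illustrative PAUP block (assumed commands)."""
    nchar = len(next(iter(matrix.values())))
    # Collect every state symbol that actually occurs in the matrix.
    symbols = "".join(sorted({s for row in matrix.values() for s in row}))
    width = max(len(nexus_name(t)) for t in matrix) + 2
    rows = "\n".join(
        f"    {nexus_name(t).ljust(width)}{row}" for t, row in matrix.items()
    )
    return f"""#NEXUS

BEGIN DATA;
    DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};
    FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="{symbols}";
    MATRIX
{rows}
    ;
END;

BEGIN PAUP;
    [Analysis switches added by hand; values are placeholders.]
    OUTGROUP {nexus_name(outgroup)};
    SET CRITERION=PARSIMONY;
    HSEARCH ADDSEQ=RANDOM NREPS={nreps};
    [For small matrices, an exact search could replace HSEARCH:]
    [BANDB;]
END;
"""


if __name__ == "__main__":
    print(to_nexus(MATRIX, outgroup="Erethistes pusillus"))
```

Square brackets are comments in NEXUS, so the branch-and-bound line stays inert until it is uncommented.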

Calculating Support Values for a Phylogenetic Tree

Hackathon Contributions

BioPerl and Bio::Phylo

The challenge here is to make matrix objects compatible between BioPerl and Bio::Phylo. BioPerl has an alignment object (that is, a molecular matrix) but currently no matrix object for categorical data. However, objects implementing AlignI could be modified to accept non-molecular data by changing the alphabets used to check for valid characters.

BioPerl

Bio::SimpleAlign objects are composed of Bio::LocatableSeq objects, which themselves follow the usual Bio::Seq convention of holding a protein, DNA, or RNA alphabet. That convention is hard-coded in regular expressions that sanity-check each sequence.

If these regular expressions were made get-set-able (and/or definable in the constructor), Bio::Seq objects could contain more generic "sequences" over arbitrary single-character alphabets, and SimpleAlign objects could then be reused to represent matrices of morphological characters. A SimpleAlign object is already allowed to contain such an unaligned matrix; nothing requires that its contents actually be an alignment. This has not yet been implemented.
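The proposal above is language-neutral, so it can be sketched outside Perl. In the following Python, the sanity-checking regular expression is a constructor argument and a settable property rather than a hard-coded constant, so the same sequence class can hold either nucleotide codes or morphological state codes, and a plain container of equal-length rows serves as an unaligned matrix. `CharacterSequence`, `SimpleMatrix`, and both patterns are hypothetical stand-ins, not the API of Bio::LocatableSeq or Bio::SimpleAlign.

```python
import re


class CharacterSequence:
    """Hypothetical sequence whose validity check is configurable, not hard-coded."""

    # Default mimics the molecular case: nucleotide codes plus gap/missing.
    DNA_PATTERN = r"[ACGTN\-?]*"

    def __init__(self, name: str, seq: str, alphabet_pattern: str = DNA_PATTERN):
        self.name = name
        self._pattern = re.compile(alphabet_pattern, re.IGNORECASE)
        self.seq = seq  # validated by the property setter below

    @property
    def alphabet_pattern(self) -> str:
        return self._pattern.pattern

    @alphabet_pattern.setter
    def alphabet_pattern(self, pattern: str) -> None:
        compiled = re.compile(pattern, re.IGNORECASE)
        # Refuse a new alphabet that the current contents would violate.
        if not compiled.fullmatch(self._seq):
            raise ValueError(f"{self.name}: current sequence violates new alphabet")
        self._pattern = compiled

    @property
    def seq(self) -> str:
        return self._seq

    @seq.setter
    def seq(self, value: str) -> None:
        if not self._pattern.fullmatch(value):
            raise ValueError(f"{self.name}: invalid characters in {value!r}")
        self._seq = value


class SimpleMatrix:
    """Hypothetical container of equal-length rows; no alignment is implied."""

    def __init__(self, rows: list[CharacterSequence]):
        if len({len(r.seq) for r in rows}) > 1:
            raise ValueError("all rows must have the same number of characters")
        self.rows = rows


# Standard morphological states 0-9, plus missing (?) and gap (-).
MORPH_PATTERN = r"[0-9?\-]*"
```

Usage: `SimpleMatrix([CharacterSequence("Erethistes pusillus", "0103", MORPH_PATTERN), CharacterSequence("Gymnotus carapo", "1103", MORPH_PATTERN)])` builds a morphological matrix, while `CharacterSequence("x", "1103")` raises `ValueError` because the default pattern accepts only nucleotide codes.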

This section depends on work that is in progress. Please help by contributing.

Bio::Phylo

The API of the Bio::Phylo matrix object is being adapted to conform to a common interface shared by BioPerl and Bio::Phylo.
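What "conforming to a shared interface" buys can be sketched briefly. In the Python below, a hypothetical `CategoricalMatrix` protocol names a few read-only operations, and two unrelated storage layouts satisfy it, so code written against the protocol (here `same_data`) works with either. None of these names come from BioPerl or Bio::Phylo; the method set is an assumption for illustration only.

```python
from typing import Protocol


class CategoricalMatrix(Protocol):
    """Hypothetical minimal interface a shared matrix object might expose."""

    def taxa(self) -> list[str]: ...
    def nchar(self) -> int: ...
    def state(self, taxon: str, char_index: int) -> str: ...


class RowStringMatrix:
    """One possible backend: each taxon's states stored as a single string."""

    def __init__(self, rows: dict[str, str]):
        self._rows = rows

    def taxa(self) -> list[str]:
        return list(self._rows)

    def nchar(self) -> int:
        return len(next(iter(self._rows.values())))

    def state(self, taxon: str, char_index: int) -> str:
        return self._rows[taxon][char_index]


class CellGridMatrix:
    """Another possible backend: a list of taxa plus a grid of cells."""

    def __init__(self, taxa: list[str], grid: list[list[str]]):
        self._taxa = taxa
        self._grid = grid

    def taxa(self) -> list[str]:
        return list(self._taxa)

    def nchar(self) -> int:
        return len(self._grid[0])

    def state(self, taxon: str, char_index: int) -> str:
        return self._grid[self._taxa.index(taxon)][char_index]


def same_data(a: CategoricalMatrix, b: CategoricalMatrix) -> bool:
    """Compare two matrices using only the shared interface."""
    if a.taxa() != b.taxa() or a.nchar() != b.nchar():
        return False
    return all(
        a.state(t, i) == b.state(t, i) for t in a.taxa() for i in range(a.nchar())
    )
```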

This section depends on work that is in progress. Please help by contributing.

BioRuby

This use case has been addressed by developing a BioRuby NEXUS model and NEXUS parser, as well as a parser for PAUP and PAUP/TNT results.
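To give a sense of the core task of a NEXUS parser, the Python sketch below extracts the rows of a MATRIX command into a taxon-to-states mapping. It is not the BioRuby parser. It assumes simple input: unquoted taxon names, one row per line, no spaces inside a row's states, and non-nested bracket comments. `parse_nexus_matrix` is a hypothetical name.

```python
import re


def parse_nexus_matrix(text: str) -> dict[str, str]:
    """Return taxon -> state string from the first MATRIX command in a NEXUS text.

    Simplifying assumptions for this sketch: bracket comments are not nested,
    each row is 'name states' on one line, and the matrix ends at the first ';'.
    """
    text = re.sub(r"\[[^\]]*\]", "", text)  # strip NEXUS [comments]
    match = re.search(r"\bMATRIX\b(.*?);", text, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    rows: dict[str, str] = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines inside the matrix
        if len(parts) != 2:
            raise ValueError(f"unsupported row format: {line!r}")
        name, states = parts
        rows[name] = states
    return rows
```

A full parser would also need to handle quoted names, interleaved matrices, and polymorphic states such as `{01}`, which this sketch deliberately ignores.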