Estimate Divergence Documentation

Introduction

This Phyloinformatics Hackathon page discusses the work that participants contributed to enable the calculation of divergence. The term has at least two significant, related meanings in phyloinformatics. First, it refers to the degree of relatedness of a pair of sequences, or to some generalized expression of relatedness derived from many pairs. Second, it refers to branching, or divergence, on a species tree. It is a fundamental principle in phyloinformatics that one can use the former to estimate the latter.
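As a rough illustration of the first meaning, pairwise sequence divergence is often summarized as the proportion of aligned sites that differ (the p-distance), optionally corrected for multiple substitutions with the Jukes-Cantor formula. The sketch below is not part of any hackathon contribution; the function names `p_distance` and `jc69_distance` are made up for this example, and it assumes Python 3 and two pre-aligned nucleotide sequences of equal length.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping sites with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore positions where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(p):
    """Jukes-Cantor (1969) correction of a p-distance for multiple hits."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `p_distance("ACGTACGT", "ACGTACGA")` compares eight sites and finds one difference. Distances like these, computed over many pairs, are the kind of input from which branching on a tree can be estimated.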

Hackathon Contributions

HyPhy

The HyPhy group intends to contribute the McDonald-Kreitman algorithm in HyPhy. In addition, they generalized McDonald-Kreitman so that one could estimate divergence across genomes, or across thousands of alignments.

This section depends on work that is in progress. Please help by contributing.

BioPerl

BioPerl intends to contribute to this use case by creating the Bio::Tools::Run::Phylo::r8s::NPRS and Bio::Tools::Run::Phylo::r8s::PL modules. These modules are not yet done; they will depend on work on Bio::NEXUS, because the input to r8s is a NEXUS file.

This section depends on work that is in progress. Please help by contributing.

BioPython

BioPython contributed to this use case by creating a wrapper for running HyPhy, which can estimate divergence using the McDonald-Kreitman algorithm mentioned above. As of this writing, HyPhy has not yet implemented McDonald-Kreitman, but we include this example because of the general utility of the Biopython wrapper.

This section depends on work that is in progress. Please help by contributing.

Using the HyPhy Module

The HyPhy batch language operates as a user-functional layer above the low-level functions. The default distribution of HyPhy includes a library of template batch files that execute an assortment of conventional phylogenetic analyses, as well as many customized ones.

The Biopython HyPhy module operates by executing a HyPhy batch file, written in the HyPhy batch language.

Installation

There are two prerequisites to using Biopython as a wrapper around HyPhy:

  1. Install Biopython, following the instructions shown here: http://biopython.org/wiki/Download
  2. Install HyPhy, following the instructions shown here: http://www.hyphy.org/downloads/index.html

Writing a Wrapper Script

In this example the script calls the HyPhy batch file that executes FEL (fixed-effects likelihood). This algorithm estimates nonsynonymous and synonymous substitution rates for each codon position in an alignment of protein-coding sequences. As a result, it is a parameter-rich method that requires a large data set (20 sequences is a bare minimum).
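Because FEL needs a reasonably large data set, it can be worth checking the number of sequences before starting an analysis. The following is a minimal sketch, assuming Python 3 and a FASTA-formatted alignment already read into a string; the helper names `count_fasta_records` and `enough_sequences_for_fel` are hypothetical and are not part of the Biopython HyPhy module.

```python
MIN_SEQUENCES = 20  # the "bare minimum" quoted in the text above


def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def enough_sequences_for_fel(text, minimum=MIN_SEQUENCES):
    """Return True if the alignment has at least `minimum` sequences."""
    return count_fasta_records(text) >= minimum
```

A NEXUS or MEGA file would need a different way of counting records; this sketch handles FASTA only.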

An additional input to the script is the name of the alignment file itself, p51_short.seq.

<python>
import Hyphy

# a global variable required by the Hyphy module, specifies where
# the Hyphy executable can be found
USER_PATH = "/Users/apoon/Source/HYPHY_Source/data/"

# call Hyphy batch file for fixed-effects likelihood (FEL) analysis
# of sequences for positive selection. Takes as its argument the
# absolute path to a file containing sequences. File can be FASTA,
# NEXUS, or MEGA formatted.
res = Hyphy.fel(USER_PATH + 'p51_short.seq')

# spool contents of return string to screen, should contain a tab-separated
# table where each row corresponds to a codon position in the alignment.
print(res)
</python>
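The string returned by the FEL call is described as a tab-separated table with one row per codon position, so it can be split into rows and fields for further processing. The sketch below assumes only that description; the helper name `parse_table` is hypothetical, and the source does not state whether the table has a header row or what the columns contain, so the fields are kept as plain strings.

```python
def parse_table(text):
    """Split a tab-separated string into a list of rows (lists of fields)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append(line.split("\t"))
    return rows
```

With the wrapper script's result, this could be used as `rows = parse_table(res)`, after which `len(rows)` gives the number of table rows.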