Phylogenetic Footprinting Documentation

From Phyloinformatics
Revision as of 15:39, 15 December 2006


This critical use case describes how one might discover and characterize conserved sequence features shared among related genomes or genomic sequences. Footprinting and shadowing are similar, but they typically differ in the number of sequences analyzed and in how closely related those sequences are.

Phylogenetic Footprinting

Phylogenetic footprinting seeks to find regulatory sequences or specific features of non-coding DNA by analyzing DNA within and around aligned, orthologous genes. The assumption is that mutation is better tolerated outside of critical sequence regions, so sequences conserved between species are likely to be critical. The species being studied need to be sufficiently diverged that mutation has acted appreciably, yet sufficiently related that the relationships between RNA- or DNA-binding proteins and their corresponding bound sequences have not significantly changed.
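
The conservation assumption can be illustrated with a minimal sketch: given two orthologous sequences that have already been aligned, compute the fraction of identical positions in each sliding window and report the windows above a cutoff. The function names, the window size, and the 0.9 threshold are illustrative assumptions and do not come from any footprinting tool; gap positions are simply counted as mismatches.

```python
def window_identity(seq_a, seq_b, window):
    """Fraction of identical, non-gap positions in each sliding window.

    seq_a and seq_b are two rows of an existing alignment (equal length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A position counts as conserved only if both bases match and it is not a gap.
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(identities, threshold=0.9):
    """Start positions of windows whose identity meets the (assumed) threshold."""
    return [i for i, value in enumerate(identities) if value >= threshold]
```

In practice the threshold would have to be tuned to the divergence of the species pair: too closely related and everything passes, too distant and nothing does.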

Phylogenetic Shadowing

Phylogenetic shadowing differs in that the group of species in question ideally contains both closely related and more divergent species. The idea is that by combining data from various pairwise comparisons one should observe enough mutation that conserved regions can be identified even if there is no exact alignment between the most divergent pairs. Thus part of the result from this type of analysis is a shadow, a conserved region shared by all species that lacks base-pair definition.
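
One simple way to picture combining pairwise comparisons is sketched below: for every alignment column, count the fraction of species pairs that differ at that column. Columns (or runs of columns) with a low fraction across all pairs are candidates for the conserved "shadow". This is only an illustration under the assumption of a gap-free multiple alignment; the function name is hypothetical, and real shadowing methods use a phylogenetic model rather than raw pair counts.

```python
from itertools import combinations


def pairwise_mismatch_profile(sequences):
    """Per-column fraction of sequence pairs that differ.

    sequences is a list of equal-length aligned strings, one per species.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    profile = []
    for i in range(length):
        # Count how many of the species pairs disagree at column i.
        differing = sum(1 for a, b in pairs if a[i] != b[i])
        profile.append(differing / len(pairs))
    return profile
```

Adding more species increases the number of pairs, so even closely related species that each contribute little variation can together reveal which columns are free to change.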



Approach

  1. Identify and extract orthologous sequences through genome synteny or by some other means.
  2. Align orthologous sequences.
  3. Obtain a tree corresponding to the alignment.
  4. Calculate total tree length of the alignment on the tree.
  5. Implement the tree-length calculation in a sliding-window fashion.
  6. Identify regions that significantly deviate from the average tree-length.
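
Steps 4 to 6 can be sketched as follows, assuming the alignment and a rooted binary tree are already available. The source does not say how tree length is measured; this sketch assumes parsimony length (the minimum number of substitutions per column, computed with the Fitch algorithm) and flags windows by a simple z-score. All function names, the nested-tuple tree format, and the cutoff are illustrative assumptions, and gap characters are treated as an ordinary state.

```python
from statistics import mean, pstdev


def fitch_column(tree, column):
    """Return (state_set, changes) for one alignment column.

    tree is a nested 2-tuple of leaf names, e.g. (("human", "chimp"), "mouse").
    column maps each leaf name to its base at this alignment position.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, column)
    right_set, right_cost = fitch_column(right, column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def column_lengths(tree, alignment):
    """Parsimony length of every column; alignment maps leaf name to sequence."""
    n_columns = len(next(iter(alignment.values())))
    return [
        fitch_column(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_columns)
    ]


def window_lengths(per_column, window):
    """Total tree length in each sliding window of the given width."""
    return [
        sum(per_column[i:i + window])
        for i in range(len(per_column) - window + 1)
    ]


def deviating_windows(lengths, z_cutoff=2.0):
    """(start, length) for windows whose length is far from the average."""
    mu = mean(lengths)
    sd = pstdev(lengths)
    if sd == 0:
        return []
    return [
        (i, value) for i, value in enumerate(lengths)
        if abs(value - mu) / sd >= z_cutoff
    ]
```

For footprinting, the windows of interest are usually those with a shorter-than-average tree length, since they have accumulated fewer changes than the surrounding sequence. A z-score is only one possible notion of "significant"; a permutation test over columns would be another.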

These steps can be carried out using ClustalW. Sendu Bala is providing a new method in the module, found in the bioperl-run package of BioPerl, which can do either phylogenetic footprinting or phylogenetic shadowing.

Sendu has also fixed the run() method in, found in the bioperl-run package of BioPerl, so this module can now be used for phylogenetic footprinting.