Phyloinformatics Summer of Code 2013/Summaries


For the seventh summer in a row, NESCent offered a number of internships aimed at introducing students to open-source software development. This summer, six interns from the 2013 Google Summer of Code™ program and one from the GNOME Outreach Program for Women worked remotely on evoinformatics projects of their own choosing, each under the guidance of experienced mentors. NESCent’s 2013 Summer of Code students included Chanda Phelan, Paul Frandsen, Monica-Andreea Dragan, Imran Fanaswala, Joshua Lynch, Zheng Ruan and Yanbo Ye. Their projects ranged from machine learning for ecological genomics to phylogenetics in Biopython to visualization of evolutionary trees.

Students: when you edit this page, there is a template that you can copy and paste for your summary. Take a look at last year's page for examples.



Student: Imran Aslam

Mentor(s): Andrew Rambaut, Marc Suchard, Stephane Guindon

Project: Extend PhyML to use the BEAGLE library

PhyML is a phylogenetics program that infers evolutionary relationships from genome sequences. These relationships are represented as trees; PhyML builds candidate trees and eventually presents the "best" tree to the user. To determine the "best" tree, PhyML needs to evaluate thousands of tree configurations, which is computationally demanding. The goal of this project was to use the BEAGLE library to do the heavy lifting of these computations in PhyML.

Technically speaking, PhyML explores tree space by varying branch lengths, topology, substitution rate parameters, etc. Each step of this exploration requires a maximum likelihood calculation. Under the Markovian assumption of independent events and time-reversibility, two parts of that calculation can be parallelized and are therefore suitable for computation across cores: (1) partial likelihood computations and (2) P-matrix updates (i.e. matrix multiplication). This project implemented these optimizations.
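
The two computations named above can be illustrated with a minimal sketch in Python. It is not PhyML or BEAGLE code: the function names are hypothetical, and the Jukes-Cantor (JC69) model is assumed only because its P-matrix has a simple closed form. Each alignment site is handled independently, which is what makes the per-site work easy to spread across cores.

```python
import math

BASES = "ACGT"

def jc69_p_matrix(t):
    """P-matrix (4x4 transition probabilities) under JC69 for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability that the base is unchanged
    diff = 0.25 - 0.25 * e   # probability of each specific change
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_partial(base):
    """Partial likelihood vector at a leaf: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]

def combine(left, t_left, right, t_right):
    """Partial likelihood vector at a parent node from its two children."""
    p_left = jc69_p_matrix(t_left)
    p_right = jc69_p_matrix(t_right)
    parent = []
    for x in range(4):
        from_left = sum(p_left[x][y] * left[y] for y in range(4))
        from_right = sum(p_right[x][y] * right[y] for y in range(4))
        parent.append(from_left * from_right)
    return parent

def site_likelihood(root_partial):
    """Site likelihood: root partials weighted by equal base frequencies."""
    return sum(0.25 * value for value in root_partial)

# One site on the tree ((A:0.1, A:0.1):0.05, C:0.2)
inner = combine(tip_partial("A"), 0.1, tip_partial("A"), 0.1)
root = combine(inner, 0.05, tip_partial("C"), 0.2)
likelihood = site_likelihood(root)
print(likelihood)
```

A real implementation repeats `combine` for every site and every internal node each time the tree changes, which is the workload a library such as BEAGLE is meant to accelerate.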




Student: Monica Dragan

Mentors: Anurag Priyam and Yannick Wurm

Project: Identifying problems with gene predictions

Genome sequencing is now possible at almost no cost. However, obtaining accurate gene predictions remains hard with existing technology. GeneValidator is a tool that identifies problems with gene predictions based on their similarity to data from public databases. It applies a set of validation tests that report the problems found in the predictions, providing evidence for how a gene should be curated or whether a predicted gene should be excluded from further analysis.

Code | Blog | Documentation



Student: Zheng Ruan

Mentors: Eric Talevich and Peter Cock

Project: Codon Alignment and Analysis in Biopython

A codon alignment is an alignment of nucleotide sequences in which the trinucleotides correspond directly to amino acids in the translated protein product. Codon alignments are widely used to calculate evolutionary parameters and to run a variety of neutrality tests. During this project, I implemented a new Biopython module, CodonAlign, to construct codon alignments and run several analyses on them. The implementation can handle mismatches and frameshift events. Codon alignments can also be written in a variety of MSA formats for further analysis. The code is available at CodonAlign.
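
The core idea of a codon alignment can be sketched in a few lines of plain Python. This is an illustration only, not the CodonAlign API: `thread_codons` is a hypothetical helper, and unlike the module described above it assumes the nucleotide sequence translates exactly to the protein, with no mismatches or frameshifts.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Map an aligned protein sequence back onto its coding nucleotides.

    Every residue consumes one codon (three nucleotides); every gap in the
    protein alignment becomes a three-character gap in the codon alignment.
    """
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == gap:
            codons.append(gap * 3)
        else:
            codon = nucleotides[pos:pos + 3]
            if len(codon) < 3:
                raise ValueError("nucleotide sequence is too short for the protein")
            codons.append(codon)
            pos += 3
    return "".join(codons)

# Protein "MKV" aligned as "M-KV", coded by ATG AAA GTT
aligned = thread_codons("M-KV", "ATGAAAGTT")
print(aligned)  # ATG---AAAGTT
```

Because gaps are inserted in multiples of three, the reading frame is preserved across all sequences in the alignment.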

Code | Blog | Document


Student: Yanbo Ye

Mentor(s): Mark Holder, Jeet Sukumaran, Eric Talevich

Project: Phylogenetics in Biopython: Filling in the gaps

Bio.Phylo is a phylogenetics package in Biopython, an open-source bioinformatics library. This project implemented several tree construction and consensus tree algorithms that were not available in previous versions, including UPGMA, NJ, MP, strict consensus, majority rule and Adam consensus, as well as branch support and bootstrap algorithms. These algorithms are now available in two Bio.Phylo modules, TreeConstruction and Consensus.
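
As an illustration of one of the consensus methods listed above, the following sketch finds the clades that a majority-rule consensus would keep. It does not use Bio.Phylo: trees are assumed to be rooted and written as nested tuples of leaf names, and `clades` and `majority_clades` are hypothetical helper names.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set, list of clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade rooted at this internal node
    return leaves, found

def majority_clades(trees, cutoff=0.5):
    """Clades that appear in more than `cutoff` of the input trees."""
    counts = Counter()
    for tree in trees:
        _, found = clades(tree)
        counts.update(set(found))
    return {clade for clade, n in counts.items() if n / len(trees) > cutoff}

trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
kept = majority_clades(trees)
print(sorted(sorted(clade) for clade in kept))
```

Here the clades {A, B} and {A, B, C} each occur in two of the three trees and so are kept, along with the root clade. Raising `cutoff` so that only clades found in every tree survive gives the strict consensus instead.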

Code | Blog | Document



Student: Paul Frandsen

Mentor(s): Rob Lanfear, Brett Calcott

Project: Extend PartitionFinder to automatically partition DNA and protein alignments

PartitionFinder uses heuristic algorithms to combine similar blocks of data, which are assigned a priori by the user, into a model partitioning scheme. This project implemented a new algorithm that uses site rates or site likelihoods to automatically partition the sites of DNA and protein alignments into a best-fit partitioning scheme.
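
The idea of grouping sites by rate can be sketched as follows. This is not PartitionFinder's code: the summary above does not name the clustering method, so one-dimensional k-means is used here purely as an assumption, and the site rates are made-up illustrative values.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar values into k groups; returns lists of indices."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    # Start with centroids spread evenly between the smallest and largest value.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for index, value in enumerate(values):
            nearest = min(range(k), key=lambda c: abs(value - centroids[c]))
            groups[nearest].append(index)
        updated = [
            sum(values[i] for i in group) / len(group) if group else centroids[c]
            for c, group in enumerate(groups)
        ]
        if updated == centroids:  # assignments have stopped changing
            break
        centroids = updated
    return groups

# Hypothetical per-site rates for a seven-column alignment
site_rates = [0.1, 0.2, 0.15, 2.0, 2.2, 0.12, 1.9]
partitions = kmeans_1d(site_rates, k=2)
print(partitions)  # [[0, 1, 2, 5], [3, 4, 6]]
```

Each returned group is a candidate data block of sites that evolve at similar rates, so no a priori block definitions from the user are needed.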

Code | Project Wiki



Student: Joshua Lynch

Mentors: Kathryn Iverson, Abu Zaher Md. Faridee

Project: Implementing Machine Learning Algorithms for Classification and Feature Selection in mothur

mothur is a bioinformatics tool developed for and by the microbial ecology community. This project implemented a new classification and feature selection command for mothur using the support vector machine algorithm.

Code | Project Wiki



Student: Chanda Phelan

Mentors: Gabriel Harp, Stephen A. Smith

Project: Phylet: Open Tree of Life Graph Visualization and Navigation

Phylet is a visualization tool for phylogenetic graphs, powered by the graph database Neo4j and visualized by the JavaScript library D3. The project aims to create hierarchical graphs (rather than network graphs) that include conflicts between sources, on a platform accessible enough that it can also be used by a lay audience.

Code | Blog