Phyloinformatics Summer of Code 2010/Summaries
Student software developers showcase their work
For the fourth summer in a row, NESCent offered a number of internships aimed at introducing students to open-source software development. This summer, five interns from the 2010 Google Summer of Code™ program each worked remotely on an evoinformatics project of their own choosing, under the guidance of an experienced mentor.
NESCent’s 2010 Summer of Code students included Filip Balejko from the University of Warsaw, Kathryn Iverson from the University of Michigan, Lauren Lui from UC Santa Cruz, Anurag Priyam from the Indian Institute of Technology Kharagpur, and Conrad Stack from Pennsylvania State University. Their projects ranged from adding phylogenetics analysis steps to a popular data processing workflow system, to adding support for emerging interoperability standards to programming libraries, to creating a programming toolkit that can create Google Earth-compatible geophylogenies. As their profiles below demonstrate, the students put their summers to very good use!
Student: Filip Balejko (University of Warsaw)
Mentor(s): Sergei Kosakovsky Pond, Brad Chapman, Anton Nekrutenko
Project: Galaxy phylogenetics pipeline development
Galaxy is a popular web-based interface for integrating biological tools and analysis pipelines. HyPhy is a popular package for molecular evolution and statistical sequence analysis. This project brings HyPhy workflows into the Galaxy system, standardizing these analyses on a widely used platform. The goal of my project was to design an implementation pattern and integrate two particular methods for inferring positive selection on coding sequences, SLAC and PARRIS. My code produces output customized for SLAC and a simple and intuitive web interface built on AJAX.
Student: Kathryn Iverson (University of Michigan)
Mentor(s): David Kidd, Karen Cranston, Xianhua Liu, Bill Piel
Project: PhyloGeoRef - a library for implementing geophylogenies in KML and Google Earth
My Summer of Code project was to create a library in Java that generates geophylogenies in KML format, which can be displayed by Google Earth. A geophylogeny is a phylogenetic tree that shows both the geographic and the evolutionary distribution of a group of organisms. The PhyloGeoRef library can take a phylogenetic tree and a list of coordinates and display the tree in 3D space in Google Earth.
Student: Lauren Lui (UC Santa Cruz)
Mentor(s): Jim Procter, Albert Vilella
Project: Extending Jalview Capabilities to Support RNA Sequence Alignment Annotation and Secondary Structure Visualization
I added features to Jalview, a popular open-source platform-independent multiple sequence alignment editor and analysis platform, to aid annotation and visualization of RNA secondary structures. Jalview now has the ability to import existing RNA sequences and alignments from the RFAM database, support for RNA secondary structure visualization in alignments, and coloring schemes relevant to RNA alignments.
Student: Anurag Priyam (Indian Institute of Technology Kharagpur)
Mentor(s): Rutger Vos, Jan Aerts
Project: Develop an API for NeXML I/O, and, RDF triples for BioRuby
BioRuby is an open-source code library for bioinformatics based on the Ruby programming language, RDF is a general-purpose standard for semantic annotation of data objects on the web, and NeXML is an XML-based, RDF-aware file format specifically for phylogenetic data. For my project, I added support for working with RDF and NeXML in BioRuby. I created an annotation module, built upon RDF, that provides methods to semantically annotate and query BioRuby objects and a NeXML module that enables BioRuby users to work seamlessly with NeXML files. My code integrates into BioRuby's existing framework for manipulating phylogenetic and comparative data on molecules and morphology.
Student: Conrad Stack (Pennsylvania State University)
Mentor(s): Brian O'Meara, Luke Harmon
Project: Ancestral State Reconstruction in R
RBrownie is a software package for the R Project for Statistical Computing. It is based on the Brownie core library, written in C++, for testing hypotheses about the evolution of morphological and life history characters in the context of a phylogenetic tree. RBrownie merges this functionality with R's user-friendly coding environment. By adding new classes and a data processing function, my code extends R's capacity for comparative phylogenetic analyses.