Phyloinformatics Summer of Code 2012/Summaries

Student software developers showcase their work

For the sixth summer in a row, NESCent offered a number of internships aimed at introducing students to open-source software development. This summer, seven interns from the 2012 Google Summer of Code™ program and one independent intern each worked remotely on an evoinformatics project of their own choosing, under the guidance of an experienced mentor. NESCent’s 2012 Summer of Code students included Abu Zaher Md. Faridee, Daniel Gates, Pulkit Goyal, Elliott Hauser, Islam Ismailov, Michael Landis, Anne Ménard and Justs Zarins. Their projects ranged from machine learning for ecological genomics to optimizing R code for Approximate Bayesian Computation to tools for metadata extraction for phylogenetics.


Student: Justs Zarins

Mentor(s): Andrew Rambaut, Karen Cranston, Benjamin Redelings

Project: MASTodon: Summary and visualization of phylogenetic tree sets

MASTodon is a Java application that looks for common subtrees in large sets of phylogenetic trees. It provides a user-friendly graphical interface, automatic pruning algorithms (with the possibility of implementing more), and powerful manual pruning options.



Student: Pulkit Goyal

Mentor(s): Matt Yoder, Christopher Baron

Project: jMatrixBrowse: GMaps-like Matrix Browsing

jMatrixBrowse is a jQuery plugin for browsing large matrices by dragging them around (like Google Maps). It renders a browsable canvas that can be panned and zoomed easily. The data are retrieved from an API that describes the matrix to be rendered and supplies the values in its cells. An online demo is available here.



Student: Michael Landis

Mentor(s): Trevor Bedford, Andrew Rambaut

Project: Phylowood.js: Browser-based Interactive Animations of Ancestral Dispersal and Diversity Patterns

Phylowood is a browser-based JavaScript web utility that generates animations of when and where species or populations were distributed across Earth. Because of the high dimensionality of these data, the animations are interactive so that the user may focus on the temporal, phylogenetic, and geographical contexts of interest. To learn more, see the Phylowood tutorial.



Student: Elliott Hauser

Mentor(s): Hilmar Lapp

Project: NeXML to MIAPA Mapping & ISAtab Transformation

Miapa-etl is a command-line tool for converting TreeBASE NeXML files into ISAtab format, compatible with the open-source ISAtools data curation suite. This project was a first step towards interoperability between the phylogenetics community and the larger life sciences community that uses ISAtools to store, curate, and distribute scientific data and metadata. It was also one of the first attempts at mapping computational 'in silico' experiments, such as those recorded in NeXML, onto a wet lab 'in vivo' data model like the one in ISAtab. Full documentation can be found in the project Google folder.



Student: Anne Ménard

Mentor(s): Yann Ponty, Jim Procter

Project: Towards a fully RNA-aware alignment editor

JalView is multiple sequence alignment software, initially focused on amino acid sequences. The project consisted of extending JalView’s support for RNA structural features, with a special emphasis on tertiary interactions, which can now be loaded from specific XML files (RNAML format), directly annotated from 3D experimental models (PDB files), and displayed graphically using an integrated version of the Varna software. This joint view of sequence evolution, 3D architecture and tertiary module should allow for integrated studies of molecular function.



Student: Islam Ismailov

Mentor(s): James Degnan, Tanja Stadler

Project: Ranked gene tree topologies probability computation

Ranked Gene Trees is an implementation of a polynomial-time algorithm for computing probabilities of ranked gene tree topologies given species trees. The idea is to consider ranked gene tree topologies, in which we distinguish the relative order of the times of nodes on gene trees, but not the real-valued branch lengths. Ranked gene tree probabilities could be used to infer species trees, although inferring species trees is beyond the scope of the project.
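
To make the notion of a ranked topology concrete, here is a minimal Python sketch. It is not taken from the project's code: the tuple-based tree representation and the function names are assumptions made for illustration, and node ages are assumed to be free of ties. It only shows how two gene trees with different branch lengths can share a ranked topology, while a third tree with the same unranked topology does not. It does not compute any probabilities.

```python
# Illustrative only: the tree encoding and all names here are hypothetical.
# A tree is either a leaf name (str) or a tuple (left, right, age), where
# age is the time of the internal node before the present (larger = older).

def internal_nodes(tree):
    """Return a list of (age, clade) pairs, one per internal node."""
    nodes = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        left, right, age = node
        clade = visit(left) | visit(right)
        nodes.append((age, clade))
        return clade

    visit(tree)
    return nodes


def unranked_topology(tree):
    """The set of clades, ignoring all timing information."""
    return frozenset(clade for _, clade in internal_nodes(tree))


def ranked_topology(tree):
    """Clades ordered from the most recent node to the root.

    Only the relative order of node ages is kept; the real-valued
    ages themselves are discarded. Ties in age are assumed absent.
    """
    ordered = sorted(internal_nodes(tree), key=lambda pair: pair[0])
    return tuple(clade for _, clade in ordered)


# ((A,B),(C,D)) with A and B coalescing before C and D.
tree1 = (("A", "B", 1.0), ("C", "D", 2.0), 3.0)
# Same order of events, different branch lengths.
tree2 = (("A", "B", 0.2), ("C", "D", 2.9), 5.0)
# Same unranked topology, but C and D now coalesce first.
tree3 = (("A", "B", 2.0), ("C", "D", 1.0), 3.0)

same_rank = ranked_topology(tree1) == ranked_topology(tree2)
different_rank = ranked_topology(tree1) != ranked_topology(tree3)
same_shape = unranked_topology(tree1) == unranked_topology(tree3)
```

Here `tree1` and `tree2` have the same ranked topology, whereas `tree3` differs from them only in the order of its two most recent nodes.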



Student: Abu Zaher Md. Faridee

Mentor(s): Kathryn Iverson, Sarah Westcott

Project: Apply Machine Learning Algorithm(s) to Ecology Data

Random Forest for Mothur is our first attempt to analyse microbial ecology data with machine learning algorithms so that microbial ecologists can identify bacterial populations that are associated with differences between health and disease. So far we have implemented the random forest algorithm, and we are now working on improving its performance.



Student: Daniel Gates

Mentor(s): Derrick Zwickl, Barb Banbury, Hilmar Lapp, Brian O'Meara

Project: Optimizing R code for ABC

TreEvo is an R-based program that uses approximate Bayesian computation (ABC) to make inferences about the evolution of continuous character traits. Bayesian simulations in R can be very computationally intensive, so our project found and fixed inefficiencies in the original TreEvo code and added parallelization and checkpointing. Together, these improvements should make the program less computationally demanding, so that users have to sacrifice less statistical power to complete simulations in a reasonable amount of time.
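
For readers unfamiliar with ABC, the following is a minimal sketch of the simplest variant, rejection sampling, written in Python rather than R. It is not TreEvo's code or its algorithm: the toy model (independent trait changes drawn with variance `sigma2`), the uniform prior, the summary statistic, the tolerance, and every function and parameter name are assumptions chosen for illustration. Parallelization and checkpointing are not shown.

```python
import random


def simulate_changes(sigma2, n_branches, rng):
    """Toy model (an assumption): one trait change per unit-length branch,
    drawn from a normal distribution with mean 0 and variance sigma2."""
    sd = sigma2 ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n_branches)]


def summary(changes):
    """Summary statistic: mean squared change (the model's mean is 0)."""
    return sum(x * x for x in changes) / len(changes)


def abc_rejection(observed_summary, n_branches, n_draws, epsilon,
                  prior_max, seed):
    """Keep prior draws whose simulated summary is within epsilon
    of the observed summary."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate rate from a Uniform(0, prior_max) prior.
        sigma2 = rng.uniform(0.0, prior_max)
        # 2. Simulate data under that candidate.
        simulated = simulate_changes(sigma2, n_branches, rng)
        # 3. Accept the candidate if the simulation resembles the data.
        if abs(summary(simulated) - observed_summary) < epsilon:
            accepted.append(sigma2)
    return accepted


# Hypothetical settings: the observed summary is fixed at 1.0.
posterior = abc_rejection(observed_summary=1.0, n_branches=50,
                          n_draws=2000, epsilon=0.2,
                          prior_max=5.0, seed=1)
posterior_mean = sum(posterior) / len(posterior)
```

Each draw requires a full simulation and most draws are rejected, which is why this kind of analysis is costly. Because the draws are independent of one another, they are also a natural target for parallelization.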