Phyloinformatics Summer of Code 2007
We are applying to the Google Summer of Code (GSoC) program for the first time this year. On this page we are collecting ideas, possible projects, prerequisites, possible solution approaches, mentors, and other people or channels to contact for more information or to bounce ideas off of.
- 21:20, 2 March 2007 (EST) Created page, added a couple of links, outline, and started filling in some bits. Hlapp
We believe that the goals, targets, and prior work of this Phyloinformatics working group make it particularly well suited as a mentoring organization for the GSoC program, for three main reasons.
- The code that students would write will make a significant impact on evolutionary and comparative biology research, scientific disciplines that are driving forward the understanding of life in the postgenomic era. Part of the prior work of this group has been to collect use-cases from the evolutionary biology community at large and flesh them out with the group's own expertise, identifying the most common and pervasive problems caused by the lack of phyloinformatics cyberinfrastructure. Work to fill in the gaps is bound to make an impact. NESCent is also committed to disseminating the contributions and their value for the research community through summer courses and conference tutorials.
- The range of problems on which students can work productively and make meaningful contributions is diverse, and can accommodate different areas of interest and levels of prior skill. The participating toolkits cover a variety of programming languages and tasks, yet are directed towards the same overall goal. A diverse group of mentors is on hand and can quickly be expanded to the entire developer communities of the participating toolkits, which in the past have been very supportive of novice programmers.
- We view this program as an opportunity to attract and train future researchers in phylogenetics or related disciplines who will not only become better aware of phyloinformatics infrastructure gaps, but will also be able to fill them in themselves, in a manner that is increasingly standards-compliant and reusable by others. In particular, once accepted we will extensively advertise this program through channels commonly read by undergraduate and graduate students in biology, bioinformatics, and computational biology. When reviewing students who apply, we will weigh not only relevant programming knowledge, but also genuine interest in comparative biology research. Some of the mentors have been selected because they can relate particularly well to students who are novices in research programming.
Below is a template for how the student project ideas could be presented. Feel free to copy, paste, and edit it, and feel free to adjust the format.
Write a NEXUS parser in C&
- C& is a revolutionary programming language that has not been invented yet but in a few years will dominate the programming world. The best way to prevent broken non-compliant NEXUS parsers written in C& from appearing is to write a good one now.
- Re-implementations of NEXUS parsers inevitably tend to be broken or non-compliant. Hence, the best approach is to write a translator that translates a reference implementation to C&.
- C& has not been invented yet, so a lot of assumptions will have to be made.
- Involved toolkits or projects
- The BioC& toolkit has much of the needed framework.
- Mike&, founder of BioC&
What should prospective students know?
Reference Facts & Links
- Bio* projects
- The umbrella organization for the Bio* projects is the Open Bioinformatics Foundation (O|B|F). O|B|F is governed by a Board of Directors, has organized the Bioinformatics Open Source Conference (BOSC) annually since 2001, and provides hardware and system administration for the member projects.
The individual member projects are BioPerl, Biojava, Biopython, Bioruby, and BioSQL. Except for the latter, which provides a generic schema for certain life science data types, each of these projects represents the largest and most widely used toolkit in its respective language in the life sciences.
- Perl Bio:: projects
- Bio::NEXUS is the only Level-III compliant parser of the NEXUS file format for phylogenetic data in the Perl programming language. Bio::CDAT is <please fill in here>. Bio::Phylo is <please fill in here>.
- CIPRES project
- please fill in here; is this participating in the PhySoC program?
- Google expects to accept about 100 mentor organizations.
- Mentoring organizations apply from March 5 to 12, 2007, and students from March 14 to 24. Accepted mentoring organizations will be published March 14. See full set of timelines.
- Development occurs online; there is no requirement to travel.