Phyloinformatics Summer of Code 2008

From Phyloinformatics
Revision as of 18:38, 3 March 2008 by Hilmar (talk)

We are applying to the Google Summer of Code (GSoC) program again this year. If we are accepted, it will be our second time in the program (see Phyloinformatics Summer of Code 2007 for information on participation last year).

On this page we are collecting ideas, possible projects, prerequisites, possible solution approaches, mentors, other people or channels to contact for more information or to bounce ideas off of, etc.



We are participating in the Google Summer of Code as a mentoring organization to introduce future researchers in comparative biology to open-source and open, collaborative development of reusable, interoperable, and standards-supporting software that enables new and increasingly complex scientific questions to be addressed. We are particularly interested in training future researchers not only to be aware of and understand the value of open-source and collaboratively developed software, but also to gain the programming and remote collaboration skills needed to successfully contribute to such projects.

We believe that the area of phyloinformatics is particularly well suited as a focus for our participation and to give our projects a coherent bigger picture:

  1. The code that students will write will make it possible to ask new and increasingly complex questions in comparative biology, one of the central disciplines in understanding the evolution of life. We have already collected use-cases from the research community, and our mentors are active in, and in some cases lead, efforts to address cyberinfrastructure challenges in phyloinformatics. Work on projects along the lines of our suggested project ideas is bound to make an impact.
  2. The range of problems that students can make meaningful contributions to is diverse, enabling us to accommodate different areas of interest and skills. The GSoC project ideas we have generated cover a variety of programming languages, tasks, and skill levels, yet are all directed towards the same overall goals.
  3. We view our GSoC participation as an opportunity to gain future contributors to reusable open-source software components in phyloinformatics. Once accepted, we will advertise this program through appropriate channels to reach undergrad and grad students interested in computational comparative biology.


Note: if there is more than one mentor for a project, the primary mentor is in bold font. Biographical and other information on the mentors is linked to in the Mentors section.


What should prospective students know?

Note that the final acceptance of students has been made by Google. The information below is kept mostly for historical reasons ...

  • Student application is now open and runs through Tuesday, March 27, 2007, 9am PDT (12pm EDT). Google has instructions for how students apply on-line, and there is an FAQ answer on what applications should look like. You may apply for more than one project; simply submit multiple applications.
  • When applying, (aside from the information requested by Google) provide
    1. your interests, what makes you excited
    2. why you are interested in the project, and what you expect to gain from it
    3. a summary of your programming skills
    4. programs or projects you have previously authored or contributed to, in particular those available as open-source
    5. a project plan (i.e., what you expect to be doing, and when, over the course of the project) for any project proposal different or modified from the ideas above; even if you propose to work directly on one of the ideas above, presenting a project plan will strengthen your application, as it will show that you have thought about how to go about the work.
  • Please send any questions you have, or projects you would like to propose, to This will reach all mentors and our PhyloSoC administrators. We recommend you do this even if you want to work directly on one of our project ideas above; it gives you an opportunity to get feedback on what our expectations might be and to ask for more specifics. You can also bounce your project plan off of us.
  • You can visit our application document with Google's questions and our answers. The part that isn't on this page or linked from here already is mostly in the last couple of questions on mentors and students.
  • For eligibility, see the GSoC eligibility requirements for students. These requirements and the age restriction of 18 years or older must be met on April 9, 2007. High school students meeting the age requirement are also eligible.
  • There is also a Google group for posting GSoC questions (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (rather than to our organization).
  • Students will receive a stipend from Google.

Reference Facts & Links

Projects involved

Bio* projects 
The umbrella organization for the Bio* projects is the Open Bioinformatics Foundation (O|B|F). O|B|F is governed by a Board of Directors, has organized the Bioinformatics Open Source Conference (BOSC) annually since 2001, and provides hardware and system administration for the member projects.
The individual member projects are BioPerl, Biojava, Biopython, Bioruby, and BioSQL. Except for the latter, which provides a generic schema for certain life science data types, each of these projects represents the largest and most widely used toolkit in its respective language in the life sciences.
Perl Bio:: projects 
Bio::NEXUS is the only Level-III compliant parser of the NEXUS file format for phylogenetic data in the Perl programming language.
Bio::CDAT is a container architecture for Character Data And Trees. CDAT is in the early stages of its development. The envisioned end result is an architecture that keeps track of Bio::* data objects in ways that are applicable to the task at hand. For example, a CDAT subclass could manage the relationships between a tree and multiple data columns for input in comparative analyses.
Bio::Phylo is an API for phylogenetic data used by the CIPRES project. Bio::Phylo is intended to be compatible with BioPerl and CDAT, while functioning as a petri dish for new object designs.

Google Summer of Code