Cyberinfrastructure Summer Traineeships 2009

From Phyloinformatics
Revision as of 00:35, 1 March 2009 by Hilmar
2009 Cyberinfrastructure Summer Traineeships for Data Repository Interoperability

The program


A recently funded NSF grant to a consortium of existing data and metadata repositories for biodiversity, ecological, environmental, and evolutionary data aims to develop the necessary cyberinfrastructure tools and technologies to support the implementation of a Virtual Data Center (VDC) for these fields. The consortium is led by PI Bill Michener at the University of New Mexico, LTER Network Office.

NESCent is one of the collaborators through Dryad, a digital repository for data supporting publications in evolutionary biology. Other collaborating repositories include KNB (developed by NCEAS), NBII, DAAC, and GBIF.

The VDC will be based on a network of existing and new physical data centers ("nodes") that interoperate using open standards and protocols. The network will enable discovery of, as well as open, stable, and secure access to, data in any of its member nodes.

One of the activities supported by this grant is a summer internship program that is inspired by and closely modeled after the Google Summer of Code™ program. This program is, however, neither part of the Google Summer of Code, nor endorsed by Google or any of its affiliates, and some of the rules differ.

Cyberinfrastructure Summer Traineeships

We will support a total of 4 summer traineeships in programming cyberinfrastructure towards interoperability of data repositories in the fields of biodiversity, ecology, earth and environmental science, and evolution. The program is closely modeled after the Google Summer of Code™ and follows largely the same structure, but differs in a few rules concerning eligibility, how you apply, and travel.

Broadly, as in the Google program,

  • to participate you apply with a project proposal and a CV, at the same time (Mar 23-Apr 3, 2009) as students apply to the Google program,
  • your proposal will be scored and ranked by the mentors, and
  • if accepted (by being among the top 4 applications), you will be paired with one of our mentors.
  • You need not (and likely will not) be at the same location or institution as your mentor. Development occurs online.
  • Projects are about contributing to collaboratively developed open-source software, applying or implementing open standards, and creating open-source code. First and foremost, however, they are about learning how to become effective in these things.
  • The length (12 weeks) and timing (May 23 - Aug 17, 2009) of the programming time are in principle the same as for the Google program, as are the stipend ($4,500) and payment schedule.

Your project proposal may be based on one of the project ideas listed below, or on an idea of your own. We will score your proposal based on your qualifications, what you stand to gain professionally from the internship, evidence for how sustained your interest in cyberinfrastructure for interoperability in our fields could be, and most importantly feasibility and thoroughness of your proposal and project plan (see below).

There are a few rules in which this program differs from the Google program, though. Specifically:

  • You must be a US resident (or a US citizen) to be eligible, and be permitted to work in the US. This is required due to NSF being the funding source.
  • Aside from undergraduate and graduate students, current (on April 20, 2009) postdocs are also eligible to apply.
  • You will apply directly to us, not to the Google program.
  • You will receive payment from the University of New Mexico, and the method of payment will most likely be a check.
  • Near (shortly before or after) the beginning of the coding period you will attend a meeting of the mentors and several other technical representatives from the collaborating data repositories (called the Technical Working Group - TWG). The meeting will result in strategic decisions on technologies and infrastructure milestones needed to achieve interoperability, and will provide context for and further inform your project.
  • Near the end or after conclusion of the coding period, you will attend a meeting of consortium members (called the Community Engagement Working Group) who focus on engaging non-member repositories as well as the larger community of scientists in the fields served by the member repositories. You will report on your project so that the working group members can determine the best ways to disseminate your results to the relevant community.

All travel expenses will be reimbursed, and travel beyond those two meetings will not be required.

Phyloinformatics Summer of Code

NESCent does intend to continue participating as a mentoring organization in the official Google Summer of Code™ program. You can find updates about and project ideas for our prospective 2009 Summer of Code participation on the Phyloinformatics Summer of Code 2009 page, and we encourage you to visit that page too to find the project idea or focus area that best suits your interest.


  • 28 Feb 2009: The project ideas page for 2009 (the page you are looking at) is ready for adding project ideas. --Hlapp


Our organization administrators are Hilmar Lapp and Todd J. Vision.

If you are a student interested in applying for a cyberinfrastructure traineeship project, please send any questions you have, projects you would like to propose, etc. to This will reach all mentors, program administrators, and members of the Technical Working Group.

We aim to hang out regularly on IRC at least on weekdays during working hours (EDT) in #vdc on You're welcome to join us at any time, though be prepared that outside of those times we may not be online. Email will typically work well in either case. (If you do not have an IRC client installed, you might find the comparison on Wikipedia, the Google directory, or the IRC Reviews helpful. For Macs, X-Chat Aqua works pretty well. If you have never used IRC, try the IRC Primer at IRC Help, which also has links to lots of other material.)

For applying, please make sure you read our documentation on information that applicants should know and guidelines we expect you to follow before you apply. We don't have a format template for application that you need to adhere to, but we do ask that you include specific kinds of information. What those are is documented under "When you apply."


Note: if there is more than one mentor for a project, the primary mentor is in bold font. Biographical and other information on the mentors is linked to in the Mentors section.

To prospective applicants: The below are only our project ideas, albeit well thought-out ones. You are welcome to propose your own project if none of those below catches your interest, or if your idea is more exciting to you, provided it still falls within our scope (see 'Before you apply' below). Regardless of what you decide to do, make sure you read and follow the guidelines for applicants below.

Write a NEXUS parser in C&

The below is a template for how the student project ideas could be presented. Feel free to copy & paste & edit, and feel free to adjust the format ...

Rationale 
C& is an amp'ed-up programming language that has not been invented yet but in a few years will dominate the programming world. The best way to prevent broken non-compliant NEXUS parsers written in C& from appearing is to write a good one now.
Approach 
Re-implementations of NEXUS parsers inevitably tend to be broken or non-compliant. Hence, the best approach is to write a translator that translates a reference implementation to C&.
Challenges 
C& has not been invented yet, so a lot of assumptions will have to be made.
Involved toolkits or projects 
The BioC& toolkit has much of the needed framework.
Degree of difficulty and needed skills 
Hard. The hardest part is probably inventing C&. Writing the parser itself should be medium, unless C& was ill-designed for writing parsers. Knowledge of the BioC& toolkit will obviously help, as well as knowing the NEXUS format.
Mentors 
Mike&, founder of BioC&


  • Matt Jones
  • Hilmar Lapp (also the organization administrator)
  • Mark Servilla
  • Dave Vieglais

What should prospective applicants know?

Important dates

  • Students apply between March 23 and April 3, 2009.
  • The coding period starts May 23 and ends Aug 17, 2009.


To be eligible, you must be a US resident or a US citizen, and be either a student (whether full or part-time) or a postdoc.

  • We define student as an individual enrolled in or accepted into an accredited institution including (but not necessarily limited to) colleges, universities, masters programs, PhD programs and undergraduate programs. We may require you to supply documentation from your institution (such as transcripts) as proof of enrollment or admission status. There are no requirements for school or field of study in order to participate.
  • You must be eligible to work in the U.S. (or the country in which you reside). For students in the U.S. on an F-1 visa, you are welcome to apply as long as you have U.S. work authorization. For F-1 students who have to apply for CPT, the University of New Mexico will furnish you with a letter you can provide to your university to get CPT established once your application has been accepted.
  • The student or postdoc requirement is met if fulfilled on April 20, 2009, even if your enrollment or postdoc ends before the end of the program.

See the program description for additional information.

You are not eligible to participate as a student if you are an employee of the National Center for Ecological Analysis and Synthesis (NCEAS), or the National Evolutionary Synthesis Center (NESCent), or the University of New Mexico in Albuquerque, or Oak Ridge National Laboratory (ORNL).

Before you apply

  • We support projects in developing cyberinfrastructure towards data repository interoperability with a focus on the fields of biodiversity, ecology, environmental sciences, and evolution. See the program description for background.
  • Pick the idea that appeals most to you in terms of goals, context, and required skills, or you can apply with your own project idea.
  • If you want to apply with your own idea, contact us early on to get feedback on whether your idea is within the scope we support. If you don't, and your idea is outside of our scope, it may simply be declined without further review.
  • Ask us questions about the project idea you have in mind.
  • Write a project proposal draft, include a project plan (see below), and bounce those off of us.

Have I mentioned yet that you should be in touch with us before you apply?

When you apply

To apply, please provide a current CV and the following in your application material.

  1. Your interests, what makes you excited.
  2. Why you are interested in the project, why you are uniquely suited to undertake it, and what you anticipate gaining from it.
  3. A summary of your programming experience and skills.
  4. Programs or projects you have previously authored or contributed to, in particular those available as open-source.
  5. A project plan for the project you are proposing, even if your proposed project is directly based on one of the ideas above.
    • A project plan in principle divides up the whole project into a series of manageable milestones and timelines that, when all accomplished, logically lead to the end goal(s) of the project. Put in another way, a project plan explains what you expect you will need to be doing, and what you expect you need to have accomplished, at which time, so that at the end you reach the goals of the project.
    • Do not take this part lightly. A compelling plan takes a significant amount of work. Applications with no or a hastily composed project plan will likely not be competitive.
    • A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
    • We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
    • We strongly recommend that you bounce your proposed project and your project plan draft off our mentors by emailing (see below). You will inevitably discover through this exercise that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
  6. Your possibly conflicting obligations or plans for the summer during the coding period.
    • Although there are no hard and fast rules about how much you can do in parallel to your traineeship project, we do expect the project to be your primary focus of attention over the summer. If you look at your traineeship project as a part-time occupation, please don't apply to this program.
    • That notwithstanding, if you have the time-management skills to manage other work obligations concurrent with your traineeship project, feel encouraged to make your case and support it with evidence.
    • Most important of all, be upfront. If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) it destroys our trust. Also, if you are accepted, don't take on additional obligations before discussing those with your mentor.
    • One of the most common reasons for students to struggle or fail is being overstretched. Don't set yourself up for that - at best it would greatly diminish the amount of fun you'll have with your traineeship project.
  7. Please also state whether you expect that attending the meetings at the beginning and at the end of the coding period may present a problem for you.

Get in touch with us

Please send any questions you have and ideas and work plans for projects you would like to propose to This will reach all mentors and our administrators.

  • We strongly recommend you do this even if you want to work directly on one of our project ideas above. It gives you an opportunity to get feedback on what our expectations might be, and you might want to ask for more specifics.
  • The value of frequent and early communication in contributing to a distributed and collaboratively developed project can hardly be overemphasized. The same is true for becoming part of a community, even if only temporarily. If you can't find the time to engage in communication with your prospective mentors before you have been accepted, how will you argue that you will be a good communicator once accepted?

Reference Facts & Links

Data repositories, technologies, and standards involved

Dryad is a digital repository for data supporting published works in evolutionary biology. It is based on the DSpace software, and is being developed by NESCent in collaboration with the Metadata Research Center (MRC) at the UNC School for Information and Library Science (SILS).
Links: source code, project wiki, metadata application profile
Dublin Core 


The concept, structure, and most of the rules are inspired by the Google Summer of Code™ Program.

NESCent participated in the Summer of Code program in 2007 and 2008 as a mentoring organization, and will be applying to participate again in 2009. The experience and results from those activities as well as the embrace by the community are what motivated us to propose to NSF a separately run and funded program that would otherwise be closely modeled after the original Google program.


This program or any of its features or terms is not part of the Google Summer of Code™ program, and is not endorsed by Google or any Google employee. Any links from this page to Google or to the Google Summer of Code™ program and auxiliary pages are solely due to this program having been inspired by and having similar terms to Google's, and should not be construed to indicate an official relationship or endorsement.

If you have any questions about this program, do not direct them to Google, or to any of the official Google Summer of Code program channels such as mailing lists, Google Groups, or the #gsoc IRC channel. Use only the specific channels mentioned under Contact.

This program is funded by the U.S. National Science Foundation (NSF). Any opinions, findings, conclusions, or recommendations expressed at this site are those of the authors and do not necessarily reflect the views of the National Science Foundation.