Main Page

From NESCent Informatics Wiki
Welcome to the NESCent Informatics Wiki

The NESCent Informatics program has two broad goals. The first is to provide support to the science sponsored by the Center. The second goal is to help build cyberinfrastructure that will enable evolutionary biologists to fully exploit the information-rich discipline that biology has become. This latter goal requires leveraging the energies and talents of the open source programming community to build extensible and interoperable software components for evolutionary analyses, and training the evolutionary biology community to fully realize the potential of these tools. This page describes several of NESCent's efforts to help build cyberinfrastructure.

Phyloinformatics Summer of Code

Genome trees.png
NESCent wasn't successful in its application to the Google Summer of Code for 2014. We do, however, have a number of ideas on our Phyloinformatics Summer of Code 2014 page that might be useful to prospective students and mentors, including links to some of the biology-, math-, and science-related organisations participating in this year's Summer of Code.



Logo re-used under the terms of the Creative Commons Attribution License from PLoS Comp Biol Issue Image Vol. 1(7) December 2005. doi:10.1371/image.pcbi.v01.i07.

Previous years

Cyberinfrastructure Summer Internships

In 2009 we ran, for the first time, the Cyberinfrastructure Summer Traineeship program for students and postdocs interested in informatics as applied to biodiversity, earth, and environmental data. Four trainees gained collaborative open-source software development experience by helping to build a Virtual Data Center (VDC).

The program documentation for 2009 has project ideas, accepted projects and students, and further information on which repositories and organizations partner in this consortium. There is also a newsletter-style summary of all projects.

Hackathons

What is a hackathon? A hackathon is a hands-on software development meeting that allows programmers from different teams to intensively work on a common set of objectives and interact face-to-face. Of course, there's a Wikipedia article, too.

Phylotastic: infrastructure for re-using megatrees

A new NESCent working group, Hackathons, Interoperability, Phylogenies (HIP), has so far staged two hackathons under the Phylotastic brand: one held June 4-8, 2012, at NESCent, and another held January 28 - February 1, 2013, at iPlant in Tucson, AZ. Participants built a web-services implementation of the pruning, grafting, name-reconciliation, and other functionality that researchers need to take advantage of emerging megatrees. See the Phylotastic page at the EvoIO wiki for more information.
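To make the pruning step mentioned above concrete, here is a minimal sketch in Python. It is not the Phylotastic implementation or its web-service API. It assumes a toy tree representation (a leaf is a string, an internal node is a tuple of children), ignores branch lengths, and uses hypothetical names (`prune`, `to_newick`, `megatree`) chosen only for illustration.

```python
from typing import Optional, Set, Tuple, Union

# Assumed toy representation: a leaf is a taxon name, an internal node is a
# tuple of child subtrees. Branch lengths are deliberately left out.
Tree = Union[str, Tuple["Tree", ...]]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Return the subtree induced by the taxa in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    # Prune each child, dropping children that lose all of their leaves.
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node left with a single child carries no branching information,
        # so it is collapsed into that child.
        return kept[0]
    return tuple(kept)


def to_newick(tree: Tree) -> str:
    """Serialize the toy tree as a Newick string (without the trailing ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical "megatree" and a user's taxon list.
megatree = ((("human", "chimp"), "gorilla"), ("mouse", "rat"), "chicken")
pruned = prune(megatree, {"human", "gorilla", "rat"})
print(to_newick(pruned) + ";")  # prints ((human,gorilla),rat);
```

A real service would also need name reconciliation before pruning, so that the user's taxon names match the labels used in the megatree; that step is omitted here.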

GMOD Tools for Evolutionary Biology

This NESCent-sponsored hackathon will fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. The hackathon will focus on tools for

  1. viewing comparative genomics data;
  2. visualizing phylogenomic data; and
  3. supporting population diversity data and phenotype annotation.

The event will take place November 8-12, 2010, at NESCent and bring together a group of about 30 software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements.

This hackathon will provide a unique opportunity to infuse the community of GMOD developers with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.

See the hackathon working wiki for more information. Once the hackathon is finished, relevant content will be copied to GMOD.org and linked to from the GMOD.org hackathon page.

Phyloinformatics VoCamp

NESCent sponsored the Phyloinformatics VoCamp as a hands-on collaborative meeting for investigators to create and develop ontologies and lightweight vocabularies in support of integration and semantic cross-linking of evolutionary data with its many related fields. It was held on November 7-11, 2009, in Montpellier, France, co-located with the 2009 annual meeting of the International Biodiversity Information Standards Organization (TDWG).

Integrating diverse biological data with the historical process of evolution is a grand challenge for 21st century biology. The interoperability of data from diverse fields (e.g., genetics, ecology, biodiversity, biomedicine) requires a technology infrastructure based on formalized, shared vocabularies. Developing such vocabularies is a community project. The VoCamp format chosen to promote this notion is similar to a hackathon, but it focuses on vocabulary and ontology development instead of on writing software.

More information is at the Phyloinformatics VoCamp home page.

Evolutionary Database Interoperability Hackathon

NESCent sponsored a hackathon on Evolutionary Database Interoperability, which took place March 9-13, 2009, on-site at NESCent in Durham, North Carolina.

Despite the rich and meticulously curated variety of on-line databases of phylogenetic data, their holdings are only available in incompatible formats lacking explicit semantics, and programmable APIs for querying the data are often not provided, resulting in significant obstacles to interoperability and data integration. The hackathon brought together data and metadata experts and developers from a number of data providers with the developers of emerging standards for

  • a future phylogenetic data exchange format (NeXML),
  • an ontology resulting in formal and machine-interpretable semantics of evolutionary data and metadata (CDAO), and
  • a programmable web-service based interface for phylogenetic data providers (PhyloWS).

These standards, and many of the ideas for this hackathon, arose from, and are a continuation of, the activities of NESCent's Evolutionary Informatics Working Group. The event was therefore also the last meeting of the working group.

More information is at the Evolutionary Database Interoperability Hackathon home page.

NESCent Hackathon on Comparative Methods in R

The NESCent-sponsored Hackathon on Comparative Methods in R took place December 10-14, 2007, at NESCent in Durham, North Carolina. The R statistics package has emerged as a popular platform for implementation of comparative phylogenetic methods. The objective of the hackathon was to encourage the open development of software that is interoperable with other packages, supports data and exchange standards, and can be transparently extended to accommodate new data types or formats. The event brought together nearly 30 participants, both developers and users of comparative methods in R.

Outcomes of the event include better software tools for studying diversification rates, estimating divergence times, and modeling the evolution of continuous phylogenetic characters, as well as improved online documentation, shared libraries for basic phylogenetic data manipulation, and interoperability between R and the Mesquite package.

The meeting also resulted in a new R-sig-phylo mailing list open to anyone interested in discussing the application of comparative phylogenetic methods within R, and a community wiki for comparative analyses in R.

For more information, please see

  1. Comparative Methods in R Hackathon Overview, the best place to start
  2. Report of the meeting
  3. Links to subgroups with their development targets and accomplishments
  4. User-level documentation for doing comparative analysis in R
  5. Virtual Mini-hackathon, December 18-19, 2008

NESCent Phyloinformatics Hackathon: Lowering the Barrier

The Phyloinformatics Hackathon took place 11th to 15th December, 2006, at NESCent in Durham, North Carolina. On day 1, we heard user stories and chose six of the use cases as top-priority targets for development. At the end of the day we assembled into toolkit-specific groups to devise toolkit-specific plans focusing on one or more targets. For the rest of the week, we worked. Descriptions of the outputs of the hackathon are still in progress.

The objective of this first NESCent phyloinformatics hackathon was (and continues to be) to leverage the Bio* open source software tools to provide the "glue" and lower the barriers for using phylogenetic tools within automated workflows. Details are outlined further in the formal proposal.

For more information, please see

  1. Phyloinformatics Hackathon Overview, the best place to start
  2. Scientific Use Cases, from the user-story presentations on day 1 of the hackathon
  3. In-depth documentation on each of the six high-priority targets chosen for development

Call For Input

Informatics Initiatives

To ensure that the Center's Informatics program continues to be responsive to user needs, and to tap into the expertise and creativity of our community, we solicit short (2-6 pages) whitepapers on initiatives to be undertaken by the Center, including training, software development, hackathons, and coordination of data standards and ontology development.

We have now also created specific guidelines for whitepapers proposing future hackathons and similar one-time cyberinfrastructure events, in an effort to further encourage submissions of this type.

Use-cases

Community input into use cases was a key element in guiding and focusing the work at the Phyloinformatics Hackathon on the most urgent or pervasive problems. We continue to invite input in various ways (see below) to help steer and validate future efforts and results of this and other activities. Input may take several forms:

  • changes to the current list of use cases
  • actual data files (e.g., alignments, trees, other data) for use in testing
  • citations to published papers that pose challenging problems or that provide useful methods
  • your "wish list" for a phyloinformatics computing platform

The favored mechanism for providing input is to post your comments directly to this wiki. To do so, simply register at the wiki site and start editing, for example, the use case document (or feel free to create a new page). If you have a data file to use for testing, you may upload it to the wiki (and add a link describing it). Alternatively, you may send comments (and files) to Hilmar Lapp (please indicate if you don't wish to share your comments or data files on the wiki). You may also contact any of the organizers with questions or comments.