The NESCent Informatics program has two broad goals. The first is to provide support to the science sponsored by the Center. The second goal is to help build cyberinfrastructure that will enable evolutionary biologists to fully exploit the information-rich discipline that biology has become. This latter goal requires leveraging the energies and talents of the open source programming community to build extensible and interoperable software components for evolutionary analyses, and training the evolutionary biology community to fully realize the potential of these tools. This page describes several of NESCent's efforts to help build cyberinfrastructure.
Phyloinformatics Summer of Code
The Phyloinformatics Summer of Code 2014 page has information that might be useful to prospective students and mentors, including links to some of the biology, math and science related organisations participating in this year's Summer of Code.
- 2013 NESCent and Google project pages, and summaries of all projects
- 2012 NESCent and Google project pages, and summaries of all projects
- 2011 NESCent and Google project pages, and summaries of all projects
- 2010 NESCent and Google project pages, and summary of all projects
- 2009 NESCent and Google project pages, and summary of all projects for the fall edition of the 2009 newsletter.
- 2008 NESCent and Google project pages, and summary in the fall edition of the 2008 newsletter.
- 2007 NESCent and Google project pages, and summary in the fall edition of the 2007 newsletter.
Cyberinfrastructure Summer Internships
We ran the Cyberinfrastructure Summer Traineeship program for students and postdocs interested in informatics as applied to biodiversity, earth and environmental data for the first time in 2009. Four trainees gained collaborative open-source software development experience by helping to build a Virtual Data Center (VDC).
The program documentation for 2009 has project ideas, accepted projects and students, and further information on which repositories and organizations partner in this consortium. There is also a newsletter-style summary of all projects.
What is a hackathon? A hackathon is a hands-on software development meeting that allows programmers from different teams to intensively work on a common set of objectives and interact face-to-face. Of course, there's a Wikipedia article, too.
Population Genetics in R Hackathon
In March 2015 we held a hackathon at NESCent with the objective of fostering an interoperating ecosystem of scalable tools and resources for population genetics data analysis in the popular R platform. The event targeted interoperability, scalability, and workflow-building challenges among the many population genetics R packages that already exist. It allowed a diverse group of population genetics researchers, method developers, and people with other relevant areas of expertise to collaborate on code, documentation, use cases, and other resources that will aid their communities.
See the Population Genetics in R Hackathon home on GitHub for more details.
The NSF-supported Open Tree of Life project has (1) gathered, encoded and annotated >4000 published phylogenies (“source trees”), (2) combined several taxonomic hierarchies into a reference taxonomy, and (3) used this information to generate a synthetic tree covering >2.5 million species. OpenTree provides access to all of this information via raw downloads, and also via queryable online interfaces that can be invoked by external software. However, tools that actually use these interfaces to deliver phylogenetic knowledge into the hands of scientists have not been developed yet.
To facilitate the development of tools that use its resources, Open Tree of Life, Arbor and the NESCent HIP working group jointly held a hackathon for testing, expanding and building upon the Open Tree of Life APIs. The Tree-for-all event was held September 15 to 19, 2014 at the University of Michigan. Details of the event and outcomes are on the hackathon GitHub repository.
Phylotastic: infrastructure for re-using megatrees
A new NESCent working group, Hackathons, Interoperability, Phylogenies (HIP), has so far staged two hackathons under the Phylotastic brand: one held June 4 to 8, 2012 at NESCent, and another held Jan 28 to Feb 1, 2013 at iPlant in Tucson, AZ. Participants built a web-services implementation of the pruning, grafting, name-reconciliation and other functionalities necessary for researchers to take advantage of emerging megatrees. See the Phylotastic page at the EvoIO wiki for more information.
GMOD Tools for Evolutionary Biology
This NESCent-sponsored hackathon will fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. This hackathon will focus on tools for
- viewing comparative genomics data;
- visualizing phylogenomic data; and
- supporting population diversity data and phenotype annotation.
The event will take place November 8-12, 2010, at NESCent and bring together a group of about 30 software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements.
This hackathon will provide a unique opportunity to infuse the community of GMOD developers with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.
NESCent sponsored the Phyloinformatics VoCamp as a hands-on collaborative meeting for investigators to create and develop ontologies and lightweight vocabularies in support of integration and semantic cross-linking of evolutionary data with its many related fields. It was held on November 7-11, 2009 in Montpellier, France, co-localized with the 2009 annual meeting of the .
Integrating diverse biological data with the historical process of evolution is a grand challenge for 21st century biology. The interoperability of data from diverse fields (e.g., genetics, ecology, biodiversity, biomedicine) requires a technology infrastructure based on formalized, shared vocabularies. Developing such vocabularies is a community project. The VoCamp format chosen to promote this notion is similar to a hackathon, but focuses on vocabulary and ontology development instead of writing software.
More information is at the Phyloinformatics VoCamp home page.
Evolutionary Database Interoperability Hackathon
NESCent sponsored a hackathon on Evolutionary Database Interoperability, which took place March 9-13, 2009 on-site at NESCent in Durham, North Carolina.
Despite the rich and meticulously curated variety of on-line databases of phylogenetic data, their holdings are only available in incompatible formats lacking explicit semantics, and programmable APIs for querying the data are often not provided, resulting in significant obstacles to interoperability and data integration. The hackathon brought together data and metadata experts and developers from a number of data providers with the developers of emerging standards for
- a future phylogenetic data exchange format (NeXML),
- an ontology resulting in formal and machine-interpretable semantics of evolutionary data and metadata (CDAO), and
- a programmable web-service based interface for phylogenetic data providers (PhyloWS).
These standards, and many of the ideas for this hackathon, arose from, and are a continuation of, the activities of NESCent's Evolutionary Informatics Working Group. The event was therefore also the last meeting of the working group.
More information is at the Evolutionary Database Interoperability Hackathon home page.
NESCent Hackathon on Comparative Methods in R
The NESCent-sponsored Hackathon on Comparative Methods in R took place Dec 10-14, 2007, at NESCent in Durham, North Carolina. The R statistics package has emerged as a popular platform for implementation of comparative phylogenetic methods. The objective of the hackathon was to encourage the open development of software that is interoperable with other packages, supports data and exchange standards, and can be transparently extended to accommodate new data types or formats. The event brought together nearly 30 participants consisting of developers of comparative methods in R as well as users of comparative methods.
Outcomes of the event include better software tools for studying diversification rates, estimating divergence times, and modeling the evolution of continuous phylogenetic characters, as well as improved online documentation, shared libraries for basic phylogenetic data manipulation, and interoperability between R and the Mesquite package.
The meeting also resulted in a new community wiki for comparative analyses in R, open to anyone interested in discussing the application of comparative phylogenetic methods within R.
For more information, please see
- Comparative Methods in R Hackathon Overview, the best place to start
- Report of the meeting
- Links to subgroups with their development targets and accomplishments
- User-level documentation for doing comparative analysis in R
- Virtual Mini-hackathon, December 18-19, 2008
NESCent Phyloinformatics Hackathon: Lowering the Barrier
The Phyloinformatics Hackathon took place 11th to 15th December, 2006, at NESCent in Durham, North Carolina. On day 1, we heard user stories and chose six of the use cases as top-priority targets for development. At the end of the day we assembled into toolkit-specific groups to devise toolkit-specific plans focusing on one or more targets. For the rest of the week, we worked. Descriptions of the outputs of the hackathon are still in progress.
The objective of this first NESCent phyloinformatics hackathon was (and continues to be) to leverage the Bio* open source software tools to provide the "glue" and lower the barriers for using phylogenetic tools within automated workflows. Details are outlined further in the formal proposal.
For more information, please see
- Phyloinformatics Hackathon Overview, this is the best place to start
- Scientific Use Cases for the user story presentations on day 1 of the hackathon
- In-depth documentation on each of the six high-priority targets chosen for development
Call For Input
To ensure that the Center's Informatics program continues to be responsive to user needs, and to tap into the expertise and creativity of our community, we solicit short (2-6 pages) whitepapers on initiatives to be undertaken by the Center, including training, software development, hackathons, and coordination of data standards and ontology development.
We have now also created specific guidelines for whitepapers proposing future hackathons and similar one-time cyberinfrastructure events, in an effort to further encourage submissions of this type.
Community input into use cases was a key element in guiding and focusing the work at the Phyloinformatics Hackathon on the most urgent or pervasive problems. We continue to invite input in various ways (see below) to help steer and validate future efforts and results of this and other activities. Input may take several forms:
- changes to the current list of use cases
- actual data files (e.g., alignments, trees, other data) for use in testing
- citations to published papers that pose challenging problems or that provide useful methods
- your "wish list" for a phyloinformatics computing platform
The favored mechanism for providing input is to post your comments directly to this wiki. To do so, simply register at the wiki site and start editing, for example, the use case document (or feel free to create a new page). If you have a data file to use for testing, you may upload it to the wiki (and add a link describing it). Alternatively, you may send comments (and files) to Hilmar Lapp (please indicate if you don't wish to share your comments or data files on the wiki). You may also contact any of the organizers with questions or comments.