PhyloSoC:Export ontology-based phenotype descriptions to EOL

From Phyloinformatics
Revision as of 11:25, 17 June 2011 by Alexgansca (talk) (Week 7 (July 4 - July 11))


Jim Balhoff (primary)

Chris Mungall

Matt Yoder

Cyndy Parr

Project description

Short abstract

This project involves developing a system that maps phenotypic data from an OBD database to the EOL transfer schema. This requires determining which phenotypic information can be used and generating human-readable segments of text that can be integrated into an Encyclopedia of Life page.


Details regarding the development of this project will be posted on the Ontophenotype blog.

Project plan

Community Bonding Period

  • Subscribe to recommended mailing lists Done
  • Start github repository and check git settings and connectivity Done
  • Start project blog Done

Week 1 (May 23 - May 30)

  • Write a parser for the Phenoscape data in the JSON format. Done

Week 2 (May 30 - June 6)

  • If available, use phenotype annotations to taxa in the OWL format.
  • Given two or more taxa with phenotype annotations, find the anatomical structures they have in common.
  • Given a starting taxon and other taxa, find the phenotype data that differentiates the first taxon from the others. In other words, find what makes a taxon anatomically particular compared to others.

This will be done using only the information in the provided taxa files.
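The two comparisons above reduce to set intersection and set difference. The following is a minimal sketch, assuming each taxon's annotations have already been flattened to plain strings; the class name, method names, and sample descriptors are hypothetical and are not taken from the Phenoscape data.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TaxonComparison {

    /** Phenotype descriptors present in every one of the given taxa (intersection). */
    static Set<String> commonPhenotypes(List<Set<String>> taxa) {
        if (taxa.isEmpty()) {
            return new LinkedHashSet<>();
        }
        // Copy the first set so the caller's data is never modified.
        Set<String> common = new LinkedHashSet<>(taxa.get(0));
        for (Set<String> phenotypes : taxa.subList(1, taxa.size())) {
            common.retainAll(phenotypes);
        }
        return common;
    }

    /** Descriptors of the focal taxon that appear in none of the other taxa (difference). */
    static Set<String> distinguishingPhenotypes(Set<String> focal, List<Set<String>> others) {
        Set<String> distinct = new LinkedHashSet<>(focal);
        for (Set<String> phenotypes : others) {
            distinct.removeAll(phenotypes);
        }
        return distinct;
    }

    public static void main(String[] args) {
        // Hypothetical descriptors, invented for illustration only.
        Set<String> first = new LinkedHashSet<>(
                Arrays.asList("dorsal fin: present", "adipose fin: absent"));
        Set<String> second = new LinkedHashSet<>(
                Arrays.asList("dorsal fin: present", "adipose fin: present"));
        System.out.println(commonPhenotypes(Arrays.asList(first, second)));
        System.out.println(distinguishingPhenotypes(first, Arrays.asList(second)));
    }
}
```

Comparing plain strings only finds exact matches; relating different but similar anatomical terms would need ontology information.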

It would be interesting to present such information in an EOL page. For example, in a section about a species' morphology, a phrase like "This species is set apart from the others in [species_genus] by the lack of [phenotype_entity]." could be generated automatically. Another example, for a genus page: "A common denominator for this genus is the presence of [phenotype_entity] or the absence of [phenotype_entity]."
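Filling such phrases can be as simple as string formatting. This sketch is only illustrative: the class and method names are hypothetical, and the wording of the templates follows the two example phrases above.

```java
class PhenotypeSentences {

    /** Sentence for a species page: the species lacks an entity its congeners have. */
    static String speciesSentence(String genus, String lackingEntity) {
        return String.format(
                "This species is set apart from the others in %s by the lack of %s.",
                genus, lackingEntity);
    }

    /** Sentence for a genus page: one entity shared as present, one shared as absent. */
    static String genusSentence(String presentEntity, String absentEntity) {
        return String.format(
                "A common denominator for this genus is the presence of %s or the absence of %s.",
                presentEntity, absentEntity);
    }

    public static void main(String[] args) {
        // The entity names are made-up placeholders, not real annotations.
        System.out.println(speciesSentence("Ictalurus", "some anatomical entity"));
        System.out.println(genusSentence("entity A", "entity B"));
    }
}
```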

Week 3 (June 6 - June 13)

Milestone week.

  • If necessary, finish any method left over from last week.
  • Cover all the methods implemented so far with tests using the Ictalurus genus and the Ictalurus australis and Ictalurus punctatus taxa.
  • Clean up all the code written so far and push it to GitHub.

Week 4 (June 13 - June 20)

At this point, I have extracted phenotypic traits and those can serve as a starting point for the project module that builds human-readable text for EoL. For this week, I've set the following tasks:

  • Experiment with different methods for building text from phenotype annotations. (Additional information on these annotations would be useful here. Subtask: find out how I could obtain such information.)
  • Begin writing code for this module.

Week 5 (June 20 - June 27)

Week 6 (June 27 - July 4)

Milestone week.

This week I will work towards obtaining an intermediate result. This will be represented by an EoL export file. I can outline the following subtasks:

  • At this point, I should have at least a simple method that generates text from phenotype descriptions. I will map the taxa information and this text to an element of the EoL schema.
  • Having the mappings, I will generate an XML file. An example of this process is given here : . The examples presented on that page are written in PHP and Ruby, but I will provide a Java implementation.
  • Use an XML validator to validate the resulting file against the EOL Transfer Schema.
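The validation step can be done with the javax.xml.validation API that ships with Java. The sketch below is not tied to the real EOL schema: the tiny XSD in main is a toy stand-in, and the class name and element names are assumptions made only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ExportValidator {

    /** Returns null when the XML is valid for the schema, otherwise a problem description. */
    static String validate(String xsd, String xml) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no custom error handler, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Toy schema for illustration only; it is NOT the EOL Transfer Schema.
        String xsd =
                "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
              + "<xs:element name=\"response\"><xs:complexType><xs:sequence>"
              + "<xs:element name=\"taxon\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
              + "</xs:sequence></xs:complexType></xs:element>"
              + "</xs:schema>";
        String good = "<response><taxon>Ictalurus punctatus</taxon></response>";
        String bad = "<response><unexpected/></response>";
        System.out.println(validate(xsd, good) == null ? "valid" : "invalid");
        System.out.println(validate(xsd, bad) == null ? "valid" : "invalid");
    }
}
```

For the actual export, the schema and the generated file would be read from disk, for example with `new StreamSource(new File(...))`, instead of from in-memory strings.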

Here are a few more details on how I intend to do the above-mentioned mappings.

For taxa, I will at first use only the required elements:


For text generated from phenotype descriptors, I will build a <dataObject> with the following information:

<dc:description xml:lang="">
    actual text
</dc:description>
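As a rough sketch of how such a fragment could be written from Java, the class below uses the standard StAX XMLStreamWriter. The class and method names are hypothetical, the language code is supplied by the caller, and a complete <dataObject> would presumably carry more child elements than the single description shown in this plan.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class DataObjectWriter {

    /** Dublin Core element namespace, bound to the "dc" prefix below. */
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Writes a <dataObject> holding one dc:description with the given text. */
    static String write(String text, String lang) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("dataObject");
            xml.writeNamespace("dc", DC_NS);
            xml.writeStartElement("dc", "description", DC_NS);
            xml.writeAttribute("xml", XMLConstants.XML_NS_URI, "lang", lang);
            // writeCharacters escapes markup characters such as & and <.
            xml.writeCharacters(text);
            xml.writeEndElement(); // dc:description
            xml.writeEndElement(); // dataObject
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write dataObject", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "en" and the sentence are placeholder values for illustration.
        System.out.println(write("Generated phenotype text goes here.", "en"));
    }
}
```

Using a stream writer rather than string concatenation keeps the generated text safely escaped, which matters once the descriptions are produced automatically.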

Week 7 (July 4 - July 11)

For this week, I plan to return to the information-extraction side of the project, working with the Phenoscape data.

  • Make use of ontologies.
  • Add an ontology parser to the project.
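The plan does not say which ontology format the parser will read. Purely as an illustration, and assuming the ontologies are available as OBO flat files, a very small reader for [Term] stanzas could look like the sketch below. The class name is hypothetical, the term IDs in main are invented, and only the id, name, and is_a tags are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OboTerms {

    /** One [Term] stanza: its identifier, label, and is_a parent identifiers. */
    static class Term {
        String id;
        String name;
        final List<String> parents = new ArrayList<>();
    }

    /** Collects the [Term] stanzas of an OBO document, keyed by term id. */
    static Map<String, Term> parse(String oboText) {
        Map<String, Term> terms = new LinkedHashMap<>();
        Term current = null;
        for (String raw : oboText.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("[")) {
                // A new stanza starts; stanzas other than [Term] are skipped.
                current = line.equals("[Term]") ? new Term() : null;
            } else if (current == null) {
                continue; // header lines or a skipped stanza
            } else if (line.startsWith("id:")) {
                current.id = line.substring(3).trim();
                terms.put(current.id, current);
            } else if (line.startsWith("name:")) {
                current.name = line.substring(5).trim();
            } else if (line.startsWith("is_a:")) {
                String value = line.substring(5);
                int comment = value.indexOf('!'); // trailing "! label" comment
                if (comment >= 0) {
                    value = value.substring(0, comment);
                }
                current.parents.add(value.trim());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Invented IDs and names, for illustration only.
        String obo = "format-version: 1.2\n\n"
                + "[Term]\nid: EX:0001\nname: fin\n\n"
                + "[Term]\nid: EX:0002\nname: dorsal fin\nis_a: EX:0001 ! fin\n";
        for (Term t : parse(obo).values()) {
            System.out.println(t.id + " " + t.name + " " + t.parents);
        }
    }
}
```

With the is_a links loaded, two annotations that name different but related anatomical terms can be compared through their shared parents rather than by exact string match.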

Week 8 (July 11 - July 18)

Week 9 (July 18 - July 25)

Week 10 (July 25 - August 1)

Week 11 (August 1 - August 8)

Week 12 (August 8 - August 15)