PhyloSoC:Export ontology-based phenotype descriptions to EOL

From Phyloinformatics

Revision as of 11:39, 17 June 2011

Mentors

Jim Balhoff (primary)

Chris Mungall

Matt Yoder

Cyndy Parr

Project description

Short abstract

This project involves developing a system that maps phenotypic data from an OBD database to the EOL Transfer Schema. This requires determining which phenotypic information can be used and generating human-readable segments of text that can be integrated into an Encyclopedia of Life page.

Progress

Details regarding the development of this project will be posted on the Ontophenotype blog.

Project plan

Community Bonding Period

  • Subscribe to the recommended mailing lists. Done
  • Create a GitHub repository and check the git settings and connectivity. Done
  • Start the project blog. Done

Week 1 (May 23 - May 30)

  • Write a parser for the Phenoscape data in the JSON format. Done

Week 2 (May 30 - June 6)

  • If available, use the phenotype annotations of taxa in OWL format.
  • Given two or more taxa with phenotype annotations, find the anatomical structures they have in common.
  • Given a starting taxon and a set of other taxa, find the phenotype data that differentiates the first taxon from the others. In other words, find what makes a taxon anatomically distinctive compared to the others.

This will be done using only the information in the provided taxa files.
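Both comparisons reduce to set operations once each taxon's annotations are loaded. The sketch below assumes the parsed taxa files have been turned into a map from taxon name to a set of term labels (a hypothetical representation, not the Phenoscape data model); the class and method names are illustrative only. Common structures are the intersection of the sets, and the differentiating phenotypes of the starting taxon are its set minus everything annotated to the other taxa.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch: compares taxa by the terms annotated to them.
 * Assumes a hypothetical map from taxon name to a set of term labels.
 */
final class TaxonComparison {

    private TaxonComparison() {}

    /** Terms annotated to every one of the given taxa (set intersection). */
    static Set<String> commonEntities(Map<String, Set<String>> annotations, Collection<String> taxa) {
        Set<String> result = null;
        for (String taxon : taxa) {
            Set<String> terms = annotations.getOrDefault(taxon, Set.of());
            if (result == null) {
                result = new TreeSet<>(terms); // sorted copy for stable output
            } else {
                result.retainAll(terms);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    /** Terms of the focal taxon that none of the other taxa have (set difference). */
    static Set<String> distinguishingPhenotypes(Map<String, Set<String>> annotations,
                                                String focal, Collection<String> others) {
        Set<String> result = new TreeSet<>(annotations.getOrDefault(focal, Set.of()));
        for (String other : others) {
            if (!other.equals(focal)) {
                result.removeAll(annotations.getOrDefault(other, Set.of()));
            }
        }
        return result;
    }
}
```

With the Ictalurus test taxa, the starting taxon would be one species and the other taxa the remaining members of the genus.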

It would be interesting to present such information on an EOL page. For example, in a section about a species' morphology, a sentence like "This species stands apart from the others in [species_genus] by the lack of [phenotype_entity]." could be generated automatically. Another example, for a genus page: "A common denominator for this genus is the presence of [phenotype_entity] or the absence of [phenotype_entity]".
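A minimal way to produce such sentences is string templates filled with term labels. The sketch below is modeled on the two example templates; the method names and the "a, b and c" joining rule are hypothetical choices, and the list of missing structures is assumed to have been computed beforehand.

```java
import java.util.List;

/** Sketch of template-based sentences for EOL pages (names are hypothetical). */
final class ComparativeSentences {

    private ComparativeSentences() {}

    /** Joins labels as "a", "a and b" or "a, b and c". */
    static String joinLabels(List<String> labels) {
        if (labels.isEmpty()) {
            throw new IllegalArgumentException("at least one label is required");
        }
        int last = labels.size() - 1;
        if (last == 0) {
            return labels.get(0);
        }
        return String.join(", ", labels.subList(0, last)) + " and " + labels.get(last);
    }

    /** Species page: structures this species lacks compared to the rest of its genus. */
    static String speciesSentence(String genus, List<String> missingEntities) {
        return "This species stands apart from the others in the genus " + genus
                + " by the lack of " + joinLabels(missingEntities) + ".";
    }

    /** Genus page: a structure shared by all species, either present or absent. */
    static String genusSentence(String entity, boolean present) {
        return "A common denominator for this genus is the "
                + (present ? "presence" : "absence") + " of " + entity + ".";
    }
}
```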

Week 3 (June 6 - June 13)

Milestone week.

  • If necessary, finish any method left over from last week.
  • Cover all the methods implemented so far with tests, using the Ictalurus genus and the taxa Ictalurus australis and Ictalurus punctatus.
  • Clean up the code written so far and push it to GitHub.

Week 4 (June 13 - June 20)

At this point, I have extracted phenotypic traits, which can serve as a starting point for the module that builds human-readable text for EoL. For this week, I have set the following tasks:

  • Experiment with different methods for building text from phenotype annotations. (Additional information about these annotations would be useful here. Subtask: find out how to obtain such information.)
  • Begin writing code for this module.
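One simple method to experiment with is grouping annotations by anatomical entity and emitting one sentence per entity. The sketch assumes each annotation has already been reduced to an entity label and a quality label (for example from an entity-quality style annotation); the `Annotation` class and the sentence pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: turns (entity, quality) label pairs into short sentences. */
final class EqTextBuilder {

    /** One annotation with its entity and quality already resolved to labels (hypothetical). */
    static final class Annotation {
        final String entity;
        final String quality;

        Annotation(String entity, String quality) {
            this.entity = entity;
            this.quality = quality;
        }
    }

    private EqTextBuilder() {}

    /** Groups qualities by entity and emits one sentence per entity, in first-seen order. */
    static String describe(List<Annotation> annotations) {
        Map<String, List<String>> qualitiesByEntity = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            qualitiesByEntity.computeIfAbsent(a.entity, k -> new ArrayList<>()).add(a.quality);
        }
        List<String> sentences = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : qualitiesByEntity.entrySet()) {
            sentences.add("The " + e.getKey() + " is " + String.join(" and ", e.getValue()) + ".");
        }
        return String.join(" ", sentences);
    }
}
```

Grouping keeps the output short when one structure carries several qualities; a fuller version would also need plural handling and quality-specific wording.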

Week 5 (June 20 - June 27)

  • Figure out whether it is possible to obtain common names for at least some of the phenotype terms (as opposed to the scientific names), so that a general reader will not have difficulty understanding the text generated from these terms.
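If common names turn out to be available only for some terms, the text generator needs a fallback rule. The sketch below prefers a common name, then the ontology label, then the raw term ID; both input maps are hypothetical, since where the common names would come from is exactly what this task has to establish.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: chooses the name shown to readers for a phenotype term (inputs are hypothetical). */
final class TermLabels {
    private final Map<String, String> ontologyLabels; // term ID -> ontology (scientific) label
    private final Map<String, String> commonNames;    // term ID -> common name, where one is known

    TermLabels(Map<String, String> ontologyLabels, Map<String, String> commonNames) {
        this.ontologyLabels = new HashMap<>(ontologyLabels);
        this.commonNames = new HashMap<>(commonNames);
    }

    /** Common name if known, otherwise the ontology label, otherwise the raw ID. */
    String displayName(String termId) {
        String common = commonNames.get(termId);
        if (common != null) {
            return common;
        }
        return ontologyLabels.getOrDefault(termId, termId);
    }

    /** Common name followed by the ontology label in parentheses, when both exist and differ. */
    String displayNameWithLabel(String termId) {
        String common = commonNames.get(termId);
        String label = ontologyLabels.get(termId);
        if (common != null && label != null && !common.equals(label)) {
            return common + " (" + label + ")";
        }
        return displayName(termId);
    }
}
```

Showing both forms keeps the text readable while still exposing the precise term.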

Week 6 (June 27 - July 4)

Milestone week.

This week I will work towards an intermediate result, represented by an EoL export file. The work breaks down into the following subtasks:

  • At this point, I should have at least a simple method that generates text from phenotype descriptions. I will map the taxa information and this text to an element of the EoL schema.
  • Using these mappings, I will generate an XML file. An example of this process is given at http://wiki.eol.org:8081/display/dev/Creating+Content+Connectors+for+EOL; the examples on that page are written in PHP and Ruby, but I will provide a Java implementation.
  • Use an XML validator to validate the resulting file against the EoL Transfer Schema.
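Validation can be done with the JDK's own `javax.xml.validation` API, so no external validator is strictly needed. This sketch assumes a local copy of the Transfer Schema XSD is available (if the schema imports other schemas, they must be resolvable from where it is loaded); the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Sketch: validates an XML document against an XML Schema (XSD) using only the JDK. */
final class TransferSchemaValidator {
    private final Schema schema;

    TransferSchemaValidator(Source xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            this.schema = factory.newSchema(xsd);
        } catch (SAXException e) {
            throw new IllegalArgumentException("could not load schema", e);
        }
    }

    /** Returns the first validation error, or empty if the document is valid. */
    Optional<String> firstError(Source xml) {
        Validator validator = schema.newValidator(); // validators are not thread-safe; make one per call
        try {
            validator.validate(xml); // the default error handler throws on the first error
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would look like `new TransferSchemaValidator(new StreamSource(new File("transfer.xsd"))).firstError(new StreamSource(new File("export.xml")))`, where both file names are placeholders.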

Here are a few more details on how I intend to implement the mappings mentioned above.

For taxa, I will initially use only the required elements:

dc:identifier
dwc:ScientificName

For text generated from phenotype descriptions, I will build a <dataObject> with the following information:

<dataType>
    http://purl.org/dc/dcmitype/Text
</dataType>
<subject>
    http://rs.tdwg.org/ontology/voc/SPMInfoItems#Morphology
</subject>
<dc:description xml:lang="">
    actual text
</dc:description>
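The mapping above can be written out with the JDK's DOM and `javax.xml.transform` APIs. This is a minimal sketch under several assumptions: the namespace URIs and the `<response>` root element are taken from memory of the EOL transfer format and must be checked against the XSD, `xml:lang` is set to "en" instead of the empty value above, and element order and any further required children are not checked here. The class name is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: one taxon with one morphology text dataObject. Namespace URIs are assumptions. */
final class EolExportSketch {
    static final String EOL_NS = "http://www.eol.org/transfer/content/0.3"; // assumed
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";         // Dublin Core elements
    static final String DWC_NS = "http://rs.tdwg.org/dwc/dwcore/";          // assumed

    private EolExportSketch() {}

    static String toXml(String taxonId, String scientificName, String text) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element response = doc.createElementNS(EOL_NS, "response");
            response.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns", EOL_NS);
            response.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:dc", DC_NS);
            response.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:dwc", DWC_NS);
            doc.appendChild(response);

            // Taxon with the two required elements from the plan.
            Element taxon = child(doc, response, EOL_NS, "taxon", null);
            child(doc, taxon, DC_NS, "dc:identifier", taxonId);
            child(doc, taxon, DWC_NS, "dwc:ScientificName", scientificName);

            // Text dataObject with the fields listed in the plan.
            Element dataObject = child(doc, taxon, EOL_NS, "dataObject", null);
            child(doc, dataObject, EOL_NS, "dataType", "http://purl.org/dc/dcmitype/Text");
            child(doc, dataObject, EOL_NS, "subject",
                    "http://rs.tdwg.org/ontology/voc/SPMInfoItems#Morphology");
            Element description = child(doc, dataObject, DC_NS, "dc:description", text);
            description.setAttributeNS(XMLConstants.XML_NS_URI, "xml:lang", "en"); // assumed language

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build EOL XML", e);
        }
    }

    /** Appends a namespaced child element, with optional text content (escaped on output). */
    private static Element child(Document doc, Element parent, String ns, String qName, String text) {
        Element element = doc.createElementNS(ns, qName);
        if (text != null) {
            element.setTextContent(text);
        }
        parent.appendChild(element);
        return element;
    }
}
```

Building a DOM rather than concatenating strings means special characters in the generated text are escaped by the serializer.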

Week 7 (July 4 - July 11)

For this week, I plan to return to the information-extraction side of the project, working with the Phenoscape data.

  • Make use of ontologies.
  • Add an ontology parser to the project.
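The plan does not fix the ontology format. As one possibility, here is a minimal parser for the OBO flat-file format that keeps only term IDs, names and `is_a` parents, plus a traversal that collects a term's ancestors (useful, for example, to relate an annotation on a specific structure to its more general parent structures). The choice of OBO and all names here are assumptions; an OWL ontology would instead call for a dedicated library such as the OWL API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: reads [Term] stanzas of an OBO file, keeping only id, name and is_a. */
final class OboOntology {
    private final Map<String, String> names = new HashMap<>();
    private final Map<String, List<String>> parents = new HashMap<>();

    static OboOntology parse(Reader source) {
        OboOntology ontology = new OboOntology();
        String currentId = null;
        boolean inTerm = false;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.startsWith("[")) {          // new stanza: [Term], [Typedef], ...
                    inTerm = line.equals("[Term]");
                    currentId = null;
                    continue;
                }
                int colon = line.indexOf(':');       // header lines and non-Term stanzas are skipped
                if (!inTerm || colon < 0) {
                    continue;
                }
                String tag = line.substring(0, colon);
                String value = stripComment(line.substring(colon + 1)).trim();
                if (tag.equals("id")) {
                    currentId = value;
                } else if (currentId != null && tag.equals("name")) {
                    ontology.names.put(currentId, value);
                } else if (currentId != null && tag.equals("is_a")) {
                    ontology.parents.computeIfAbsent(currentId, k -> new ArrayList<>()).add(value);
                }                                    // other tags (def, synonym, ...) are ignored
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ontology;
    }

    /** Drops a trailing "! comment", as used after is_a targets. */
    private static String stripComment(String value) {
        int bang = value.indexOf('!');
        return bang < 0 ? value : value.substring(0, bang);
    }

    /** Term name, or the ID itself if the term has no name. */
    String name(String id) {
        return names.getOrDefault(id, id);
    }

    /** All transitive is_a ancestors of a term, in breadth-first order. */
    Set<String> ancestors(String id) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(parents.getOrDefault(id, List.of()));
        while (!queue.isEmpty()) {
            String next = queue.removeFirst();
            if (seen.add(next)) {
                queue.addAll(parents.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }
}
```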

Week 8 (July 11 - July 18)

Week 9 (July 18 - July 25)

Week 10 (July 25 - August 1)

Week 11 (August 1 - August 8)

Week 12 (August 8 - August 15)