PhyloSoC: Automated submission of rich data to TreeBASE

From Phyloinformatics

Revision as of 16:04, 25 May 2011

Project Author and Mentors

Check out the GSoC 2011 TreeBASE project blog!

Also, check out TreeBASE!

Abstract

TreeBASE acts as an archive for phylogenetic analyses. Data is currently submitted to TreeBASE as NEXUS files. However, this format results in a clunky user interface and does not allow metadata or additional annotations to be submitted automatically. This project will enable TreeBASE to accept NeXML files, so that metadata can be submitted easily and new annotations of the metadata can be displayed in a user-friendly manner.

Background and Purpose

At present, TreeBASE serves as an archive for phylogenetic data. The user can search the database for studies by author, study ID, and other keywords found throughout the work, and can then browse the metadata of a study. The user can also search for taxa of interest using several identifiers, and the results can link to NCBI’s taxonomy browser or the Universal Biological Indexer and Organizer. The matrices used in the analyses can be viewed as well; each provides a link to the original NEXUS file and a list of the sequences, but only the first 30 characters are visible. The trees displayed for a particular study can be further refined by topology type, and all trees can be viewed using PhyloWidget. Although the site is useful, there are numerous additional features that TreeBASE could include to maximize its usefulness and minimize the number of clicks needed to navigate the site.

One annotation that would be useful to the TreeBASE user community is a link from sequence data to its GenBank accession number, so that users could go directly to NCBI to access the sequence data and use it in future analyses. Another is geocoding of the locality coordinates of the organisms included in a study. These are only two of the many annotations that could be incorporated into TreeBASE to improve the utility of this resource. In order to expand the annotations of the data in TreeBASE, the submission process must be refined to be more user-friendly. My project for Google Summer of Code 2011 will allow for the submission of phylogenetic data to TreeBASE so that both the data and metadata are exported in a way that simplifies the user’s interaction with the website.

Project Goals

By the end of the term, we would like to accomplish the following:

  1. Build core code base locally.

Timeline for Project Plan

4/25-5/22: Community Bonding Period
Goal 1: Build core code base locally. Completed
Comments: See blog for more details on building the code base. Familiarized myself with areas of code that I would be working on. This includes the NexmlObjectConverter class, along with its subclasses and superclasses. Took a look at the NexmlSerializationTest JUnit class, as well. Read about topics such as object-relational mapping with Hibernate and how Spring beans are configured to have other objects assigned to them automatically.
5/23-5/29: Week One
Focus on the org.nexml.model.* sections of the code base. Work toward a NeXML solution both for expressing CHARSET free text and for expressing row-segment metadata such as the GenBank accession number; this currently works only for NEXUS. Develop a unit test to get it working. Think about ways to add charsets and figure out how to implement them. Get back to Rutger about whether or not any changes to the NeXML interface are necessary.
Comments: Studied the code and started to figure out how the pieces fit together. To express row-segment metadata, I learned that I will be adding code to the NexmlMatrixConverter class.
5/30-6/05: Week Two
6/06-6/12: Week Three
6/13-6/19: Week Four
June 17-22: Attending Evolution & iEvoBio 2011 in Norman, OK
6/20-6/26: Week Five
6/27-7/03: Week Six
7/04-7/10: Week Seven
7/11-7/17: Week Eight
Mid-term Evaluations.
7/18-7/24: Week Nine
7/25-7/31: Week Ten
8/01-8/07: Week Eleven
8/08-8/14: Week Twelve
8/15: Pencils Down
Submit Final Evaluations.