PhyloSoC: Interoperable exchange of gene tree reconciliation maps

From Phyloinformatics

Project Summary

I've written up a project summary for my Google Summer of Code project here:

File:Interoperable rec summary.pdf

Author

My name is Daniel Packer and I study computer science and bioinformatics at Hunter College. My email address is [dp] [at] [danielpacker] [dot] [org].

My GSOC blog posts are here.

Abstract

The goals of this project are:

  • Standardize the XML encoding of gene tree reconciliation data by extending an existing standard
  • Implement this encoding in a standard bioinformatics library (BioPerl)
  • Modify the iPlant GTR database to support import and export of GTR trees using the new encoding
  • If time permits, modify the iPlant tree visualization tool so that it internally uses the new encoding
  • If time permits, use iPlant's taxa resolution abilities to populate imported and exported data with taxonomic name strings

Source Code

Links

Project Plan

NOTE: this is based on the original project plan from my application and will be updated as I have more information.

Week 1 (May 24 - May 30)

  • Deliverables Planned
    • Write the XML document type definition (DTD) for the GTR extensions so we can validate GTR input files.
  • Progress Reported
    • Communicate with mentors about code repository and blog
    • Update project template wiki page
    • First blog post (reflections on what's ahead this summer)
    • Establish iPlant and NESCent wiki access
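The DTD itself is not reproduced on this page. As a rough illustration of the kind of structural rules it is meant to enforce, here is a small Python sketch. The element and attribute names (reconciliation, gene_node, id, species, event) and the allowed event values are hypothetical placeholders, not the encoding this project defines. The check is also a hand-written stand-in for DTD validation, since Python's built-in xml.etree.ElementTree parses XML but does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for illustration only; not the project's actual GTR encoding.
ALLOWED_EVENTS = {"speciation", "duplication", "leaf"}


def check_reconciliation(xml_text):
    """Return a list of structural problems (an empty list means none were found).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "reconciliation":
        problems.append(f"unexpected root element <{root.tag}>")
    seen_ids = set()
    # iter() walks the whole subtree in document order, so nested gene_node elements are checked too.
    for node in root.iter("gene_node"):
        node_id = node.get("id")
        if node_id is None:
            problems.append("gene_node without an id")
            continue
        if node_id in seen_ids:
            problems.append(f"{node_id}: duplicate id")
        seen_ids.add(node_id)
        if node.get("species") is None:
            problems.append(f"{node_id}: missing species mapping")
        if node.get("event") not in ALLOWED_EVENTS:
            problems.append(f"{node_id}: unknown event {node.get('event')!r}")
    return problems


good = ('<reconciliation><gene_node id="g0" species="AB" event="speciation">'
        '<gene_node id="a1" species="A" event="leaf"/>'
        '<gene_node id="b1" species="B" event="leaf"/>'
        '</gene_node></reconciliation>')
bad = ('<reconciliation><gene_node id="g0" species="AB" event="specation">'
       '<gene_node id="a1" event="leaf"/>'
       '</gene_node></reconciliation>')

print(check_reconciliation(good))  # []
print(check_reconciliation(bad))   # ["g0: unknown event 'specation'", 'a1: missing species mapping']
```

A real DTD would express the same constraints declaratively (required attributes, enumerated event values, unique IDs) and let any validating parser enforce them.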

Week 2 (May 31 - June 6)

  • Deliverables Planned
    • Post code repository and homepage to project template wiki page (done by EOD today)

    • Check out iPlant source for review
    • Commit skeletal BioPerl TreeIO module
    • Review mentor's recommendations and thoughts on XML representations
    • Commit a preliminary tree reconciliation DTD
    • Continue to read relevant papers and learn more about tree reconciliation (on my own time)

    • Tree reconciliation working group meeting (tomorrow, Tues 2PM)
    • Further define communication plan (schedule, method) with my mentors, especially James Estill

Week 3 (June 7 - June 13)

  • Extend the BioPerl Bio::TreeIO module to generate GTR XML. James Estill has a Bio::TreeIO module that consumes PRIME data, so I can use his module to bootstrap this.
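As a rough picture of what generating GTR XML involves, the Python sketch here serializes a gene tree in which every node has been mapped to a species tree node and labelled with an event. It is not the BioPerl Bio::TreeIO code (which would be Perl), and the element and attribute names (reconciliation, gene_node, id, species, event) are hypothetical placeholders rather than the encoding the project defines.

```python
import xml.etree.ElementTree as ET


def to_reconciliation_xml(gene_tree, mapping):
    """Serialize a reconciled gene tree to XML (hypothetical element names).

    gene_tree: nested (node_id, [child, ...]) tuples.
    mapping:   node_id -> (species_node_id, event), e.g. ("AB", "duplication").
    """
    root = ET.Element("reconciliation")

    def add(parent, node):
        node_id, children = node
        species, event = mapping[node_id]  # KeyError if a node was never reconciled
        elem = ET.SubElement(parent, "gene_node", id=node_id, species=species, event=event)
        for child in children:
            add(elem, child)  # XML nesting mirrors the gene tree topology

    add(root, gene_tree)
    return ET.tostring(root, encoding="unicode")


# Gene tree ((a1, b1)g1, a2)g0 reconciled against species tree (A, B)AB:
# g1 joins genes from A and B (speciation); g0 maps to AB like its child g1 (duplication).
tree = ("g0", [("g1", [("a1", []), ("b1", [])]), ("a2", [])])
mapping = {
    "g0": ("AB", "duplication"),
    "g1": ("AB", "speciation"),
    "a1": ("A", "leaf"),
    "b1": ("B", "leaf"),
    "a2": ("A", "leaf"),
}
print(to_reconciliation_xml(tree, mapping))
```

Nesting the elements to follow the gene tree keeps topology and mapping in one document; an alternative design would list the mapping as flat records that reference node IDs in a separately encoded tree.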

Week 4 (June 14 - June 20)

  • Extend the BioPerl Bio::TreeIO module to support parsing and consumption of GTR XML.
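Parsing is the inverse direction. The minimal Python sketch here rebuilds a nested gene tree and a node-to-species mapping from a document. It is not the BioPerl parser, and it assumes a hypothetical layout (a reconciliation root containing nested gene_node elements with id, species and event attributes) rather than the encoding the project defines.

```python
import xml.etree.ElementTree as ET


def from_reconciliation_xml(xml_text):
    """Return (gene_tree, mapping) from a reconciliation document (hypothetical layout).

    gene_tree: nested (node_id, [child, ...]) tuples.
    mapping:   node_id -> (species_node_id, event).
    """
    root = ET.fromstring(xml_text)
    if root.tag != "reconciliation":
        raise ValueError(f"expected <reconciliation>, got <{root.tag}>")
    mapping = {}

    def read(elem):
        node_id = elem.get("id")
        mapping[node_id] = (elem.get("species"), elem.get("event"))
        # findall() returns direct children only, so recursion follows the tree structure.
        return (node_id, [read(child) for child in elem.findall("gene_node")])

    tops = root.findall("gene_node")
    if len(tops) != 1:
        raise ValueError("expected exactly one root gene_node")
    return read(tops[0]), mapping


sample = """<reconciliation>
  <gene_node id="g0" species="AB" event="duplication">
    <gene_node id="g1" species="AB" event="speciation">
      <gene_node id="a1" species="A" event="leaf"/>
      <gene_node id="b1" species="B" event="leaf"/>
    </gene_node>
    <gene_node id="a2" species="A" event="leaf"/>
  </gene_node>
</reconciliation>"""

tree, mapping = from_reconciliation_xml(sample)
print(tree)           # ('g0', [('g1', [('a1', []), ('b1', [])]), ('a2', [])])
print(mapping["g0"])  # ('AB', 'duplication')
```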

Week 5 (June 21 - June 27)

  • Set up skeletal functions and tests in the iPlant database for GTR XML export via BioPerl.

Week 6 (June 28 - July 4)

  • Give the iPlant database the ability to produce GTR XML via BioPerl. This will let us export the iPlant data in a portable way.

Week 7 (July 5 - July 11)

  • Set up skeletal functions and tests in the iPlant graphical interface for using GTR XML internally in place of the JSON data it currently uses.

Week 8 (July 12 - July 18)

  • Implement internal use of GTR XML within the iPlant graphical interface.

Week 9 (July 19 - July 25)

  • Use iPlant's internal taxa resolution system to export and import trees with fully resolved taxon names.

Week 10 (July 26 - August 1)

  • Continue implementing taxa resolution.

Week 11 (August 2 - August 8)

  • Code revisions, bug fixes.

Week 12 (August 9 - August 15)

  • Code revisions, bug fixes.

Notes

See my notes page.