From Phyloinformatics
Revision as of 12:53, 6 July 2010 by Kdiverson (talk) (Project Plan)

phyloGeoRef: a geo-referencing library implemented in Java


Kathryn Iverson


The goal of this project is to create a geo-referencing library for parsing and displaying geographical and phylogenetic information. The main aim is to bring the two kinds of information together in a form that is usable and useful to the user. Given a tree file and geographical information as input, the library will return a tree carrying both phylogenetic and geographical information, either in a geographic format such as KML or shapefile or in a georeferenced phylogenetic format such as neXML.


Project Plan

Bonding period

Tasks: Gather sample tree files, KML files and other files used by the library; set up development environment; learn about geographical data files; catalog Java tree and geospatial libraries

Deliverables: pseudocode for project (what information needs to be extracted and how to store it); set up GitHub account and integrate with Eclipse or other IDE; join mailing lists; study KML files; list of relevant Java libraries

Approach and organization:

Start small and expand as needed. In this case I'm starting with the basic ability to take a tree file and a delimited file as input. The plan is to take a select few file types to completion, then add more.

A lot of the functions I need for this project have already been written and with a little modification they can be easily implemented in my library. This will allow me to be a little more creative and expand the scope of the library and what the library can do, beyond the original intention.

Week 1:

Define tree parser

Define CSV parser

Define delimited-file parser (TSV and user-defined delimiters)

Start storeTree class
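
The delimited-file parser listed for this week might look something like the minimal sketch below. The column layout (name, latitude, longitude, no header row), the class name CoordParser and the return type are assumptions made for illustration, not the library's actual format or API. The sketch splits on a caller-supplied delimiter and does not handle quoted fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of phyloGeoRef.
final class CoordParser {

    /**
     * Parses lines of the assumed form name<delim>latitude<delim>longitude.
     * Blank lines are skipped. Returns name -> {latitude, longitude},
     * preserving the order in which names first appear.
     */
    static Map<String, double[]> parse(Iterable<String> lines, String delimiter) {
        Map<String, double[]> coords = new LinkedHashMap<>();
        // Quote the delimiter so characters such as '|' are matched literally.
        String regex = Pattern.quote(delimiter);
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(regex);
            if (fields.length < 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            double lat = Double.parseDouble(fields[1].trim());
            double lon = Double.parseDouble(fields[2].trim());
            coords.put(fields[0].trim(), new double[] {lat, lon});
        }
        return coords;
    }
}
```

Passing "," gives the CSV case and "\t" the TSV case, so a single method could cover both parsers as well as user-defined delimiters.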

Week 2:

Finish storeTree class: Phylogeny array parser

Define function to add latitude, longitude and altitude data to each node. This will probably change a little once the actual algorithm is implemented.

Define functions to parse shapefiles using the GeoTools library

Define function to add metadata to the tree. This should be an extension of, or similar to, the CSV parser already implemented.
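
As a rough illustration of attaching latitude, longitude, altitude and metadata to each node, a node type along the following lines could work. The class GeoNode and all of its members are hypothetical names chosen for this sketch. They are not the storeTree API, and the real design may differ once the algorithms are in place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type for illustration; not the storeTree class itself.
final class GeoNode {
    final String name;
    final List<GeoNode> children = new ArrayList<>();
    final Map<String, String> metadata = new HashMap<>();

    // NaN marks a coordinate that has not been assigned yet.
    double latitude = Double.NaN;
    double longitude = Double.NaN;
    double altitude = Double.NaN;

    GeoNode(String name) {
        this.name = name;
    }

    GeoNode addChild(GeoNode child) {
        children.add(child);
        return this;
    }

    /** Sets the position; latitude and longitude are in decimal degrees. */
    void setPosition(double lat, double lon, double alt) {
        // Written as negated range checks so that NaN is rejected too.
        if (!(lat >= -90 && lat <= 90) || !(lon >= -180 && lon <= 180)) {
            throw new IllegalArgumentException("Coordinate out of range");
        }
        latitude = lat;
        longitude = lon;
        altitude = alt;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    boolean hasPosition() {
        return !Double.isNaN(latitude) && !Double.isNaN(longitude);
    }
}
```

Keeping metadata as a plain string map means rows from a delimited file can be attached without a fixed schema.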

Week 3:

Implement the node-altitude algorithm from Janies et al. (2007): nodeAltitude = a + ((n-1)*b), where a is the altitude of a node with only terminal leaves (based on the size of the tree), b is the altitude of nodes that are ancestral to more than one node, and n is the number of nodes from the current node to the leaf.
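
The formula above translates directly into code. In the sketch below, the class and method names are hypothetical, and rejecting n < 1 is an assumption of this sketch, since the plan does not say how leaves themselves are handled.

```java
// Hypothetical helper illustrating nodeAltitude = a + ((n-1)*b).
final class AltitudeSketch {

    /**
     * @param a altitude of a node with only terminal leaves
     * @param b the b term of the formula, added once per level above that
     * @param n number of nodes from the current node to the leaf (assumed >= 1)
     */
    static double nodeAltitude(double a, double b, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return a + (n - 1) * b;
    }
}
```

For example, with the illustrative values a = 1000 and b = 500, a node with n = 1 sits at 1000 and a node with n = 3 sits at 1000 + 2 * 500 = 2000.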

Week 4:

Implement latitude and longitude algorithm. Leaves will get their lat/long data from the coord file.
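
The plan does not say how internal nodes will be placed. One simple possibility, assumed here purely for illustration, is to put each internal node at the arithmetic mean of its children's coordinates. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper; averaging is an assumed placement rule, not the plan's.
final class MidpointSketch {

    /** Returns {lat, lon} as the arithmetic mean of the children's {lat, lon} pairs. */
    static double[] meanPosition(List<double[]> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("A node needs at least one child");
        }
        double latSum = 0;
        double lonSum = 0;
        for (double[] child : children) {
            latSum += child[0];
            lonSum += child[1];
        }
        int count = children.size();
        return new double[] {latSum / count, lonSum / count};
    }
}
```

A plain average of longitudes gives misleading results when the children straddle the 180° meridian, so globally distributed trees would need a different rule.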

Week 5:

Define KML writer using the JAK library. Most of the KML writer functions already exist in this library, so my job will be to make sure the data is populated correctly. A major part of this will be telling KML how to draw the trees, i.e. where to put branches (lines or paths) and nodes (points).
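
To show the shape of the output, the sketch below builds the KML text by hand with plain string formatting. It deliberately does not use the JAK library, whose API is not shown here. Nodes become Placemark elements containing a Point, and branches become Placemark elements containing a LineString. All class and method names are hypothetical, and node names are assumed not to need XML escaping.

```java
import java.util.Locale;

// Hypothetical helper; writes KML as text rather than through JAK.
final class KmlSketch {

    /** Formats one coordinate tuple. KML orders it longitude,latitude,altitude. */
    static String coord(double lat, double lon, double alt) {
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        return String.format(Locale.ROOT, "%.6f,%.6f,%.6f", lon, lat, alt);
    }

    /** A tree node drawn as a point. */
    static String nodePlacemark(String name, double lat, double lon, double alt) {
        return "<Placemark><name>" + name + "</name>"
            + "<Point><altitudeMode>absolute</altitudeMode>"
            + "<coordinates>" + coord(lat, lon, alt) + "</coordinates>"
            + "</Point></Placemark>";
    }

    /** A branch drawn as a line; from and to are {lat, lon, alt} arrays. */
    static String branchPlacemark(double[] from, double[] to) {
        return "<Placemark><LineString><altitudeMode>absolute</altitudeMode>"
            + "<coordinates>"
            + coord(from[0], from[1], from[2]) + " " + coord(to[0], to[1], to[2])
            + "</coordinates></LineString></Placemark>";
    }

    /** Wraps placemarks in a KML 2.2 document. */
    static String document(String body) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>"
            + body + "</Document></kml>";
    }
}
```

The absolute altitude mode is used so that internal nodes can float above the ground at their computed altitude. Without it, KML clamps geometry to the ground by default.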

Week 6:

Work on KML writer -- get a tree in KML

Week 7: July 4th-10th

  • Improve KML writer and the appearance of the tree.
  • Write function for trees with multiple leaves of the same species.
  • Outline algorithms for large trees and trees with globally distributed leaves.

Week 8: July 11-17th

  • Implement large tree and globally distributed tree algorithms

Week 9: July 18-24th

  • Add support for other file types such as neXML and possibly an SQL interface.

Week 10: July 25th-31st

  • Continue neXML and SQL support

Week 11: August 1st-9th

  • Testing and bug fixing

Week 12: August 10th-16th

  • Documentation


Janies, D., Hill, A.W., Habib, F., Guralnick, R., Waltari, E., Wheeler, W.C., 2007. Genomic analysis and geographic visualization of the spread of avian influenza (H5N1). Syst. Biol. 56, 321–329.