PhyloSoC:phylogeoviz

From Phyloinformatics
Revision as of 15:13, 16 May 2007 by Panda linda

This project is part of the 2007 Phyloinformatics Summer of Code, which is itself part of the Google Summer of Code program. This web page will serve as the central resource for information related to the project "Visualizing Phylogeographic Information", which is being developed by Yi-Hsin Erica Tsai.

The goal of this project is to develop a web-based application that generates geographic maps of DNA haplotype data, of the kind often used in phylogeographic analysis. The application will create maps of pie charts, viewable through Google Earth, that show the spatial distribution of each haplotype, the frequency of each haplotype in each population, and the number of samples included per population. This program may also be useful to people outside of evolutionary biology: anyone who needs to visualize frequency data on maps may find this application helpful.

Example output from the viewer. The genetic composition of each population is reflected in pie charts showing the proportion of each haplotype present.

News

Project Overview

Phylogeography has enjoyed an explosion of data, from research on the migration patterns of organisms to studies of population genetics and population structure. However, there is still no easy way to generate maps of DNA haplotype frequency data. Imagine a map with all sample locations marked and, centered on each location, a pie (or stacked bar) chart showing the frequency of each haplotype within that population. The size of each pie is proportional to the number of samples genotyped in that population. Such maps are often drawn by hand in Adobe Illustrator or in other difficult-to-use, proprietary mapping programs (e.g. ArcGIS). This procedure becomes infeasible with larger data sets, and it does not lend itself to viewing and analyzing multiple data sets simultaneously, as is becoming more common in comparative phylogeography. This software package will implement such a viewer. The program would also have applications beyond genetic data: any sort of frequency-based information with a geographical component (e.g. the percentage of sunny, rainy, and snowy days) could be visualized.
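Sizing the pies takes only a little arithmetic. The Python sketch below is one possible approach, not project code: it converts raw counts into haplotype proportions and scales each pie so that its area, rather than its radius, is proportional to sample size. Reading "size" as area is an assumption, and the function names and the max_radius_km parameter are hypothetical.

```python
import math


def haplotype_frequencies(counts):
    """Convert raw haplotype counts for one population into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population has no samples")
    return {hap: n / total for hap, n in counts.items()}


def pie_radius(sample_size, max_sample_size, max_radius_km=50.0):
    """Scale a pie so its AREA is proportional to sample size.

    Assumption: "size" means area, so the radius grows with sqrt(n).
    max_radius_km is a hypothetical display setting for the largest pie.
    """
    return max_radius_km * math.sqrt(sample_size / max_sample_size)
```

For example, if the largest population has 8 samples, pie_radius(8, 8) gives 50.0 and pie_radius(2, 8) gives 25.0: a population with a quarter of the samples gets half the radius and therefore a quarter of the area. Scaling radius linearly instead would visually exaggerate heavily sampled populations.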

The product would be a web-based application tied to Google Maps or Google Earth. The web application would include a data manager that exports KML. The KML would be used within the browser for visualization with Google Maps, or could be exported for use in Google Earth. There are three main components to develop. First, a data manager is needed to import, edit, and export data. Second, a visualization tool will generate the phylogeographic maps. Third, the visualization tool will be extended to display multiple datasets simultaneously, for instance to compare the haplotype frequency distributions of multiple loci or the haplotype frequencies of multiple species. The program will allow data to be manipulated within the application (e.g. grouping all rare haplotypes together, or showing only a subset of the populations) to generate new phylogeographic maps without creating and loading new input files. The goal is to make phylogeographic data easy to visualize on a map and to facilitate subsequent spatial data analysis.
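Grouping rare haplotypes, one of the in-application manipulations mentioned above, can be done on an in-memory table without re-reading any input file. In this hypothetical sketch, data are held as {population: {haplotype: count}}, and a haplotype counts as "rare" when its total across all populations falls below min_total. Both the data layout and the threshold rule are assumptions, not project decisions.

```python
def group_rare_haplotypes(populations, min_total=2, label="Other"):
    """Return a new table in which rare haplotypes are merged under `label`.

    A haplotype is treated as rare (hypothetical rule) when its count summed
    over all populations is below min_total. The input is not modified.
    """
    # Total count of each haplotype across every population.
    totals = {}
    for counts in populations.values():
        for hap, n in counts.items():
            totals[hap] = totals.get(hap, 0) + n
    rare = {hap for hap, n in totals.items() if n < min_total}

    grouped = {}
    for pop, counts in populations.items():
        new_counts = {}
        for hap, n in counts.items():
            key = label if hap in rare else hap
            new_counts[key] = new_counts.get(key, 0) + n
        grouped[pop] = new_counts
    return grouped
```

Because the function returns a new table, the original data stay available, so the user could change the threshold and redraw the map repeatedly.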

Want more information? See the full project proposal.

Overall Project Plan

Phase 0: Getting development environment set up

  • Learn how to use Subversion.
  • Work through PHP tutorials.
  • Install PHP and Apache and try them out on my laptop (create my development environment).
  • Learn how to download a PHP application and get it working. Learn how web apps work.
  • Get a "hello world" type program running.

Phase 1: Exploratory phase

  • Learn how to embed maps on a webpage.
  • Learn the relationship between Google Earth and Google Maps.
  • How are they the same, how are they different? What can you do with one that you can't do with the other?
  • Explore KML and general XML.
  • Explore Google Earth and Google Maps APIs.

Phase 2: Finalize design

  • Page by page description of what the user sees.
  • How to input data. Are they going to upload files, input in a text box? What format?
  • How to export data. Format? Data persistence? Can users store data, results, maps, etc.?
  • What is the viewer? Google Earth? Google Maps?

Phase 3: Creating a functional prototype for the viewer

  • Write an application that:
    • Reads in sample data.
    • Generates the appropriate KML.
    • Displays the data on a map.

Phase 4: Creating a data manager

  • Write an application that:
    • Imports data.
    • Manipulates data (dynamically?).
    • Exports map data as KML or some other format.

Phase 5: Integrate the viewer with the data manager

Detailed Project Plan

Dates refer to the beginning of that work week. Any comments are appreciated!

Now until start:

  • Phase 0: Getting development environment set up
  • Set up homepage, wiki, repositories, etc.

May 28

  • Phase 1: Exploratory phase
    • Learn how to embed maps on a webpage.
    • Learn the relationship between Google Earth and Google Maps.
    • How are they the same, how are they different? What can you do with one that you can't do with the other?
    • Explore KML and general XML.
    • Explore Google Earth and Google Maps APIs.
    • Create a pie chart using KML.
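One way to create a pie chart in KML is to emit one Polygon per slice. The sketch below, with hypothetical function names, computes the closed ring of longitude/latitude points for a single slice. It converts a radius in kilometers to degrees with a flat-earth approximation (about 111.32 km per degree of latitude), which is an assumption that only holds for small pies away from the poles.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def wedge_coordinates(lon, lat, radius_km, start_frac, end_frac, steps=32):
    """Return a closed ring of (lon, lat) points for one pie slice.

    start_frac and end_frac are cumulative proportions in [0, 1]; angles
    run clockwise from north. Longitude spacing is widened by 1/cos(lat)
    so the pie stays roughly circular on the ground.
    """
    dlat = radius_km / KM_PER_DEGREE
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    ring = [(lon, lat)]  # start at the pie center
    for i in range(steps + 1):
        frac = start_frac + (end_frac - start_frac) * i / steps
        theta = 2 * math.pi * frac
        # sin gives the east offset, cos the north offset
        ring.append((lon + dlon * math.sin(theta), lat + dlat * math.cos(theta)))
    ring.append((lon, lat))  # close the ring back at the center
    return ring


def to_kml_coordinates(ring):
    """Format points as a KML coordinates string (lon,lat,altitude)."""
    return " ".join(f"{x:.6f},{y:.6f},0" for x, y in ring)
```

Each ring would go inside a Placemark's Polygon/outerBoundaryIs/LinearRing/coordinates element. KML requires a LinearRing's first and last points to be identical, which is why the ring both starts and ends at the pie center. A population's slices would be drawn by passing consecutive cumulative frequencies as start_frac and end_frac.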

June 4

  • Finish exploratory work if necessary.
  • Phase 2: Finalize design
    • Page by page description of what the user sees.
    • How to input data. Are they going to upload files, input in a text box? What format?
    • How to export data. Format? Data persistence? Can users store data, results, maps, etc.?
    • What is the viewer? Google Earth? Google Maps?
    • How large are the pie charts going to be in comparison with the geography? How do we deal with the problem of overlapping pie charts?
    • How are we going to color the pie charts? What if there are large numbers of haplotypes, how do we color them all distinctly and usefully?
    • The results of these decisions will be a comprehensive design document posted on the wiki.
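For the coloring question, one simple starting point is to space hues evenly around the color wheel and convert them to KML's color notation, a hex string in aabbggrr order (alpha, blue, green, red) rather than the more familiar rrggbb. The sketch below uses Python's standard colorsys module; the function name and the saturation/value constants are arbitrary assumptions. Evenly spaced hues become hard to tell apart beyond roughly a dozen haplotypes, so this is a baseline rather than an answer to the design question.

```python
import colorsys


def haplotype_colors(haplotypes, alpha=255):
    """Map each haplotype to an evenly spaced hue, as a KML aabbggrr string."""
    n = len(haplotypes)
    colors = {}
    for i, hap in enumerate(haplotypes):
        # Fixed saturation/value (arbitrary choices); only the hue varies.
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.95)
        # KML expects alpha, blue, green, red, each as two hex digits.
        colors[hap] = "{:02x}{:02x}{:02x}{:02x}".format(
            alpha, round(b * 255), round(g * 255), round(r * 255))
    return colors
```

Returning a plain dictionary would make it easy to let the user override individual colors later, as planned for the output-adjustment work.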

June 11

  • Implement the basic pie chart generation functionality. Functionality should include:
    • Basic import of data.
    • Basic KML output writer.
    • Function that draws a pie chart.
    • Function that plots objects (working up to pie charts, but starting with placemarks) on a map.
  • Write corresponding documentation.
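A basic KML output writer for the placemark stage might look like the following sketch, which builds the document with Python's xml.etree.ElementTree so that population names are escaped correctly. The (name, lon, lat) input shape and the function name are hypothetical, and the KML 2.2 namespace is assumed.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemarks_kml(sites):
    """Build a KML document with one Placemark per sampling site.

    `sites` is a list of (name, lon, lat) tuples (hypothetical input shape).
    """
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in sites:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude[,altitude]
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

Longitude comes before latitude in KML coordinates, the reverse of the usual "lat, long" convention, which is an easy source of misplaced points.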

June 18

  • Combine pie chart generation and chart plotting functionality.
  • Write the functions that allow adjustments to the output (e.g. changing pie sizes, allowing the user to change haplotype colors, etc.).
  • Write corresponding documentation.

June 18

  • Work on a function that allows the user to move pie charts around spatially (and to save those movements).
  • Write corresponding documentation.

June 25

  • Write the functions that display the KML back to the browser.
  • Write corresponding documentation.

July 2

  • Prepare for Botany conference.
  • Get code submitted to Google for midterm code check in.
  • Get a prototype of the viewer available for download.
  • Make sure I'm meeting the midterm evaluation criteria.

July 9

  • Work on bugs that arose from earlier code.
  • Revisit full data manager design. Finalize UI design.

July 16

  • Implement UI for the data import functionality.
  • Expand (?) the set of accepted data file types (haplotype, genotype, etc.).
  • Write the corresponding documentation.
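For data import, the simplest accepted format could be a tab-delimited table with one row per population and haplotype. The column names below (population, latitude, longitude, haplotype, count) describe a hypothetical format, not one defined by the project; the parser sums repeated rows and takes each population's coordinates from its first row.

```python
import csv
import io


def read_haplotype_table(text):
    """Parse a tab-delimited table with the (hypothetical) header
    population, latitude, longitude, haplotype, count
    into {population: {"lat": float, "lon": float, "counts": {hap: n}}}.
    """
    populations = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        # Coordinates are taken from the first row seen for each population.
        pop = populations.setdefault(row["population"], {
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "counts": {},
        })
        hap = row["haplotype"]
        pop["counts"][hap] = pop["counts"].get(hap, 0) + int(row["count"])
    return populations
```

Supporting genotype files would mean adding further parsers that produce this same in-memory structure, so the rest of the application would not need to change.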

July 23

  • Implement UI for customizing data analysis.
    • Example: the user should be able to select what loci/alleles/populations to include/exclude in the analyses.
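The include/exclude choices could be applied as a pure filter over the in-memory data, producing a new table for the map while leaving the imported data untouched. This sketch uses hypothetical names, covers populations and haplotypes only (loci are not modeled), and assumes that a population left with no samples should be dropped rather than drawn as an empty pie.

```python
def select_subset(populations, include_pops=None, exclude_haplotypes=()):
    """Return a filtered copy of {population: {haplotype: count}}.

    include_pops=None keeps every population; otherwise only the named ones.
    Haplotypes listed in exclude_haplotypes are removed from every population.
    """
    excluded = set(exclude_haplotypes)
    result = {}
    for pop, counts in populations.items():
        if include_pops is not None and pop not in include_pops:
            continue
        kept = {h: n for h, n in counts.items() if h not in excluded}
        if kept:  # drop populations left with no samples
            result[pop] = kept
    return result
```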

July 30

  • Implement UI for output data manipulation.
    • Example: the user should be able to change the relative pie sizes, move pies around, change haplotype colors, etc.

August 6

  • Implement functions that allow the user to save the map visualization (e.g. as a JPEG image) or to save the KML file.
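Besides plain .kml files, Google Earth also opens KMZ files, which are ZIP archives containing the KML document (conventionally named doc.kml) plus any supporting files. A minimal sketch using Python's standard zipfile module; the function name is hypothetical.

```python
import zipfile


def save_kmz(kml_text, path):
    """Write a KML string into a KMZ archive whose main entry is doc.kml."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", kml_text)
```

Compression keeps downloads small for large datasets, and the same archive could later carry images (such as a legend) alongside the KML.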

August 13

  • Perform user tests.
  • Ensure that the viewer and the data manager are well integrated.
  • Deposit code with Google.
  • Update website with new product, and all documentation.

August 20

  • Done coding. Final evaluations.