PhyloSoC:phylogeoviz


This project is part of the 2007 Phyloinformatics Summer of Code, which is itself part of the Google Summer of Code. This web page serves as the central resource for information related to the project "Visualizing Phylogeographic Information", which is being developed by Yi-Hsin Erica Tsai.

The latest release of the application can be found at http://phylogeoviz.org/.

The goal of this project is to develop a web-based application that generates geographic maps of the DNA haplotype data often used in phylogeographic analysis. The application will create maps of pie charts, viewable through Google Earth, that show the spatial distribution of each haplotype, the frequency of each haplotype in each population, and the number of samples included per population. This program may also be useful to people outside of evolutionary biology; anyone who needs to visualize frequency data on maps may find this application helpful.

Example output from the viewer. The genetic composition of each population is reflected in pie charts showing the proportion of each haplotype present.

Related Sites

News

  • Added the design document to the wiki. --Erica 17:36, 18 June 2007 (EDT)
  • Got a source code repository set up. It's hosted with Google. --Erica 10:48, 25 May 2007 (EDT)
  • Added detailed project plan. Comments are appreciated! --Erica 15:15, 16 May 2007 (EDT)
  • This page started --Erica 21:35, 13 May 2007 (EDT)
  • Started a Project Blog --Erica 17:04, 27 April 2007 (EDT)

Project Overview

Phylogeography has enjoyed an explosion of data, from research on the migration patterns of organisms to studies of population genetics and population structure. However, there is still no easy way to generate maps of DNA haplotype frequency data. Imagine a map with all sample locations marked and, centered on each location, a pie (or stacked bar) chart showing the frequency of each haplotype within the population. The size of each pie is proportional to the number of samples genotyped in that population. Often these maps are drawn by hand in Adobe Illustrator or other difficult-to-use, proprietary map drawing programs (e.g. ArcGIS). This procedure becomes infeasible with larger data sets, and it does not lend itself to viewing and analyzing multiple data sets simultaneously, as is becoming more common in comparative phylogeography. This software package implements such a viewer. The program would have broader applications than genetic data alone: any sort of frequency-based information with a geographical component (e.g. % of sunny, rainy, snowy days) could be visualized.

The product would be a web-based application with ties to Google Maps or Google Earth. The web application would include a data manager that would export KML. The KML would be used within the browser for visualization using Google Maps or could be exported to integrate with Google Earth. There are three main components to develop. First, the data manager is needed to import, edit, and export data. Second, a visualization tool will generate the phylogeographic maps. Third, the visualization tool will be expanded to display multiple datasets simultaneously, for instance to compare haplotype frequency distributions of multiple loci or haplotype frequencies of multiple species. This program will allow manipulation of data within the application (e.g. grouping all rare haplotypes together, or only showing a subset of the populations) to generate new phylogeographic maps without the need to create and load new input files. The goal for the program is to allow for easy visualization of phylogeographic data on a map and to facilitate subsequent spatial data analysis.
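
As an illustration of the in-application data manipulation described above, grouping all rare haplotypes together could be sketched roughly as follows. This is only a sketch in Python (chosen for illustration; it is not the project's implementation language). The function name `group_rare_haplotypes`, the `min_total` threshold, the "other" label, and the dictionary layout are all assumptions made for the example.

```python
def group_rare_haplotypes(counts_by_population, min_total, label="other"):
    """Merge haplotypes whose total count across all populations is
    below min_total into a single category.

    counts_by_population maps population name -> {haplotype: count}.
    This layout is hypothetical, not taken from the project.
    """
    # Total count of each haplotype over every population.
    totals = {}
    for counts in counts_by_population.values():
        for hap, n in counts.items():
            totals[hap] = totals.get(hap, 0) + n

    rare = {hap for hap, total in totals.items() if total < min_total}

    # Rebuild each population's counts with rare haplotypes pooled.
    grouped = {}
    for pop, counts in counts_by_population.items():
        merged = {}
        for hap, n in counts.items():
            key = label if hap in rare else hap
            merged[key] = merged.get(key, 0) + n
        grouped[pop] = merged
    return grouped


data = {
    "North": {"A": 5, "B": 1, "C": 1},
    "South": {"A": 4, "B": 0, "C": 1},
}
grouped = group_rare_haplotypes(data, min_total=3)
```

Because the function returns a new structure and leaves the input alone, a new map can be generated from the grouped data without creating or reloading an input file.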

Want more information? See the full project proposal.

Overall Project Plan

Phase 0: Getting development environment set up

  • How to use subversion.
  • Tutorials in PHP.
  • Install PHP and Apache, try it out on my laptop (create my development environment).
  • Learn how to download a PHP application and get it working. Learn how web apps work.
  • Get a "hello world" type program running.

Phase 1: Exploratory phase

  • Learn how to embed maps on a webpage.
  • Learn the relationship between Google Earth and Google Maps.
  • How are they the same, how are they different? What can you do with one that you can't do with the other?
  • Explore KML and general XML.
  • Explore Google Earth and Google Maps APIs.

Phase 2: Finalize design

  • Page by page description of what the user sees.
  • How to input data. Are they going to upload files, input in a text box? What format?
  • How to export data. Format? Data persistence? Can users store data, results, maps, etc.?
  • What is the viewer? Google Earth? Google Maps?

Phase 3: Creating a functional prototype for the viewer

  • Write an application that:
    • Reads in sample data.
    • Generates the appropriate KML.
    • Displays the data on a map.

Phase 4: Creating a data manager

  • Write an application that:
    • Imports data.
    • Manipulates data (dynamically?).
    • Export map data or KML or some other format.

Phase 5: Integrate the viewer with the data manager

Detailed Project Plan

Dates refer to the beginning of that work week. Any comments are appreciated!

Now until start:

  • Phase 0: Getting development environment set up
  • Set up homepage, wiki, repositories, etc.

May 28

  • Phase 1: Exploratory phase
    • Learn how to embed maps on a webpage.
    • Learn the relationship between Google Earth and Google Maps.
    • How are they the same, how are they different? What can you do with one that you can't do with the other?
    • Explore KML and general XML.
    • Explore Google Earth and Google Maps APIs.
    • Create a pie chart using KML.

June 4

  • Finish exploratory work if necessary.
  • Phase 2: Finalize design
    • Page by page description of what the user sees.
    • How to input data. Are they going to upload files, input in a text box? What format?
    • How to export data. Format? Data persistence? Can users store data, results, maps, etc.?
    • What is the viewer? Google Earth? Google Maps?
    • How large are the pie charts going to be in comparison with the geography? How do we deal with the problem of overlapping pie charts?
    • How are we going to color the pie charts? What if there are large numbers of haplotypes, how do we color them all distinctly and usefully?
    • The results of these decisions will be a comprehensive design document posted on the wiki.
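
One possible answer to the coloring question above is to spread hues evenly around the color wheel, one per haplotype. The sketch below is only an assumption about how this could be done, not a design decision from the plan; the function name `haplotype_colors` and the fixed saturation and brightness values are made up for the example. The output strings use KML's color order, which is alpha, blue, green, red in hexadecimal.

```python
import colorsys


def haplotype_colors(haplotypes, alpha=255):
    """Assign each haplotype an evenly spaced hue.

    Returns {haplotype: color}, where color is an 8-digit hex string
    in KML order (aabbggrr).
    """
    colors = {}
    n = len(haplotypes)
    for i, hap in enumerate(haplotypes):
        # Hue runs from 0 up to (but not including) 1 around the wheel.
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.75, 0.95)
        colors[hap] = "%02x%02x%02x%02x" % (
            alpha, round(b * 255), round(g * 255), round(r * 255))
    return colors


colors = haplotype_colors(["A", "B", "C", "D"])
```

With large numbers of haplotypes, neighboring hues become hard to tell apart, so this approach alone probably does not solve the problem; letting the user override individual colors would still be needed.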

June 11

  • Implement the basic pie chart generation functionality. Functionality should include:
    • Basic import of data.
    • Basic KML output writer.
    • Function that draws a pie chart.
    • Function that plots objects (working up to pie charts, but starting with placemarks) on a map.
  • Write corresponding documentation.
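
A rough sketch of the pie chart drawing and KML output pieces listed above follows. It assumes each pie is built from one KML polygon per haplotype wedge, which is one plausible approach rather than the method the project settled on. The names `wedge_ring` and `pie_kml`, the radius expressed in degrees, and the number of steps per arc are all hypothetical. Python is used for illustration only.

```python
import math
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def wedge_ring(lat, lon, radius_deg, start, end, steps=16):
    """Return (lon, lat) points outlining one pie wedge.

    Angles are in radians, counter-clockwise from east. Longitude
    offsets are stretched by 1/cos(lat) so the pie looks round away
    from the equator (an approximation that fails near the poles).
    """
    stretch = 1.0 / math.cos(math.radians(lat))
    ring = [(lon, lat)]  # start at the pie's center
    for i in range(steps + 1):
        a = start + (end - start) * i / steps
        ring.append((lon + radius_deg * stretch * math.cos(a),
                     lat + radius_deg * math.sin(a)))
    ring.append((lon, lat))  # close the ring at the center
    return ring


def pie_kml(name, lat, lon, counts, colors, radius_deg=0.5):
    """Build a KML document with one polygon per non-empty haplotype."""
    total = sum(counts.values())
    placemarks = []
    angle = 0.0
    for hap, n in counts.items():
        if n <= 0:
            continue  # nothing to draw for absent haplotypes
        sweep = 2.0 * math.pi * n / total
        ring = wedge_ring(lat, lon, radius_deg, angle, angle + sweep)
        coords = " ".join("%.6f,%.6f,0" % point for point in ring)
        placemarks.append(
            "<Placemark><name>%s: %s</name>"
            "<Style><PolyStyle><color>%s</color></PolyStyle></Style>"
            "<Polygon><outerBoundaryIs><LinearRing><coordinates>%s"
            "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
            "</Placemark>" % (escape(name), escape(hap), colors[hap], coords))
        angle += sweep
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="%s"><Document>%s</Document></kml>'
            % (KML_NS, "".join(placemarks)))


doc = pie_kml("North", 45.0, -93.0,
              {"A": 5, "B": 3, "C": 0},
              {"A": "ff0000ff", "B": "ffff0000", "C": "ff00ff00"})
```

The resulting string can be saved with a .kml extension and opened in Google Earth. Starting with plain placemarks, as the plan suggests, would only require swapping the polygon for a Point element.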

June 18

  • Combine pie chart generation and chart plotting functionality.
  • Write the functions that allow adjustments to the output (e.g. changing pie sizes, allowing the user to change haplotype colors, etc.).
  • Write corresponding documentation.

June 18

  • Work on a function that allows the user to move pie charts around spatially (and to save those movements).
  • Write corresponding documentation.

June 25

  • Write the functions that display the KML back to the browser.
  • Write corresponding documentation.

July 2

  • Prepare for Botany conference.
  • Get code submitted to Google for midterm code check in.
  • Get a prototype of the viewer available for download.
  • Make sure I'm meeting the midterm evaluation criteria.

July 9

  • Work on bugs that arose from earlier code.
  • Revisit full data manager design. Finalize UI design.

July 16

  • Implement UI for the data import functionality.
  • Expand (?) data files that are acceptable (haplotype, genotype, etc.)
  • Write the corresponding documentation.

July 23

  • Implement UI for customizing data analysis.
    • Example: the user should be able to select what loci/alleles/populations to include/exclude in the analyses.

July 30

  • Implement UI for output data manipulation.
    • Example: the user should be able to change the relative pie sizes, move pies around, change haplotype colors, etc.

August 6

  • Implement functions that allow the user to save the map visualization (e.g. jpg) or to save the KML file.

August 13

  • Perform user tests.
  • Ensure that the viewer and the data manager are well integrated.
  • Deposit code with Google.
  • Update website with new product, and all documentation.

August 20

  • Done coding. Final evaluations.

Design Document

Flow-chart-400.png

Page 1: The start page

On the first page the user selects how he/she'd like to input his/her data.

Input-page-400.png

Page 2: The manual data input page

If the user chooses to manually input the data, he/she sees this:

Man-input-400.png

The default numbers of haplotypes and populations are set to 10. Users can update these values to get the appropriate number of rows and columns in the data matrix. Unless the data matrix is small, the user will likely have to scroll within the table to input all the data. To facilitate this, the population, lat, and long columns and the header row will be frozen. After the data is saved, the application checks the input.
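
The design does not spell out what the input check covers. As a sketch under assumptions, a per-row check might confirm that the population has a name, that lat and long are numbers in range, and that every haplotype count is a non-negative integer. The function name `validate_row`, the exact rules, and the error wording below are all hypothetical; Python is used for illustration only.

```python
def validate_row(row, n_haplotypes):
    """Return a list of problems found in one row of the data matrix.

    row is a list of raw strings: population, lat, long, then one
    count per haplotype. An empty list means the row passed.
    """
    expected = 3 + n_haplotypes
    if len(row) != expected:
        return ["expected %d columns, found %d" % (expected, len(row))]

    errors = []
    if not row[0].strip():
        errors.append("population name is empty")

    # Latitude must lie in [-90, 90] and longitude in [-180, 180].
    for label, text, limit in (("lat", row[1], 90.0), ("long", row[2], 180.0)):
        try:
            value = float(text)
        except ValueError:
            errors.append("%s is not a number: %r" % (label, text))
            continue
        if not -limit <= value <= limit:
            errors.append("%s is out of range: %r" % (label, text))

    for text in row[3:]:
        try:
            count = int(text)
        except ValueError:
            errors.append("count is not an integer: %r" % text)
            continue
        if count < 0:
            errors.append("count is negative: %r" % text)
    return errors


good = validate_row(["North", "45.0", "-93.0", "5", "3"], 2)
bad = validate_row(["", "95", "x", "-1", "2"], 2)
```

Collecting every problem in a row, rather than stopping at the first, would let the error page tell the user everything that needs fixing in one pass.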

Page 3: The data management page

Once the data pass validation the user is allowed to edit what data are included/excluded from the analyses. The purpose of this page is to allow the user to include/exclude populations and/or haplotypes. By default all populations and haplotypes are included.
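
The include/exclude step could be sketched as a filter over the validated records. The record layout (a dictionary with name, lat, lon, and counts fields) and the function name `apply_selection` are assumptions for this example, and Python is used for illustration only. Passing `None` for a selection keeps everything, matching the default in which all populations and haplotypes are included.

```python
def apply_selection(populations, keep_populations=None, keep_haplotypes=None):
    """Return only the selected populations and haplotypes.

    None means "keep all". The input list is not modified.
    """
    selected = []
    for pop in populations:
        if keep_populations is not None and pop["name"] not in keep_populations:
            continue
        counts = {hap: n for hap, n in pop["counts"].items()
                  if keep_haplotypes is None or hap in keep_haplotypes}
        # Copy the record so the caller's data stays untouched.
        selected.append({**pop, "counts": counts})
    return selected


pops = [
    {"name": "North", "lat": 45.0, "lon": -93.0, "counts": {"A": 5, "B": 3}},
    {"name": "South", "lat": 30.5, "lon": -90.1, "counts": {"A": 0, "B": 8}},
]
subset = apply_selection(pops, keep_populations={"North"}, keep_haplotypes={"A"})
```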

Data-management-400.png

Page 4: The map preview page

After any edits to the include/exclude data, the user is taken to a page that previews their map and allows edits to the visualization. There are 3 possible visualizations: 1) Show just the sampling localities. With this option, the map options 'haplotype color', 'pie size (absolute)', and 'pie size (relative)' are grayed out. 2) Display circles sized relative to the sample size for each locality. In this case, the map option 'marker appearance' is grayed out. 3) Display full haplotype information for each locality. In this case, the map option 'marker appearance' is grayed out.

When editing any of the map options a panel will pop out with options for that task. Map option definitions:

  • marker appearance: select what icons you want to identify each population
  • haplotype color: select colors for each haplotype
  • pie size (absolute): set the max diameter for each circle or pie
  • pie size (relative): choose if pies are all the same size or relative to the sample size; allows the user to set the bounds on the sample size bins
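
To make the two pie size options concrete, here is a sketch of how they might combine. It assumes that the sample size bins step the diameter up in equal increments until the largest bin reaches the maximum diameter; that stepping rule, the function name `pie_diameter`, and its parameters are assumptions for the example, not part of the design. Python is used for illustration only.

```python
import bisect


def pie_diameter(sample_size, bin_bounds, max_diameter, relative=True):
    """Pick a diameter for one pie.

    bin_bounds is a sorted list of lower bounds for the second and
    later bins, e.g. [10, 20] gives bins <10, 10-19, and >=20.
    With relative=False every pie gets max_diameter.
    """
    if not relative:
        return max_diameter
    # Index of the bin this sample size falls into (0 = smallest).
    bin_index = bisect.bisect_right(bin_bounds, sample_size)
    return max_diameter * (bin_index + 1) / (len(bin_bounds) + 1)


small = pie_diameter(5, [10, 20], 30)
medium = pie_diameter(10, [10, 20], 30)
large = pie_diameter(25, [10, 20], 30)
```

Using a handful of bins rather than a continuous scale keeps the legend short, since each bin needs only one circle and one sample size range.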

The map itself is fully functional. Users can zoom in, pan, and access satellite imagery as they can in other Google Maps applications. Furthermore, users can click and drag any of the markers, circles, or pies to reposition them. This should be very useful, especially for avoiding overlapping pies. If there's time, I'd like to add a button here for 'auto-fix overlapping pies', where the application detects collisions and repositions the pies for the user.
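
The collision detection half of an 'auto-fix overlapping pies' feature could be sketched as a pairwise check on circle centers and radii. The sketch assumes pie positions are already available in screen pixels; the function name `find_overlaps` and the tuple layout are made up for the example, and the repositioning step is left out. Python is used for illustration only.

```python
import math


def find_overlaps(pies):
    """Return pairs of pie names whose circles overlap.

    pies is a list of (name, x, y, radius) tuples in pixels. Two pies
    overlap when their centers are closer than the sum of their radii.
    """
    overlaps = []
    for i in range(len(pies)):
        for j in range(i + 1, len(pies)):
            name_a, xa, ya, ra = pies[i]
            name_b, xb, yb, rb = pies[j]
            if math.hypot(xb - xa, yb - ya) < ra + rb:
                overlaps.append((name_a, name_b))
    return overlaps


pies = [("North", 0, 0, 10), ("South", 15, 0, 10), ("East", 100, 100, 5)]
pairs = find_overlaps(pies)
```

Checking every pair is quadratic in the number of pies, which should be acceptable for the tens of populations in a typical data set. Note that pixel positions change with zoom level, so the check would need to be rerun after zooming.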

Below the preview screen is the legend. It shows the current color of each haplotype as well as the relative circle sizes and their corresponding sample sizes.

I know on the mock-up the preview screen is fairly small. However, the page will scale with the window, so it should be big enough for most folks. I will also consider moving the 'map options' below the map, so the map can be bigger. I was thinking, though, that the user might find it annoying to constantly scroll up and down to see the effects of the 'map options'.

Map-page-400.png

Page 5: The view, save, and export page

From this page the user can export the visualization in multiple ways. It's important for the user to do the repositioning, coloring, and other editing work here before exporting to Google Earth, because it's not possible to click and drag polygons in Google Earth as it is in Google Maps. If the user selects the .jpg option, they will be prompted to save the map and legend either together or separately. The other formats handle the map and legend separately anyway. I'll consider exporting in other formats as time permits (for instance, in Adobe Illustrator format).

Export-400.png

Page 6: The error page

Error handling: If the application has trouble reading the input file, then the validation will fail and generate this error page. From this page the user is then directed back to either uploading or manually inputting their data.

Error-page-400.png