PhyloSoC:Biodiversity Conservation Algorithms and GUI

From Phyloinformatics
Revision as of 06:47, 8 May 2007 by Xarvia (talk)

This project is part of the 2007 Phyloinformatics Summer of Code, which in turn is part of the Google Summer of Code. This page serves as the central resource for information relating to the project "Biodiversity conservation algorithms and GUI", which is being developed by Klaas Hartmann.


Project Overview

Klaas will implement various algorithms that utilise phylogenetic information to prioritise species for biodiversity conservation. A GUI will also be developed allowing these algorithms to be utilised by conservation managers. The overall goal is to provide a package that brings together as many existing algorithms and methods as possible and provides an interface between mathematical results and their intended final audience.

More detail at: SoC Application

Appropriate bio* package:

bioPerl is the package stated in the project proposal. Whilst it is most likely to remain the final choice, it is worth briefly reconsidering here, as the decision was somewhat rushed when the original project proposal was compiled.



bioPerl advantages:

  • most comprehensive bio* package
  • Rutger's Bio::Phylo already implements many useful indices (which would otherwise need to be implemented)
  • Rutger is very familiar with this language

bioPerl disadvantages:

  • Perl has a tendency to encourage unclear code (Klaas' opinion)



bioPython advantages:

  • Python (especially with scipy) is great for scientific computing (Klaas' personal opinion)
  • Klaas has a little experience with Python, wxPython and cross-platform GUIs




bioJava advantages:

  • Tobias is very familiar with Java
  • Less need to worry about cross-platform compatibility


This was suggested to Klaas by Arne Mooers


Advantages:

  • Existing GUI
  • Existing user base
  • Cross-platform compatible (Java)

Disadvantages:

  • User base does not really overlap with target audience
  • GUI too extensive and complicated: our features would be buried in it

Klaas' concluding thoughts

The existing features of the bio* packages that will affect this project are:

  • Tree objects
  • Routines for loading trees from files/databases
  • Routines for displaying trees/creating graphical representations
  • Existing implementations of indices (Bio::Phylo)

I have yet to fully investigate the extent to which these features are available in the bio* packages (perhaps Rutger or Tobias have some ideas and could list these under advantages/disadvantages?).

The features of the programming languages themselves that are desirable are:

  • A good GUI library
  • Interface to C/C++
  • Cross platform binary compilation (except Java)

As far as I can tell Python, Perl and Java all have several options for doing these things.

Given my personal interests and skill base, my preference is bioPython followed by bioPerl. However, this project already overlaps with Bio::Phylo and is likely to overlap further in future, which is a compelling reason to stick with bioPerl (despite my Python preference).

GUI Options

NB: This section is largely irrelevant if bioJava were chosen

The algorithms implemented in this project are aimed at two distinct user groups. The first group consists of researchers who are actively involved in developing, testing and comparing algorithms in this field. Most of these users will call the functions directly rather than use the GUI, so they will need to have Perl and bioPerl installed anyway.

The second user group consists of conservation managers and other ecologists who simply want to apply these methods to a dataset. These users will want something that works out of the box with minimal effort: they should not have to install Perl, bioPerl or anything else. The GUI should also work on all major platforms: Windows, Linux and OS X (should anything else be added to this list, e.g. FreeBSD?).

To this end there are two main options that can be pursued: creating binaries for all target platforms or creating a web interface with the calculations being performed on the server.

Creating Binaries

Several methods exist for creating binaries from Perl code. Ideally the chosen method would allow binaries for multiple platforms to be built on a single platform (I am unsure whether this is possible). Options include PerlBin, App::Packer, perlcc and pp.


Advantages:

  • resulting code possibly faster
  • stable for long computations
  • program will remain available and functional in its last form as long as the bio* website still operates

Disadvantages:

  • producing binaries for several platforms may be a pain
  • will still omit some platforms (but users of strange platforms will presumably be used to compiling things from scratch and installing programs like Perl)
  • possible platform-dependent bugs
  • unsure if incorporating C/C++ code will be problematic

Web interface

Options include XUL::Node (which requires Firefox!) and the Google Web Toolkit (GWT).


Advantages:

  • full control over the environment in which the algorithms run
  • relatively simple interface construction

Disadvantages:

  • need to find a server to run this program on
  • the program may eventually stop working when development is no longer active and the server is upgraded/changed
  • if the program actually gets used this may cause excessive load on the server (particularly for large problems which may take days to solve)
  • possible stability issues for problems that take a long time to solve
  • less control over the interface design

Klaas' concluding thoughts

Creating binaries seems the most reliable method to me and, provided all works well, will give the end user a relatively pain-free experience. If we identify this as a desirable option, I will try building a simple GUI that calls C/C++ code on multiple platforms before writing any real code and committing to that option.

Gateway to other code

Some algorithms have been and continue to be developed in C/C++. It would therefore be useful to have an interface from the package to C/C++ code. Options for doing this include SWIG and Inline::C.


Advantages:

  • may not need to implement some algorithms in Perl
  • if Perl versions of all algorithms are implemented, comparisons with their C/C++ counterparts will prove useful
  • C/C++ algorithms may be faster for large problems
  • further work done by others in C/C++ can easily be incorporated

Disadvantages:

  • need to ensure the C/C++ code is suitably compiled for all target platforms
  • may cause problems with cross-platform binaries for the GUI

Timeline of Goals

I will spend the first half of my time on the project implementing the various algorithms that have been developed in the literature. The second half of the project will be spent developing a GUI, documenting the project and developing a test suite.

Weeks < 1

  • Prior to project start an appropriate GUI development package and cross platform compilation system should have been determined

Week 1

  • Determine how to integrate the project with BioPerl and Bio::Phylo
  • Determine how the GUI will be developed and where this fits in with BioPerl
  • Evaluate the data structure currently used to represent trees; a more efficient data structure may be required for some algorithms
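To illustrate the data-structure question, a rooted tree can be stored as a flat parent-pointer table instead of linked node objects. The sketch below is written in Python purely for illustration; it is not the BioPerl or Bio::Phylo representation, and the names `PARENT`, `LENGTH` and `phylogenetic_diversity`, as well as the example tree, are hypothetical. It computes the phylogenetic diversity (PD) of a set of species as the total length of the branches connecting those species to the root.

```python
# Hypothetical example tree: ((A:1,B:1)X:2,(C:2,D:1)Y:1)root
# PARENT maps each node to its parent; LENGTH is the branch above each node.
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}
LENGTH = {"A": 1.0, "B": 1.0, "C": 2.0, "D": 1.0, "X": 2.0, "Y": 1.0}


def phylogenetic_diversity(species, parent=PARENT, length=LENGTH):
    """Sum the lengths of all branches on the root-to-leaf paths of `species`."""
    covered = set()  # nodes whose branch to the parent has been counted
    for node in species:
        # Walk towards the root until we reach it or hit a counted branch.
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node]
    return sum(length[node] for node in covered)
```

For example, `phylogenetic_diversity(["A", "C"])` gives 6.0 on this tree. Each branch is visited at most once per call, so the cost is linear in the size of the subtree spanned by the chosen species.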

Week 2

  • Implement any species-specific indices not already included in the package (e.g. Quadratic Entropy, although I think there is Perl code for this somewhere)
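As a rough idea of what such an index looks like in code, here is a minimal Python sketch of quadratic entropy. It assumes the convention Q = sum over unordered species pairs {i, j} of d_ij * p_i * p_j, where p_i is relative abundance and d_ij is a pairwise distance; other sources sum over ordered pairs, which doubles the value. The function name and input layout are hypothetical and not taken from Bio::Phylo.

```python
from itertools import combinations


def quadratic_entropy(abundance, distance):
    """Quadratic entropy over unordered species pairs.

    abundance: dict mapping species name -> non-negative abundance
    distance:  dict mapping frozenset({i, j}) -> distance between i and j
    """
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Convert raw abundances to relative abundances p_i.
    p = {name: value / total for name, value in abundance.items()}
    return sum(
        distance[frozenset((i, j))] * p[i] * p[j]
        for i, j in combinations(sorted(p), 2)
    )


# Hypothetical example data: three species with pairwise distances.
example_abundance = {"A": 1, "B": 1, "C": 2}
example_distance = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
```

With the example data above, `quadratic_entropy(example_abundance, example_distance)` evaluates to 1.625.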

Weeks 2-5

  • Implement algorithms for solving the Noah's Ark Problem (NAP), including various greedy and dynamic programming algorithms
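To give a flavour of the greedy approach, the Python sketch below handles only a simple special case: choosing k species to maximise phylogenetic diversity when every species costs the same and is saved with certainty. The general problem with costs and survival probabilities needs more than this. The tree, the names `PARENT`, `LENGTH`, `path_gain` and `greedy_max_pd`, and the parent-pointer layout are all hypothetical and chosen for illustration.

```python
# Hypothetical example tree: ((A:1,B:1)X:2,(C:2,D:1)Y:1)root
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}
LENGTH = {"A": 1.0, "B": 1.0, "C": 2.0, "D": 1.0, "X": 2.0, "Y": 1.0}


def path_gain(leaf, covered, parent, length):
    """Length of the not-yet-covered branches between `leaf` and the root."""
    gain = 0.0
    node = leaf
    while node in parent and node not in covered:
        gain += length[node]
        node = parent[node]
    return gain


def greedy_max_pd(leaves, k, parent=PARENT, length=LENGTH):
    """Pick up to k leaves, each time taking the largest marginal PD gain.

    Returns (chosen leaves in pick order, total PD of the chosen set).
    """
    covered, chosen, total = set(), [], 0.0
    remaining = sorted(leaves)  # sorted so that ties break deterministically
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda leaf: path_gain(leaf, covered, parent, length))
        total += path_gain(best, covered, parent, length)
        # Mark the newly added path as covered.
        node = best
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node]
        chosen.append(best)
        remaining.remove(best)
    return chosen, total
```

On the example tree, `greedy_max_pd(["A", "B", "C", "D"], 2)` returns `(['A', 'C'], 6.0)`. This version recomputes every gain at each step, which is fine for small trees; a priority queue with lazy updates would be the obvious refinement for large ones.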

Week 6

  • Finalise implementation of algorithms and check their validity.
  • Plan the GUI and do a mock up

End week 6: Mid term evaluation

  • Most of the algorithms should be completed and a plan for the GUI should be available

Weeks 7-9

  • Implement the GUI
  • Distribute a preliminary version to peers for comment

Week 10

  • Act on comments from peers
  • Document the project

Weeks 11-12

  • Develop a set of test cases and test routines for the project
  • Tie up loose ends

Beyond SoC

This work is of great interest to me and I intend to continue adding to this project as new results become available.

Literature References

The following papers contain most of the current results regarding the NAP that this project would seek to implement:

My work:

Other work:

Moulton, V., Semple, C., Steel, M. Optimizing phylogenetic diversity under constraints. Journal of Theoretical Biology.

Minh, B. Q., S. Klaere, and A. von Haeseler. 2006. Phylogenetic diversity within seconds. Systematic Biology in press.

Pardi, F. and N. Goldman. 2007. Resource aware taxon selection for maximising phylogenetic diversity. Systematic Biology in press.

Pardi, F. and N. Goldman. 2005. Species choice for comparative genomics: no need for cooperation. PLoS Genetics 1:e71.

Simianer, H., S. Marti, J. Gibson, O. Hanotte, and J. Rege. 2003. An approach to the optimal allocation of conservation funds to minimize loss of genetic diversity between livestock breeds. Ecological Economics 45:377-392.

Steel, M. 2005. Phylogenetic diversity and the greedy algorithm. Systematic Biology 54:527-529.

External Links

Original Proposal

Project blog

Source code