PhyloSoC:Biodiversity Conservation Algorithms and GUI

From Phyloinformatics

Revision as of 16:51, 8 May 2007

This project is part of the 2007 Phyloinformatics Summer of Code, which in turn is part of the Google Summer of Code. This web page serves as the central resource for information relating to the project "Biodiversity conservation algorithms and GUI", which is being developed by Klaas Hartmann.


Project Overview

Klaas will implement various algorithms that utilise phylogenetic information to prioritise species for biodiversity conservation. A GUI will also be developed allowing these algorithms to be utilised by conservation managers. The overall goal is to provide a package that brings together as many existing algorithms and methods as possible and provides an interface between mathematical results and their intended final audience.

More detail at: SoC Application

Appropriate bio* package:

bioPerl is the package stated in the project proposal. Whilst it is most likely to remain the final choice, it is worth briefly reconsidering here, as the decision was somewhat rushed when the original project proposal was compiled.

  • Yes, and this was probably also partially influenced by my perception that Rutger would play a more significant role in this project than it now appears he will. Also, BioPerl is the most advanced Bio* toolkit, and BioJava doesn't offer much functionality or many object models to assist with this project, so if the project were written in Java it would potentially use only a limited set of existing functionality from BioJava. It is unclear to me how extensive a change to the project plan the Google SoC rules allow at this point in the project. --Tobias.thierer 16:44, 8 May 2007 (EDT)



Advantages:

  • most comprehensive bio* package
  • Rutger's Bio::Phylo already implements many useful indices (which would otherwise need to be implemented)
  • Rutger is very familiar with this language

Disadvantages:

  • Perl has a tendency to encourage unclear code (Klaas' opinion)



  • Python (especially with scipy) is great for scientific computing (Klaas' personal opinion)
  • Klaas has a little experience with Python, wxPython and cross-platform GUIs
  • Tobias has only about 2 hours experience with Python (writing one script of probably less than 100 lines)




  • Tobias is very familiar with Java
  • Less need to worry about platform cross compatibility


This was suggested to Klaas by Arne Mooers


  • Existing GUI
  • Existing user base
  • Cross platform compatible (JAVA)


  • User base does not really overlap with target audience
  • GUI too extensive and complicated -- our features would be buried in it

Klaas' concluding thoughts

The existing code in the bio* packages that will affect this project is:

  • Tree objects
  • Routines for loading trees from files/databases
  • Routines for displaying trees/creating graphical representations
  • Existing implementations of indices (Bio::Phylo)
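To make the first two items concrete, here is a minimal sketch of a tree object and a loader. It is written in Python purely for illustration and assumes Newick-formatted input with names and branch lengths only. `Node` and `parse_newick` are hypothetical names invented for this example; they are not part of BioPerl, Bio::Phylo, BioJava or JEBL.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a rooted tree; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return the leaf nodes below (or equal to) this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string such as '((A:1,B:2):3,C:4);'.

    Sketch only: quoted labels, comments and whitespace before '('
    are not supported.
    """
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:;":
            pos += 1
        return text[start:pos].strip()

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(read_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(read_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        node.name = read_label()
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            node.length = float(read_label())
        return node

    return read_node()


tree = parse_newick("((A:1,B:2):3,C:4);")
print([leaf.name for leaf in tree.leaves()])
```

A real implementation would more likely reuse the tree classes of whichever bio* package is chosen; the sketch only shows how little structure the algorithms strictly need (names, branch lengths and child lists).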

The extent to which these features are available in the bio* packages is something I have yet to investigate fully. Perhaps Rutger or Tobias have some ideas and could list these under advantages/disadvantages?

  • BioJava doesn't have any of these, but JEBL (Java Evolutionary Biology Library) can read and write trees from/to Nexus files and has a fairly advanced tree viewer GUI component (I think this alone took longer to develop than the time available for this entire project). I'm not entirely convinced by the JEBL tree model (it appears hard to create a good tree object model, at least in Java), but it shouldn't be too hard to write code for. I'm not sure what you mean by those species indices - I'll try calling you again in your office later today and we can discuss. --Tobias.thierer 16:49, 8 May 2007 (EDT)

The features of the programming languages themselves that are desirable are:

  • A good GUI library
  • Interface to C/C++
  • Cross platform binary compilation (except Java)
    • Although that is possible, Perl doesn't need cross-platform binary compilation either, does it (I think it's OK to require a Perl interpreter)? Besides, for Java it is also possible to create installers (e.g. with install4j) that bundle the JRE and therefore appear just like a native application to the end user.

As far as I can tell Python, Perl and Java all have several options for doing these things.

Given my personal interests and skill base, my preference is bioPython followed by bioPerl. However, there is some overlap between this project and Bio::Phylo, and further overlap between the two projects is likely in the future. This provides a compelling reason to stick with bioPerl (despite my Python preference).

GUI Options

NB: This section is largely irrelevant if bioJava is chosen.

The algorithms implemented in this project are aimed at two distinct user groups. The first group consists of researchers that are actively involved in developing, testing and comparing algorithms in this field. Most of these users will be accessing functions directly and not using the GUI. They will therefore need to have Perl and bioPerl installed anyway.

The second user group consists of conservation managers and other ecologists who simply want to apply these methods to some dataset. These users will want something that works out of the box with minimal effort -- they should not have to install Perl, bioPerl or anything else. The GUI should also work on all major platforms -- Windows, Linux, OS X (should anything else be added to this list, e.g. FreeBSD?).

To this end there are two main options that can be pursued: creating binaries for all target platforms or creating a web interface with the calculations being performed on the server.

Creating Binaries


Advantages:

  • resulting code possibly faster
  • stable for long computations
  • program will remain available and functional in its last form as long as the bio* website still operates.

Disadvantages:

  • producing binaries for several platforms may be a pain
  • will still omit some platforms (but users of unusual platforms will presumably be used to compiling things from scratch and installing programs like Perl)
  • possible platform-dependent bugs
  • unsure if incorporating C/C++ code will be problematic

Options for producing binaries

Web interface

Options include XUL::Node (which requires Firefox) and the Google Web Toolkit (GWT).


Advantages:

  • full control over the environment in which the algorithms run
  • relatively simple interface construction

Disadvantages:

  • need to find a server to run this program on
  • the program may eventually stop working when development is no longer active and the server is upgraded/changed
  • if the program actually gets used this may cause excessive load on the server (particularly for large problems which may take days to solve)
  • possible stability issues for problems that take a long time to solve
  • less control over the interface design

Klaas' concluding thoughts

Creating binaries seems the most reliable method to me and, provided all works well, will give the end user a relatively pain-free experience. If we identify this as a desirable option, I will try building a simple GUI that calls C/C++ code for multiple platforms before writing any real code and committing to that option.

Gateway to other code

Some algorithms have been and continue to be developed in C/C++. It would therefore be useful to have an interface from the package to C/C++ code. Options for doing this include SWIG and Inline::C.


Advantages:

  • may not need to implement some algorithms in Perl
  • if Perl versions of all algorithms are implemented, comparisons with their C/C++ counterparts will prove useful
  • C/C++ algorithms may be faster for large problems
  • further work done by others in C/C++ can easily be incorporated

Disadvantages:

  • Need to ensure the C/C++ code is suitably compiled for all target platforms
  • May cause problems with cross-platform binaries for the GUI

Timeline of Goals for Klaas

I will spend the first half of my time on the project implementing the various algorithms that have been developed in the literature. The second half of the project will be spent developing a GUI, documenting the project and developing a test suite.

Weeks < 1

  • Before the project starts, an appropriate GUI development package and cross-platform compilation system should have been chosen

Week 1

  • Determine how to integrate the project with BioPerl and Bio::Phylo
  • Determine how the GUI will be developed and where this fits in with BioPerl
  • Evaluate the data structure currently used to represent trees; a more efficient data structure may be required for some algorithms

Week 2

  • Implement any species-specific indices not already included in the package (e.g. Quadratic Entropy, although I think there is Perl code for this somewhere)
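As an illustration of the kind of index meant here, the following Python sketch computes quadratic entropy in its standard Rao form, Q = sum over i != j of d_ij * p_i * p_j, where p_i are relative abundances and d_ij are pairwise distances between species. This form is an assumption: the species-specific variant intended for the project may be defined differently, and the function name and example data are invented for illustration.

```python
from typing import Dict, Tuple


def quadratic_entropy(
    abundance: Dict[str, float],
    distance: Dict[Tuple[str, str], float],
) -> float:
    """Rao's quadratic entropy: sum over ordered pairs i != j of d_ij * p_i * p_j."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    # Convert raw abundances to relative frequencies p_i.
    p = {species: count / total for species, count in abundance.items()}

    def d(i: str, j: str) -> float:
        # Distances are symmetric, so accept either key order.
        return distance[(i, j)] if (i, j) in distance else distance[(j, i)]

    return sum(d(i, j) * p[i] * p[j] for i in p for j in p if i != j)


# Hypothetical example: three equally abundant species with
# path-length distances taken from the tree ((A:1,B:2):3,C:4).
abundance = {"A": 1, "B": 1, "C": 1}
distance = {("A", "B"): 3.0, ("A", "C"): 8.0, ("B", "C"): 9.0}
q = quadratic_entropy(abundance, distance)
print(round(q, 4))
```

Some authors sum over unordered pairs or include a factor of 1/2, so values should be compared only between implementations that use the same convention.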

Weeks 2, 3, 4, 5

  • Implement algorithms for solving the Noah's Ark Problem (NAP), including various greedy and dynamic programming algorithms
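The simplest member of this family is the greedy selection of k species to maximise phylogenetic diversity (PD), the case discussed by Steel (2005). The Python sketch below illustrates that special case only; it is not the project's implementation and ignores the costs and survival probabilities that the general problem involves. It assumes rooted PD (the total length of all branches on the paths from the root to the chosen species), and the tree encoding and function names are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: node -> (parent, length of the branch above the node).
# The root has parent None and branch length 0.
Tree = Dict[str, Tuple[Optional[str], float]]


def path_to_root(tree: Tree, leaf: str) -> List[str]:
    """Return the nodes from `leaf` up to and including the root."""
    path = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def greedy_pd(tree: Tree, leaves: List[str], k: int) -> Tuple[List[str], float]:
    """Pick up to k leaves, each time adding the one that gains the most PD."""
    covered: Set[str] = set()  # nodes whose branch is already counted
    chosen: List[str] = []
    remaining = list(leaves)
    total = 0.0

    def gain(leaf: str) -> float:
        # Branch length this leaf would add beyond what is already covered.
        return sum(tree[n][1] for n in path_to_root(tree, leaf) if n not in covered)

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=gain)
        total += gain(best)
        covered.update(path_to_root(tree, best))
        chosen.append(best)
        remaining.remove(best)
    return chosen, total


# The tree ((A:1,B:2):3,C:4) with an internal node named X.
tree: Tree = {
    "root": (None, 0.0),
    "X": ("root", 3.0),
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "C": ("root", 4.0),
}
chosen, pd = greedy_pd(tree, ["A", "B", "C"], k=2)
print(chosen, pd)
```

On this small tree the greedy choice is B and then C, with a PD of 9, which matches the best of the three possible pairs. Each step recomputes every gain from scratch, which is fine for a sketch but would need a more efficient tree data structure for large problems.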

Week 6

  • Finalise implementation of algorithms and check their validity.
  • Plan the GUI and do a mock up

End week 6: Mid term evaluation

  • Most of the algorithms should be completed and a plan for the GUI should be available

Weeks 7, 8, 9

  • Implement the GUI
  • Distribute a preliminary version to peers for comment

Week 10

  • Act on comments from peers
  • Document the project

Weeks 11-12

  • Develop a set of test cases and test routines for the project
  • Tie up loose ends

Beyond SoC

This work is of great interest to me and I will continue adding to this project as results become available.

Literature References

The following papers contain most of the current results regarding the NAP that this project would seek to implement:

My work:



Other work:

Moulton, V., Semple, C., Steel, M. Optimizing phylogenetic diversity under constraints. Journal of Theoretical Biology.

Minh, B. Q., S. Klaere, and A. von Haeseler. 2006. Phylogenetic diversity within seconds. Systematic Biology in press.

Pardi, F. and N. Goldman. 2007. Resource aware taxon selection for maximising phylogenetic diversity. Systematic Biology in press.

Pardi, F. and N. Goldman. 2005. Species choice for comparative genomics: no need for cooperation. PLoS Genetics 1:e71.

Simianer, H., S. Marti, J. Gibson, O. Hanotte, and J. Rege. 2003. An approach to the optimal allocation of conservation funds to minimize loss of genetic diversity between livestock breeds. Ecological Economics 45:377-392.

Steel, M. 2005. Phylogenetic diversity and the greedy algorithm. Systematic Biology 54:527-529.

External Links

Original Proposal

Project blog

Source code