Phylobase Hackathon

From Phyloinformatics
Jump to navigation Jump to search

Virtual Phylobase Hackathon

Held December 18-19 via IRC/Adobe connect/Skype/conference call

Participants: Ben Bolker, Peter Cowan, Francois Michonneau, Steve Kembel, Brian O'Meara, Hilmar Lapp, Klaus Scheipl, Marguerite Butler

To do list

Ben Bolker

Done

  • added @order slot to phylo4 class
  • node-naming convention -- default is now empty node and edge labels (character(0)): spent a lot of time fixing up code to be robust to empty labels. (passes R CMD check, but can't claim it has been thoroughly tested!)
  • I changed as(phylo4,data.frame) to use stringsAsFactors=FALSE so that labels (which are mostly unique) stay as strings rather than factors. However, this applies globally; should do this only for tip names, not for node type (which should be a factor)
  • make sure that ancestor(tree,"boguslabel") throws an appropriate error (and so on for other treewalking)
  • round-trip conversion from phylo to phylo4 and back to phylo does not quite produce an identical object (see as-methods.Rd) -- because edge matrix adds column names (strip these back off?)
  • defined head/tail methods (instead of automatically truncating printing)

To do

  • work on documentation for tree rules (done, but could always use more work: should be updated after we decide about label, etc.-ordering issues)
  • allow labels=="none" in summary of tdata (which means in as(tdata,data.frame) I think)??? summary currently provokes warnings that aren't the user's fault because the default is label="row.label" and nodes have empty labels by default
  • make sure that edge matrices are properly ordered when exporting to ape (requires cladewise reordering?)
  • uncover the error in the following code
   library(phylobase)
   example(tree.owls)
   z <- as(tree.owls,"phylo4")
   nodeLabels(z) <- as.character(nodeId(z))
   print(z)  ## note node labels
   z <- as(tree.owls,"phylo4")
   z@edge[3:4,] <- z@edge[4:3,]
   z@tip.label[1:2] <- z@tip.label[2:1]
   # labels come out in the same order (wrongly)
   print(z)

done, at least temporarily. labels() was scrambling order. Now reports in nodeId order, assuming that underlying slots (@tip.labels, @node.labels) are stored in edge-matrix order. These assumptions will have to be revisited -- see Steve's comments below)

Steven Kembel

Completed

  • Implement new "bound-root" version of edge matrix
    • The root now has a row in the edge matrix and in edge.length. Its ancestor in edge matrix is NA.
    • Need to fix tree checking for rooted and unrooted trees
      • Rooted trees have N nodes and N edges. One edge has NA ancestor.
      • Unrooted trees have N nodes and N-1 edges. None of the edges have NA ancestor.

To Do

  • New "bound-root" version of edge matrix
    • still needs testing, especially with unrooted trees
    • need to fix check and print methods for unrooted trees (should be barely working)
    • fix nodeId for unrooted trees
  • Metadata and annotations
    • Implement metadata and annotation slots to allow NeXML read/write
  • General
    • Sort out edge vs. node ordering and impact on print/summary/as methods (currently we mix these different orderings, leading to problems)
    • Check node counting (@Nnode) - current methods may only work correctly for rooted trees? (counting edges to get # nodes works differently for rooted vs. unrooted tree)

Marguerite Butler

  • strategic vision :) (another way to say "not very helpful":)
  • After the hackathon:
    • Develop more examples using phylobase code (vignette/tutorial)
      • I would really like to have some completely worked examples of actual comparative analyses (reproducing results from published studies).
      • This will involve using additional packages which I am not familiar with -- please send me code and I will be happy to work up a vignette illustrating how to use phylobase + your package = great comparative analysis
    • Helping with documentation

Peter Cowan

Finished

  • Merged plotting code and did basic fixes, for changes in tree data structure and reorder methods
  • Rewrote the tree reordering and changed name to the more proper 'postorder' added a preorder traversal as well
  • Changed the as(phylo4, 'phylo') method to add an attribute indicating order
  • Wrote help topic so that ?phylobase returns something useful
  • Finished the help file for reorder()

Still TODO

Brian O'Meara

Completed

  • ioNCL rename -- done (readNexus now rather than NexusTo*)

ToDo

  • NCL update -- I'm still having issues getting it to install consistently. The new version has nice features like handling mixed matrices and dealing with some issues with new GCC versions. I'll play with it more, but I don't think the CRAN release should be contingent on getting this working.

François Michonneau

Data associated with trees:

  • Make sure we don't use row names to assign data to nodes.
  • Develop functionalities to add additional data to a phylo4d object

Hilmar Lapp

  • Work on NexML functionality

Klaus Schliep

  • Tree manipulation (nni)
  • depends on tree ordering

Points for discussion/voting

  • we should develop a fairly firm list of release bug-fix/feature criteria soon ...
  • which of the following are needed for the 0.5 release on CRAN?

Ordering issues translating to/from ape

  • ordering and ape: there are several issues here.
    1. we export 'preorder' to ape as 'cladewise' (fine), 'postorder' to ape as 'pruningwise' -- that's not quite fine, apparently postorder!=pruningwise.
      • I looked into replicating the 'pruningwise' but couldn't understand the C code and there appear to be no algorithms online for a bottom-up traversal -- pdc a few bits of information about bottom-up ordering/traversal are in Vallente's book on Google books but I'm not sure how useful they are --Bbolker 17:17, 22 December 2008 (EST)
    2. for @order=='unknown', should we reorder to preorder/cladewise before exporting to ape? this or something like it seems necessary for safety.
      • I think for safety we should reorder unknown trees, and perhaps postorder ones as well -- we could always export in cladewise order -- pdc I am trying to balance my anal desire for clean round-tripping (and flexibility, as in Brian's example below) and safety. Can we leave the order alone for "unknown" and generate a warning? --Bbolker 17:17, 22 December 2008 (EST)
      • Wouldn't fixing ape (or having ape fixed) be a better solution? Can imagine users who want their trees in some order, perhaps use phylobase functions to do it, and then lose this when they output to geiger, ouch, etc. in phylo format just so that there's a workaround for a bug just in some parts of the ape package [for example, ouch has plotting functions, too]. I don't know how feasible it is to get ape fixed. --Bco 10:29, 22 December 2008 (EST) I think this is a vague specification in ape rather than a bug in the strict sense. Fixing all the algorithms in ape so that they are order-safe doesn't seem feasible. I suppose we could try to convince Emmanuel to use attr(x,"order") throughout and test for order in the non-safe places.
    3. [how] should we import ordering information from ape? order in ape is attr(phy,"order") -- if NULL then preorder/cladewise, can be explicitly 'cladewise' or 'pruningwise'. Should as(phylo,"phylo4") produce @order=='preorder' if attr(phylo,"phylo4") is either NULL or cladewise? (That could cause minor round-trip problems because a phylo object that starts with NULL will end up as 'cladewise' when re-exported; this problem would go away if we exported @order=='unknown' to NULL order.) What about if we have "postorder", which again is != 'pruningwise'?
    4. (not an ordering issue, but:) if phylobase allows singleton nodes, should we check for/throw an error/collapse them on export to phylo? (ape doesn't technically PROHIBIT these, since it has a collapse.singles function, but its API says that every node must have at least two descendants ...) [if we do throw an error, we should give the user a way to fix the problem -- which we could do with ape:::collapse.singles, but not if we absolutely prohibit singles -- this could be another case for a "OK, warn, fail" option -- but I don't think as() can have extra arguments ...]

Deprecate/drop: (a) reorder(phylo), printphylo, $

  • get rid of printphylo? this is an abbreviated print method, similar to ape's print.phylo, which is no longer used for anything. (Before throwing it away, double-check that it doesn't have features that aren't present in the summary method for phylo4 objects ...)
  • The $ hack for @ should probably be removed, it is unknown what if anything this will break
  • We currently have a reorder method for class phylo. This is a bit awkward because the ape reorder command which is called by this method takes different strings for its 'order' argument -- is there a reason we need to have this method? Should people be doing this in ape anyway? Maybe we can get rid of it --Bbolker 17:17, 22 December 2008 (EST)

Namespace

  • should we add a NAMESPACE file, pdc has written one that appears to work, but this could potentially be rejected or delayed until 0.6+

Naming conventions

  • naming conventions: are we going to follow through with our changes (singular where possible, thisCamelCase where possible)? Here is a naming table

Rordering conventions at @ and accessor levels

  • what is the convention for returning information about nodes? edge matrix order or nodeId order?? edgeLength does the former; nodeTypes does the latter; labels tries to do the former but is broken at the moment. Argument for nodeId order -- perhaps more transparent for end-user? Argument for edge matrix order -- at the moment probably more of our code does it that way, and we want to break as little as possible ... note distinction between accessors and underlying slot order! see Steve's notes above --Bbolker 17:17, 22 December 2008 (EST)

vcov/distance matrix translation

  • BMB has a function to convert from a phylogenetic variance-covariance matrix to a tree (the inverse function of vcv.phylo, which converts a tree to a phylog v-cov matrix). I (BMB) was thinking of including it in phylobase, perhaps as a tree-to-matrix coercion (as("matrix","phylo4")) -- the other could be included as a wrapper for ape:::vcv.phylo. Does this seem like a good idea? I guess my only concern is that there might be other kinds of matrix forms for trees -- would it be better to define a "phylomatrix" class instead?
      • the only other matrix I can think of is a distance matrix. I think this is a great idea (I'd find it helpful for my code). Perhaps different functions for vcv and distance matrices? Coercion from phylo4d would also be helpful so that something can be returned with data for each tip in the same order as the tips in the vcv. Perhaps this is a reason to do phylomatrix and phylomatrixd classes, where the kind of matrix ("distance","vcv") can be stored in a slot and there are accessors for the matrices and matrices+data --Bco 10:29, 22 December 2008 (EST)
      • thinking about it more, why not just have an accessor method that returns a VCV or distance matrix on a phylo4 or phylo4d object, rather than another class or coercion? --Bco 13:55, 22 December 2008 (EST) that's easy enough, but this really started when I was trying to think of a name for translation in the opposite direction ... perhaps the phylo4d() function could just take a matrix argument with another argument for the matrix type ... --Bbolker 17:17, 22 December 2008 (EST)
        • sorry, I misunderstood. If the input is a matrix, with a VCV the diagonal will mostly be positive (species have some variance due to some distance from the root), while with distance the matrix would have only zero on the diagonals (no distance between a species and itself), so the function could distinguish between these two types of matrices. --Bco 17:30, 22 December 2008 (EST)

Rooted vs. unrooted trees

  • most methods seem to assume a rooted tree but don't check for this (i.e. check_tree, treewalk methods, some node counting methods)
  • should we add checks for unrooted trees?
  • suggestion to treat unrooted trees by arbitrarily rooting them at some node and 'unrooting' them for summary purposes, would require a slot to indicate rootedness.

Miscellaneous

  • issues with feature request for warning about data loss on phylo4d to phylo4 conversion: hard (impossible?) to tell user-generated requests (which should give a warning) from implicit coercion (e.g. when using phylo4 methods such as hasNodeLabels on phylo4d objects) which shouldn't. See notes in setAs-methods.Rd
      • a kludge to do this would be as("phylo4nowarn","phylo4d") and as("phylo4","phylo4nowarn") for use internally (so, "as("phylo4",as("phylo4nowarn",input))" rather than "as("phylo4",input)"). But very inelegant. --Bco 10:33, 22 December 2008 (EST)

Design changes

  • assume that all code is order-agnostic: add an @order slot for functions that want to check whether re-ordering is necessary. When re-ordering, actually reorder the edge matrix (rather than having an internal index)
  • node numbering order follows ape rules (1:ntips are tips, ntips+1 is the root if any, the rest are internal).
  • tip labels must (?) exist, node labels need not. tip and node labels need not be unique but will throw errors if user tries to use them in matching data with tree
  • add root node to edge matrix explicitly, as in OUCH. Root node has NA for ancestor. It is legacy code's responsibility to strip the root node from the edge matrix. edge matrix is now one row longer for rooted trees (= ntips+nnodes, not ntips+nnodes-1!) as(x,"phylo") coercion modified
  • change names of ioNCL interface functions/merge them to readNexus
  • old plotting code will go away (tagged as version 0.4 in SVN), replaced by grid-based GSoC code


Misc

  • Use coding standards here, but with myRootedTreeString capitalization (rather than myRootedtreeString)
  • Do we have any output functions (save tree to file, etc.)? --Bco 13:16, 18 December 2008 (EST)