R Hackathon 1/Current Data Representations

From Phyloinformatics
Revision as of 06:26, 29 November 2007 by Jombart (talk) (Summary)

For each file in the test set of sample trees and datasets (and any other relevant files), describe on this page whether the file loads successfully, what information is lost, and how the resulting objects are represented in R. Pending a better idea, we could use a format similar to the one used for the 'phylo' class (pdf) or the scheme for coding nucleotides (pdf).

Summary

Package | altnexus_simple.tre | altnexus_treewts.tre | nexus_rooted_brlen.tre | nexus_simple.tre | notes_multitrees.nex (has labels and notes) | phylip.tre | samplefile.nex (has trees and data)
ape | ok | WEIGHTS NOT STORED, BUT NO ERROR MESSAGE | ok | ok | WILL NOT LOAD (2) | ok | tree ok, DATA WILL NOT LOAD (3)
ade4 (1) | ? | ? | ? | ? | ? | ? | ?
apTreeshape | ? | ? | ? | ? | ? | ? | ?
ComPairWise | ? | ? | ? | ? | ? | ? | ?
Geiger | ? | ? | ? | ? | ? | ? | ?
Laser | ? | ? | ? | ? | ? | ? | ?
OUCH | ? | ? | ? | ? | ? | ? | ?
PaleoTS | ? | ? | ? | ? | ? | ? | ?
PhyloGR | ? | ? | ? | ? | ? | ? | ?
PhySim | ? | ? | ? | ? | ? | ? | ?

(1) import after 'manual' extraction of the tree from the file into a character string (i.e. no parsing of the files).

(2) Error reported by ape when loading notes_multitrees.nex:

	> notes_multitrees<-read.nexus(file="notes_multitrees.nex")
	Error in edge[j, 1] <<- current.node : subscript out of bounds

(3) Error reported by ape when loading the data from samplefile.nex:

	> samplefiledata<-read.nexus.data(file="samplefile.nex")
	Error in read.nexus.data(file = "samplefile.nex") : 
		nexus parser does not handle spaces in sequences or taxon names 
	(ts>2)

DPUT files (created with the dput command, then indented and broken across lines to make the structure more readable)

File ape ade4 apTreeshape ComPairWise Geiger Laser OUCH PaleoTS PhyloGR PhySim
altnexus_simple.tre
structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
			.Dim = c(12L, 2L)
		), 
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6", "taxon_4", "taxon_5", "taxon_7"), 
		Nnode = 6L
	), 
	.Names = c("edge", "tip.label", "Nnode"), 
	class = "phylo", 
	origin = "/Users/bcomeara/Desktop/RHackTrees/altnexus_simple.tre"
)
ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
altnexus_treewts.tre (NOTE LACK OF WEIGHTS)
structure(
	list(
		tree1 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 10, 11, 11, 9, 12, 12, 8, 1, 9, 10, 2, 11, 3, 4, 12, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.157, 0.1775, 0.07063, 0.31937, 0.04062, 0.333, 0.287, 0.0875, 0.313, 0.257, 0.153), 
				Nnode = 5L,
				tip.label = c("taxon_1", "taxon_2", "taxon_5", "taxon_6", "taxon_3", "taxon_7", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree2 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 10, 9, 11, 12, 12, 11, 8, 1, 9, 10, 2, 3, 11, 12, 4, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.139, 0.20375, 0.06875, 0.37, 0.3, 0.04688, 0.00812, 0.323, 0.347, 0.34188, 0.131), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_6", "taxon_3", "taxon_7", "taxon_5", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"),
			class = "phylo"
		), 
		tree3 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 10, 9, 11, 12, 12, 11, 8, 1, 9, 10, 2, 3, 11, 12, 4, 5, 6, 7),
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.128, 0.20583, 0.02417, 0.351, 0.359, 0.02937, 0.08938, 0.29, 0.3, 0.32563, 0.122), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_5", "taxon_3", "taxon_7", "taxon_6", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree4 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 11, 10, 9, 12, 12, 8, 1, 9, 10, 11, 2, 3, 4, 12, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.167, 0.17083, 0.03125, 0.00375, 0.349, 0.331, 0.37125, 0.07417, 0.353, 0.267, 0.153), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_6", "taxon_5", "taxon_3", "taxon_7", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree5 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 11, 10, 9, 12, 12, 8, 1, 9, 10, 11, 2, 3, 4, 12, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.109, 0.23417, 0.0375, 0.01, 0.348, 0.282, 0.36, 0.04583, 0.364, 0.316, 0.111), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_6", "taxon_5", "taxon_3", "taxon_7", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		)
	), 
	.Names = c("tree1", "tree2", "tree3", "tree4", "tree5"), 
	class = c("multi.tree", "phylo"), 
	origin = "/Users/bcomeara/Desktop/RHackTrees/altnexus_treewts.tre"
)
ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
nexus_rooted_brlen.tre
structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
			.Dim = c(12L, 2L)
		), 
		edge.length = c(12, 14, 28, 46, 26, 33, 32, 8, 27, 26, 41, 34), 
		Nnode = 6L, 
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6", "taxon_4", "taxon_5", "taxon_7"), 
		root.edge = 0
	), 
	.Names = c("edge", "edge.length", "Nnode", "tip.label", "root.edge"), 
	class = "phylo", 
	origin = "/Users/bcomeara/Desktop/RHackTrees/nexus_rooted_brlen.tre"
)
ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
nexus_simple.tre
structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
			.Dim = c(12L, 2L)
		), 
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6", "taxon_4", "taxon_5", "taxon_7"), 
		Nnode = 6L
	), 
	.Names = c("edge", "tip.label", "Nnode"), 
	class = "phylo", 
	origin = "/Users/bcomeara/Desktop/RHackTrees/nexus_simple.tre"
)
ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
notes_multitrees.nex

FAILURE

ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
phylip.tre
structure(
	list(
		tree1 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.63659246127713, 5.75644760333622, 0.606959935386656, 0.606959935386656, 0.850173315120816, 5.51323422360206, 5.51323422360206, 5.89200083829354, 4.10799916170646, 3.76052686209822, 0.347472299608242, 0.347472299608242), 
				Nnode = 6L, 
				tip.label = c("taxon_6", "taxon_2", "taxon_3", "taxon_1", "taxon_4", "taxon_7", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree2 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 12, 12, 11, 10, 9, 13, 13, 1, 9, 10, 11, 12, 2, 3, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(10, 1.22766471256685, 2.02571277023826, 2.23491368116935, 2.14940475897520, 2.36230407705033, 2.36230407705033, 4.51170883602554, 6.74662251719489, 8.17630238667944, 0.596032900753702, 0.596032900753702), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_5", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		),
		tree3 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 11, 11, 9, 12, 12, 8, 13, 13, 9, 10, 1, 11, 2, 3, 12, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.65768435814907, 2.51101256019392, 3.83130308165701, 2.46098556027136, 1.37031752138566, 1.37031752138566, 6.0950862882223, 0.247229353628625, 0.247229353628625, 8.01673281717233, 1.98326718282767, 1.98326718282767), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		)
	), 
	.Names = c("tree1", "tree2", "tree3"), 
	class = c("multi.tree", "phylo")
)
ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
samplefile.nex

PARTIAL FAILURE: Data not loaded, tree is loaded (below)

structure(
	list(
		tree1 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.63659246127713, 5.75644760333622, 0.606959935386656, 0.606959935386656, 0.850173315120816, 5.51323422360206, 5.51323422360206, 5.89200083829354, 4.10799916170646, 3.76052686209822, 0.347472299608242, 0.347472299608242), 
				Nnode = 6L, 
				tip.label = c("taxon_6", "taxon_2", "taxon_3", "taxon_1", "taxon_4", "taxon_7", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree2 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 12, 12, 11, 10, 9, 13, 13, 1, 9, 10, 11, 12, 2, 3, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(10, 1.22766471256685, 2.02571277023826, 2.23491368116935, 2.14940475897520, 2.36230407705033, 2.36230407705033, 4.51170883602554, 6.74662251719489, 8.17630238667944, 0.596032900753702, 0.596032900753702), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_5", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree3 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 11, 11, 9, 12, 12, 8, 13, 13, 9, 10, 1, 11, 2, 3, 12, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.65768435814907, 2.51101256019392, 3.83130308165701, 2.46098556027136, 1.37031752138566, 1.37031752138566, 6.0950862882223, 0.247229353628625, 0.247229353628625, 8.01673281717233, 1.98326718282767, 1.98326718282767), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		)
	), 
	.Names = c("tree1", "tree2", "tree3"), 
	class = c("multi.tree", "phylo"), 
	origin = "/Users/bcomeara/Desktop/RHackTrees/samplefile.nex"
)
ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim: ?
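
As a rough illustration of how the dumps in this section can be read back, the sketch below rebuilds the altnexus_simple.tre object from its dput text, round-trips it through dput/dget, and lists the edges. It uses base R only, so ape does not need to be installed. The machine-specific origin attribute is left out, and the variable names and the temporary file are illustrative assumptions. The reading of the edge matrix (first column parent node, second column child node, tips numbered before internal nodes) follows the 'phylo' convention.

```r
# Rebuild the altnexus_simple.tre object from its dput() text.
# The 'origin' attribute is omitted because it is a machine-specific path.
tree <- structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13,
			  9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7),
			.Dim = c(12L, 2L)
		),
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6",
		              "taxon_4", "taxon_5", "taxon_7"),
		Nnode = 6L
	),
	.Names = c("edge", "tip.label", "Nnode"),
	class = "phylo"
)

# Round trip: write the object as text, then read it back.
dump_file <- tempfile(fileext = ".R")   # illustrative temporary file
dput(tree, file = dump_file)
tree2 <- dget(dump_file)

# The vector given to structure() fills the matrix column by column,
# so column 1 holds the parent of each edge and column 2 the child.
edge <- tree$edge
n_tips <- length(tree$tip.label)
is_tip <- edge[, 2] <= n_tips           # children numbered 1..n_tips are tips

# Show each edge, with tip numbers replaced by their labels.
edge_table <- data.frame(
	parent = edge[, 1],
	child = ifelse(is_tip, tree$tip.label[edge[, 2]], as.character(edge[, 2]))
)
print(edge_table)
```

The same approach should work for the multi-tree dumps, where each element of the outer list is itself a 'phylo' structure.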

Other info

Ade4

The S3 class phylog is a list containing the following items.

  • Items strictly representing the tree:
    • $tre: the tree in newick format
    • $leaves: a vector giving, for each tip, the distance to the closest HTU (hypothetical taxonomic unit, i.e. internal node)
    • $nodes: a vector giving the distance to the root for each node
    • $parts: a list giving the direct descendants for each HTU
    • $paths: a list giving, for each OTU (operational taxonomic unit, i.e. tip) and each HTU, the path (i.e. the set of HTU) to the root
    • $droot: a vector of distances to the root for all TU (taxonomic units, i.e. both OTU and HTU)
    • $call: the matched call of the object
  • Optional items, provided by default, which may require a lot of memory for large phylogenies:
    • $Wmat: matrix of expected covariances among OTU under a Brownian motion model
    • $Wdist: matrix of square roots of distances between OTU (sums of branch lengths)
    • $Wvalues and $Wscores: eigenanalysis of $Wdist
    • $Amat: matrix underlying Abouheif's test (seen as a Moran's I)
    • $Avalues and $Ascores: eigenanalysis of $Amat
    • $Adim: number of positive eigenvalues of $Amat
    • $Aparam: auxiliary information about HTU (for internal use)
    • $Bindica: dummy variables associated with the topology
    • $Bscores: the orthonormalisation of $Bindica using QR decomposition
    • $Blabels: for each node, the name of the dummy vector associated with it

Notes:

  • This class is designed for rooted trees: several items only make sense in that case ($nodes, $droot, $paths, $Amat, ...).
  • $Bvalues is documented but no longer exists.
  • In the S4 paradigm (and, in fact, in S3 as well), the optional items should be methods associated with the class, not components of the class itself.
  • Conversion from newick format to phylog (newick2phylog) is only implemented for a character string. In other words, a .tre file cannot be read directly, and there is no parser to extract a tree from a NEXUS file.

--Jombart 05:23, 29 November 2007 (EST)
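
The 'manual' extraction mentioned in the notes can be sketched in base R as follows. This is only an illustration under simple assumptions: the file contents are invented for the example, the NEXUS tree statement is assumed to sit on a single line, and there is no TRANSLATE block. The variable names are hypothetical, and the call to newick2phylog is shown as a comment because it requires ade4.

```r
# Case 1: a plain Newick file, possibly broken across several lines.
tre_file <- tempfile(fileext = ".tre")   # illustrative file
writeLines(c("((taxon_1:1,taxon_2:1):1,",
             "(taxon_3:1,taxon_4:1):1);"), tre_file)

# Read all lines, join them into a single string, and drop whitespace.
newick <- paste(readLines(tre_file), collapse = "")
newick <- gsub("[[:space:]]+", "", newick)

# Case 2: a NEXUS file with one tree statement per line (an assumption).
nex_file <- tempfile(fileext = ".nex")   # illustrative file
writeLines(c("#NEXUS",
             "BEGIN TREES;",
             "\tTREE tree1 = [&R] ((a,b),c);",
             "END;"), nex_file)
nex_lines <- readLines(nex_file)

# Keep the lines that start with the keyword TREE (case-insensitive).
tree_lines <- grep("^[[:space:]]*tree[[:space:]]", nex_lines,
                   ignore.case = TRUE, value = TRUE)

# Strip everything up to '=' and an optional bracketed comment such as [&R].
nexus_newick <- sub("^[^=]*=[[:space:]]*(\\[[^]]*\\][[:space:]]*)?", "",
                    tree_lines)

# With ade4 installed, either string could then be converted:
# phy <- ade4::newick2phylog(newick)
```

A real NEXUS file with a TRANSLATE block or multi-line tree statements would need more careful handling than this.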

Ape

There are two main data classes in ape: "phylo" and "DNAbin". Both are described on the ape Web pages (see the Development section). Some features of "phylo" are summarized on the R_Hackathon_1/Data Standards page. Both classes have associated functions to read and write files on disk, to manipulate and compute with them in R, and to pass them to C.
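
To give a feel for computing with the "phylo" components directly, here is a minimal base-R sketch that takes the edge, edge.length and tip.label values dumped for nexus_rooted_brlen.tre on this page and sums branch lengths from each tip up to the root. It does not use ape; the function name root_distance and the other variable names are hypothetical and introduced only for this example.

```r
# Components copied from the nexus_rooted_brlen.tre dump on this page.
edge <- matrix(c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13,
                 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), ncol = 2)
edge.length <- c(12, 14, 28, 46, 26, 33, 32, 8, 27, 26, 41, 34)
tip.label <- c("taxon_1", "taxon_3", "taxon_2", "taxon_6",
               "taxon_4", "taxon_5", "taxon_7")

n_tips <- length(tip.label)

# The root is the only node that appears as a parent but never as a child.
root <- setdiff(edge[, 1], edge[, 2])

# Walk from a node up to the root, summing branch lengths on the way.
root_distance <- function(node) {
	total <- 0
	while (node != root) {
		row <- which(edge[, 2] == node)   # the single edge leading to 'node'
		total <- total + edge.length[row]
		node <- edge[row, 1]              # move up to the parent
	}
	total
}

# Distance from the root to every tip, named by tip label.
tip_depths <- vapply(seq_len(n_tips), root_distance, numeric(1))
names(tip_depths) <- tip.label
print(tip_depths)
```

The loop is quadratic in the worst case, which is fine for a seven-tip example but would be slow on large trees.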

apTreeshape

ComPairWise

Geiger

Laser

OUCH

PaleoTS

PhyloGR

PhySim