R Hackathon 1/Current Data Representations

From Phyloinformatics

Revision as of 01:22, 30 November 2007

For each file in the test set of sample trees and datasets (and any other relevant files), describe whether the file loads, what information is lost, and how the objects are represented in R. Pending a better idea, we could use a format similar to the one used for the 'phylo' class (pdf) or the scheme for coding nucleotides (pdf).

Summary

Package | altnexus_simple.tre | altnexus_treewts.tre | nexus_rooted_brlen.tre | nexus_simple.tre | notes_multitrees.nex (has labels and notes) | phylip.tre | samplefile.nex (has trees and data)
ape | ok | WEIGHTS NOT STORED, BUT NO ERROR MESSAGE | ok | ok | WILL NOT LOAD (see error below) | ok | tree ok, DATA WILL NOT LOAD (see error below)
ade4 (1) | ok | ok but no weights | tree ok, but labels of taxa not stored | tree ok, but labels of taxa not stored | tree 1 failed, others are ok but labels not stored | trees ok | trees ok, no data
apTreeshape | ? | ? | ? | ? | ? | ? | ?
ComPairWise | ? | ? | ? | ? | ? | ? | ?
Geiger | ? | ? | ? | ? | ? | ? | ?
Laser | ? | ? | ? | ? | ? | ? | ?
OUCH | ? | ? | ? | ? | ? | ? | ?
PaleoTS | ? | ? | ? | ? | ? | ? | ?
PhyloGR | ? | ? | ? | ? | ? | ? | ?
PhySim | ? | ? | ? | ? | ? | ? | ?

ape, notes_multitrees.nex:

> notes_multitrees<-read.nexus(file="notes_multitrees.nex")
Error in edge[j, 1] <<- current.node : subscript out of bounds

ape, samplefile.nex:

> samplefiledata<-read.nexus.data(file="samplefile.nex")
Error in read.nexus.data(file = "samplefile.nex") : 
	nexus parser does not handle spaces in sequences or taxon names 
(ts>2)

(1) No import from files: trees are copied and pasted into R as character strings and then converted.

DPUT files (created using the dput command, then reformatted with indents and line feeds to make the structure more readable)
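As a minimal sketch of how such a listing is produced and read back, the following base R snippet writes an object with dput and restores it with dget. The object is a hand-built list that only mimics the 'phylo' layout (values copied from the nexus_simple.tre entry); it is not created by ape, and the temporary file name is an assumption for illustration.

```r
# Hand-built list mimicking the 'phylo' layout used in the listings
# (illustrative only; not produced by ape's read functions).
tree <- structure(
  list(
    edge = structure(
      c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13,
        9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7),
      .Dim = c(12L, 2L)
    ),
    tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6",
                  "taxon_4", "taxon_5", "taxon_7"),
    Nnode = 6L
  ),
  class = "phylo"
)

# Write the text representation to a temporary file ...
tmp <- tempfile(fileext = ".R")
dput(tree, file = tmp)

# ... and rebuild the object from that text.
restored <- dget(tmp)

# The restored object should carry the same components and class.
same_object <- identical(tree, restored)
```

Because dput output is plain R code, a listing can also be pasted directly at the R prompt to recreate the object.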

File | ape | ade4 | apTreeshape | ComPairWise | Geiger | Laser | OUCH | PaleoTS | PhyloGR | PhySim
altnexus_simple.tre
structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
			.Dim = c(12L, 2L)
		), 
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6", "taxon_4", "taxon_5", "taxon_7"), 
		Nnode = 6L
	), 
	.Names = c("edge", "tip.label", "Nnode"), 
	class = "phylo", 
	origin = "/Users/bcomeara/Desktop/RHackTrees/altnexus_simple.tre"
)
All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
altnexus_treewts.tre NOTE LACK OF WEIGHTS
structure(
	list(
		tree1 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 10, 11, 11, 9, 12, 12, 8, 1, 9, 10, 2, 11, 3, 4, 12, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.157, 0.1775, 0.07063, 0.31937, 0.04062, 0.333, 0.287, 0.0875, 0.313, 0.257, 0.153), 
				Nnode = 5L,
				tip.label = c("taxon_1", "taxon_2", "taxon_5", "taxon_6", "taxon_3", "taxon_7", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree2 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 10, 9, 11, 12, 12, 11, 8, 1, 9, 10, 2, 3, 11, 12, 4, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.139, 0.20375, 0.06875, 0.37, 0.3, 0.04688, 0.00812, 0.323, 0.347, 0.34188, 0.131), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_6", "taxon_3", "taxon_7", "taxon_5", "taxon_4" )
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"),
			class = "phylo"
		), 
		tree3 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 10, 9, 11, 12, 12, 11, 8, 1, 9, 10, 2, 3, 11, 12, 4, 5, 6, 7),
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.128, 0.20583, 0.02417, 0.351, 0.359, 0.02937, 0.08938, 0.29, 0.3, 0.32563, 0.122), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_5", "taxon_3", "taxon_7", "taxon_6", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree4 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 11, 10, 9, 12, 12, 8, 1, 9, 10, 11, 2, 3, 4, 12, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.167, 0.17083, 0.03125, 0.00375, 0.349, 0.331, 0.37125, 0.07417, 0.353, 0.267, 0.153), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_6", "taxon_5", "taxon_3", "taxon_7", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree5 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 11, 10, 9, 12, 12, 8, 1, 9, 10, 11, 2, 3, 4, 12, 5, 6, 7), 
					.Dim = c(11L, 2L)
				), 
				edge.length = c(0.109, 0.23417, 0.0375, 0.01, 0.348, 0.282, 0.36, 0.04583, 0.364, 0.316, 0.111), 
				Nnode = 5L, 
				tip.label = c("taxon_1", "taxon_2", "taxon_6", "taxon_5", "taxon_3", "taxon_7", "taxon_4")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		)
	), 
	.Names = c("tree1", "tree2", "tree3", "tree4", "tree5"), 
	class = c("multi.tree", "phylo"), 
	origin = "/Users/bcomeara/Desktop/RHackTrees/altnexus_treewts.tre"
)
All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
nexus_rooted_brlen.tre
structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
			.Dim = c(12L, 2L)
		), 
		edge.length = c(12, 14, 28, 46, 26, 33, 32, 8, 27, 26, 41, 34), 
		Nnode = 6L, 
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6", "taxon_4", "taxon_5", "taxon_7"), 
		root.edge = 0
	), 
	.Names = c("edge", "edge.length", "Nnode", "tip.label", "root.edge"), 
	class = "phylo", 
	origin = "/Users/bcomeara/Desktop/RHackTrees/nexus_rooted_brlen.tre"
)
All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
nexus_simple.tre
structure(
	list(
		edge = structure(
			c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
			.Dim = c(12L, 2L)
		), 
		tip.label = c("taxon_1", "taxon_3", "taxon_2", "taxon_6", "taxon_4", "taxon_5", "taxon_7"), 
		Nnode = 6L
	), 
	.Names = c("edge", "tip.label", "Nnode"), 
	class = "phylo", 
	origin = "/Users/bcomeara/Desktop/RHackTrees/nexus_simple.tre"
)
All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
notes_multitrees.nex

FAILURE

All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
phylip.tre
structure(
	list(
		tree1 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.63659246127713, 5.75644760333622, 0.606959935386656, 0.606959935386656, 0.850173315120816, 5.51323422360206, 5.51323422360206, 5.89200083829354, 4.10799916170646, 3.76052686209822, 0.347472299608242, 0.347472299608242), 
				Nnode = 6L, 
				tip.label = c("taxon_6", "taxon_2", "taxon_3", "taxon_1", "taxon_4", "taxon_7", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree2 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 12, 12, 11, 10, 9, 13, 13, 1, 9, 10, 11, 12, 2, 3, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(10, 1.22766471256685, 2.02571277023826, 2.23491368116935, 2.14940475897520, 2.36230407705033, 2.36230407705033, 4.51170883602554, 6.74662251719489, 8.17630238667944, 0.596032900753702, 0.596032900753702), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_5", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		),
		tree3 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 11, 11, 9, 12, 12, 8, 13, 13, 9, 10, 1, 11, 2, 3, 12, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.65768435814907, 2.51101256019392, 3.83130308165701, 2.46098556027136, 1.37031752138566, 1.37031752138566, 6.0950862882223, 0.247229353628625, 0.247229353628625, 8.01673281717233, 1.98326718282767, 1.98326718282767), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		)
	), 
	.Names = c("tree1", "tree2", "tree3"), 
	class = c("multi.tree", "phylo")
)
All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
samplefile.nex

PARTIAL FAILURE: Data not loaded, tree is loaded (below)

structure(
	list(
		tree1 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13, 9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.63659246127713, 5.75644760333622, 0.606959935386656, 0.606959935386656, 0.850173315120816, 5.51323422360206, 5.51323422360206, 5.89200083829354, 4.10799916170646, 3.76052686209822, 0.347472299608242, 0.347472299608242), 
				Nnode = 6L, 
				tip.label = c("taxon_6", "taxon_2", "taxon_3", "taxon_1", "taxon_4", "taxon_7", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree2 = structure(
			list(
				edge = structure(
					c(8, 8, 9, 10, 11, 12, 12, 11, 10, 9, 13, 13, 1, 9, 10, 11, 12, 2, 3, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(10, 1.22766471256685, 2.02571277023826, 2.23491368116935, 2.14940475897520, 2.36230407705033, 2.36230407705033, 4.51170883602554, 6.74662251719489, 8.17630238667944, 0.596032900753702, 0.596032900753702), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_5", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		), 
		tree3 = structure(
			list(
				edge = structure(
					c(8, 9, 10, 10, 11, 11, 9, 12, 12, 8, 13, 13, 9, 10, 1, 11, 2, 3, 12, 4, 5, 13, 6, 7), 
					.Dim = c(12L, 2L)
				), 
				edge.length = c(3.65768435814907, 2.51101256019392, 3.83130308165701, 2.46098556027136, 1.37031752138566, 1.37031752138566, 6.0950862882223, 0.247229353628625, 0.247229353628625, 8.01673281717233, 1.98326718282767, 1.98326718282767), 
				Nnode = 6L, 
				tip.label = c("taxon_2", "taxon_7", "taxon_3", "taxon_1", "taxon_4", "taxon_6", "taxon_5")
			), 
			.Names = c("edge", "edge.length", "Nnode", "tip.label"), 
			class = "phylo"
		)
	), 
	.Names = c("tree1", "tree2", "tree3"), 
	class = c("multi.tree", "phylo"), 
	origin = "/Users/bcomeara/Desktop/RHackTrees/samplefile.nex"
)
All other packages (ade4, apTreeshape, ComPairWise, Geiger, Laser, OUCH, PaleoTS, PhyloGR, PhySim): ?
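To make the ape listings easier to read, here is a small base R sketch of how the edge matrix can be interpreted. It assumes the usual 'phylo' convention that each row of edge is an (ancestor, descendant) pair, that tips are numbered 1 to the number of tips, and that edge.length[k] belongs to row k. The values are copied from the nexus_rooted_brlen.tre entry; the helper depth_of is a hypothetical name introduced only for this illustration.

```r
# Components copied from the nexus_rooted_brlen.tre listing.
edge <- matrix(
  c(8, 9, 10, 10, 9, 11, 11, 8, 12, 12, 13, 13,
    9, 10, 1, 2, 11, 3, 4, 12, 5, 13, 6, 7),
  ncol = 2
)
edge.length <- c(12, 14, 28, 46, 26, 33, 32, 8, 27, 26, 41, 34)
tip.label <- c("taxon_1", "taxon_3", "taxon_2", "taxon_6",
               "taxon_4", "taxon_5", "taxon_7")

parents  <- edge[, 1]
children <- edge[, 2]

# The root is the only node that never appears as a descendant;
# tips are the nodes that never appear as an ancestor.
root <- setdiff(parents, children)
tips <- sort(setdiff(children, parents))

# Hypothetical helper: sum branch lengths from a node up to the root.
depth_of <- function(node) {
  total <- 0
  while (node != root) {
    row   <- match(node, children)  # the single edge leading to this node
    total <- total + edge.length[row]
    node  <- parents[row]
  }
  total
}

root_to_tip <- vapply(tips, depth_of, numeric(1))
names(root_to_tip) <- tip.label[tips]
```

For this tree the root is node 8, and taxon_1 (tip 1) lies 12 + 14 + 28 = 54 units from the root.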

Other info

Ade4

The S3 class phylog is a list containing the following items.

  • Items strictly representing the tree:
    • $tre: the tree in Newick format
    • $leaves: a vector giving the distance to the closest HTU (hypothetical taxonomic unit, i.e. internal node) for each tip
    • $nodes: a vector giving the distance to the root for each node
    • $parts: a list giving the direct descendants of each HTU
    • $paths: a list giving the path (i.e. the set of HTUs) to the root for each OTU (operational taxonomic unit, i.e. tip) and HTU
    • $droot: a vector of distances to the root for all taxonomic units (OTUs and HTUs)
    • $call: the matched call of the object
  • Optional items, provided by default, which may require a lot of memory for large phylogenies:
    • $Wmat: matrix of expected covariances among OTUs under a Brownian motion model
    • $Wdist: matrix of square roots of distances between OTUs (sums of branch lengths)
    • $Wvalues and $Wscores: eigenanalysis of $Wdist
    • $Amat: matrix underlying Abouheif's test (seen as a Moran's I)
    • $Avalues and $Ascores: eigenanalysis of $Amat
    • $Adim: number of positive eigenvalues of $Amat
    • $Aparam: auxiliary information about HTUs (for internal use)
    • $Bindica: dummy variables associated with the topology
    • $Bscores: the orthonormalisation of $Bindica using QR decomposition
    • $Blabels: for each node, the name of the dummy vector associated with it

Notes:

  • This class is meant for rooted trees, as several items make sense only in that case ($nodes, $droot, $paths, $Amat, ...).
  • $Bvalues is documented but no longer exists.
  • In the S4 paradigm (and, in fact, in S3 as well), the optional items should be methods associated with the class, not components of the class itself.
  • Conversion from Newick format to phylog (newick2phylog) is implemented only for a character string. In other words, a .tre file cannot be read directly, and there is no parser to extract a tree from a NEXUS file.

--Jombart 05:23, 29 November 2007 (EST)
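Since the conversion only accepts a character string, one possible workaround is to read the Newick file into a single string first. The sketch below uses base R only; read_newick_string is a hypothetical helper name, and the three-taxon tree written to a temporary file is made-up test input, not one of the sample files.

```r
# Hypothetical helper: collapse a (possibly multi-line) Newick file
# into one character string.
read_newick_string <- function(path) {
  lines <- readLines(path, warn = FALSE)
  paste(trimws(lines), collapse = "")
}

# Made-up input: a small Newick tree split over two lines.
tmp <- tempfile(fileext = ".tre")
writeLines(c("((taxon_1:1,taxon_2:1):2,", "taxon_3:3);"), tmp)

newick <- read_newick_string(tmp)
```

The resulting string could then be passed to newick2phylog, provided ade4 is installed. This covers plain Newick files only; extracting a tree from a NEXUS file would still need a separate parser.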

Ape

There are two main data classes in ape: "phylo" and "DNAbin". Both are described on the ape Web pages (see the Development section). Some features of "phylo" are summarized on the R_Hackathon_1/Data Standards page. Both classes have associated functions to read and write files on disk, to manipulate them and compute with them in R, and to pass them to C.

apTreeshape

A tree of class "treeshape" is a fully dichotomous (binary) tree. The class exists to study the topology of phylogenetic trees; branch lengths are not stored, because the focus is on tree balance. The i-th row of the nodes matrix gives the children of node number i (nodes[i,1] is the left child and nodes[i,2] is the right child). A positive value denotes an internal node, while a negative value denotes a tip. The last row always gives the children of the root.
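The nodes matrix described above can be sketched in base R as follows. This does not use apTreeshape itself: the matrix is built by hand for a four-tip caterpillar tree (((1,2),3),4), and n_tips_below is a hypothetical helper. As an illustration of a balance measure, the snippet sums, over all internal nodes, the absolute difference in tip counts between the left and right child (the Colless index).

```r
# Hand-built nodes matrix for the tree (((1,2),3),4).
# Negative entries are tips, positive entries refer to rows (internal nodes);
# the last row holds the children of the root.
nodes <- matrix(c(-1, -2,   # node 1: tips 1 and 2
                   1, -3,   # node 2: node 1 and tip 3
                   2, -4),  # node 3 (root): node 2 and tip 4
                ncol = 2, byrow = TRUE)

# Hypothetical helper: number of tips descending from a child entry.
n_tips_below <- function(nodes, i) {
  if (i < 0) return(1L)  # a negative value is a single tip
  n_tips_below(nodes, nodes[i, 1]) + n_tips_below(nodes, nodes[i, 2])
}

n_tips <- sum(nodes < 0)

# Colless index: sum of |left tips - right tips| over internal nodes.
colless <- sum(vapply(seq_len(nrow(nodes)), function(i) {
  abs(n_tips_below(nodes, nodes[i, 1]) - n_tips_below(nodes, nodes[i, 2]))
}, integer(1)))
```

For this fully unbalanced tree the index is 3; a balanced four-tip tree, with rows (-1, -2), (-3, -4), (1, 2), would give 0.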

ComPairWise

Reads sequence alignment data from Nexus and other formats, stores alignments in the aln data structure, and converts them to various other formats (vector, matrix).

Geiger

Laser

Tree structure represented as a vector/matrix of branching times. Includes functions for reading Newick trees and converting to branching times.
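To illustrate what a vector of branching times looks like, the base R sketch below derives one from a small 'phylo'-style edge matrix. It does not use Laser's own functions. The tree ((A:1,B:1):2,C:3) is made up, and the code assumes an ultrametric tree whose edge rows are ordered so that every node appears as a descendant before it appears as an ancestor.

```r
# Made-up ultrametric tree ((A:1,B:1):2,C:3) in 'phylo'-style components.
# Tips are nodes 1-3, the root is node 4, the other internal node is 5.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
edge.length <- c(2, 1, 1, 3)
n_tips <- 3

# Distance of every node from the root (the root itself stays at 0).
# Relies on the assumed row order: a parent's depth is set before its
# children are visited.
depth <- numeric(max(edge))
for (k in seq_len(nrow(edge))) {
  depth[edge[k, 2]] <- depth[edge[k, 1]] + edge.length[k]
}

# Branching time of an internal node = time from that node to the tips.
tree_height <- max(depth[seq_len(n_tips)])
btimes <- sort(tree_height - depth[-seq_len(n_tips)], decreasing = TRUE)
# btimes is c(3, 1): the root split 3 units ago, the A/B split 1 unit ago.
```

Note that this representation keeps only the timing of the splits; the topology and tip labels cannot be recovered from the vector alone.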

OUCH

PaleoTS

PhyloGR

PhySim