# Difference between revisions of "R Hackathon 1/PGLS"

Line 3: | Line 3: | ||

PGLS is a powerful method for analyzing continuous data that has been applied to estimating adaptive optima (Butler and King 2004) and estimating the relationships among traits (e.g., body size and geographic range size in carnivores). PGLS allows the user to specify different ways in which the tree structure is expected to affect the covariance in trait values across taxa. For example, the user might assume that the trait evolves by Brownian motion and thus that the trait covariance between any pair of taxa decreases linearly with the time (in branch length) since their divergence. Alternately, the user might apply a Ornstein-Uhlenbeck model where the expected covariance decreases exponentially, as governed by the parameter alpha (Martins and Hansen 1997). These methods are implemented in the ape package. | PGLS is a powerful method for analyzing continuous data that has been applied to estimating adaptive optima (Butler and King 2004) and estimating the relationships among traits (e.g., body size and geographic range size in carnivores). PGLS allows the user to specify different ways in which the tree structure is expected to affect the covariance in trait values across taxa. For example, the user might assume that the trait evolves by Brownian motion and thus that the trait covariance between any pair of taxa decreases linearly with the time (in branch length) since their divergence. Alternately, the user might apply a Ornstein-Uhlenbeck model where the expected covariance decreases exponentially, as governed by the parameter alpha (Martins and Hansen 1997). These methods are implemented in the ape package. | ||

− | Fitting a | + | '''Fitting a Brownian Motion model in PGLS''' |

Let's return to the ''Geospiza'' dataset (within the geiger package) to try PGLS. We assume that you have already loaded the necessary packages (geiger for the data and ape for the function) as described on [https://www.nescent.org/wg_phyloinformatics/R_Hackathon/TransitionProbability this page]. Let's say we want to test whether there is a significant relationship between wing length and tarsus length, accounting for possible dependence among the data points (trait values) due to phylogenetic relatedness. | Let's return to the ''Geospiza'' dataset (within the geiger package) to try PGLS. We assume that you have already loaded the necessary packages (geiger for the data and ape for the function) as described on [https://www.nescent.org/wg_phyloinformatics/R_Hackathon/TransitionProbability this page]. Let's say we want to test whether there is a significant relationship between wing length and tarsus length, accounting for possible dependence among the data points (trait values) due to phylogenetic relatedness. | ||

Line 31: | Line 31: | ||

The summary shows you that the log likelihood of this model is 9.6 and its AIC score is -13.21. We also see that the effect of tarsus length on wing length is 1.37, and their relationship is signficant (p=0.001). | The summary shows you that the log likelihood of this model is 9.6 and its AIC score is -13.21. We also see that the effect of tarsus length on wing length is 1.37, and their relationship is signficant (p=0.001). | ||

+ | |||

+ | '''Fitting an Ornstein-Uhlenbeck Motion model in PGLS''' | ||

+ | |||

+ | Again we will first build the correlation structure, this time assuming if the traits evolve as expected under the Ornstein-Uhlenbeck process with a variance-restraining parameter, alpha. Ape automatically estimates the best fitting value of alpha for your data. | ||

+ | |||

+ | ou.geospiza<-corBrownian(phy=pruned.tree) | ||

+ | ou.gls<-gls(wingL~tarsusL,correlation=bm.geospiza,data=DF.geospiza) | ||

+ | summary(bm.gls) | ||

Now let's try a model with | Now let's try a model with | ||

library(geiger)Now we can fit a gen | library(geiger)Now we can fit a gen |

## Revision as of 14:51, 13 December 2007

**Phylogenetic Generalized Least Squares**

PGLS is a powerful method for analyzing continuous data that has been applied to estimating adaptive optima (Butler and King 2004) and estimating the relationships among traits (e.g., body size and geographic range size in carnivores). PGLS allows the user to specify different ways in which the tree structure is expected to affect the covariance in trait values across taxa. For example, the user might assume that the trait evolves by Brownian motion and thus that the trait covariance between any pair of taxa decreases linearly with the time (in branch length) since their divergence. Alternately, the user might apply a Ornstein-Uhlenbeck model where the expected covariance decreases exponentially, as governed by the parameter alpha (Martins and Hansen 1997). These methods are implemented in the ape package.

**Fitting a Brownian Motion model in PGLS**

Let's return to the *Geospiza* dataset (within the geiger package) to try PGLS. We assume that you have already loaded the necessary packages (geiger for the data and ape for the function) as described on this page. Let's say we want to test whether there is a significant relationship between wing length and tarsus length, accounting for possible dependence among the data points (trait values) due to phylogenetic relatedness.

First, let's prepare our tree for the analysis. Our original tree has 14 taxa but we only have data for 13, so we must prune the tree. We can take out the species for which we have no data (olivacea) as follows:

data(geospiza) attach(geospiza) pruned.tree<-drop.tip(geospiza.tree,"olivacea")

Now let's create a data frame containing just our traits of interest, with the row.names matching the tip.labels. Most users bring their data in from a tab-delimited table, for example by typing:

geospiza<-read.table("geospiza.txt",row.names=1)

where the row names are the taxon names. NOTE: you must associate the taxon names with the trait values so that the values can be correctly tied to the tips of the tree! See also this page. With the dataset in active memory, we can create the dataframe as follows:

wingL<-geospiza.data$wingL tarsusL<-geospiza.data$tarsusL DF.geospiza<-data.frame(wingL,tarsusL,row.names=pruned.tree$tip.label) DF.geospiza

When you typed the final command, you will see that you have a data.frame where the row names are the tip.labels in the pruned tree. Now we will first build the correlation structure expected if the traits evolve by Brownian motion and fit a generalized least squares model assuming this correlation structure.

bm.geospiza<-corBrownian(phy=pruned.tree) bm.gls<-gls(wingL~tarsusL,correlation=bm.geospiza,data=DF.geospiza) summary(bm.gls)

The summary shows you that the log likelihood of this model is 9.6 and its AIC score is -13.21. We also see that the effect of tarsus length on wing length is 1.37, and their relationship is signficant (p=0.001).

**Fitting an Ornstein-Uhlenbeck Motion model in PGLS**

Again we will first build the correlation structure, this time assuming if the traits evolve as expected under the Ornstein-Uhlenbeck process with a variance-restraining parameter, alpha. Ape automatically estimates the best fitting value of alpha for your data.

ou.geospiza<-corBrownian(phy=pruned.tree) ou.gls<-gls(wingL~tarsusL,correlation=bm.geospiza,data=DF.geospiza) summary(bm.gls)

Now let's try a model with library(geiger)Now we can fit a gen