Homework 8

1. Think about an ongoing study in your lab (or a paper you have read in a different class), and decide on a pattern that you might expect in your experiment if a specific hypothesis were true.

I am currently looking at how canopy gaps affect primary productivity, specifically through changes in algal community composition. I expect algal biomass to increase as canopy gap fraction increases.

2. To start simply, assume that the data in each of your treatment groups follow a normal distribution. Specify the sample sizes, means, and variances for each group that would be reasonable if your hypothesis were true. You may need to consult some previous literature and/or an expert in the field to come up with these numbers.

Consulted literature - Stream algal biomass response to experimental phosphorus and nitrogen gradients: A case for dual nutrient management in agricultural watersheds

Measurements in this paper show benthic chl a ranging between 0 and 2 ug cm^-2.

Consulted literature - Spatial characteristics of canopy disturbances in riparian old-growth hemlock – northern hardwood forests, Adirondack Mountains, New York, USA

Slope = 0.2

3. Using the methods we have covered in class, write code to create a random data set that has these attributes. Organize these data into a data frame with the appropriate structure.

library(ggplot2)

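beta0 <- 0.2   # slope of the linear relationship

# use gamma distribution to create biomass data based on above parameters
biomass <- rgamma(n = 20, shape = 1.25, scale = 0.55)
# histogram of the simulated biomass values (qplot() is deprecated in recent ggplot2)
ggplot(data.frame(biomass), aes(x = biomass)) +
  geom_histogram(color = "black", fill = "goldenrod")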

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# intercept (mean 0.2) with normally distributed noise (sd 0.1)
interc <- rnorm(n=20, mean = 0.2, sd = 0.1)

# response variable: linear in biomass, plus the noisy intercept
gap_frac <- biomass*beta0 + interc

#create data frame
d_frame <- data.frame(biomass, gap_frac)
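
As a quick sanity check on the simulation above, the sketch below computes the theoretical mean and variance implied by the gamma parameters (shape = 1.25, scale = 0.55), along with the share of simulated biomass values expected to fall at or below 2, the upper end of the range noted from the literature. The variable names are illustrative and not part of the original analysis.

```r
# Gamma parameters copied from the rgamma() call above
shape <- 1.25
scale <- 0.55

# For a gamma distribution: mean = shape * scale, variance = shape * scale^2
gamma_mean <- shape * scale
gamma_var  <- shape * scale^2

# Probability that a simulated biomass value falls at or below 2,
# the upper end of the chl a range noted from the literature
prop_below_2 <- pgamma(2, shape = shape, scale = scale)

print(c(mean = gamma_mean, variance = gamma_var, prop_below_2 = prop_below_2))
```

If the proportion at or below 2 looked too low, the shape or scale could be adjusted before generating the data.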

4. Now write code to analyze the data (probably as an ANOVA or regression analysis, but possibly as a logistic regression or contingency table analysis). Write code to generate a useful graph of the data.

model <- lm(gap_frac~biomass, data = d_frame) # created linear model

summary(model)

## 
## Call:
## lm(formula = gap_frac ~ biomass, data = d_frame)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.200445 -0.067194 -0.003373  0.097374  0.169274 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.16002    0.03732   4.288 0.000443 ***
## biomass      0.23652    0.05061   4.673 0.000189 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1094 on 18 degrees of freedom
## Multiple R-squared:  0.5482, Adjusted R-squared:  0.5231 
## F-statistic: 21.84 on 1 and 18 DF,  p-value: 0.0001891

lm_plot <- ggplot(data=d_frame) +
            aes(x=biomass,y=gap_frac) +
            geom_point() +
            stat_smooth(method=lm, se=TRUE, level=0.95)
print(lm_plot)

## `geom_smooth()` using formula 'y ~ x'

5. Try running your analysis multiple times to get a feeling for how variable the results are with the same parameters, but different sets of random numbers.

# Run code Second Time 
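beta0 <- 0.2   # slope of the linear relationship

# use gamma distribution to create biomass data based on above parameters
biomass <- rgamma(n = 20, shape = 1.25, scale = 0.55)
# histogram of the simulated biomass values (qplot() is deprecated in recent ggplot2)
ggplot(data.frame(biomass), aes(x = biomass)) +
  geom_histogram(color = "black", fill = "goldenrod")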

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# intercept (mean 0.2) with normally distributed noise (sd 0.1)
interc <- rnorm(n=20, mean = 0.2, sd = 0.1)

# response variable: linear in biomass, plus the noisy intercept
gap_frac <- biomass*beta0 + interc

#create data frame
d_frame <- data.frame(biomass, gap_frac)

model <- lm(gap_frac~biomass, data = d_frame) # created linear model
summary(model)

## 
## Call:
## lm(formula = gap_frac ~ biomass, data = d_frame)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28186 -0.04222  0.03784  0.06100  0.14338 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.16013    0.03181   5.035 8.62e-05 ***
## biomass      0.27102    0.03267   8.295 1.46e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1022 on 18 degrees of freedom
## Multiple R-squared:  0.7926, Adjusted R-squared:  0.7811 
## F-statistic:  68.8 on 1 and 18 DF,  p-value: 1.458e-07

lm_plot <- ggplot(data=d_frame) +
            aes(x=biomass,y=gap_frac) +
            geom_point() +
            stat_smooth(method=lm, se=TRUE, level=0.95)
print(lm_plot)

## `geom_smooth()` using formula 'y ~ x'

# Run code Third Time 
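beta0 <- 0.2   # slope of the linear relationship

# use gamma distribution to create biomass data based on above parameters
biomass <- rgamma(n = 20, shape = 1.25, scale = 0.55)
# histogram of the simulated biomass values (qplot() is deprecated in recent ggplot2)
ggplot(data.frame(biomass), aes(x = biomass)) +
  geom_histogram(color = "black", fill = "goldenrod")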

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# intercept (mean 0.2) with normally distributed noise (sd 0.1)
interc <- rnorm(n=20, mean = 0.2, sd = 0.1)

# response variable: linear in biomass, plus the noisy intercept
gap_frac <- biomass*beta0 + interc

#create data frame
d_frame <- data.frame(biomass, gap_frac)

model <- lm(gap_frac~biomass, data = d_frame) # created linear model
summary(model)

## 
## Call:
## lm(formula = gap_frac ~ biomass, data = d_frame)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19915 -0.10138  0.01308  0.05472  0.24435 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20906    0.03810   5.488 3.27e-05 ***
## biomass      0.20376    0.05243   3.887  0.00108 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1208 on 18 degrees of freedom
## Multiple R-squared:  0.4563, Adjusted R-squared:  0.4261 
## F-statistic:  15.1 on 1 and 18 DF,  p-value: 0.001081

lm_plot <- ggplot(data=d_frame) +
            aes(x=biomass,y=gap_frac) +
            geom_point() +
            stat_smooth(method=lm, se=TRUE, level=0.95)
print(lm_plot)

## `geom_smooth()` using formula 'y ~ x'
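
Rerunning the chunk by hand gives only a few draws. A minimal sketch of how the same simulation could be wrapped in a function and repeated many times is shown below. The function name `simulate_fit`, its argument names, the seed, and the choice of 1000 repetitions are all assumptions made for illustration; the parameter values are the ones used in the runs above.

```r
# Hypothetical helper: simulate one data set and return the fitted slope and its p-value.
# Defaults mirror the parameters used in the manual runs above.
simulate_fit <- function(n = 20, slope = 0.2, intercept = 0.2, resid_sd = 0.1) {
  biomass  <- rgamma(n = n, shape = 1.25, scale = 0.55)
  gap_frac <- intercept + slope * biomass + rnorm(n = n, mean = 0, sd = resid_sd)
  coefs    <- summary(lm(gap_frac ~ biomass))$coefficients
  c(slope   = coefs["biomass", "Estimate"],
    p_value = coefs["biomass", "Pr(>|t|)"])
}

set.seed(8)  # arbitrary seed so the repeated runs are reproducible

# One row per simulated data set, with columns "slope" and "p_value"
runs <- t(replicate(1000, simulate_fit()))

# Spread of the estimated slopes across runs
print(summary(runs[, "slope"]))

# Proportion of runs in which the slope is significant at p < 0.05
print(mean(runs[, "p_value"] < 0.05))
```

The summary of the slope column shows how far individual estimates stray from the true value of 0.2, which is the variability the three manual runs hint at.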

6/7. Now begin adjusting the means of the different groups. Given the sample sizes you have chosen, how small can the differences between the groups be (the “effect size”) for you to still detect a significant pattern (p < 0.05)?

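beta0 <- 0.2   # slope of the linear relationship

# use gamma distribution to create biomass data based on above parameters
biomass <- rgamma(n = 8, shape = 1.25, scale = 0.55)
# histogram of the simulated biomass values (qplot() is deprecated in recent ggplot2)
ggplot(data.frame(biomass), aes(x = biomass)) +
  geom_histogram(color = "black", fill = "goldenrod")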

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# intercept (mean 0.2) with normally distributed noise (sd 0.1)
interc <- rnorm(n = 8, mean = 0.2, sd = 0.1)

# response variable: linear in biomass, plus the noisy intercept
gap_frac <- biomass*beta0 + interc

#create data frame
d_frame <- data.frame(biomass, gap_frac)

model <- lm(gap_frac~biomass, data = d_frame) # created linear model
summary(model)

## 
## Call:
## lm(formula = gap_frac ~ biomass, data = d_frame)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.18077 -0.05960  0.01481  0.05775  0.19013 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.08700    0.07064   1.232   0.2642  
## biomass      0.30120    0.09117   3.304   0.0163 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1338 on 6 degrees of freedom
## Multiple R-squared:  0.6453, Adjusted R-squared:  0.5862 
## F-statistic: 10.91 on 1 and 6 DF,  p-value: 0.01633

lm_plot <- ggplot(data=d_frame) +
            aes(x=biomass,y=gap_frac) +
            geom_point() +
            stat_smooth(method=lm, se=TRUE, level=0.95)
print(lm_plot)

## `geom_smooth()` using formula 'y ~ x'

After running the code with multiple different sample sizes, n = 8 was the smallest sample size at which I could still detect a significant pattern (p < 0.05).
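
Because a single simulated data set can come out significant or non-significant by chance, one way to check a minimum sample size is to estimate, for each candidate n, the proportion of simulated data sets that give p < 0.05. The sketch below does this under the same parameters as above. The function name `slope_p_value`, the set of sample sizes, the seed, and the 500 repetitions per size are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: simulate one data set of size n and return the slope p-value
slope_p_value <- function(n, slope = 0.2, intercept = 0.2, resid_sd = 0.1) {
  biomass  <- rgamma(n = n, shape = 1.25, scale = 0.55)
  gap_frac <- intercept + slope * biomass + rnorm(n = n, mean = 0, sd = resid_sd)
  summary(lm(gap_frac ~ biomass))$coefficients["biomass", "Pr(>|t|)"]
}

set.seed(8)  # arbitrary seed for reproducibility

sample_sizes <- c(4, 6, 8, 10, 15, 20)

# For each sample size, the proportion of 500 simulated data sets with p < 0.05
prop_significant <- sapply(sample_sizes, function(n) {
  mean(replicate(500, slope_p_value(n)) < 0.05)
})

power_table <- data.frame(n = sample_sizes, prop_significant = prop_significant)
print(power_table)
```

The same function could be called with smaller `slope` values at a fixed n to explore the effect-size half of the question.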
