My data set examines the relationship between length and mass in yellow perch caught in Lake Champlain. To create it, I first pulled real measurements of 71 fish from my experimental data.
kbdata <- read.csv(file = "kbsampledata.csv", header = TRUE, sep = ",")
kblength <- kbdata$length
kbmass <- kbdata$mass
From there, I ran a linear regression of mass on length to obtain an R-squared value.
lin_reg <- lm(kbmass~kblength)
summary(lin_reg)
##
## Call:
## lm(formula = kbmass ~ kblength)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.8853 -3.3778 -0.3226  2.7534 30.6221
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -119.25372    7.26540  -16.41   <2e-16 ***
## kblength       0.98838    0.04097   24.12   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.355 on 69 degrees of freedom
## Multiple R-squared: 0.894, Adjusted R-squared: 0.8924
## F-statistic: 581.8 on 1 and 69 DF, p-value: < 2.2e-16
plot(kbdata)
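If only the R-squared is needed, it can be pulled straight out of the summary object rather than read off the printed table. The sketch below is a minimal illustration under stated assumptions: `demo_length` and `demo_mass` are hypothetical, simulated stand-ins (not the perch measurements), used only so the code runs without `kbsampledata.csv`.

```r
# Stand-in data: hypothetical values, NOT the real perch measurements
set.seed(1)
demo_length <- rnorm(71, mean = 175, sd = 15)
demo_mass <- -119 + 1 * demo_length + rnorm(71, mean = 0, sd = 5)

# Fit the same kind of model as above: mass regressed on length
demo_fit <- lm(demo_mass ~ demo_length)
demo_summary <- summary(demo_fit)

# Individual statistics are stored as named components of the summary
r_squared <- demo_summary$r.squared
adj_r_squared <- demo_summary$adj.r.squared

# The fitted slope can be taken from the coefficient vector by name
slope <- coef(demo_fit)[["demo_length"]]
```

With the real data, the same components would be read from `summary(lin_reg)`.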
I saved the mean and standard deviation of both the length and mass data and used them to draw length and mass values from two separate normal distributions, each with a sample size of 71.
length_mean <- mean(kblength)
mass_mean <- mean(kbmass)
length_sd <- sd(kblength)
mass_sd <- sd(kbmass)
kblength_rnorm <- rnorm(71, mean = length_mean, sd = length_sd)
kbmass_rnorm <- rnorm(71, mean = mass_mean, sd = mass_sd)
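One thing to keep in mind about the two `rnorm()` calls above is that they draw length and mass independently of each other, so the simulated pairs need not show the length-mass relationship of the real fish. A possible alternative, which is not what this analysis does, is to simulate mass from a fitted regression line plus normally distributed residual noise. The sketch below compares the two approaches; all `demo_`, `ind_`, and `sim_` names and the stand-in values are hypothetical.

```r
# Stand-in "real" data: hypothetical values, NOT the perch measurements
set.seed(42)
demo_length <- rnorm(71, mean = 175, sd = 15)
demo_mass <- -119 + demo_length + rnorm(71, mean = 0, sd = 5)
demo_fit <- lm(demo_mass ~ demo_length)

# Approach 1: independent draws, matching only the means and SDs
ind_length <- rnorm(71, mean = mean(demo_length), sd = sd(demo_length))
ind_mass <- rnorm(71, mean = mean(demo_mass), sd = sd(demo_mass))
r2_independent <- summary(lm(ind_mass ~ ind_length))$r.squared

# Approach 2 (assumption: simple linear model with normal errors):
# simulate length, then build mass from the fitted intercept and slope
# plus noise with the model's residual standard error
sim_length <- rnorm(71, mean = mean(demo_length), sd = sd(demo_length))
sim_mass <- coef(demo_fit)[[1]] + coef(demo_fit)[[2]] * sim_length +
  rnorm(71, mean = 0, sd = sigma(demo_fit))
r2_conditional <- summary(lm(sim_mass ~ sim_length))$r.squared

print(c(independent = r2_independent, conditional = r2_conditional))
```

The second approach keeps the slope and scatter of the fitted model in the simulated data, at the cost of assuming that the linear model describes the fish well.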
To alter this data set, I chose to vary the sample size and compare the resulting R-squared values. I stored the new sample sizes in the vector n; each one represents the number of fish caught at a site.
n <- c(10, 20, 50, 100)
Here is my for loop, followed by the regression summary it prints for each sample size (10, 20, 50, and 100, in that order).
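for (i in seq_along(n)) {
  # Draw length and mass from separate normal distributions
  new_length <- rnorm(n[i], mean = length_mean, sd = length_sd)
  new_mass <- rnorm(n[i], mean = mass_mean, sd = mass_sd)
  # Regress simulated mass on simulated length and print the full summary
  loop_linreg <- lm(new_mass ~ new_length)
  sumstats <- summary(loop_linreg)
  print(sumstats)
}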
##
## Call:
## lm(formula = new_mass ~ new_length)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18.0175  -8.1784  -0.3196   8.7124  22.1944
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.422089  51.778097   1.090    0.308
## new_length  -0.002441   0.284971  -0.009    0.993
##
## Residual standard error: 13.31 on 8 degrees of freedom
## Multiple R-squared: 9.173e-06, Adjusted R-squared: -0.125
## F-statistic: 7.338e-05 on 1 and 8 DF, p-value: 0.9934
##
##
## Call:
## lm(formula = new_mass ~ new_length)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -23.500 -10.381  -1.957  12.043  32.074
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  77.9303    33.9864   2.293   0.0341 *
## new_length   -0.1801     0.1967  -0.915   0.3721
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.35 on 18 degrees of freedom
## Multiple R-squared: 0.04449, Adjusted R-squared: -0.008599
## F-statistic: 0.838 on 1 and 18 DF, p-value: 0.3721
##
##
## Call:
## lm(formula = new_mass ~ new_length)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -35.870  -9.110  -0.972  11.418  25.752
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  16.7069    28.1258   0.594    0.555
## new_length    0.2102     0.1567   1.341    0.186
##
## Residual standard error: 14.94 on 48 degrees of freedom
## Multiple R-squared: 0.03613, Adjusted R-squared: 0.01605
## F-statistic: 1.799 on 1 and 48 DF, p-value: 0.1861
##
##
## Call:
## lm(formula = new_mass ~ new_length)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -28.088 -13.018  -0.085   9.659  32.043
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  26.2372    19.3399   1.357    0.178
## new_length    0.1621     0.1087   1.491    0.139
##
## Residual standard error: 14.99 on 98 degrees of freedom
## Multiple R-squared: 0.02218, Adjusted R-squared: 0.0122
## F-statistic: 2.223 on 1 and 98 DF, p-value: 0.1392
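The loop above prints four full summaries, which makes the R-squared values awkward to compare side by side. A possible variation, sketched here under stated assumptions, collects only the R-squared for each sample size into a small data frame. The mean and standard deviation values are hypothetical stand-ins (the real ones come from `kblength` and `kbmass`), and `set.seed()` is added only so the sketch gives the same draws on every run.

```r
# Hypothetical stand-ins for the means and SDs computed from the real data
length_mean <- 175
length_sd <- 15
mass_mean <- 55
mass_sd <- 16

n <- c(10, 20, 50, 100)

set.seed(7)  # fix the random draws so the sketch is reproducible

# For each sample size, simulate the data, fit the model, keep only R-squared
r2_by_n <- sapply(n, function(size) {
  new_length <- rnorm(size, mean = length_mean, sd = length_sd)
  new_mass <- rnorm(size, mean = mass_mean, sd = mass_sd)
  summary(lm(new_mass ~ new_length))$r.squared
})

# One row per sample size
results <- data.frame(n = n, r_squared = r2_by_n)
print(results)
```

Because each row comes from a single random draw, repeating the simulation several times per sample size would give a steadier picture than one value per row.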