Based on Chapter 7 of ModernDive. Code for Quiz 11.
Question: 7.2.4 in ModernDive with different sample sizes and repetitions
Load the tidyverse and the moderndive packages.
library(tidyverse)
library(moderndive)
Modify the code for comparing different sample sizes from the virtual bowl, using the bowl data set. Take 1120 samples of size 30 and assign the output to virtual_samples_30.
virtual_samples_30 <- bowl %>%
  rep_sample_n(size = 30, reps = 1120)
# Proportion of red balls in each sample of size 30
virtual_prop_red_30 <- virtual_samples_30 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 30)
# Histogram of the 1120 proportions for n = 30
ggplot(virtual_prop_red_30, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 30 balls that were red", title = "30")
# Repeat with 1120 samples of size 55
virtual_samples_55 <- bowl %>%
  rep_sample_n(size = 55, reps = 1120)
virtual_prop_red_55 <- virtual_samples_55 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 55)
ggplot(virtual_prop_red_55, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 55 balls that were red", title = "55")
# Repeat with 1120 samples of size 114
virtual_samples_114 <- bowl %>%
  rep_sample_n(size = 114, reps = 1120)
virtual_prop_red_114 <- virtual_samples_114 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 114)
ggplot(virtual_prop_red_114, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 114 balls that were red", title = "114")
Calculate the standard deviation of prop_red for each of your three sets of 1120 values.
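The code for this step is not shown in these notes. A minimal sketch of one way to get a one-row tibble with an sd column for each sample size follows. The helper name sd_prop_red and the seed are hypothetical, introduced only for illustration; the sampling pipeline mirrors the code earlier in this quiz. Because the samples are random, the values will differ slightly from the printed output.

```r
library(tidyverse)
library(moderndive)

set.seed(2021)  # arbitrary seed (an assumption), only to make the sketch reproducible

# Hypothetical helper: draw `reps` samples of `n_balls` balls from bowl,
# compute the proportion of red balls in each sample, and return the
# standard deviation of those proportions as a 1 x 1 tibble.
sd_prop_red <- function(n_balls, reps = 1120) {
  bowl %>%
    rep_sample_n(size = n_balls, reps = reps) %>%
    group_by(replicate) %>%
    summarize(red = sum(color == "red")) %>%
    mutate(prop_red = red / n_balls) %>%
    summarize(sd = sd(prop_red))
}

sd_30  <- sd_prop_red(30)
sd_55  <- sd_prop_red(55)
sd_114 <- sd_prop_red(114)

sd_30
sd_55
sd_114
```

If the data frames virtual_prop_red_30, virtual_prop_red_55 and virtual_prop_red_114 are already in the workspace, the same result comes from piping each one into summarize(sd = sd(prop_red)).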
n = 30
# A tibble: 1 x 1
      sd
   <dbl>
1 0.0893
n = 55
# A tibble: 1 x 1
      sd
   <dbl>
1 0.0641
n = 114
# A tibble: 1 x 1
      sd
   <dbl>
1 0.0448
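The distribution with sample size n = 114 has the smallest standard deviation (spread) around the estimated proportion of red balls. The standard deviation falls from 0.0893 at n = 30 to 0.0641 at n = 55 and 0.0448 at n = 114, so larger samples give less variable estimates.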
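As a rough check on this pattern, the simulated standard deviations can be compared with the textbook standard error of a sample proportion, sqrt(p(1 - p) / n). The sketch below assumes the population proportion is p = 900 / 2400 = 0.375; that figure is not stated in these notes and can be verified with mean(bowl$color == "red"). The formula also ignores the fact that balls are drawn without replacement, so only approximate agreement should be expected.

```r
# Assumed population proportion of red balls in bowl (900 of 2400);
# this value is an assumption, not taken from the quiz output.
p <- 900 / 2400

n <- c(30, 55, 114)

# Theoretical standard error of a sample proportion for each sample size
se_theory <- sqrt(p * (1 - p) / n)

# Standard deviations reported in the quiz output
sd_observed <- c(0.0893, 0.0641, 0.0448)

comparison <- data.frame(n = n,
                         se_theory = round(se_theory, 4),
                         sd_observed = sd_observed)
comparison
```

Under these assumptions the theoretical values come out near 0.088, 0.065 and 0.045, close to the simulated ones, and they shrink in proportion to 1 / sqrt(n).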