ADA1: Class 02, Introduction to R and RStudio

Advanced Data Analysis 1, Stat 427/527, Fall 2025

Author

Your Name

Published

August 25, 2025

Rubric

The context of this assignment comes from OpenIntro Labs for R and tidyverse:

This is a template for the assignment. Modify this and turn it in.

# before loading the packages, you need to install them first
# run the following code in the lower left Console panel
# install.packages("tidyverse")
# install.packages("openintro")
library(tidyverse)
library(openintro)

(0 p) 1. Dataset arbuthnot, Counts of girls

What command would you use to extract just the counts of girls baptized? Try it out in the console!

Solution: The command that I use to extract just the counts of girls baptized is

# use results="hide" to hide the output
arbuthnot$girls
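For comparison, here is a minimal sketch of a few other ways to pull out the same column. It assumes the `openintro` and `dplyr` packages are installed; the object names (`girls_dollar`, `girls_bracket`, `girls_pull`, `girls_df`) are made up for illustration.

```r
library(dplyr)
library(openintro)

# Three ways to get the girls column as a plain vector
girls_dollar  <- arbuthnot$girls          # `$` operator, as in the solution above
girls_bracket <- arbuthnot[["girls"]]     # double-bracket indexing by column name
girls_pull    <- pull(arbuthnot, girls)   # dplyr::pull(), handy at the end of a pipe

# select() behaves differently: it keeps a one-column data frame, not a vector
girls_df <- select(arbuthnot, girls)

# Check that the three vector versions agree
identical(girls_dollar, girls_bracket)
identical(girls_dollar, girls_pull)
```

The `$` form is the shortest for console work; `pull()` tends to read better when the extraction is the last step of a longer pipeline.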

(1 p) 2. arbuthnot, Trend of girls

Is there an apparent trend in the number of girls baptized over the years? How would you describe it? (To ensure that your lab report is comprehensive, be sure to include the code needed to make the plot as well as your written interpretation.)

Solution: I use the following code to plot the counts of girls baptized versus year; the code and the graph are listed below. The resulting plot shows a roughly U-shaped trend: a decline from 4,683 in 1629 to around 2,700–2,800 in 1649–1651, followed by moderate fluctuations in the mid-1600s and then a steady increase. After a brief dip to 5,738 in 1704, the count reaches its peak of 7,779 in 1705. Overall, despite the early decline, the long-term trend in the number of girls baptized is upward.

p <- ggplot(data = arbuthnot, aes(x = year, y = girls))
p <- p + geom_point()
p <- p + geom_line()
p

(2 p) 3. arbuthnot, Proportion of boys

Now, generate a plot of the proportion of boys born over time. What do you see?
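One possible starting point, offered as a sketch rather than the required solution: compute the proportion with `mutate()` and reuse the layered plotting style from Exercise 2. The column names `total` and `boy_prop` and the object name `arbuthnot_prop` are illustrative choices, not part of the original data.

```r
library(tidyverse)
library(openintro)

# Add the yearly total and the share of boys;
# `total` and `boy_prop` are illustrative column names
arbuthnot_prop <- arbuthnot %>%
  mutate(
    total    = boys + girls,
    boy_prop = boys / total
  )

# Same layered style as Exercise 2, with the proportion on the y-axis
p <- ggplot(data = arbuthnot_prop, aes(x = year, y = boy_prop))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(x = "Year", y = "Proportion of boys")
p
```

Storing the result in a new object leaves the original `arbuthnot` data frame unchanged, which makes it easier to rerun earlier chunks.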

Write the text of your answer here…

# Insert code for Exercise 3 here

(1 p) 4. present, Data properties

What years are included in this data set? What are the dimensions of the data frame? What are the variable (column) names?
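The calls below are a small sketch of how a data frame can be inspected, shown on `arbuthnot` so that the answer for `present` is left to you. It assumes that `present`, like `arbuthnot`, has a `year` column; the names `year_range`, `dims`, and `vars` are illustrative.

```r
library(openintro)

year_range <- range(arbuthnot$year)   # first and last year covered
dims       <- dim(arbuthnot)          # number of rows, then number of columns
vars       <- names(arbuthnot)        # variable (column) names

year_range
dims
vars

# glimpse() from dplyr prints the dimensions, names, and types in one view
dplyr::glimpse(arbuthnot)
```

Swapping `arbuthnot` for `present` in each call should give the corresponding summary for the U.S. data.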

Write the text of your answer here…

# Insert code for Exercise 4 here

(1 p) 5. present, Frequency scale

How do these counts compare to Arbuthnot’s? Are they of a similar magnitude?

Write the text of your answer here…

# Insert code for Exercise 5 here

(3 p) 6. present, Proportion of boys in the US

Make a plot that displays the proportion of boys born over time. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response. Hint: You should be able to reuse your code from Exercise 3 above; just replace the name of the data frame.

Write the text of your answer here…

# Insert code for Exercise 6 here

(2 p) 7. present, Year with max births

In which year was the total number of births in the U.S. the largest? Hint: First calculate the totals and save them as a new variable. Then sort your dataset in descending order based on the total column. You can do this interactively in the data viewer by clicking on the arrows next to the variable names. To include the sorted result in your report, you will need two new functions: arrange() sorts the rows by a variable, and desc(), wrapped around that variable, makes the order descending. The sample code is provided below.
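A minimal sketch of the two steps in the hint, assuming `present` has `boys` and `girls` columns like `arbuthnot`. The names `total` and `present_sorted` are illustrative.

```r
library(dplyr)
library(openintro)

# `total` is an illustrative name for the new column of yearly births
present_sorted <- present %>%
  mutate(total = boys + girls) %>%   # step 1: compute and store the totals
  arrange(desc(total))               # step 2: sort rows, largest total first

# The first rows now hold the years with the most births
head(present_sorted, 3)
```

Without `desc()`, `arrange(total)` would sort in ascending order, so the year with the most births would end up in the last row instead of the first.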

Write the text of your answer here…

# Insert code for Exercise 7 here