The Reproducibility Crisis and R Scripts
The reproducibility crisis
Scripting is a huge advance over point-and-click, but we can go further and use the R Markdown package to produce ‘analysis notebooks’ that better record and explain what we have done and why. The aim of this chapter is to make a short example of a script and then to convert it to an R Markdown document. Due to space limitations, the example is deliberately minimal and just demonstrates how to combine (knit) together some short input code, R output (text, figures, and tables), and some narrative text. To avoid the need for extra packages, it uses only those that come with the base distribution of R, including the plot() function from the base graphics package rather than ggplot2. Luckily, the authors of the R Markdown package have written books about it (using R Markdown, or rather its offshoot, bookdown) that are freely available online in web page form.
R script
A minimal script of the combined lines of R code with some comments would look something like this:
# Title: Atmospheric carbon dioxide concentrations
# Author: Andy Hector
# Summary statistics for
# concentrations of atmospheric CO2
summary(co2)
# Line graph of changes in atmospheric carbon dioxide
# (1959 - 1997)
plot(co2)
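Before running the script, it can help to check what kind of object co2 is. The lines below are a small sketch rather than part of the original script; they rely only on the co2 dataset and functions that ship with base R. Because co2 is a time-series object, plot(co2) draws a line graph against time without any further arguments.

```r
# co2 is a built-in dataset (datasets package), so no data file is needed.
class(co2)       # "ts": a time-series object
start(co2)       # first observation: year 1959, month 1
end(co2)         # last observation: year 1997, month 12
frequency(co2)   # 12 observations per year (monthly)
length(co2)      # 468 monthly values in total

# summary() returns the minimum, quartiles, mean, and maximum of the series.
co2_summary <- summary(co2)
co2_summary
```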
To write an R script, we need to stop writing commands directly into the console (where they are lost once we close R). Instead, we can open a new window for the script using the RStudio menu options File -> New File -> R Script. The result is just a text file, but saving it from within RStudio will assign it a .R extension. Double-clicking the file will then open it in R or RStudio rather than in a default text editor (you can specify which application opens files with the .R extension in the system preferences; we are using RStudio, so set it to that rather than R).
Saving an R script allows us, or anyone else, to reproduce an analysis (just as the methods in a scientific paper should allow others to recreate our scientific studies), so long as we make both the script and the data available. However, readers still need both the script and the data, and they have to run the script themselves, which could be time-consuming with a complex analysis and a big data set. A complementary approach is to use the R script to produce a reproducible research document that guides the reader through the analysis process.
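As a rough illustration of what rerunning a saved script involves, the sketch below writes the example script to a temporary file and runs it with source(). The file names co2_analysis.R and co2_plot.pdf are hypothetical; in practice the script would be the .R file saved from RStudio. The graph is sent to a PDF file so that the example also works when no screen device is available.

```r
# Hypothetical location for the saved script (a temporary directory here).
script_path <- file.path(tempdir(), "co2_analysis.R")

# Write the example script to disk, one line of the script per element.
writeLines(c(
  "# Title: Atmospheric carbon dioxide concentrations",
  "# Summary statistics for concentrations of atmospheric CO2",
  "summary(co2)",
  "# Line graph of changes in atmospheric carbon dioxide (1959 - 1997)",
  "plot(co2)"
), script_path)

# Open a PDF graphics device so the plot is saved to a file.
plot_path <- file.path(tempdir(), "co2_plot.pdf")
pdf(plot_path)

# Run the whole script; echo = TRUE prints each command and its result.
source(script_path, echo = TRUE)

# Close the graphics device so the PDF is written out completely.
dev.off()
```

Anyone with the script (and, for other analyses, the data) can repeat these steps, but they still have to run the code themselves to see the results.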
Analysis notebooks
The reproducibility crisis suggests that our current methods of analysis leave a lot to be desired. One problem is the poor documenting of how the analysis was done. In some ways, it is odd how casual we are about documenting our analyses. For example, in areas of science where the details of the research could be examined in court (as part of a patent dispute, for example), keeping detailed lab notebooks is compulsory, sometimes with very strict protocols for how they must be filled out. Why not apply similar standards to how data analysis is performed? Writing scripts is a big contribution to reproducible research, but we can go further and produce ‘analysis notebooks’ that document the analysis process in the same way that lab notebooks record what was done at the bench. The R Markdown package currently provides the easiest way to generate reproducible research documents from R (R Markdown uses the earlier knitr package, which followed the even earlier Sweave). Outside of R, popular options include Jupyter notebooks. The idea is to produce documents that knit (weave) together the input code and the software-generated output (text, tables, and figures) with our own text to produce an understandable narrative of the analysis process from start to finish. As a brief example, let’s take the R script given above and convert it into an analysis notebook using R Markdown.
R Markdown
R Markdown documents are produced by the package of the same name (in conjunction with other R packages and software). The R Markdown package is included with RStudio, so we should not need to install it; it will be used when needed (we don’t even need to use the library() function to activate it). A new R Markdown document can be opened using the menu options File -> New File -> R Markdown, which opens a dialog where you can give a title and author and choose the output format (HTML is the recommended default choice).
The template that opens is not blank—it attempts to help explain how R Markdown works. Unfortunately, it is more complex than it needs to be. At the top is a header (in YAML) that simply reproduces the information you entered when creating the file a moment ago. I suggest you delete everything below the header (which starts and ends with lines of three dashes).
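To make the layout concrete, here is a minimal sketch of what the file might contain once the template body has been deleted and the example script has been carried over. This is an assumption about the end result, not the author's own document: the file name co2_notebook.Rmd is hypothetical, and the title and author are taken from the comments in the script above. The sketch writes the document from R and knits it only if the rmarkdown package and pandoc are both available.

```r
# Build the three-backtick chunk delimiter programmatically so that this
# example can itself sit inside a fenced code block.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                                # header starts
  'title: "Atmospheric carbon dioxide concentrations"',
  'author: "Andy Hector"',
  "output: html_document",
  "---",                                                # header ends
  "",
  "Summary statistics for concentrations of atmospheric CO2:",
  "",
  paste0(fence, "{r}"),                                 # chunk opens
  "summary(co2)",
  fence,                                                # chunk closes
  "",
  "Line graph of changes in atmospheric carbon dioxide (1959 - 1997):",
  "",
  paste0(fence, "{r}"),
  "plot(co2)",
  fence
)

# Hypothetical file name, written to a temporary directory.
rmd_path <- file.path(tempdir(), "co2_notebook.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit to HTML only when rmarkdown and pandoc are installed.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```

The comments from the script become ordinary narrative text, while the two commands move into separate code chunks, which are described next.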
One of the big differences between an R Markdown document and a script is that in an R Markdown document the R code is segregated into ‘chunks’. You can insert a new empty chunk (ready to hold R code) using