Reproducibility

Reproducibility is the ability to duplicate an entire analysis and computation of a problem under study. It is one of the main principles of the computational research enterprise. As the scientific method is transformed by massive computations and data manipulations, it becomes essential to establish procedures that ensure and facilitate reproducibility. Replication can, however, be complicated by the emergence of new computational tools and technologies, massive amounts of data, interdisciplinary approaches, and the complexity of problems such as the Grand Challenges, including climate modeling, modern materials, and medicine. Establishing feasible reproducibility practices is currently a central focus of researchers in computational science.

Good reproducibility practices are necessary, for instance, to allow previously developed methodologies to be applied effectively and efficiently to new data and problems. Releasing data and developed software and making them accessible to the wider scientific community allows codes and data to be reused in new projects. Nowadays, most peer-reviewed journals allow articles to be supplemented with online material (such as code and raw data), and some journals have initiated further efforts to integrate data and code more closely with publications. Importantly, the establishment of such rules and practices will increase the transparency and reproducibility of scientific work. See here and the references therein if you are interested in learning more about this topic.

In this class we will concentrate on version control as one easy way to greatly improve the reproducibility of our scientific work.
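
As a first taste of how version control supports reproducibility, here is a minimal sketch of a typical Git workflow for placing an analysis script under version control. It assumes Git is installed; the repository, file, and commit names are hypothetical examples, not a prescribed setup for this class.

```
# Create a new repository for the analysis (hypothetical project name)
git init my-analysis
cd my-analysis

# Track the analysis script and a notes file describing the data
git add analysis.py README.txt

# Record a snapshot of the current state with a descriptive message
git commit -m "Add initial analysis script and data description"

# Later: inspect the recorded history to see what changed and when
git log --oneline
```

Each commit records the exact state of the tracked files, so anyone (including your future self) can return to that state and rerun the analysis as it was originally performed.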