Hitchhiker's guide to transparent, reproducible, and collaborative computational science

Author

Wolfram Barfuss

Published

April 24, 2024

Science as amateur software development

Science is one of humanity’s greatest inventions. Academia, on the other hand, is not. It is remarkable how successful science has been, given the often chaotic habits of scientists. In contrast to other fields, […] science as a profession is largely unprofessional—apprentice scientists are taught less about how to work responsibly than about how to earn promotions. This results in ubiquitous and costly errors.

Software development has become indispensable to scientific work. [But] it can become even more useful by transferring some aspects of its professionalism, the day-to-day tracking and back-tracking and testing that is especially part of distributed, open-source software development. Science, after all, aspires to be distributed, open-source knowledge development.

Richard McElreath

This guide

In this guide, I want to outline my approach to science as amateur software development. I am a computational social ecologist who focuses on developing theoretical models of human-environment interactions. However, the tools and methods shown here should apply to the broad range of computational sciences, which all develop models and analyze data to some extent.

Transparency is required because all models are wrong. Our brains have limited capacity to process the already limited data we receive from the complex world around us, so we simplify. Most often, we do this subconsciously, resulting in a mental model. We use formal models, both computational and mathematical, to make the process of simplification and abstraction a conscious effort. Since a model, by definition, neither can be nor aims to be true, we should optimize its design and presentation for transparency, so that the conscious modeling thought process becomes a collective one. Transparency enables reproducibility and collaboration.

If science is not reproducible, it is not science. However, the reproducibility crisis [1] shows that the results of many scientific studies are difficult or impossible to reproduce. This undermines the credibility of scientific knowledge and can lead to costly and inefficient policies.

Collaboration is required because the world is too complex for any individual to understand alone. We need to be able to integrate different perspectives on the same phenomenon.

Luckily, (open-source) software development has produced not only a set of modern software tools to realize these principles but also a range of cultural practices that enable the continuous, collaborative integration of knowledge. We don’t need to invent new tools or practices. We just need to be willing to learn. In that sense, this guide is a hitchhiker’s guide. We don’t need to drive all by ourselves. We just have to find the right companions (tools and practices) to make the journey worthwhile and enjoyable.

What do we want?

An integrated computational environment for development, analysis, and writing.

  • Publications (as a PDF, on the web, with images)
    • Cross-referencing
    • Citations
  • Execute, reuse, and document code
  • Version control
  • Unit tests
  • Animations
  • Presentations
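As a small illustration of how reusable code, reproducibility, and unit tests fit together, here is a minimal sketch. The toy stock model, the function name `simulate_stock`, and all parameter values are hypothetical and chosen only for illustration; they are not taken from this guide. The idea is simply that a seeded random number generator makes a stochastic simulation repeatable, and that short test functions can check this automatically.

```python
import random


def simulate_stock(steps, growth=0.1, harvest=0.05, noise=0.01, seed=0):
    """Toy resource model (hypothetical): logistic regrowth minus
    proportional harvest, plus seeded Gaussian noise."""
    rng = random.Random(seed)  # a fixed seed makes every run repeatable
    stock = [0.5]              # arbitrary initial stock level
    for _ in range(steps):
        current = stock[-1]
        change = growth * current * (1 - current) - harvest * current
        nxt = current + change + noise * rng.gauss(0.0, 1.0)
        stock.append(min(1.0, max(0.0, nxt)))  # keep the stock within [0, 1]
    return stock


def test_same_seed_gives_same_result():
    # Reproducibility: identical inputs must yield identical trajectories.
    assert simulate_stock(50, seed=1) == simulate_stock(50, seed=1)


def test_stock_stays_in_bounds():
    # Sanity check on the model: the stock never leaves [0, 1].
    assert all(0.0 <= s <= 1.0 for s in simulate_stock(200, seed=2))
```

A test runner such as pytest collects functions whose names start with `test_`, so checks like these can be rerun after every change to the model code.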

Tools: How do we get it?

  • Jupyter notebooks
  • Quarto
  • nbdev
  • JupyterLab (extensions)
  • Git & GitHub

Note

This is a work in progress. More to come.


  1. https://www.nature.com/collections/prbfkwmwvz