I don't see how R specifically addresses the reproducibility problem, It's been ...

ggrothendieck · on March 22, 2022

1. Before R, commercial statistical packages were mainly used. You can, in principle, just use assembler and develop everything yourself, but that isn't practical. Regarding C/C++ and Fortran, many R packages are in fact wrappers around code in those or other languages, which makes that code easier to access. From that point of view, R can be regarded as a glue language.

2. Regarding keeping versions straight, all past versions of packages in the CRAN repository are kept on CRAN. Microsoft's MRAN repository also maintains package histories, which can be accessed via the checkpoint package; it installs packages as they existed on a given date. Furthermore, install_version in the remotes and devtools packages can install specific versions.

3. Regarding tidyverse dependencies, you can reduce the number of packages you load by not using library(tidyverse) and instead loading only the specific packages you need.
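As a rough illustration of points 2 and 3, here is a minimal sketch. It assumes the remotes package is already installed; the package name, version number, and repository URL are placeholders chosen for illustration and do not come from the thread.

```r
# Install one specific past version of a package from the CRAN archive.
# "dplyr" and "1.0.7" are illustrative placeholders, not values from the thread.
remotes::install_version(
  "dplyr",
  version = "1.0.7",
  repos = "https://cloud.r-project.org"
)

# Attach only the packages actually used instead of library(tidyverse).
# Assumes both packages are installed.
library(dplyr)
library(ggplot2)
```

This pins only the named package; which versions of its dependencies get installed is a separate question.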

_Wintermute · on March 22, 2022

> Before R, commercial statistical packages were mainly used.

Maybe in your field. I work in bioinformatics, where Perl was widely used as a high-level language before R.

> Regarding keeping versions straight, all past versions of packages in the CRAN repository are kept on CRAN...

This is woefully inadequate if you need to replicate somebody else's environment. Nobody should think that manually guessing and typing in each package version, then hoping they're compatible, is a viable option. Not to mention that even if you specify an older version of a package, it doesn't pull in compatible dependencies; it just pulls in their latest versions. There's renv, but it hasn't reached widespread use.
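For reference, a minimal sketch of the renv workflow mentioned above. It assumes the original author used renv and shipped the resulting renv.lock file with the project, which, as noted, often isn't the case.

```r
# Author's side: record the exact package versions the project uses.
renv::init()      # create a project-local library and an renv.lock file
renv::snapshot()  # update renv.lock after installing or upgrading packages

# Reader's side, run inside a copy of the project that contains renv.lock:
renv::restore()   # reinstall the package versions recorded in renv.lock
```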

> Regarding tidyverse dependencies you can reduce the number of packages you load by not using library(tidyverse) and instead load the specific packages you need. This will result in fewer packages being loaded

We're talking about replicating other people's work. We don't have any control over their code, and R users are largely ignorant of software best practices.

scientism · on March 22, 2022

Totally agree. I find it frustrating trying to reproduce other people's work in R. How has this situation been allowed to continue for so long? It's unacceptable, especially when R is used for science. It's impossible to replicate anything unless you are lucky enough to find which package version introduced the breaking changes, and even then you have to repeat the process for every break in the code. Even _renv_ is a library you have to install within your R environment, which is pointless. Where is a dependency solver like conda for R? Not that conda is perfect, but I've recently been happy with its drop-in replacement, mamba.

ggrothendieck · on March 22, 2022

The packages that were used in statistics were SAS, SPSS and Stata. Perl is not a statistical package and has nowhere near the depth of statistical capabilities of R.

Don't forget that I also mentioned the checkpoint package in my post. You only need to know the date for that, not the version of each of the packages.

In your last paragraph I think you are referring more to software development practices than to what is available through R. Simply using R, or any other language, doesn't guarantee good practices.

scientism · on March 22, 2022

That's a very roundabout way to solve an actual problem. In many cases you don't pin your package versions to _latest_ (whatever that date is), and you need a more fine-grained way of keeping package versions. I don't think a date solves this, and I don't know if you can do it with checkpoint.

ggrothendieck · on March 22, 2022

Of course it is possible to screw up, but if you don't update your packages and record the date, that does not seem to be R's fault.

popcube · on March 22, 2022

Um... this is about statistics. Before R, people would have done their analysis in MATLAB.