
oss_in_rr's Introduction

Defining the role of open source software in research reproducibility

Contribution to IEEE Computer, for the Special Issue (June 2022): Research Reproducibility.

Preprint arXiv:2204.12564 [cs.CY]

  • submitted 15-Nov-2021 (v1.0.0)
  • revised 13-May-2022

Rights to use

(c) Lorena A. Barba, 2022.

This repository contains the source manuscript files for an article submitted to IEEE Computer and upon acceptance the copyright will be transferred to IEEE. The article content is therefore not openly licensed and only fair use applies.

oss_in_rr's Issues

Reviewer 2

Recommendation: Author Should Prepare A Major Revision For A Second Review

Comments:
It's an interesting topic and I would be happy to review a revised submission.

Additional Questions:

  1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Detailed Comments section.: Relevant

  2. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical. If you don't have any comments, please type No Comments.: This article presents a high level analysis and commentary on the role of open-source software as it relates to the problem of research reproducibility. The thesis of the article is that reproducibility is primarily a matter of trust, and that open source software is a means to developing community, and therefore trust.

The article begins with a summary of recent interest and reports on the problem of reproducibility, and observes that these generally conclude with a call for availability of artifacts in some form, not necessarily open source. The article continues with a reflection on the underlying reason for reproducibility, which is to develop trust (if appropriate) in experimental results. The recent LIGO results are given as an example of trustworthy results, primarily because of the team's commitment to evaluating alternative hypotheses. Despite a general commitment to sharing of artifacts, the reproducibility of the final results was imperfect.

Next, a discussion of the meaning and history of the "open source" term is given, along with its relationship to the notion of "free software." It notes that open source contributes to the transparency of results but is insufficient for reproducibility: quality software design and documentation are also required. However, the article argues that open source drives quality: developing in the open exposes bugs, gains feedback from users, and improves documentation, all of which contribute to reusability.

Two objections to open source software are addressed. One is that open source enables modifications from anyone, which is clearly not the case. Another is that open source requires more work to clean up and document, and the article argues that software produced by such low-quality processes should not be trusted in the first place. An example is given of how open source procedures in the author's lab enabled a collaborator to find a bug in a method.

The article then reaches its core argument by describing the process of science as a conversation among collaborators. In summary: "Openness promotes rich networks, lively communities, and fertile connections." In particular, the tools of open source software (pull requests, issue trackers, etc.) encourage a particular structure and custom around interactions, and encourage the archiving of communications. The conclusion then links reproducibility and trust, observing that prior failures have reduced public trust in scientific activities. The article posits that there is no technical "one click" solution to reproducibility. Rather, it argues that open source collaboration develops relationships between parties, who will then feel a responsibility to produce quality artifacts for each other, and learn to trust and value artifacts produced by others.

  3. Is the manuscript technically sound? Please explain your answer in the Detailed Comments section.: Not Applicable

  4. What do you see as this manuscript's contribution to the literature in this field?: Reproducibility is a challenging topic because it has multiple interlocking dimensions: technical capabilities, professional expectations, social relationships, and more. I appreciate that this article is striving to sort through some of these connections and refine the meaning and purpose of terms that are familiar. This is a worthy effort, but to this reviewer, the article does not succeed in connecting all of the dots.

The first two thirds or so of the paper follow an agreeable path, from the preliminary discussion of the merits of reproducibility through the definition of open source, and the observation that open source itself does not guarantee reproducibility. I also find myself in agreement with the idea that science is a conversation, and it has also been my experience that fine-grained interactions through open source result in an acceleration of ideas, insight, and built trust between parties.

(Although I might quibble that open source doesn't necessarily lead to stability. The current state of software is that a given product may depend upon thousands of distributed components. If each one of them is a lively conversation with daily updates, it can be a large challenge even to find a set of compatible versions and then compile them in a predictable way. In some ways, a closed-source binary blob is more 'reproducible' in that a single artifact can be saved and reused without the hassle of building.)

  • In any case, I do not agree with the final leap that reproducibility is about trust. In fact, I think the opposite is true, or at least ought to be. A statement about the physical world is objectively true or false. If the author of such a statement has used reproducible scientific techniques, it should be possible to evaluate the truth of that statement independently of one's personal evaluation of the author. This is the gold standard to which science aspires.

Of course, the community has fallen short, particularly in computational techniques where the cost of reproducibility ought to be low. In the absence of reproducible techniques, the reviewer of a scientific claim may fall back on the author's prior work, their academic pedigree, or their friendship (or conflict!) with the author in order to evaluate their trustworthiness. And the result is (or will be) a gradual diminution in the quality of scientific work, in which reviewers rely upon trust in the author instead of proof of the work itself.

Perhaps this all hinges on the definition of 'trust'. Trust in a person is a future evaluation based on past behavior. (If I trusted you to hold my keys yesterday, perhaps I will trust you to hold my wallet tomorrow.) And while trust is essential to human relationships, it isn't the right foundation on which to build science: our friends can be careless, or mistaken, or misled.

Or perhaps there is an argument that open source implies a willingness to be corrected. If an open source product has 10 bugs reported and fixed in the last year, that gives one some confidence that bugs are in fact being found. And such a product might be seen as more trustworthy than a closed-source project in which no bugs were corrected. (But what about an open source product with 1000 bugs corrected?)

  • In short, I think this article contemplates a number of interesting and challenging issues, but doesn't sell me on the idea that trust leads to reproducibility. I think it's the other way around.

  5. What do you see as the strongest aspect of this manuscript?:

  6. What do you see as the weakest aspect of this manuscript?:

  7. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Detailed Comments section.: References are sufficient and appropriate

  8. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Detailed Comments section.: Yes

  9. How would you rate the organization of the manuscript? Please elaborate in the Detailed Comments section.: Satisfactory

  10. Is the length of the manuscript appropriate for the topic? Please elaborate in the Detailed Comments section.: Satisfactory

  11. Please rate the readability of this manuscript in the Detailed Comments section.: Easy to read

Please rate the manuscript. Explain your choice in the Detailed Comments section.: Fair

Reviewer 3

Recommendation: Accept With No Changes

Comments:

I think this is an excellent persuasive article on the importance of open source software (and associated practices) to reproducibility in research. It helpfully explains concepts and terms to newcomers, but is also a valuable read to people (like me) who are familiar with the practices discussed.

I have only minor suggestions to improve the manuscript:

  1. For the first section title, perhaps "What is reproducibility, and why does it matter?"
  2. lines 55-56 on page 1: "The sister magazine..."
  3. line 12 on page 3, "committed to the relentless quest..."
  4. page 4: I agree with the statement that "Academic and research software benefits most from permissive licensing, enabling more impact and innovation", but I think this statement would be strengthened with supporting evidence.
  5. There are numerous in-text URLs, which should probably be replaced with formal citations (even to YouTube videos).
  6. In contrast, I'm not sure that the links/citations to Wikipedia in a few places are appropriate. A reader can look up a definition or concept on their own wherever they want; I think only formal sources should be included.
  7. I saw some inconsistencies between referring to people by last name only, and using both first and last. This isn't a major issue, but just ensuring consistency would be nice.

Additional Questions:

  1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Detailed Comments section.: Very Relevant

  2. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical. If you don't have any comments, please type No Comments.: This manuscript discusses the importance of open source software to reproducibility of research, and the success of modern research in general.

  3. Is the manuscript technically sound? Please explain your answer in the Detailed Comments section.: Yes

  4. What do you see as this manuscript's contribution to the literature in this field?: This is not a technical manuscript, but an important discussion of the connections between open source software and reproducibility in research. I think this would be important for readers of this journal, but I would also share this with broader disciplinary communities.

  5. What do you see as the strongest aspect of this manuscript?: It is extremely readable and easy to understand, and is persuasive while not preaching.

  6. What do you see as the weakest aspect of this manuscript?: none

  7. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Detailed Comments section.: References are sufficient and appropriate

  8. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Detailed Comments section.: Yes

  9. How would you rate the organization of the manuscript? Please elaborate in the Detailed Comments section.: Satisfactory

  10. Is the length of the manuscript appropriate for the topic? Please elaborate in the Detailed Comments section.: Satisfactory

  11. Please rate the readability of this manuscript in the Detailed Comments section.: Easy to read

Please rate the manuscript. Explain your choice in the Detailed Comments section.: Excellent

Reviewer 1

Recommendation: Accept With No Changes

Comments:
I have no major criticism of the article and would like to instead offer just some small comments and suggestions for places where the scope could be broadened or shifted somewhat. But I don't consider any of my comments crucial.

  • I find the discussion of different open-source software licenses somewhat tangential to the overall flow of the article. In particular, the differences between GPL-style and BSD-style licenses ultimately don't matter for the main point of the article, in my opinion.
  • Even more specifically, unless the intention is to align license choices with more general political orientation (i.e., standard left vs. right government politics), I think that the word "political" (page 4, line 22) may be confusing. Maybe "ideological" better conveys the intention here? If the intention is to convey an alignment with standard political orientation, that should be made more explicit. I imagine that copyleft might be considered more "leftist" and BSD-style licenses, being business-friendly, might be considered more "right-wing / neoliberal".
  • Regarding the relationship between "reproducible" and "reusable": an interesting perspective is offered by Gael Varoquaux in a blog post he wrote about this topic: http://gael-varoquaux.info/programming/software-for-reproducible-science-lets-not-have-a-misunderstanding.html . This raises some questions about the conclusion reached at the end of the section "Open-source software in research", suggesting instead that there is a partial overlap between reproducibility and reusability, but that these are not identical, and that prioritizing one over the other (Varoquaux clearly prefers reusability) guides the activities that a research team may pursue. Debating this perspective might add some interesting complexity.
  • If the scope allows, it might also be interesting to discuss some of the challenges and risks of openness. For example, researchers in climate science are under threat of having their research scrutinized in bad faith by climate change-denying "merchants of doubt".

Additional Questions:

  1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Detailed Comments section.: Don’t know the readership

  2. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical. If you don't have any comments, please type No Comments.: This paper links the praxis of open source software with the goals of computational reproducibility.

  3. Is the manuscript technically sound? Please explain your answer in the Detailed Comments section.: Yes

  4. What do you see as this manuscript's contribution to the literature in this field?: Using Winograd and Flores's concept of connectivity, the article reaches the conclusion that reproducibility is not primarily a technical issue, but a socio-technical one, revolving around creating and maintaining trust through conversation. Along the way, the paper does nicely in explaining some of the basic tenets of open-source software development and the conversations that underlie its mechanics, and at the same time in providing a framework for thinking about how these mechanics create a particular substrate for reproducible research, especially within computational science.

  5. What do you see as the strongest aspect of this manuscript?: Novel framework for thinking about the links between open-source software and broader computational reproducibility.

  6. What do you see as the weakest aspect of this manuscript?:

  7. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Detailed Comments section.: References are sufficient and appropriate

  8. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Detailed Comments section.:

  9. How would you rate the organization of the manuscript? Please elaborate in the Detailed Comments section.:

  10. Is the length of the manuscript appropriate for the topic? Please elaborate in the Detailed Comments section.: Satisfactory

  11. Please rate the readability of this manuscript in the Detailed Comments section.: Easy to read

Please rate the manuscript. Explain your choice in the Detailed Comments section.: Excellent
