Himmelstein's review for Publications

I provided a peer review of this manuscript for Publications, presumably corresponding to 6b24b46. Glad to see that the manuscript is publicly available, such that I can immediately post my review.

Feel free to use this issue for additional conversation or clarification related to my review.

Review of "Verified, shared, modular, and provenance based research communication with the Dat protocol"

Hartgerink proposes a loosely-defined protocol for scholarly communication based on Dat. Dat provides peer-to-peer storage organized into filesystems, which are self-contained directories that are sort of like Git repositories. However, unlike Git repositories, the address of a Dat filesystem provides a persistent ID to identify the filesystem as well as to verify write operations.

The manuscript proposes organizing scholarly research into modules, each stored in a separate Dat filesystem. In addition to modules, scholars would create filesystems to represent themselves, referred to as profiles. Since Dat filesystems can contain anything, the proposed system depends on scholars adhering to specific standards, such that their modules/profiles are interpretable by other participants. This is similar to how HTML pages are free to put anything in their <head>, but certain elements are interpreted in special ways according to standards. One benefit of this approach is that no central authority is required. One downside is that not all modules will encode information in consistent ways.

It can be a bit challenging to visualize how scholarship can be organized into modules. However, I think Git repositories are a good example with substantial precedent. If scholars become accustomed to making a repository for every project, as many currently do, how to structure scholarship in modules may become more widely understood. One aspect missing from the manuscript is a comparison of the proposed ecosystem to one where every researcher uses public Git repositories to create the same profile/module structure, but without using Dat.

The manuscript provides a thought-provoking proposal on how a distributed scientific ecosystem could store its information. As such, I found it to be a valuable contribution to the field of open science. Nonetheless, such a system is far off and several obstacles remain. For example, if a scholar loses their private key, they can no longer edit their module/profile. Alternatively, if their private key is leaked, anyone can edit their module/profile. Furthermore, it is challenging to assign a real identity to a digital identity. The current proposal doesn't seem to contain any mechanisms to verify that scholars are who they claim to be. Other issues may arise, like plagiarism. For example, a researcher could monitor the network for new modules and immediately copy them, assigning authorship to themselves. Note that plagiarism is not unique to the proposed system, but may become more difficult to address if there are no trusted intermediaries.

Immutability

In many places, the manuscript assumes that Dat archives are immutable. In the abstract:

All these scholarly modules would be communicated on the new peer-to-peer Web protocol Dat (datproject.org), which provides a decentralized register that is immutable

In the "Dat protocol" section:

The persistent public key combined with the append-only register, results in persistent versioned addresses for filesystems that also ensure content integrity. … By appending +5 to the public key (dat://0c66...613+5) we can view the Dat filesystem as it existed at version 5 and be ensured that the contents we receive are the exact contents at that version.

However, according to my understanding, Dat archives have no mechanism to ensure immutability. Anyone who possesses the private key to a Dat can create multiple divergent histories and there is no protocol-level mechanism for reaching consensus over which history is correct.

Content addressing would be one solution to ensure integrity when referencing a specific revision. hypercore-strong-link may be one implementation of this. Another implementation would be for the modules property of a scholar's Dat to specify a content checksum in addition to a revision number.
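To illustrate the second suggestion, here is a minimal sketch of a module reference that carries a content checksum alongside the revision number. Everything in it is an assumption made for illustration: the ModuleLink structure, the field names, and the way a snapshot is hashed are hypothetical and are not taken from the manuscript, from Dat, or from hypercore-strong-link.

```python
import hashlib
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModuleLink:
    """Hypothetical reference to one revision of a module."""
    key: str       # public key of the Dat, hex encoded
    version: int   # revision number, as in dat://<key>+<version>
    checksum: str  # hex digest of the content expected at that revision


def content_checksum(files: Mapping[str, bytes]) -> str:
    """Hash a snapshot deterministically (assumed scheme, not Dat's own)."""
    h = hashlib.blake2b(digest_size=32)
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        # Length prefixes keep (path, data) boundaries unambiguous.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def make_link(key: str, version: int, files: Mapping[str, bytes]) -> ModuleLink:
    """Record what the referenced revision contained when the link was made."""
    return ModuleLink(key, version, content_checksum(files))


def verify(link: ModuleLink, files: Mapping[str, bytes]) -> bool:
    """True only if the retrieved snapshot matches the recorded checksum."""
    return content_checksum(files) == link.checksum
```

A reader who later retrieves the same version number but receives different content would see verify return False, which is the guarantee a bare revision number cannot give.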

Content addressing protects against links resolving to a modified revision. However, it does not fix the underlying problem that history can be rewritten. Timestamps, such as those implemented in OpenTimestamps, could anchor Dats to a more immutable, timestamped ledger like Bitcoin. If scholarly Dats were only recognized if they contained valid timestamps, retroactively editing revision history would become infeasible. Alternatively, perhaps scholarly institutions could be trusted to monitor for rewritten Dat histories and apply tools like OpenTimestamps in bulk to all known scholarly Dats.
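The monitoring idea can be sketched as follows, assuming a monitor can obtain some digest that summarizes a Dat at a given version. The HistoryMonitor class and its observe method are hypothetical names for illustration. The sketch only detects conflicting histories; it does not model OpenTimestamps or any anchoring to a ledger.

```python
from typing import Dict, Tuple


class HistoryMonitor:
    """Hypothetical monitor that remembers the first digest seen per revision."""

    def __init__(self) -> None:
        # (public key, version) -> digest first observed for that revision
        self._seen: Dict[Tuple[str, int], str] = {}

    def observe(self, key: str, version: int, digest: str) -> bool:
        """Record an observation.

        Returns True if it is consistent with what was seen before, and
        False if the same revision was previously seen with a different
        digest, which indicates a divergent or rewritten history.
        """
        first = self._seen.setdefault((key, version), digest)
        return first == digest
```

Such a monitor is only as trustworthy as the party running it, which is why I raise institutions as possible operators.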

The Merkle Tree figure seems a bit under-explained. Do data blocks L1, L2, L3, L4 correspond to file1, file2, file3, file4? It wasn't clear to me how a put or del operation would be applied to the Merkle Tree. I found this alternative explanation of Dat's Merkle Tree usage helpful.
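For readers with the same questions, a generic binary Merkle tree over four data blocks can be sketched as below. This is a textbook construction under my own assumptions (BLAKE2b, one-byte prefixes to separate leaves from parents, an unpaired node promoted unchanged); it is not Dat's actual encoding and does not show how put or del entries are represented.

```python
import hashlib
from typing import List, Sequence


def leaf_hash(block: bytes) -> bytes:
    """Hash a data block; the 0x00 prefix marks it as a leaf."""
    return hashlib.blake2b(b"\x00" + block, digest_size=32).digest()


def parent_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child digests; the 0x01 prefix marks an interior node."""
    return hashlib.blake2b(b"\x01" + left + right, digest_size=32).digest()


def merkle_root(blocks: Sequence[bytes]) -> bytes:
    """Compute the root digest over an ordered, non-empty list of blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    level: List[bytes] = [leaf_hash(b) for b in blocks]
    while len(level) > 1:
        nxt: List[bytes] = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(parent_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        level = nxt
    return level[0]
```

In this construction, changing any block or appending a new one changes the root, so a signed root commits to every block beneath it.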

Dat availability

Would it make sense to use an existing Dat-to-HTTP gateway to allow users without Beaker to view the Dat archives? For example, this link provides HTTP access to the "Summary" Dat.

When I attempted to view the "Prototype" Dat shown in Figure 5 (dat://b068f5365f26491557dce8da1fe2f60ec5bda681424970673059228811b193dd), Beaker Browser returned the error message "It doesn't seem like anybody is sharing this site right now." It is important, of course, that all the Dats referenced in this manuscript are persistently available. How are the Dats currently hosted?

The discussion mentions that university libraries are appropriate agents for persistent hosting of Dats. I agree, but is there an immediate solution for maintaining the availability of a Dat? While I agree that in the long term the Dat architecture could be more robust and persistent than the current centralized web, the unavailability of some example Dats shows this is currently not the case.

Cell I29 in the mock-modules-overview.ods has text but should probably be blank.

Intermediaries

The manuscript compares its proposed system to existing scholarly profiles, stating:

A decentralized scholarly profile in a Dat filesystem is similar and provides a unique ID (i.e., public key) for each researcher. However, researchers can modify their profiles freely because they retain full ownership and control of their data (as opposed to centralized profiles) and are not tied to one platform.

However, this assumes scholars would run their own Dat infrastructure, which seems somewhat unlikely (at least for the vast majority of scholars). More likely, in my opinion, is that scholars will rely on various interfaces and intermediaries that abstract away the underlying interaction with the Dat archive. While these services could be designed such that scholars retain full control (i.e. possess their private key and perform all signatures client-side), services could also be custodial. If cryptocurrency wallets and storage are any lesson here, it's evident that many users will relinquish ownership of private keys for convenience. One possibility is that custodial entities may arise that offer proprietary platforms for interacting with Dat archives. I can imagine certain publishers being keenly interested in applying the gatekeeper / subscription model to future systems.

Typos

"regarded as actions [to] append to a register"
"main file for Alice her profile"
"More specifically, Alice her own profile"
