documentation's People

Contributors

chris-ha458, pjox, uinelj

Forkers

chris-ha458

documentation's Issues

Improve dev documentation

The documentation should highlight the organization layout and outline the primary repositories, explaining how it all fits together.
For each project, the README.md should contain enough information for people to understand what the project is about, as well as a link to the "dev documentation" page.

Add a contributing page

This should be a short guide on contributing, both for people interested in coding (Rust and Python) and for people interested in data validation/sourcing.

Documentation Entrypoint

This page should serve as a one-stop shop for the documentation. It is meant to point people towards four kinds of documentation:

  • Accessing/Getting the data
  • Using the data
  • Contributing
  • Developing (=contributing to the code base)

Comments on the documentation

  • I find this sentence confusing: "!!! note A single document can have multiple categories."
  • We could also add an example to make this one clearer: "These categories are in a field that is at this path: metadata.categories."
  • Something is wrong with this sentence: "These hashes are particular in that two similar documents should have similar hashes."

@Uinelj
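
As a possible starting point for the requested example, here is a minimal Python sketch of reading the field at the path metadata.categories from one JSON record. Only that path comes from the documentation; the record contents, the category values, and the helper name `get_categories` are made up for illustration.

```python
import json

# Hypothetical record: only the `metadata.categories` path is taken from the
# documentation. The other field and all values are invented for this sketch.
line = '{"content": "example text", "metadata": {"categories": ["news", "sports"]}}'


def get_categories(record: dict) -> list:
    """Return the list stored at metadata.categories, or [] if it is missing or null."""
    metadata = record.get("metadata") or {}
    return metadata.get("categories") or []


record = json.loads(line)
for category in get_categories(record):
    # A single document can carry several categories, so iterate over the list.
    print(category)
```

Returning an empty list when the field is absent keeps the calling loop free of special cases.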

Request for clarification regarding OSCAR Schema v2 (schema-v2.md)

The OSCAR Schema v2 page at docs/schema/schema-v2.md
shows JSON files compressed in gz format.
My understanding is that OSCAR-2301 has moved to the zstd format, with the extension .zst.

I think the page should either show files with the .zst extension or simply mention that these files are compressed and
that gz is just an example choice of compression.
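
Whichever extension the page ends up showing, a reader can tell the two formats apart from the first bytes of a file. The sketch below assumes nothing about OSCAR itself: it only checks the standard gzip and Zstandard magic numbers, and the function name `detect_compression` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"          # first two bytes of every gzip member
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # first four bytes of a Zstandard frame


def detect_compression(header: bytes) -> str:
    """Guess the compression format from the leading bytes of a file."""
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


# Usage: compress a small JSON line with gzip and inspect its header.
sample = gzip.compress(b'{"content": "example"}\n')
print(detect_compression(sample[:4]))
```

In practice the header would come from `open(path, "rb").read(4)`. Decompressing .zst files in Python typically needs a third-party package, which is outside this sketch.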

Security considerations regarding checksum choice

This also follows from my reading of docs/schema/schema-v2.md
and docs/tools/generation-jeanzay.md

The hash function SHA-256 is used for the checksum. Although SHA-256 is considered secure, it is not resistant to length extension attacks.
There are several ways to address this, if it is a concern:

  1. Record each file's size, as it was when hashed, in checksum.sha256. That would at least make length extension attacks difficult, if not impossible.
  2. Change the hash to SHA-384 or SHA-512/256. Both run at similar speed, since each performs the SHA-512 computation (SHA-512 itself is still vulnerable to length extension) and then truncates the result, and the bottleneck is the hashing step. (SHA-512, SHA-384, and SHA-512/256 each use different initialization vectors, so you shouldn't try to roll your own, by the way.) Compared to the total size of OSCAR releases, the increased size of SHA-384 digests is negligible, as is the processing cost. If processing is an issue, other secure hashes such as BLAKE3 could be used. Further in-depth considerations for hashes should probably be directed towards Ungoliant, which I plan to do.

My point is that such considerations should at least be made, and it would be helpful for the documentation to communicate that.
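
To make the two options above concrete, here is a minimal Python sketch that hashes data with SHA-384 and records the size next to the digest. The entry layout (digest, size, file name), the function name `checksum_entry`, and the sample file name are assumptions for illustration; this is not the format that checksum.sha256 currently uses.

```python
import hashlib


def checksum_entry(data: bytes, name: str, algorithm: str = "sha384") -> str:
    """Build one checksum line: hex digest, size in bytes, file name.

    The three-column layout is hypothetical, chosen only for this sketch.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{digest}  {len(data)}  {name}"


# Usage with a made-up payload and file name.
payload = b'{"content": "example"}\n'
print(checksum_entry(payload, "example_part_1.jsonl.zst"))
print(checksum_entry(payload, "example_part_1.jsonl.zst", algorithm="sha256"))
```

For real shards the data would be fed to the hash object in chunks rather than held in memory. BLAKE3 is not in the Python standard library, so it is left out here.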
