carbites

Chunking for CAR files. Split a single CAR into multiple CARs.

Install

npm install carbites

Usage

Carbites supports 3 different strategies:

  1. Simple (default) - fast but naive. Only the first CAR output has a root CID; subsequent CARs have a placeholder "empty" CID.
  2. Rooted - like simple, but creates a custom root node to ensure all blocks in a CAR are referenced.
  3. Treewalk - walks the DAG to pack sub-graphs into each CAR file that is output. Every CAR has the same root CID, but contains a different portion of the DAG.

Simple

import { CarSplitter } from 'carbites'
import { CarReader } from '@ipld/car'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new CarSplitter(bigCar, targetSize) // (simple strategy)

for await (const car of splitter.cars()) {
  // Each `car` is an AsyncIterable<Uint8Array>
}

⚠️ Note: The first CAR output has roots in the header, subsequent CARs have an empty root CID bafkqaaa as recommended.

Rooted

Instead of an empty CID, carbites can generate a special root node for each split CAR that references all the blocks and the original roots (only in the first CAR). To do this, use the RootedCarSplitter constructor. When reading/extracting data from the CARs, the root node should be discarded.

Example
import { RootedCarSplitter } from 'carbites/rooted'
import { CarReader } from '@ipld/car/reader'
import * as dagCbor from '@ipld/dag-cbor'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new RootedCarSplitter(bigCar, targetSize)

const cars = splitter.cars()

// Every CAR has a single root - a CBOR node that is a tuple of `/carbites/1`,
// an array of root CIDs and an array of block CIDs.
// e.g. ['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']]

const { done, value: car } = await cars.next()
const reader = await CarReader.fromIterable(car)
const rootCids = await reader.getRoots()
const rootNode = dagCbor.decode(await reader.get(rootCids[0]))

console.log(rootNode[0]) // /carbites/1
console.log(rootNode[1]) // Root CIDs (only in first CAR)
/*
[
  CID(bafybeictvyf6polqzgop3jt32owubfmsg3kl226omqrfte4eyidubc4rpq)
]
*/
console.log(rootNode[2]) // Block CIDs (all blocks in this CAR)
/*
[
  CID(bafybeictvyf6polqzgop3jt32owubfmsg3kl226omqrfte4eyidubc4rpq),
  CID(bafyreihcsxqhd6agqpboc3wrlvpy5bwuxctv5upicdnt3u2wojv4exxl24),
  CID(bafyreiasq7d2ihbqm5xvhjjzlmzsensuadrpmpt2tkjsuwq42xpa34qevu)
]
*/

The root node is limited to 4MB in size (the largest message IPFS will bitswap). Depending on the settings used to construct the DAG in the CAR, this may mean a split CAR size limit of around 30GiB.
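
A back-of-the-envelope estimate shows where a limit of this order comes from. The figures below are assumptions, not values taken from carbites: each block CID is taken to be a CIDv1 with a sha2-256 multihash (36 bytes), encoded in dag-cbor with a 2-byte tag, a 2-byte length header and a 1-byte prefix, and every data block is taken to be 256KiB. The small overhead of the tuple itself is ignored.

```javascript
// Assumed figures (see the paragraph above); adjust them for your own DAG.
const ROOT_NODE_LIMIT = 4 * 1024 * 1024 // the 4MB limit, read as 4MiB
const BYTES_PER_CID = 36 + 2 + 2 + 1 // CID bytes + dag-cbor tag, length header, prefix
const BLOCK_SIZE = 256 * 1024 // assumed size of each data block

// How much block data fits in a CAR whose root node lists every block CID.
function estimateMaxCarSize (rootNodeLimit, bytesPerCid, blockSize) {
  const maxBlocks = Math.floor(rootNodeLimit / bytesPerCid)
  return maxBlocks * blockSize
}

const GiB = 1024 ** 3
const estimate = estimateMaxCarSize(ROOT_NODE_LIMIT, BYTES_PER_CID, BLOCK_SIZE)
console.log((estimate / GiB).toFixed(1) + ' GiB')
```

Under these assumptions the estimate comes out at about 25 GiB, the same order of magnitude as the figure quoted above. Larger blocks or shorter CIDs raise the limit, and smaller blocks lower it.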

Treewalk

Every CAR file has the same root CID but a different portion of the DAG. The DAG is traversed from the root node and each block is decoded and links extracted in order to determine which sub-graph to include in each CAR.

Example
import { TreewalkCarSplitter } from 'carbites/treewalk'
import { CarReader } from '@ipld/car/reader'
import * as dagCbor from '@ipld/dag-cbor'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const [rootCid] = await bigCar.getRoots()
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new TreewalkCarSplitter(bigCar, targetSize)

for await (const car of splitter.cars()) {
  // Each `car` is an AsyncIterable<Uint8Array>
  const reader = await CarReader.fromIterable(car)
  const [splitCarRootCid] = await reader.getRoots()
  console.assert(rootCid.equals(splitCarRootCid)) // all cars will have the same root
}
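
The traversal described above can be sketched without any IPLD dependencies. This is an illustration only, not the carbites implementation: `treewalkChunks` is a hypothetical helper, blocks are modelled as an in-memory `Map` from an ID to `{ size, links }`, and it assumes that each new chunk repeats the ancestors of the block it starts with, so that every chunk stays connected to the same root.

```javascript
/**
 * Hypothetical sketch: split a DAG into chunks of block IDs by walking it
 * depth-first from the root. `blocks` maps an ID to { size, links }.
 */
function treewalkChunks (blocks, rootId, targetSize) {
  const chunks = []
  let current = []
  let currentSize = 0

  const add = id => {
    current.push(id)
    currentSize += blocks.get(id).size
  }

  const visit = (id, ancestors) => {
    const block = blocks.get(id)
    if (!block) throw new Error(`missing block: ${id}`)
    // Start a new chunk when this block would overshoot the target, and
    // re-add the ancestors so the new chunk is still connected to the root.
    if (current.length > 0 && currentSize + block.size > targetSize) {
      chunks.push(current)
      current = []
      currentSize = 0
      ancestors.forEach(add)
    }
    add(id)
    for (const link of block.links) visit(link, [...ancestors, id])
  }

  visit(rootId, [])
  if (current.length > 0) chunks.push(current)
  return chunks
}
```

In this sketch every chunk begins with the root block, mirroring the property that all split CARs share one root CID. A block that is linked from several parents is visited once per parent, which a real implementation would likely want to avoid.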

CLI

Install the CLI tool to use Carbites from the comfort of your terminal:

npm i -g carbites-cli

# Split a big CAR into many smaller CARs
carbites split big.car --size 100MB --strategy simple # (default size & strategy)

# Join many split CARs back into a single CAR.
carbites join big-0.car big-1.car ...
# Note: not a tool for joining arbitrary CARs together! The split CARs MUST
# belong to the same CAR and big-0.car should be the first argument.

API

class CarSplitter

Split a CAR file into several smaller CAR files.

Import in the browser:

import { CarSplitter } from 'https://cdn.skypack.dev/carbites'

Import in Node.js:

import { CarSplitter } from 'carbites'

Note: This is an alias of SimpleCarSplitter - the default strategy for splitting CARs.

constructor(car: CarReader, targetSize: number)

Create a new CarSplitter for the passed CAR file, aiming to generate CARs of around targetSize bytes in size.

cars(): AsyncGenerator<AsyncIterable<Uint8Array> & RootsReader>

Split the CAR file and create multiple smaller CAR files. Returns an AsyncGenerator that yields the split CAR files (of type AsyncIterable<Uint8Array>).

The CAR files output also implement the RootsReader interface from @ipld/car which means you can call getRoots(): Promise<CID[]> to obtain the root CIDs.

static async fromBlob(blob: Blob, targetSize: number): CarSplitter

Convenience function to create a new CarSplitter from a blob of CAR file data.

static async fromIterable(iterable: AsyncIterable<Uint8Array>, targetSize: number): CarSplitter

Convenience function to create a new CarSplitter from an AsyncIterable<Uint8Array> of CAR file data.

class CarJoiner

Join together split CAR files into a single big CAR.

Import in the browser:

import { CarJoiner } from 'https://cdn.skypack.dev/carbites'

Import in Node.js:

import { CarJoiner } from 'carbites'

Note: This is an alias of SimpleCarJoiner - a joiner for the default CAR splitting strategy.

constructor(cars: Iterable<CarReader>)

Create a new CarJoiner for joining the passed CAR files together.

car(): AsyncGenerator<Uint8Array>

Join the CAR files together and return the joined CAR.

class RootedCarSplitter

Split a CAR file into several smaller CAR files ensuring every CAR file contains a single root node that references all the blocks and the original roots (only in the first CAR). When reading/extracting data from the CARs, the root node should be discarded.

Import in the browser:

import { RootedCarSplitter } from 'https://cdn.skypack.dev/carbites/rooted'

Import in Node.js:

import { RootedCarSplitter } from 'carbites/rooted'

The API is the same as for CarSplitter.

Root Node Format

The root node is a dag-cbor node that is a tuple of the string /carbites/1, an array of root CIDs (only seen in first CAR) and an array of block CIDs (all the blocks in the CAR). e.g. ['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']].
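
Since the root node should be discarded when reading data back, it can help to recognise it by shape. The check below is a minimal sketch that works on an already decoded node. `isCarbitesRootNode` is a hypothetical helper, not part of the carbites API, and it only inspects the tuple structure described above.

```javascript
const ROOT_NODE_TAG = '/carbites/1'

// Hypothetical helper: true if `node` looks like [tag, rootCids, blockCids].
function isCarbitesRootNode (node) {
  return Array.isArray(node) &&
    node.length === 3 &&
    node[0] === ROOT_NODE_TAG &&
    Array.isArray(node[1]) &&
    Array.isArray(node[2])
}
```

For example, `isCarbitesRootNode(['/carbites/1', ['bafkroot'], ['bafy1', 'bafy2']])` is `true`, while any other decoded block, such as a plain object, yields `false`.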

Note: The root node is limited to 4MB in size (the largest message IPFS will bitswap). Depending on the settings used to construct the DAG in the CAR, this may mean a split CAR size limit of around 30GiB.

class RootedCarJoiner

Join together CAR files that were split using RootedCarSplitter.

The API is the same as for CarJoiner.

class TreewalkCarSplitter

Split a CAR file into several smaller CAR files. Every CAR file has the same root CID but a different portion of the DAG. The DAG is traversed from the root node and each block is decoded and links extracted in order to determine which sub-graph to include in each CAR.

Import in the browser:

import { TreewalkCarSplitter } from 'https://cdn.skypack.dev/carbites/treewalk'

Import in Node.js:

import { TreewalkCarSplitter } from 'carbites/treewalk'

The API is the same as for CarSplitter.

class TreewalkCarJoiner

Join together CAR files that were split using TreewalkCarSplitter.

The API is the same as for CarJoiner.

Releasing

You can publish by either running npm publish in the dist directory or using npx ipjs publish.

Related

Contribute

Feel free to dive in! Open an issue or submit PRs.

License

Dual-licensed under MIT + Apache 2.0

carbites's People

Contributors

alanshaw, gobengo, hacdias, vasco-santos

carbites's Issues

Getting error when trying to install carbites

codespace ➜ .../packages/client/examples/node.js (main ✗) $ npm install carbites
npm WARN deprecated [email protected]: Please see https://github.com/lydell/urix#deprecated
npm WARN deprecated [email protected]: https://github.com/lydell/resolve-url#deprecated
npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated [email protected]: support for ECMAScript is superseded by `uglify-js` as of v3.13.0
npm ERR! code ENOENT
npm ERR! syscall chmod
npm ERR! path /workspaces/nft.storage/packages/client/examples/node.js/node_modules/carbites/bin/carbites.js
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, chmod '/workspaces/nft.storage/packages/client/examples/node.js/node_modules/carbites/bin/carbites.js'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent 

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/codespace/.npm/_logs/2021-06-23T18_14_58_379Z-debug.log

2021-06-23T18_14_58_379Z-debug.log

Guardrails around Pinning Service API

There need to be guardrails around use of the Pinning Service API. This epic will track what we want to implement and the discussion of a holistic solution.

Tree walk splitter can throw error getting the block links

I got this error while splitting arbitrarily large CAR files (over 100MB):

RangeError: Too many properties to enumerate
      at Function.entries (<anonymous>)
      at links (/Users/vsantos/gh/nft.storage/node_modules/carbites/node_modules/multiformats/cjs/src/block.js:19:37)
      at links.next (<anonymous>)
      at TreewalkCarSplitter._cars (/Users/vsantos/gh/nft.storage/node_modules/carbites/cjs/lib/treewalk/splitter.js:94:22)
      at _cars.next (<anonymous>)
      at TreewalkCarSplitter._cars (/Users/vsantos/gh/nft.storage/node_modules/carbites/cjs/lib/treewalk/splitter.js:95:24)
      at _cars.next (<anonymous>)
      at TreewalkCarSplitter.cars (/Users/vsantos/gh/nft.storage/node_modules/carbites/cjs/lib/treewalk/splitter.js:54:22)
      at cars.next (<anonymous>)

Error: Cannot find module 'carbites/treewalk'

Hi !
I just installed nft.storage and tried the example provided but I'm getting an error when trying to run it :

internal/modules/cjs/loader.js:626
throw err;
^

Error: Cannot find module 'carbites/treewalk'

Node version 12.3.1
running on windows ...

very barebones project with just nft storage installed

Thanks for any help.

support incomplete CAR files

Currently, carbites fails when parsing an incomplete CAR file as input, because it looks for blocks that aren't in it. This means incomplete CAR files cause errors in the web3.storage client.

I unblocked a user by directing them to the REST API for CAR uploads, but it would be nice if carbites just skipped traversal over blocks that aren't in the input CAR file.

Exception: Cannot find module 'carbites/treewalk'

Os: Ubuntu 16.0.4
Nodejs version: 14.0.0

Desired behavior: I'm looking to upload some NFT images using the nft.storage client library from Node.js.
What I have done:

// some code omitted
const uploadImageToIPFS = async (
  path,
  name,
) => {
  const API_TOKEN = process.env.API_TOKEN
  const client = new NFTStorage({ token: API_TOKEN })
  try {

    // readFile returns a Promise, so it must be awaited before use
    const file = await fs.promises.readFile(path)
    const metadata = await client.store({
      name: 'Maasai Morans',
      description: `Hello, all the from Maasai Mara. Here come the morans`,
      image: new File(
        [file],
        `${name}.jpg`,
        { type: 'image/jpg' }
      ),
    })

    console.log(metadata)
    return {
      url: metadata.url,
      ipnft: metadata.ipnft,
      ipfsImage: metadata.data.image.href,
      hostedImage: metadata.embed().image.href
    }

  } catch (ex) {
    console.log(ex)
  }
}

Output:

events.js:292
      throw er; // Unhandled 'error' event
      ^

TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer. Received an instance of Uint8Array
    at write_ (_http_outgoing.js:679:11)
    at ClientRequest.write (_http_outgoing.js:644:15)
    at Readable.ondata (_stream_readable.js:707:22)
    at Readable.emit (events.js:315:20)
    at addChunk (_stream_readable.js:296:12)
    at readableAddChunk (_stream_readable.js:272:9)
    at Readable.push (_stream_readable.js:213:10)
    at next (internal/streams/from.js:51:27)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
Emitted 'error' event on Readable instance at:
    at emitErrorNT (internal/streams/destroy.js:96:8)
    at emitErrorCloseNT (internal/streams/destroy.js:68:3)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Output (stepping through the code using a debugger):
Process exited with code 1 Uncaught Error: Cannot find module 'carbites/treewalk'
I can't seem to get this to work. Any kind of help is highly appreciated.

Thank you

Difficult to access files after cutting

I split the CAR file using the Treewalk strategy and uploaded the parts to a local gateway, but retrieving the file through the root CID was very slow and mostly failed. Using nft.storage, I got access quickly. What should I do?
