ipfs-pack's Introduction

ipfs-pack - filesystem packing tool

ipfs-pack is a tool and library to work with ipfs and large collections of data in UNIX/POSIX filesystems.

  • It identifies individual collections or bundles of data (the pack).
  • It creates a light-weight cryptographically secure manifest that preserves the integrity of the collection over time, and travels with the data (PackManifest).
  • It helps use ipfs in a mode that references the filesystem files directly and avoids duplicating data (filestore).
  • It carries a standard dataset metadata file to capture and present information about the dataset (data-package.json).
  • It helps verify the authenticity of data through a file carrying cryptographic signatures (PackAuth).

Installing

Pre-built binaries are available on the ipfs distributions page.

From source

If there is no pre-built binary for your system, or you'd like to try out unreleased features, or you want to build from source for any other reason, it's relatively simple. First, make sure you have Go installed and properly configured. This guide from the Go team should help with that. Once that's done, simply run make build.

Usage

$ ipfs-pack --help
NAME:
   ipfs-pack - A filesystem packing tool.

USAGE:
   ipfs-pack [global options] command [command options] [arguments...]

VERSION:
   v0.1.0

COMMANDS:
     make     makes the package, overwriting the PackManifest file.
     verify   verifies the ipfs-pack manifest file is correct.
     repo     manipulate the ipfs repo cache associated with this pack.
     serve    start an ipfs node to serve this pack's contents.
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -v  print the version

Make a pack

$ cd /path/to/data/dir
$ ipfs-pack make
wrote PackManifest

Verify a pack

$ ipfs-pack verify
Pack verified successfully!

Testing

Tests require the random-files module

go get -u github.com/jbenet/go-random-files/random-files

Run tests with

./test/pack-basic.sh
./test/pack-serve.sh

Spec

Read the ipfs-pack work-in-progress "spec" here: Spec (WIP).

ipfs-pack's People

Contributors

chriscool, jbenet, kevina, parkan, rht, whyrusleeping


ipfs-pack's Issues

Release new version

I just updated ipfs-pack to use go-ipfs 0.4.14. This should fix some of the memory issues people were running into when adding really large datasets.

We should push out a new version with these changes, and maybe also fix the other issue @Stebalien pointed out.

Specify format for PackManifest files

In order to implement pack make, which generates a PackManifest file, we need to know how PackManifest files should be formatted.

Some references:

  • bagit payload manifests -- extremely similar to what we need
  • data packages format -- this json format is designed to capture metadata about a dataset. Not likely to be included in PackManifest, but could be included in the pack's contents... (if you have ideas around this, please create issues in this repo to discuss)

From the spec

> cat PackManifest
QmVP2aaAWFe21QjUujMw5hwYRKD1eGx3yYWEBbMtuxpqXs <fmtstr> moreData/0
QmV7eDE2WXuwQnvccsoXSzK5CQGXdFfay1LSadZCwyfbDV <fmtstr> moreData/1
QmaMY7h9pmTcA5w9S2dsQT5eGLEQ1CwYQ32HwMTXAev5gQ <fmtstr> moreData/2
QmQjYU5PscpCHadDbL1fDvTK4P9eXirSwD8hzJbAyrd5mf <fmtstr> moreData/3
QmRErwActoLmffucXq7HPtefBC19MjWUcj1DdBoaAnMm6p <fmtstr> moreData/4
QmeWvL929Tdhzw27CS5ZVHD73NQ9TT1xvLvCaXCgi7a9YB <fmtstr> moreData/5
QmXbzZeh44jJEUueWjFxEiLcfAfzoaKYEy1fMHygkSD3hm <fmtstr> moreData/6
QmYL17nYZrZsAhJut5v7ooD9hmz2rBotC1tqC9ZPxzCfer <fmtstr> moreData/7
QmPKkidoUYX12PyCuKzehQuhEJofUJ9PPaX2Gc2iYd4GRs <fmtstr> moreData/8
QmQAubXA3Gji5v5oaJhMbvmbGbiuwDf1u9sYsN125mcqrn <fmtstr> moreData/9
QmYbYduoHMZAUMB5mjHoJHgJ9WndrdWkTCzuQ6yHkbgqkU <fmtstr> someJSON.json
QmeWiZD5cdyiJoS3b7h87Cs9G21uQ1sLmeKrunTae9h5qG <fmtstr> someXML.xml
QmVizQ5fUceForgWogbb2m2v5RRrE8xEm8uSkbkyNB4Rdm <fmtstr> moreData
QmZ7iEGqahTHdUWGGZMUxYRXPwSM3UjBouneLcCmj9e6q6 <fmtstr> .
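These manifest lines split cleanly into three whitespace-separated fields. A minimal parsing sketch in Python (the function name and field names are illustrative, not from the spec):

```python
def parse_manifest_line(line):
    """Split one PackManifest line into (checksum, fmtstr, path).

    maxsplit=2 keeps everything after the second field as the path,
    so paths containing spaces survive -- though a real parser would
    also need the spec's rules for escaping characters like newlines.
    """
    checksum, fmtstr, path = line.rstrip("\n").split(None, 2)
    return checksum, fmtstr, path


entry = parse_manifest_line(
    "QmVP2aaAWFe21QjUujMw5hwYRKD1eGx3yYWEBbMtuxpqXs f0000120001 moreData/0"
)
```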

Progress indicator revisited.

It would also be helpful to have ipfs-pack serve show how far it is with verifying the pack integrity. My 30+ GB pack takes about 7-8 minutes - and unless I am actively monitoring the console there is no way to really know when it starts...

Could there be a way to add something like a webhook in the config, that the daemon pings when it is up - or sends a heartbeat to? Sometimes the daemon will not be accessible via NAT traversal and for some use-cases it would be great to be able to have it report to a monitoring endpoint.

Is this lib still maintained?

There is a link to this lib in the IPFS documentation, in the section on working with big files. Nonetheless, the latest commit was made about 2 years ago, which raises the suspicion that it is outdated/deprecated/no longer supported. Is it?

Pack paths prefixed by `./`

Right now the manifest contains all paths as

QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH  f0000120001     ./abcd

This is not consistent with the spec (which has plain relative paths) and makes the file slightly more annoying to parse. The manifest still verifies after removing the prefix, so maybe we should remove it?

(this may be idiosyncratic to OS X)
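Until that is settled, a consumer can normalize entries before comparing them against spec-style paths; a small sketch (the helper name is hypothetical):

```python
def normalize_pack_path(path):
    """Strip a leading "./" so "./abcd" and "abcd" compare equal.

    The bare directory entry "." and the degenerate "./" are left
    untouched.
    """
    if path.startswith("./") and len(path) > 2:
        return path[2:]
    return path
```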

File Attributes of UNIX File System Objects

File Attributes (either in POSIX or extended attributes) of UNIX File System Objects SHOULD be preserved as long as they allow canonical serialisation (i.e. can be uniquely hashed regardless of env).

References:

  1. Nix Archive: preserve executable flag and symlinks (required for software build process); strip permissions, timestamps, and ownerships (since they are environment-dependent) http://nixos.org/~eelco/pubs/phd-thesis.pdf#5.2.1
  2. Git: preserve executable flag and symlinks. Several tools exist to extend git to preserve ownership and permissions.
  3. Torrent: extended file attributes and symlinks still WIP, an issue is raised for potential data loss after deduplication http://www.bittorrent.org/beps/bep_0047.html
  4. BagIt: unspecified
  5. DataPackage: unspecified (but MAY contain other metadata http://specs.frictionlessdata.io/data-packages/#optional-fields-1)
  6. WARC: unspecified (but header MUST contain several fields that may overlap with optional metadata in DataPackage, e.g. https://github.com/internetarchive/warc/blob/8f05a000a23bbd6501217e37cfd862ffdf19da7f/warc/warc.py#L43)

Story: Selectively Ignore Files and Directories

Given
There are files and/or sub-directories that I do not want added to the pack,

Then
I add the file paths, or path-matching patterns, to a .ipfsignore file and then build the pack manifest.

Then
The manifest does not include those files
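The .ipfsignore file and its semantics are only proposed here; one plausible interpretation, sketched with Python's fnmatch globbing:

```python
import fnmatch


def load_ignore_patterns(text):
    """Parse .ipfsignore-style text: one glob pattern per line,
    blank lines and # comments skipped."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def is_ignored(path, patterns):
    """True if the path matches any ignore pattern."""
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)


patterns = load_ignore_patterns("# scratch files\n*.tmp\nlogs/*\n")
```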

Add more tests (use sharness)

We should add some solid tests here; there are a lot of edge cases to test.

  • Test basic creation, serving, verify, repo regen rm and other commands
  • Test different permissions issues (how does pack make fail under bad permissions)
  • Test pack creation from different directories (passing 'dir' parameter for different commands)
  • Test changing files in the pack before and/or during serve
  • Test moving, deleting, changing perms, etc on files before/during serve

cc @chriscool, do you think you would have time to help us out on this?

Error: "too many open files" when verifying

I built a pack based on my working copy of this git repo. Immediately after making the PackManifest I ran ipfs-pack verify and got a bunch of errors:

Checksum mismatch on ./vendor/src/github.com/ipfs/go-ipfs/bin/gx. (QmcNhnTpKWEeHoBegJjjbDsvtpNxCroWxWFg8yzHVmXmyh)
Checksum mismatch on ./vendor/src/github.com/ipfs/go-ipfs/bin/gx-go. (QmfNsYvjmztqya5RipyDoFQU7Ps4YNhd3BeXk4EzNNraD8)
error: in manifest, missing from pack: ./vendor/src/github.com/multiformats/go-multihash/test/sharness/bin
error: checking file ./vendor/src/golang.org/x/crypto/pbkdf2: open ./vendor/src/golang.org/x/crypto/pbkdf2: too many open files
[... about a couple hundred repetitions of that error with different paths ...]
Pack verify found some corruption.
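The descriptor exhaustion suggests verification is opening many files concurrently. One mitigation on the tool side is to hold at most one file open at a time while hashing; a sketch in Python, using plain SHA-256 as a stand-in for ipfs-pack's actual multihash computation:

```python
import hashlib


def sha256_file(path, bufsize=1 << 20):
    """Hash one file incrementally, holding a single descriptor."""
    h = hashlib.sha256()
    with open(path, "rb") as f:  # closed before the next file opens
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sequentially(paths):
    """Hash files strictly one at a time, so the process never holds
    more than one data-file descriptor regardless of pack size."""
    return {p: sha256_file(p) for p in paths}
```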

Compare/Contrast PackManifests with torrent files. Identify UX strengths we want to repeat

ipfs-pack PackManifest files have a similar structure to BitTorrent's "manifest-like" .torrent files. Torrent files play a prominent role in the BitTorrent UX. This issue is aimed at comparing & contrasting PackManifests and torrent files so that we can consider any implications for the ipfs-pack UX.

First-Pass at Comparison/Contrast

While IPFS has some distinct advantages over BitTorrent, especially for distributed archives, there are strong UX advantages to using manifest files as a convenient, compact, relatable way of identifying and sharing datasets.

Implications

The main UX benefit of manifest files is that they allow scenarios where I give you a PackManifest (which is a lot smaller than the pack), or the hash of the PackManifest (which is only a few bytes), and you can use that to acquire the rest of the pack's contents directly from the network.

Some implications of this:

  • ipfs-pack repo should include the PackManifest in the object store that it builds.
  • Consider: in scenarios where I share a dataset, should I share the hash of the PackManifest or the root hash of the dataset? What are the implications of each approach, given that both ultimately let me retrieve the same blocks?

Background

The original draft proposal for ipfs-pack lists:

Other tooling examples

The ipfs-pack spec addresses the relationship to BagIt bag files in the background and the ipfs-pack bag method description, but it doesn't address the relationships to torrent files. This issue is aimed at gathering that info so we can update the spec and consider any implications for the ipfs-pack UX.

Move to IPFS Shipyard

This seems unmaintained, and sadly we have little bandwidth to maintain it as a core component. I suggest moving it to ipfs-shipyard.

Will do in 1 week if no voices against.

Serve test fails with bad multihash

> ./pack-serve.sh
stuff/pqrc40ifgvj-e9
stuff/-3hom6p-y-d7
stuff/v0b5ncrxf6vr
stuff/n0q_0w87y_j6
stuff/dyt30ri50e-
stuff/pua5sscmt
stuff/j980wq8b_fzvwdv
stuff/j980wq8b_fzvwdv/g9h8u02o_1i
stuff/j980wq8b_fzvwdv/cvm2k6
stuff/j980wq8b_fzvwdv/ylylvk5fg
stuff/j980wq8b_fzvwdv/k25c0jwx2bu
stuff/j980wq8b_fzvwdv/0v351_bxiqt
stuff/j980wq8b_fzvwdv/of7rq0aki
stuff/j980wq8b_fzvwdv/jllrl5s
stuff/j980wq8b_fzvwdv/jllrl5s/rsnhhrcq0s5
stuff/j980wq8b_fzvwdv/jllrl5s/a-kvk4xhb9qige
stuff/j980wq8b_fzvwdv/jllrl5s/l4u7w6j
stuff/j980wq8b_fzvwdv/jllrl5s/twsvrubs1z
stuff/j980wq8b_fzvwdv/jllrl5s/qg10
stuff/j980wq8b_fzvwdv/jllrl5s/a-37dlga6-dlzpx
stuff/j980wq8b_fzvwdv/dqk_hn5
stuff/j980wq8b_fzvwdv/dqk_hn5/rfjv69ui0cln7a
stuff/j980wq8b_fzvwdv/dqk_hn5/g8opho4fa396
stuff/j980wq8b_fzvwdv/dqk_hn5/e2orz5x_df0l_v
stuff/j980wq8b_fzvwdv/dqk_hn5/6c1j
stuff/j980wq8b_fzvwdv/dqk_hn5/t41_9nasn6fn2f4
stuff/j980wq8b_fzvwdv/dqk_hn5/6yfohahmyqzu
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn/hqi0extsi
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn/1c7cfsisepajgkx
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn/m5-_omk7g2ach
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn/qwlpknbm0v-uc
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn/y0f_b4s34
stuff/j980wq8b_fzvwdv/2scf_k8p43zpgn/c5-9g
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60/5wy2sxa
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60/tsjchxkfb
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60/y6v8jj
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60/hcwrsdc3c_
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60/i0d93a6smz69a4
stuff/j980wq8b_fzvwdv/f4ima_nry6tb60/c5wirr
stuff/j980wq8b_fzvwdv/s0yk0bf
stuff/j980wq8b_fzvwdv/s0yk0bf/ubra_27wvw
stuff/j980wq8b_fzvwdv/s0yk0bf/r8m0ue7jafsqp
stuff/j980wq8b_fzvwdv/s0yk0bf/f0fu
stuff/j980wq8b_fzvwdv/s0yk0bf/fxoi1h
stuff/j980wq8b_fzvwdv/s0yk0bf/xblmaf
stuff/j980wq8b_fzvwdv/s0yk0bf/l4rj7in
stuff/padme3
stuff/padme3/b0hlng6yf
stuff/padme3/j6jj-0
stuff/padme3/d1ag9o
stuff/padme3/24bip5v0t
stuff/padme3/kw3l6xf1p
stuff/padme3/aoyxento6sn
stuff/padme3/iwfxvxc7r
stuff/padme3/iwfxvxc7r/c8107b55c0lv50z
stuff/padme3/iwfxvxc7r/rasto6n_nqog0
stuff/padme3/iwfxvxc7r/n_x2f0p
stuff/padme3/iwfxvxc7r/y5nym04hrbnp05y
stuff/padme3/iwfxvxc7r/_f4_kt74ojx
stuff/padme3/iwfxvxc7r/0756zr5b4cr8xu
stuff/padme3/up5ibypcjo5upr
stuff/padme3/up5ibypcjo5upr/ceh-3x6pew
stuff/padme3/up5ibypcjo5upr/ayvmwbl51
stuff/padme3/up5ibypcjo5upr/4ffaxeo8-s7x0v
stuff/padme3/up5ibypcjo5upr/aopfyxoe
stuff/padme3/up5ibypcjo5upr/rca_vbd2v6
stuff/padme3/up5ibypcjo5upr/nlpt_0mhp4
stuff/padme3/6c4ez04kqledz
stuff/padme3/6c4ez04kqledz/9eaua0dua0c
stuff/padme3/6c4ez04kqledz/99b__cb1
stuff/padme3/6c4ez04kqledz/ndbhu4yn5uh0wp
stuff/padme3/6c4ez04kqledz/iju2vett
stuff/padme3/6c4ez04kqledz/1orasjapoj0
stuff/padme3/6c4ez04kqledz/u0x_k-lsqtg-
stuff/padme3/ys-3c4epzpzmo5
stuff/padme3/ys-3c4epzpzmo5/ctxwo4kwk
stuff/padme3/ys-3c4epzpzmo5/xvh1k
stuff/padme3/ys-3c4epzpzmo5/6yhhq4_z00
stuff/padme3/ys-3c4epzpzmo5/020s2gr
stuff/padme3/ys-3c4epzpzmo5/988y-aq7-
stuff/padme3/ys-3c4epzpzmo5/gqihv8e5sc10
stuff/padme3/602-wp4x7c
stuff/padme3/602-wp4x7c/4vyrzrnmdi
stuff/padme3/602-wp4x7c/akhj5smrcwc031
stuff/padme3/602-wp4x7c/1elfgp-z
stuff/padme3/602-wp4x7c/y27cx4jvde
stuff/padme3/602-wp4x7c/ua-3t9fc4opz
stuff/padme3/602-wp4x7c/c89nhjjk
stuff/v8utdekcva74
stuff/v8utdekcva74/m0mofoj0
stuff/v8utdekcva74/cuuuq
stuff/v8utdekcva74/3k79mg29j2u76u
stuff/v8utdekcva74/0qk01e_3ch8jgui
stuff/v8utdekcva74/rz_i00unv9w
stuff/v8utdekcva74/c44vjixtm
stuff/v8utdekcva74/x_an_8c1uk7x
stuff/v8utdekcva74/x_an_8c1uk7x/0_7iz9b5o7heel
stuff/v8utdekcva74/x_an_8c1uk7x/gnzumopfep
stuff/v8utdekcva74/x_an_8c1uk7x/92v4
stuff/v8utdekcva74/x_an_8c1uk7x/fjd30
stuff/v8utdekcva74/x_an_8c1uk7x/ep1wr-ly
stuff/v8utdekcva74/x_an_8c1uk7x/isjgsgy
stuff/v8utdekcva74/oieg8qx
stuff/v8utdekcva74/oieg8qx/8g_2v
stuff/v8utdekcva74/oieg8qx/_xfj2m9_snj
stuff/v8utdekcva74/oieg8qx/k321zx7c4mdrbd9
stuff/v8utdekcva74/oieg8qx/cselbf-_4qhr
stuff/v8utdekcva74/oieg8qx/xhlfwtzyb1
stuff/v8utdekcva74/oieg8qx/-nvaam45qj
stuff/v8utdekcva74/9c8_29xarkv3w
stuff/v8utdekcva74/9c8_29xarkv3w/ux6evzuaywf
stuff/v8utdekcva74/9c8_29xarkv3w/mn-w7hdv
stuff/v8utdekcva74/9c8_29xarkv3w/9lidzw654
stuff/v8utdekcva74/9c8_29xarkv3w/x53xmpdoqva10
stuff/v8utdekcva74/9c8_29xarkv3w/l-s-7kaw2
stuff/v8utdekcva74/9c8_29xarkv3w/0zqu4s95951wi0
stuff/v8utdekcva74/bwya_4mwio0
stuff/v8utdekcva74/bwya_4mwio0/3w8han0o-r1m938
stuff/v8utdekcva74/bwya_4mwio0/3ia9mjh
stuff/v8utdekcva74/bwya_4mwio0/vgnwb5
stuff/v8utdekcva74/bwya_4mwio0/vt7ad4ghyi
stuff/v8utdekcva74/bwya_4mwio0/2iiycuy
stuff/v8utdekcva74/bwya_4mwio0/pye9l82lou7v
stuff/v8utdekcva74/n1r9o
stuff/v8utdekcva74/n1r9o/0e8rri1ry
stuff/v8utdekcva74/n1r9o/7gfs8ug0s3uq
stuff/v8utdekcva74/n1r9o/u5ln4vitk1_
stuff/v8utdekcva74/n1r9o/rlbsk71t0a
stuff/v8utdekcva74/n1r9o/y3ky
stuff/v8utdekcva74/n1r9o/wju-g0
stuff/-r3ze5mqwo
stuff/-r3ze5mqwo/u0jp1hd1tbqh
stuff/-r3ze5mqwo/i2lh9ubdk
stuff/-r3ze5mqwo/jr9ormm
stuff/-r3ze5mqwo/r_rmjpvs8tk
stuff/-r3ze5mqwo/t9ieqyzesi
stuff/-r3ze5mqwo/stlu2jvhrxduy
stuff/-r3ze5mqwo/tzmahgpl32flfd
stuff/-r3ze5mqwo/tzmahgpl32flfd/h0usaqjjlrg10
stuff/-r3ze5mqwo/tzmahgpl32flfd/z9400wz8vk--
stuff/-r3ze5mqwo/tzmahgpl32flfd/j23gc20e
stuff/-r3ze5mqwo/tzmahgpl32flfd/iu4sti
stuff/-r3ze5mqwo/tzmahgpl32flfd/td030v_o1xwkgp
stuff/-r3ze5mqwo/tzmahgpl32flfd/v7phihjm3xa_1
stuff/-r3ze5mqwo/f-m4szo
stuff/-r3ze5mqwo/f-m4szo/c0gwwwj1r6
stuff/-r3ze5mqwo/f-m4szo/-t810v_lkyqjps
stuff/-r3ze5mqwo/f-m4szo/92z4pdrvwoo
stuff/-r3ze5mqwo/f-m4szo/a32z_b
stuff/-r3ze5mqwo/f-m4szo/s7s6iv97t
stuff/-r3ze5mqwo/f-m4szo/4g7qv
stuff/-r3ze5mqwo/8w-10y2zw78_
stuff/-r3ze5mqwo/8w-10y2zw78_/8848v
stuff/-r3ze5mqwo/8w-10y2zw78_/9qb4wco
stuff/-r3ze5mqwo/8w-10y2zw78_/_v294nlbba
stuff/-r3ze5mqwo/8w-10y2zw78_/isupi8jxnn
stuff/-r3ze5mqwo/8w-10y2zw78_/eiv--iyhjt6r0f
stuff/-r3ze5mqwo/8w-10y2zw78_/yoxj
stuff/-r3ze5mqwo/kowln28vetdhesr
stuff/-r3ze5mqwo/kowln28vetdhesr/0e9cw
stuff/-r3ze5mqwo/kowln28vetdhesr/e4_3u1vkl5703o
stuff/-r3ze5mqwo/kowln28vetdhesr/uuro
stuff/-r3ze5mqwo/kowln28vetdhesr/pqhwoccq6lzqp
stuff/-r3ze5mqwo/kowln28vetdhesr/nme07
stuff/-r3ze5mqwo/kowln28vetdhesr/9b_qez3xfzsh
stuff/-r3ze5mqwo/k5tl3fvb8j
stuff/-r3ze5mqwo/k5tl3fvb8j/4gvy38dm
stuff/-r3ze5mqwo/k5tl3fvb8j/cdb_i91
stuff/-r3ze5mqwo/k5tl3fvb8j/8pjbux831
stuff/-r3ze5mqwo/k5tl3fvb8j/niojk_oix08ye
stuff/-r3ze5mqwo/k5tl3fvb8j/dgv0
stuff/-r3ze5mqwo/k5tl3fvb8j/vz3ir9sk
stuff/fc49c-vh6emp
stuff/fc49c-vh6emp/jlzod
stuff/fc49c-vh6emp/0aatyx3wubb
stuff/fc49c-vh6emp/c3u0koyla5f
stuff/fc49c-vh6emp/qew7_wlx
stuff/fc49c-vh6emp/un4dtly16jd8bz
stuff/fc49c-vh6emp/7s21--w4p-e
stuff/fc49c-vh6emp/40yxnsc21p
stuff/fc49c-vh6emp/40yxnsc21p/awo6
stuff/fc49c-vh6emp/40yxnsc21p/cctye6g16
stuff/fc49c-vh6emp/40yxnsc21p/1m0-0e56
stuff/fc49c-vh6emp/40yxnsc21p/l6mru0aks6e
stuff/fc49c-vh6emp/40yxnsc21p/puk6mtxybovb
stuff/fc49c-vh6emp/40yxnsc21p/l5bjtgq9adlu4
stuff/fc49c-vh6emp/ovng
stuff/fc49c-vh6emp/ovng/bfa5789dyt0xq
stuff/fc49c-vh6emp/ovng/ayvsesz8z8
stuff/fc49c-vh6emp/ovng/x2icv5
stuff/fc49c-vh6emp/ovng/jt8pank6e29wbv
stuff/fc49c-vh6emp/ovng/5sw_73ihvr6s
stuff/fc49c-vh6emp/ovng/0azu
stuff/fc49c-vh6emp/-054j5a4v6prym
stuff/fc49c-vh6emp/-054j5a4v6prym/tx4dkc
stuff/fc49c-vh6emp/-054j5a4v6prym/xv2koqbj
stuff/fc49c-vh6emp/-054j5a4v6prym/02scs5ek39
stuff/fc49c-vh6emp/-054j5a4v6prym/fo2f
stuff/fc49c-vh6emp/-054j5a4v6prym/6d0c4nc
stuff/fc49c-vh6emp/-054j5a4v6prym/2pw00f10
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k/hai6u1n
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k/7ftx-ba
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k/09339oj
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k/-_suq18ka251zq
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k/07v6
stuff/fc49c-vh6emp/5qcjorxl6w2kh0k/ilv7r
stuff/fc49c-vh6emp/86hi4b0y
stuff/fc49c-vh6emp/86hi4b0y/4u64
stuff/fc49c-vh6emp/86hi4b0y/__2m3srfw3l
stuff/fc49c-vh6emp/86hi4b0y/0xxtmbicrn0hh
stuff/fc49c-vh6emp/86hi4b0y/1u00zv2
stuff/fc49c-vh6emp/86hi4b0y/8gqh54y
stuff/fc49c-vh6emp/86hi4b0y/2put1e9m5pc3y6
wrote PackManifest
peerid QmakrYQyBZyKMByetMZEXYchX5Rn5ntYLZREMAKD9Yfo4m
addr /ip4/127.0.0.1/tcp/63402
initializing ipfs node at ../ipfs
generating 2048-bit RSA keypair...done
peer identity: QmbPN4o7NgvTN7BkX75oi4xehb26g6TWwSwpArrpVw7kYs
to get started, enter:

        ipfs cat /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme
/ip4/104.131.131.82/tcp/4001/ipfs/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ
/ip4/104.236.151.122/tcp/4001/ipfs/QmSoLju6m7xTh3DuokvT3886QRYqxAzb1kShaanJgW36yx
/ip4/104.236.176.52/tcp/4001/ipfs/QmSoLnSGccFuZQJzRadHn95W2CrSFmZuTdDWP8HXaHca9z
/ip4/104.236.179.241/tcp/4001/ipfs/QmSoLPppuBtQSGwKDZT2M73ULpjvfd3aZ6ha4oFGL1KrGM
/ip4/104.236.76.40/tcp/4001/ipfs/QmSoLV4Bbm51jM9C4gDYZQ9Cy3U6aXMJDAbzgu2fzaDs64
/ip4/128.199.219.111/tcp/4001/ipfs/QmSoLSafTMBsPKadTEgaXctDQVcqN88CNLHXMkTNwMKPnu
/ip4/162.243.248.213/tcp/4001/ipfs/QmSoLueR4xBeUbY9WZ9xGUUxunbKWcrNFTDAadQJmocnWm
/ip4/178.62.158.247/tcp/4001/ipfs/QmSoLer265NRgSp2LA3dPaeykiS1J6DifTC88f5uVQKNAd
/ip4/178.62.61.185/tcp/4001/ipfs/QmSoLMeWqB7YGVLJN3pNLQpmmEk35v6wYtsMGLzSr5QBU3
Initializing daemon...
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/192.168.0.12/tcp/4001
Swarm listening on /ip6/2604:2000:8052:4900:62c5:47ff:fe98:e972/tcp/4001
Swarm listening on /ip6/2604:2000:8052:4900::6/tcp/4001
Swarm listening on /ip6/2604:2000:8052:4900:ad4d:b8bf:25f9:b801/tcp/4001
Swarm listening on /ip6/::1/tcp/4001
API server listening on /ip4/127.0.0.1/tcp/5001
Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Daemon is ready
connect QmakrYQyBZyKMByetMZEXYchX5Rn5ntYLZREMAKD9Yfo4m success
Qmb2EDks9ez3MxHkKoJJeWWuoAMwF2BbPkBaLA2wWka8wC
Error: incorrectly formatted merkledag node: Link hash #0 is not valid multihash. multihash length inconsistent: &{1  85 [18 32 184 236 251 135 58 52 202 154 85 67 148 238 146 15 139 80 239 129 97 233 195 38 87 37 40 16 110 203 173 248 195 124]}

ipfs-pack has trouble with newlines in filenames

Newlines are technically valid in filenames, but since the PackManifest file is newline delimited this poses a slight problem.

Unsure how exactly to progress here. Do we disallow newlines in names, or escape them? Probably escape, but it's worth a little bit of discussion.
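If escaping wins, percent-encoding just the delimiter characters would keep ordinary entries human-readable. A hypothetical scheme (not part of any spec) that round-trips safely:

```python
def escape_path(path):
    """Percent-encode only the characters that break the
    newline-delimited manifest format. "%" is escaped first so
    the escape sequences themselves stay unambiguous."""
    return (path.replace("%", "%25")
                .replace("\n", "%0A")
                .replace("\r", "%0D"))


def unescape_path(path):
    """Invert escape_path; decode "%25" last for the same reason."""
    return (path.replace("%0A", "\n")
                .replace("%0D", "\r")
                .replace("%25", "%"))
```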

Support for exotic chunking

Given the considerations mentioned in ipld/legacy-unixfs-v2#15 (comment) (through which I found ipfs-pack) about data chunked in a content-aware fashion (the comment mentions video streams chunked at keyframes; my applications would rather involve raw camera files that contain a literal JPEG inside them, images whose XMP is served as a copy in a sidecar file, or zip files without compression), I think there are practical cases in which a fmtstr that only names the chunker and its parameters is insufficient to reproduce data already published on IPFS.

Would it be viable to have the option to explicitly describe the chunking, rather than giving rules the recipient might not be able to follow?

Straw man example:

Qm... <f0000120001> file1.jpg
Qm... <(3200, f0000120001) (4096000, f0000120001), (50000000, f0000120001)> file1.raw
Qm... <f0000120001> .

where file1.jpg is 4096000 bytes long and embedded in the first part of file1.raw (with 3200 bytes of header and 50000000 bytes of actual raw image at the tail). To reproduce the hash of file1.raw as published on IPFS, one needs to concatenate the results of chunking the first 3200 bytes, the JPEG data, and the rest with the DefaultRaw chunker each, rather than running it over the whole file.

I am currently unaware of tools that produce such content-aware chunkings of data (only tried it out manually so far), but the above-linked suggestion indicates that this is more than a pet-peeve of mine.
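The straw man above effectively lists explicit (length, chunker) pairs per file. Reapplying such a description is mechanical once the boundaries are declared; a sketch that cuts a byte string at declared lengths (function name illustrative):

```python
def split_at_lengths(data, lengths):
    """Split data into consecutive chunks of the declared byte
    lengths, as an explicit chunk description would require."""
    chunks, offset = [], 0
    for n in lengths:
        chunks.append(data[offset:offset + n])
        offset += n
    if offset != len(data):
        raise ValueError("declared lengths do not cover the file")
    return chunks
```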

Progress indicator while calculating should be clear

@whyrusleeping said:

The 'ipfs-pack make' process is computing the total size of the files being added in the background. this will take some time, so until it finishes you won't get any sort of a time estimate

I think this is signaled by the ? and the "working" bar in this image:

  • I don't think just ? is super clear.
  • maybe there's more cut off in the rest of the image
  • some products use calc or calculating total or something
  • we should avoid giving an ETA in time until the total is calculated. (that could say calc or calculating or estimating too)

Very strange behaviour with a weird fix.

Not sure if this should be here - will move it to go-ipfs if it's more appropriate...

So, I have a pretty big set of video files that I have packed (over 30 GB). The ones of interest are 720p (ca. 2.5 Mbps), all of which have been correctly transcoded for streaming using h264 baseline encoding.

I have been testing using the https://ipfs.io/ipfs/Qm... gateway. Most of the videos play back just freaking great, not a frame drop, smoother than vimeo as a matter of fact. So this report is also a report about the success of the IPFS system and the ease with which ipfs-pack can be used. Fast and easy. 💯

However, and it's a big however, I have noticed that some (1 in 10, maybe?) of them stall at a specific point (unique to each file, but generally at about 10 sec); my guess is that it is at a block boundary... This was bugging me. Sometimes I would get them to play after reloading the browser several times, but one of them consistently stalled on several different machines. Less than 100% is not an option, so I needed to get to the bottom of it.

This video was 42 seconds long and was consistently stopping at around 34 seconds - on several different computers. Both playback and direct download would stall at the 7.5 MB mark...

Was the file corrupt / broken?

  • rsynced to a local machine from the remote box
  • file size checks out (10MB, not 7.5MB)
  • plays back appropriately in firefox and vlc.

So it wasn't a corrupt file. Then I remembered that I actually moved this particular file from one folder to another folder, renamed it and rebuilt the pack (PACK v2). Before I did this I shared the link with someone, who tried to watch the film the next day. (Several hours ago, actually.) Even though I sent them a new link to the film, they still tried to watch it with the stale link from PACK v1.

PACK v1
/packroot
  /folder
    wrongName.mp4
  /folder2

PACK v2
/packroot
  /folder
  /folder2
    rightName.mp4

I was no longer serving the "stale" link with ipfs-pack, because I regenerated the PackList and made a new PackRoot hash (PACK v2). But they were watching it, even though I wasn't serving it - I suppose it got cached on the IPFS network, which is technically awesome. The weird thing was that the video referenced by both the old link from PACK v1 and the new link from PACK v2 (that I definitely was serving) were stopping at exactly the same place!

I checked the ipfs-pack HUD, and saw that indeed, ipfs-pack (PACK v2) was throwing an error that ./folder/wrongName.mp4 couldn't be found. So I changed directory to ./packroot, ran cp ./folder2/rightName.mp4 ./folder1/wrongName.mp4, SIGKILLed ipfs-pack, and started it back up again...

Now the files at the links from both PACK v1 and PACK v2 are accessible, serve to completion, and ipfs-pack doesn't throw any more errors.

ipfs-pack bag

Just a quick question

Is the ipfs-pack bag available in the latest version 0.6.0?

Never shuts down

Hitting ^C causes ipfs-pack to echo "Shutting down ipfs-pack node..." and then it hangs.

Maintaining Dynamic Packs

So, I have a fundamental question about maintaining packs. Can you, @whyrusleeping give me some advice?

I am having to manually rebuild my 34GB pack almost every day. A pack of this size is close to the limit of usefulness, because it means I have to tear down the daemon while rebuilding, which takes about 15 minutes. Our use case, however, foresees packs growing to 3TB (the size of a current consumer HDD), and this is a problem, because packing that many files TAKES A VERY LONG TIME®.

These are the cases that I would like to look into:

  1. An unhashed file needs to be created and added to the pack.
  2. A hashed file needs to have its metadata changed. (Is this even possible? Can I just switch out the first block...?)
  3. A hashed file needs to be in a different directory.
  4. A hashed file was transcoded wrong, and needs to be unpinned (EVERYWHERE) and replaced.

In all cases, it is not really acceptable to kill the main ipfs-pack thread. Can I manually change or add a hash to the PackManifest instead of rebuilding the pack, while still serving? I have found two files that I need to replace (my file ingest was borked; I need to do it again for the main file and transcode it for the second file). For now, rebuilding the pack would work, but as I said above, our data volume is going to explode and we can't be offline for 30 minutes. It would be great to have commands like:

 $ ipfs-pack inject ./folder1/somefile.mp4 
 $ ipfs-pack move ./folder1/somefile.mp4 ./folder2/somefile.mp4
 $ ipfs-pack purge ./folder2/somefile.mp4

Of course, this could all be managed by integrating a watchfolder pattern and updating the PackManifest accordingly. I am not sure how possible that is, but it would really make the daemon superpowered. Like:

ipfs pack serve --watchfs

Of course it is also possible that I am trying to find ways to use ipfs-pack that are inappropriate anti-patterns and for which other tools exist. If that is the case, I would love to hear about a better way to do this...
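As an illustration of why the move case (3) is cheap: the file's own hash is unchanged when only its path moves, so in principle only manifest entries (plus affected directory hashes, such as the "." entry) need updating. A purely hypothetical sketch of the manifest-side rewrite, assuming the hash/fmtstr/path layout shown elsewhere in this document:

```python
def move_entry(manifest_text, old_path, new_path):
    """Rewrite the path field of one manifest entry. The file hash
    stays valid because only the location changed, not the contents;
    enclosing directory hashes would still need recomputation."""
    out = []
    for line in manifest_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[2] == old_path:
            fields[2] = new_path
        out.append("\t".join(fields) if len(fields) == 3 else line)
    return "\n".join(out) + "\n"
```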

Decide: Should we encourage people to use ipfs-pack as the primary way of adding files to IPFS?

Should we remove the Files on IPFS Lesson from the Decentralized Web Primer and tell people to use ipfs-pack instead?

The old Files on IPFS lesson revolves around using ipfs add -w to add file content and wrap it in directory info. Should we still include that in the "introductory" material?

If we keep that older lesson, which gets closer to the core of how IPFS works, how should we frame the lesson that covers creating ipfs-packs and serving their contents?

This decision is important because it will influence how people encounter IPFS for the first time and how they think it's supposed to be used.

verify reports symlinks as checksum mismatches

With a pack that contains symlinks, ipfs-pack verify reports the symlinks as corruption.

For example, I cloned a copy of this repo and ran

make build
ipfs-pack make
ipfs-pack verify

Which resulted in this output:

Checksum mismatch on ./vendor/src/github.com/ipfs/go-ipfs/bin/gx. (QmcNhnTpKWEeHoBegJjjbDsvtpNxCroWxWFg8yzHVmXmyh)
Checksum mismatch on ./vendor/src/github.com/ipfs/go-ipfs/bin/gx-go. (QmfNsYvjmztqya5RipyDoFQU7Ps4YNhd3BeXk4EzNNraD8)
error: in manifest, missing from pack: ./vendor/src/github.com/multiformats/go-multihash/test/sharness/bin
Pack verify found some corruption.
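The fix belongs in ipfs-pack itself, but the underlying check is simple: a verifier can detect symlinks before hashing and treat the link target path, not the file contents, as the thing to verify. A sketch of just the detection step:

```python
import os


def classify(path):
    """Distinguish symlinks from regular files before hashing.

    For a symlink, return the link target so a verifier can compare
    that instead of following the link and hashing its contents.
    """
    if os.path.islink(path):
        return ("symlink", os.readlink(path))
    return ("file", None)
```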

Why shouldn't `ipfs-pack` be simply an ipfs command?

From the spec, I have seen lots of duplication in functionality with vanilla ipfs, such as ipfs repo *, ipfs tar *, ...

This results in more documentation and test cases to write for each of these new commands, and even more documentation to describe how they differ from the default ones.

The only conceivable reason is that restructuring go-ipfs might necessarily result in breaking changes, and ipfs-pack was meant to be a fresh, cleaner implementation for managing one repository, whereas ipfs itself handles many repositories (~ npm vs npm -g) and doesn't default to filestore. But most API calls people use are add/get/object, unaffected by these details of archive/exchange formats.

Wouldn't it be simpler to have them all be:

  • ipfs add -> ipfs add
  • ipfs tar add -> ipfs add --format [bag|warc|car|nar|tar]
  • ipfs-pack make -> ipfs add --nocopy --pack . or ipfs add --nocopy --format car
  • ipfs-pack verify -> ipfs repo verify --local
  • ipfs-pack ls -> ipfs repo ls --local # implies filestore is being used
  • ipfs-pack gc -> ipfs repo gc --local
  • ipfs-pack serve -> ipfs daemon --local
  • ipfs get/ipfs cat detects whether the hashed content is formatted in bag/warc/car/nar/tar. ipfs get has options to write the output to be whether in bag/warc/car/nar/tar

Include PackManifest files in the pack's contents

Given I have the root hash of a pack
Then I should be able to view the pack manifest with ipfs cat ROOTHASH/PackManifest

This would allow me to rebuild an exact replica of the pack. It also allows me to quickly and consistently see what's in any dataset that's been published using pack.

seek PackManifest: invalid argument

✔fil:tutut [ master | ✔ ]> pwd
/Users/fil/Source/ipfs-pack/tutut
✔fil:tutut [ master | ✔ ]> ../ipfs-pack make
wrote PackManifest
✔fil:tutut [ master | ✔ ]> ../ipfs-pack serve
verified pack, starting server...
seek PackManifest: invalid argument
✔fil:tutut [ master | ✔ ]> cat PackManifest
zb2rhXbLoTfgP8xqk912oE8eynnDsMLhmZHxCSiVymfdNE6Hq	f0000120001	./hell1.txt
zb2rhXbLoTfgP8xqk912oE8eynnDsMLhmZHxCSiVymfdNE6Hq	f0000120001	./hellO.txt
QmXKJxD7MPJKtfo9oYo7x6yko5vYtHUwGRjj3R5YSKPvdc	f0000120001	.

(an OSX issue?)
@whyrusleeping
