go-mtree's Introduction

go-mtree

mtree is a filesystem hierarchy validation tool and format. This is a library and simple CLI tool for mtree(8) support.

While the traditional mtree CLI utility is found primarily on the BSDs (FreeBSD, OpenBSD, etc.), even broader support for the mtree specification format is provided by libarchive (libarchive-formats(5)).

There is also an mtree port for Linux though it is not widely packaged for Linux distributions.

There was a Google Summer of Code project to create a portable library and parser for mtree. It is available at github.com/mratajsky/libmtree, and there is a talk on it.

Format

The BSD mtree specification is published in mtree(5).

The format of the hierarchy specification is consistent with the # mtree v2.0 format. Both the BSD mtree and libarchive ought to be interoperable with it, with only one definite caveat. On Linux, extended attributes (xattr) on files are often a critical aspect of the file, holding ACLs, capabilities, etc. While FreeBSD filesystems do support extattr, this feature has not made its way into their mtree.

This implementation of mtree supports a few non-upstream "keyword"s, such as: xattr and tar_time. If you include these keywords, the FreeBSD mtree will fail, as they are unknown keywords to that implementation.

To have go-mtree produce specifications that will be strictly compatible with the BSD mtree, use the -bsd-keywords flag when creating a manifest. This will make sure that only the keywords supported by BSD mtree are used in the program.

Typical form

With the standard keywords, plus say sha256digest, the hierarchy specification looks like:

# .
/set type=file nlink=1 mode=0664 uid=1000 gid=100
. size=4096 type=dir mode=0755 nlink=6 time=1459370393.273231538
    LICENSE size=1502 mode=0644 time=1458851690.0 sha256digest=ef4e53d83096be56dc38dbf9bc8ba9e3068bec1ec37c179033d1e8f99a1c2a95
    README.md size=2820 mode=0644 time=1459370256.316148361 sha256digest=d9b955134d99f84b17c0a711ce507515cc93cd7080a9dcd50400e3d993d876ac

[...]

This shows the current directory and the files in it. Each path is followed by its keywords and the values unique to that path. Any keywords and values common to all entries are established by the /set command.

Extended attributes form

# .
/set type=file nlink=1 mode=0664 uid=1000 gid=1000
. size=4096 type=dir mode=0775 nlink=6 time=1459370191.11179595 xattr.security.selinux=dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA==
    LICENSE size=1502 time=1458851690.583562292 xattr.security.selinux=dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA==
    README.md size=2366 mode=0644 time=1459369604.0 xattr.security.selinux=dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA==

[...]

Note the keywords prefixed with xattr., followed by the extended attribute's namespace and key. This setup is consistent for use with Linux extended attributes as well as FreeBSD extended attributes.

Since extended attributes are an unordered hashmap, this approach allows for checking each <namespace>.<key> individually.

The value is the base64 encoding of the particular extended attribute's value. Since the values themselves could be raw bytes, this approach avoids issues with encoding.
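As an illustration of the xattr form, the following sketch splits one keyword from the example manifest into its `<namespace>.<key>` name and decoded raw value. The function name `parseXattrKeyword` is hypothetical and not part of the go-mtree API; it assumes standard (padded) base64, which matches the sample values shown above.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// parseXattrKeyword (hypothetical helper) splits a single manifest field of
// the form "xattr.<namespace>.<key>=<base64>" into the attribute name and
// its decoded raw bytes.
func parseXattrKeyword(field string) (name string, value []byte, err error) {
	kv := strings.SplitN(field, "=", 2)
	if len(kv) != 2 || !strings.HasPrefix(kv[0], "xattr.") {
		return "", nil, fmt.Errorf("not an xattr keyword: %q", field)
	}
	value, err = base64.StdEncoding.DecodeString(kv[1])
	if err != nil {
		return "", nil, err
	}
	return strings.TrimPrefix(kv[0], "xattr."), value, nil
}

func main() {
	field := "xattr.security.selinux=dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA=="
	name, value, err := parseXattrKeyword(field)
	if err != nil {
		panic(err)
	}
	// %q makes the trailing NUL byte of the SELinux label visible.
	fmt.Printf("%s = %q\n", name, value)
}
```

The decoded label ends in a NUL byte, which is one reason a text-safe encoding of the raw value is useful here.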

Tar form

# .
/set type=file mode=0664 uid=1000 gid=1000
. type=dir mode=0775 tar_time=1468430408.000000000

# samedir
samedir type=dir mode=0775 tar_time=1468000972.000000000
    file2 size=0 tar_time=1467999782.000000000
    file1 size=0 tar_time=1467999781.000000000
    
[...]

While go-mtree serves mainly as a library for upstream mtree support, go-mtree is also compatible with tar archives (which is not an upstream feature). This means that we can now create and validate a manifest by specifying a tar file. More interestingly, this also means that we can create a manifest from an archive, and then validate this manifest against a filesystem hierarchy that's on disk, and vice versa.

Notice that for the output of creating a validation manifest from a tar file, the default behavior for evaluating a notion of time is to use the tar_time keyword. In the "filesystem hierarchy" format of mtree, time is being evaluated with nanosecond precision. However, GNU tar truncates a file's modification time to 1-second precision. That is, if a file's full modification time is 123456789.123456789, the "tar time" equivalent would be 123456789.000000000. This way, if you validate a manifest created using a tar file against an actual root directory, there will be no complaints from go-mtree so long as the 1-second precision time of a file in the root directory is the same.

Usage

To use the Go programming language library, see the docs.

To use the command line tool, first build it, then run the following commands.

Create a manifest

This will also include the sha512 digest of the files.

gomtree validate -c -K sha512digest -p . > /tmp/root.mtree

With a tar file:

gomtree validate -c -K sha512digest -T sometarfile.tar > /tmp/tar.mtree

Validate a manifest

gomtree validate -p . -f /tmp/root.mtree

With a tar file:

gomtree validate -T sometarfile.tar -f /tmp/root.mtree

See the supported keywords

gomtree validate -list-keywords
Available keywords:
 uname
 sha1
 sha1digest
 sha256digest
 xattrs (not upstream)
 link (default)
 nlink (default)
 md5digest
 rmd160digest
 mode (default)
 cksum
 md5
 rmd160
 type (default)
 time (default)
 uid (default)
 gid (default)
 sha256
 sha384
 sha512
 xattr (not upstream)
 tar_time (not upstream)
 size (default)
 ripemd160digest
 sha384digest
 sha512digest

Building

Either:

go install github.com/vbatts/go-mtree/cmd/gomtree@latest

or

git clone https://github.com/vbatts/go-mtree.git $GOPATH/src/github.com/vbatts/go-mtree
cd $GOPATH/src/github.com/vbatts/go-mtree
go build ./cmd/gomtree

Build for many OS/Arch

make build.arches

Testing

On Linux:

cd $GOPATH/src/github.com/vbatts/go-mtree
make

On FreeBSD:

cd $GOPATH/src/github.com/vbatts/go-mtree
gmake

Related tools

go-mtree's People

Contributors

asellappen, baude, cyphar, dependabot[bot], lsm5, mjg59, thesayyn, tklauser, tych0, vbatts, wking

go-mtree's Issues

[RFE] given a manifest, return keywords used

Something like:

$ gomtree -f ./foo.mtree -list-used

to return a parseable string of keywords used, such that subsequent tooling could easily choose to validate the manifest against that set of keywords or less (i.e. ignoring mtimes).

[bug] fails validating on broken symlink

$ sudo /home/vbatts/bin/gomtree -c -p fedora/ -k mode,uid,gid,size,sha512digest > fedora.mtree
$ sudo /home/vbatts/bin/gomtree -f ./fedora.mtree -p fedora/
2016/07/20 23:00:38 open etc/alternatives/cifs-idmap-plugin: no such file or directory
$ sudo file fedora/etc/alternatives/cifs-idmap-plugin
fedora/etc/alternatives/cifs-idmap-plugin: broken symbolic link to /usr/lib64/cifs-utils/cifs_idmap_sss.so

Vis and Unvis break on UTF-8

Sigh. Okay, so if we have an especially well-named file such as AC_Raíz_Certicámara_S.A..pem, go-mtree will not handle it correctly when you call .Path() on an entry which has its name set to the above.

Effectively what happens is that you have a multi-byte encoded character being passed to the lovely Vis and Unvis code -- which obviously break in horrible ways. The string is then mutated in a very ugly way.

IMO the only way of handling this is to rewrite Vis and Unvis in Go...

[RFE] Attempted to restore attributes from specification

like the -t and -u flags of the mtree cli.

     -t                 Modify the modified time of existing files, the device
                        type of devices, and symbolic link targets, to match
                        the specification.
     -u                 Modify the owner, group, permissions, and flags of
                        existing files, the device type of devices, and sym-
                        bolic link targets, to match the specification.  Cre-
                        ate any missing directories, devices or symbolic
                        links.  User, group, and permissions must all be spec-
                        ified for missing directories to be created.  Note
                        that unless the -i option is given, the schg and
                        sappnd flags will not be set, even if specified.  If
                        -m is given, these flags will be reset.  Exit with a
                        status of 0 on success, 2 if the file hierarchy did
                        not match the specification, and 1 if any other error
                        occurred.

simple plain code api to create DirectoryHierarchy

ref: https://twitter.com/paultag/status/936063255948165120

@paultag

Hey @vbatts - I'm looking at using go-mtree -- is there an easy way to create a new DirectoryHierarchy in plain code (e.g. no archive or filesystem tree exists)?
The Entry pointers look nested enough to where I don't want to do that by hand :)

@vbatts

hey! good question. I haven't put thought to crafting one up in plain code. What do you have? os.FileInfo?

@paultag

Worse than that, sadly - just a path and a single key/value to set; it's not a standard use-case, and a "lolno" answer is maybe the right one :)

@vbatts

lolno is not the answer, but i haven't considered the case to make it easy. Are we talking like the output of libarchive/bsdtar mtree? or casync mtree?

[bug]: gomtree-tar only works with archives that are created under a common directory

Right now populateTree functions under the assumption that the archive passed in by the user was created under one common directory. It looks at a header's path (via hdr.Name), and determines that if filepath.Dir(pathname) is equal to ".", then that header must be the top-level root directory.

Ex: with tar -cvf <archivename> <somedirectory>: First part of path is considered the "root directory"

tardemo/
tardemo/rootfile
tardemo/dir3/
tardemo/dir3/dir5/
tardemo/dir3/dir5/.file5
tardemo/dir3/file3
tardemo/dir2/
tardemo/dir2/dir4/
tardemo/dir2/dir4/file4
tardemo/dir1/
tardemo/dir1/file1
tardemo/dir1/dir6/
tardemo/dir1/dir7/
tardemo/dir1/dir7/file8
tardemo/dir1/dir7/file6
tardemo/dir1/dir7/file7
tardemo/dir1/file2

This causes an issue when the archive was instead created using a collection of directories/files.

Ex: with tar -cvf <archivename> <dir1> <dir2> <file2> ...:

dir1/
dir1/file1
dir1/dir6/
dir1/dir7/
dir1/dir7/file8
dir1/dir7/file6
dir1/dir7/file7
dir1/file2
dir2/
dir2/dir4/
dir2/dir4/file4
dir3/
dir3/dir5/
dir3/dir5/.file5
dir3/file3
rootfile
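To make the difference between the two listings concrete, here is a small sketch that collects every entry whose parent directory is ".". It is not the populateTree code; `topLevel` is a hypothetical helper, and it uses package path (rather than filepath) on the assumption that tar entry names always use forward slashes. Names are cleaned first, since the parent of the uncleaned "tardemo/" is "tardemo" rather than ".".

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// topLevel (hypothetical helper) returns the distinct archive entries that
// sit directly under the archive root, i.e. those whose cleaned parent
// directory is ".".
func topLevel(names []string) []string {
	seen := map[string]bool{}
	for _, n := range names {
		clean := path.Clean(n) // "tardemo/" becomes "tardemo"
		if path.Dir(clean) == "." {
			seen[clean] = true
		}
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	single := []string{"tardemo/", "tardemo/rootfile", "tardemo/dir1/", "tardemo/dir1/file1"}
	multi := []string{"dir1/", "dir1/file1", "dir2/", "dir3/", "dir3/file3", "rootfile"}
	fmt.Println(topLevel(single)) // one candidate root
	fmt.Println(topLevel(multi))  // several candidates, so no single root exists
}
```

With more than one top-level entry, a synthetic "." root would be needed to hold them all.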

indentation of directories wrong, if 'type' keyword not used

$ gomtree -c -k time -p ~/bin  | grep sync > /tmp/1
$ gomtree -c -k time,type -p ~/bin  | grep sync > /tmp/2
$ diff -up /tmp/1 /tmp/2
--- /tmp/1      2016-07-14 23:36:24.386730325 -0400
+++ /tmp/2      2016-07-14 23:36:12.803630747 -0400
@@ -3,7 +3,7 @@
     syncthing time=1460560866.873076226
     syncthing.old time=1458764693.169687624
 # .sync
-    .sync time=1386609429.618041696
+.sync time=1386609429.618041696 type=dir
     sync.dat time=1386609172.616042077
     sync.lng time=1386609380.614041611
     sync.log time=1386609368.970042045

[RFE] support an exclude list

Both by golang API and by command line flags.

Upstream as -X <exclude-file>:

-X exclude-file    The specified file contains fnmatch(3) patterns match-
                   ing files to be excluded from the specification, one
                   to a line.  If the pattern contains a `/' character,
                   it will be matched against entire pathnames (relative
                   to the starting directory); otherwise, it will be
                   matched against basenames only.  Comments are permit-
                   ted in the exclude-list file.

This will come in handy for some archives that use layer-like entries, where some file entries are whiteout markers and will not exist as files once extracted.
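A minimal sketch of the matching rule quoted from the man page: patterns containing a '/' are matched against the whole relative path, all others against the basename only. The function name `excluded` is hypothetical, and Go's path.Match is used as an approximation of fnmatch(3); the two differ in details (for example, '*' in path.Match never matches a '/').

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded (hypothetical helper) reports whether relPath matches any of the
// exclude patterns. Patterns with a '/' are tested against the full relative
// path, others against the basename only. Malformed patterns are skipped.
func excluded(relPath string, patterns []string) bool {
	for _, pat := range patterns {
		target := path.Base(relPath)
		if strings.Contains(pat, "/") {
			target = relPath
		}
		if ok, err := path.Match(pat, target); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{".wh.*", "var/cache/*"}
	for _, p := range []string{"etc/.wh.foo", "etc/passwd", "var/cache/ldconfig"} {
		fmt.Printf("%-20s excluded=%v\n", p, excluded(p, patterns))
	}
}
```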

cannot generate manifest if insufficient permissions

If we own a file we have the right to change the mode, but as an unprivileged user we don't have the rights to read said file unless it has the read bit set. So if you try to generate an mtree manifest for a directory containing chmod -r files and you're not root then you're going to have a bad time.

Now, this isn't technically a huge issue for most cases, but currently this is blocking cyphar/umoci#26. Currently I'm working around it by doing a chmod u+r on every file that we extract, but that's just a horrible hack. I could also hack around it by modifying the DirectoryHierarchy manually after I get it from go-mtree.

But all of these solutions are hacking around the base of the problem: Currently we require reading every single file in a tree to make the manifest but we bail if we can't just open(O_RDONLY) it. Unfortunately the only workaround I've managed to figure out is that we temporarily change the mode (then change it back and reset the atime and mtime).

size reported for "type=dir" headers is 0

version: 0b85ce

When collecting values of keywords from a header hdr via tar stream in readHeaders(), hdr.FileInfo().Size() returns size 0 when it is type=dir.

For example, for .git folder:

Actual: .git type=dir size=0 time=1461100246.000000000
Expected: .git type=dir size=4096 time=1461100246.000000000

Missing files not reported

I'm seeing two errors related to verification of a manifest:

  • When you create an mtree manifest for foo1 and then use that manifest to verify foo2, which is missing files that foo1 has, go-mtree does not report these.
  • When you create an mtree manifest for foo1 and then use the manifest to verify foo2, where foo2 has files foo1 does not, go-mtree does not report these.
#!/bin/bash
set -x
set -e

name=$(basename $0)
root=$1
gomtree=/home/bbaude/bin/gomtree
left=$(mktemp -d /tmp/go-mtree.XXXXXX)
right=$(mktemp -d /tmp/go-mtree.XXXXXX)

echo "[${name}] Running in ${left} and ${right}"

touch ${left}/one
touch ${left}/two
cp -a ${left}/one ${right}/
ls -lR ${left}
ls -lR ${right}

$gomtree -K "sha256digest" -p ${left} -c > /tmp/left.mtree
$gomtree -k "sha256digest" -p ${right} -f /tmp/left.mtree
echo $?
rm -fr ${left} ${right}
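The two cases in this report amount to a set difference in each direction. The following is a sketch of that idea on plain path lists, not the go-mtree comparison code; `diffPaths` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// diffPaths (hypothetical helper) compares the paths recorded in a manifest
// with the paths found on disk. missing holds paths that are only in the
// manifest; extra holds paths that are only on disk.
func diffPaths(manifest, onDisk []string) (missing, extra []string) {
	inManifest := make(map[string]bool, len(manifest))
	for _, p := range manifest {
		inManifest[p] = true
	}
	found := make(map[string]bool, len(onDisk))
	for _, p := range onDisk {
		found[p] = true
	}
	for _, p := range manifest {
		if !found[p] {
			missing = append(missing, p)
		}
	}
	for _, p := range onDisk {
		if !inManifest[p] {
			extra = append(extra, p)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	left := []string{".", "one", "two"} // what the manifest describes
	right := []string{".", "one"}       // what is actually on disk
	missing, extra := diffPaths(left, right)
	fmt.Println("missing:", missing, "extra:", extra)
}
```

For the script above, this would report "two" as missing from the right-hand directory.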

release?

It's been a bit since a release, and because of the golang default of latest-released-version, I have to put:

replace github.com/vbatts/go-mtree v0.4.4 => github.com/vbatts/go-mtree v0.4.5-0.20190122034725-8b6de6073c1a

Everywhere to use some of the new code. It would be nice to get rid of this :)

add option to not error on "{filename} extra" output

From mtree(8):

     -e    Do not complain about files that are in the file hierarchy, but not in the specification.

I think this would be useful for when we just want to know if the hierarchy derived from the specification is at least a subset of the actual file hierarchy it is being compared to.

[RFE] tar: handle OCI image layout

Tar archives are well known for containing files and directories. The OCI Image Layer describes "whiteout" entries that signify the removal of files/directories. These whiteouts are effectively events to delete portions of the DirectoryHierarchy.

An interesting function that would come from supporting these whiteout files would be staging a view of what the layered filesystem attributes will be (i.e. snapshots of the expected mtree directory hierarchy as this set of layers is processed).
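As a sketch of the staging idea, the code below applies one layer's entry names to a set of paths, treating names with the OCI `.wh.` prefix as deletions and `.wh..wh..opq` as an opaque-directory marker. It tracks only path names, not attributes, and `applyLayer` and `removeUnder` are hypothetical helpers rather than go-mtree functions. Whiteouts are handled before additions so that they affect only the lower layers.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

const (
	whiteoutPrefix = ".wh."         // marks a single removed path
	opaqueMarker   = ".wh..wh..opq" // hides everything below its directory
)

// removeUnder deletes every path in view that starts with prefix.
func removeUnder(view map[string]bool, prefix string) {
	for p := range view {
		if strings.HasPrefix(p, prefix) {
			delete(view, p)
		}
	}
}

// applyLayer (hypothetical helper) updates view, the set of paths visible
// from the lower layers, with the entry names of one additional layer.
func applyLayer(view map[string]bool, layer []string) {
	// Pass 1: process removals against the lower layers only.
	for _, name := range layer {
		dir, base := path.Split(path.Clean(name))
		switch {
		case base == opaqueMarker:
			removeUnder(view, dir)
		case strings.HasPrefix(base, whiteoutPrefix):
			target := dir + strings.TrimPrefix(base, whiteoutPrefix)
			delete(view, target)
			removeUnder(view, target+"/")
		}
	}
	// Pass 2: add the layer's regular entries.
	for _, name := range layer {
		clean := path.Clean(name)
		if !strings.HasPrefix(path.Base(clean), whiteoutPrefix) {
			view[clean] = true
		}
	}
}

func main() {
	view := map[string]bool{"etc": true, "etc/passwd": true, "etc/hosts": true}
	applyLayer(view, []string{"etc/.wh.passwd", "etc/motd"})
	paths := make([]string, 0, len(view))
	for p := range view {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths)
}
```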

[RFE] switch flag parser to `github.com/urfave/cli/v2`

we could use some subcommands, say for manifest conversions or handling layered container tar archives, and this is tough with the built-in flag library.
switching to github.com/urfave/cli/v2 will make that cleaner and easier, and all the single hyphen flags we currently use will translate over just fine.

package main

import (
        "os"

        "github.com/urfave/cli/v2"
)

func main() {
        app := cli.NewApp()
        app.Name = "greet"
        app.Usage = "say a greeting"
        app.Flags = []cli.Flag{
                &cli.BoolFlag{
                        Name:  "list-keywords",
                        Usage: "List the keywords available",
                },
                &cli.BoolFlag{
                        Name:  "list-used",
                        Usage: "List all the keywords found in a validation manifest",
                },
        }
        app.Action = func(c *cli.Context) error {
                println("Greetings")
                return nil
        }

        app.Run(os.Args)
}
vbatts@vbatts-Lemur-Pro:/tmp/tmp.HEyhSXjCsc$ gr . -help
NAME:
   greet - say a greeting

USAGE:
   foo [global options] command [command options] [arguments...]

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --list-keywords  List the keywords available (default: false)
   --list-used      List all the keywords found in a validation manifest (default: false)
   --help, -h       show help (default: false)
vbatts@vbatts-Lemur-Pro:/tmp/tmp.HEyhSXjCsc$ gr . --list-used
Greetings
vbatts@vbatts-Lemur-Pro:/tmp/tmp.HEyhSXjCsc$ gr . -list-used
Greetings

(notice the single or double hyphen usage)

Perhaps BSD compat keywords

Like how keywords are annotated for being in the default set:

$ gomtree -list-keywords
Available keywords:
  md5
  rmd160digest
  sha1
  xattr
  sha512
  sha512digest
  size  (default)
  link  (default)
  rmd160
  ripemd160digest
  sha256digest
  sha1digest
  sha384digest
  type  (default)
  time  (default)
  gid  (default)
  uname
  md5digest
  sha384
  tar_time
  uid  (default)
  nlink  (default)
  mode  (default)
  cksum
  sha256

Perhaps we should also annotate keywords for whether they are in the upstream mtree? Because upstream behavior is to fail if there is a keyword present that the tool does not recognize (i.e. xattr or tar_time).

mode is not including all data

    overlay size=3889552 uid=0 mode=0755 time=1466694165.200189680

but

  File: 'bin/overlay'
  Size: 3889552         Blocks: 7600       IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 2363656     Links: 1
Access: (4755/-rwsr-xr-x)  Uid: (    0/    root)   Gid: (  100/   users)
Access: 2016-06-24 13:38:23.815235844 -0400
Modify: 2016-06-23 11:02:45.200189680 -0400
Change: 2016-06-23 11:02:51.133238302 -0400
 Birth: -

[bug] resolving hardlinks in tar archive

Expected (with path creation):

[root@localhost layers] # gomtree -c -p testnlink -K sha256
# .
/set type=file nlink=1 mode=0664 uid=0 gid=0
. size=4096 type=dir mode=0755 nlink=2 time=1470782811.790464026
    file1 size=6 mode=0644 nlink=2 time=1470783087.001518807 sha256digest=5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
    file2 size=6 mode=0644 nlink=2 time=1470783087.001518807 sha256digest=5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03

Actual (using tar creation):

[root@localhost layers] # gomtree -c -T test.tar -K sha256
. type=dir

# testnlink
/set type=file mode=0664 uid=0 gid=0
testnlink type=dir mode=0755 tar_time=1470782811.000000000
    file2 size=0 mode=0644 link=testnlink/file1 tar_time=1470783087.000000000 sha256digest=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    file1 size=6 mode=0644 tar_time=1470783087.000000000 sha256digest=5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
# testnlink
..
# .
..

[BUG] xattr: new xattrs are not noticed in Compare

If you add a new xattr to a file, Compare won't find it. The reason for this is because we ignore all non-Modified keywords (on the assumption that each keyword (-k) maps to only one keyword in the spec). For xattr that is not the case, and this means that if you add a new xattr to some files, go-mtree won't notice it.

chokes on `gname=`

also, gomtree appears to choke on "gname="
"uname=" it apparently accepts though
hmm, and i get this:
2017/04/05 09:50:58 Unknown keyword "" for file "."
not sure what i shall make of that
that's triggered by my first line:
. type=dir mode=0755 uid=1000 gid=1000 uname=lennart time=1491378655.401146957

Reported-by: Lennart Poettering

make .cli.test doesn't error out

Because of how the .cli.test rule is written, make will not exit with an error. I noticed this while adding tests to #96. It's a little concerning to be honest.

Also, I don't understand why you've implemented the touch .cli.test code. Why should testing ever be skipped?

test failure: including sha256digest keyword

https://travis-ci.org/vbatts/go-mtree/jobs/175124512

go vet ./... && touch .vet
go build ./cmd/gomtree
[0001.sh] Running in /tmp/go-mtree.qvaEFy
~/gopath/src/github.com/vbatts/go-mtree ~/gopath/src/github.com/vbatts/go-mtree
~/gopath/src/github.com/vbatts/go-mtree
[0002.sh] Running in /tmp/go-mtree.cC6opx
~/gopath/src/github.com/vbatts/go-mtree ~/gopath/src/github.com/vbatts/go-mtree
2016/11/11 17:08:59 cannot verify keywords not in mtree specification: sha256digest
make: *** [.cli.test] Error 1
The command "make validation" exited with 2.

[BUG] xattr: spaces in extended attribute keys fail

If you install python3-xattrs, you can check this pretty easily:

% mkdir dir
% touch dir/a
% xattr -w user."some key" "some value" dir/a
% gomtree -c -p dir/ > mtree
% gomtree -f mtree -p dir/
2016/12/18 03:00:43 Unknown keyword "" for file "dir"

Which happens because the space is not encoded with Vis.

@vbatts We really should do a pass and make sure that everything is encoded properly. Because we've had this issue quite a few times.

[bug] digest validation only happening when specified

[vbatts@bananaboat] {master *} ~/containers$ gomtree -c -T ./fedora.tar -K sha512digest,nlink > fedora.mtree.tar
[vbatts@bananaboat] {master *} ~/containers$ gomtree -f ./fedora.mtree.tar -p ./fedora/
[vbatts@bananaboat] {master *} ~/containers$ echo "" >> fedora/etc/fstab 
[vbatts@bananaboat] {master *} ~/containers$ gomtree -f ./fedora.mtree.tar -p ./fedora/
"etc/fstab": keyword "size": expected 313; got 314
[vbatts@bananaboat] {master *} ~/containers$ gomtree -f ./fedora.mtree.tar -p ./fedora/ -k sha512digest
"etc/fstab": keyword "sha512digest": expected 84e7b670c534f3f7e595c5ad5825de555d0d09cd37fbdb94beeee1023d9a1b5ceeae75ae4d31d9d9a974017894aed7f0ca9e7d66d34cbc8c84efb19b264b8e54; got 0df9ed341b0fceecceadc91e54bff16559512a2a27a6988c5111d92f1004d4c28eae90f70ba4565563615cf45d3bab57a15f0546cdc3a7ca02a6f8c6257a2f3b

The sha512digest keyword is in the manifest, but only being checked against when specifically instructed to do so.

vis: move govis to library and rewrite it

The current state of Vis and Unvis is quite ugly. They are a port of vis(3) and unvis(3) from BSD, which were written in C in the 80s. As a result, the code is effectively impossible to follow and required hacking around in order to make it work nicely with unicode (because Go is different to C when iterating over strings).

Vis and Unvis also need to be much better tested -- especially with unicode input.

[bug] gomtree writing output before readHeaders() finishes?...

Recently I've been playing around with how we could integrate gomtree into validating images, so I tried performing gomtree -c -T <imagename>. Currently I'm running into a nasty bug. It seems as if readHeaders() is returning prematurely. Putting in some print statements, the call to flatten after reading the entire archive does not even return. Could we take a closer look at the call to go ts.readHeaders()?

[schung@localhost Desktop] $ sudo ~/go/src/github.com/vbatts/go-mtree/cmd/gomtree/gomtree -c -T 7c91a140e7a1025c3bc3aace4c80c0d9933ac4ee24b8630a6b0b5d8b9ce6b9d4.tar
#          user: root
#       machine: dhcp-25-98.bos.redhat.com
#          tree: <user specified tar archive>
#          date: Wed Jul 27 13:46:29 2016

# .
/set type=file mode=0664 uid=0 gid=0
. type=dir mode=0755 tar_time=1466071917.000000000
    run size=0 type=link mode=0777 link=../run tar_time=1466071905.000000000
    mail size=0 type=link mode=0777 link=spool/mail tar_time=1454537432.000000000
    lock size=0 type=link mode=0777 link=../run/lock tar_time=1466071905.000000000
    tmp size=0 type=link mode=0777 link=../var/tmp tar_time=1454537432.000000000

[...]

    librpmio.so.7.0.0 size=178304 mode=0755 tar_time=1461591946.000000000
    librpmio.so.7 size=0 type=link mode=0777 link=librpmio.so.7.0.0 tar_time=1461591933.000000000
    librpmbuild.so.7.0.0 size=152792 mode=0755 tar_time=1461591946.000000000
    librpmbuild.so.7 size=0 type=link mode=0777 link=librpmbuild.so.7.0.0 tar_time=1461591934.000000000
    librpm.so.7.0.0 size=480648 mode=0755 tar_time=1461591946.000000000
    librpm.so.7 size=0 type=link mode=0777 link=librpm.so.7.0.0 tar_time=1461591933.000000000
    libresolv.so.2 size=0 type=link mode=0777 link=libresolv-2.23.so tar_time=1462965770.000000000
    libresolv-2.23.so size=110184 mode=0755 tar_time=1462965984.000000000
    librepo.so.0 size=147600 mode=0755 tar_time=1460380791.000000000
    libreadline.so.6.3 size=296072 mode=0755 tar_time=1454646611.000000000
    libreadline.so.6 size=0 type=link mode=0777 link=libreadline.so.6.3 tar_time=1454646611.000000000
    libqrencode.so.3.4.2 size=52184 mode=0755 tar_time=1454644282.000000000
    libqrencode.so.3 size=0 type=link mode=0777 link=libqrencode.so.3.4.2 tar_time=1454644282.000000000
    libpython3.so size=6600 mode=0755 tar_time=1457105066.000000000
    libpython3.5m.so.1.0 size=2772416 mode=0755 tar_time=1457105066.000000000
    libpwquality.so.1.0.2 size=23488 mode=0755 tar_time=1454571541.000000000
    libpwquality.so.1 size=0 type=link mode=0777 link=libpwquality.so.1.0.2 tar_time=1454571537.000000000
    libpthread.so.0 size=0 type=link mode=0777 link=libpthread-2.23.so tar_time=1462965770.000000000
    libpthread-2.23.so size=142296 mode=0755 tar_time=1462965988.000000000

vs the same call to the same file:

[schung@localhost Desktop] $ sudo ~/go/src/github.com/vbatts/go-mtree/cmd/gomtree/gomtree -c -T 7c91a140e7a1025c3bc3aace4c80c0d9933ac4ee24b8630a6b0b5d8b9ce6b9d4.tar
#          user: root
#       machine: dhcp-25-98.bos.redhat.com
#          tree: <user specified tar archive>
#          date: Wed Jul 27 13:48:57 2016

# .
/set type=file mode=0664 uid=0 gid=0
. type=dir mode=0755 tar_time=1466071917.000000000
    run size=0 type=link mode=0777 link=../run tar_time=1466071905.000000000
    mail size=0 type=link mode=0777 link=spool/mail tar_time=1454537432.000000000
    lock size=0 type=link mode=0777 link=../run/lock tar_time=1466071905.000000000
    tmp size=0 type=link mode=0777 link=../var/tmp tar_time=1454537432.000000000

  [...]
    libseccomp.so.2 size=0 type=link mode=0777 link=libseccomp.so.2.3.1 tar_time=1461186806.000000000
    libsasl2.so.3.0.0 size=120136 mode=0755 tar_time=1454529287.000000000
    libsasl2.so.3 size=0 type=link mode=0777 link=libsasl2.so.3.0.0 tar_time=1454529278.000000000
    librt.so.1 size=0 type=link mode=0777 link=librt-2.23.so tar_time=1462965770.000000000
    librt-2.23.so size=43664 mode=0755 tar_time=1462965985.000000000
    librpmsign.so.7.0.0 size=19192 mode=0755 tar_time=1461591946.000000000
    librpmsign.so.7 size=0 type=link mode=0777 link=librpmsign.so.7.0.0 tar_time=1461591934.000000000
    librpmio.so.7.0.0 size=178304 mode=0755 tar_time=1461591946.000000000
    librpmio.so.7 size=0 type=link mode=0777 link=librpmio.so.7.0.0 tar_time=1461591933.000000000
    librpmbuild.so.7.0.0 size=152792 mode=0755 tar_time=1461591946.000000000
    librpmbuild.so.7 size=0 type=link mode=0777 link=librpmbuild.so.7.0.0 tar_time=1461591934.000000000
    librpm.so.7.0.0 size=480648 mode=0755 tar_time=1461591946.000000000

SElinux xattrs are context sensitive

Spinning off from: #110 (comment)

If particular xattrs are set, then they ought to be compared, but when expecting only certain xattrs and finding more than expected (like .security, .caps, .acls, etc) this ought not cause a failed check.

tar: generates incomplete manifests

If you do something like this:

% gomtree -T a.tar -c
#          user: cyphar
#       machine: gordon
#          tree: <user specified tar archive>
#          date: Mon Oct 31 04:35:03 2016

# .
. type=dir
    tmpfile size=12 type=file uid=1000 gid=100 mode=0644 tar_time=100.000000000

# testdir
/set type=file mode=0664 uid=1000 gid=100
testdir type=dir mode=0755 tar_time=100.000000000
    anotherfile size=3 mode=0644 tar_time=100.000000000
# testdir
..
# .
..

You can clearly see that /set is in the wrong place. Not to mention that the directories are incorrectly labeled and . is completely unlabeled. This is breaking the comparison tests for #48.

[bug] digest synonyms on validation

By defining sha256 and sha256digest with the same function, we are able to create a manifest with the same output by doing gomtree -c -p . -K sha256 and gomtree -c -p . -K sha256digest.

However, validation must behave the same way: if we tell gomtree to validate with -k sha256 and then come across sha256digest in the validation manifest, the digest should still be validated. Currently it is not, because during validation we check whether the keyword found in the manifest is in the set of keywords that the user specified.
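One way to make the membership check insensitive to synonyms is to map every keyword to a canonical spelling before comparing. This is a sketch only: the `synonyms` table, the choice of the `...digest` form as canonical, and the function names are assumptions, not go-mtree's actual data structures.

```go
package main

import "fmt"

// synonyms maps alternate spellings to one canonical keyword. Choosing the
// "...digest" form as canonical is an arbitrary assumption of this sketch.
var synonyms = map[string]string{
	"md5":          "md5digest",
	"rmd160":       "ripemd160digest",
	"rmd160digest": "ripemd160digest",
	"sha1":         "sha1digest",
	"sha256":       "sha256digest",
	"sha384":       "sha384digest",
	"sha512":       "sha512digest",
}

// canonical returns the canonical spelling of kw, or kw itself if it has no
// known synonym.
func canonical(kw string) string {
	if c, ok := synonyms[kw]; ok {
		return c
	}
	return kw
}

// selected reports whether a keyword found in the manifest is covered by the
// keywords the user asked for, treating synonyms as equal.
func selected(manifestKw string, userKws []string) bool {
	want := canonical(manifestKw)
	for _, kw := range userKws {
		if canonical(kw) == want {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(selected("sha256digest", []string{"sha256"})) // true
	fmt.Println(selected("sha512digest", []string{"sha256"})) // false
}
```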

whitespace not accounted for in file names

gomtree doesn't recognize files with whitespace in their names. mtree replaces each space with the octal escape \040. A solution is to either modify the parsing algorithm, or change how walk stores the name of a file.

gomtree:

[schung@dhcp-25-98 space] $ gomtree -c -p . > /tmp/spaces.mtree
[schung@dhcp-25-98 space] $ cat /tmp/spaces.mtree
#          user: schung
#       machine: localhost.localdomain
#          tree: /home/schung/Desktop/space
#          date: Tue Jul 12 10:21:52 2016

# .
/set type=file nlink=1 mode=0664 uid=1000 gid=1000
. size=4096 type=dir mode=0775 nlink=5 time=1468332765.765634496
         this has a lotofspaces size=6 time=1468332150.591553224

#    hi
   hi size=4096 type=dir mode=0775 nlink=2 time=1468332052.420019002
..

# hello_there
hello_there size=4096 type=dir mode=0775 nlink=2 time=1468332042.981063786
..

# space rocks
space rocks size=4096 type=dir mode=0775 nlink=2 time=1468331840.664023694
..
..
[schung@dhcp-25-98 space] $ gomtree -p . -f /tmp/spaces.mtree
2016/07/12 10:22:14 lstat this: no such file or directory

mtree:

[schung@dhcp-25-98 space] $ mtree -c -p . > /tmp/spaces.mtree
[schung@dhcp-25-98 space] $ cat /tmp/spaces.mtree
#      user: schung
#   machine: localhost.localdomain
#      tree: /home/schung/Desktop/space
#      date: Tue Jul 12 10:22:49 2016

# .
/set type=file uid=1000 gid=1000 mode=0775 nlink=1 flags=none
.               type=dir nlink=5 size=4096 time=1468332765.765634496
    \040\040\040\040\040this\040has\040a\040lotofspaces \
                mode=0664 size=6 time=1468332150.591553224

# ./   hi
\040\040\040hi  type=dir nlink=2 size=4096 time=1468332052.420019002
# ./   hi
..


# ./hello_there
hello_there     type=dir nlink=2 size=4096 time=1468332042.981063786
# ./hello_there
..


# ./space rocks
space\040rocks  type=dir nlink=2 size=4096 time=1468331840.664023694
# ./space rocks
..

..

[schung@dhcp-25-98 space] $ mtree -p . -f /tmp/spaces.mtree
[schung@dhcp-25-98 space] $ echo $?
0
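The \040 escaping seen in the mtree output above can be sketched roughly as follows. This is a simplified illustration that only escapes spaces and decodes three-digit octal escapes; `escapeName` and `unescapeName` are hypothetical helpers, and the real encoding used by mtree covers more characters.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// escapeName replaces each space with the octal escape \040.
// Only spaces are handled; other special characters are out of scope here.
func escapeName(name string) string {
	return strings.ReplaceAll(name, " ", `\040`)
}

// unescapeName converts \NNN octal escapes back to single bytes and copies
// everything else through unchanged.
func unescapeName(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+3 < len(s) {
			if v, err := strconv.ParseUint(s[i+1:i+4], 8, 8); err == nil {
				b.WriteByte(byte(v))
				i += 3 // skip the three octal digits
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Println(escapeName("space rocks"))
	fmt.Printf("%q\n", unescapeName(`\040\040\040hi`))
}
```

With names stored in escaped form, splitting a manifest line on whitespace no longer cuts a file name apart.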

[RFC] expose Tar processing internals

Rather than having only mtree.NewTarStreamer() returning an io.Reader, we could expose more of the tar processing internals.
The current approach requires either a goroutine, or a struct that keeps track of position and a buffered reader. (I opted for the goroutine.)

Potentially we could have an internal model of collecting the mtree entries from tar headers. Something like:

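  fh, _ := os.Open("file.tar") // error handling elided for brevity
  //...
  dh := mtree.DirectoryHierarchy{}
  entryRdr := mtree.TarEntryReader(fh) // proposed API, does not exist yet
  for {
    e, err := entryRdr.Next()
    if err != nil {
      break // presumably io.EOF at end of archive, or a read error
    }
    dh.Entries = append(dh.Entries, e)
  }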

Then the current signature could just wrap this in a goroutine:

func NewTarStreamer(r io.Reader, keywords []string) Streamer

And then we could more easily expose a blocking signature that needs no goroutine, like:

func NewDhFromTar(r io.Reader, keywords []string) (*DirectoryHierarchy, error)
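As a rough sketch of the blocking variant, the loop below collects one entry per tar header using only the standard archive/tar package. The `entry` type and the `entriesFromTar` and `buildTar` helpers are hypothetical stand-ins, not part of go-mtree.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// entry is a hypothetical stand-in for an mtree entry built from a tar header.
type entry struct {
	Name string
	Size int64
	Mode int64
}

// entriesFromTar reads every header from r and returns the collected entries.
// It blocks until the whole archive has been consumed; no goroutine is needed.
func entriesFromTar(r io.Reader) ([]entry, error) {
	var entries []entry
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return entries, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry{Name: hdr.Name, Size: hdr.Size, Mode: hdr.Mode})
	}
}

// buildTar creates a small in-memory archive for the demonstration.
func buildTar() (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	for _, name := range []string{"file1", "file2"} {
		body := []byte("hi")
		hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	return buf, tw.Close()
}

func main() {
	buf, err := buildTar()
	if err != nil {
		panic(err)
	}
	entries, err := entriesFromTar(buf)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s size=%d mode=%04o\n", e.Name, e.Size, e.Mode)
	}
}
```

A streaming wrapper could then run the same loop in a goroutine and forward entries as they arrive.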

[tar_stream] time is being rounded, exact nanoseconds aren't being preserved

A basic tar -cf does not seem to preserve the exact modification time in nanoseconds. Below is the output of creating a new mtree spec for the directory before it is archived.

[schung@localhost Desktop] $ gomtree -c -p newdemo
#          user: schung
#       machine: localhost.localdomain
#          tree: newdemo
#          date: Mon Jul 18 10:51:07 2016

# .
/set type=file nlink=1 mode=0664 uid=1000 gid=1000
. size=4096 type=dir mode=0775 nlink=2 time=1468853428.512943631
    file1 size=0 time=1468853425.971947758
    file2 size=0 time=1468853428.512943631
..

But:

[schung@localhost Desktop] $ tar -cf newdemo.tar newdemo
[schung@localhost Desktop] $ tar --same-owner -xvf newdemo.tar
newdemo/
newdemo/file1
newdemo/file2
[schung@localhost Desktop] $ gomtree -c -p newdemo
#          user: schung
#       machine: localhost.localdomain
#          tree: newdemo
#          date: Mon Jul 18 10:51:43 2016

# .
/set type=file nlink=1 mode=0664 uid=1000 gid=1000
. size=4096 type=dir mode=0775 nlink=2 time=1468853428.000000000
    file1 size=0 time=1468853425.000000000
    file2 size=0 time=1468853428.000000000
..

SIGSEGV: segmentation violation code

Tool              Description
OS                CentOS 7; Debian 10, 11
go-mtree version  gomtree :: 0.5.1-dev

Test case (provoke gomtree to run on a file instead of a directory):

#!/bin/sh

echo '#!/bin/sh
echo Hi
' > source.sh

chmod -c 770  source.sh

gomtree -c -K uname,uid,gname,gid,type,nlink,link,mode,flags,xattr,xattrs,size,time,sha256 -p source.sh >manifest.source.sh

Result:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x4ef1a2]

goroutine 1 [running]:
github.com/vbatts/go-mtree.Walk.func1.3({0x54e830, 0xc00010d790}, 0xc000137458, {0x7fff7d28932a, 0x1e}, 0xc000136e60, 0xc00012a200)
	...src/go-mtree/walk.go:208 +0x3a2
github.com/vbatts/go-mtree.Walk.func1({0x7fff7d28932a, 0x1e}, {0x54e830, 0xc00010d790}, {0x0?, 0x0?})
	...src/go-mtree/walk.go:213 +0xb11
github.com/vbatts/go-mtree.walk(0xc000137458, {0x7fff7d28932a, 0x1e}, {0x54e830, 0xc00010d790}, 0xc000137488)
	...src/go-mtree/walk.go:253 +0x73
github.com/vbatts/go-mtree.startWalk(0xc000137458, {0x7fff7d28932a, 0x1e}, 0xc000137488)
	...src/go-mtree/walk.go:248 +0x8d
github.com/vbatts/go-mtree.Walk({0x7fff7d28932a, 0x1e}, {0x630c58, 0x0, 0x0}, {0xc00012c100, 0xd, 0x10}, {0x0, 0x0})
	...src/go-mtree/walk.go:48 +0x1dc
main.app()
	...src/go-mtree/cmd/gomtree/main.go:247 +0xfd3
main.main()
	...src/go-mtree/cmd/gomtree/main.go:39 +0x19

Expected behavior

In such cases the original mtree program prints an error message:

mtree: source.sh: Not a directory

Specifying the `-f` option twice is not implemented

I believe this is a feature request.

I have attempted to compare two specification files with:
gomtree -f first.txt -f second.txt

The second -f option appears to be ignored.

This is the capability that I am referencing:
https://www.freebsd.org/cgi/man.cgi?mtree(8)

-f spec

    Read the specification from file, instead of from the
    standard input.

    If this option is specified twice, the two specifications
    are compared to each other rather than to the file
    hierarchy. The specifications will be sorted like output
    generated using -c. The output format in this case is
    somewhat reminiscent of comm(1), having "in first spec
    only", "in second spec only", and "different" columns,
    prefixed by zero, one and two TAB characters respectively.
    Each entry in the "different" column occupies two lines,
    one from each specification.

The addition of this capability would be valuable, if at all possible.
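The comm(1)-like output described in the man page excerpt could be sketched as follows, assuming each specification has already been reduced to a map from path to its keyword string. `compareSpecs` is a hypothetical function, not an existing gomtree API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// compareSpecs returns comm(1)-like lines for two specs given as
// path -> keyword string: no TAB for "in first spec only", one TAB for
// "in second spec only", and two TABs (on two lines) for "different".
func compareSpecs(first, second map[string]string) []string {
	seen := map[string]bool{}
	var paths []string
	for p := range first {
		seen[p] = true
		paths = append(paths, p)
	}
	for p := range second {
		if !seen[p] {
			paths = append(paths, p)
		}
	}
	sort.Strings(paths) // sorted output, like a spec generated with -c

	var out []string
	for _, p := range paths {
		a, inA := first[p]
		b, inB := second[p]
		switch {
		case inA && !inB:
			out = append(out, p+" "+a)
		case !inA && inB:
			out = append(out, "\t"+p+" "+b)
		case a != b:
			out = append(out, "\t\t"+p+" "+a, "\t\t"+p+" "+b)
		}
	}
	return out
}

func main() {
	first := map[string]string{"a": "size=1", "b": "size=2", "c": "size=3"}
	second := map[string]string{"b": "size=2", "c": "size=4", "d": "size=5"}
	fmt.Println(strings.Join(compareSpecs(first, second), "\n"))
}
```

Entries that are identical in both specifications produce no output, so an empty result means the two specs agree.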

mtree no longer detects xattr changes

You can test this fairly simply:

% mkdir somedir && touch somedir/file
% gomtree -c -f 1.mtree -p somedir
% xattr -w user.something something somedir/file
% gomtree -f 1.mtree -p somedir
% echo $?
0

This isn't caused by the specKeywords code; it happens even if there are already xattrs (or even if you remove existing xattrs). This broke with v0.4.0.

ACL changes failed verification

Tool              Description
OS                CentOS 7; Debian 10, 11
go-mtree version  gomtree :: 0.5.1-dev

Test case (create file with extended ACL and manifest it):

#!/bin/sh

mkdir -pv ./gomtree-test

echo '#!/bin/sh
echo Hi
' > ./gomtree-test/source.sh

chmod -c 770  ./gomtree-test/source.sh

setfacl -m u:www-data:rwx ./gomtree-test/source.sh   # Set ACL (change www-data to another user if it doesn't exist on the test machine)

gomtree -c -K uname,uid,gname,gid,type,nlink,link,mode,flags,xattr,xattrs,size,time,sha256 -p ./gomtree-test  >gomtree-test.manifest

Result:

The manifest reflects the ACL:

xattr.system.posix_acl_access=AgAAAAEABwD/////AgAHACEAAAAEAAcA/////xAABwD/////IAAAAP////8=

but if the ACL is then changed:

### Remove ACL from file
setfacl -b ./gomtree-test/source.sh

verification does not report that the metadata was changed:

cd gomtree-test
gomtree < ../gomtree-test.manifest
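To see what the manifest actually recorded, the xattr.system.posix_acl_access value above can be decoded. The sketch below assumes the value is standard base64 and that the bytes follow the Linux POSIX ACL xattr layout (a 4-byte version header, then 8-byte little-endian entries); `aclEntry` and `decodeACL` are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// aclEntry mirrors one entry of the assumed layout: a 16-bit tag,
// 16-bit permissions and a 32-bit id, all little-endian.
type aclEntry struct {
	Tag  uint16
	Perm uint16
	ID   uint32
}

// decodeACL decodes a base64 xattr value into its ACL entries.
func decodeACL(b64 string) ([]aclEntry, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, err
	}
	// A 4-byte version header is followed by zero or more 8-byte entries.
	if len(raw) < 4 || (len(raw)-4)%8 != 0 {
		return nil, errors.New("unexpected ACL xattr length")
	}
	var entries []aclEntry
	for off := 4; off < len(raw); off += 8 {
		entries = append(entries, aclEntry{
			Tag:  binary.LittleEndian.Uint16(raw[off:]),
			Perm: binary.LittleEndian.Uint16(raw[off+2:]),
			ID:   binary.LittleEndian.Uint32(raw[off+4:]),
		})
	}
	return entries, nil
}

func main() {
	const value = "AgAAAAEABwD/////AgAHACEAAAAEAAcA/////xAABwD/////IAAAAP////8="
	entries, err := decodeACL(value)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("tag=0x%02x perm=%o id=%d\n", e.Tag, e.Perm, e.ID)
	}
}
```

Under these assumptions the second entry is a named-user entry with rwx permissions, which is the entry that setfacl -b removes and that verification should therefore flag.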

checking does not respect path correctly

Version: d5aab78

Expected results (per upstream mtree(8)):

vbatts@bananaboat ~ (master) $ mtree -c -k time -p ./bin > bin.mtree
vbatts@bananaboat ~ (master) $ mtree -f ./bin.mtree -p ./bin
vbatts@bananaboat ~ (master) $ echo $?
0

Actual results:

vbatts@bananaboat ~ (master) $ gomtree -c -k time -p ./bin > bin.mtree
vbatts@bananaboat ~ (master) $ gomtree -f ./bin.mtree -p ./bin
2016/06/27 14:19:39 lstat bin: no such file or directory

symlinks with spaces cause validation errors

If you create a symlink like so:

% ln -s "this is a dummy symlink" link
% gomtree -K sha256sum -p . -c >../mtree
% gomtree -K sha256sum -p . -f ../mtree
"link": keyword "link": expected this; got this is a dummy symlink

We should just hash link like we do xattrs and other similar things. You also get parsing errors because we assume that symlinks don't have spaces in them.
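The "expected this; got this is a dummy symlink" message is what naive whitespace splitting would produce. The sketch below illustrates that failure mode with a hypothetical `parseKeywords` helper; it is an assumption about the cause, not gomtree's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeywords splits a manifest line on whitespace and returns the entry
// name and its key=value pairs. Fields without "=" are ignored.
func parseKeywords(line string) (string, map[string]string) {
	kv := map[string]string{}
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", kv
	}
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, "=", 2)
		if len(parts) == 2 {
			kv[parts[0]] = parts[1]
		}
	}
	return fields[0], kv
}

func main() {
	// An unescaped target is cut at the first space.
	_, kv := parseKeywords("link type=link link=this is a dummy symlink")
	fmt.Printf("%q\n", kv["link"])

	// An escaped target survives the split intact.
	_, kv = parseKeywords(`link type=link link=this\040is\040a\040dummy\040symlink`)
	fmt.Printf("%q\n", kv["link"])
}
```

Either escaping the target or storing a hash of it, as suggested above, keeps the value free of spaces and so avoids the split.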

[RFC] Implement "unsupported" tag entries

This came up in #48 because implementing generic comparisons means we have to come to terms with the limitations of certain spec generators. In particular, because most tar archives don't have a . entry and they don't store directory sizes (see #77) there isn't a sane way of handling this implicitly -- we don't tag a manifest based on where it came from.

A solution that I think would work is if we always set the requested keywords on every object (which would ensure that you don't get Missing errors for keywords in #48). But if a keyword is not supported for that object we can set the value of the object to a special value like \x00 (which is not valid for any keyword).

This would be an extension of BSD's mtree(8), but I think it's a fairly safe one because all of the old code will continue to work (because BSD's mtree(8) doesn't support tar archives anyway).
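A comparison that honours such a marker might look like the sketch below. The `unsupported` constant uses the \x00 value suggested above; `diffKeywords` is a hypothetical helper that only considers keywords present on both sides.

```go
package main

import (
	"fmt"
	"sort"
)

// unsupported is the special value proposed above for keywords that a spec
// generator cannot provide for an object.
const unsupported = "\x00"

// diffKeywords returns the sorted names of keywords whose values differ
// between a and b, skipping any keyword marked unsupported on either side.
// Keywords missing from b are left for the caller to report.
func diffKeywords(a, b map[string]string) []string {
	var changed []string
	for k, av := range a {
		bv, ok := b[k]
		if !ok || av == unsupported || bv == unsupported {
			continue
		}
		if av != bv {
			changed = append(changed, k)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// A tar-derived entry cannot supply a directory size.
	fromTar := map[string]string{"size": unsupported, "mode": "0755", "uid": "0"}
	fromDisk := map[string]string{"size": "4096", "mode": "0775", "uid": "0"}
	fmt.Println(diffKeywords(fromTar, fromDisk)) // [mode]
}
```

Because every requested keyword is always present, a missing keyword can still be treated as an error while an unsupported one is silently skipped.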

[BUG] TestTarCompare failure on openbsd

$ uname -a
OpenBSD puffy.attlocal.net 6.0 GENERIC.MP#0 amd64
$ go version
go version go1.7.3 openbsd/amd64

xref #105

=== RUN   TestTarCompare
--- FAIL: TestTarCompare (0.07s)
        compare_test.go:449: FAILURE: diff[0] = {
                  "type": "modified",
                  "path": "tmpfile",
                  "keys": [
                    {
                      "type": "modified",
                      "name": "gid",
                      "old": "0",
                      "new": "1000"
                    }
                  ]
                }
        compare_test.go:449: FAILURE: diff[1] = {
                  "type": "modified",
                  "path": "testdir",
                  "keys": [
                    {
                      "type": "modified",
                      "name": "gid",
                      "old": "0",
                      "new": "1000"
                    }
                  ]
                }
        compare_test.go:449: FAILURE: diff[2] = {
                  "type": "modified",
                  "path": "testdir/anotherfile",
                  "keys": [
                    {
                      "type": "modified",
                      "name": "gid",
                      "old": "0",
                      "new": "1000"
                    }
                  ]
                }
        compare_test.go:456: expected the diff length to be 0, got 3
FAIL
exit status 1
FAIL    github.com/vbatts/go-mtree      0.123s
