
mpiFileUtils

mpiFileUtils provides both a library called libmfu and a suite of MPI-based tools to manage large datasets, which may vary from large directory trees to large files. High-performance computing users often generate large datasets with parallel applications that run with many processes (millions in some cases). However, those users are then stuck with single-process tools like cp and rm to manage their datasets. This suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for such datasets, providing speedups of up to 20-30x. It also provides a library that simplifies the creation of new tools or that can be used directly in applications.

Documentation is available on ReadTheDocs.

DAOS Support

mpiFileUtils supports a DAOS backend for dcp, dsync, and dcmp. Custom serialization and deserialization for DAOS containers to and from a POSIX filesystem is provided with daos-serialize and daos-deserialize. Details and usage examples are provided in DAOS Support.

Contributors

We welcome contributions to the project. For details on how to help, see our Contributor Guide.

Copyrights

Copyright (c) 2013-2015, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory CODE-673838

Copyright (c) 2006-2007,2011-2015, Los Alamos National Security, LLC. (LA-CC-06-077, LA-CC-10-066, LA-CC-14-046)

Copyright (2013-2015) UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the Department of Energy.

Copyright (c) 2015, DataDirect Networks, Inc.

All rights reserved.



Issues

dfind build is broken

dfind isn't compiling on Ubuntu 13.10:

dfind.c:81:42: error: ‘struct stat’ has no member named ‘st_mtimespec’
pred_add(pred_newer, (void *)(statbuf.st_mtimespec.tv_sec));

Reporting job progress during DCP transfer

During our data migration from the ORNL Spider 1 file system to Spider 2, it is not uncommon to have a transfer job that lasts more than a day. One question users and operations staff ask is: how much has been transferred so far, and how much is left? Is there a solution to this problem? Thanks.

dcmp: support light-weight meta data comparison

Like rsync, let's add a mode to dcmp --sync where we can assume the source and target files are identical if their sizes and mod times are the same, but different otherwise. In this mode, we can avoid reading the full contents of each file.

We could perhaps use this mode by default, again like rsync, and execute the full check only if the user passes an additional option.
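The quick check itself is tiny, rsync-style: assume identical when size and mtime agree. A sketch (the helper name is hypothetical, not dcmp's actual code):

```c
#include <stdbool.h>
#include <sys/stat.h>

/* rsync-style quick check: treat two files as identical when their
 * sizes and modification times agree, without reading contents.
 * Hypothetical helper, not dcmp's actual implementation. */
static bool quick_same(const struct stat* src, const struct stat* dst)
{
    return src->st_size  == dst->st_size &&
           src->st_mtime == dst->st_mtime;
}
```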

Determine underlying file systems

Different underlying file systems can require different approaches, especially at scale. We need a way to identify the underlying file system as well as its properties (e.g., default stripe size and width). This will enable optimizations specific to each file system.

For example, we'd like to know whether a target file system is Lustre, GPFS, or NFS, and in the case of Lustre, we'd like to know how many object servers it has. This info will help to maximize parallelism and determine file chunking such that we can mitigate Lustre lock thrashing.
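On Linux, one way to identify the file system type is statfs(2) and its f_type magic number. A hedged sketch; the magic values below come from linux/magic.h and Lustre headers and should be verified against the target systems:

```c
#include <sys/vfs.h>    /* statfs(2) on Linux */

/* f_type magic numbers; the Lustre and GPFS values are taken from
 * their respective headers and are worth double-checking. */
#define MAGIC_LUSTRE 0x0BD00BD0UL
#define MAGIC_NFS    0x6969UL
#define MAGIC_GPFS   0x47504653UL

static const char* fs_type_name(const char* path)
{
    struct statfs fs;
    if (statfs(path, &fs) != 0) {
        return "unknown";
    }
    switch ((unsigned long)fs.f_type) {
        case MAGIC_LUSTRE: return "lustre";
        case MAGIC_NFS:    return "nfs";
        case MAGIC_GPFS:   return "gpfs";
        default:           return "other";
    }
}
```

Stripe counts and OST layout would still need a file-system-specific API (e.g., Lustre's llapi) once the type is known.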

dbcast: make stripe size an option

dbcast takes a parameter that specifies the segment size at which to slice a file for parallel access. It makes sense to default to the stripe size of the file if the file is on a parallel file system (or otherwise to some hard-coded value). Let's add an option to use this default but allow the user to override it.
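A minimal sketch of the proposed override using getopt_long; the option name and helper are hypothetical, not dbcast's actual interface:

```c
#include <getopt.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hypothetical --size override; fall back to dflt, which the
 * caller would set from the file's stripe size on a parallel file
 * system, or from a hard-coded value otherwise. */
static uint64_t parse_segment_size(int argc, char** argv, uint64_t dflt)
{
    static struct option longopts[] = {
        { "size", required_argument, NULL, 's' },
        { NULL,   0,                 NULL,  0  }
    };
    uint64_t size = dflt;
    int c;
    while ((c = getopt_long(argc, argv, "s:", longopts, NULL)) != -1) {
        if (c == 's') {
            size = strtoull(optarg, NULL, 10);
        }
    }
    return size;
}
```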

dtar build is broken

dtar isn't integrated into autotools yet.

I don't want to rip out the build before moving the tests into the common /test dir.

dbcast: add support to bcast directory recursively

Currently, dbcast broadcasts a single file. Some people want to broadcast entire directories.

A workaround is to tar up the directory, bcast the tar file, and then untar it in parallel. That works, but it's ugly.

We could walk the directory, and then we'd need the writer rank on each node to recreate the directory structure, from top down. We then need to handle reading and writing files in parallel (efficiently).

LICENSE file needs to be updated with release info.

Currently, we need some kind of confirmation from the folks at the following locations that this is ok to release.

  • LANL
  • LLNL
  • ORNL
  • DDN Japan

Please add anyone to this list who may need something added to the LICENSE file before a public release.

report missing files for items listed on command line

The tools silently ignore missing files on the command line, which can lead the user to believe the command succeeded in updating a file in which they may have had a typo in the pathname. We should print errors or warnings for any items explicitly listed on the command line that we can't find, as that's probably a mistake by the user.

dcp: add periodic progress message

Could do this with non-blocking MPI collectives, e.g., every 10 seconds send a non-blocking bcast from rank 0 to start a non-blocking allreduce. Use the non-blocking allreduce to sum files and bytes copied. All procs periodically test outstanding collective calls in their work loops.

drm: don't walk every path if filter option was used

We may not need to walk every path if some can be excluded based on the --match or --exclude options that were specified. For instance, if the user is searching for *.txt files in the current working directory do we need to walk all of the subdirectories?
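The match test itself is a one-liner over fnmatch(3); the pruning question is then whether the pattern can ever match inside a subdirectory. A sketch with a hypothetical helper:

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Shell-style wildcard test, as a --match/--exclude filter might use.
 * If a pattern such as "*.txt" applies to basenames in the starting
 * directory only, the walk could skip subtrees entirely. */
static bool matches_pattern(const char* name, const char* pattern)
{
    return fnmatch(pattern, name, 0) == 0;
}
```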

Broken build ?

All -

Clean head from a fresh clone; every dependency has been installed to the default /usr/local.
The build stops at the following line (the same result from the ./buildme scripts). The platform is Ubuntu 12.04 LTS and 14.04 LTS ... since no one else is complaining, I am not sure ...

checking for libDTCMP... /usr/local
checking for library containing DTCMP_Init... no
configure: error: couldn't find a suitable libdtcmp, use --with-dtcmp=PATH
make: *** [config.status] Error 1

GitHub email hook for notification

hi Adam/Jon,

Right now, by watching a GitHub repo, you get notified when issues are opened or comments are made, but not when a new commit is made to the code repo, which I'd like to know about so I can properly rebase the code if needed.

GitHub provides such a hook to send "git diff" output to a mailing list. Do you think you can enable this for a -dev mailing list? I hope I am not the only one who likes to get notified about this :-)

Thanks

Feiyi

If you made a commit, add your name to the AUTHORS file

If you've written code or documentation and have committed it to the repo, please add your name to the AUTHORS file. A single line of code is enough to have your name in the AUTHORS file.

Please keep the list in alphabetical order.

If you have a first and last name, please keep the file in this format:

Lastname, Firstname [email protected]

Otherwise, put it in whatever format you feel is right.

This issue may be closed after the following folks are listed or have confirmed that they don't want their name listed:

buildme dependencies failures

I just ran into the following build issues:

  • --prefix=/ccs/home/fwang2/fileutils/install
    ./buildme_dependencies: line 71: --prefix=/ccs/home/fwang2/fileutils/install: No such file or directory
  • '[' 127 -ne 0 ']'
  • echo 'failed to configure, build, or install libcircle'
    failed to configure, build, or install libcircle
  • exit 1

documentation for contributions

We need documentation for how contributions should be accepted. This will mostly be instructions on how to create a branch, run tests, create a merge request, etc.

dsh: request to filter ls and rm options

For directories that have lots of files, it would be nice to filter entries with wildcarding/regex or allowing the user to limit the number of items printed. We might also provide different sorting methods.

Build error on Cray platform

The following build error seems to be specific to Cray XK login nodes (SUSE-based, I think). I am not sure if their build environment is any different; I explicitly switched to the GNU tool chain instead of the default PGI and still experience the errors. Any idea?

make[2]: Entering directory `/autofs/na3_techint/home/fwang2/fileutils-atlas/build/src/common'
mpicc -DHAVE_CONFIG_H -I. -I../../../src/common -I../.. -I/opt/sw/xk6/ompi/1.7.1/sles11.1_gnu4.7.2/include/openmpi/opal/mca/hwloc/hwloc151/hwloc/include -I/opt/sw/xk6/ompi/1.7.1/sles11.1_gnu4.7.2/include/openmpi/opal/mca/event/libevent2019/libevent -I/opt/sw/xk6/ompi/1.7.1/sles11.1_gnu4.7.2/include/openmpi/opal/mca/event/libevent2019/libevent/include -I/opt/sw/xk6/ompi/1.7.1/sles11.1_gnu4.7.2/include -I/opt/sw/xk6/ompi/1.7.1/sles11.1_gnu4.7.2/include/openmpi -I/opt/cray/xe-sysroot/4.1.40/usr/include -I/opt/cray/xe-sysroot/4.1.40/usr/include -I/ccs/techint/home/fwang2/fileutils-atlas/install/include/ -std=gnu99 -ggdb -W -pedantic -Wall -Wextra -Wconversion -Wformat=2 -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wuninitialized -Wunknown-pragmas -Wstrict-aliasing -Wfloat-equal -Wundef -Wbad-function-cast -Wcast-qual -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wredundant-decls -Winline -Wdisabled-optimization -Wshadow -Wwrite-strings -I/ccs/techint/home/fwang2/fileutils-atlas/src/common -I/ccs/techint/home/fwang2/fileutils-atlas/install/include -MT libfileutils_common_a-bayer_param_path.o -MD -MP -MF .deps/libfileutils_common_a-bayer_param_path.Tpo -c -o libfileutils_common_a-bayer_param_path.o `test -f 'bayer_param_path.c' || echo '../../../src/common/'`bayer_param_path.c
In file included from ../../../src/common/bayer_util.h:17:0,
from ../../../src/common/bayer.h:19,
from ../../../src/common/bayer_param_path.c:1:

cc1: warning: -Wuninitialized is not supported without -O
cc1: warning: -funit-at-a-time is required for inlining of functions that are only called once
In file included from ../../../src/common/bayer_util.h:17,
from ../../../src/common/bayer.h:19,
from ../../../src/common/bayer_param_path.c:1:
/opt/cray/xe-sysroot/4.1.40/usr/include/stdio.h:427: warning: redundant redeclaration of ‘fscanf’
/opt/cray/xe-sysroot/4.1.40/usr/include/stdio.h:430: warning: redundant redeclaration of ‘scanf’
/opt/cray/xe-sysroot/4.1.40/usr/include/stdio.h:432: warning: redundant redeclaration of ‘sscanf’
/opt/cray/xe-sysroot/4.1.40/usr/include/stdio.h:478: warning: redundant redeclaration of ‘vfscanf’
/opt/cray/xe-sysroot/4.1.40/usr/include/stdio.h:483: warning: redundant redeclaration of ‘vscanf’
/opt/cray/xe-sysroot/4.1.40/usr/include/stdio.h:486: warning: redundant redeclaration of ‘vsscanf’
../../../src/common/bayer_param_path.c:12: warning: function declaration isn’t a prototype
../../../src/common/bayer_param_path.c: In function ‘bayer_stat_pack’:
../../../src/common/bayer_param_path.c:38: warning: conversion to ‘size_t’ from ‘long int’ may change the sign of the result
../../../src/common/bayer_param_path.c: In function ‘bayer_stat_unpack’:
../../../src/common/bayer_param_path.c:72: error: ‘blksize_t’ undeclared (first use in this function)
../../../src/common/bayer_param_path.c:72: error: (Each undeclared identifier is reported only once
../../../src/common/bayer_param_path.c:72: error: for each function it appears in.)
../../../src/common/bayer_param_path.c:72: error: expected ‘;’ before ‘val’
../../../src/common/bayer_param_path.c:95: warning: conversion to ‘size_t’ from ‘long int’ may change the sign of the result
make[2]: *** [libfileutils_common_a-bayer_param_path.o] Error 1
make[2]: Leaving directory `/autofs/na3_techint/home/fwang2/fileutils-atlas/build/src/common'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/autofs/na3_techint/home/fwang2/fileutils-atlas/build/src'
make: *** [install-recursive] Error 1

How to set it up

I've cloned the repo onto a Red Hat Linux machine. I do not know how to set it up and make it run.

Please advise, or point me to some documentation.

Thanks,

Creation of v0.0.1-alpha.2 release

Please use this issue to manage the release v0.0.1-alpha.2.

The bar for a pre-release does not have to be high.

"because I want to" is probably valid enough reason for a pre-release.

Install tools on CORAL EA systems

Verify that mpiFileUtils builds and runs on CORAL EA systems (like ray).

Then prepare a public install in either /usr/global or via a TCE package using 0.6 release.

Update package for 0.7 release when it's ready.

Help users find and restripe files in dstripe

Recursively search directory tree, report, and optionally restripe:

  1. any files that are larger than some threshold that only live on one stripe

  2. any directories in which there are many smallish files that are not well balanced across OSTs -- consider a checkpoint directory in which each process writes a file of roughly some percentage of memory. For this set of files, we should likely ensure each file is on one OST and make sure the set of files in the directory is well balanced across the available OSTs

It'd be nice if dstripe can help users detect and fix up these problems if they exist.

libmfu: Develop cache file format using variable length fields

The current cache file format uses fixed length records. It encodes the full path to each file, and to produce a fixed length field, it uses the longest path name of any file. So when writing the cache file for a large number of files in which there may be a really long file name, the output file size is inflated.

To conserve space, let's store this using variable length records. To support that efficiently, we'll either need a table in the header describing the offset of each record, or we'll need to add some end-of-record marker between records.
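One possible variable-length layout uses a length prefix in place of a separate offset table or end-of-record marker. A sketch; field names and widths here are assumptions, not the actual cache format:

```c
#include <stdint.h>
#include <string.h>

/* A length-prefixed record: a fixed header followed by path_len bytes
 * of path. The prefix lets a reader skip records sequentially without
 * a header table. Hypothetical layout. */
typedef struct {
    uint32_t path_len;  /* bytes of path following the header */
    uint64_t size;      /* file size */
    uint64_t mtime;     /* modification time */
} record_header;

/* Pack one record into buf; returns total bytes written. */
static size_t pack_record(char* buf, const char* path,
                          uint64_t size, uint64_t mtime)
{
    record_header h;
    h.path_len = (uint32_t)strlen(path);
    h.size = size;
    h.mtime = mtime;
    memcpy(buf, &h, sizeof(h));
    memcpy(buf + sizeof(h), path, h.path_len);
    return sizeof(h) + h.path_len;
}
```

Sequential readers pay no space overhead beyond the prefix; random access would still want a header table of record offsets, which is the trade-off the text describes.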
