hpc-io / pdc

Proactive Data Containers (PDC) software provides an object-centric API and a runtime system with a set of data object management services. These services allow placing data in the memory and storage hierarchy, performing data movement asynchronously, and providing scalable metadata operations to find data objects.

Home Page: https://pdc.readthedocs.io

License: Other

C 95.62% Shell 1.25% CMake 2.48% Python 0.29% Dockerfile 0.25% Julia 0.11%
pdc data-management object-centric runtime-system


pdc's Issues

Lustre parameters auto-configuration

Currently, some Lustre parameters, such as stripe size and stripe count, are configured statically in a way that is specific to Cori.
We should automatically detect the system information (e.g., the number of OSTs) and configure those parameters dynamically.

Server race condition when data exceeds cache size limit

The bug was initially discovered when running llsm_importer with server caching enabled and a 1.5GB maximum server cache. The server cache flush background thread and the main thread have a race condition on the mutex, causing a deadlock.
Increasing the cache limit to 32GB is the current workaround for running llsm_importer.

obj_round_robin_io_all.c test code

Lines 111-113 look incorrect:

111 mydata = (char **)malloc(size * WRITE_REQ_SIZE);
112 mydata[0] = (char *)malloc(my_data_size * type_size);
113 mydata[1] = mydata[0] + my_data_size * type_size;

Cannot access metadata for objects created remotely

This issue comes with an MPI test; see https://github.com/hpc-io/pdc/blob/qiao_develop/src/tests/open_obj_round_robin.c. Only two client processes are needed to reproduce it. The processes collectively create a container, then open objects created by other processes in a round-robin fashion. The open and close succeed, but rank 1 cannot get metadata for objects created by rank 0, failing with the error "# PDC_obj_get_info(): cannot locate object".
To reproduce: run "./mpi_test.sh ./open_obj_round_robin mpiexec 2 2" after installing the tests.

cmake find Mercury and MPI

We have two FindMercury files in cmake. However, on a fresh Ubuntu install, neither seems to work as it should. I got no configuration errors, but at build time the mercury.h and mpi.h headers are not found. We should fix this so that the headers are checked at configure time and handled appropriately.

@zhangwei217245 I saw you created this second find Mercury file. Could you please confirm why we have two there?

Segmentation fault on server when running the same client multiple times

I was testing a simple Montage workflow but consistently getting a seg fault on the server side.
The issue happens when running the same client multiple times but with an increasing number of processes.

I was able to reproduce the bug with a simple example, e.g., the pdc_init program.
Here's how to reproduce the bug:

srun -N 1 -n 1 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap ./bin/pdc_server.exe &
srun -N 1 -n 2 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap ./examples/pdc_init
srun -N 1 -n 4 -c 2 --mem=25600 --cpu_bind=cores --gres=craynetwork:1 --overlap ./examples/pdc_init

The first pdc_init run uses 2 processes, the second uses 4. The second run always crashes the server.
pdc_init just opens and then closes the PDC connection; it does no I/O. The bug seems to be triggered at PDCinit() time.
It works fine when run multiple times with the same number of processes.

I tried this on both Cori and Catalyst; both have the same issue, so I think the bug is on the server side. I spent quite some time on it but couldn't figure out where the issue is.

A few cmake issues

  1. "-Wimplicit-fallthrough=3" is not compatible with older gcc versions.
  • It reports: gcc: error: unrecognized command line option '-Wimplicit-fallthrough=3'
  2. cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/install
  • The CMAKE_INSTALL_PREFIX option does not seem to work; bin/lib/include are still installed in the build directory.
  3. The close_server binary is not built unless BUILD_TESTING is set to ON.
  • close_server and pdc_server should always be built together.
  • Suggestion: use pdc_server start and pdc_server close instead of a dedicated close_server command.
  4. PDC_DISABLE_CHECKPOINT=ON does not disable checkpointing.
  • Would it be better to name it PDC_ENABLE_CHECKPOINT, for consistency with other options such as PDC_ENABLE_APP_CLOSE_SERVER, PDC_ENABLE_FASTBIT, and PDC_ENABLE_LUSTRE?

DART Integration

Desc: The PR should include the complete DART feature, including the benchmarks and test cases of DART.

The benchmark/test cases that will be included:

  1. dart_func_test : a program to test the basic functionality of DART
  2. dart_test: a program to run the test cases we had back in 2018 for the DART paper
  3. dart_attr_dist_test: a program to run the test cases for DART queries against different attribute-object associativity.
  4. dart_sim: a program that will simulate the DART indexing procedure and provide basic analysis on index distribution.

Issues: unsafe compiler warnings from pdc_tools

  1. server cache size cannot be set via CMake correctly (DONE by #142)
  2. unsafe compiler warnings from pdc_tools:
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c: In function 'scan_group':
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:384:26: warning: declaration of 'i' shadows a previous local [-Wshadow]
  384 |                 for (int i = 0; i < container_names->length; i++) {
      |                          ^
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:325:13: note: shadowed declaration is here
  325 |     int     i;
      |             ^
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c: In function 'do_dset':
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:431:40: warning: declaration of 'size' shadows a global declaration [-Wshadow]
  431 |     uint64_t               offset[10], size[10];
      |                                        ^~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:77:19: note: shadowed declaration is here
   77 | int     rank = 0, size = 1;
      |                   ^~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:487:15: warning: ordered comparison of pointer with integer zero [-Wextra]
  487 |     if (check > 0) {
      |               ^
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c: In function 'do_dtype':
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:607:17: warning: declaration of 'size' shadows a global declaration [-Wshadow]
  607 |     hsize_t     size, attr_len;
      |                 ^~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:77:19: note: shadowed declaration is here
   77 | int     rank = 0, size = 1;
      |                   ^~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:663:1: warning: control reaches end of non-void function [-Wreturn-type]
  663 | }
      | ^
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c: In function 'add_tag':
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:111:5: warning: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation]
  111 |     strncpy(tags_ptr_g, str, str_len);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_import.c:110:15: note: length computed here
  110 |     str_len = strlen(str);
      |               ^~~~~~~~~~~
[ 98%] Linking C executable ../../bin/kvtag_query_mpi
[ 98%] Built target delete_obj
[100%] Linking C executable ../../bin/region_transfer_all_append
[100%] Built target region_transfer_status
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c: In function 'pdc_ls':
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:690:22: warning: declaration of 'buf' shadows a previous local [-Wshadow]
  690 |                 char buf[12];
      |                      ^~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:681:18: note: shadowed declaration is here
  681 |             char buf[12];
      |                  ^~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:728:22: warning: declaration of 'buf' shadows a previous local [-Wshadow]
  728 |                 char buf[12];
      |                      ^~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:719:18: note: shadowed declaration is here
  719 |             char buf[12];
      |                  ^~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:771:21: warning: statement with no effect [-Wunused-value]
  771 |                 dims[cur_region->ndim];
      |                 ~~~~^~~~~~~~~~~~~~~~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:805:66: warning: passing argument 1 of 'cJSON_CreateStringArray' from incompatible pointer type [-Wincompatible-pointer-types]
  805 |         cJSON *all_names_json = cJSON_CreateStringArray(obj_names->items, obj_names->length);
      |                                                         ~~~~~~~~~^~~~~~~
      |                                                                  |
      |                                                                  char **
In file included from /global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_ls.c:13:
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/cjson/cJSON.h:225:66: note: expected 'const char * const*' but argument is of type 'char **'
  225 | CJSON_PUBLIC(cJSON *) cJSON_CreateStringArray(const char *const *strings, int count);
      |                                               ~~~~~~~~~~~~~~~~~~~^~~~~~~
[100%] Linking C executable ../../bin/pdc_import
[100%] Linking C executable ../../bin/kvtag_query_scale
[100%] Built target region_transfer_3D
[100%] Built target region_transfer_overlap_2D
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c: In function 'pdc_ls':
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c:695:18: warning: declaration of 'buf' shadows a previous local [-Wshadow]
  695 |             char buf[20];
      |                  ^~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c:676:14: note: shadowed declaration is here
  676 |         char buf[20];
      |              ^~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c:707:23: warning: declaration of 'cur_group_id' shadows a previous local [-Wshadow]
  707 |                 hid_t cur_group_id = H5Gopen(file_id, cur_path, H5P_DEFAULT);
      |                       ^~~~~~~~~~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c:702:19: note: shadowed declaration is here
  702 |             hid_t cur_group_id = group_id;
      |                   ^~~~~~~~~~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c:729:30: warning: declaration of 'size' shadows a global declaration [-Wshadow]
  729 |         uint64_t offset[10], size[10];
      |                              ^~~~
/global/cfs/cdirs/m2621/wzhang5/perlmutter/source/pdc/src/tools/pdc_export.c:27:19: note: shadowed declaration is here
   27 | int     rank = 0, size = 1;
      |                   ^~~~
[100%] Built target region_transfer_all_3D

Object write fails with create_obj_mpi for some PDC data types

The corresponding test is write_obj_shared, where multiple processes write objects to a single shared file. Even with only 1 object, the test fails when the input datatype is not 4 bytes long (such as uint64 and double). The error message indicates that the lock release call has an incompatible datatype. When create_obj_mpi is replaced with the regular create_obj function, the bug disappears.

(base) qkang@data6:~/pdc_develop/pdc/src/build/bin$ ./mpi_test.sh ./write_obj_shared mpiexec 1 1 o 1 double
Input arguments are the followings
o 1 double
testing: ./write_obj_shared
mpiexec -n 1 ./pdc_server.exe &

==PDC_SERVER[0]: using [./pdc_tmp/] as tmp dir. 0 OSTs per data file, 0% to BB
==PDC_SERVER[0]: using ofi+tcp
==PDC_SERVER[0]: without multi-thread!
==PDC_SERVER[0]: Read cache enabled!
==PDC_SERVER[0]: Successfully established connection to 0 other PDC servers
==PDC_SERVER[0]: Server ready!

mpiexec -n 1 ./write_obj_shared o 1 double
Writing a 1 MB object [o] with 1 clients.
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 1 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: Client lookup all servers at start time!
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 1 clients per server
create a new pdc
my_data_size at rank 0 is 1048576
rank 0 offset = 0, length = 1048576, unit size = 8
Failed to release lock for region
Error in /home/qkang/pdc_develop/pdc/src/api/pdc_client_server_common.c:2364

region_release_cb(): ===PDC SERVER: HG_TEST_RPC_CB(region_release, handle) local and remote bulk size does not match

==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 1 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: Client lookup all servers at start time!
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 1 clients per server
(base) qkang@data6:~/pdc_develop/pdc/src/build/bin$ ==PDC_SERVER[0]: error with HG_Context_destroy

PDC spack recipes not working on macOS

PDC spack recipes are not working on macOS. I got different failures for stable, develop, and 0.3.

For 0.3: ./spack install pdc@0.3 ^libfabric fabrics=sockets,tcp,udp,rxm

  >> 171    /private/var/folders/rk/35_wlhq12b35gv8ch8z58cqc0000gn/T/jlbez/spack-stage/spack-stage-pdc-0.3-2tqeinvjd7s2kcpy7rhvrgru6qtc52bz/spack-src/src/server/pdc_server.c:782:45: error: use of undeclared identifier 'NA_TRUE'
     172        *hg_class = HG_Init_opt(na_info_string, NA_TRUE, &init_info);
     173                                                ^
     174    1 error generated.
  >> 175    make[2]: *** [server/CMakeFiles/pdc_server.exe.dir/build.make:79: server/CMakeFiles/pdc_server.exe.dir/pdc_server.c.o] Error 1

For develop: ./spack install pdc@develop ^libfabric fabrics=sockets,tcp,udp,rxm
For stable: ./spack install pdc@stable ^libfabric fabrics=sockets,tcp,udp,rxm

1 error found in build log:
     3    CMake Warning:
     4      Ignoring extra path from command line:
     5    
     6       "/private/var/folders/rk/35_wlhq12b35gv8ch8z58cqc0000gn/T/jlbez/spack-stage/spack-stage-pdc-develop-fi7q3uihetwodksemxnskuqzlhmx5jxx/spack-src/src"
     7    
     8    
  >> 9    CMake Error: The source directory "/private/var/folders/rk/35_wlhq12b35gv8ch8z58cqc0000gn/T/jlbez/spack-stage/spack-stage-pdc-develop-fi7q3uihetwodksemxnskuqzlhmx5jxx/spack-src/src" does not appear to contain CMakeLists.txt.

Incorrect path in pdc-config.cmake to pdc-targets.cmake

In the current develop branch, the auto-generated path to pdc-targets.cmake is incorrect: it points to where pdc-targets.cmake is located in the current stable branch. The current path is include(${SELF_DIR}/api/pdc-targets.cmake) but it should be include(${SELF_DIR}/src/api/pdc-targets.cmake). As a result, external projects that depend on an already-built PDC fail when compiling.

PDCpy requires some header files that are not installed by PDC

List of header files that PDC doesn't install and the structs and functions needed by PDCpy:

  • pdc_id_pkg.h: _pdc_id_info
  • pdc_cont_pkg.h: _pdc_cont_info
  • pdc_obj_pkg.h: _pdc_obj_info
  • pdc_prop_pkg.h: _pdc_obj_prop, _pdc_cont_prop, PDC_obj_prop_get_info
  • pdc_malloc.h: PDC_free
  • pdc_private.h: _pdc_class

Lock release for write fails on Cori with more than 1 client

To reproduce, follow my steps to install PDC on Cori, then go to the bin folder in the install directory and run the following; you will see the segmentation fault.
./mpi_test.sh ./write_obj srun 1 2 o 1 int

qkt561@nid00009:/global/cscratch1/sd/qkt561/FS_1M_169/bin> ./mpi_test.sh ./write_obj srun 1 2 o 1 int
Input arguments are the followings
o 1 int
testing: ./write_obj
srun -n 1 ./pdc_server.exe &
srun -n 2 ./write_obj o 1 int
Writing a 1 MB object [o_0] with 2 clients.
Writing a 1 MB object [o_1] with 2 clients.
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Config file from default location [./pdc_tmp/server.cfg] not available, waiting 1 seconds
==PDC_CLIENT[0]: Config file from default location [./pdc_tmp/server.cfg] not available, waiting 2 seconds
==PDC_CLIENT[0]: Config file from default location [./pdc_tmp/server.cfg] not available, waiting 4 seconds
==PDC_CLIENT[0]: Config file from default location [./pdc_tmp/server.cfg] not available, waiting 8 seconds
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process

==PDC_SERVER[0]: using [./pdc_tmp/] as tmp dir. 0 OSTs per data file, 0% to BB
==PDC_SERVER[0]: using ofi+tcp
==PDC_SERVER[0]: without multi-thread!
==PDC_SERVER[0]: Read cache enabled!
==PDC_SERVER[0]: Successfully established connection to 0 other PDC servers
==PDC_SERVER[0]: Server ready!


==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 2 PDC clients
==PDC_CLIENT: using ofi+tcp
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
==PDC_CLIENT[0]: Client lookup all servers at start time!
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 2 clients per server
create a new pdc
create a new pdc
my_data_size at rank 0 is 524288
my_data_size at rank 1 is 524288
rank 0 offset = 0, length = 524288, unit size = 4
rank 1 offset = 524288, length = 524288, unit size = 4
Error in /global/homes/q/qkt561/test_install/pdc/src/api/pdc_client_server_common.c:1946
 # buf_map_region_release_bulk_transfer_cb(): Error in region_release_bulk_transfer_cb()
srun: error: nid00009: task 0: Segmentation fault
srun: Terminating job step 40407325.8
srun: error: nid00009: task 1: Segmentation fault
srun: Terminating job step 40407325.7
slurmstepd: error: *** STEP 40407325.7 ON nid00009 CANCELLED AT 2021-03-07T18:38:44 ***
srun: error: nid00009: task 0: Killed
srun: Force Terminated job step 40407325.7
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 1 PDC clients
==PDC_CLIENT: using ofi+tcp
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
==PDC_CLIENT[0]: Client lookup all servers at start time!
srun: error: nid00009: task 0: Segmentation fault
srun: Terminating job step 40407325.9

PDC installation on macOS from source gives Mercury-related errors

PDC develop/stable version on macOS Sonoma:

[ 10%] Built target pdc_commons
[ 11%] Built target dart_core_test
[ 12%] Built target query_utils_test
[ 13%] Linking C shared library ../../bin/libpdc.dylib
Undefined symbols for architecture x86_64:
  "_hg_thread_mutex_init", referenced from:
      _PDC_register_type in pdc_interface.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [bin/libpdc.0.1.0.dylib] Error 1
make[1]: *** [src/api/CMakeFiles/pdc.dir/all] Error 2
make: *** [all] Error 2

Getting a non-existent container returns a non-null value

Code to reproduce:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pdc.h"

int
main(int argc, char **argv)
{
    pdcid_t pdc_id, cont_id, cont_prop, cont_id2;
    int     rank = 0, size = 1;
    int     ret_value = 0;

    // create a pdc
#ifdef ENABLE_MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
#endif

    pdc_id = PDCinit("pdc");
    cont_prop = PDCprop_create(PDC_CONT_CREATE, pdc_id);
    if (cont_prop <= 0)
        printf("Fail to create container property @ line  %d!\n", __LINE__);

    /*
    // create a container
    cont_id = PDCcont_create("VPIC_cont", cont_prop);
    if (cont_id <= 0)
        printf("Fail to create container @ line  %d!\n", __LINE__);

    if (PDCcont_del(cont_id) != 0)
        printf("Fail to delete container @ line  %d!\n", __LINE__);
    
    if (PDCcont_close(cont_id) != 0)
        printf("Fail to close container @ line  %d!\n", __LINE__);
    */
    
    cont_id = PDCcont_open("VPIC_cont", pdc_id);
    if (cont_id <= 0)
        printf("Fail to open container @ line  %d!\n", __LINE__);
    
#ifdef ENABLE_MPI
    MPI_Finalize();
#endif
    return ret_value;
}

Uncomment the multi-line comment to test the get-after-delete scenario.

Restart with different number of servers

Currently, PDC can only restart with the same (or a larger) number of servers as the previous run that wrote the checkpoint files. The restart needs to support more flexibility.

HDF5 cannot be found for pdc_import.c and pdc_export.c when BUILD_TOOLS=ON

When compiling PDC with BUILD_TOOLS set to ON, the build may fail because HDF5 cannot be found. Proper use of FindHDF5 needs to be added to the CMakeLists.txt under src/tools so that those two artifacts get compiled.

Command tried:

cmake -DBUILD_MPI_TESTING=ON -DBUILD_SHARED_LIBS=ON -DPDC_SERVER_CACHE=ON -DBUILD_TESTING=ON -DCMAKE_INSTALL_PREFIX=$PDC_DIR -DPDC_ENABLE_MPI=ON -DMERCURY_DIR=$MERCURY_DIR -DCMAKE_C_COMPILER=mpicc -DMPI_RUN_CMD="srun -A m2621 --qos=debug --constraint=cpu --tasks-per-node=64" -DCMAKE_INSTALL_PREFIX=$PDC_DIR -DBUILD_TOOLS=ON ../

SQLite Support for Metadata Tags and Metadata Search

To evaluate our metadata indexing, we should compare with SQLite, the database used in much of the state-of-the-art literature.

With this integration, we should be able to switch between PDC_native_metadata, PDC_rocksdb_metadata and PDC_sqlite_metadata options.

The local unit test should be able to test against the different implementations. With the proper switch enabling SQLite support, the unit test for SQLite should be enabled as well.

There is no need for a separate client-side test for the SQLite feature, since these metadata storage backends will eventually become alternatives to each other and share the same API.

What will be included:

  1. SQLite implementation of metadata storage and the related metadata search functionality
  2. A runtime option for metadata & metadata search functions to call different implementations (passed via RPC parameters)
  3. CMake support for enabling/disabling SQLite (and possibly RocksDB). If the runtime flag selects a disabled backend, print an unsupported message and return an empty result.
  4. Documentation on how metadata is stored in SQLite and how metadata search is performed, including where the db file is stored (tmpfs), how a metadata tag is stored (the table-creation statement), and how metadata search is done (the query statement).

Attempting to create a container with a duplicate name will return the existing container's id instead of returning 0

Code to reproduce:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pdc.h"

int
main(int argc, char **argv)
{
    pdcid_t pdc_id, cont_id, cont_prop, cont_id2;
    int     rank = 0, size = 1;
    int     ret_value = 0;

    // create a pdc
#ifdef ENABLE_MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
#endif

    pdc_id = PDCinit("pdc");
    cont_prop = PDCprop_create(PDC_CONT_CREATE, pdc_id);
    if (cont_prop <= 0)
        printf("Fail to create container property @ line  %d!\n", __LINE__);

    // create a container
    cont_id = PDCcont_create("VPIC_cont", cont_prop);
    if (cont_id <= 0)
        printf("Fail to create container @ line  %d!\n", __LINE__);
    
    cont_id2 = PDCcont_create("VPIC_cont", cont_prop);
    if (cont_id2 <= 0)
        printf("Fail to create container @ line  %d!\n", __LINE__);
    
    PDCclose(pdc_id);
    
#ifdef ENABLE_MPI
    MPI_Finalize();
#endif
    return ret_value;
}

Deleting a container tag segfaults

Code to reproduce:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pdc.h"

int
main(int argc, char **argv)
{
    pdcid_t pdc_id, cont_id, cont_prop, cont_id2;
    int     rank = 0, size = 1;
    int     ret_value = 0;

    // create a pdc
#ifdef ENABLE_MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
#endif

    pdc_id = PDCinit("pdc");
    cont_prop = PDCprop_create(PDC_CONT_CREATE, pdc_id);
    if (cont_prop <= 0)
        printf("Fail to create container property @ line  %d!\n", __LINE__);

    // create a container
    cont_id = PDCcont_create("cont", cont_prop);
    if (cont_id <= 0)
        printf("Fail to create container @ line  %d!\n", __LINE__);

    //put tag
    if (PDCcont_put_tag(cont_id, "a", "b", 1) != 0)
        printf("Fail to put tag @ line  %d!\n", __LINE__);

    //delete tag
    PDCcont_del_tag(cont_id, "a"); //SEGFAULT HERE

    PDCcont_close(cont_id);
    
    PDCclose(pdc_id);
    
#ifdef ENABLE_MPI
    MPI_Finalize();
#endif
    return ret_value;
}

I think this is because PDCcont_del_tag delegates to PDCobj_del_tag.

RocksDB Support for Metadata and Metadata Search

To evaluate our metadata indexing, we should compare with RocksDB, a database used in some of the state-of-the-art literature.

With this integration, we should be able to switch between the PDC_native_metadata and PDC_rocksdb_metadata options.

The local unit test should be able to test against the different implementations. With the proper switch enabling RocksDB support, the unit test for RocksDB should be enabled as well.

There is no need for a separate client-side test for the RocksDB feature, since these metadata storage backends will eventually become alternatives to each other and share the same API.

What will be included:

  1. RocksDB implementation of metadata storage and the related metadata search functionality
  2. A runtime option for metadata & metadata search functions to call different implementations (passed via RPC parameters)
  3. CMake support for enabling/disabling RocksDB. If the runtime flag selects a disabled backend, print an unsupported message and return an empty result.
  4. Documentation on how metadata is stored in RocksDB and how metadata search is performed, including where the db file is stored (tmpfs), how a metadata tag is stored (rock_key = ObjID_attrKey, rock_value = attrVal), and how metadata search is done (a full scan across all key-value records in RocksDB).

Unifying the use of MACROs and APIs in pdc_malloc.h across the project

pdc_malloc.h provides both APIs and MACROs for memory allocation, but the MACROs do not seem necessary.

We need to take some time to evaluate whether to use only the APIs or only the MACROs. Once decided, we need to unify their use throughout the project.

Brute-force implementation of metadata queries

Desc: This implementation will simply send the query to all servers and retrieve the matching object IDs from each. A response-merge procedure is needed.

Upon receiving the request, the server will simply walk through all objects it holds, match their kvtags against the query, and collect the matching results.

A unifying RPC function interface for transferring arbitrary binary buffers

Desc:

Currently we suffer from RPC function/stub explosion, which makes adding new RPC functions/stubs difficult.

This task will provide a unifying RPC interface that transfers an arbitrary binary buffer between origin and target (hopefully with no need to differentiate between client and server). The developer can simply pack all the data into one big buffer and perform the unifying RPC for any purpose.

S3 backend

Include support for multiple backends, including AWS S3

Memory leak in function PDCobj_get_info

The leak is in these lines:

    ret_value = PDC_CALLOC(struct pdc_obj_info);
    if (!ret_value)
        PGOTO_ERROR(NULL, "failed to allocate memory");

    ret_value = tmp->obj_info_pub;

ret_value is allocated, then immediately reassigned, so the allocation is leaked.
