irods_resource_plugin_rados

Cacheless, direct Ceph/rados resource plugin for iRODS.

First presented at the iRODS User Group Meeting 2014 in Boston, MA: http://www.slideshare.net/mgrawinkel/development-of-the-irods-rados-plugin

TL;DR

  • iadmin mkresc radosResc irados rs-host:/path "cluster_name|pool_name|client_name"
  • No superfluous cache/archive tier
  • Parallel, direct, high performance access to your data!
  • Multiple rados pools from one resource server

Introduction

This iRODS plugin provides direct access to Ceph/rados. Files in the iRODS namespace are mapped to objects in the rados key-blob store. In contrast to other plugins, the irados resource plugin does not need to cache or stage files, but gives you direct and parallel access to data. Internally, the plugin maps the POSIX-like open, read, write, seek, unlink, stat, and close calls to the librados client's operations. To make full use of the parallelism inherent in a rados cluster, iRODS files are split into multiple 4 MB objects, and uploads of large files open multiple parallel transfer threads.

The plugin assumes that a file's ACLs, namespace, and metadata are fully managed by iRODS. Rados stores only the bytes of the file, using the options of the target pool.

For every new file, a unique UUID is generated as the primary access key in rados. This UUID is set as the physical path of the file in the iRODS iCAT. Files are split into 4 MB blobs that are named with an incrementing suffix. The first object is named by its UUID alone and carries extended attributes that store the actual size of the file and the number of blobs that make it up. All following blobs are named uuid-1, uuid-2, ...
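
The naming layout described above can be sketched in a few lines of shell. This is only an illustration, not the plugin's code: the helper names blob_count and blob_name are hypothetical, and it assumes that "4 MB" means 4 MiB (4 * 1024 * 1024 bytes), which the plugin's actual constant may or may not match.

```shell
#!/bin/sh
# Assumption: "4 MB" means 4 MiB; the plugin's real blob size may differ.
BLOB_SIZE=$((4 * 1024 * 1024))

# Number of blobs needed for a file of the given size in bytes (ceiling division).
blob_count() {
    local size="$1"
    echo $(( (size + BLOB_SIZE - 1) / BLOB_SIZE ))
}

# Name of the rados object holding the byte at the given offset.
# The first blob is named by the bare uuid; later blobs get the suffix -1, -2, ...
blob_name() {
    local uuid="$1" offset="$2"
    local index=$(( offset / BLOB_SIZE ))
    if [ "$index" -eq 0 ]; then
        echo "$uuid"
    else
        echo "$uuid-$index"
    fi
}

# Example: a 52428800-byte file, and the blob that holds byte 5000000.
blob_count 52428800
blob_name 862c10b7-213c-4dfd-81ac-8cf1121aced4 5000000
```

Under these assumptions, the example prints 13 for the blob count and the uuid with the suffix -1 for the blob name.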

Requirements

  • Tested on Ubuntu / CentOS
  • Requires iRODS >= 4.0.3

Installation

Currently, there are no prebuilt packages, but builds on Ubuntu 12.04 and CentOS 6.5 have been tested successfully.

Prerequisites for Ubuntu:

Follow the steps at http://docs.ceph.com/docs/master/start/quick-start-preflight/#advanced-package-tool-apt to add the official ceph repositories that match your running cluster's version.

sudo apt-get install uuid-dev libssl-dev build-essential
sudo apt-get install librados2 librados-dev
sudo apt-get install -f

Prerequisites for CentOS:

Follow the steps at http://docs.ceph.com/docs/master/start/quick-start-preflight/#rhel-centos to add the official ceph repositories that match your running cluster's version.

yum install librados2 librados2-devel libuuid-devel openssl-devel cmake3 irods-devel irods-externals-clang3.8-0
yum groups install "Development Tools"

Then check out and build the plugin package on the resource server:

git clone https://github.com/irods/irods_resource_plugin_rados.git
mkdir build_irods_resource_plugin_rados
cd build_irods_resource_plugin_rados
cmake ../irods_resource_plugin_rados # or cmake3 on CentOS
make package

Then install the newly created package via dpkg/gdebi or yum.

Setup

Create an irods pool on Ceph, e.g.:

ceph osd pool create irods 128 
ceph auth get-or-create client.irods osd 'allow rw pool=irods' mon 'allow r' > /etc/ceph/client.irods.keyring

N.B.: 128 is the number of placement groups for the pool; see http://docs.ceph.com/docs/mimic/rados/operations/placement-groups/

Copy the key from the newly created keyring and create the ceph config files on the resource server. You can have multiple pools with different clients & capabilities.

touch /etc/irods/irados.config && chown irods: /etc/irods/irados.config && chmod 600 /etc/irods/irados.config

[global]
    mon host = ceph-mon

[client.irods]
        key = AQD7pVhUSMx1JRAA1eqDfSynx4qQBe9DHt79Ow==

[client.irods2]
        key = AQB3xHVUAPS+HxAA6PlML8jmcDMkX+5SP7Y6lw==
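
Before creating the resource, it can help to confirm that the config file is readable and contains a section for the client you intend to use. The following is a minimal sketch; check_irados_config is a hypothetical helper, not something shipped with the plugin, and it only looks for the section header, not for a valid key.

```shell
#!/bin/sh
# check_irados_config is a hypothetical helper, not part of the plugin.
# It checks that the config file is readable and has a [client] section.
check_irados_config() {
    local config="$1" client="$2"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    # -F matches the section header literally, so [client.irods]
    # is not confused with [client.irods2].
    if ! grep -qF "[$client]" "$config"; then
        echo "no [$client] section in $config" >&2
        return 1
    fi
    echo "ok: [$client] found in $config"
}

# Example call with the path and client name used in this README.
check_irados_config /etc/irods/irados.config client.irods || true
```

Run it as the irods user, since the file is restricted to that user by the chmod 600 above.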

The cluster_name, pool_name, and client_name used to connect to a rados pool are configured in the resource context when the resource is created.

If no path such as :/tmp/ is given after the host name, the plugin does not work correctly, even though the path itself is not used at all.

iadmin mkresc radosResc irados rs-host:/path "cluster_name|pool_name|client_name"
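
The context string is three fields separated by the pipe character. As a rough sketch of how such a string can be sanity-checked before it is handed to iadmin, the hypothetical helper below (parse_context is not part of iRODS or the plugin) splits it and rejects anything that does not have exactly three non-empty fields.

```shell
#!/bin/sh
# parse_context is a hypothetical helper for illustration only.
# It splits "cluster_name|pool_name|client_name" into its three fields.
parse_context() {
    local context="$1" cluster pool client extra
    IFS='|' read -r cluster pool client extra <<EOF
$context
EOF
    # Reject missing fields and any trailing fourth field.
    if [ -z "$cluster" ] || [ -z "$pool" ] || [ -z "$client" ] || [ -n "$extra" ]; then
        echo "invalid context: $context" >&2
        return 1
    fi
    echo "cluster=$cluster pool=$pool client=$client"
}

# Example with the values used in the scale-out section of this README.
parse_context "ceph|poolname|client.irods"
```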

Then upload files with:

iput -R radosResc files/

Scale out

All traffic from clients to rados is routed through the resource server. If it becomes a bottleneck, just add more!

iadmin mkresc radosRandomResc random
iadmin mkresc child_01 irados rs-01.local:/path "ceph|poolname|client.irods"
iadmin mkresc child_02 irados rs-02.local:/path "ceph|poolname|client.irods"
...
iadmin addchildtoresc radosRandomResc child_01
iadmin addchildtoresc radosRandomResc child_02
...

The radosRandomResc will then distribute the load over all resource servers.
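
With more than a handful of resource servers, the commands above become repetitive. The sketch below only prints them, so they can be reviewed before anything is run. print_scale_out is a hypothetical helper; it assumes the rs-NN.local host naming and the example context string shown above, both of which you would replace with your own values.

```shell
#!/bin/sh
# print_scale_out is a hypothetical helper: it prints, but does not run,
# the iadmin commands for a random resource with N irados children.
# Assumes hosts are named rs-01.local, rs-02.local, ... as in the example above.
print_scale_out() {
    local count="$1" context="${2:-ceph|poolname|client.irods}"
    local i=1 id
    echo "iadmin mkresc radosRandomResc random"
    while [ "$i" -le "$count" ]; do
        id=$(printf '%02d' "$i")
        echo "iadmin mkresc child_$id irados rs-$id.local:/path \"$context\""
        echo "iadmin addchildtoresc radosRandomResc child_$id"
        i=$((i + 1))
    done
}

# Example: commands for three resource servers.
print_scale_out 3
```

After checking the output, it can be piped to sh on a host where iadmin is configured.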

Contributors

alanking, grawinkel, kwaegema, meatz, sviscapi, tempoz, trel

irods_resource_plugin_rados's Issues

stacktrace when irados.config is missing

should definitely not stacktrace...

more completely, we should look to remove the need for a separate config file - these values should be in the context string, per plugin instance.

Wrong installation prefix with cmake3 on CentOS 7.6

Hi,

I had a similar issue to this already closed one:

#5

The cmake3 installation prefix still defaults to /usr/local on CentOS 7.6, which installs the RADOS plugin to the wrong place (i.e. /usr/local/usr/lib...). It's no big deal though; I just removed that prefix and recompiled the plugin.

Could you please have a look ?

Cheers, Samuel from CINES (Montpellier, France)

https://www.cines.fr/en/

Pre-release activities

  • Bump iRODS version number to latest release (4.2.5 at time of writing)
  • Bump plugin version number
  • Add build/test hooks for CI

Plugin installed to wrong location

I installed the plugin and set up the resources as in the readme on a resource server, but when I try to put a file in, I get:

remote addresses: 157.193.231.206 ERROR: putUtil: put error for /ZONE/home/kwaegema/myfile3, status = -1818000 status = -1818000 INVALID_ACCESS_TO_IMPOSTOR_RESOURCE

On the resource server, I see this in the log:

Nov 22 15:25:04 pid:46364 NOTICE: rodsServer Release version rods4.2.1 - API Version d is up
Nov 22 15:25:04 pid:46364 NOTICE: >>> control plane :: listening on port 1248
Nov 22 15:25:20 pid:46418 remote addresses: 157.193.231.206 ERROR: [-]  /home/irodsbuild/irods/server/api/src/rsFileCreate.cpp:220:int _rsFileCreate(rsComm_t *, fileCreateInp_t *, rodsServerHost_t *, fileCreateOut_t **) :  status [INVALID_ACCESS_TO_IMPOSTOR_RESOURCE]  errno [] -- message [UNHANDLED fileCreate for [//home/kwaegema/myfile3]]
        [-]     /home/irodsbuild/irods/server/drivers/src/fileDriver.cpp:38:irods::error fileCreate(rsComm_t *, irods::first_class_object_ptr) :  status [INVALID_ACCESS_TO_IMPOSTOR_RESOURCE]  errno [] -- message [failed to call 'create']
                [-]     /home/irodsbuild/irods/server/core/src/irods_resource_plugin_impostor.cpp:555:static irods::error irods::impostor_resource::report_error(irods::plugin_context &) :  status [INVALID_ACCESS_TO_IMPOSTOR_RESOURCE]  errno [] -- message [NOTE :: Direct Access of Impostor Resource [radosResc1] of Given Type [irados]]

How can I debug more what is going on here?

Thanks!

system librados incompatibility due to clang/gcc differences

Dear maintainer,

We are installing iRODS on Debian Linux, and we want to rely as little as possible on external packages. Several packages in the "external" repository are available as standard Debian packages. However, they are compiled with gcc and not clang, which makes them incompatible, especially with respect to libboost (https://stackoverflow.com/questions/8454329/why-cant-clang-with-libc-in-c0x-mode-link-this-boostprogram-options-examp). Up to this point, I could still use your "external" software repository, which is compiled with clang.

Nevertheless, as backend storage we want to use an existing Ceph cluster and have to use the Debian librados, which is compiled with GCC. This irods_resource_plugin_rados repository is compiled with clang by default, resulting in runtime errors when transferring files (log.txt).

Before I invest more time in solving my issue, I'd like to ask whether you are aware of anyone compiling the whole iRODS suite with gcc, which I would really appreciate, and if so, whether you could put me in contact. Or do you plan to provide an external rados package that is also compiled with clang?

Thanks and kind regards
Jörg

P.S.: Maybe this issue could also be moved to the irods repository.

Compile error with IRADOS_DEBUG

When DEBUG messages are turned on by defining IRADOS_DEBUG I get a compile time error. Solved by changing as follows: Line 650 in libirados.cpp should be "blob_oid.str().c_str()" instead of "blob_oid.c_str()" in function int get_next_fd().

plugin fails to build on 4-2-stable

The plugin fails to compile against 4-2-stable

root@6b9449c40a68:/build# make package
[ 50%] Building CXX object CMakeFiles/irados.dir/irados/libirados.cpp.o
/irods_resource_plugin_rados/irados/libirados.cpp:1201:48: error: no member named 'is_dirty' in 'irods::physical_object'
                        bool is_dirty = ( itr->is_dirty() != 1 );
                                          ~~~  ^
1 error generated.
CMakeFiles/irados.dir/build.make:62: recipe for target 'CMakeFiles/irados.dir/irados/libirados.cpp.o' failed
make[2]: *** [CMakeFiles/irados.dir/irados/libirados.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/irados.dir/all' failed
make[1]: *** [CMakeFiles/irados.dir/all] Error 2
Makefile:151: recipe for target 'all' failed
make: *** [all] Error 2

can't remove files from radosResc

Hi,
While testing the plugin, I ran into this issue:

[kwaegema@vulpix01 ~]$ iput -R radosRescLum radostest44
[kwaegema@vulpix01 ~]$ irm radostest44 
remote addresses: xxx.xxx.xxx.xxx ERROR: rmUtil: rm error for /UGent/home/kwaegema/radostest44, status = -1803000 status = -1803000 HIERARCHY_ERROR

Putting files is no problem; they are successfully written to the ceph pool. Deleting files from other resources is no problem either.

client.irads1
	caps: [mon] allow r
	caps: [osd] allow rw pool=irads1

Direct Access of Impostor Resource Error

Following the README, we set up an iCAT and an iRES, both pointing to the same ceph pool (we run Jewel, 10.2.3, if that's helpful):

irods@irods-ceph-test-ires:/tmp$ ilsresc
demoResc
irods-ceph-test-iresResource
rrirados:roundrobin
├── irods-ceph-test-ires-radosResc:irados
└── radosResc:irados

..

irods@irods-ceph-test-ires:/tmp$ iadmin lr radosResc
resc_id: 10005
resc_name: radosResc
zone_name: tempZone
resc_type_name: irados
resc_net: irods-ceph-test
resc_def_path: /
free_space: 
free_space_ts Never
resc_info: 
r_comment: 
resc_status: 
create_ts 2017-05-31.14:43:51
modify_ts 2017-05-31.15:51:01
resc_children: 
resc_context: ceph|test-irods|client.irods-test
resc_parent: rrirados
resc_objcount: 2

...

irods@irods-ceph-test-ires:/tmp$ iadmin lr irods-ceph-test-ires-radosResc
resc_id: 10010
resc_name: irods-ceph-test-ires-radosResc
zone_name: tempZone
resc_type_name: irados
resc_net: irods-ceph-test-ires
resc_def_path: /
free_space: 
free_space_ts Never
resc_info: 
r_comment: 
resc_status: 
create_ts 2017-05-31.15:37:40
modify_ts 2017-05-31.15:50:03
resc_children: 
resc_context: ceph|test-irods|client.irods-test
resc_parent: rrirados
resc_objcount: 0

Putting files in goes OK until we try to put in a file with the same content but a different name:

irods@irods-ceph-test-ires:/tmp$ ils -L
/tempZone/home/rods:
  rods              0 rrirados;radosResc   2147483648 2017-05-31.15:51 & 2Gfile
    sha2:p8dEwTzBAe1mwp9nL5JFVUeInMWGzm1E/naugklY6lE=    generic    862c10b7-213c-4dfd-81ac-8cf1121aced4
  rods              0 rrirados;radosResc          476 2017-05-31.14:43 & VERSION.json
    sha2:bD30R9IqOq5fqrE/iw1KPbwICHd7QjZSxqCxVCLnF0A=    generic    e365021b-36ec-45f4-878c-354698c0656e

...

irods@irods-ceph-test-ires:/tmp$ time iput  -R rrirados /tmp/2Gfile_1 
ERROR: putUtil: put error for /tempZone/home/rods/2Gfile_1, status = -24000 status = -24000 SYS_INTERNAL_NULL_INPUT_ERR
NOTE :: Direct Access of Impostor Resource [irods-ceph-test-ires-radosResc] of Given Type [irados]

real	0m0.128s
user	0m0.010s
sys	0m0.000s

system log is;

Jun  1 08:29:04 pid:14302 ERROR: [-]    iRODS/server/api/src/rsFileCreate.cpp:220:_rsFileCreate :  status [INVALID_ACCESS_TO_IMPOSTOR_RESOURCE]  errno [] -- message [UNHANDLED fileCreate for [//home/rods/2Gfile_1]]
        [-]     iRODS/server/drivers/src/fileDriver.cpp:38:fileCreate :  status [INVALID_ACCESS_TO_IMPOSTOR_RESOURCE]  errno [] -- message [failed to call 'create']
                [-]     iRODS/server/core/src/irods_resource_plugin_impostor.cpp:525:report_error :  status [INVALID_ACCESS_TO_IMPOSTOR_RESOURCE]  errno [] -- message [NOTE :: Direct Access of Impostor Resource [irods-ceph-test-ires-radosResc] of Given Type [irados]]

Jun  1 08:29:04 pid:14302 NOTICE: dataCreate: l3Create of //home/rods/2Gfile_1 failed, status = -1818000
Jun  1 08:29:04 pid:14302 NOTICE: rsDataObjCreate: Internal error
Jun  1 08:29:04 pid:14302 NOTICE: readAndProcClientMsg: received disconnect msg from client
Jun  1 08:29:04 pid:14302 NOTICE: Agent exiting with status = 0

I found this irods-chat post

Both servers have this in their /etc/hosts;

172.27.85.133 irods-ceph-test
172.27.85.183 irods-ceph-test-ires

but those hosts are not present in DNS (this is a virtual environment - our openstack), so could this be a gethostbyname not checking /etc/hosts issue, or something else? I was looking into specifying the names in /etc/irods/hosts_config.json, but couldn't work out the format.

iput/iget of large file sometimes hangs

Running the following sometimes works:

$ iput -vR radosResc /tmp/50MiB.bin foo001
foo001                         50.000 MB | 14.904 sec | 13 thr |  3.355 MB/s
$ ils -l foo
  rods              0 radosResc     52428800 2019-03-13.16:41 & foo
$ iget foo
$ ls -l foo
-rw-r----- 1 irods irods 52428800 Mar 13 16:41 foo

Most of the time, however, this hangs on the rados_write or rados_read call. The hang seems to be indicative of Ceph no longer responding.

The issue appears to be resolved by acquiring the propmap_guard_ lock before calling rados_write or rados_read. This indicates that the Rados IO context, which is locked before modifying everywhere else in the code, is being affected by other threads.

Please investigate and ensure that proper locking is in place for critical sections of code.

Cmake3 find_package error with iRODS 4.2.4

Hi all,

I had to edit irods_resource_plugin_rados/CMakeLists.txt in order to make it compatible with the latest available iRODS version:

find_package(IRODS 4.2.4 REQUIRED)

But according to the official cmake documentation, (IRODS 4.2.2 REQUIRED) should still work with any later version. If I'm not mistaken, "REQUIRED" actually sets the minimum required version of that package...

https://cmake.org/cmake/help/v3.12/manual/cmake-packages.7.html

What am I doing wrong ?

Cheers, Samuel from CINES (Montpellier, France)

https://www.cines.fr/en/

How is data integrity verification handled?

More of a question than a bug report but might turn into a feature request...

TL;DR; how is checksumming expected to work in this plugin, both on upload and using the ichksum command later to verify, given the chunked nature of objects within the resource?

At the moment, as I understand it, a replica of an object is stored in 4MB chunks across the Rados 'bucket'. Therefore, to perform a checksum, the file must be downloaded and reassembled before ichksum can be usefully run against it.

Is that correct? If so, how would ichksum -a be expected to work on a tree with a replication node, meaning that there is more than one copy and one of them is held on the librados back end? For that matter, are tools like iscan and ifsck supported?

I can see that irods/irods#2796 would be useful here, but wondering if there were any other thoughts for ways to ensure data integrity without having to read every file back from the bucket!

Cheers

John

undefined symbols

Went to try this out in our 4.2.1 iRODS environment and am getting the following in the irods log

Aug 15 16:44:50 pid:114854 remote addresses: 10.20.29.101, 10.20.29.102 ERROR: [-]	/home/irodsbuild/irods/server/core/src/irods_resource_manager.cpp:669:irods::error irods::resource_manager::process_init_results(genQueryOut_t *) :  status [PLUGIN_ERROR]  errno [] -- message []
	[-]	/home/irodsbuild/irods/server/core/src/irods_resource_manager.cpp:122:irods::error irods::resource_manager::load_resource_plugin(resource_ptr &, const std::string, const std::string, const std::string) :  status [PLUGIN_ERROR]  errno [] -- message []
		[-]	/home/irodsbuild/irods/lib/core/include/irods_load_plugin.hpp:176:irods::error irods::load_plugin(PluginType *&, const std::string &, const std::string &, const std::string &, const std::string &) [PluginType = irods::resource] :  status [PLUGIN_ERROR]  errno [] -- message [failed to open shared object file [/usr/lib/irods/plugins/resources/libirados.so] :: dlerror: is [/usr/lib/irods/plugins/resources/libirados.so: undefined symbol: _ZN5irods8resource9add_childERKSsS2_N5boost10shared_ptrIS0_EE]]

The machine is Ubuntu 14.04 with packages from 4.2.1 of iRODS and 10.2.9 of Ceph.

The irados/Makefile needed to be changed, since the static libirods_client.a file doesn't exist.

#EXTRALIBS = /usr/lib/irods/libirods_client.a -lrados 
EXTRALIBS = -lirods_client  -lrados

Also had to add -std=gnu++11 to GCC in order to allow compilation, due to warning from /usr/include/c++/4.8/bits/c++0x_warning.h about ISO C++ 2011 standards.

Copying the resulting libirados.so to /usr/lib/irods/plugins/resources/libirados.so is fine, but restarting the iRODS server or attempting to iput to the resource then fails with the above log entry.
