collectd-cvmfs's Introduction

CernVM-File System (CernVM-FS)

The CernVM-File System provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications. CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). Files and directories are hosted on standard web servers and mounted in the universal namespace /cvmfs. Internally, CernVM-FS uses content-addressable storage and Merkle trees to maintain file data and meta-data. CernVM-FS uses outgoing HTTP connections only and thereby avoids most of the firewall issues of other network file systems. It transfers data and meta-data on demand and verifies data integrity by cryptographic hashes.
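To illustrate the content-addressable storage and Merkle tree ideas mentioned above, here is a minimal Python sketch. It is not the CernVM-FS storage format: the SHA-256 hash, the ContentStore class and the tree_hash directory encoding are assumptions chosen for illustration only.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Content address of an object: the hex digest of its bytes (SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressable store: every object is keyed by its own hash."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = content_hash(data)
        self.objects[key] = data  # identical content maps to one key, so it is stored once
        return key

    def get(self, key: str) -> bytes:
        data = self.objects[key]
        # Re-hash on read: a client can verify integrity without trusting the server
        if content_hash(data) != key:
            raise ValueError("integrity check failed for " + key)
        return data


def tree_hash(store: ContentStore, entries: dict) -> str:
    """Merkle-style directory hash.

    `entries` maps a name to bytes (a file) or to a nested dict (a subdirectory).
    A directory object lists (name, child hash) pairs, so a change in any file
    changes the hash of every directory above it, up to the root.
    """
    lines = []
    for name in sorted(entries):
        value = entries[name]
        child = tree_hash(store, value) if isinstance(value, dict) else store.put(value)
        lines.append(name + "\0" + child + "\n")
    return store.put("".join(lines).encode())
```

Changing a single file deep in the tree yields a different root hash, while unchanged files and subtrees keep their hashes and are stored only once.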

Through aggressive caching and latency reduction, CernVM-FS focuses specifically on the software use case. Software usually comprises many small files that are frequently opened and read as a whole. Furthermore, the software use case includes frequent look-ups for files in multiple directories when search paths are examined.

Content is published into /cvmfs by means of dedicated "release manager machines". The release manager machines provide a writable CernVM-FS instance by means of a union file system (e.g., overlayfs) on top of the read-only client. When publishing, the CernVM-FS server tools process new and modified data from the union file system's writable branch and transform the data into the CernVM-FS storage format.
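The publishing step can be illustrated with a much-simplified Python sketch. It only walks the writable branch and copies each regular file into a directory keyed by the SHA-256 hash of its content; the function name process_writable_branch and the flat store layout are assumptions for illustration. The real CernVM-FS server tools do considerably more (for example handling deletions and updating file catalogs), none of which is modelled here.

```python
import hashlib
import os
import shutil


def process_writable_branch(upper_dir: str, store_dir: str) -> dict:
    """Copy every regular file from a union file system's writable branch
    into a content-addressed store directory (hypothetical layout).

    Returns a mapping from each file's path (relative to upper_dir) to the
    hash under which its content was stored.
    """
    published = {}
    os.makedirs(store_dir, exist_ok=True)
    for root, _dirs, files in os.walk(upper_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # this sketch ignores symlinks, devices, etc.
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in chunks so large files are not read into memory at once
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            key = digest.hexdigest()
            target = os.path.join(store_dir, key)
            if not os.path.exists(target):  # identical content is stored only once
                shutil.copyfile(path, target)
            published[os.path.relpath(path, upper_dir)] = key
    return published
```

Two files with identical content end up as a single object in the store, which is one reason content-addressed storage suits software trees with many duplicated files.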

CernVM-FS is actively used by small and large scientific collaborations. In many cases, it replaces package managers and shared software areas on cluster file systems as a means to distribute the software used to process experiment data.

Non-exhaustive List of Resources Related to CernVM-FS

  • Official Documentation
    • Aimed at maintainers of CernVM-FS instances, users and developers. Contains many in-depth explanations and a complete list of all (client) configuration parameters in the appendix.
  • Quickstart Guide for Developers
    • Aimed at developers. Includes how to contribute, code style, many short coding examples of how to set up a working CernVM-File System, and how to test and debug.
  • cvmfs-contrib
    • Community-contributed packages related to CernVM-FS but not maintained by the CernVM-FS developer team.

How to Get in Touch With Us

  • CernVM Forum: for support questions or problems you encounter.
  • GitHub Issues: for bug reports or feature requests. This issue tracker is used for all CernVM-FS-related repositories.

collectd-cvmfs's People

Contributors

luisfdez, traylenator

Forkers

luisfdez, toteva

collectd-cvmfs's Issues

collectd could get stuck in some cases if autofs hangs while mounting the repository

On SLC6 nodes I have spotted some scenarios where collectd could get stuck while trying to get the MountTime for a cvmfs repository.

On the affected nodes, autofs is left in a state like this:

    root      7065  0.0  0.0 606552  1316 ?        Ssl  Oct10   1:29 automount --pid-file /var/run/autofs.pid
    root     31188  0.0  0.0 111756   544 ?        S    Oct17   0:00  \_ /bin/mount -t cvmfs ilc.desy.de /cvmfs/ilc.desy.de
    root     31189  0.0  0.0  16224   748 ?        S    Oct17   0:00      \_ /sbin/mount.cvmfs ilc.desy.de /cvmfs/ilc.desy.de -o rw
    cvmfs    31209 99.9  0.0  68504  1056 ?        R    Oct17 9685:47          \_ /usr/bin/cvmfs2 -o rw,fsname=cvmfs2,allow_other,grab_mountpoin

In this state, os.listdir hangs forever, and not even collectd is able to kill it after an interval.

I tried an alternative implementation that runs listdir in a thread, but the thread's start() call hangs as well, so it never reaches the next step: joining the thread with a timeout.

After some tests, the scandir package turned out to do a better job: it can be run in a thread and killed.

Even if collectd is able to kill scandir after a problematic interval, I think a more sensible approach would be to define a timeout for attempts to mount a given repository. Something like this:

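    import threading
    from scandir import scandir  # PyPI backport of os.scandir (stdlib since Python 3.5)

    def async_scandir(self, repo_mountpoint, timeout):
        """List repo_mountpoint in a thread, giving up after `timeout` seconds."""
        contents = []
        # The thread appends the directory entries to `contents` once scandir returns
        t = threading.Thread(target=lambda: contents.extend(scandir(repo_mountpoint)))
        # Daemon thread: a listing that never returns cannot block interpreter shutdown
        t.daemon = True
        t.start()
        t.join(timeout)
        if t.is_alive():
            raise Exception('Scandir timed out')
        return contents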
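One limitation of the snippet above is that an exception raised by scandir inside the thread (for example an OSError from a broken mount) is lost, so the caller just receives an empty list. A minimal variant, assuming Python 3 where os.scandir is in the standard library, records the exception and re-raises it in the caller. The name list_with_timeout and the use of TimeoutError are illustrative choices, not part of the plugin, and this has not been tested against the autofs hang described above.

```python
import os
import threading


def list_with_timeout(path: str, timeout: float) -> list:
    """Return the entry names of `path`, or raise if listing fails or hangs.

    The listing runs in a daemon thread so that a hung FUSE mount cannot keep
    the interpreter alive; a thread that never returns is simply abandoned.
    """
    result = {}

    def worker():
        try:
            with os.scandir(path) as it:
                result["names"] = [entry.name for entry in it]
        except OSError as exc:  # e.g. ENOTCONN from a crashed cvmfs2 process
            result["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise TimeoutError("listing %s did not finish within %s s" % (path, timeout))
    if "error" in result:
        raise result["error"]  # re-raise the worker's exception in the caller
    return result["names"]
```

Because TimeoutError is a subclass of OSError in Python 3, a caller can treat both a hang and a broken mount with a single `except OSError` clause. Each hung attempt still leaves one blocked thread behind, so a plugin might also want to skip new attempts on a repository whose previous listing has not returned.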

collectd not able to mount cvmfs if SELinux is enabled

  • Version python2-collectd_cvmfs-1.0.2-1.el7.1
  • CentOS 7
  • cvmfs-2.4.4-1.el7.centos

We will need some extra SELinux permissions to allow the collectd service to access cvmfs.

    # grep avc /var/log/audit/audit.log | audit2allow -a
    #============= collectd_t ==============
    allow collectd_t fusefs_t:dir read;

and probably more once the repository is mounted.

The plugin shouldn't redefine the memory type

The current implementation of the plugin defines a type called memory, which clashes with collectd's built-in memory type and is entirely unnecessary.

It should be removed.

Symptoms: when the type is redefined, the data source changes and memory values are reported under a different data source name.
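A quick way to spot this kind of clash is to compare the type names declared in a plugin's custom types file with those in collectd's built-in types.db. The sketch below assumes the usual types.db line format (a type name followed by comma-separated ds_name:ds_type:min:max data sources); the function names parse_types_db and find_clashes are hypothetical helpers, not part of collectd or the plugin.

```python
def parse_types_db(text: str) -> dict:
    """Map each type name in a types.db-style text to its data source names."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # type name, then the data source specs
        if len(parts) < 2:
            continue
        name, spec = parts
        types[name] = [ds.strip().split(":")[0] for ds in spec.split(",")]
    return types


def find_clashes(builtin_text: str, custom_text: str) -> dict:
    """Return {type: (builtin_ds_names, custom_ds_names)} for every redefined type."""
    builtin = parse_types_db(builtin_text)
    custom = parse_types_db(custom_text)
    return {name: (builtin[name], custom[name]) for name in custom if name in builtin}
```

A non-empty result means the custom file overrides a built-in type; if the data source names differ, values of that type are reported under the new names, which matches the symptom described above.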

crashed cvmfs is not noticed

Steps to reproduce:

    cd /cvmfs/alice.cern.ch
    # Kill the cvmfs process
    kill -9 1234

Results in:

    ls /cvmfs/alice.cern.ch
    ls: cannot access /cvmfs/alice.cern.ch: Transport endpoint is not connected

Restart collectd to make sure the results are fresh:

    # collectdctl getval lxplus790.cern.ch/cvmfs-alice.cern.ch/mounttime
    value=1.172495e-02
    # collectdctl getval lxplus790.cern.ch/cvmfs-alice.cern.ch/mountok
    value=1.000000e+00

So the plugin does not detect this error state: mountok is still reported as 1.

In Python, scandir fails on the broken repository:

    <ipython-input-3-0950e51420db> in <module>()
    ----> 1 scandir('/cvmfs/alice.cern.ch')

    OSError: [Errno 107] Transport endpoint is not connected: '/cvmfs/alice.cern.ch'

compared with a healthy repository:

    In [4]: scandir('/cvmfs/cms.cern.ch')
    Out[4]: <scandir.ScandirIterator at 0x7fdeb5cc57b0>
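One way to make the check notice this state is to treat any OSError from listing the mountpoint as a failed mount instead of only timing it. The sketch below is a hypothetical helper, not the plugin's current code: it returns a (mountok, mounttime, error) tuple whose first two fields mirror the values queried with collectdctl, and it reports the symbolic errno name so that ENOTCONN (errno 107 on Linux, as in the traceback) is easy to recognize. It has no timeout of its own, so in practice it would be combined with a threaded timeout like the one proposed in the first issue.

```python
import errno
import os
import time


def check_repository(mountpoint: str) -> tuple:
    """Return (mountok, mounttime, error) for a CernVM-FS mountpoint.

    mountok is 1.0 if the directory could be opened and read, 0.0 otherwise;
    mounttime is how long the attempt took, in seconds; error is the symbolic
    errno name (e.g. "ENOTCONN") when the attempt failed, else None.
    """
    start = time.monotonic()
    try:
        with os.scandir(mountpoint) as it:
            next(it, None)  # read one entry so the FUSE daemon has to answer
        mountok, error = 1.0, None
    except OSError as exc:
        # A crashed cvmfs2 process leaves the mountpoint in place but raises
        # ENOTCONN ("Transport endpoint is not connected") on access.
        mountok = 0.0
        error = errno.errorcode.get(exc.errno, str(exc.errno))
    return mountok, time.monotonic() - start, error
```

With this shape, the crashed alice.cern.ch repository from the steps above would be expected to report mountok=0 with error "ENOTCONN", while a healthy repository keeps reporting mountok=1.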
