
probes's Introduction

Rucio - Scientific Data Management

Rucio is a software framework that provides functionality to organize, manage, and access large volumes of scientific data using customisable policies. The data can be spread across globally distributed locations and across heterogeneous data centers, uniting different storage and network technologies as a single federated entity. Rucio offers advanced features such as distributed data recovery and adaptive replication, and is highly scalable, modular, and extensible. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously extended to support the LHC experiments and other diverse scientific communities.

Documentation

General information, API/REST description and guides can be found in our documentation or on our webpage.

Try it out

We provide a dockerized environment which serves both as a demo environment and a development environment. It includes all the necessary preconfigured components for development against multiple storage and transfer systems.

Developers

For information on how to contribute to Rucio, please refer to and follow our CONTRIBUTING guidelines. We strongly recommend using the dockerized environment for development.

Operators

To learn how to deploy and configure Rucio, consult the documentation available online.

Getting Support

If you are looking for support, please contact us via one of our official channels.

probes's People

Contributors

agbogdan, arisfkiaras, bari12, cserf, davidgcameron, dchristidis, ericvaandering, faluchet, fernandogarzon, gumond, hahahannes, jwackito, mlassnig, nikmagini, panos512, tbeerman, tomasjavurek, vigne, vingar, voetberg, vokac, wguanicedew


probes's Issues

Modify VOMS collector to ban identities

Motivation

One identity can map to multiple accounts, so banning a single account is not enough.
Identities that correspond to people banned in VOMS must be removed from all accounts they map to.

Migrated from JIRA
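
A minimal sketch of the requested behaviour, assuming the banned DNs are already known (e.g. collected from VOMS). The client methods list_accounts and del_identity are part of the Rucio client API, but the example DN and the X509 authtype here are illustrative assumptions:

```python
from rucio.client import Client

client = Client()
banned_dns = ['/DC=ch/DC=cern/OU=Users/CN=banned.user']  # hypothetical DN from VOMS

for dn in banned_dns:
    # One identity can map to several accounts, so remove it from all of them.
    for account in client.list_accounts(identity=dn):
        client.del_identity(account=account['account'], identity=dn, authtype='X509')
```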

Common probes to use common queries

Both

https://github.com/rucio/probes/blob/master/common/check_obsolete_replicas

and

https://github.com/rucio/probes/blob/master/common/check_expired_locked_rules

explicitly reference ATLAS tables that may not exist for all VOs. As these are common probes, this prevents them from running for every VO.

To fix this, either:

  • replace the query with one using the Rucio data model (continuing from #109), or
  • update it to avoid referencing VO-specific tables (e.g. atlas_rucio.rules -> {schema}.rules), as in the sketch below.
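
A minimal sketch of the {schema} idea, using an expired-locked-rules style query as the example; the query text is illustrative, not the actual probe code:

```python
schema = 'atlas_rucio'  # would come from per-VO configuration

# Only the schema prefix changes between VOs; the query itself stays common.
query = """
SELECT COUNT(*) FROM {schema}.rules
WHERE expires_at < sysdate AND locked = 1
""".format(schema=schema)
```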

Probes are hard coded for ATLAS

@dchristidis CMS would like to use some of the ATLAS probes and I've verified that at least one of them works perfectly for us except for this line:

FROM atlas_rucio.requests

Can these probes be parameterized so that we can either supply our own string for "atlas_rucio" or leave it off entirely? (Leaving it off works for us)

If you just want to parameterize one of them and leave the rest for us as we adopt them, that's OK. Whatever is easiest for you.
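
A minimal sketch of the requested parameterization, assuming an optional schema option can be read from the probe's configuration (the section/option names are assumptions; config_get is Rucio's configuration helper):

```python
from rucio.common.config import config_get

# An empty schema means "no prefix", which is what works for CMS.
schema = config_get('common', 'schema', raise_exception=False, default='')
prefix = schema + '.' if schema else ''

query = 'SELECT * FROM {prefix}requests'.format(prefix=prefix)
# ATLAS would configure schema = atlas_rucio; CMS would leave it unset.
```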

check_lost_files: issues

I noticed the following issues while working on improving the look of the reports from the check_lost_files script:

  1. Lost files may be missing from or duplicated in the reports.
    The script can take lost-file information from two sources: the pre-generated dump of lost files on the web
    and, if that fails (a 404 "not found" error sometimes occurs), directly from Rucio. The problem is that these two sources cover two different time intervals.
    The dump gives lost files for the Monday-Sunday interval of the previous week, but the Rucio request gives files from the last 7 days counted from now. If different sources are used in consecutive runs, the two intervals may overlap (causing duplicated info) or leave a gap (causing lost info):
    E.g. the script on 2018-08-08 could not get the dump, so it used the last-7-days interval
    [2018-08-01, 2018-08-08]. In the previous run on 2018-08-01 it used the Mon-Sun dump for
    [2018-07-22, 2018-07-29]. In the next run on 2018-08-15 it used the Mon-Sun dump for
    [2018-08-06, 2018-08-12] => this means that lost files in [2018-07-30, 2018-08-01] are not reported, while lost files in [2018-08-06, 2018-08-08] are reported twice.
    Another problem is that the "from now" selection time is not exact and may vary slightly between script runs, so lost files at the edges of the interval may be missed or duplicated even when the same source is used. For this reason the dump creator on the web should also be reviewed; it may have the same "select from now" issue. (A deterministic interval computation is sketched after this list.)

  2. Select optimization. For some reason, the lost-files info (from both the dump and Rucio) is not the final list used by the script. After receiving the selection result, the script does additional filtering itself: not like 'panda.%' and not like '%_sub%', and it removes duplicates of the same scope:filename (see the next issue). It would be more logical to receive the final lost-files list directly, without extra filtering in Python. Moving this filtering into the SQL select can be done without substantially increasing the query's execution time.

  3. The same files on different RSEs/datasets are ignored. The script considers only the first entry of each "scope:file_name" in the lost-files list. Other entries, from different RSEs or from different datasets, won't be included in the e-mail reports. Since the reports contain datasets and RSEs, or are split by RSE, this looks like a bug.
    E.g. tomorrow's reports will not contain info about the files:
    data17_13TeV:DAOD_SUSY1.19820405._000357.pool.root.1:BNL-OSG2_DATADISK
    mc16_13TeV:EVNT.19802431._001314.pool.root.1:PIC_DATADISK
    etc.
    because they will already contain:
    data17_13TeV:DAOD_SUSY1.19820405._000357.pool.root.1:CERN-PROD_DATADISK
    mc16_13TeV:EVNT.19802431._001314.pool.root.1:UKI-SCOTGRID-GLASGOW_DATADISK
    etc.

  4. Keep a history of lost files. Currently the script does not keep any lost-files history. The request for weekly lost files from Rucio takes about half an hour to complete. If this info is needed and not stored somewhere else, it could make sense for the script to compress and archive it.
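
A minimal sketch of the deterministic interval computation mentioned in issue 1: always report on the previous Monday-Sunday week, whichever source (dump or Rucio) is used:

```python
from datetime import date, timedelta

def previous_week(today=None):
    """Return (monday, sunday) of the week before the one containing `today`."""
    today = today or date.today()
    monday = today - timedelta(days=today.weekday() + 7)
    return monday, monday + timedelta(days=6)

# For the runs mentioned above, both sources would then agree:
print(previous_week(date(2018, 8, 8)))   # (2018-07-30, 2018-08-05)
print(previous_week(date(2018, 8, 15)))  # (2018-08-06, 2018-08-12)
```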

CRIC probe

Implement a simple probe to pull RSE data from CRIC into Rucio.
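
A minimal sketch under the assumption that CRIC exposes an RSE listing as JSON at an endpoint like the one below; the URL and the 'site' field name are assumptions, while the Rucio client call (add_rse_attribute) is part of the real client API:

```python
import requests
from rucio.client import Client

CRIC_URL = 'https://atlas-cric.cern.ch/api/atlas/ddmendpoint/query/?json'  # assumed endpoint

client = Client()
for rse_name, data in requests.get(CRIC_URL, timeout=60).json().items():
    # 'site' is an illustrative field name, not necessarily the real CRIC schema.
    client.add_rse_attribute(rse=rse_name, key='site', value=data.get('site'))
```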

Probes using SQLAlchemy don't work in 1.31

Probes meeting this description generate stack traces with the 1.31 Rucio code. They are fine with Rucio 1.30.8.

I see we pin the version of the Oracle client in the probes build, so that may be the issue, but this needs further investigation.

Probe corrupting ranking value of distances

There seems to be a probe that corrupts the ranking values in the distances table.
The value should be neither negative nor excessively high, yet on ATLAS we are seeing values ranging from -1000 to +1000.
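
For investigation, a minimal sketch of a sanity check over the distances table, assuming direct read access to the database and that the column is named ranking as the issue title suggests (the DSN, table, and column names are assumptions):

```python
from sqlalchemy import create_engine, text

engine = create_engine('oracle://reader:secret@rucio-db')  # placeholder DSN

with engine.connect() as conn:
    # Flag entries outside a plausible range as corruption candidates.
    rows = conn.execute(text(
        "SELECT src_rse_id, dest_rse_id, ranking "
        "FROM distances WHERE ranking < 0 OR ranking > 100"
    ))
    for row in rows:
        print(row)
```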

ATLAS: Fix check_site_status

After the introduction of rucio/rucio#5664, the probe check_site_status needs to be updated for Rucio 1.29.0 to use availability_read, availability_write, and availability_delete instead of availability. For this, the probe also needs to be ported to the API and Python 3.
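
A minimal sketch of the updated probe logic using the Python 3 client API; the parameter names come from the issue text, but whether update_rse accepts them in exactly this form depends on the Rucio version (>= 1.29.0):

```python
from rucio.client import Client

client = Client()
# 'SOME-RSE_DATADISK' is a hypothetical RSE name.
client.update_rse('SOME-RSE_DATADISK', {
    'availability_read': True,
    'availability_write': False,
    'availability_delete': True,
})
```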
