sitestat

sitestat tool

The sitestat tool is designed to collect usage statistics from various CMS sites. The underlying process follows these steps:

  • Fetch all site names from SiteDB.
  • Loop over a specific time range, e.g. the last 3 months (3m).
    • Create dates for that range.
  • Use the popularity API (DSStatInTimeWindow) to get summary statistics. The API returns various information about dataset usage on sites.
  • Organize the data into bins by number of accesses.
  • For every bin, collect the dataset names.
  • Call DBS to get dataset statistics via the blocksummaries API.
  • Sum up the file_size values, which gives the total size used at a specific site.
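
The binning and summation steps above can be sketched in Go as follows. This is a minimal illustration rather than the tool's actual code: the Dataset type and the binFor and sizePerBin functions are hypothetical names, the sample values are made up, and the sketch assumes that the last bin collects everything at or above its value.

```go
package main

import (
	"fmt"
	"sort"
)

// Dataset is a hypothetical record that combines popularity and size data.
type Dataset struct {
	Name      string
	NAccesses int   // number of accesses, as reported by the popularity API
	FileSize  int64 // total size in bytes, e.g. summed from blocksummaries
}

// binFor returns the largest bin value that does not exceed n.
// bins must be sorted in ascending order; values below the first bin map to it.
// Assumption: the last bin is open-ended ("this many accesses or more").
func binFor(bins []int, n int) int {
	// SearchInts finds the first bin greater than n; step back by one.
	idx := sort.SearchInts(bins, n+1) - 1
	if idx < 0 {
		idx = 0
	}
	return bins[idx]
}

// sizePerBin groups datasets by access bin and sums their sizes.
func sizePerBin(bins []int, datasets []Dataset) map[int]int64 {
	out := make(map[int]int64)
	for _, d := range datasets {
		out[binFor(bins, d.NAccesses)] += d.FileSize
	}
	return out
}

func main() {
	bins := []int{0, 1, 2, 3, 4}
	// Made-up sample data for illustration only.
	datasets := []Dataset{
		{"/a/b/AOD", 0, 100},
		{"/c/d/AOD", 2, 250},
		{"/e/f/RAW", 7, 400},
		{"/g/h/RAW", 2, 50},
	}
	sizes := sizePerBin(bins, datasets)
	for _, b := range bins {
		fmt.Printf("bin %d: %d bytes\n", b, sizes[b])
	}
}
```

With the sample data, the two datasets accessed twice land in bin 2 and the dataset accessed seven times lands in the open-ended bin 4.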

Here is an example of sitestat tool usage:

Usage of ./sitestat:
  -bins string
    	Comma separated list of bin values, e.g. 0,1,2,3,4 for naccesses or 0,10,100 for tot cpu metrics
  -blkinfo
    	Use block information for finding statistics, by default use dataset info
  -breakdown string
    	Breakdown report into more details (tier, dataset)
  -chunkSize int
    	chunkSize for processing URLs (default 100)
  -dbsinfo
    	Use DBS to collect dataset information, default use PhEDEx
  -format string
    	Output format type, txt or json (default "txt")
  -metric string
    	Popularity DB metric (NACC, TOTCPU, NUSERS) (default "NACC")
  -pbrdb string
    	Name of PBR db (see PhedexReplicaMonitoring project)
  -phgroup string
    	Phedex group name (default "AnalysisOps")
  -profile
    	profile code
  -site string
    	CMS site name, use T1, T2, T3 to specify all Tier sites
  -tier string
    	Look-up specific data-tier
  -trange string
    	Specify time interval in YYYYMMDD format, e.g 20150101-20150201 or use short notations 1d, 1m, 1y for one day, month, year, respectively (default "1d")
  -verbose int
    	Verbose level, support 0,1,2
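
To illustrate the two -trange notations described above, here is a minimal Go sketch that turns a time range into start and end dates. It is not taken from the sitestat source: datesFor is a hypothetical helper, and treating "today" as the end of a short-notation range is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// layout is the Go reference layout for YYYYMMDD dates.
const layout = "20060102"

// datesFor is a hypothetical helper: given a -trange value and today's date
// (YYYYMMDD), it returns the start and end dates of the interval.
// Assumption: short notations such as 1d or 3m count back from today.
func datesFor(trange, today string) (string, string, error) {
	// Explicit interval, e.g. 20150101-20150201.
	if parts := strings.Split(trange, "-"); len(parts) == 2 {
		for _, p := range parts {
			if _, err := time.Parse(layout, p); err != nil {
				return "", "", err
			}
		}
		return parts[0], parts[1], nil
	}
	end, err := time.Parse(layout, today)
	if err != nil {
		return "", "", err
	}
	if len(trange) < 2 {
		return "", "", fmt.Errorf("unsupported time range %q", trange)
	}
	// Short notation: a count followed by d, m or y.
	n, err := strconv.Atoi(trange[:len(trange)-1])
	if err != nil {
		return "", "", fmt.Errorf("unsupported time range %q", trange)
	}
	var start time.Time
	switch trange[len(trange)-1] {
	case 'd':
		start = end.AddDate(0, 0, -n)
	case 'm':
		start = end.AddDate(0, -n, 0)
	case 'y':
		start = end.AddDate(-n, 0, 0)
	default:
		return "", "", fmt.Errorf("unsupported time range %q", trange)
	}
	return start.Format(layout), today, nil
}

func main() {
	today := time.Now().Format(layout)
	for _, tr := range []string{"1d", "3m", "20150201-20150205"} {
		start, end, err := datesFor(tr, today)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-18s -> %s to %s\n", tr, start, end)
	}
}
```

Passing today's date as a string keeps the helper deterministic, which makes it easy to check against fixed dates.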

Examples

In all examples below we use T2_XX_Abc as a site name.

# list site statistics for last month
sitestat -site T2_XX_Abc -trange 1m

# list site statistics for specific time range
sitestat -site T2_XX_Abc -trange 20150201-20150205

# list site statistics for last 3 months
sitestat -site T2_XX_Abc -trange 3m

# list site statistics for last month and only count AOD data-tier
sitestat -site T2_XX_Abc -trange 1m -tier AOD

# list site statistics for last month with breakdown for all data-tiers
sitestat -site T2_XX_Abc -trange 1m -breakdown tier

# list site statistics for last month with breakdown for all datasets
sitestat -site T2_XX_Abc -trange 1m -breakdown dataset

# list site statistics for last month with breakdown for all data-tiers and look for NUSERS metric
sitestat -site T2_XX_Abc -trange 1m -metric NUSERS -breakdown tier

# by default sitestat relies on PhEDEx data-service to collect
# dataset information on site, but we may use DBS instead
sitestat -site T2_XX_Abc -trange 1m -dbsinfo

# return information in json data format
sitestat -site T2_XX_Abc -trange 1m -format json

Tools

The tools directory contains scripts for working with PhedexReplicaMonitoring, which makes it possible to obtain weighted dataset sizes on sites from the PhEDEx DB by running the pbr script from the PhedexReplicaMonitoring repository.

  • pbr_avg.sh submits a Spark job to calculate the average size of datasets.
  • pbr_db.py converts the HDFS output of pbr_avg.sh into an SQLite DB. The latter can be used by the sitestat tool.
  • plot.R is an R script that produces a plot of size vs. bins (number of accesses).
