HDFS Consume

The script recursively scans HDFS directories with the "hdfs dfs -du" command (configurable via --cmd) and prints the list of directories larger than a given threshold (300G by default).
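
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names (`run_du`, `parse_du_output`, `scan`) are hypothetical, and the parser assumes the first column of the du output is the size in bytes and the last column is the path, which may vary between Hadoop versions. Which levels the real script reports is not stated here; this sketch reports every entry above the threshold.

```python
import shlex
import subprocess

DEFAULT_CMD = "sudo -u hdfs hdfs dfs -du"  # default of --cmd in this README
DEFAULT_THRESHOLD = 322122547200           # 300G in bytes


def run_du(path, cmd=DEFAULT_CMD):
    """Run the du command for one path and return its text output."""
    return subprocess.check_output(shlex.split(cmd) + [path]).decode()


def parse_du_output(text):
    """Turn du output into (size_bytes, path) pairs.

    Assumption: first column is the size in bytes, last column is the
    path. Paths containing spaces are not handled by this sketch.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            entries.append((int(fields[0]), fields[-1]))
    return entries


def scan(path, depth, threshold=DEFAULT_THRESHOLD, du=run_du):
    """Collect entries larger than threshold, descending at most depth levels."""
    found = []
    for size, child in parse_du_output(du(path)):
        # Skip small entries (nothing below them can exceed the threshold)
        # and the path itself, which du echoes back when it is a file.
        if size <= threshold or child == path:
            continue
        found.append((size, child))
        if depth > 1:
            found.extend(scan(child, depth - 1, threshold, du))
    return sorted(found, reverse=True)  # largest first
```

On a machine with access to the cluster, `scan("/", depth=3)` would run the default command; passing a different `du` callable makes it possible to try the logic without HDFS.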

Example run:

python hdfs-consume.py -o consume.out /

Output:

311.38T   /dir1/subdir1/subdir21/subdir111
134.56T   /dir2/subdir2/subdir33/file1.db
125.39T   /dir1/subdir1/subdir22/subdir122
91.57T    /dir2/subdir3/subdir33/file2.db
89.6T     /dir1/subdir1/subdir53/subdir312
78.49T    /dir1/subdir1/subdir31/subdir521
77.85T    /dir1/subdir3/subdir21/subdir192
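
A listing in this shape could be produced from raw byte counts with something like the sketch below. The helper names are hypothetical, and two details are assumptions read off this README rather than confirmed behavior: units are 1024-based (300G equals 322122547200 bytes), and sizes are shown with up to two decimals in a 10-character column.

```python
def human_size(num_bytes):
    """Format a byte count with a 1024-based unit suffix, e.g. 300G."""
    units = ["B", "K", "M", "G", "T", "P"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            # Two decimals, then drop trailing zeros and a dangling dot.
            text = "{:.2f}".format(value).rstrip("0").rstrip(".")
            return text + unit
        value /= 1024


def format_line(num_bytes, path):
    """Left-align the size in a 10-character column, followed by the path."""
    return "{:<10}{}".format(human_size(num_bytes), path)
```

For example, `format_line(322122547200, "/data")` yields the size `300G` padded to ten characters and then the path.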

Arguments:

| Usage | Description | Default | Required |
|---|---|---|---|
| --threshold THRESHOLD, -t THRESHOLD | Minimum dir size to show (in bytes) | 322122547200 (300G) | no |
| --depth DEPTH, -d DEPTH | Max directory level to scan | 3 | no |
| --log FILENAME, -l FILENAME | Logfile name | /tmp/hdfs-consume.log | no |
| --verbosity, -v | Logging level, use -vvvv to debug | 1 (ERROR) | no |
| --cmd CMD | Command string for running "hdfs dfs -du" as a subprocess | "sudo -u hdfs hdfs dfs -du" | no |
| --output FILENAME, -o FILENAME | Name of the file to write the result list to | | yes |
| path | Path to start scanning | '/' | yes |
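
For reference, the arguments listed above map naturally onto an `argparse` parser like the one below. This is only a sketch of how such an interface might be declared: the flag names and defaults come from this README, while the function name `build_parser`, the argument types, and the use of a counting `-v` flag are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical)."""
    parser = argparse.ArgumentParser(prog="hdfs-consume.py")
    parser.add_argument("--threshold", "-t", type=int, default=322122547200,
                        help="Minimum dir size to show (in bytes)")
    parser.add_argument("--depth", "-d", type=int, default=3,
                        help="Max directory level to scan")
    parser.add_argument("--log", "-l", metavar="FILENAME",
                        default="/tmp/hdfs-consume.log", help="Logfile name")
    # Assumption: each extra -v raises the level, starting from 1 (ERROR).
    parser.add_argument("--verbosity", "-v", action="count", default=1,
                        help="Logging level, use -vvvv to debug")
    parser.add_argument("--cmd", default="sudo -u hdfs hdfs dfs -du",
                        help="Command string for running du as a subprocess")
    parser.add_argument("--output", "-o", metavar="FILENAME", required=True,
                        help="Name of file to print result list")
    parser.add_argument("path", help="Path to start scanning")
    return parser
```

Parsing the example invocation, `build_parser().parse_args(["-o", "consume.out", "/"])`, leaves every optional argument at its documented default.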
