idgrep

Tool for finding identifiers in files. Supports recursive globbing and arbitrarily complex boolean queries.

Examples

  • Find all files that contain cats, dogs or both.

    $ idgrep -- 'cats | dogs'
  • Find all files that contain either

    • fish without cats or dogs
    • cats or dogs without fish

    $ idgrep -- 'fish ^ (cats | dogs)'
  • List the identifier count of all small (< 200 bytes) files.

    $ idgrep -l 200 --file-id-count
  • List all identifiers beginning with a or A, sort by size.

    $ idgrep -i --sort-by-size -- 'a*'
  • Search all Python (.py) files in the current directory, without descending into subdirectories, for cat.

    $ idgrep -p '*.py' -- cat
  • Search all Python (.py) files in the current directory and its subdirectories for anything that is not cat.

    $ idgrep -p '**/*.py' -- '~cat'
  • Find files with identifiers containing data, grouped by identifier.

    $ idgrep --group-by-id -- '*data*'

Notes about shells

Note that in many shells the following characters need escaping: !, (, ), &, |, *, ?

The easiest option is to put the entire query within single (') or double (") quotation marks.
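When a query is assembled by a script rather than typed by hand, the same quoting can be applied programmatically. The following is a minimal Python sketch using the standard library's shlex.quote. The helper name build_command is hypothetical and not part of idgrep.

```python
import shlex


def build_command(query, pattern=None):
    """Return a shell-safe idgrep command line (build_command is a hypothetical helper)."""
    args = ["idgrep"]
    if pattern is not None:
        args += ["-p", pattern]
    args += ["--", query]
    # shlex.quote wraps any argument containing shell metacharacters in single quotes.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_command("fish ^ (cats | dogs)"))
```

Passing the arguments as a list to subprocess.run avoids the shell entirely, in which case no quoting is needed.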

Usage

usage: idgrep [-p FILE_PATTERN] [-l LIMIT_SIZE] [-d LIMIT_IDENTIFIERS] [-i]
              [--help | --file-id-count] [--group-by-id]
              [--sort-by-name | --sort-by-count | --sort-by-size]
              [--ascending | --descending] [paths ...] -- query

positional arguments:
  [paths] -- query

options:
  -p FILE_PATTERN, --file-pattern FILE_PATTERN
  -l LIMIT_SIZE, --limit-size LIMIT_SIZE
  -d LIMIT_IDENTIFIERS, --limit-identifiers LIMIT_IDENTIFIERS
  -i
  --help
  --file-id-count
  --group-by-id
  --sort-by-name
  --sort-by-count
  --sort-by-size
  --ascending, --asc
  --descending, --desc

Details

-p | --file-pattern

The file pattern is processed using the glob function of pathlib (see footnote 1).
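The difference between a flat and a recursive pattern can be seen directly with pathlib. This sketch builds a throwaway directory tree (the file names are made up for illustration) and applies both patterns with Path.glob:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "top.py").write_text("cat = 1\n")
    (root / "pkg" / "nested.py").write_text("dog = 2\n")

    # '*.py' only matches files directly inside root.
    flat = sorted(p.relative_to(root).as_posix() for p in root.glob("*.py"))
    # '**/*.py' also descends into subdirectories.
    deep = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))

print(flat)  # ['top.py']
print(deep)  # ['pkg/nested.py', 'top.py']
```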

-l | --limit-size

Sets a maximum size limit for files to prevent processing large files. This is enabled by default and set to one megabyte (1M).

The following suffixes are recognized:

suffix size
k 2¹⁰ bytes
m 2²⁰ bytes
g 2³⁰ bytes
t 2⁴⁰ bytes

-d | --limit-identifiers

Sets the maximum identifier count limit for files to prevent processing files with too many identifiers. This is enabled by default and set to 1k.

The following suffixes are recognized:

suffix count
k 10³ identifiers
m 10⁶ identifiers
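The two suffix tables (binary for -l, decimal for -d) can be read as simple multipliers. The following Python sketch shows one way such values might be parsed. The function name parse_limit is hypothetical, and treating suffixes case-insensitively is an assumption, not confirmed behavior of idgrep.

```python
# Multipliers taken from the two suffix tables in this README.
BINARY = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}  # -l / --limit-size
DECIMAL = {"k": 10**3, "m": 10**6}                         # -d / --limit-identifiers


def parse_limit(text, suffixes):
    """Convert a string such as '200' or '1k' to an integer (hypothetical helper)."""
    text = text.strip().lower()  # assumption: suffix case does not matter
    if text and text[-1] in suffixes:
        return int(text[:-1]) * suffixes[text[-1]]
    return int(text)


print(parse_limit("1m", BINARY))   # 1048576
print(parse_limit("1k", DECIMAL))  # 1000
```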

-i

Ignore case in matching.
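As an illustration of what case-insensitive wildcard matching means for patterns such as 'a*', here is a small Python sketch built on fnmatch.fnmatchcase. Whether idgrep itself uses fnmatch is not stated in this README, so treat this as a model of the behavior only. The function name matches is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(identifier, pattern, ignore_case=False):
    """Return True if identifier matches the wildcard pattern (hypothetical helper)."""
    if ignore_case:
        # Lower-casing both sides makes the comparison case-insensitive.
        identifier, pattern = identifier.lower(), pattern.lower()
    return fnmatchcase(identifier, pattern)


print(matches("Apple", "a*"))                    # False
print(matches("Apple", "a*", ignore_case=True))  # True
```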

--help

Prints the text shown above under Usage.

--file-id-count

Counts the number of identifiers in all matching files. Any query entered is ignored.
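To give a rough idea of what an identifier count could look like, the sketch below counts distinct identifiers in a piece of text. Both the regular expression (C-style identifiers) and the choice to count distinct rather than total occurrences are assumptions for illustration. This README does not specify how idgrep defines or counts identifiers.

```python
import re

# Assumption: an identifier is a letter or underscore followed by letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def count_identifiers(text):
    """Return the number of distinct identifiers found in text (hypothetical helper)."""
    return len(set(IDENTIFIER.findall(text)))


print(count_identifiers("cat = dog + cat"))  # 2
```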

--group-by-id

Groups output by identifier. A file that matches multiple identifiers is listed multiple times.
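Grouping by identifier amounts to inverting a file-to-identifiers mapping. A minimal Python sketch, with a hypothetical function name and made-up file names, not taken from idgrep's source:

```python
from collections import defaultdict


def group_by_identifier(matches_per_file):
    """Invert {file: identifiers} into {identifier: [files]} (hypothetical helper)."""
    groups = defaultdict(list)
    for path, identifiers in matches_per_file.items():
        for identifier in sorted(identifiers):
            # A file appears once under every identifier it matches.
            groups[identifier].append(path)
    return dict(groups)


result = group_by_identifier({"a.py": {"data", "metadata"}, "b.py": {"data"}})
print(result)  # {'data': ['a.py', 'b.py'], 'metadata': ['a.py']}
```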

--sort-by-name | --sort-by-count | --sort-by-size

Select sort key for output.

--ascending | --asc | --descending | --desc

Select sort order.

paths

The paths to search. Exclusion is currently not supported but will be added in a later version.

query

The query is given last, after a double dash. The double dash separates the path arguments from the query. If you wish to specify a file named --, use ./-- for it.
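The role of the double dash can be sketched in a few lines of Python. The function name split_arguments is hypothetical, and joining the remaining words with spaces is an assumption about how an unquoted query would be handled.

```python
def split_arguments(argv):
    """Split argv at the first '--' into (other arguments, query) (hypothetical helper)."""
    if "--" in argv:
        index = argv.index("--")
        # Everything after the separator belongs to the query.
        return argv[:index], " ".join(argv[index + 1:])
    # No separator: there is no query, for example with --file-id-count.
    return list(argv), None


print(split_arguments(["src", "./--", "--", "cats | dogs"]))
# (['src', './--'], 'cats | dogs')
```

Because only an exact -- is treated as the separator, the path ./-- passes through untouched.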

Query Format

  • ~identifier

    Do not match this identifier (logical not)

  • !identifier

    Do not match this identifier (logical not)

  • expr_1 & expr_2

    Match both expr_1 and expr_2 (and)

  • expr_1 | expr_2

    Match either expr_1 or expr_2 or both together (inclusive or)

  • expr_1 ^ expr_2

    Match either expr_1 or expr_2 but not both together (exclusive or)

  • (subexpression)

    Use parentheses to define subexpressions. This is useful for queries such as:

    (fish | birds) & (chips | bees)
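The operators above can be modeled as a per-file truth value: each identifier is either present in a file or not, and the operators combine those facts. The Python sketch below encodes the query fish ^ (cats | dogs) this way. All function and file names are hypothetical, and idgrep's internal representation may differ from this simplified model.

```python
def ident(name):
    """Match files whose identifier set contains name."""
    return lambda ids: name in ids

def not_(expr):      # ~expr or !expr
    return lambda ids: not expr(ids)

def and_(a, b):      # a & b
    return lambda ids: a(ids) and b(ids)

def or_(a, b):       # a | b
    return lambda ids: a(ids) or b(ids)

def xor(a, b):       # a ^ b
    return lambda ids: a(ids) != b(ids)


# fish ^ (cats | dogs)
query = xor(ident("fish"), or_(ident("cats"), ident("dogs")))

files = {
    "a.txt": {"fish"},
    "b.txt": {"cats", "dogs"},
    "c.txt": {"fish", "cats"},
    "d.txt": {"birds"},
}
matching = [path for path, ids in files.items() if query(ids)]
print(matching)  # ['a.txt', 'b.txt']
```

Here c.txt is excluded because it contains both fish and cats, and d.txt because it contains neither side of the exclusive or.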

Footnotes

  1. See https://docs.python.org/3/library/pathlib.html?highlight=path#pathlib.Path.glob for more information.

Contributors

mikael-lovqvist

idgrep's Issues

Add configuration support

All switches, directories and so on should be configurable via a configuration file that can be specified. This makes it easier to set up project-specific search profiles. There should also be a way to provide default settings for both system-wide and per-user configuration.

Add support for tab completion

Here is a note from research that was done into this a long while back:

    function _idgrep_complete {
        # Split completion candidates on newlines only; local keeps IFS from leaking.
        local IFS=$'\n'
        COMPREPLY=( $(idgrep --complete $*) )
    }
    complete -r idgrep
    complete -F _idgrep_complete idgrep

Explain query language in help

Currently the query language is only explained in README.md. It would be useful to include it in the help output as well.

Process indication support

For instance, when process_directory has been running for a while, it would be useful to let the user know that work is still in progress. This should be configurable, since depending on how it is implemented it may interfere with terminals.

Decide if and when logical_not returns an empty set

Currently logical_not returns an empty set, but it could also return the full set of identifiers. Figure out if this is useful and, if so, add a switch for enabling that behavior. Also make sure that whatever is decided here is reflected in #1.
