sqs-grep's Introduction

AWS SQS grep

A powerful command-line tool for scanning an AWS SQS queue and finding messages that match given criteria. It can also delete the matching messages, copy or move them to another SQS queue, and publish them to an SNS topic.

Installation

  • Download pre-built binaries here. The sqs-grep tool is distributed as a single executable, so feel free to extract it anywhere and use it from there.
  • If you use NPM, you can also install it using the following command: npm i -g sqs-grep

Features

  • Find messages matching (or NOT matching) a regular expression
  • Search by message attributes
  • Silent mode if you just want to count the number of matched messages
  • Dump matched messages to file, which can later be used for offline processing and archival
  • Move/copy matched messages to another SQS queue
  • Publish matched messages to an SNS topic (or re-publish to the original topic if the message originally came from SNS)
  • Delete matched messages
  • Parallel scan for higher throughput
  • Cross-platform, with pre-built binaries for Linux, MacOS and Windows
  • Supports FIFO queues for both sources and targets
  • Custom processing scripts

Usage examples

Find messages containing the text 'Error' in the body:

$ sqs-grep --queue MyQueue --body "Error"

Find messages NOT containing any three-digit numbers in the body:

$ sqs-grep --queue MyQueue --negate --body "\\d{3}"

Find messages containing a string attribute called 'Error' and that attribute does NOT contain any three-digit numbers in its value:

$ sqs-grep --queue MyQueue --negate --attribute "Error=\\d{3}"

Move all messages from one queue to another:

$ sqs-grep --queue MyQueue --moveTo DestQueue --all

Delete all messages containing the text 'Error' in the body:

$ sqs-grep --queue MyQueue --delete --body Error

Archive all messages from a queue into a local file, and later copy them to another queue:

$ sqs-grep --queue MyQueue --all --outputFile messages.txt
$ sqs-grep --inputFile messages.txt --all --copyTo TargetQueue
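
The matching rules implied by the --negate examples above can be sketched in a few lines of Python. This is only an illustration of the semantics as described in the example text, not sqs-grep's actual code; the function names body_matches and attribute_matches are hypothetical, and attributes are simplified to a plain name-to-string mapping.

```python
import re


def body_matches(body: str, pattern: str, negate: bool = False) -> bool:
    """True when the body contains the pattern, or lacks it if negate is set."""
    found = re.search(pattern, body) is not None
    return found != negate


def attribute_matches(attributes: dict, name: str, pattern: str,
                      negate: bool = False) -> bool:
    """Assumed reading of the attribute example: the attribute must exist,
    and negate only flips the regular-expression test on its value."""
    if name not in attributes:
        return False
    found = re.search(pattern, attributes[name]) is not None
    return found != negate
```

Under this reading, a message without an 'Error' attribute is never matched by --negate --attribute "Error=\\d{3}", because the attribute itself has to be present.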

Providing credentials

By default, sqs-grep will read credentials from:

However, you also have the options below to provide credentials:

Prompting for credentials

$ sqs-grep --inputCredentials <other options>
AWS access key id:**************
AWS secret access key:****************************

Using an external credential provider

You can use an external credential provider tool as long as it outputs two separate lines containing the AWS "access key id" and "secret access key" (in that order).

$ get-aws-credentials | sqs-grep --inputCredentials <other options>
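
As an illustration, a provider can be very small. The Python sketch below assumes the credentials are already available in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables; the function names are hypothetical. The only contract taken from the text above is the output format: two lines, access key id first.

```python
import os
import sys


def credentials_payload(environ=os.environ) -> str:
    """Build the two-line payload: access key id, then secret access key."""
    # Assumption: credentials are stored in the standard AWS environment variables.
    key_id = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        raise KeyError("AWS credentials not found in the environment")
    return f"{key_id}\n{secret}\n"


def main() -> None:
    """Write the payload to stdout so it can be piped into another tool."""
    sys.stdout.write(credentials_payload())
```

Saved as a script that calls main(), this could stand in for get-aws-credentials in the pipeline shown above.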

Providing credentials in the command-line (not recommended)

This option is simple, but not recommended, as the credentials may be easily accessible to other processes.

$ sqs-grep --accessKeyId "KEY" --secretAccessKey "SECRET" <other options>

Providing queue names or URLs

The options --queue, --moveTo, and --copyTo all support either a queue name or a queue URL.

If you provide a queue name, the URL will be automatically determined by connecting to the given AWS --region. Using queue URLs allows you to copy or move messages between regions and even accounts (as long as your credentials allow it).

If you need to copy or move messages between accounts using different access credentials (one for the source and another for the target), you can still do it in two separate steps using the --outputFile option: first download all the messages to a local file, and then copy them to the target account with --inputFile.

Operation timeout and SQS visibility timeouts

In order to scan through the SQS queue, sqs-grep must set an appropriate "message visibility timeout" when receiving the messages (otherwise, the messages would become visible again in the queue before we finished scanning the queue).

sqs-grep does this by automatically determining a "safe" visibility timeout for each individual receive operation, based on the --timeout option (which defaults to 1 minute). This ensures that messages remain "in-flight" for the shortest timeframe that is still safe. For example, if you use the default timeout of 1 minute and your scan completes in 40 seconds, you can expect all scanned messages to become visible again approximately 20 seconds after the scan completes.
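
One way to picture this calculation is the Python sketch below. It is an assumption about the general idea (remaining time until the overall deadline), not the tool's actual implementation, and the function name visibility_timeout is hypothetical. The 12-hour cap is the maximum visibility timeout that SQS accepts.

```python
import math

SQS_MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # SQS upper bound: 12 hours


def visibility_timeout(total_timeout_s: float, elapsed_s: float) -> int:
    """Seconds a message received now should stay hidden so that it
    reappears only once the overall scan deadline has passed."""
    remaining = total_timeout_s - elapsed_s
    if remaining <= 0:
        # The scan has used up its whole --timeout budget.
        raise TimeoutError("scan exceeded the operation timeout")
    return min(math.ceil(remaining), SQS_MAX_VISIBILITY_TIMEOUT_S)
```

With a 60-second timeout, a message received at the start gets 60 seconds and one received after 40 seconds gets 20 seconds, so both become visible again at the 60-second mark, which is 20 seconds after a scan that finished at 40 seconds.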

Note that if the execution does not finish within the --timeout, sqs-grep stops processing immediately and prints a warning message.

Why doesn't sqs-grep immediately make the messages visible again after completing the execution?

Good question! The AWS SQS console does that, for example, so why don't we do the same?

The fact is that sqs-grep was designed to process arbitrarily large SQS queues, and that would require storing receipt handles in memory to then later make the messages visible again. For large queues, this is simply not feasible, as we would need several GB of RAM just for that. Also, making the messages visible again is a billed API call, and it would take some time to execute after the scan is completed, which is also problematic for large queues.

Limitations

All standard SQS Quotas apply to any SQS client, including sqs-grep. The most important quota you should be aware of is the "Messages per queue (in flight)" limit of 120,000 messages for standard queues and 20,000 messages for FIFO queues.

This means that, when scanning for messages without moving or deleting them, you can easily reach this quota on large queues, and the scanning will stop after the quota is reached.

If you really need to scan more messages than the allowed "in-flight quota", you will need to use --moveTo to move the messages to a temporary queue, and then move them back to the original queue after you complete your search. You can also simply delete messages with the --delete flag if that is an option for you.

If you specify neither --moveTo nor --delete, and your source queue is larger than the allowed quota, sqs-grep will stop scanning once it reaches the SQS limit.
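
The effect of the in-flight quota on a scan can be estimated with simple arithmetic. The Python sketch below uses the quota values quoted above; the function name reachable_messages and its parameters are hypothetical, and the result is only an upper bound, since queue sizes reported by SQS are approximate.

```python
# In-flight quotas as quoted in this section.
IN_FLIGHT_QUOTA = {"standard": 120_000, "fifo": 20_000}


def reachable_messages(approx_queue_size: int, queue_type: str = "standard",
                       removes_messages: bool = False) -> int:
    """Upper bound on how many messages a single scan can receive.

    removes_messages stands for a scan that uses --moveTo or --delete,
    which takes matched messages out of the in-flight set as it goes.
    """
    if removes_messages:
        return approx_queue_size
    return min(approx_queue_size, IN_FLIGHT_QUOTA[queue_type])
```

For a standard queue holding about 500,000 messages, a read-only scan would stop after roughly 120,000 of them, while the same scan with --all --moveTo could cover the whole queue.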

Options

$ sqs-grep --help

sqs-grep version 1.15.0

sqs-grep

  Command-line tool used to scan thru an AWS SQS queue and find messages        
  matching a certain criteria                                                   

Main options

  -q, --queue string            Source SQS Queue name or URL                                                  
  -r, --region string           AWS region name                                                               
  -b, --body regexp             Optional regular expression pattern to match the message body                 
  --all                         Matches all messages in the queue (do not filter anything). Setting this flag 
                                overrides --body and --attribute                                              
  -a, --attribute attr=regexp   Matches a message attribute                                                   
                                You can set this option multiple times to match multiple attributes           
  --delete                      Delete matched messages from the queue (use with caution)                     
  --moveTo string               Move matched messages to the given destination queue name or URL              
  --copyTo string               Copy matched messages to the given destination queue name or URL              
  --publishTo topic ARN         Publish matched messages to the given destination SNS topic                   
  --republish                   Republish messages that originated from SNS back to their topic of origin.    
                                This option is typically used together with the --delete option to re-process 
                                "dead-letter queues" from an SNS topic.                                       
                                Messages which are not originated from SNS will be ignored.                   
  --redrive                     Move matched messages from a dead-letter queue (DLQ) back into its original   
                                queue, based on the RedrivePolicy configuration. Only works if the DLQ has a  
                                single source queue configured via RedrivePolicy. This has the same effect as 
                                setting --moveTo, but automatically detects the original queue to move        
                                messages to.                                                                  

Credential options

  -i, --inputCredentials     Input the AWS access key id and secret access key via stdin                   
  --accessKeyId string       AWS access key id (not recommended: use "aws configure" or                    
                             "--inputCredentials" instead)                                                 
  --secretAccessKey string   AWS secret access key (not recommended: use "aws configure" or                
                             "--inputCredentials" instead)                                                 

Other options

  -n, --negate                 Negates the result of the pattern matching                                    
                               (I.e.: to find messages NOT containing a text)                                
  -t, --timeout seconds        Timeout for the whole operation to complete.                                  
                               The message visibility timeout will be calculated based on this value as well 
                               and the elapsed time to ensure that messages become visible again as soon as  
                               possible.                                                                     
  -m, --maxMessages integer    Maximum number of messages to match                                           
  -j, --parallel number        Number of parallel pollers to start (to speed-up the scan)                    
  -s, --silent                 Does not print the message contents (only count them)                         
  -f, --full                   Prints a JSON with the full message content (Body and all MessageAttributes)  
                               By default, only the message body is printed                                  
  --stripAttributes            This option will cause all message attributes to be stripped when moving,     
                               copying and publishing the message (used with --moveTo, --copyTo,             
                               --publishTo, and --republish)                                                 
  -o, --outputFile file        Write matched messages to the given output file instead of the console. Using 
                               this option automatically sets --full to have exact message reproduction,     
                               which can be later used with --inputFile                                      
  --inputFile file             Reads messages from a local file (generated using --outputFile) instead of    
                               from input queue                                                              
  --scriptFile file.js         Uses a custom user-script to process messages. See                            
                               https://github.com/rodrigozr/sqs-grep/blob/master/user-scripts.md             
  -e, --emptyReceives number   Consider the queue fully scanned after this number of consecutive "empty      
                               receives" (default: 5)                                                        
  -w, --wait seconds           Number of seconds to wait after each "empty receive" (default: 0 - do not     
                               wait)                                                                         
  --endpointUrl URL            Use a custom AWS endpoint URL                                                 
  --maxTPS number              Maximum number of messages to process per second (default: no limit)          
  --maxRetries number          Maximum number of retries for failed API calls (default: 3)                   
  --verbose                    Enables verbose logging, which will also log all individual AWS API calls     
  -h, --help                   Prints this help message                                                      
  -v, --version                Prints the application version                                                

Usage examples

  Find messages containing the text 'Error' in the body:                        
  $ sqs-grep --queue MyQueue --body Error                                       
                                                                                
  Find messages NOT containing any three-digit numbers in the body:             
  $ sqs-grep --queue MyQueue --negate --body "\\d{3}"                           
                                                                                
  Find messages containing a string attribute called 'Error' and that attribute 
  does NOT contain any three-digit numbers in its value:                        
  $ sqs-grep --queue MyQueue --negate --attribute "Error=\\d{3}"                
                                                                                
  Move all messages from one queue to another                                   
  $ sqs-grep --queue MyQueue --moveTo DestQueue --all                           
                                                                                
  Delete all messages containing the text 'Error' in the body                   
  $ sqs-grep --queue MyQueue --delete --body Error                              
                                                                                
  Archives all messages from a queue into a local file, and then later copy     
  them to another queue                                                         
  $ sqs-grep --queue MyQueue --all --outputFile messages.txt                    
  $ sqs-grep --inputFile messages.txt --all --copyTo TargetQueue                

Custom script files

sqs-grep supports custom message processing by providing a script file with the --scriptFile option.

See user-scripts.md for additional documentation on that feature.

sqs-grep's People

Contributors

dependabot[bot], galdao, gtonioli, rodrigozr, snyk-bot

sqs-grep's Issues

[FEATURE REQUEST] Match on FIFO Attributes

Is your feature request related to a problem? Please describe.
I would love to be able to filter messages on the MessageGroupId so I can delete only messages in a certain group on a FIFO queue. Unfortunately, the MessageGroupId and MessageDeduplicationId, as well as a handful of other attributes, like SentTimestamp, are treated differently than regular SQS message attributes, so we cannot filter on them. Can we please introduce a new option to allow us to filter on at least the MessageGroupId, if not all of these other attributes?

Describe the solution you'd like
As AWS handles message attributes and these other attributes separately, I think that having another option to allow filtering on these attributes would be ideal. However, the existing message attributes option is named "attribute." I don't think it is worth renaming it to messageAttribute and repurposing the attribute name for this, even though it might better match the distinction AWS makes, as this might break existing scripts or workflows for some users.

One option is to introduce another type of user script that allows customizing the matching code.

Adding messageGroupId, fifoAttributes, or awsAttributes (possibly with a different name) as options might be a better alternative and is my preferred solution.

Describe alternatives you've considered
I think it would be possible to do this via user scripts by temporarily altering the message, but it wouldn't be an elegant solution. A nice easy way to pass in values from the command line would be preferable and potentially used by more people.

Additional context
I don't mind writing the code myself, but I could use help deciding on which option to implement because of the naming dilemma. Thanks for writing this tool!
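
For context, the split this request describes is visible in the shape of a message returned by the SQS ReceiveMessage API: system attributes such as MessageGroupId arrive in an "Attributes" map (and only when the receive call asks for them), while user-defined attributes arrive in "MessageAttributes". The Python sketch below uses made-up values, and the helper system_attribute_matches is hypothetical; it is not part of sqs-grep.

```python
import re

# One message in the shape returned by ReceiveMessage (values are invented).
message = {
    "Body": "order created",
    "Attributes": {  # system attributes, set by SQS
        "MessageGroupId": "tenant-42",
        "MessageDeduplicationId": "abc123",
        "SentTimestamp": "1700000000000",
    },
    "MessageAttributes": {  # user-defined attributes
        "Error": {"DataType": "String", "StringValue": "none"},
    },
}


def system_attribute_matches(msg: dict, name: str, pattern: str) -> bool:
    """Match a regular expression against a system attribute, if present."""
    value = msg.get("Attributes", {}).get(name)
    return value is not None and re.search(pattern, value) is not None
```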

[FEATURE REQUEST] Logging to stdout makes the tool near unusable with other linux tools

Is your feature request related to a problem? Please describe.
The problem is

Connecting to SQS queue 'xxx' in the 'eu-central-1' region...
This queue has approximately yyyy messages at the moment.
Will match ALL messages in the queue.
Scanning...

is sent to stdout. This makes common workflows such as sqs-grep | jq unusable, because data and logging are mixed.

Describe the solution you'd like

Send logging to stderr and data to stdout.
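
A minimal Python sketch of the requested separation, assuming messages are printed one JSON document per line (the function name emit and the output format are illustrative, not the tool's behavior):

```python
import json
import sys


def emit(messages, data_out=sys.stdout, log_out=sys.stderr) -> None:
    """Write progress text to log_out and message data to data_out."""
    print("Scanning...", file=log_out)
    for message in messages:
        # Only message data reaches the data stream, so a downstream
        # consumer such as jq never sees log lines.
        print(json.dumps(message), file=data_out)
    print(f"Found {len(messages)} message(s).", file=log_out)
```

With this split, a pipe only carries the data stream, while the progress text still shows up on the terminal.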

[BUG] ETIMEDOUT errors are not handled & retried

Describe the bug
I can't find any error handling or retrying of failed requests to the SQS API in the code.
sqs-grep just exits after a failed request to the SQS API.
It has failed for me 3 times already, each time after dumping around 9000 messages.

$ sqs-grep --queue QUEUE_URL --all --outputFile file.txt --full -j 3 -t 600
Connecting to SQS queue URL QUEUE_URL ...
This queue has approximately 44103 messages at the moment.
Will match ALL messages in the queue.
Scanning...
Error: connect ETIMEDOUT AMAZON_REGION_ENDPOINT_IP:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)

To Reproduce
Steps to reproduce the behavior:

  1. Create a queue 'X' with thousands of messages
  2. Run sqs-grep with arguments
    sqs-grep --queue QUEUE_URL --all --outputFile file.txt --full -j 3 -t 600
  3. See error

Expected behavior
Calls to the SQS API should be retried, for example 3 times with some backoff time.
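
The behavior asked for here is commonly implemented as retry with exponential backoff. The Python sketch below shows the general technique only; it is not sqs-grep's code, and the function name call_with_retries, the delay values, and the set of retriable errors are all assumptions.

```python
import random
import time


def call_with_retries(operation, max_retries=3, base_delay_s=0.5,
                      sleep=time.sleep,
                      retriable=(TimeoutError, ConnectionError)):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except retriable:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            delay = base_delay_s * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Add a little random jitter so parallel pollers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.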

Desktop (please complete the following information):

  • OS: Windows
  • Version Windows 10
