
s3s's Introduction

s3s

s3s is a Go binary alternative to vast-engineering/s3select.

Features

s3s queries all files under an S3 prefix.

The following input and output formats are available:

  • Input JSON to Output JSON
  • Input CSV to Output JSON
  • Input Application Load Balancer Logs to Output JSON
  • Input CloudFront Logs to Output JSON

Usage

$ s3s --help
NAME:
   s3s - Easy S3 select like searching in directories

USAGE:
   s3s [global options] command [command options] [arguments...]

VERSION:
   current

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug        error check for developer (default: false)
   --help, -h     show help
   --version, -v  print the version

   AWS:

   --max-retries value, -M value   max number of api requests to retry (default: 20)
   --region value                  region of target s3 bucket exist (default: ENV["AWS_REGION"])
   --thread-count value, -t value  max number of api requests to concurrently (default: 150)

   Input Format:

   --alb-logs, --alb_logs  (default: false)
   --cf-logs, --cf_logs    (default: false)
   --csv                   (default: false)

   Query:

   --count, -c              print only the count of matched records (default: false)
   --limit value, -l value  max number of results from each key to return (default: 0)
   --query value, -q value  a query for S3 Select
   --where value, -w value  WHERE part of the query

   Run:

   --delve               like directory move before querying (default: false)
   --dry-run, --dry_run  pre request for s3 select (default: false)

   Target:

   --duration value  from current time if alb or cf (ex: "2h3m") (default: 0s)
   --since value     end at if alb or cf (ex: "2006-01-02 15:04:05")
   --until value     start at if alb or cf (ex: "2006-01-02 15:04:05")

By default, s3s executes S3 Select with JSON input and JSON output.

$ s3s s3://bucket/prefix
{"time":1654848930,"type":"speak"}
{"time":1654848969,"type":"sleep"}

// multiple prefixes can be queried at once:
// $ s3s s3://bucket/prefix_A s3://bucket/prefix_B s3://bucket/prefix_C
$ s3s -q 'SELECT * FROM S3Object s WHERE s.type = "speak"' s3://bucket/prefix
{"time":1654848930,"type":"speak"}

// alternatively, --where supplies just the WHERE clause:
// $ s3s -w 's.type = "speak"' s3://bucket/prefix
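
The --limit (-l) option caps the number of records returned from each key. A minimal sketch, assuming the same sample data as above:

// return at most one matching record per key
$ s3s -l 1 -w 's.type = "speak"' s3://bucket/prefix
{"time":1654848930,"type":"speak"}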

s3s executes S3 Select with CSV input and JSON output when the --csv option is enabled.

// the source CSV object contains: 122, hello
$ s3s s3://bucket/prefix
{"_1":122,"_2":"hello"}

ALB and CF logs support

--alb-logs is the input format for Application Load Balancer (ALB) access logs. --cf-logs is the input format for CloudFront (CF) access logs.

With either option, columns can be referenced by name instead of _1, _2, and so on.

In addition, --where replaces column names with their column numbers; --query does not, because it executes the raw query as written.

// the query below is equivalent to: $ s3s --alb-logs --query="SELECT * FROM S3Object s WHERE s.`_2` = '2022-09-01T00:00:00.000000Z'" s3://prefix
$ s3s --alb-logs --where="s.`time` = '2022-09-01T00:00:00.000000Z'" s3://prefix
index  ALB                        CF
_1     type                       date
_2     time                       time
_3     elb                        x-edge-location
_4     client:port                sc-bytes
_5     target:port                c-ip
_6     request_processing_time    cs-method
_7     target_processing_time     cs(Host)
_8     response_processing_time   cs-uri-stem
_9     elb_status_code            sc-status
_10    target_status_code         cs(Referer)
_11    received_bytes             cs(User-Agent)
_12    sent_bytes                 cs-uri-query
_13    request                    cs(Cookie)
_14    user_agent                 x-edge-result-type
_15    ssl_cipher                 x-edge-request-id
_16    ssl_protocol               x-host-header
_17    target_group_arn           cs-protocol
_18    trace_id                   cs-bytes
_19    domain_name                time-taken
_20    chosen_cert_arn            x-forwarded-for
_21    matched_rule_priority      ssl-protocol
_22    request_creation_time      ssl-cipher
_23    actions_executed           x-edge-response-result-type
_24    redirect_url               cs-protocol-version
_25    error_reason               fle-status
_26    target:port_list           fle-encrypted-fields
_27    target_status_code_list    c-port
_28    classification             time-to-first-byte
_29    classification_reason      x-edge-detailed-result-type
_30    -                          sc-content-type
_31    -                          sc-range-start
_32    -                          sc-range-end
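
With this mapping, CloudFront logs can be filtered by column name in the same way; the name is rewritten to its positional form (here s.`_9`) before the query runs. A minimal sketch with a placeholder bucket and prefix (backticks escaped for the shell):

// only responses with HTTP status 404
$ s3s --cf-logs --where="s.\`sc-status\` = '404'" s3://bucket/prefix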

Log time ranges are supported for ALB and CF logs. The time format is 2006-01-02 15:04:05, interpreted as UTC.

  • --duration is a length of time counted back from now.
  • --since is the start time.
  • --until is the end time.

However, when targeting CloudFront logs, s3s stops if only --duration or only --since is given, because the query would hit too many keys.
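
For example, a hedged sketch that restricts an ALB query to a one-hour window (bucket and prefix are placeholders):

// records logged between 00:00 and 01:00 UTC on 2022-09-01
$ s3s --alb-logs --since="2022-09-01 00:00:00" --until="2022-09-01 01:00:00" s3://bucket/prefix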

-delve: move through prefixes like directories before querying

Search from a prefix

$ s3s -delve s3://bucket/prefix

Search from the bucket list

$ s3s -delve
  bucket/prefix/C/
  bucket/prefix/B/
  bucket/prefix/A/        # descend into paths below this prefix
  Query↵ (s3://bucket/prefix/) # run S3 Select against this prefix
> ←Back upper path        # go back to the parent prefix
5/5
>

The query runs after pressing Enter.

{"time":1654848930,"type":"speak"}
{"time":1654848969,"type":"sleep"}

...

bucket/prefix/A/    (the chosen path is printed to stderr at the end)

s3s's Issues

ToDo for v1.0.0

compatibility with s3select

  • csv support
  • limit option
  • verbose option
  • count option
  • with_filename option
  • aws profile option
  • max_retries option

fail-safe

  • stop if s3s would scan over 10 GB

fast

  • more parallel processing
  • use sync.Pool?
  • print with a single flush

useful

  • dry-run option
