wt_s3_signer's Introduction

wt_s3_signer

An optimized AWS S3 URL signer.

Basic usage

s3_bucket = Aws::S3::Bucket.new('shiny-bucket-name')
ttl_seconds = 7 * 24 * 60 * 60

# we suggest caching the S3 client in the application to reuse the cached credentials
s3_client = Aws::S3::Client.new
signer = WT::S3Signer.for_s3_bucket(s3_bucket, client: s3_client, expires_in: ttl_seconds)
url_str = signer.presigned_get_url(object_key: full_s3_key)
      #=> https://shiny-bucket-name.s3.eu-west-1.amazonaws.com/dir/testobject?X-Amz-Algorithm...

Why would you want to use it?

The use case is rapidly generating many presigned URLs for the same S3 bucket. The AWS SDK signs correctly, but it has to perform the following operations as part of signing:

  • Credential refresh
  • Bucket region discovery (in which region does the bucket reside?)
  • Bucket endpoint discovery (which hostname should be used for the request?)
  • Cleanup of the various edge cases (blacklisted signed headers and so on)

This metadata only needs to be retrieved once as long as the bucket does not change, but the standard SDK might refresh it often. A substantial amount of generic code also runs on every SDK call even though it is not strictly necessary.

Our signer bypasses these operations: it performs credential discovery and bucket metadata discovery only once, when you instantiate it. The primary usage pattern is as follows:

signer = WT::S3Signer.for_s3_bucket(my_bucket_resource)
signed_urls = all_object_keys.map do |obj_key|
  signer.presigned_get_url(object_key: obj_key)
end

This will stay performant even if signed_urls contains tens of thousands of entries.

Additionally, we cache all the produced strings very aggressively if they do not change between calls to the signing method. We also derive the signing key only once. This optimizes the signing even more.
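The "derive the signing key only once" idea can be sketched roughly as follows. This is a minimal illustration of the standard AWS Signature Version 4 key derivation with memoization, not the gem's actual code: the CachedSigningKey class and its method names are hypothetical, and it assumes the key is only reused within the date it was derived for, since a SigV4 signing key is scoped to a single date.

```ruby
require "openssl"

# Hypothetical helper, not part of wt_s3_signer: derives the SigV4 signing key
# on first use and reuses it for every subsequent signature.
class CachedSigningKey
  def initialize(secret_access_key:, region:, service: "s3",
                 date: Time.now.utc.strftime("%Y%m%d"))
    @secret_access_key = secret_access_key
    @region = region
    @service = service
    @date = date
  end

  # Four chained HMAC-SHA256 calls, memoized after the first invocation.
  def signing_key
    @signing_key ||= begin
      k_date = hmac("AWS4" + @secret_access_key, @date)
      k_region = hmac(k_date, @region)
      k_service = hmac(k_region, @service)
      hmac(k_service, "aws4_request")
    end
  end

  # Signs a SigV4 "string to sign" and returns the signature as lowercase hex.
  def sign(string_to_sign)
    OpenSSL::HMAC.hexdigest("SHA256", signing_key, string_to_sign)
  end

  private

  # Raw (binary) HMAC-SHA256 of data under key.
  def hmac(key, data)
    OpenSSL::HMAC.digest("SHA256", key, data)
  end
end
```

With such a helper, each additional URL costs one HMAC over the string to sign instead of five, which is one of the places where repeated signing for the same bucket can save work.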

Here are some benchmarks we ran for comparison. The S3Signer_SDK class executed the same flow using a single Aws::S3::Presigner object, instantiated once and then called repeatedly.

Warming up --------------------------------------
WT::S3::Signer#presigned_get_url
                         9.325k i/100ms
S3Signer_SDK#presigned_get_url
                       154.000  i/100ms
Calculating -------------------------------------
WT::S3::Signer#presigned_get_url
                         81.422k (±18.9%) i/s -    391.650k in   5.042435s
S3Signer_SDK#presigned_get_url
                          1.865k (± 9.3%) i/s -      9.240k in   5.009593s

Comparison:
WT::S3::Signer#presigned_get_url:  81421.7 i/s
S3Signer_SDK#presigned_get_url:     1864.9 i/s - 43.66x  slower

wt_s3_signer's People

Contributors

bochoven, fabioperrella, ilja-wetransfer, julik, jwolski2, luca-suriano

Forkers

penso

wt_s3_signer's Issues

License is problematic for many

I was looking into optimizing S3 presigned_url generation too, so was very excited when Julik brought my attention to this gem, y'all have already done a bunch of solid work on it, awesome!

I'm interested in using the gem, and also potentially sending some PRs to add some features I might need. And/or forking, if the features I need aren't compatible with the gem's ultra-optimization (for example, I need per-URL additional query params, like response_content_disposition).

Unfortunately, I'm a bit concerned about the "Hippocratic License". I understand and sympathize with the motivation to prevent sharing the fruit of your labors with entities that will use it to do harm. But I think this license would be incompatible with many projects I work on.

The license says that I can use this software only so long as I don't do harm "in violation of the United Nations Universal Declaration of Human Rights".

It also says if the software is used to provide a service to others, I have to also require any users of the service not to use the service in a way that violates human rights.

This seems to say that if I use your gem in software I provide as a service, I have to get all of my users to promise not to violate human rights. Which doesn't sound so bad -- except who decides what constitutes violating human rights? I am pretty sure the legal departments of any potential customer would be unwilling to sign such a thing.

If I write a gem which has this gem as a dependency -- then anyone using my gem to provide such a service becomes bound by this too? If I write a gem which has this gem as a dependency -- does my gem need to insist on this "do no harm" license too, "virally"?

I know that if I wanted to fork this gem to add features incompatible with it (say, those custom headers), I'd have to use this same "Hippocratic License" on my fork. That makes me worried about even looking at your source code anymore. Maybe I should remain ignorant of it, so I can write it from scratch based on the Amazon python example as you did, and apply a different license.

Note that the Hippocratic License is not compatible with the popular GPL. You can't combine code copied from a project licensed by 'hippocratic license' and code copied from a project licensed by GPL into a new project (or each into the other) -- the licenses are incompatible. EthicalSource/hippocratic-license#6

I understand and sympathize with the intent of this kind of license, but I don't think it works out very well in practice. The key thing is "who gets to decide if something violates the UN Declaration of Human Rights"? This gets especially complicated when you re-mix software into multiple projects, as we do with open source. For a tiny project it might not matter, but for lots of large/serious projects, it's not really feasible to incorporate code that requires you to promise your whole project (and any of its users!) won't do something that's pretty general/vague without being sure who decides what counts. Some projects I work on, including open-source non-commercial ones, would not allow incorporating code with such a license.

Would you be willing to consider using a more common license that is more compatible with existing code? It's your code, so it's up to you! If not, I will consider looking into reimplementing from the Amazon python example, instead of PR'ing or building on the great work you have done here, which is sad, but it happens.

git tags for releases?

It is common in ruby (and suggested as common in https://semver.org/) to add a git tag for a released version. There is one rubygems release, 0.1.0 of wt_s3_signer, so it would be nice to have a v0.1.0 tag in this repo.

One thing we could easily do with it is see how much (if anything?) has changed in master since the last/first 0.1.0 release. Without that tag, it's cumbersome/infeasible to figure that out. (Looking at the commit history it looks like there are few significant commits since the date of 0.1.0 release, so probably not?)

Would you consider using version tags in the future? If you use the rake release task that can be provided by bundler to do your releases, it adds and pushes a git tag automatically.

If it's still retrievable what git sha corresponds to the 0.1.0 release, it would still be helpful to tag!
