
s3-resource's Introduction

S3 Resource

Versions objects in an S3 bucket, by pattern-matching filenames to identify version numbers.

Source Configuration

  • bucket: Required. The name of the bucket.

  • access_key_id: Optional. The AWS access key to use when accessing the bucket.

  • secret_access_key: Optional. The AWS secret key to use when accessing the bucket.

  • session_token: Optional. The AWS STS session token to use when accessing the bucket.

  • aws_role_arn: Optional. The AWS role ARN to be assumed by the user identified by access_key_id and secret_access_key.

  • region_name: Optional. The region the bucket is in. Defaults to us-east-1.

  • private: Optional. Indicates that the bucket is private, so that any URLs provided are signed.

  • cloudfront_url: Optional. The URL (scheme and domain) of the CloudFront distribution fronting this bucket (e.g. https://d5yxxxxx.cloudfront.net). This affects in, but not check or put: in will ignore the bucket name setting and use the cloudfront_url exclusively. When configuring CloudFront with versioned buckets, set Query String Forwarding and Caching to Forward all, cache based on all to ensure S3 calls succeed.

  • endpoint: Optional. Custom endpoint for using S3 compatible provider.

  • disable_ssl: Optional. Disable SSL for the endpoint, useful for S3 compatible providers without SSL.

  • skip_ssl_verification: Optional. Skip SSL verification for S3 endpoint. Useful for S3 compatible providers using self-signed SSL certificates.

  • skip_download: Optional. Skip downloading the object from S3. Useful when you only want to trigger the pipeline without using the object.

  • server_side_encryption: Optional. An encryption algorithm to use when storing objects in S3.

  • sse_kms_key_id: Optional. The ID of the AWS KMS master encryption key used for the object.

  • use_v2_signing: Optional. Use signature v2 signing, useful for S3 compatible providers that do not support v4.

  • disable_multipart: Optional. Disable multipart upload. Useful for S3 compatible providers that do not support multipart upload.
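
Putting several of these options together, a source configuration for an S3-compatible provider might look like the following. This is only a sketch: the endpoint, bucket, path, and credential values are placeholders, not values taken from this repository.

- name: artifacts
  type: s3
  source:
    bucket: my-artifacts
    regexp: builds/artifact-(.*).tgz
    endpoint: https://storage.example.com
    region_name: us-east-1
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET
    use_v2_signing: true
    disable_multipart: true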

File Names

One of the following two options must be specified:

  • regexp: Optional. The forward-slash (/) delimited sequence of patterns to match against the sub-directories and filenames of the objects stored within the S3 bucket. The first grouped match is used to extract the version, or if a group is explicitly named version, that group is used. At least one capture group must be specified, with parentheses.

    The version extracted from this pattern is used to version the resource. Semantic versions, or just numbers, are supported. Accordingly, full regular expressions are supported, to specify the capture groups.

    The full regexp will be matched against the S3 objects as if it was anchored on both ends, even if you don't specify ^ and $ explicitly.

  • versioned_file: Optional. If you enable versioning for your S3 bucket, you can keep the file name the same and upload new versions of your file without resorting to version numbers. This property is the path to the file in your S3 bucket.
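
For example, a regexp that names its capture group explicitly might look like the sketch below. The bucket layout is hypothetical, and the (?P<version>...) form assumes Go's named-group syntax, which this resource's regexps appear to use.

- name: my-app-release
  type: s3
  source:
    bucket: releases
    regexp: release/my-app-(?P<version>\d+\.\d+\.\d+)\.tgz
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET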

Initial state

If no resource versions exist you can set up this resource to emit an initial version with a specified content. This won't create a real resource in S3 but only create an initial version for Concourse. The resource file will be created as usual when you get a resource with an initial version.

You can define one of the following two options:

  • initial_path: Optional. Must be used with the regexp option. You should set this to the file path containing the initial version which would match the given regexp. E.g. if regexp is file/build-(.*).zip, then initial_path might be file/build-0.0.0.zip. The resource version will be 0.0.0 in this case.

  • initial_version: Optional. Must be used with the versioned_file option. This will be the resource version.

By default the resource file will be created with no content when get runs. You can set the content by using one of the following options:

  • initial_content_text: Optional. Initial content as a string.

  • initial_content_binary: Optional. You can pass binary content as a base64 encoded string.
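
For example, pairing initial_path with regexp and seeding the file with an empty JSON document might look like the sketch below (the bucket and path names are placeholders):

- name: terraform-state
  type: s3
  source:
    bucket: my-state-bucket
    regexp: state/terraform-(.*).tfstate
    initial_path: state/terraform-0.0.0.tfstate
    initial_content_text: "{}"
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET

A get of this resource before any matching object exists would then report version 0.0.0 and produce a file containing {}.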

Behavior

check: Extract versions from the bucket.

Objects will be found via the pattern configured by regexp. The versions will be used to order them (using semver). Each object's filename is the resulting version.
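
For example, given regexp: release-(.*).tgz and objects release-1.9.0.tgz and release-1.10.0.tgz in the bucket, check orders by the extracted semver rather than alphabetically, so 1.10.0 is considered newer than 1.9.0. A sketch of the emitted version list, assuming the version object uses the path key as in the examples further down:

- path: release-1.9.0.tgz
- path: release-1.10.0.tgz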

in: Fetch an object from the bucket.

Places the following files in the destination:

  • (filename): The file fetched from the bucket.

  • url: A file containing the URL of the object. If private is true, this URL will be signed.

  • version: The version identified in the file name.

  • tags.json: The object's tags represented as a JSON object. Only written if download_tags is set to true.

Parameters

  • skip_download: Optional. Skip downloading the object from S3. Same as the source configuration option, but can be set or overridden per get step. Value needs to be a true/false string.

  • unpack: Optional. If true and the file is an archive (tar, gzipped tar, other gzipped file, or zip), unpack the file. Gzipped tarballs will be both ungzipped and untarred. It is ignored when get is running on the initial version.

  • download_tags: Optional. Write object tags to tags.json. Value needs to be a true/false string.
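
For example, a get step that unpacks the fetched archive and writes the object's tags to tags.json might look like this sketch (the resource name matches the example configuration below):

- get: release
  params:
    unpack: true
    download_tags: "true"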

out: Upload an object to the bucket.

Given a file specified by file, upload it to the S3 bucket. If regexp is specified, the new file will be uploaded to the directory that the regex searches in. If versioned_file is specified, the new file will be uploaded as a new version of that file.

Parameters

  • file: Required. Path to the file to upload, provided by an output of a task. If multiple files are matched by the glob, an error is raised. The file which matches will be placed into the directory structure on S3 as defined in regexp in the resource definition. The matching syntax is bash glob expansion, so no capture groups, etc.

  • acl: Optional. Canned ACL for the uploaded object.

  • content_type: Optional. MIME Content-Type describing the contents of the uploaded object.
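
For example, a put step that sets a canned ACL and a Content-Type on the uploaded object might look like this sketch (the file path is a placeholder):

- put: release
  params:
    file: built/release-*.tgz
    acl: public-read
    content_type: application/gzip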

Example Configuration

Resource

When the file has the version name in the filename

- name: release
  type: s3
  source:
    bucket: releases
    regexp: directory_on_s3/release-(.*).tgz
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET

or

When the file is being versioned by s3

- name: release
  type: s3
  source:
    bucket: releases
    versioned_file: directory_on_s3/release.tgz
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET

Plan

- get: release
- put: release
  params:
    file: path/to/release-*.tgz
    acl: public-read

Required IAM Permissions

Non-versioned Buckets

The bucket itself (e.g. "arn:aws:s3:::your-bucket"):

  • s3:ListBucket

The objects in the bucket (e.g. "arn:aws:s3:::your-bucket/*"):

  • s3:PutObject
  • s3:PutObjectAcl
  • s3:GetObject
  • s3:GetObjectTagging (if using the download_tags option)
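
Expressed as a CloudFormation-style policy document in YAML, the non-versioned permissions might look like the sketch below. The bucket name is a placeholder, and the exact wrapper depends on how you manage IAM.

Version: "2012-10-17"
Statement:
  - Effect: Allow
    Action:
      - s3:ListBucket
    Resource: "arn:aws:s3:::your-bucket"
  - Effect: Allow
    Action:
      - s3:PutObject
      - s3:PutObjectAcl
      - s3:GetObject
      - s3:GetObjectTagging   # only needed when download_tags is used
    Resource: "arn:aws:s3:::your-bucket/*"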

Versioned Buckets

Everything above and...

The bucket itself (e.g. "arn:aws:s3:::your-bucket"):

  • s3:ListBucketVersions
  • s3:GetBucketVersioning

The objects in the bucket (e.g. "arn:aws:s3:::your-bucket/*"):

  • s3:GetObjectVersion
  • s3:PutObjectVersionAcl
  • s3:GetObjectVersionTagging (if using the download_tags option)

Development

Prerequisites

  • Go is required - version 1.13 is tested; earlier versions may also work.
  • docker is required - version 17.06.x is tested; earlier versions may also work.

Running the tests

The tests are embedded in the Dockerfile, ensuring that the testing environment is consistent across any Docker-enabled platform. When the Docker image builds, the tests are run inside the container; a failure stops the build.

Run the tests with the following command:

docker build -t s3-resource --target tests --build-arg base_image=paketobuildpacks/run-jammy-base:latest .

Integration tests

The integration tests require two AWS S3 buckets, one without versioning and one with versioning enabled. The docker build step requires --build-arg values to be set so the integration tests will run.

Run the tests with the following command:

docker build . -t s3-resource --target tests \
  --build-arg S3_TESTING_ACCESS_KEY_ID="access-key" \
  --build-arg S3_TESTING_SECRET_ACCESS_KEY="some-secret" \
  --build-arg S3_TESTING_BUCKET="bucket-non-versioned" \
  --build-arg S3_VERSIONED_TESTING_BUCKET="bucket-versioned" \
  --build-arg S3_TESTING_REGION="us-east-1" \
  --build-arg S3_ENDPOINT="https://s3.amazonaws.com"

Speeding up integration tests by skipping large file upload

One of the integration tests uploads a large file (>40GB) and so can be slow. It can be skipped by adding the following option when running the tests:

  --build-arg S3_TESTING_NO_LARGE_UPLOAD=true

Integration tests using role assumption

If S3_TESTING_AWS_ROLE_ARN is set to a role ARN, this role will be assumed for accessing the S3 bucket during integration tests. The whole integration test suite runs either completely using role assumption or completely by direct access via the credentials.

Required IAM permissions

In addition to the required permissions above, the s3:PutObjectTagging permission is required to run integration tests.

Contributing

Please make all pull requests to the master branch and ensure tests pass locally.


s3-resource's Issues

Generating signed URLs should be done via `put`

Related to concourse/concourse#622 and #47

private: true on get is confusingly named and breaks resource semantics, because the URL it provides has an expiration value, and is different every time it's generated, making get nondeterministic.

The signed URL should instead be generated by a put, guaranteeing that it'll be a new URL each time. This could look like:

- get: my-bucket-thing
- put: my-bucket-thing
  params: {sign: my-bucket-thing, expires_in: 24h}

The put would yield a version like {path: foo, signed_at: <timestamp>} (or version_id if using versioned_file). The get resulting from the put would generate the signed URL. This fixes the hole because the get will be generating a URL valid for that particular timestamp (because it's the start of the expiration countdown), and the timestamp will be different every time.

Upload and Download fails if S3 requires v2 signature

The upload and download file operations use s3manager.NewDownloader instead of s3manager.NewDownloaderWithClient, which would leverage the client that has been configured with v2 signature support. This fails with HTTP status code 400 on S3 environments that require v2 signatures.

Support S3 compatible / custom endpoint URL?

Would it be possible to add support for S3 compatible endpoints?
Currently there seems to be no way to specify a custom endpoint URL.

Something along the lines of this:

if endpointURL != "" {
    region = aws.Region{S3Endpoint: endpointURL}
}
...
s3.New(auth, region)

?

Feature suggestion: initial resource version & content

We are using a S3 bucket to hold state for terraform, which we share between jobs and we've been using the S3 resource to do this.

The first job in the pipeline that uses terraform state, needs to both consume the S3 resource (to read existing state) and write out its modified state.

When running the pipeline for the first time, the state is empty and the bucket contains no matching object. In our situation what we'd actually like is for the resource to return an initial, low-numbered version and some user-defined empty content.

In order to do this we were thinking of adding two bits of configuration:

  • initial_version: Optional. If present, when there's no object in the bucket, this would be returned as the only version.
  • initial_content: Optional. If present, when there's no matching object in the S3 bucket, this content will be placed in filename.

We'd probably need to check that if initial_version was set, then initial_content was also set and vice versa.

Before going away and writing the code to do this, we wanted to check that feature was something you would consider for inclusion. Also, we're relatively new to concourse and if there's a better way of passing state between jobs than using resources, that would be good to know.

`put` fails when `cloudfront_url` specified

When specifying the cloudfront_url parameter, a put to that resource fails with

error running command: Forbidden: Forbidden
	status code: 403, request id: 

Our cloudfront distribution does have Query String Forwarding and Caching set to Forward all, cache based on all.

We hijacked into the container and ran cat config.json | /opt/resource/in /tmp with config.json containing {"source":{...}, "version":{"path":"path_to_file_in_s3_bucket"}}. This failed when the cloudfront_url key was present.

Output directory when fetch files from S3

Hi,

We don't know if it's possible to configure an output directory for get operations inside the pipeline.

The reason is that we need to use the file stored in S3 to copy into a Dockerfile (using the ADD or COPY directives), and it needs to be in a subdirectory relative to the Dockerfile path.

Thanks in advance!

Feature Request: IAM instance profile support

If concourse is running on AWS, it can take advantage of IAM instance profiles to access the S3 buckets.

http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html

It could be enabled if, for instance, the credentials are empty, or via an additional option credentials_source, like:

  - name: myfile.txt
    type: s3
    source:
      bucket: {{state_bucket}}
      region_name: {{aws_region}}
      versioned_file: myfile.txt
      credentials_source: env_or_profile

Get specific version of file from s3

I would like to request a specific version from S3. My jobs on one test server never start, and on the other, they get the newest version of the tarball rather than the specified one.

I have my resource like this:

- name: s3-archive
  source:
    access_key_id: <key>
    bucket: <bucket>
    regexp: recipes-(.*).tar.bz2
    secret_access_key: <secret-key>

and my get section like:

  - get: s3-archive
    params:
      version: 1.0.0
    passed: []
    trigger: true

The output in the web UI is:

discovering any new versions of s3-archive
waiting for a suitable set of input versions
s3-archive - no versions available

However, I can see from the AWS console that I do have matching patterns available.

I have tried removing the trigger: true part, but no change.

Is this expected behavior? Is it not possible to specify a particular version to get?

Unclear how to specify how to map container's directory structure to S3 destination directory structure

Hi there,

Based on the changes in v0.68, we now see that the put to/from syntax has been deprecated. After some digging and looking at the example you provided in the issue here, we see that the resource definition's regexp field needs to contain a folder name if you want the artifact to end up within a folder.

Can you please clarify in the readme exactly how to achieve the following:
I have a file in the output of a job: my_folder/my_file.1.2.3.tgz
I want to put all files that match the following regex in that output to an s3 resource: my_folder/my_file.*.tgz
I want them to end up in the folder my_files on the s3 bucket.

It is our understanding that we should specify the following regexp property on the s3 resource definition: my_files/my_file.*.tgz
and the file property of the put to the s3 resource should be: my_folder/my_file.*.tgz

If this understanding is correct we believe that the readme is unclear and could be updated to better demonstrate this functionality.

Thanks!

C.J. and Zak

AWS SDK can panic when retrieving files that are not chunked

This is mostly to cross reference aws/aws-sdk-go#417.

On smaller files, S3 may not return the Content-Range header if it did not chunk the response. In the AWS SDK, it was accessing the ContentRange value (a *string) without any nil checks and it was causing a panic when the put step does a get after doing the upload.

This caused a huge rash of 20+ build failures for us over the weekend. I have a patch that I'll be submitting upstream, but wanted to file this issue to highlight it so you can pull in the latest SDK once it is merged.

The panic output was:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x531305]

goroutine 7 [running]:
github.com/aws/aws-sdk-go/service/s3/s3manager.(*downloader).setTotalBytes(0xc820096380, 0xc8201360e0)
    /tmp/build/b6237934-9e2f-4838-56a4-73bd768b9a80/gopath/src/github.com/concourse/s3-resource/Godeps/_workspace/src/github.com/aws/aws-sdk-go/service/s3/s3manager/download.go:206 +0x95
github.com/aws/aws-sdk-go/service/s3/s3manager.(*downloader).downloadPart(0xc820096380, 0xc8200146c0)
    /tmp/build/b6237934-9e2f-4838-56a4-73bd768b9a80/gopath/src/github.com/concourse/s3-resource/Godeps/_workspace/src/github.com/aws/aws-sdk-go/service/s3/s3manager/download.go:175 +0x46a
created by github.com/aws/aws-sdk-go/service/s3/s3manager.(*downloader).download
    /tmp/build/b6237934-9e2f-4838-56a4-73bd768b9a80/gopath/src/github.com/concourse/s3-resource/Godeps/_workspace/src/github.com/aws/aws-sdk-go/service/s3/s3manager/download.go:114 +0xbe

Versioned S3 Resource Version No Longer Exists or Version Null in Concourse DB

We saw this when Concourse thought the version_id was null or had a version_id that no longer existed in S3. In both cases, S3 had files with new version_ids. This could possibly be reproduced creating a versioned resource then deleting the most recent version in S3.

When hijacked into the container and manually running the check, we got [] as the response when supplying a null version_id. With no version_id, we got the latest version in the list.

To fix, we had to update the versioned_resource record in the Concourse DB, setting the version_id to the latest version id from S3.

Resource versions not updated after pausing the latest version

We butchered a version in an S3 bucket, so our resource will no longer pick up the older versions; we believe it just compares the name of the newest one and sees if it needs to get a later one.
For example

2.0.0-dev.1.tgz
1.3.0-dev.40.tgz
1.3.0-dev.39.tgz
1.3.0-dev.38.tgz

We accidentally put a 2.0.0 version into our bucket, and we are still building 1.3.0-dev versions. However, Concourse will not pick up any more 1.3.0 versions (e.g., 1.3.0-dev.41) because 2.0.0 sits on top as the higher number.

We would like a way to clear the resource cache and reset through fly at least. Or have the s3 resource properly check the version and the timestamp and put the latest one inserted as the latest one we want to use.

Can this resource upload a directory tree?

Well, can it? Or can it only upload a single file? The documentation doesn't touch on this.

Having read the code a bit, I think this depends on whether Go's glob function matches directories, and then on whether the AWS client handles directory trees, which it looks like it doesn't.

If this resource can't upload directories, can I suggest that (a) the documentation explicitly say so and (b) an attempt to upload a directory fail with an error saying so? I'm happy to submit a PR to do this.

Even better would be if it could upload directories, of course, but I suspect you won't want to do that.

Possible to use without access key?

The docs specify that the access key id and secret are required. Yet, in the source, it looks like the resource will use anonymous credentials if they are missing. We are running Concourse on AWS, and would like to use Roles to govern access. I tried setting up the required role and use the bucket resource with no access key id/secret, but it didn't work.

First of all this is a question: Is it possible to use the bucket resource without an access keypair, and govern access with server Roles? If not, is this something you'd consider accepting a patch to support?

Fails to upload file encrypted with KMS key, but works on the download

When trying to put a file to an S3 bucket it fails. The S3 bucket is private with versioning on. The file is using a KMS key to secure it. When I use a get to pull the file from the S3 bucket it works as planned, but when I do a put it fails with this error:

error running command: InvalidArgument: Server Side Encryption with AWS KMS managed key requires HTTP header x-amz-server-side-encryption : aws:kms
	status code: 400, request id: 04541E0A8076886F

Any help is appreciated.

When the version capture group fails to parse as semver, ordering is alphabetical

We started pre-compiling our releases and appending the stemcell version to the filename. This meant that our existing regexp capture group no longer could be parsed as semver.

The trouble is that we had no idea until we realized that we were using totally the wrong versions because it was ordering them in what I assume was alphabetical order.

It would be preferable for the resource to error out, as alphabetical is probably never the desired ordering. Unless someone is using a timestamp? I dunno.

Ability to specify version in the directory path of the regex

I am trying to use the s3 resource to store build artifacts in directories which are named according to the versions. I am trying the following:

s3 resource regexp is "regexp: server_build/release-(.*)/output.txt"
s3 put is "file: output/release-*/output.txt"

Running this task gives this error.

panic: version number was not valid: Invalid character(s) found in major number "("

How to use minio as s3 endpoint

I'm using Concourse 1.3.0-rc.73 and I'm trying to use Minio (https://github.com/minio/minio) as a compatible s3 endpoint, but when I use a configuration like this:

- name: release_candidate
  type: s3
  source:
    bucket: artifacts
    regexp: dev-(.*).tgz
    access_key_id: {{s3-access-key-id}}
    secret_access_key: {{s3-secret-access-key}}
    endpoint: http://192.168.1.128:9000
    cloudfront_url: http://192.168.1.128:9000

I get this error:

error: check failed with exit status '1':
error checking for new versions: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.
        status code: 403, request id: 8220B4E973DE1AC9

It seems the service is still targeting AWS. Is it possible to use Minio and if so, what are the proper parameters to use?

Thanks!

Use the default credentials for the aws client

By default, the aws client will use a credential provider chain to discover authentication. If a user doesn't specify an access key id and secret access key, then the defaults should be left alone. Currently the defaults are set to credentials.AnonymousCredentials.

https://github.com/concourse/s3-resource/blob/master/s3client.go#L87

Details on the provider chain: https://github.com/aws/aws-sdk-go/blob/b2dc98bb584e48b0f5f39c93110633173c5da43c/aws/config.go#L38

This would mean that if the worker is an EC2 instance, then the IAM Profile associated with the instance could be used.

Cannot parse filenames with versions greater than int32

Attempting to pull down artifact-1440786165236.tar fails with the following error

resource script '/opt/resource/check []' failed: exit status 2

stderr:
panic: version number was not valid: Error parsing version: strconv.ParseInt: parsing "1440786165236": value out of range

goroutine 1 [running]:
github.com/concourse/s3-resource/versions.Extract(0xc2082a3500, 0x35, 0xc20802a080, 0x36, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1)
    /tmp/build/a7803cba-ead4-4240-7108-ee20e10dee9f/gopath/src/github.com/concourse/s3-resource/versions/versions.go:57 +0x1d9
github.com/concourse/s3-resource/versions.GetBucketFileVersions(0x7f1a77f48090, 0xc20801f3c0, 0xc20801f1e0, 0x14, 0xc20800d4d0, 0x28, 0xc20801f200, 0x1e, 0xc20802a080, 0x36, ...)
    /tmp/build/a7803cba-ead4-4240-7108-ee20e10dee9f/gopath/src/github.com/concourse/s3-resource/versions/versions.go:137 +0x2d9
github.com/concourse/s3-resource/check.(*CheckCommand).checkByRegex(0xc20808e2a0, 0xc20801f1e0, 0x14, 0xc20800d4d0, 0x28, 0xc20801f200, 0x1e, 0xc20802a080, 0x36, 0x0, ...)
    /tmp/build/a7803cba-ead4-4240-7108-ee20e10dee9f/gopath/src/github.com/concourse/s3-resource/check/check_command.go:34 +0x9e
github.com/concourse/s3-resource/check.(*CheckCommand).Run(0xc20808e2a0, 0xc20801f1e0, 0x14, 0xc20800d4d0, 0x28, 0xc20801f200, 0x1e, 0xc20802a080, 0x36, 0x0, ...)
    /tmp/build/a7803cba-ead4-4240-7108-ee20e10dee9f/gopath/src/github.com/concourse/s3-resource/check/check_command.go:27 +0x26d
main.main()
    /tmp/build/a7803cba-ead4-4240-7108-ee20e10dee9f/gopath/src/github.com/concourse/s3-resource/cmd/check/main.go:27 +0x1a6

goroutine 9 [runnable]:
net/http.(*persistConn).readLoop(0xc20804a210)
    /usr/src/go/src/net/http/transport.go:928 +0x9ce
created by net/http.(*Transport).dialConn
    /usr/src/go/src/net/http/transport.go:660 +0xc9f

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/src/go/src/runtime/asm_amd64.s:2232 +0x1

goroutine 10 [select]:
net/http.(*persistConn).writeLoop(0xc20804a210)
    /usr/src/go/src/net/http/transport.go:945 +0x41d
created by net/http.(*Transport).dialConn
    /usr/src/go/src/net/http/transport.go:661 +0xcbc

Naming of "private" flag is confusing (low-priority, docs)

We see that if we set "private" on the resource definition, all the urls we get are signed. Signing the URLs actually makes them more public, in a way. Without signing, if the bucket is private, you can only access it with proper IAM credentials. With signing, a "public" consumer has a credentialed way to get at them. For our current use case, that's the feature we actually want! We have a legacy CI system that needs to consume Concourse outputs but doesn't have AWS credentials. But the naming is throwing us off.

Maybe this is just a README change, maybe a deprecation and new flag.

Example configuration uses deprecated syntax

The example plan configuration below--

- put: release
  params:
    from: a/release/path/release-(.*).tgz

-- uses the "from" parameter, which is deprecated. Because of this, it was not clear to us that this syntax was deprecated. It should be updated to use the "file" syntax.

Also, it was difficult for us to understand what format the file path should be written in -- whether it was a regexp or a glob or something else entirely. We had a number of problems uploading files to an S3 bucket, and there was a long period where we weren't sure whether we were creating the file in question incorrectly, or referring to it incorrectly in the Resource/Plan configuration. More explicit guidance on this point in the documentation would have helped us solve this problem faster and with less direct assistance from the Concourse team.

please retry downloads

We keep getting errors like:

error running command: read tcp 10.254.0.22:54409->54.231.64.25:443: read: connection reset by peer

And we have to manually press the re-run button, because this is something we have no control over.

`put` should upload to the directory specified in `regexp`

Today you have to specify regexp: path/to/foo-(.*).tgz, which is fine and all for checking, but gets awkward when you want to upload a file to path/to. The only way to do that currently is via from and to, which are deprecated and a bit confusing in how they work.

Instead, the file specified should be uploaded to whatever parent directory is specified in the regexp.

Full usage example:

resources:
- name: foo
  type: s3
  source:
    bucket: my-bucket
    regexp: path/to/foo-(.*).tgz

jobs:
- name: upload-foo
  plan:
  - task: make-foo
    # ...
  - put: foo
    params: {file: made-foo/foo-*.tgz}

Out progress bar is incorrect

Given a large file being put to Minio with s3-resource, I see the following:

9.21 GB / 9.21 GB [======================================] 100.00 % 4.45 TB/s 0s

The reason is that the progress reader is incremented when the file to be uploaded is seeked, rather than when bytes are actually read. The S3 client goes out of its way to use concurrency to upload the file as quickly as possible, and in doing so it actually seeks the entire file quickly and immediately before even attempting to upload it.

I looked at the code change that would be necessary to fix this, and it's a bit more than I wanted to take on at this time.

Briefly, this should be moved down into the ReadAt function below. However, in doing just that you'll note that while the progress now tracks the actual upload of the file, it actually ends up reading 200%. This is again because of how the AWS S3 client works.

For now I think it's best to remove the incorrect progress bar and come up with a real solution later on. Otherwise it's just wildly incorrect.

I noticed this while uploading to a local Minio server, and I'm not sure if that makes it a special case or not.

The resource does not offer a way to trust a private CA

Other than the option "disable_ssl" which has no effect on the AWS client regarding certificate trust, there appears to be no way to add trusted CAs to the client. This means it appears to be impossible to trust an internal S3 compatible storage endpoint signed by a private CA.

Implement caching options

When deploying to S3, especially for website resources, you'd want to have more refined control on the caching (--cache-control). Our use-case would set no caching for the html files, but all other website resources (images, scripts, etc.) are to be cached.

The broader picture is that support for PUT Bucket tagging (metadata) is lacking.

The current work around seems to be to put some extra configurations in CloudFront.

Only works with US Standard region, not with others

I have a resource defined like this

- name: pipeline-version
  type: semver
  source:
    bucket: {{aws-pipeline-bucket}}
    key: pipeline-version
    initial_version: 0.1.0
    access_key_id: {{aws-access-key-id}}
    secret_access_key: {{aws-secret-access-key}}
    region: us-west-2

I created my bucket in us-west-2

I was getting:
resource script '/opt/resource/check []' failed: exit status 1
stderr:
error checking for new versions: Get : 301 response missing Location header

So it was not passing the region correctly to the S3 request.

It only works when the bucket is in US Standard, as it defaults to us-east-1:
https://github.com/concourse/s3-resource/blob/9b39d518283368260128292d8309da489b1f148e/s3client.go#L59-61

Unable to get the s3 resource to point to my own s3 server

Hi,

I have an s3 server running on my computer, concourse running inside a bosh-lite VM on the same computer.
I do not know how to specify "endpoint" or "cloudfront_url" in such a way as to communicate with that server (via ip address).
I suspect "endpoint" is the wrong way to go about it, as that generates a URL of the format "https://.", which will not suit my needs.
In attempting to use cloudfront_url, though, it appears to be ignored, and when attempting to "put", I get error running command: 403: "The AWS Access Key Id you provided does not exist in our records."

My first guess was that I was unable to reach the s3 server, and it was falling back to amazon, but I was able to make a task that could ping the server.

When I added "endpoint" to the resource's configuration, it then started using that (leading to error messages like error running command: Post https://s3_test_artifacts.10.24.1.174:10453/test-0.0.1.tar.gz?uploads: dial tcp: lookup s3_test_artifacts.10.24.1.174: no such host), which surprised me, given from what I've read of the source code suggests cloudfront_url takes precedence.

However, I have not been able to identify whether the s3 resource I have running on my VM (hence deployed via bosh) has the same source code.

Any help on identifying what's causing this problem, or on determining which version of the s3-resource I am running, would be appreciated.

Regards,

Jonathan

Retrying downloads aggressively

We're seeing failures to put to an S3 resource like this a lot.

error running command: read tcp 10.254.1.222:37446->54.231.237.6:443: read: connection reset by peer

I see that there was a previous, closed issue #15 with this problem, however, it doesn't seem to have addressed it sufficiently for us. Is it possible to look into increasing the number of retries or other solutions to this problem again?
Thanks.

Put task to an S3 bucket fails even though the actual upload succeeded

This describes an issue that we found a workaround for, but we felt it still merited attention. We had a hard time figuring out how to upload screenshots to an S3 bucket; for a while, we were able to upload the file to the bucket, but then the task failed for reasons we don't entirely understand.

We had a task with the following configuration:

          on_failure: &screenshots
            do:
            - put: capybara-screenshots
              params:
                file: prepared-screenshots/*.tar.gz

With this resource definition:

  - name: capybara-screenshots
    type: s3
    source:
      bucket: capybara-failure-screenshots
      access_key_id: {{bam-aws-access-key-id}}
      secret_access_key: {{bam-aws-secret-access-key}}

As far as we've been able to figure out from the documentation, this should work, and in fact the file is uploaded to Amazon, but the put step fails when it tries to pull the file back down. This is the error thrown:

error running command: InvalidParameter: 1 validation errors:
- field too short, minimum length 1: Key

While we fixed the immediate issue by switching to using a versioned file, we're still not sure why we were seeing this behavior in the first place. We had to get a member of the Concourse team to physically look at our setup in order to fix the issue. It's not clear to us how our configuration was different from the example in the documentation. An updated or more robust example, or an explanation of this error message, would have helped us diagnose and fix this problem on our own.

S3 Resource Regex does not accept hyphens

Looks like the regex validator here: https://github.com/concourse/s3-resource/blob/master/versions/versions.go#L114

Causes Concourse to not access s3 bucket regexes of the format e.g., dir/hyphen-dir/file_(.*).tgz.

And we end up seeing S3's AccessDenied because we don't have ListBucket permissions for the higher-level directories in the bucket:

resource script '/opt/resource/check []' failed: exit status 1

stderr:
error listing files: AccessDenied: Access Denied
    status code: 403, request id:

Add option to ignore self-signed cert validation

When using Riak-CS as an S3-compatible resource and the service uses self-signed certs, verification fails.

Error: error running command: RequestError: send request failed
caused by: Post https://s3.pez.pivotal.io/asfdasdf?uploads=: 
x509: certificate signed by unknown authority

It would help to have an option to ignore/handle self-signed certs.

Resource should checksum the downloaded blob to confirm integrity

One of our workers downloaded a blob from S3 to our local vSphere environment. The get succeeded but subsequent tasks failed with Seg Fault when trying to run the downloaded binary. Running shasum confirmed that the binary checksum did not have the expected value. Both the get and task took place on the same worker. This corrupt binary is now cached, requiring us to recreate the worker.

It would be great if the resource would check the checksum before succeeding or better yet retry the download on a mismatch.

Publishing multiple files with the same version

We have a build that creates packages for 13 different linux distributions.
For each distribution we build the package itself, debuginfo and -dev (if applicable).
The packages are built with the same semver.

It will be great if s3-resource could publish and consume multiple files with the same version.

Getting resource fails when regex is changed

If you deploy the resource with something like

- name: artifact
  type: s3
  source:
    ...
    regexp: blah-(\d+)\.tar

and suddenly you add compression (gzip) to your tars, so you change the resource to

- name: artifact
  type: s3
  source:
    ...
    regexp: blah-(\d+)\.tar\.gz

If a previous version of the resource has already been fetched (the uncompressed version, e.g. version 1337), then after a fly configure, getting new versions of the resource fails with an error:

checking failed

---
resource script '/opt/resource/check []' failed: exit status 1

stderr:
error running command: version number could not be found in: blah-1337.tar

path blah-1337.tar
