
s3-sync's Introduction

s3-sync

A streaming upload tool for Amazon S3: it takes its input from a readdirp stream and emits the uploaded files as output.

s3-sync can optionally be backed by a level database, used as a local cache for file uploads. This way you can minimize how often you have to hit S3 and speed the whole process up considerably.

You can use this to sync complete directory trees with S3 when deploying static websites. It's a work in progress, so expect occasional API changes and additional features.

Installation

npm install s3-sync

Usage

require('s3-sync').createStream([db, ]options)

Creates an upload stream. Passes its options to knox, so at a minimum you'll need:

  • key: Your AWS access key.
  • secret: Your AWS secret.
  • bucket: The bucket to upload to.

The following are also specific to s3-sync:

  • concurrency: The maximum number of files to upload concurrently.
  • retries: The maximum number of times to retry uploading a file before failing. Defaults to 7.
  • headers: Additional headers to include on each uploaded file.
  • hashKey: By default, file hashes are stored using the file's absolute path as the key. This doesn't work well with temporary files, so you can pass a function that maps the file object to a string key for the hash (see the sketch after this list).
  • acl: Use a custom ACL header. Defaults to public-read.
  • force: Force s3-sync to overwrite any existing files.
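
For example, here's a minimal sketch of passing these options, including a hashKey function. The bucket name, header and acl values, and the assumption that the file object exposes a readdirp-style path property are all illustrative, not part of the documented API:

var s3sync = require('s3-sync')

var stream = s3sync.createStream({
    key: process.env.AWS_ACCESS_KEY
  , secret: process.env.AWS_SECRET_KEY
  , bucket: 'my-bucket'                          // assumed bucket name
  , concurrency: 8
  , retries: 3
  , acl: 'private'                               // example ACL, not the default
  , headers: { 'Cache-Control': 'max-age=3600' } // example header
  , hashKey: function(file) {
      // Key the cache on the path relative to the readdirp root rather
      // than the absolute (possibly temporary) path. `file.path` is an
      // assumption based on readdirp's entry objects.
      return file.path
    }
})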

You can also store your local cache in S3, provided you pass the following options, and use getCache and putCache (see below) before/after uploading:

  • cacheDest: the path to upload your cache backup to in S3.
  • cacheSrc: the local temporary text file to stream the cache to before uploading it to S3.

If you want more control over which files are uploaded and where they end up, you can write file objects directly to the stream, e.g.:

var stream = s3sync({
    key: process.env.AWS_ACCESS_KEY
  , secret: process.env.AWS_SECRET_KEY
  , bucket: 'sync-testing'
})

stream.write({
    src: __filename
  , dest: '/uploader.js'
})

stream.end({
    src: __dirname + '/README.md'
  , dest: '/README.md'
})

Where src is the absolute local file path, and dest is the location to upload the file to on the S3 bucket.

db is an optional argument - pass it a level database and it'll keep a local cache of file hashes, keeping S3 requests to a minimum.

stream.putCache(callback)

Uploads your level cache, if available, to the S3 bucket. This means that your cache only needs to be populated once.

stream.getCache(callback)

Streams a previously uploaded cache from S3 to your local level database.
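
Putting the cache options and these two methods together, here's a hedged sketch of wrapping an upload with a remote cache. The bucket name and cache file locations are made up, and it assumes the uploader emits a standard 'end' event once the upload finishes:

var level = require('level')
  , s3sync = require('s3-sync')
  , readdirp = require('readdirp')

var db = level(__dirname + '/cache')

var files = readdirp({
    root: __dirname
  , directoryFilter: ['!.git', '!cache']
})

var uploader = s3sync(db, {
    key: process.env.AWS_ACCESS_KEY
  , secret: process.env.AWS_SECRET_KEY
  , bucket: 'sync-testing'                // assumed bucket name
  , cacheSrc: __dirname + '/cache.json'   // assumed local temporary cache file
  , cacheDest: '/.sync-cache.json'        // assumed cache location in the bucket
})

// Pull the previously uploaded cache down before syncing...
uploader.getCache(function(err) {
  if (err) throw err
  files.pipe(uploader)
})

uploader.on('data', function(file) {
  console.log(file.fullPath + ' -> ' + file.url)
}).on('end', function() {
  // ...then push the updated cache back up once the upload has finished.
  uploader.putCache(function(err) {
    if (err) throw err
  })
})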

stream.on('fail', callback)

Emitted when a file has failed to upload. This fires on every failed attempt, so with retries enabled it may be emitted more than once for the same file.
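
A minimal sketch of a failure handler; what exactly gets passed to the listener isn't documented here, so treat the error argument as an assumption and check the module source:

uploader.on('fail', function(err) {
  // Log each failed attempt. The `err` argument is an assumption; the
  // README only documents that the event fires when an upload fails.
  console.error('upload attempt failed:', err)
})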

Example

Here's an example using level and readdirp to upload a local directory to an S3 bucket:

var level = require('level')
  , s3sync = require('s3-sync')
  , readdirp = require('readdirp')

// To cache the S3 HEAD results and speed up the
// upload process. Usage is optional.
var db = level(__dirname + '/cache')

var files = readdirp({
    root: __dirname
  , directoryFilter: ['!.git', '!cache']
})

// Takes the same options arguments as `knox`,
// plus some additional options listed above
var uploader = s3sync(db, {
    key: process.env.AWS_ACCESS_KEY
  , secret: process.env.AWS_SECRET_KEY
  , bucket: 'sync-testing'
  , concurrency: 16
  , prefix: 'mysubfolder/' // optional prefix for files on S3
}).on('data', function(file) {
  console.log(file.fullPath + ' -> ' + file.url)
})

files.pipe(uploader)

You can find another example which includes remote cache storage at example.js.

s3-sync's People

Contributors

aantthony, christophercliff, hguillermo, hughsk, michalkrupa, rkmax, sampsasaarela


s3-sync's Issues

Headers should be set per file, not only in the global options

It would be more practical to be able to specify which files get which headers, rather than applying the same headers to every synced file. Am I missing something here? All I can see is the option to add extra headers to every file being synced. It would be nice if it could be:

files: [
    {
        root: __dirname,
        src: 'here/some.js',
        dest: 'there/',
        gzip: true,
        compressionLevel: 9,
        headers: {
            'Content-Encoding': 'gzip'
        }
    }
]

peerDependencies don't allow me to install the package

$ npm i --save level s3-sync readdirp
npm WARN peerDependencies The peer dependency [email protected] included from s3-sync will no
npm WARN peerDependencies longer be automatically installed to fulfill the peerDependency 
npm WARN peerDependencies in npm 3+. Your application will need to depend on it explicitly.
npm WARN deprecated [email protected]: Please update to the latest object-keys
|
> [email protected] install /home/rkmax/Development/BIX/Velo/node_modules/level/node_modules/leveldown
> prebuild --download

npm ERR! Linux 4.1.6-1-ARCH
npm ERR! argv "/home/rkmax/.nvm/versions/node/v0.12.7/bin/node" "/home/rkmax/.nvm/versions/node/v0.12.7/bin/npm" "i" "--save" "level" "s3-sync" "readdirp"
npm ERR! node v0.12.7
npm ERR! npm  v2.11.3
npm ERR! code EPEERINVALID

npm ERR! peerinvalid The package level does not satisfy its siblings' peerDependencies requirements!
npm ERR! peerinvalid Peer [email protected] wants [email protected]

npm ERR! Please include the following file with any support request:
npm ERR!     /home/rkmax/myproject/npm-debug.log

Make the cache remote

Instead of maintaining your own local cache db, have the sync module:

  1. Check for the cache db on S3. If it exists, download it and start it up
  2. Check the delta and do the sync
  3. Push the db back to S3 and remove the local copy

Does this make sense? Doing a little research today and found some other folks using this approach. It's nice because the module handles the caching for you and you don't have to deal with any local files.

Option to define my own filename/path for cache invalidation purposes

We use a system that generates temporary (gzipped) files for upload, but for cache-invalidation purposes we want them to be treated as the original files being gzipped, even if we weren't gzipping them before and have only just chosen to.

Some way to define the file name and/or path we want the upload treated as, separate from the file name and/or path that's actually being uploaded, would be a pretty awesome feature to have!

Allow a custom URL

var destination =
          protocol + '://'
        + subdomain
        + '.amazonaws.com/'
        + options.bucket
        + '/' + relative

I'm using a private S3-compatible service, so allowing a custom URL would be helpful.

Files are not recognized as changed if they are modified with something other than s3-sync

If I mix tools like s3cmd and s3-sync, I can end up PUTting a file without the "x-amz-meta-syncfilehash" header. Subsequent s3-sync runs then won't recognize the file as changed and will just glide silently past it.

I'm sorry for being a bad citizen and not providing a simple test case. I'm actually using this through grunt-s3-sync, so it's a little awkward. I do have a proposed fix, though, that works for me. It's over at my fork so I guess I'll make a pull request.

Unsure which file has the issue when I get Error: Bad status code 400

>> [uploaded] https://s3.amazonaws.com/afr-prod/projects/politics/img/maps/Perth.png
Error: Bad status code: 400
>> [uploaded] https://s3.amazonaws.com/afr-prod/projects/politics/img/maps/Sydney.png

It would be nice to know which file got the 400, and whether it's retrying or silently failing.

hash comparisons with s3 headers

Hey mate,

Just been updating some apps to use this module (thanks!) and ran into an issue where files were always being re-uploaded even if they already existed. It looks like this is because the S3 ETag header never matches the generated MD5:

if (res.statusCode === 404 || res.headers.etag !== '"' + details.md5 + '"') return uploadFile(details, next)

This is because hashFile includes the headers and destination in the hash. Commenting out these lines fixes it:

    hash.update(JSON.stringify([
        options.headers
      , destination
    ]))

It might be better to hash just the file contents first to compare against the ETag, and then update the hash with the metadata before storing it in leveldb. What do you think?

If I get some time this week I'll take a look and send a PR :)

files are uploaded as public-read-write

see: https://github.com/hughsk/s3-sync/blob/master/index.js#L114
and the corresponding docs: http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#CannedACL

The docs say this about public-read-write:

Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. Granting this on a bucket is generally not recommended.

And the AllUsers group is defined as this:

Access permission to this group allows anyone to access the resource. The requests can be signed (authenticated) or unsigned (anonymous). Unsigned requests omit the Authentication header in the request.

So if I'm reading this correctly, anyone on the internet can overwrite files uploaded by this tool. That sounds like a pretty serious security issue.
