archive-upload

Script to compress, archive and upload data files to AWS S3. The script was inspired by a project that records electric utility powerhouse data (e.g. voltage, frequency, power) using a Raspberry Pi. The data collection script records measurement values in CSV data files and writes application errors and other logged information to text log files. This script then compresses those files, stores them locally on the Pi in archive directories, and also uploads them to an AWS S3 bucket.

Features

Some of the features of this script are:

  • The script can watch multiple directories for data files to archive and upload. File patterns can be used to identify specific data files in those directories.
  • The script can be configured to archive/upload only files that have not been modified for a specified amount of time. For example, if a data collection script writes to one data file per day, the archive script can be told not to archive a file until it has gone 30 minutes without being modified.
  • If the script fails to upload files to AWS S3 due to loss of Internet connectivity or any other reason, the uploads are retried on each subsequent run of the script until they succeed.
  • The script uses bz2 compression, which generally provides high levels of compression when applied to text data files. The compressed files are archived locally and uploaded to AWS S3. The original data file is deleted after it has been successfully archived on the local system.
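The file selection and compression steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the script's actual code: the function names find_idle_files and compress_to_archive, their parameters, and the ".bz2" naming of the archived file are all assumptions made for this example.

```python
import bz2
import shutil
import time
from pathlib import Path


def find_idle_files(directory, pattern, min_idle_seconds, now=None):
    """Return files matching `pattern` that have not been modified
    for at least `min_idle_seconds` (hypothetical helper)."""
    now = time.time() if now is None else now
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and now - p.stat().st_mtime >= min_idle_seconds
    )


def compress_to_archive(source, archive_dir):
    """Write a bz2-compressed copy of `source` into `archive_dir`,
    then delete the original (hypothetical helper)."""
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / (source.name + ".bz2")
    with open(source, "rb") as src, bz2.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only reached if the compressed copy was written without error.
    source.unlink()
    return target
```

For the 30-minute example above, a call such as find_idle_files("/home/pi/data", "*.csv", 30 * 60) would skip any file that is still being written to. The directory path here is illustrative only.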

Requirements

This is a Python script that requires Python 3.6 or above, because it uses f-strings.
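A version guard along the following lines is one way to fail with a clear message on an older interpreter. It is a sketch only; the names MIN_VERSION and version_ok are hypothetical and do not come from the script. Note that an f-string anywhere in a file is a syntax error on Python 3.5 and earlier, so a guard like this only helps if it lives in a file that avoids f-strings itself.

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were introduced in Python 3.6


def version_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not version_ok():
        # Old-style formatting so this file still parses on older Pythons.
        sys.exit("Python %d.%d or above is required." % MIN_VERSION)
```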

Some third-party Python packages are required and are listed in the requirements.txt file. These packages can be installed with the command:

sudo pip3 install -r requirements.txt

assuming "pip3" is the Python 3 pip command.

The script uses the boto3 package to upload to AWS S3 buckets. boto3 expects to find suitable credentials in the ~/.aws directory or in environment variables, as described in the AWS documentation on boto3 credentials. The credentials must allow writing to the S3 buckets that are identified in this script's configuration file.
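The upload-and-retry behavior can be pictured with a sketch like the one below. It assumes, purely for illustration, that compressed files waiting for upload sit in a "pending" directory and are moved to a second directory once uploaded; the README does not say how the script actually tracks pending uploads. The function names, directory layout and key_prefix parameter are hypothetical. The only boto3 calls used are boto3.client("s3") and the client's upload_file(Filename, Bucket, Key) method.

```python
from pathlib import Path


def make_s3_client():
    """Create an S3 client; boto3 reads credentials from ~/.aws or the environment."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3
    return boto3.client("s3")


def upload_pending(s3_client, pending_dir, uploaded_dir, bucket, key_prefix=""):
    """Try to upload every file in `pending_dir`.

    Files that upload successfully are moved to `uploaded_dir`. Files that
    fail are left in place, so the next scheduled run tries them again.
    Returns the list of paths that failed on this run.
    """
    uploaded_dir = Path(uploaded_dir)
    uploaded_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for path in sorted(Path(pending_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            s3_client.upload_file(str(path), bucket, key_prefix + path.name)
        except Exception as exc:  # e.g. no connectivity or rejected credentials
            print(f"Upload failed for {path.name}: {exc}")
            failed.append(path)
            continue
        path.rename(uploaded_dir / path.name)
    return failed
```

Passing the client in as an argument keeps the retry logic independent of boto3, which makes it easy to exercise with a stand-in object when no network is available.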

This script is meant to be run periodically through use of cron or a similar scheduling tool.

Usage

Here is an example cron job line to run the script:

20 * * * * /usr/bin/python3 /home/pi/archive-upload/archive-upload.py /home/pi/archive-config.yaml

The script will run every hour at 20 minutes past the hour. There is one required command line argument for the script, which is the path to the script configuration file. The format of this configuration file is discussed in the next section.
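A single required positional argument like this could be handled with argparse from the standard library. The sketch below is an assumption about how such an interface might look, not the script's actual argument handling; the function name parse_args and the argument name config_file are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `argv=None` means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Compress, archive and upload data files to AWS S3."
    )
    # One required positional argument: the path to the configuration file.
    parser.add_argument("config_file", help="path to the YAML configuration file")
    return parser.parse_args(argv)


# Example: parse_args(["/home/pi/archive-config.yaml"]).config_file
```

If the argument is missing, argparse prints a usage message and exits with a nonzero status, which shows up in cron's mail or log output.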

Configuration File

A configuration file controls the operation of the script. The configuration file is in YAML format. Indentation matters in the YAML format, so try to follow the indentation in the sample configuration file exactly.

A sample configuration file shows all possible configuration options. Documentation for each option is provided in that sample file and is not repeated here.

The configuration file can have any name and can be located anywhere on the system. The full path to the configuration file is a required command line argument for the script.

Contributors

alanmitchell
