
rustus's Introduction



Production-ready TUS protocol implementation written in Rust.

Features

This implementation has several features to make usage as simple as possible.

  • Rustus is robust, since it uses asynchronous Rust;
  • It can store upload information in backends other than plain files;
  • You can define your own directory structure to organize your uploads;
  • It has many hook options, and hooks can be combined;
  • Highly configurable.

Please check out the Documentation for more information about configuration and deployment.

Installation

You can install rustus in three different ways.

From source

To build it from source, Rust must be installed. We rely on nightly features, so please switch to the nightly channel before building.

rustup update nightly
git clone https://github.com/s3rius/rustus.git
cd rustus
rustup override set nightly
cargo install --path .

Binaries

All precompiled binaries are available on the GitHub releases page. Download the archive for your platform, unpack it, and run:

./rustus

Make sure that you download the version for your CPU and OS.

Using docker

One of the simplest ways to run rustus is Docker.

Rustus provides two images for each version:

  1. Debian-based image
  2. Alpine-based image

Alpine-based images are more lightweight than Debian-based ones.

To run rustus, just run this command:

docker run --rm -p "1081:1081" -d s3rius/rustus --log-level "DEBUG"

To persist data, mount a volume to the /data directory:

docker run --rm -p "1081:1081" -v "/path/to/dir:/data" -d s3rius/rustus --log-level "DEBUG"

rustus's People

Contributors

ciceronriman, dependabot[bot], happysalada, kaelten, kolaer, ryanrussell, s3rius


rustus's Issues

Add sentry integration.

Sentry is a useful tool for tracking down errors. Rustus is used in production by companies, so this integration would be useful to have.

Sync files after write.

I ran into an issue where writing to a file can eventually freeze the whole application.

Maybe it can be fixed by calling sync_data on files after each write.
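A minimal sketch of the proposed fix, written in Python for illustration: write a chunk at its offset, then force it to disk before acknowledging it. Here `os.fsync` stands in for Rust's `File::sync_data`. `fsync` is slightly stricter because it also flushes file metadata. The helper name `write_chunk` is hypothetical.

```python
import os


def write_chunk(path: str, offset: int, data: bytes) -> int:
    """Write `data` at `offset` in `path` and force it to stable storage.

    Returns the new upload offset. Hypothetical helper for illustration.
    """
    # "r+b" needs an existing file, so create an empty one for the first chunk.
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(data)
        f.flush()             # move Python's userspace buffer into the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device
    return offset + len(data)
```

Syncing on every write trades throughput for durability, so it may be worth measuring whether it actually removes the freeze before making it the default.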

Add pre-terminate hook

It would be super nice if rustus ran a pre-terminate hook and that hook were blocking. Right now you can delete files freely without any authorization check.
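Below is a sketch of the decision a blocking pre-terminate hook receiver could make. It assumes rustus would wait for the hook's answer and abort the DELETE on any non-2xx status. The function name, the `owners` mapping and the bearer-token scheme are all hypothetical.

```python
from typing import Mapping


def pre_terminate_status(upload_id: str, headers: Mapping[str, str],
                         owners: Mapping[str, str]) -> int:
    """Return the HTTP status a hypothetical pre-terminate hook would answer.

    `owners` maps upload IDs to the bearer token of the user who created them.
    A 2xx status would let the deletion proceed; anything else would block it.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    auth = normalized.get("authorization", "")
    if not auth.startswith("Bearer "):
        return 401  # no credentials at all
    token = auth[len("Bearer "):]
    if owners.get(upload_id) != token:
        return 403  # authenticated, but not the owner of this upload
    return 200
```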

Add healthchecks

It's mandatory to have liveness and readiness probes.
The main reason for that is k8s integration. It must be implemented.
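The following is a minimal probe sketch in Python. It assumes the server exposes an HTTP route that answers 200 when healthy. The URL is a parameter because the exact route is an assumption here. Kubernetes would normally call such an endpoint directly through an `httpGet` probe, so a script like this is mainly useful for exec-style probes or manual checks.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # 4xx/5xx raise HTTPError (a URLError subclass); refused or timed-out
        # connections raise URLError or OSError. All of them count as unhealthy.
        return False
```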

Server responded with empty location

FYI, I was trying tusc.sh, a shell implementation of tus, to upload to rustus, but ran into some problems.
I created the following issue: adhocore/tusc.sh#21
I wonder if it's because rustus should respond with the location or if the tus protocol has changed.
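For context on how tus 1.0.0 creation works:

- The client sends a `POST` carrying `Upload-Length` and, optionally, `Upload-Metadata`.
- The server answers `201 Created` with a `Location` header pointing at the new upload.
- The client then sends `PATCH` requests to that location.
- The spec allows the `Location` URL to be absolute or relative.

The Python sketch below covers the client side of that exchange. The function names are hypothetical.

```python
import base64
from urllib.parse import urljoin


def creation_headers(length: int, metadata: dict[str, str]) -> dict[str, str]:
    """Headers for a tus 1.0.0 creation request (POST to the upload endpoint)."""
    # Upload-Metadata is a comma-separated list of "key base64(value)" pairs.
    encoded = ",".join(
        f"{key} {base64.b64encode(value.encode()).decode()}"
        for key, value in metadata.items()
    )
    headers = {"Tus-Resumable": "1.0.0", "Upload-Length": str(length)}
    if encoded:
        headers["Upload-Metadata"] = encoded
    return headers


def upload_url(endpoint: str, location: str) -> str:
    """Resolve the Location header of a 201 response against the endpoint."""
    if not location:
        raise ValueError("server responded without a Location header")
    return urljoin(endpoint, location)
```

Resolving with `urljoin` lets a client accept both absolute and relative `Location` values.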

Healthcheck URL 404s

While working toward deployments, I checked the /health route, and at least for me locally, 0.5.6 returns a 404.

Am I misreading the code on how I should query it?

❯ curl localhost:1081/health -v
*   Trying 127.0.0.1:1081...
* Connected to localhost (127.0.0.1) port 1081 (#0)
> GET /health HTTP/1.1
> Host: localhost:1081
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< content-length: 0
< tus-resumable: 1.0.0
< tus-version: 1.0.0
< access-control-expose-headers: tus-extension, upload-concat, location, tus-version, tus-checksum-algorithm, tus-resumable, upload-offset, content-length, upload-length, tus-max-size, content-type, upload-metadata, upload-defer-length
< vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
< date: Fri, 22 Jul 2022 23:15:46 GMT

Ability to customize URL?

Hey, I haven't thanked you yet, so a big thank you for this repo, everything is working like a charm!

I have a question.
When uploading documents like docx and then trying to download them, browsers are confused: without an extension, they don't know what file type they are.
Is there a way to customize the URL to add the extension at the end?
Currently it's just a UUID, but being able to add a file name when downloading the file would be amazing.
I know this can be done on the client. But how can we keep the original file name that was uploaded? Is it stored somewhere? Enabling a download under the exact filename that was uploaded would be amazing.

Maybe there is already a way and I'm just not aware of it.
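tus clients typically send the original name, if at all, inside the base64-encoded `Upload-Metadata` header, and the hook payload exposes `metadata`. One approach is therefore to decode it and set a `Content-Disposition` header on download. The sketch below assumes the client used a `filename` key. Clients differ, and some use `name` instead. Both helpers are hypothetical.

```python
import base64
from urllib.parse import quote


def parse_upload_metadata(header: str) -> dict[str, str]:
    """Decode a tus Upload-Metadata header into a plain dict."""
    result = {}
    for pair in filter(None, (p.strip() for p in header.split(","))):
        key, _, value = pair.partition(" ")
        # Values are base64-encoded; a key may also appear without a value.
        result[key] = base64.b64decode(value).decode() if value else ""
    return result


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition header that keeps the original name."""
    # ASCII fallback for old clients plus the RFC 6266 UTF-8 form for the rest.
    fallback = filename.encode("ascii", "replace").decode().replace('"', "")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{quote(filename)}"
```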

Application examples?

Is rustus supposed to be exposed to the world or proxied by a "domain-specific" web service?

Did I understand correctly from this example:

rustus/docs/hooks.md

Lines 852 to 855 in 3206250

print(f"Received: {hook_name}")
if authorization != "Bearer jwt":
    raise HTTPException(401)
return None

that hooks can be used to authenticate file uploads?

I think it would be interesting to walk through a whole demo app that leverages rustus as a narrative example, or to give more context on why it was built.
Or, for example, to answer questions like "if I want to build a WeTransfer clone, what is the intended way to integrate rustus?"

If you're willing to give me a few pointers, I'd be glad to contribute something :)

Extra path separator when using `RUSTUS_URL=/`

When I use `/` for the path instead of `/files`, things work, but the paths generated for the files contain an extra `/`. Here's a log snippet that shows the issue.

rustus  | [2022-06-22][01:03:13+00:00][INFO] "POST / HTTP/1.1" "-" "201" "172.19.0.1" "66.578333"
rustus  | [2022-06-22][01:03:14+00:00][INFO] "PATCH //10ac0e0e-d67d-4e2c-880b-1346793008b5 HTTP/1.1" "-" "204" "172.19.0.1" "263.382625"

In these situations I've found that using a URL join library is often the cleanest way to eliminate duplicate slashes and ensure URLs are built properly. Failing that, a regex replace of /+ with / works fairly well too.
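The suggested fix is sketched below in Python. It joins the base path and the upload ID segment by segment, then collapses any repeated slashes as a safety net. `upload_path` is a hypothetical name.

```python
import re


def upload_path(base: str, upload_id: str) -> str:
    """Build the path of an upload from the configured base path and its ID."""
    joined = f"{base.rstrip('/')}/{upload_id.lstrip('/')}"
    # Collapse runs of slashes ("//abc" -> "/abc") as a final safety net.
    return re.sub(r"/+", "/", joined)
```

Apply this to the path component only. On a full URL, the regex would also collapse the `//` after the scheme.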

Pre-allocate files

The main idea is to create files with the same size as the final result and write bytes at the given offset.
This may help avoid problems with storage capacity.
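A Python sketch of the idea follows. It creates the file at its final size up front, then writes each chunk in place at its offset. It uses `os.posix_fallocate` where available, because that call reserves real disk blocks. Plain `truncate` only sets the length and may leave a sparse file. Both helper names are hypothetical.

```python
import errno
import os


def preallocate(path: str, size: int) -> None:
    """Create `path` already sized to the final upload length."""
    with open(path, "wb") as f:
        if size > 0 and hasattr(os, "posix_fallocate"):
            try:
                # Reserves real disk blocks, so a full disk fails here,
                # at creation time, rather than in the middle of an upload.
                os.posix_fallocate(f.fileno(), 0, size)
                return
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    raise  # out of space: exactly the error we want early
                # Otherwise (e.g. unsupported filesystem) fall back below.
        # Fallback: sets the length but may leave a sparse file that does
        # not actually reserve space on disk.
        f.truncate(size)


def write_at(path: str, offset: int, data: bytes) -> None:
    """Write one chunk at its offset inside the preallocated file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```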

error with https tls using hybrid-s3 storage

I deployed rustus with the Helm chart, changing only these values:

env:
  RUSTUS_DIR_STRUCTURE: "{year}/{month}/{day}"
  RUSTUS_MAX_BODY_SIZE: "100000000"
  RUSTUS_MAX_FILE_SIZE: "1000000000"
  RUSTUS_LOG_LEVEL: "debug"
  RUSTUS_STORAGE: "hybrid-s3"
  RUSTUS_S3_URL: https://s3.eu-central-1.amazonaws.com
  RUSTUS_S3_BUCKET: my-bucket-name
  RUSTUS_S3_REGION: "eu-central-1"
  RUSTUS_S3_ACCESS_KEY: "<AWS_ACCESS_KEY>"
  RUSTUS_S3_SECRET_KEY: "<AWS_SECRET_KEY>"
  RUSTUS_HOOKS: "post-finish"

persistence:
  enabled: true

  existingClaim: "rustus-pvc"


ingress:
  enabled: true
  className: "nginx"
  annotations: 
    kubernetes.io/ingress.class: nginx
    cert-manager.io/issuer: "letsencrypt-prod"
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: rustus.mydomain.com
      paths:
        - path: /
          pathType: Prefix
  tls: 
   - secretName: rustus-tls-secret
     hosts:
       - rustus.mydomain.com

Everything deployed as expected, including certificate issuance by cert-manager, but that should not be related to the error.

Some info about the cluster: it's a fairly simple EKS cluster with the NGINX ingress controller, cert-manager, and the EBS CSI controller.

But on the cluster I got this:


[2023-11-19][17:08:52+00:00][DEBUG] Starting uploading f44d552d-1d58-4650-99fb-c5dd40b79f68 to S3 with key `2023/11/19/f44d552d-1d58-4650-99fb-c5dd40b79f68`
[2023-11-19][17:08:52+00:00][DEBUG] starting new connection: https://my-bucket-name.s3.eu-central-1.amazonaws.com/
[2023-11-19][17:08:52+00:00][DEBUG] resolving host="my-bucket-name.s3.eu-central-1.amazonaws.com"
[2023-11-19][17:08:52+00:00][DEBUG] connecting to <some ip>:443
[2023-11-19][17:08:52+00:00][DEBUG] connected to <some ip>0:443
[2023-11-19][17:08:52+00:00][ERROR] Found S3 error: reqwest: error sending request for url (https://my-bucket-name.s3.eu-central-1.amazonaws.com/2023/11/19/f44d552d-1d58-4650-99fb-c5dd40b79f68): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1919: (unable to get local issuer certificate)
[2023-11-19][17:08:52+00:00][DEBUG] Error in response: S3Error(Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("my-bucket-name.s3.eu-central-1.amazonaws.com")), port: None, path: "/2023/11/19/f44d552d-1d58-4650-99fb-c5dd40b79f68", query: None, fragment: None }, source: hyper::Error(Connect, Ssl(Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 337047686, library: "SSL routines", function: "tls_process_server_certificate", reason: "certificate verify failed", file: "ssl/statem/statem_clnt.c", line: 1919 }]))) }, X509VerifyResult { code: 20, error: "unable to get local issuer certificate" })) }))

Is this some kind of OpenSSL-related error caused by connecting to S3 over HTTPS?
Or am I misunderstanding the configuration and need to provide more env variables in production, such as a session token?
Please also correct me if I'm using the S3 URL wrong.

(I also cloned the original repo and tested it locally with these env variables and S3 syncing; everything works as expected without errors, and I can see my files on the S3 side.)

Also, in any case, I want to thank you for the work done, this project is very cool, well thought out and interesting!
Thanks in advance for any help.

Request: Option to disable health check access logging

When I deploy services, I prefer to filter out the logging of the health check endpoint since these are so often spammy and not representative of actual performance or issues. Having a toggle in Rustus to enable/disable logging for that one endpoint would be great.
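The requested toggle boils down to a filter on the access log. A Python `logging.Filter` sketch of the idea follows. It assumes access-log lines shaped like rustus's `"POST / HTTP/1.1" ...` entries. It illustrates the concept only and is not how rustus implements logging. The class name is hypothetical.

```python
import logging


class SkipHealthChecks(logging.Filter):
    """Drop access-log records for the health endpoint; keep everything else."""

    def __init__(self, path: str = "/health") -> None:
        super().__init__()
        # Surrounding spaces match '"GET /health HTTP/1.1"' but not "/healthz".
        self.needle = f" {path} "

    def filter(self, record: logging.LogRecord) -> bool:
        return self.needle not in record.getMessage()
```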

add env var for url restriction

It would be good to have an environment variable that restricts uploads to a certain URL, for example:
RUSTUS_UPLOAD_URL="https://example.com"
Uploads that don't come from that URL would be rejected.
This is just for security reasons since, as far as I understand, anyone can currently upload files to a tus URL without restrictions.
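The sketch below shows one way the proposed `RUSTUS_UPLOAD_URL` check could work: compare the request's `Origin` header against an allowlist. Note the limits of this approach. Browsers set `Origin` themselves, but non-browser clients such as curl can send any value or none. A check like this therefore only restricts browser uploads, and real access control would still need hooks or tokens. The function is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_allowed(origin: Optional[str], allowed: set[str]) -> bool:
    """Check a request's Origin header against an allowlist of origins."""
    if not origin:
        return False  # non-browser clients often send no Origin at all
    parts = urlsplit(origin)
    # Compare scheme://host[:port] case-insensitively, ignoring trailing slashes.
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in {a.rstrip("/").lower() for a in allowed}
```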

Switch testing macro to assay

This improvement will allow us to create setup and teardown for tests, get rid of creating directories by hand, and much more.

Implement S3 data storage.

Create S3 Data storage to be able to store files on S3.

The main idea is to upload the file as many 5MB chunks and concatenate them after the upload is finished.
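This plan maps naturally onto S3 multipart uploads, which have three relevant rules:

- Every part except the last must be at least 5 MiB.
- Part numbers start at 1.
- An upload may have at most 10,000 parts.

A Python sketch of computing the part ranges follows. The function name is hypothetical.

```python
PART_SIZE = 5 * 1024 * 1024  # S3's minimum size for every part except the last
MAX_PARTS = 10_000           # S3's upper limit on parts per multipart upload


def plan_parts(total: int, part_size: int = PART_SIZE) -> list[tuple[int, int, int]]:
    """Split `total` bytes into (part_number, start, end_exclusive) ranges."""
    if part_size < PART_SIZE:
        raise ValueError("S3 rejects non-final parts smaller than 5 MiB")
    parts = [
        (number, start, min(start + part_size, total))
        for number, start in enumerate(range(0, total, part_size), start=1)
    ]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts
```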

Webhook Documentation Suggestions

While figuring out how hooks work, I had a few thoughts.

  1. The documentation provided for the schema doesn't include the variations of possible values/types depending on when the hook is triggered.
  2. The fastapi example could benefit from providing the pydantic models needed to parse the api.
  3. It strikes me as odd that only URI is uppercase; I'm guessing this is an artifact of how Rust's serialization works.
  4. I was a little surprised that the hook name wasn't available in the payload, but not a huge deal either way.

To help address the first two points, here is the schema I've reverse-engineered from watching the hooks over several types of uploads. I'm not sure I've captured all the variations yet.

import datetime
import uuid

from pydantic import BaseModel, conint, Field

class UploadInfo(BaseModel):
    id: uuid.UUID
    offset: conint(ge=0)
    length: conint(ge=0) | None
    path: str | None
    created_at: datetime.datetime
    deferred_size: bool
    is_partial: bool
    is_final: bool
    parts: list[uuid.UUID] | None
    storage: str
    metadata: dict


class RequestInfo(BaseModel):
    uri: str = Field(alias='URI')
    method: str
    remote_addr: str
    headers: dict


class RustusPayload(BaseModel):
    upload: UploadInfo
    request: RequestInfo

I'd also suggest providing example payloads for each event type that cover the variations, or at least documenting when a field can be in different states. For example, parts is None unless it's a concat finalization, in which case it gives the IDs of the other parts.

Fix memory leaks in debian images

I found something interesting while using rustus for really large uploads.

At some point rustus was using up to 2Gi per instance.

Add rootless images

It's more secure to run rootless containers.
We need rootless Debian and Alpine images.

CORS Support

While building a POC with rustus and Uppy, I found that Uppy is unable to upload directly to rustus because of a CORS error. Scanning through the code, it appears that rustus doesn't have native CORS handling; is that correct?

Read secrets from paths

Hi, thanks for the awesome software !

Would it be possible to have an option to read secrets from a path instead of directly from a value?

The reason I ask is that Docker secrets and systemd's LoadCredential, both systems for managing secrets, provide secrets as files at a path rather than as naked values.

So just adding an option to read a secret from a path would be great for security.
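One common convention, used by many Docker images, is a `_FILE` suffix: if `NAME` is unset, read the value from the file whose path is in `NAME_FILE`. With systemd, that path would point into `$CREDENTIALS_DIRECTORY`. Below is a Python sketch of that lookup. The helper and the example name `RUSTUS_S3_SECRET_KEY_FILE` are hypothetical and are not existing rustus options.

```python
import os
from typing import Mapping, Optional


def read_secret(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a secret from NAME, or from the file whose path is in NAME_FILE."""
    if name in env:
        return env[name]  # a direct value wins over a file path
    path = env.get(f"{name}_FILE")
    if path:
        with open(path, encoding="utf-8") as f:
            # Secret files often end with a newline that isn't part of the value.
            return f.read().rstrip("\n")
    return None
```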

Architecture improvements

At this point the project is released, but some poor architectural decisions were made to develop the application quickly.

First, we must separate InfoStorage and Storage from each other, because FileStorage currently does far more than just store data on a file system.

The main idea is to rebuild the Storage trait so that it uses the FileInfo structure instead of a file_id string, and to remove the get_file_info method.
We also have to store InfoStorage as part of the actix state in order to use it in controllers. This gives us more flexibility and makes it easier to develop new storage types.
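The Python sketch below illustrates the proposed split, with `typing.Protocol` standing in for Rust traits. The controller reads a `FileInfo` from `InfoStorage` once and hands it to `Storage`, which therefore no longer needs a `get_file_info` method. All method names and the in-memory classes are hypothetical and are not rustus's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class FileInfo:
    """Trimmed-down upload record (hypothetical fields)."""
    id: str
    offset: int = 0
    length: Optional[int] = None


class InfoStorage(Protocol):
    """Persists upload metadata only."""
    def get_info(self, file_id: str) -> FileInfo: ...
    def set_info(self, info: FileInfo) -> None: ...


class Storage(Protocol):
    """Stores bytes only; receives a FileInfo instead of looking one up."""
    def add_bytes(self, info: FileInfo, data: bytes) -> None: ...


@dataclass
class MemoryInfoStorage:
    infos: dict = field(default_factory=dict)

    def get_info(self, file_id: str) -> FileInfo:
        return self.infos[file_id]

    def set_info(self, info: FileInfo) -> None:
        self.infos[info.id] = info


@dataclass
class MemoryStorage:
    blobs: dict = field(default_factory=dict)

    def add_bytes(self, info: FileInfo, data: bytes) -> None:
        self.blobs[info.id] = self.blobs.get(info.id, b"") + data


def handle_patch(infos: InfoStorage, storage: Storage, file_id: str, data: bytes) -> int:
    """Controller flow: read info once, write bytes, then persist the new offset."""
    info = infos.get_info(file_id)
    storage.add_bytes(info, data)
    info.offset += len(data)
    infos.set_info(info)
    return info.offset
```

Because both dependencies arrive as parameters, a new storage type only has to implement the byte-handling side.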

Helm publishing

We need to host a Helm repository on GitHub Pages somehow.

It's required if we want to use the Helm chart without cloning this repo.

Update ENV names.

When deploying to Kubernetes with a service named rustus, Kubernetes injects an env variable named "RUSTUS_PORT" into pods, which breaks the application.

A possible solution is to rename the server-related env variables to

  • RUSTUS_SERVER_PORT;
  • RUSTUS_SERVER_HOST.
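The collision can be modeled in a few lines of Python. Kubernetes sets `<SERVICE>_PORT` to a value like `tcp://<cluster-ip>:<port>`, which cannot be parsed as a plain port number. Both functions below are hypothetical, and the sketch assumes the port setting is parsed as an integer. The IP address in the test is illustrative.

```python
from typing import Mapping

DEFAULT_PORT = 1081


def legacy_port(env: Mapping[str, str]) -> int:
    """Current naming: RUSTUS_PORT, which a service named rustus overwrites."""
    return int(env.get("RUSTUS_PORT", DEFAULT_PORT))


def server_port(env: Mapping[str, str]) -> int:
    """Proposed naming: only RUSTUS_SERVER_PORT is read."""
    return int(env.get("RUSTUS_SERVER_PORT", DEFAULT_PORT))
```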

LRU cache for files

It might be a good idea to have some kind of LRU cache that holds recently created or written files. That way we could skip opening a file on each request if we touched it recently.
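A Python sketch of such a cache follows. It keeps up to `capacity` file handles open and closes the least recently used one when full. Bounding the count matters because open handles count against the process's file-descriptor limit. Append mode fits tus, where `PATCH` requests add bytes at the end. A real implementation would also need flushing and protection against concurrent writers. The class is hypothetical.

```python
from collections import OrderedDict
from typing import BinaryIO


class FileHandleCache:
    """Keep up to `capacity` recently used files open; close the least recent."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self._files: "OrderedDict[str, BinaryIO]" = OrderedDict()

    def get(self, path: str) -> BinaryIO:
        handle = self._files.get(path)
        if handle is not None:
            self._files.move_to_end(path)  # mark as most recently used
            return handle
        handle = open(path, "ab")          # append mode for incoming chunks
        self._files[path] = handle
        if len(self._files) > self.capacity:
            _, oldest = self._files.popitem(last=False)  # evict the LRU entry
            oldest.close()
        return handle

    def close_all(self) -> None:
        while self._files:
            self._files.popitem()[1].close()
```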

Unexpected hook behaviors.

I've been testing hooks for a while now, and there are a few things that are not behaving as I expected.

I'm using the webhook settings with all hooks enabled, and in both log segments below I'm logging the additional info with this line:

f'    {payload.upload.id}: hook {hook_name}; method {payload.request.method}; is_final: {payload.upload.is_final}; is_partial: {payload.upload.is_partial};'

When doing a simple upload without concat I see the following logs:

    8f40fed6-4431-4ee3-8506-2e08871af41d: hook pre-create; method POST; is_final: False; is_partial: False;
INFO:     127.0.0.1:63439 - "POST /rustus/callback HTTP/1.1" 200 OK
    8f40fed6-4431-4ee3-8506-2e08871af41d: hook post-create; method POST; is_final: False; is_partial: False;
INFO:     127.0.0.1:63439 - "POST /rustus/callback HTTP/1.1" 200 OK
    8f40fed6-4431-4ee3-8506-2e08871af41d: hook post-receive; method PATCH; is_final: False; is_partial: False;
INFO:     127.0.0.1:63439 - "POST /rustus/callback HTTP/1.1" 200 OK
    8f40fed6-4431-4ee3-8506-2e08871af41d: hook post-finish; method PATCH; is_final: False; is_partial: False;
INFO:     127.0.0.1:63439 - "POST /rustus/callback HTTP/1.1" 200 OK

When doing an upload using concatenation of two parts I see this as the logs:

    dad05558-2c08-4250-aaea-8238736c7ebb: hook pre-create; method POST; is_final: False; is_partial: True;
    46c25c7d-6bfe-4486-a07e-9210cbd6f3d2: hook pre-create; method POST; is_final: False; is_partial: True;
INFO:     127.0.0.1:62958 - "POST /rustus/callback HTTP/1.1" 200 OK
INFO:     127.0.0.1:62959 - "POST /rustus/callback HTTP/1.1" 200 OK
    dad05558-2c08-4250-aaea-8238736c7ebb: hook post-create; method POST; is_final: False; is_partial: True;
INFO:     127.0.0.1:62959 - "POST /rustus/callback HTTP/1.1" 200 OK
    46c25c7d-6bfe-4486-a07e-9210cbd6f3d2: hook post-create; method POST; is_final: False; is_partial: True;
INFO:     127.0.0.1:62960 - "POST /rustus/callback HTTP/1.1" 200 OK
    dad05558-2c08-4250-aaea-8238736c7ebb: hook post-finish; method PATCH; is_final: False; is_partial: True;
    46c25c7d-6bfe-4486-a07e-9210cbd6f3d2: hook post-finish; method PATCH; is_final: False; is_partial: True;
INFO:     127.0.0.1:62960 - "POST /rustus/callback HTTP/1.1" 200 OK
INFO:     127.0.0.1:62959 - "POST /rustus/callback HTTP/1.1" 200 OK
    cf318548-2c74-4a4f-bf02-c528f248e9d5: hook pre-create; method POST; is_final: True; is_partial: False;
INFO:     127.0.0.1:62962 - "POST /rustus/callback HTTP/1.1" 200 OK
    cf318548-2c74-4a4f-bf02-c528f248e9d5: hook post-create; method POST; is_final: True; is_partial: False;
INFO:     127.0.0.1:62962 - "POST /rustus/callback HTTP/1.1" 200 OK

A few things that surprised me:

  • None of the hooks for the simple upload are labeled as is_final
  • There is only a pre-create and post-create event for the concatenation call.
  • Only the concat related events are set as is_final
  • There's no post-receive events for partial uploads

This leads me to the following thoughts, and also wondering if these are correct assumptions:

  1. Fragmented uploads (concat enabled) are identifiable by looking at the is_partial flag.
  2. To understand upload progress I'd have to look at the offset/size information provided through post_receive hooks.
  3. Since concat enabled uploads don't get post_receive hooks I can't tell what's going on with them as they upload.
  4. To understand when an upload is finished, I have to look for post-create hooks that have is_final: True as well as post-finish hooks.

Is all of the above correct? Are any of the above pointing to small bugs or logic errors in how the hooks are triggered?
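Assumption 4 can be written as a small predicate over the fields logged above. This sketch encodes the reporter's reading of those logs and is not confirmed rustus semantics. The function name is hypothetical.

```python
def upload_finished(hook_name: str, is_partial: bool, is_final: bool) -> bool:
    """Decide whether a hook event marks a completed, user-visible upload.

    Encodes assumption 4 from the logs in this issue, not documented behavior.
    """
    if hook_name == "post-finish" and not is_partial:
        return True   # simple upload: the last PATCH completed it
    if hook_name == "post-create" and is_final:
        return True   # concatenation: the final upload is assembled on creation
    return False      # partial uploads finishing are only intermediate steps
```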

Make state cloneable.

In order to achieve this, we must make all components of the state cloneable.

This may help to make rustus even faster.

Add tests

We have no tests in this application, which is really bad for such an important piece of software.

Concatenation extension regression in 0.5.3

When uploading with the concatenation extension against 0.5.2 things work as expected. However in 0.5.3 the combined file stays 0 bytes and the parts are not removed. I've enabled debug logging, and there's nothing in the logs to indicate a failure or issue.

rustus  | [2022-06-23][00:40:05+00:00][INFO] "PATCH /60a7d100-2b2b-447b-8c48-c080f0b4e6da/ HTTP/1.1" "-" "204" "172.19.0.1" "302.296001"
rustus  | [2022-06-23][00:40:05+00:00][INFO] "PATCH /1009fb2b-263e-4cea-9019-d8a879155417/ HTTP/1.1" "-" "204" "172.19.0.1" "307.389292"
rustus  | [2022-06-23][00:40:05+00:00][INFO] "POST / HTTP/1.1" "-" "201" "172.19.0.1" "5.008000"

Kafka hooks

It would be really cool to have Kafka as a new hook type.

Add prometheus metrics

It would be really nice to add prometheus metrics to rustus.
In this issue we discuss which metrics we want to see on /metrics page.

They can later be used on a frontend page.
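For reference, a /metrics page serves Prometheus's text exposition format: a `# HELP` line, a `# TYPE` line and a sample line per metric. A minimal Python renderer is sketched below. The metric names in the test are hypothetical candidates for this discussion, and the sketch skips the escaping that help text containing backslashes or newlines would need.

```python
def render_metrics(metrics: dict[str, tuple[str, str, int]]) -> str:
    """Render metrics in the Prometheus text exposition format.

    `metrics` maps a metric name to (type, help text, value).
    """
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```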
