In addition to controlling read/write access to our shared licensing repository, L0 Storage should be expanded to provide the same functionality for hosted data lakes.
Customers should be able to:
use their publishing ID to request a temporary write token
use the write token to write data to the lake
use their private key to read data from their lake
API IDs are moving to the l0-auth service, so the l0-storage service needs to be updated to use JWT validation (including audience validation) on the token routes.
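The audience check on the token routes could look like the sketch below. Claim names follow the standard JWT spec (RFC 7519); the expected audience value "l0-storage" is an assumption, not the actual configured value, and signature verification (e.g. via a maintained library such as jose) must still happen before any claims are trusted.

```typescript
// Minimal sketch of JWT audience validation for the token routes.
// NOTE: this only inspects claims; it does NOT verify the signature.

interface JwtClaims {
  aud?: string | string[]; // audience, per RFC 7519
  exp?: number;            // expiry, unix seconds
  [key: string]: unknown;
}

function decodeClaims(token: string): JwtClaims {
  const payload = token.split(".")[1];
  if (!payload) throw new Error("malformed JWT");
  const json = Buffer.from(payload, "base64url").toString("utf8");
  return JSON.parse(json) as JwtClaims;
}

function hasValidAudience(claims: JwtClaims, expected: string): boolean {
  const aud = claims.aud;
  if (aud === undefined) return false;
  // aud may be a single string or an array of audiences
  return Array.isArray(aud) ? aud.includes(expected) : aud === expected;
}
```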
The current worker/upload config only supports HTTPS, and only where the bucket name is also the DNS name.
This is incompatible with MinIO in a docker-compose setup.
The worker should support an optional environment variable override, for use in development, that toggles HTTPS to HTTP.
Note: the MinIO compose config may also need to define the bucket name, and the worker may need additional TOML configs. Requires testing to confirm.
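The override could be as small as the sketch below. The variable name `TIKI_DEV_HTTP` and the URL shape are assumptions for illustration, not the worker's actual config keys.

```typescript
// Sketch of a dev-only override that switches the upload endpoint from
// HTTPS to HTTP (for MinIO under docker-compose). Defaults to HTTPS
// with the bucket name as the DNS name, matching current behavior.

function bucketUrl(
  bucket: string,
  key: string,
  env: Record<string, string | undefined> = process.env
): string {
  const scheme = env.TIKI_DEV_HTTP === "true" ? "http" : "https";
  return `${scheme}://${bucket}/${key}`;
}
```

Keeping the override behind an explicit opt-in string means a production deploy that simply omits the variable can never downgrade to HTTP by accident.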
When successfully generating a policy, record the address and customer ID in the database.
We can then use these records to determine the relationship between addresses and customer IDs at any given moment, enabling usage monitoring, optional address pinning, and other features.
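A minimal sketch of the record-keeping, assuming an append-only table: the array below stands in for the real database, and all field names are hypothetical. Keeping every grant (not just the latest) is what lets us answer "which customer held this address at time T", which usage monitoring and address pinning both need.

```typescript
// Sketch: record the address <-> customer ID pairing on every
// successful policy generation, then query the pairing at a moment.

interface PolicyGrant {
  address: string;
  customerId: string;
  grantedAt: number; // unix millis
}

const grants: PolicyGrant[] = []; // stand-in for the database table

function recordPolicyGrant(address: string, customerId: string, at = Date.now()): void {
  grants.push({ address, customerId, grantedAt: at });
}

// Which customer was associated with the address at a given moment?
function customerAt(address: string, at: number): string | undefined {
  let latest: PolicyGrant | undefined;
  for (const g of grants) {
    if (g.address === address && g.grantedAt <= at) {
      if (!latest || g.grantedAt > latest.grantedAt) latest = g;
    }
  }
  return latest?.customerId;
}
```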
Our typical data ingestion model is asynchronous and incremental.
However, data suppliers occasionally have accrued, or are accruing, zero-party data without using TIKI but still want to leverage the power of TIKI's data pooling.
Data will be provided in bulk (likely CSV or Parquet) to a bucket (likely S3), where it should be cataloged by TIKI. If the data is re-serialized to fit TIKI's format, the original should then be discarded to avoid duplicate storage.
We need a network-edge proxy function that can report block write sizes back, so we do not need to repeatedly traverse the entire storage repo to compute usage by account ID.
We want to create our own simplified version of the S3 POST policy, forcing all clients to write to the buckets through TIKI's proxy so we can accurately log utilization.
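A simplified policy could work roughly like S3's: a short-lived, HMAC-signed document binding a key prefix and a size cap, which the proxy verifies before forwarding the write and logging its size. The sketch below is one possible shape under those assumptions; all field names and the signing scheme are hypothetical, not a spec.

```typescript
// Sketch of a simplified, TIKI-signed post policy.
import { createHmac } from "node:crypto";

interface PostPolicy {
  keyPrefix: string; // writes must stay under this prefix
  maxBytes: number;  // utilization cap enforced by the proxy
  expires: number;   // unix seconds
}

function signPolicy(policy: PostPolicy, secret: string): string {
  const body = Buffer.from(JSON.stringify(policy)).toString("base64url");
  const sig = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Returns the policy if the signature matches and it has not expired.
function verifyPolicy(
  token: string,
  secret: string,
  now = Date.now() / 1000
): PostPolicy | null {
  const [body, sig] = token.split(".");
  const expected = createHmac("sha256", secret).update(body).digest("base64url");
  if (sig !== expected) return null;
  const policy = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as PostPolicy;
  return policy.expires > now ? policy : null;
}
```

Because only the proxy holds the signing secret, clients cannot mint their own write permission, so every write necessarily passes through the proxy where its size can be logged.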
As a developer I want to see error reporting (and ideally performance metrics) on our serverless worker functions.
HOWEVER
Sentry does not offer a pure-JS SDK for serverless; they have a Node.js SDK with wrappers for Lambda/GCP.
But! Cloudflare Workers are special (no cold start) and don't really support a Node runtime (it's in beta and a bit wonky).
One option is to take inspiration from the Node and pure-JS SDKs and create a new JS serverless SDK using their API.
A second possible option is to investigate migrating the worker code to Rust and using the Sentry Rust SDK. It's unclear without more investigation IF this is viable.
Customers should be able to upload immutable docs like terms of service or contract terms and conditions at the application layer (above the scope of a single address).
This will remove duplicative data stored at the transaction layer, both improving performance and making it easier for customers to manage, update, and deploy contract terms.
This is a technical spec/design story, not an implementation story. Implementation stories will be created upon sign-off on the technical spec.