profianinc / benefice

Demo workload executor

License: GNU Affero General Public License v3.0

Rust 49.43% HTML 49.43% Nix 1.14%

benefice's People

Contributors

rvolosatovs npmccallum nickvidal haraldh bstrie puiterwijk dpal rjzak

benefice's Issues

Get the user's stars using the Auth0 API

We should also use the user's token when retrieving the stars. This will avoid rate limiting.

Drawbridge integration

It would be a nice improvement to automatically create a Drawbridge user, repository and tag when files are uploaded, and to pass Enarx a slug for that Drawbridge artifact rather than the actual contents.

Show listening ports

When a workload is running and listening on certain ports it would be good to tell that to the user/provide links.

There should also be warnings if we expect there to be browser warnings.

Fix some more misc frontend issues

Fix the navbar hamburger not doing anything (appears on smaller displays)
Fix being able to submit empty workloads
Fix not being able to hide the message box when warnings occur.
Make the Enarx.toml text area larger in relation to the screen height.
Add syntax highlighting to Enarx.toml.

Enable SNP Attestation

Basically, this means we need to fetch the VCEK for each server. Depends on profianinc/infrastructure#12.

Workload deployment fails with no output

Sometimes workload deployment fails with no output, however the URL changes to https://snp.equinix.try.enarx.dev/?message=no_session

Frontend messaging

There should be some explanation of what the page is and some relevant links.

Secured by Profian

The main page needs "secured by Profian" branding.

Login functionality

Users should be able to log in to access #5

The login page already exists and is provided by our identity provider. All that is required here is the code-based OpenID-Connect login flow via the browser.
On successful authentication, the token (potentially encrypted, although not sure if that's necessary) is stored in the cookie.

We need to add 2 routes for this:

/login -> on request, initialize the OpenID Connect code-based login flow
/authorized or /login/success, whichever you prefer and/or is simpler to implement -> on request, exchange the code from parameter into an access token, store in a cookie

Backend is configured with an oidc_issuer, oidc_client and oidc_secret. https://github.com/profianinc/drawbridge/blob/715193d53ae294b306bf37c9fea5e0686fc8fc5a/src/main.rs#L18-L138
At startup, the backend performs discovery on the oidc_issuer, constructs the client and stores it in an extension https://github.com/profianinc/drawbridge/blob/715193d53ae294b306bf37c9fea5e0686fc8fc5a/crates/app/src/builder.rs#L53-L60
On request to /login retrieve the authorization URL from the stored client via https://docs.rs/openidconnect/2.3.1/openidconnect/struct.Client.html#method.authorize_url and redirect the user to authorization URL. See example here https://docs.rs/openidconnect/2.3.1/openidconnect/index.html#getting-started-authorization-code-grant-w-pkce
On request to /authorized (or /login/success), parse code parameter and exchange it into a token using the client. Very rough example:

let uri = req.uri();
let query = uri.path_and_query().unwrap().query().unwrap();

let params = query.split('&').collect::<Vec<&str>>();
let code = params
    .iter()
    .find(|p| p.starts_with("code="))
    .unwrap()
    .trim_start_matches("code=");

// Now you can exchange it for an access token and ID token.
let token_response = client
    .exchange_code(AuthorizationCode::new(code.to_string()))
    .request(http_client)
    .unwrap();

// Extract the access token from the response (the ID token's authenticity and nonce should still be verified separately).
let access_token = token_response.access_token();

println!("Token: {}", access_token.secret()); // This should be stored in the cookie and possibly encrypted
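The query parsing in the rough example above panics on a missing query string or a missing code parameter. A minimal panic-free sketch using only the standard library might look like the following. The function name extract_code is hypothetical, and percent-decoding is deliberately left out, so a real handler would more likely rely on the web framework's query extractor.

```rust
/// Returns the value of the `code` query parameter, if present and non-empty.
/// Hypothetical helper: it does not percent-decode the value.
fn extract_code(query: &str) -> Option<&str> {
    query
        .split('&')
        // Keep only well-formed `key=value` pairs.
        .filter_map(|pair| pair.split_once('='))
        // Match the key exactly, so `barcode=1` is not mistaken for `code`.
        .find(|(key, _)| *key == "code")
        .map(|(_, value)| value)
        .filter(|value| !value.is_empty())
}
```

A handler could then return an error response when extract_code yields None instead of unwinding.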

On a request to an endpoint requiring authorization, the backend extracts the token from the cookie header and performs a user info request using the stored client: https://github.com/profianinc/drawbridge/blob/715193d53ae294b306bf37c9fea5e0686fc8fc5a/crates/app/src/auth/oidc.rs#L78-L111 If the user info request fails, the endpoint returns 403.
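The cookie lookup and the 403 fallback described above can be sketched as follows. This is only an illustration under assumptions: the cookie name "token" is hypothetical, and the user info request is replaced by a closure so that the sketch does not depend on the OIDC client.

```rust
/// Hypothetical name of the cookie that carries the access token.
const SESSION_COOKIE: &str = "token";

/// Looks up a cookie by name in the value of a `Cookie` request header.
fn cookie_value<'a>(header: &'a str, name: &str) -> Option<&'a str> {
    header
        .split(';')
        .filter_map(|pair| pair.trim().split_once('='))
        .find(|(key, _)| *key == name)
        .map(|(_, value)| value)
}

/// Returns the HTTP status for a protected endpoint.
/// `user_info_ok` stands in for the real user info request.
fn check_request(cookie_header: Option<&str>, user_info_ok: impl Fn(&str) -> bool) -> u16 {
    match cookie_header.and_then(|h| cookie_value(h, SESSION_COOKIE)) {
        Some(token) if user_info_ok(token) => 200,
        // Missing cookie or failed user info request.
        _ => 403,
    }
}
```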

Validate Enarx.toml in realtime

It would be much nicer if we could warn the users and disable the deploy button until all issues in the Enarx.toml are fixed.

Validate for:

Port conflicts
Missing file descriptors

This should be done using

enarxTomlEditor.session.on('change', function() {
    // set a flag for the refreshEnarxToml to see.
    // which gets triggered every second.
});
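If the same check were also enforced server-side, the port conflict rule could be sketched in Rust roughly like this. It assumes, as a simplification, that two listeners conflict whenever they share a port number regardless of protocol; the function name is hypothetical.

```rust
use std::collections::{BTreeSet, HashSet};

/// Returns every port that appears more than once, in ascending order.
/// Assumption: a conflict is defined by the port number alone.
fn conflicting_ports(ports: &[u16]) -> Vec<u16> {
    let mut seen = HashSet::new();
    let mut duplicates = BTreeSet::new();
    for &port in ports {
        // `insert` returns false when the port was already seen.
        if !seen.insert(port) {
            duplicates.insert(port);
        }
    }
    duplicates.into_iter().collect()
}
```

An empty result would mean the deploy button can be enabled as far as port conflicts are concerned.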

Create some built in examples

It would be nice if there were some simple built-in example files which users could select to run. This would help people understand some of the Enarx.toml syntax and what's possible. There should also be link(s) to the sources of the examples.

Collision-Resistant Listening Sockets

Currently, all workloads are deployed in the same network namespace. This means that listening sockets are likely to collide, even if chosen randomly. We should come up with some way to make this more reasonable.

One such proposal is to allocate a block of public IP addresses to the server and start each workload in a separate network namespace with an ipvlan interface.

Fix short-lived workloads having no output

If a workload starts and stops too quickly, the user may not have time to see its output. This could be very confusing for users and will often affect the simplest workloads.

Track open ports to mitigate port conflicts

One potential way of mitigating port conflicts is to only allow benefice to deploy repos with public Enarx.tomls and to let users know when a port is reserved. This doesn't stop users from stepping on each other, but it helps until a real solution is created.
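One way such tracking might look is an in-memory registry that maps each reserved port to the job holding it. The type and method names below are hypothetical, and the sketch assumes the ports have already been read from the workload's Enarx.toml.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory registry of ports reserved by running jobs.
#[derive(Default)]
struct PortRegistry {
    owners: HashMap<u16, String>,
}

impl PortRegistry {
    /// Reserves all `ports` for `job`, or none of them.
    /// On failure, returns the first requested port that is already taken.
    fn reserve(&mut self, job: &str, ports: &[u16]) -> Result<(), u16> {
        if let Some(&taken) = ports.iter().find(|p| self.owners.contains_key(*p)) {
            return Err(taken);
        }
        for &port in ports {
            self.owners.insert(port, job.to_string());
        }
        Ok(())
    }

    /// Frees every port held by `job`, for example when the job exits.
    fn release(&mut self, job: &str) {
        self.owners.retain(|_, owner| owner.as_str() != job);
    }
}
```

The port returned in the error case is what the frontend could show to the user as reserved.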

Workloads listening on sockets do not start

When running locally on KVM, any socket listening configuration makes the job hang forever with no output. For example, adding

[[files]]
kind = "listen"
prot = "tcp"
port = 9000

to this application:

fn main() {
    println!("Hello, world!");
}

compiled as cargo build --target wasm32-wasi

produces no output in the log (with RUST_LOG=info) and no output in the web terminal window.
Running the workload without the socket produces "Hello, world!" in the web terminal window

Enhance last page

Enhance last page, when deploying a Wasm workload, so users can:

Deploy workload to another platform
Learn how to use WebAssembly from various programming languages
Explore demos to run on Enarx
Join the Enarx community

Scheduling FIFO queue

We want a few properties from the queue:

FIFO
Persistence and fail-safety, once a workload is scheduled by the user it does not get lost even on server crashes
Efficiency and scalability, N workloads should be able to be safely scheduled concurrently and M workload executors should be able to pick elements from the queue concurrently
At most one element per GitHub ID, that means that the data structure should do some bookkeeping

The actual queue items should be just Drawbridge slugs, which are resolved at execution time

I propose to use a Redis stream https://redis.io/docs/manual/data-types/streams/ for the workload queue and possibly pair it with a hash set for bookkeeping. This would provide us with a very robust solution, which we can trivially scale (by just starting more instances) and easily debug (by looking into the Redis queue). I have extensive experience working with Redis doing almost exactly this, so I'd be happy to pick this up
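To make the bookkeeping requirement concrete, here is a small in-memory stand-in for the proposed structure. It is not the Redis-based design: it covers only FIFO ordering and the one-element-per-GitHub-ID rule, not persistence or concurrent access, and all names in it are hypothetical.

```rust
use std::collections::{HashSet, VecDeque};

/// In-memory stand-in for the workload queue (illustration only).
#[derive(Default)]
struct WorkloadQueue {
    /// (GitHub ID, Drawbridge slug) pairs in arrival order.
    items: VecDeque<(u64, String)>,
    /// GitHub IDs that currently have an element in the queue.
    queued: HashSet<u64>,
}

impl WorkloadQueue {
    /// Enqueues a slug; returns false if this user already has one queued.
    fn push(&mut self, github_id: u64, slug: &str) -> bool {
        if !self.queued.insert(github_id) {
            return false;
        }
        self.items.push_back((github_id, slug.to_string()));
        true
    }

    /// Dequeues the oldest element and clears the user's bookkeeping entry.
    fn pop(&mut self) -> Option<(u64, String)> {
        let item = self.items.pop_front()?;
        self.queued.remove(&item.0);
        Some(item)
    }
}
```

In the Redis proposal, the VecDeque would correspond to the stream and the HashSet to the bookkeeping set.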

Improve deploy button

The deploy button has the following problems:

It is verbose
It is inconsistently capitalized
It is underneath the wasm uploader widget

I suspect the page would look better if the deploy button was right-aligned. For example:

[wasmfile] <---- space ----> [deploy]

Stream input/output over websockets

Instead of polling for stderr and stdout we should just create a websocket connection.

Run Enarx in a Container

It would be good for us to launch a separate container for each Enarx instance. This means we can have an official container release and dogfood it in benefice. It might also help in solving the problem in issue #20 (since docker and podman claim to support ipvlan networks).

Support linking to workloads

Adding this proposal as a starting point for a discussion around the UI to load workloads from GitHub/Drawbridge.

On the running page (and on the upload page), have a Wasm gallery like:

Selecting a demo will populate the 'Load URL' field on the upload page:

Refresh tokens

Sometimes the tokens stored in the cookie can become outdated. When that happens, the web app should refresh the token on behalf of the user.
Currently this manifests as authentication failure on scheduling and requires a relogin.

"Starred" functionality does not work

Jul 22 19:15:06 sgx benefice[63026]: 2022-07-22T19:15:06.445900Z ERROR benefice: Failed to get stars for user github|7810941: https://api.github.com/user/---/starred: Connection Failed: Connect error: connection timed out
Jul 23 12:04:07 sgx benefice[63026]: 2022-07-23T12:04:07.406001Z ERROR benefice: Failed to get stars for user github|773636: https://api.github.com/user/---/starred: Connection Failed: Connect error: connection timed out
Jul 24 23:10:21 sgx benefice[63026]: 2022-07-24T23:10:21.168524Z ERROR benefice: Failed to get stars for user github|7810941: https://api.github.com/user/---/starred: Connection Failed: Connect error: connection timed out
Jul 24 23:10:47 sgx benefice[63026]: 2022-07-24T23:10:47.140827Z ERROR benefice: Failed to get stars for user github|7810941: https://api.github.com/user/---/starred: Connection Failed: Connect error: connection timed out

Test the endpoints

There should be tests for the existing endpoints including authentication.

Protect a Job from Other Users

Once we have authentication (#6), all created jobs should keep track of the users that created them. Only the user that creates the job should have access to its state.

Job cancellation

There should be an API that allows a user to stop any of their running jobs. This can be integrated into the dashboard in #5.

Jobs should also be killed automatically if a user logs out.

Cache user info to prevent rate limiting

If you hit the /user_info endpoint on Auth0 too many times, you will get rate limited and the app will break. We should cache the result and query as infrequently as possible.
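A time-to-live cache is one possible shape for this. The sketch below keys entries by token and stores the user info as a plain string; both choices, as well as the type and method names, are assumptions made for illustration. The actual request is passed in as a closure.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache of user info responses, keyed by token.
struct UserInfoCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl UserInfoCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached value if it is younger than `ttl`.
    /// Otherwise calls `fetch` (the real /user_info request) and stores the result.
    fn get_or_fetch(
        &mut self,
        token: &str,
        fetch: impl FnOnce() -> Option<String>,
    ) -> Option<String> {
        if let Some((stored_at, info)) = self.entries.get(token) {
            if stored_at.elapsed() < self.ttl {
                return Some(info.clone());
            }
        }
        // Failed lookups are not cached, so the next call retries.
        let info = fetch()?;
        self.entries.insert(token.to_string(), (Instant::now(), info.clone()));
        Some(info)
    }
}
```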

Handle errors in a more user friendly way

When the user isn't logged in or something isn't found there should be appropriate error messages or redirects. The user should never see a blank page with no errors.

Move `crates/auth` from Drawbridge to this crate

Drawbridge should not do anything GitHub-specific. This server, however, very much should.

Enable SGX Attestation

Basically, we need to install and enable AESMD and PCCS. This depends on profianinc/infrastructure#12.

Web UI

User Rate Limiting

We need to limit the system so that each authorized user can only submit one job at a time. Depends on #6.

Give every workload its own IP

A temporary fix would be to track the open ports #72. A longer term fix would be to give every workload its own IP address somehow so users cannot step on each other.

Randomizing ports in the Enarx.toml and telling the user the new ones could be useful and should be considered.

User dashboard

There should be a dashboard, where users can observe the workloads that have been scheduled by them and/or executed by the server.

Workload deployment fails with no output and eventual 504 timeout

Release https://github.com/profianinc/benefice/releases/tag/v0.1.0-rc10 fails to deploy workloads. Instead, the window loads forever, rendering the server unusable until restart. Eventually Nginx fails with 504 (gateway timeout) if you wait long enough.

Add support for stdin

This will allow for some interactive demos to work.

Limit execution time to a configurable timeout

Workloads should only execute for at most T, where T is a runtime parameter to the server
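A minimal sketch of such a limit, assuming each workload runs as a child process and using only the standard library, could poll the child until a deadline passes. The function name and the 10 ms polling interval are arbitrary choices for illustration; an async server would more likely use its runtime's timeout facilities.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `command` and kills it if it is still running after `timeout`.
/// Returns `Ok(Some(status))` if the process exited on its own and
/// `Ok(None)` if it had to be killed.
fn run_with_timeout(command: &mut Command, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let mut child = command.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check for exit.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?;
            // Reap the killed process so it does not linger as a zombie.
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

Here the timeout argument plays the role of T and would be taken from the server's runtime configuration.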

Fix: responsive console

Do we want the console output to be responsive? Currently the width is set to 80 characters, but the rest of the page remains responsive. Looks fine to me and I think I prefer it this way.

Allow starred users to have benefits

If a user has starred enarx they should be able to run their payload for a longer period of time or upload a larger payload (10 vs 50 MB for example).

Fix some frontend scripting issues

There is some incorrect behavior related to if the user is logged in or not, inconsistent formatting in the html, broken progress bar, etc.

Almost all redirects are broken.

Basic design and CSS

The web form is lacking any design at the moment. We need to improve that.
Requirements:

Some very basic CSS to make buttons and form look a little bit nicer. I think this should be just a very minimal bare-bones library we can import and fetch quickly (i.e. not Bootstrap)
Align the button with the form. Currently, the submit button is not aligned with the edge of the form, which looks bad and gets even worse with scaling.

Different workload scheduling for stargazers

Only users that have starred the project should be able to add items to the workload queue

After #6 is done, on scheduling request:

Use the OIDC subject received from the UserInfo request to get the Auth0 user https://auth0.com/docs/api/management/v2/#!/Users/get_users_by_id
For each object contained in identities field with provider field value equal to github, collect the access_token field into a vector. Something like

let github_tokens: Vec<_> = info.identities.into_iter().filter_map(|id| if id.provider == "github" { Some(id.access_token) } else { None }).collect();

If the github_tokens vector is empty after the previous step, return an error stating that the user should log in with GitHub
For each token in github_tokens, check that the user has starred https://github.com/enarx/enarx repository https://docs.github.com/en/rest/activity/starring#check-if-a-repository-is-starred-by-the-authenticated-user

curl -v -H "Authorization: Bearer gho_<TOKEN>" https://api.github.com/user/starred/enarx/enarx

Once a successful code is received, abort the search and authorize scheduling, otherwise return an error stating that the user should star the enarx/enarx repository and provide a URL to it in the error message
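The decision logic of the steps above can be sketched independently of the HTTP layer. In this sketch the GitHub request is replaced by a closure returning the response status code; the GitHub endpoint answers 204 when the repository is starred and 404 when it is not. The enum and function names are hypothetical.

```rust
/// Hypothetical error type for rejected scheduling requests.
#[derive(Debug, PartialEq)]
enum ScheduleError {
    /// The user has no GitHub identity and should log in with GitHub.
    NoGithubIdentity,
    /// None of the tokens belongs to a user who starred enarx/enarx.
    NotStarred,
}

/// `star_status` stands in for the request to
/// https://api.github.com/user/starred/enarx/enarx with the given token
/// and returns the HTTP status code of the response.
fn authorize_scheduling(
    github_tokens: &[String],
    mut star_status: impl FnMut(&str) -> u16,
) -> Result<(), ScheduleError> {
    if github_tokens.is_empty() {
        return Err(ScheduleError::NoGithubIdentity);
    }
    // `any` stops at the first token that yields a successful code.
    if github_tokens.iter().any(|token| star_status(token.as_str()) == 204) {
        Ok(())
    } else {
        Err(ScheduleError::NotStarred)
    }
}
```

The NotStarred case is where the error message would include the URL of the enarx/enarx repository.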

Broken X on "Login required" box

The "Login required" box has an (X) to close the box. But when you click it, nothing happens.

Jobs Limit

We need to restrict the maximum number of jobs that can be submitted from all users. We should probably start with the number of CPUs as a good number and tweak from there based on deployment experience.
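As a starting point, the limit could default to the CPU count reported by the standard library and be enforced with a simple counter. The names below are hypothetical, and a real server would need to share the limiter across request handlers, for example behind a mutex.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Default limit: the number of CPUs available to the process, or 1 if unknown.
fn default_job_limit() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

/// Hypothetical counter bounding the number of jobs across all users.
struct JobLimiter {
    max: usize,
    running: usize,
}

impl JobLimiter {
    fn new(max: usize) -> Self {
        Self { max, running: 0 }
    }

    /// Claims a job slot; returns false when the limit is reached.
    fn try_acquire(&mut self) -> bool {
        if self.running < self.max {
            self.running += 1;
            true
        } else {
            false
        }
    }

    /// Frees a slot when a job finishes or is killed.
    fn release(&mut self) {
        self.running = self.running.saturating_sub(1);
    }
}
```

JobLimiter::new(default_job_limit()) gives the CPU-based default, while passing an explicit number allows tweaking based on deployment experience.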

(Structured) logging

We should review the logging done in this crate, the logging levels and log context handling.
Logs should be useful for the operator and we should be able to query by arbitrary keys like job ID.

Authenticate users via encrypted tokens

In #89 the cookie format was reworked and now volatile UUIDs are issued at runtime and kept in memory server-side and in the cookie. These UUIDs are then used to authenticate users.

UUIDs are not designed to be used for this purpose:

Do not assume that UUIDs are hard to guess; they should not be used
as security capabilities (identifiers whose mere possession grants
access), for example. A predictable random number source will
exacerbate the situation.

Do not assume that it is easy to determine if a UUID has been
slightly transposed in order to redirect a reference to another
object. Humans do not have the ability to easily check the integrity
of a UUID by simply glancing at it.

Distributed applications generating UUIDs at a variety of hosts must
be willing to rely on the random number source at all hosts. If this
is not feasible, the namespace variant should be used.

https://www.rfc-editor.org/rfc/rfc4122#section-6

Moreover, we need the actual GitHub token to check for the "starred" condition via https://docs.github.com/en/rest/activity/starring#check-if-a-repository-is-starred-by-the-authenticated-user as a single API call, which does not suffer from rate limiting issues.

The most applicable solution here is probably to encrypt the tokens (both the ID token and a refresh token #43) and store them encrypted in the cookie

Workload scheduler and executor

Allow the Enarx.toml to be editable when deploying from Drawbridge

blocked until #68 is merged

enarx deploy doesn't allow us to change the Enarx.toml configuration to change ports/arguments. We should instead try to download the main.wasm/Enarx.toml and use enarx run instead.

This will simplify things and allow users to avoid port conflict errors and change arguments on a per workload basis.

We should also handle refreshing the Enarx.toml on the frontend via a button and disallow deployments if the Enarx.toml cannot be pulled.