mozilla-services / syncstorage-rs
Sync Storage server in Rust
License: Mozilla Public License 2.0
It seems likely (although not yet certain) that in the future we will end up wanting loosely-defined relationships between collections. For example, we might want to add a new "containers" collection which holds all the Firefox containers defined by a user, and the history collection might want to refer to what container a visit relates to.
If the existing batch mechanism could be extended such that the same batch ID can be used for multiple collections, it would mean that clients can ensure that the data uploaded across collections is always consistent. There are probably no changes needed to the API itself, just to the semantics of when the various batch query params are valid.
This is a little vague and something we don't need now. To be honest we aren't even 100% sure we will need it in the future (and our current plans mean we almost certainly will not need it over the next few months), but I thought I'd get this on the radar anyway. cc @rfk, @thomcc
Port the Python post validators:
POST_VALIDATORS = DEFAULT_VALIDATORS + (
    extract_batch_state,
    parse_multiple_bsos,
    check_for_known_bad_payloads
)
Also clean-up handler methods for uniformity.
We need to eventually support Google Spanner, but for local testing/hosting we need something we can run as well as our users. To handle this, we should abstract the db commands we need into a trait, and perhaps provide some default generic SQL queries that Spanner/SQL specializations can extend as needed.
The generic sql queries for Sync are here:
https://github.com/mozilla-services/server-syncstorage/blob/master/syncstorage/storage/sql/queries_generic.py
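A minimal sketch of what that trait could look like, with generic SQL in default methods that a backend overrides only where its dialect differs. All names here are illustrative, not the actual syncstorage-rs API:

```rust
// Hypothetical abstraction: default methods carry generic SQL (in the
// spirit of queries_generic.py); backends override where needed.
trait DbQueries {
    fn get_bso_query(&self) -> &'static str {
        "SELECT id, payload, modified FROM bso WHERE userid = ? AND collection = ? AND id = ?"
    }
}

struct GenericSql;
impl DbQueries for GenericSql {}

struct Spanner;
impl DbQueries for Spanner {
    // Spanner prefers named parameters, so it overrides the default
    // (query text illustrative only).
    fn get_bso_query(&self) -> &'static str {
        "SELECT id, payload, modified FROM bso \
         WHERE userid = @userid AND collection = @collection AND id = @id"
    }
}

fn main() {
    assert!(GenericSql.get_bso_query().contains('?'));
    assert!(Spanner.get_bso_query().contains("@userid"));
}
```

The default-method approach keeps the shared queries in one place while letting a Spanner specialization diverge query by query.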
See rfk's comments here:
I believe the go version gets away without this field because it stores the top level modified timestamp separately in its per user KeyValues store.
mozilla-services/server-syncstorage#62
Solidify the batches table schema (base it off the python version's schema or can we use the go version's simpler design?) and implement its calls.
Web handling related code is a bit spread out, a nicer organization would have a web/ with the auth, handlers, and extractors under it.
Passing collection_ids was somewhat inherited from the go version, and I also missed the python version's handling of collection_ids via its global collection_id cache. Now that this cache is in place, it makes sense to switch the higher level db interface to take collection names as Strings everywhere.
This also saves the handlers an extra future callback they'd need to make by pushing it down into the db layer
Create a Db trait that works w/ mysql. This will be used for users who want their own local syncstorage instances, and potentially either to speak to Aurora or as a local test suite database.
We'll use diesel to handle connection management and mapping sql results to rust structs.
Let's prefer raw sql query strings vs the diesel query builder for now. Our next db target might be Aurora, a database that could potentially reuse many of the mysql query strings but wouldn't be supported underneath diesel (it's still up in the air, so let's prefer this route while we're figuring it out).
We now match the Python on enforcing the /info/configuration limits. There's room for improvement w/ batches: the batch handler could additionally check the max_total_{bytes/records} limits on the final collection of bsos in a batch before committing.
Currently we only check the user-supplied headers for these values during the batch calls, but they could potentially mismatch the actual totals.
go-syncstorage actually does this in its handler.
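A sketch of the extra check the batch handler could make before committing, using the actual sizes of the accumulated bsos rather than the client-supplied headers (struct and function names hypothetical):

```rust
// Validate the *actual* totals of a batch against the
// /info/configuration limits before committing.
struct Limits {
    max_total_bytes: usize,
    max_total_records: usize,
}

fn check_batch_totals(payload_sizes: &[usize], limits: &Limits) -> Result<(), &'static str> {
    if payload_sizes.len() > limits.max_total_records {
        return Err("size-limit-exceeded: records");
    }
    let total: usize = payload_sizes.iter().sum();
    if total > limits.max_total_bytes {
        return Err("size-limit-exceeded: bytes");
    }
    Ok(())
}

fn main() {
    let limits = Limits { max_total_bytes: 100, max_total_records: 3 };
    assert!(check_batch_totals(&[10, 20], &limits).is_ok());
    assert!(check_batch_totals(&[60, 60], &limits).is_err()); // too many bytes
    assert!(check_batch_totals(&[1, 1, 1, 1], &limits).is_err()); // too many records
}
```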
POSTing multiple BSOs rejects unknown bso fields, but our PUT of a single bso allows them.
This causes test_storage::test_handling_of_invalid_bso_fields to fail (its final assertion):
# Invalid BSO - unknown field
bso = {"id": "TEST", "unexpected": "spanish-inquisition"}
...
res = self.app.put_json(coll_url + "/" + bso["id"], bso, status=400)
In #39, the /info/configuration endpoint is implemented to return some static limits relating to payloads:
const KILOBYTE: u32 = 1024;
const MEGABYTE: u32 = KILOBYTE * KILOBYTE;
const DEFAULT_MAX_POST_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_POST_RECORDS: u32 = 100;
const DEFAULT_MAX_RECORD_PAYLOAD_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_REQUEST_BYTES: u32 = DEFAULT_MAX_POST_BYTES + 4 * KILOBYTE;
const DEFAULT_MAX_TOTAL_BYTES: u32 = 100 * DEFAULT_MAX_POST_BYTES;
const DEFAULT_MAX_TOTAL_RECORDS: u32 = 100 * DEFAULT_MAX_POST_RECORDS;
Most of these need to be enforced manually I guess, except for DEFAULT_MAX_REQUEST_BYTES, which can be set wholesale at the web server level.
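For reference, the derived limits above work out to the following concrete values (a quick arithmetic check):

```rust
// The /info/configuration limits in concrete numbers: 2 MiB posts, with
// the request limit adding 4 KiB of headroom for headers/framing.
const KILOBYTE: u32 = 1024;
const MEGABYTE: u32 = KILOBYTE * KILOBYTE;
const DEFAULT_MAX_POST_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_POST_RECORDS: u32 = 100;
const DEFAULT_MAX_REQUEST_BYTES: u32 = DEFAULT_MAX_POST_BYTES + 4 * KILOBYTE;
const DEFAULT_MAX_TOTAL_BYTES: u32 = 100 * DEFAULT_MAX_POST_BYTES;
const DEFAULT_MAX_TOTAL_RECORDS: u32 = 100 * DEFAULT_MAX_POST_RECORDS;

fn main() {
    assert_eq!(DEFAULT_MAX_POST_BYTES, 2_097_152); // 2 MiB
    assert_eq!(DEFAULT_MAX_REQUEST_BYTES, 2_101_248); // 2 MiB + 4 KiB
    assert_eq!(DEFAULT_MAX_TOTAL_BYTES, 209_715_200); // ~200 MiB
    assert_eq!(DEFAULT_MAX_TOTAL_RECORDS, 10_000);
}
```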
Dependent on its extractor (#86)
Originally part of #49: port the extract_batch_state validator from Python to an actix-web extractor.
get_collection should return the items with newline separators instead of JSON if the Accept header indicates application/newlines.
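A sketch of what the application/newlines serialization amounts to: one JSON object per line rather than a single JSON array (helper name hypothetical):

```rust
// Serialize records as newline-separated JSON objects for
// Accept: application/newlines responses.
fn to_newlines(records: &[&str]) -> String {
    let mut out = String::new();
    for record in records {
        out.push_str(record);
        out.push('\n');
    }
    out
}

fn main() {
    let body = to_newlines(&[r#"{"id":"a"}"#, r#"{"id":"b"}"#]);
    assert_eq!(body, "{\"id\":\"a\"}\n{\"id\":\"b\"}\n");
}
```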
The Python sets up a regex to validate 'collection' and 'item' (bso ids) within the path:
This has test_storage::test_handling_of_invalid_bso_fields expecting 404s for these invalid values:
# Invalid ID - too long
bso = {"id": "X" * 65, "payload": "testing"}
...
res = self.app.put_json(coll_url + "/" + bso["id"], bso, status=404)
Maybe we can emulate this behavior w/ actix's custom route predicates?
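Whether or not route predicates pan out, the check itself is small. A sketch approximating the Python regex's rules (my reading: 1-64 characters drawn from [a-zA-Z0-9._-]; treat the exact character set as an assumption to verify against the Python source):

```rust
// Validate a 'collection' or 'item' (bso id) path segment; anything
// failing this should 404, per the tests above.
fn valid_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 64
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
}

fn main() {
    assert!(valid_id("bookmarks"));
    assert!(!valid_id(&"X".repeat(65))); // too long: the 404 case above
    assert!(!valid_id("bad/id"));
    assert!(!valid_id(""));
}
```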
Create an initial Rust project, with travis file/config, and rough architecture layout.
The dbexecutor abstraction was needed for talking to blocking databases. We're going to be working against a database we can talk to with futures, so this abstraction should be removed.
Create a sync server prototype that retains the handling and response characteristics of the existing Python version. All of these components should behave in a similar, compatible manner.
test_storage::test_that_404_responses_have_a_json_body currently fails, expecting a 404 at "/nonexistent/url"
Port the put validators:
PUT_VALIDATORS = DEFAULT_VALIDATORS + (
    parse_single_bso,
    check_for_known_bad_payloads
)
The default validators from Python:
DEFAULT_VALIDATORS = (
    extract_target_resource,
    extract_precondition_headers,
    extract_query_params,
)
Issue #29's last PR brought over extract_query_params and the machinery it needs. Closing this requires implementing the other two default validators.
Clean up the various XXX comments around type signatures in the mysql db code.
To avoid doing queries we shouldn't, the precondition header check should be done in middleware in front of the db middleware, to verify whether the query has any chance of succeeding.
post_bsos returns a mapping of "failed" bsos that failed validation, but the presence of invalid bsos shouldn't prevent other bsos in the post from being written to the database:
The BsoBodies extractor currently produces an error on these which bails out the entire request.
This causes test_storage::test_set_collection to fail
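A sketch of the partitioning behavior we want instead: validate each bso individually, write the valid ones, and report the rest in the "failed" map (types and names hypothetical):

```rust
use std::collections::HashMap;

// Split incoming BSOs into ids to write and a "failed" map of
// id -> validation error for the response body.
fn partition_bsos(
    bsos: Vec<(String, Result<(), String>)>,
) -> (Vec<String>, HashMap<String, String>) {
    let mut valid = Vec::new();
    let mut failed = HashMap::new();
    for (id, validation) in bsos {
        match validation {
            Ok(()) => valid.push(id),
            Err(reason) => {
                failed.insert(id, reason);
            }
        }
    }
    (valid, failed)
}

fn main() {
    let input = vec![
        ("a".to_string(), Ok(())),
        ("b".to_string(), Err("invalid payload".to_string())),
    ];
    let (valid, failed) = partition_bsos(input);
    assert_eq!(valid, vec!["a".to_string()]);
    assert_eq!(failed.get("b").map(String::as_str), Some("invalid payload"));
}
```

The BsoBodies extractor would then never bail out the whole request for a per-bso validation error.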
#20 adds master_token_secret to Settings, giving it type Vec<u8>. Afaik, there is no way to deserialize that without writing a custom deserializer, so we'll need to add one before we can set it from an environment variable or config file.
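Whatever the config format, the custom deserializer mostly reduces to a string-to-bytes parse step. A sketch assuming the secret arrives hex-encoded (the encoding choice is an assumption; it could equally be base64 or a comma-separated byte list):

```rust
// Decode a hex string (e.g. from an env var) into the Vec<u8> a custom
// serde deserializer for master_token_secret could produce.
fn parse_secret(hex: &str) -> Result<Vec<u8>, String> {
    if hex.len() % 2 != 0 {
        return Err("odd-length hex string".to_string());
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    assert_eq!(parse_secret("deadbeef").unwrap(), vec![0xde, 0xad, 0xbe, 0xef]);
    assert!(parse_secret("xyz").is_err());
}
```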
Not all validation errors should be a 400, in this case we want a 415 Unsupported Media Type:
We'll need to figure out the details of hawk authentication and build it via the https://github.com/taskcluster/rust-hawk crate (we have some notes from rfk in handlers::HawkHeader).
(Despite the crate's warning against production use, a couple of different projects in the https://github.com/mozilla/application-services repo have already used it successfully.)
We currently accept this expected failure from test_storage::test_batch_empty_commit:
testEmptyCommit("application/newlines", "\n", status=400)
Build the equivalent of the lock_for_{read/write} context managers and trigger their use for each handler transaction. Read is triggered for GET/HEAD requests, otherwise write.
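The method-to-lock mapping is the easy part; a sketch:

```rust
// Pick the lock type from the HTTP method, as the Python
// lock_for_{read,write} context managers do.
#[derive(Debug, PartialEq)]
enum LockKind {
    Read,
    Write,
}

fn lock_for(method: &str) -> LockKind {
    match method {
        "GET" | "HEAD" => LockKind::Read,
        _ => LockKind::Write,
    }
}

fn main() {
    assert_eq!(lock_for("GET"), LockKind::Read);
    assert_eq!(lock_for("HEAD"), LockKind::Read);
    assert_eq!(lock_for("POST"), LockKind::Write);
}
```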
Right now we silently ignore a delete of a nonexistent bso (somewhat inherited from the go version: its view layer checks for the existence of the bso before attempting the delete in order to cause a 404).
We should follow the python version, emitting a BsoNotFound when the delete affects no rows.
This causes test_storage::test_delete_item to fail
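A sketch of the Python-style behavior: drive the error off the rows-affected count of the DELETE itself, with no existence pre-check (error name borrowed from the description above):

```rust
// Emit BsoNotFound when the DELETE affects zero rows, matching the
// Python version's behavior.
#[derive(Debug, PartialEq)]
enum DbError {
    BsoNotFound,
}

fn delete_bso(rows_affected: u64) -> Result<(), DbError> {
    if rows_affected == 0 {
        Err(DbError::BsoNotFound)
    } else {
        Ok(())
    }
}

fn main() {
    assert_eq!(delete_bso(0), Err(DbError::BsoNotFound)); // becomes a 404
    assert!(delete_bso(1).is_ok());
}
```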
This generally causes a 404 but is special cased for get_collection to return an empty collection instead.
This causes test_storage::test_get_collection to fail
There are a number of places we call into() gratuitously, where, instead, we could refactor the code to make better use of the Into trait. For example, our extractors all set <T as FromRequest>::Error (where T is the implementing type) to be actix_web::Error. While this works, all of our extractors return ValidationErrorKinds, which means we need to call into() to perform the conversion into actix_web::Error. If we were to set <T as FromRequest>::Error to ValidationErrorKind, we could do away with the calls to into().
We can make a similar change in the handlers. Because Responder is implemented for Result<R: Responder, E: Into<actix_web::Error>>, there's no need to return actix_web::Error directly from the handlers (as we currently do), since ApiError implements ResponseError and thus Into<Error>.
We should also explore getting rid of the DbErrorKind, ApiErrorKind, and ValidationErrorKind types. These types are sources of many of the calls to into(), and we should be able to get away with just having DbError and ApiError enums (they should either both implement ResponseError, or ApiError should implement From<DbError>).
A variety of method calls fail to include the X-Last-Modified header that the Python tests expect; they should add it when appropriate.
We're moving to a shared db model, so the sqlite schema isn't needed.
The initial mysql support more closely matches go-syncstorage's handling of what's essentially a per-session 'modified' timestamp.
We should more closely match the python version by creating the timestamp in one place per API call (per SqlStorageSession). Likely living in a similar DbSession struct within the Db impl.
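A sketch of that DbSession idea: the timestamp is computed once when the session is created, and every read of it within the API call returns the same value (names hypothetical):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Carry one 'modified' timestamp per API call, instead of computing a
// fresh one per query.
struct DbSession {
    timestamp_ms: u64,
}

impl DbSession {
    fn new() -> Self {
        let ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before epoch")
            .as_millis() as u64;
        DbSession { timestamp_ms: ms }
    }

    // Every write within this API call sees the same timestamp.
    fn timestamp(&self) -> u64 {
        self.timestamp_ms
    }
}

fn main() {
    let session = DbSession::new();
    assert_eq!(session.timestamp(), session.timestamp());
}
```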
api/handlers.rs and api/auth.rs do their error-marshalling manually. The db code uses failure, and I know it has magic in it for the marshalling stuff; I just never got around to reading about it yet.
Python's {Json,Newlines}Renderer ensures X-Weave-Records are always included, its value being the number of records returned.
This causes test_storage::get_collection and a couple of batch tests to fail
Returns static config data.
Several requests return a 400 instead of 415, or a 400 instead of 404, or a 200 instead of a 400.
We lack the equivalent of the python's sleep_and_retry_on_conflict decorator, which attempts to retry a request once in the face of timestamp ConflictErrors.
This can affect the e2e tests: ConflictErrors are occasionally reported because the tests can potentially hit the server with successive requests in less than 10 milliseconds.
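A sketch of a retry-once wrapper along the lines of the Python decorator (names hypothetical; the 10ms backoff mirrors the timestamp granularity mentioned above):

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ApiError {
    Conflict,
}

// Run op; on a timestamp conflict, back off briefly and retry exactly once.
fn with_conflict_retry<F>(mut op: F) -> Result<(), ApiError>
where
    F: FnMut() -> Result<(), ApiError>,
{
    match op() {
        Err(ApiError::Conflict) => {
            // Wait long enough for the retried request to land on a
            // fresh timestamp.
            sleep(Duration::from_millis(10));
            op()
        }
        other => other,
    }
}

fn main() {
    let mut calls = 0;
    let result = with_conflict_retry(|| {
        calls += 1;
        if calls == 1 { Err(ApiError::Conflict) } else { Ok(()) }
    });
    assert!(result.is_ok());
    assert_eq!(calls, 2); // retried exactly once
}
```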
Sync has a batch of middlewares that populate common response headers and handle other basic parts of the spec. These should be relatively easy to port to actix middleware.
Over in fxa-email-service we have a Travis job that builds the docs and then pushes them to the gh-pages branch. This means we have automagically refreshing docs available here:
https://mozilla.github.io/fxa-email-service/fxa_email_service/
Are you guys interested in something similar for this repo?
This just bit us in fxa-email-service, and the same mistake I made there, I made here:
https://github.com/mozilla-services/syncstorage-rs/blob/master/src/error.rs#L136
We should only send 500 errors to Sentry.
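The fix amounts to a status-code filter in front of the Sentry reporting call; a sketch (predicate name illustrative):

```rust
// Report only 5xx responses to Sentry; 4xx responses are expected
// client errors and just add noise.
fn should_report_to_sentry(status: u16) -> bool {
    status >= 500
}

fn main() {
    assert!(should_report_to_sentry(500));
    assert!(should_report_to_sentry(503));
    assert!(!should_report_to_sentry(400));
    assert!(!should_report_to_sentry(404));
}
```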
According to test_storage::test_set_item_input_formats this is a bw-compat case we need to support:
# Unless we use text/plain, which is a special bw-compat case.
self.app.put(self.root + '/storage/col2/TEST', body, headers={
    "Content-Type": "text/plain"
})
The Handlers should be provided a Db instance created from the pool (probably from an extractor). An associated middleware will ensure an appropriate transaction + lock is established and committed (or rolled back) at the end of the request
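A sketch of the commit-or-rollback decision such a middleware would make at the end of each request (trait and names hypothetical):

```rust
// End-of-request transaction handling: commit on success, roll back on
// any error response.
trait Transaction {
    fn commit(self) -> Result<(), String>;
    fn rollback(self) -> Result<(), String>;
}

struct Txn;

impl Transaction for Txn {
    fn commit(self) -> Result<(), String> {
        Ok(())
    }
    fn rollback(self) -> Result<(), String> {
        Ok(())
    }
}

fn finish_request<T: Transaction>(txn: T, response_status: u16) -> Result<(), String> {
    if response_status < 400 {
        txn.commit()
    } else {
        txn.rollback()
    }
}

fn main() {
    assert!(finish_request(Txn, 200).is_ok());
    assert!(finish_request(Txn, 500).is_ok());
}
```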
A lot of boilerplate exists in the handlers, as they frequently require at least 3 different extractors. These could be combined into a single object with its own extractor, for a cleaner API with less boilerplate.
A TODO item described here: https://github.com/mozilla-services/syncstorage-rs/blob/bea8032/src/db/mysql/models.rs#L242