
syncstorage-rs's Issues

Support batches across collections

It seems likely (although not yet certain) that in the future we will end up wanting loosely-defined relationships between collections. For example, we might want to add a new "containers" collection which holds all the Firefox containers defined by a user, and the history collection might want to refer to what container a visit relates to.

If the existing batch mechanism could be extended such that the same batch ID can be used for multiple collections, it would mean that clients can ensure that the data uploaded across collections is always consistent. There are probably no changes needed to the API itself, just to the semantics of when the various batch query params are valid.

This is a little vague and something we don't need now. To be honest we aren't even 100% sure we will need it in the future (and our current plans mean we almost certainly will not need it over the next few months), but I thought I'd get this on the radar anyway. cc @rfk, @thomcc

Create db abstraction trait

We need to eventually support Google Spanner, but for local testing/hosting we need something we can run as well as our users. To handle this, we should abstract the db commands we need into a trait, and perhaps provide some default generic SQL queries that Spanner/SQL specializations can extend as needed.

The generic sql queries for Sync are here:
https://github.com/mozilla-services/server-syncstorage/blob/master/syncstorage/storage/sql/queries_generic.py

Schemas:
https://github.com/mozilla-services/server-syncstorage/blob/master/syncstorage/storage/sql/dbconnect.py#L105
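A rough sketch of the shape such a trait might take (method names and types here are illustrative, not a proposed final API), with a toy in-memory backend standing in for a real SQL implementation:

```rust
use std::collections::HashMap;

// Hypothetical error type for the sketch.
#[derive(Debug)]
pub struct DbError(pub String);
pub type DbResult<T> = Result<T, DbError>;

/// The storage operations handlers need, abstracted over backends
/// (generic SQL, MySQL, Spanner, ...). Names are illustrative.
pub trait Db {
    fn put_bso(&mut self, user_id: u64, collection: &str, bso_id: &str, payload: &str) -> DbResult<()>;
    fn get_bso(&self, user_id: u64, collection: &str, bso_id: &str) -> DbResult<Option<String>>;
    /// Returns whether a row was actually deleted.
    fn delete_bso(&mut self, user_id: u64, collection: &str, bso_id: &str) -> DbResult<bool>;
}

/// A toy in-memory backend, standing in for a real SQL implementation.
#[derive(Default)]
pub struct MemDb {
    bsos: HashMap<(u64, String, String), String>,
}

impl Db for MemDb {
    fn put_bso(&mut self, user_id: u64, collection: &str, bso_id: &str, payload: &str) -> DbResult<()> {
        self.bsos.insert((user_id, collection.into(), bso_id.into()), payload.into());
        Ok(())
    }
    fn get_bso(&self, user_id: u64, collection: &str, bso_id: &str) -> DbResult<Option<String>> {
        Ok(self.bsos.get(&(user_id, collection.into(), bso_id.into())).cloned())
    }
    fn delete_bso(&mut self, user_id: u64, collection: &str, bso_id: &str) -> DbResult<bool> {
        Ok(self.bsos.remove(&(user_id, collection.into(), bso_id.into())).is_some())
    }
}

fn main() {
    let mut db = MemDb::default();
    db.put_bso(1, "history", "abc", "{}").unwrap();
    assert_eq!(db.get_bso(1, "history", "abc").unwrap(), Some("{}".to_string()));
    assert!(db.delete_bso(1, "history", "abc").unwrap());
}
```

Handlers would then be written against `dyn Db` (or a generic bound), leaving the Spanner/SQL specializations free to override the generic queries.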

Implement batch db calls

Solidify the batches table schema (base it on the Python version's schema, or can we use the Go version's simpler design?) and implement its calls.

Refactor handlers into web package

Web-handling code is a bit spread out; a nicer organization would be a web/ package with auth, handlers, and extractors under it.

Switch db interface back to collection: String (vs collection_id: i32)

Passing collection_ids was somewhat inherited from the go version, and additionally I missed the python version's handling of collection_ids via its global collection_id cache. Now that this cache is in place, it makes sense to switch the higher-level db interface to take collection names as Strings everywhere.

This also saves the handlers an extra future callback they'd need to make, by pushing that work down into the db layer.

Support a mysql backend

Create a Db trait that works w/ mysql. This will be used by users who want their own local syncstorage instances, and potentially either to speak to Aurora or to serve as a local test-suite database.

We'll use diesel to handle connection management and mapping sql results to rust structs.

Let's prefer raw SQL query strings over the diesel query builder for now. Our next db target might be Aurora: a database that could potentially reuse many of the MySQL query strings but might not be supported underneath diesel (it's still up in the air, so let's prefer this route while we're figuring it out).
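For example, the raw strings could live in one module so that both the diesel-backed MySQL code and a possible Aurora backend execute the same text through their own drivers (query text below is illustrative, not the real schema):

```rust
/// Raw query strings shared by any backend that speaks MySQL-flavored SQL.
/// Table and column names here are illustrative, not the real schema.
pub mod queries {
    pub const GET_BSO: &str =
        "SELECT payload, modified FROM bso WHERE userid = ? AND collection = ? AND id = ?";
    pub const DELETE_BSO: &str =
        "DELETE FROM bso WHERE userid = ? AND collection = ? AND id = ?";
}

fn main() {
    // With diesel this would be executed via something like
    // `diesel::sql_query(queries::GET_BSO)` with bound parameters;
    // an Aurora backend could reuse the same string with its own driver.
    assert!(queries::GET_BSO.starts_with("SELECT"));
}
```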

Enforce limits on final batch commit

We now match the Python on enforcing the /info/configuration limits. There's room for improvement w/ batches: the batch handler could additionally check the max_total_{bytes/records} limits on the final collection of bsos in a batch before committing.

Currently we only check the user-supplied headers for these values during the batch calls, but they could potentially mismatch the actual totals.

go-syncstorage actually does this in its handler.

Reject unknown fields in PUT bso

POSTing multiple BSOs rejects unknown BSO fields, but our PUT of a single BSO allows them.

This causes test_storage::test_handling_of_invalid_bso_fields to fail (its final assertion):

        # Invalid BSO - unknown field
        bso = {"id": "TEST", "unexpected": "spanish-inquisition"}
        ...
        res = self.app.put_json(coll_url + "/" + bso["id"], bso, status=400)
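With serde, putting `#[serde(deny_unknown_fields)]` on the BSO body struct used by PUT would get this behavior; a dependency-free sketch of the underlying check (field list per the BSO format):

```rust
/// Fields a BSO body may contain, per the sync storage BSO format.
const KNOWN_BSO_FIELDS: &[&str] = &["id", "sortindex", "payload", "ttl"];

/// Returns Err naming the first unknown field, if any.
fn check_bso_fields(fields: &[&str]) -> Result<(), String> {
    for f in fields {
        if !KNOWN_BSO_FIELDS.iter().any(|k| k == f) {
            return Err(format!("unknown field: {}", f));
        }
    }
    Ok(())
}

fn main() {
    assert!(check_bso_fields(&["id", "payload"]).is_ok());
    // The failing test case: {"id": "TEST", "unexpected": "spanish-inquisition"}
    assert_eq!(
        check_bso_fields(&["id", "unexpected"]).unwrap_err(),
        "unknown field: unexpected"
    );
}
```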

Enforce the limits returned by /info/configuration

In #39, the /info/configuration endpoint is implemented to return some static limits relating to payloads:

static KILOBYTE: u32 = 1024;
static MEGABYTE: u32 = KILOBYTE * KILOBYTE;
static DEFAULT_MAX_POST_BYTES: u32 = 2 * MEGABYTE;
static DEFAULT_MAX_POST_RECORDS: u32 = 100;
static DEFAULT_MAX_RECORD_PAYLOAD_BYTES: u32 = 2 * MEGABYTE;
static DEFAULT_MAX_REQUEST_BYTES: u32 = DEFAULT_MAX_POST_BYTES + 4 * KILOBYTE;
static DEFAULT_MAX_TOTAL_BYTES: u32 = 100 * DEFAULT_MAX_POST_BYTES;
static DEFAULT_MAX_TOTAL_RECORDS: u32 = 100 * DEFAULT_MAX_POST_RECORDS;

Most of these will need to be enforced manually, except for DEFAULT_MAX_REQUEST_BYTES, which can be set wholesale at the web-server level.
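A sketch of what the manual enforcement might look like for the per-POST limits (constants copied from above; the function name and error strings are illustrative):

```rust
const KILOBYTE: u32 = 1024;
const MEGABYTE: u32 = KILOBYTE * KILOBYTE;
const DEFAULT_MAX_POST_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_POST_RECORDS: u32 = 100;

/// Manual enforcement of the per-POST limits against the payload sizes
/// of an incoming batch of BSOs. Returns the name of the violated limit.
fn check_post_limits(payload_sizes: &[u32]) -> Result<(), &'static str> {
    if payload_sizes.len() as u32 > DEFAULT_MAX_POST_RECORDS {
        return Err("max_post_records");
    }
    let total: u64 = payload_sizes.iter().map(|&s| u64::from(s)).sum();
    if total > u64::from(DEFAULT_MAX_POST_BYTES) {
        return Err("max_post_bytes");
    }
    Ok(())
}

fn main() {
    assert!(check_post_limits(&[100, 200]).is_ok());
    assert_eq!(check_post_limits(&[1u32; 101]).err(), Some("max_post_records"));
}
```

The max_total_{bytes/records} check at final batch commit would be the same shape, run against the accumulated totals instead of a single POST.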

Port batch validator

Originally part of #49: port the extract_batch_state validator from Python to an actix-web extractor

Return 404s on invalid bso ids/collection in paths

The Python sets up a regex to validate 'collection' and 'item' (bso ids) within the path:

https://github.com/mozilla-services/server-syncstorage/blob/e992a39/syncstorage/views/__init__.py#L116

This has test_storage::test_handling_of_invalid_bso_fields expecting 404s for these invalid values:

        # Invalid ID - too long
        bso = {"id": "X" * 65, "payload": "testing"}
        ...
        res = self.app.put_json(coll_url + "/" + bso["id"], bso, status=404)

Maybe we can emulate this behavior w/ actix's custom route predicates?
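The validation itself is simple regardless of how it's wired into routing; a sketch assuming ids are 1..=64 printable-ASCII characters (the authoritative pattern is the Python regex linked above — this only mirrors its shape):

```rust
/// Rough port of the path validation: ids must be 1..=64 printable
/// ASCII characters. (The authoritative pattern is the Python regex
/// in server-syncstorage; this sketch only approximates it.)
fn valid_path_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 64
        && id.bytes().all(|b| (b' '..=b'~').contains(&b))
}

fn main() {
    assert!(valid_path_id("TEST"));
    // The failing test case: "X" * 65 should 404.
    assert!(!valid_path_id(&"X".repeat(65)));
}
```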

Remove dbexecutor abstraction

The dbexecutor abstraction was needed for talking to blocking databases; since we're going to be working against a database we can talk to with futures, this abstraction should be removed.

[meta] Create Sync Server Prototype

Create a sync server prototype that retains the handling and response characteristics of the existing Python version.

All of these components should behave in a compatible manner.

Port put/post validators

Port the put validators:

PUT_VALIDATORS = DEFAULT_VALIDATORS + (
    parse_single_bso,
    check_for_known_bad_payloads
)
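One way to mirror Python's validator tuples in Rust is to model validators as plain functions run in sequence; a minimal sketch with stand-in request state and validators:

```rust
/// Stand-in for the parsed request state that validators operate on.
#[derive(Default)]
struct RequestCtx {
    body: String,
    bso_parsed: bool,
}

type Validator = fn(&mut RequestCtx) -> Result<(), &'static str>;

// Stand-ins for parse_single_bso / check_for_known_bad_payloads.
fn parse_single_bso(ctx: &mut RequestCtx) -> Result<(), &'static str> {
    if ctx.body.is_empty() {
        return Err("invalid-json");
    }
    ctx.bso_parsed = true;
    Ok(())
}

fn check_for_known_bad_payloads(_ctx: &mut RequestCtx) -> Result<(), &'static str> {
    Ok(())
}

const PUT_VALIDATORS: &[Validator] = &[parse_single_bso, check_for_known_bad_payloads];

/// Run validators in order, stopping at the first failure.
fn run_validators(validators: &[Validator], ctx: &mut RequestCtx) -> Result<(), &'static str> {
    for v in validators {
        v(ctx)?;
    }
    Ok(())
}

fn main() {
    let mut ctx = RequestCtx { body: "{}".into(), ..Default::default() };
    assert!(run_validators(PUT_VALIDATORS, &mut ctx).is_ok());
    assert!(ctx.bso_parsed);
}
```

In the actix port these would more naturally become extractors, but the sequencing logic is the same.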

Port default validators

The default validators from Python:

DEFAULT_VALIDATORS = (
    extract_target_resource,
    extract_precondition_headers,
    extract_query_params,
)

Issue #29's last PR brought over extract_query_params and the machinery needed. Closing this requires implementing the other 2 default validators.

Cleanup db types

Clean up many of the XXX comments around type signatures in the mysql db code:

  • fix modified/'sizes' to become u64s
  • fix the user_id column to be 64 bits (it's a BigInt in the token server yet a regular Int in server-syncstorage for some reason)
  • revisit handling of u64s passed into the database: they must be cast to i64s, so we may want to bounds-check them beforehand (fixed for user_id)
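The bounds check in the last bullet is straightforward with TryFrom; a sketch (function name is illustrative):

```rust
use std::convert::TryFrom;

/// Convert a u64 coming from the API into the i64 a BIGINT column
/// expects, failing loudly instead of wrapping.
fn to_db_i64(v: u64) -> Result<i64, &'static str> {
    i64::try_from(v).map_err(|_| "value out of range for BIGINT column")
}

fn main() {
    assert_eq!(to_db_i64(42), Ok(42));
    assert!(to_db_i64(u64::MAX).is_err());
}
```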

Create precondition header middleware

To avoid running queries that have no chance of succeeding, the precondition header check should be done in middleware that runs before the db middleware.

Implement collection-level locking

Build the equivalent of the lock_for_{read/write} context managers and trigger their use for each handler transaction: read is triggered for GET/HEAD requests, write otherwise.
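As a shape reference only (the real locking has to happen in the database, not in process memory), the read/write split could look like:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// A toy equivalent of lock_for_{read,write}: one RwLock per
/// (user, collection). A real implementation would take row locks in
/// the database; this only sketches the read/write split.
#[derive(Default)]
struct CollectionLocks {
    locks: Mutex<HashMap<(u64, String), Arc<RwLock<()>>>>,
}

impl CollectionLocks {
    fn lock_for(&self, user_id: u64, collection: &str) -> Arc<RwLock<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry((user_id, collection.to_string()))
            .or_insert_with(|| Arc::new(RwLock::new(())))
            .clone()
    }
}

fn main() {
    let locks = CollectionLocks::default();
    let lock = locks.lock_for(1, "history");
    {
        let _read = lock.read().unwrap(); // GET/HEAD handlers
    }
    let _write = lock.write().unwrap(); // everything else
}
```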

delete on individual nonexistent bsos should return a 404

Right now we silently ignore a nonexistent bso delete (somewhat inherited from the go version: its view layer checks for the existence of the bso before attempting the delete to cause a 404).

We should follow the python version, emitting a BsoNotFound when the delete affects no rows.

This causes test_storage::test_delete_item to fail.

Cleanup into()'s

There are a number of places we call into() gratuitously, where, instead, we could refactor the code to make better use of the Into trait. For example, our extractors all set <T as FromRequest>::Error (where T is the implementing type) to be actix_web::Error. While this works, all of our extractors return ValidationErrorKinds, which means we need to call into() to perform the conversion into actix_web::Error. If we were to set <T as FromRequest>::Error to ValidationErrorKind, we could do away with the calls to into().

We can make a similar change in the handlers. Because Responder is implemented for Result<R: Responder, E: Into<actix_web::Error>>, there's no need to return actix_web::Error directly from the handlers (as we currently do), since ApiError implements ResponseError and thus Into<Error>.

We should also explore getting rid of the DbErrorKind, ApiErrorKind, and ValidationErrorKind types. These types are sources of many of the calls to into(), and we should be able to get away with just having DbError and ApiError enums (they should either both implement ResponseError, or ApiError should implement From<DbError>).
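Sketched without the actix types, the last suggestion amounts to a From impl so that `?` performs the conversions that into() does today (type shapes are illustrative):

```rust
use std::fmt;

#[derive(Debug)]
pub struct DbError(pub String);

#[derive(Debug)]
pub enum ApiError {
    Db(DbError),
    Validation(String),
}

// With this impl, handler code can use `?` on db results and the
// conversion to ApiError happens implicitly, with no explicit into().
impl From<DbError> for ApiError {
    fn from(e: DbError) -> Self {
        ApiError::Db(e)
    }
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::Db(DbError(msg)) => write!(f, "db error: {}", msg),
            ApiError::Validation(msg) => write!(f, "validation error: {}", msg),
        }
    }
}

fn db_call() -> Result<u32, DbError> {
    Err(DbError("conflict".into()))
}

fn handler() -> Result<u32, ApiError> {
    let v = db_call()?; // DbError -> ApiError via From, no into()
    Ok(v)
}

fn main() {
    assert_eq!(handler().unwrap_err().to_string(), "db error: conflict");
}
```

In the real code, ApiError would additionally implement ResponseError so actix can turn it into a response.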

Create a canonical session timestamp

The initial mysql support more closely matches go-syncstorage's handling of what's essentially a per-session 'modified' timestamp.

We should more closely match the python version by creating the timestamp in one place per API call (per SqlStorageSession), likely living in a similar DbSession struct within the Db impl.
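A minimal sketch of such a DbSession (field and method names are illustrative):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Sketch of a per-API-call session that fixes the 'modified'
/// timestamp once, so every query in the call sees the same value.
struct DbSession {
    /// Sync timestamps are conventionally millisecond-based.
    timestamp_ms: u64,
}

impl DbSession {
    fn new() -> Self {
        let ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before epoch")
            .as_millis() as u64;
        DbSession { timestamp_ms: ms }
    }

    fn timestamp(&self) -> u64 {
        self.timestamp_ms
    }
}

fn main() {
    let session = DbSession::new();
    // Every operation within this API call reuses the same timestamp.
    assert_eq!(session.timestamp(), session.timestamp());
}
```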

Use failure crate everywhere

api/handlers.rs and api/auth.rs do their error-marshalling manually. The db code uses failure, and I know it has machinery for the marshalling stuff; I just never got around to reading about it yet.

Attempt a retry on ConflictErrors

We lack the equivalent of the python's sleep_and_retry_on_conflict decorator. It attempts to retry a request once in the face of timestamp ConflictErrors.

This can affect the e2e tests: ConflictErrors are occasionally reported because the tests can potentially hit the server with successive requests in less than 10 milliseconds.
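A sketch of a retry-once wrapper in the spirit of sleep_and_retry_on_conflict (error type and sleep duration are stand-ins):

```rust
/// Stand-in error type for the sketch.
#[derive(Debug, PartialEq)]
enum Error {
    Conflict,
    Other,
}

/// Run the operation; on a timestamp conflict, sleep briefly and
/// retry exactly once (mirroring the Python decorator's behavior).
fn retry_on_conflict<T>(mut op: impl FnMut() -> Result<T, Error>) -> Result<T, Error> {
    match op() {
        Err(Error::Conflict) => {
            std::thread::sleep(std::time::Duration::from_millis(10));
            op()
        }
        other => other,
    }
}

fn main() {
    let mut calls = 0;
    let result = retry_on_conflict(|| {
        calls += 1;
        if calls == 1 { Err(Error::Conflict) } else { Ok(42) }
    });
    assert_eq!(result, Ok(42));
    assert_eq!(calls, 2);
}
```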

Implement sync middlewares

Sync has a batch of middlewares that populate common response headers and handle other basic parts of the spec. These should be relatively easy to port to actix middleware.

Allow text/plain json bodies

According to test_storage::test_set_item_input_formats this is a bw-compat case we need to support:

        # Unless we use text/plain, which is a special bw-compat case.
        self.app.put(self.root + '/storage/col2/TEST', body, headers={
            "Content-Type": "text/plain"
        })

Integrate db layer and handlers w/ a Middleware/Extractor

The handlers should be provided a Db instance created from the pool (probably from an extractor). An associated middleware will ensure an appropriate transaction + lock is established and committed (or rolled back) at the end of the request.

Implement custom datatype for common extractions

A lot of boilerplate exists in the handlers, as they frequently require at least 3 different extractors. This could be collapsed into a single object with its own extractor, for a cleaner API with less boilerplate.
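A sketch of what the bundled object might look like (all type names here are stand-ins; in actix the struct would implement FromRequest itself, running the three extractions internally):

```rust
// Stand-ins for the pieces handlers usually extract separately.
struct HawkIdentifier { user_id: u64 }
struct CollectionParam { collection: String }
struct QueryParams { limit: Option<u32> }

/// One object bundling the common extractions, so handlers take a
/// single argument instead of three extractors.
struct MetaRequest {
    user: HawkIdentifier,
    collection: CollectionParam,
    params: QueryParams,
}

impl MetaRequest {
    // In actix this logic would live in a FromRequest impl.
    fn from_parts(user_id: u64, collection: &str, limit: Option<u32>) -> Self {
        MetaRequest {
            user: HawkIdentifier { user_id },
            collection: CollectionParam { collection: collection.to_string() },
            params: QueryParams { limit },
        }
    }
}

fn main() {
    let req = MetaRequest::from_parts(1, "bookmarks", Some(10));
    assert_eq!(req.user.user_id, 1);
    assert_eq!(req.collection.collection, "bookmarks");
    assert_eq!(req.params.limit, Some(10));
}
```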
