mozilla-services / syncstorage-rs
Sync Storage server in Rust
License: Mozilla Public License 2.0
It seems likely (although not yet certain) that in the future we will end up wanting loosely-defined relationships between collections. For example, we might want to add a new "containers" collection which holds all the Firefox containers defined by a user, and the history collection might want to refer to what container a visit relates to.
If the existing batch mechanism could be extended such that the same batch ID can be used for multiple collections, it would mean that clients can ensure that the data uploaded across collections is always consistent. There are probably no changes needed to the API itself, just to the semantics of when the various batch query params are valid.
This is a little vague and something we don't need now. To be honest we aren't even 100% sure we will need it in the future (and our current plans mean we almost certainly will not need it over the next few months), but I thought I'd get this on the radar anyway. cc @rfk, @thomcc
Port the Python post validators:
POST_VALIDATORS = DEFAULT_VALIDATORS + (
    extract_batch_state,
    parse_multiple_bsos,
    check_for_known_bad_payloads
)
Also clean-up handler methods for uniformity.
We need to eventually support Google Spanner, but for local testing/hosting we need something we can run as well as our users. To handle this, we should abstract the db commands we need into a trait, and perhaps provide some default generic SQL queries that Spanner/SQL specializations can extend as needed.
The generic sql queries for Sync are here:
https://github.com/mozilla-services/server-syncstorage/blob/master/syncstorage/storage/sql/queries_generic.py
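A minimal sketch of what that trait could look like, with generic SQL in default methods that a backend overrides only where its dialect differs. All names here are illustrative, not the actual syncstorage-rs API:

```rust
// Hypothetical abstraction: default methods carry generic SQL (in the
// spirit of queries_generic.py); backends override where needed.
trait DbQueries {
    fn get_bso_query(&self) -> &'static str {
        "SELECT id, payload, modified FROM bso WHERE userid = ? AND collection = ? AND id = ?"
    }
}

struct GenericSql;
impl DbQueries for GenericSql {}

struct Spanner;
impl DbQueries for Spanner {
    // Spanner prefers named parameters, so it overrides the default
    // (query text illustrative only).
    fn get_bso_query(&self) -> &'static str {
        "SELECT id, payload, modified FROM bso \
         WHERE userid = @userid AND collection = @collection AND id = @id"
    }
}

fn main() {
    assert!(GenericSql.get_bso_query().contains('?'));
    assert!(Spanner.get_bso_query().contains("@userid"));
}
```

The default-method approach keeps the shared queries in one place while letting a Spanner specialization diverge query by query.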
See rfk's comments here:
I believe the go version gets away without this field because it stores the top level modified timestamp separately in its per user KeyValues store.
mozilla-services/server-syncstorage#62
Solidify the batches table schema (base it off the python version's schema or can we use the go version's simpler design?) and implement its calls.
Web handling related code is a bit spread out, a nicer organization would have a web/ with the auth, handlers, and extractors under it.
Passing collection_ids was somewhat inherited from the go version, and I also missed the python version's handling of collection_ids via its global collection_id cache. Now that this cache is in place, it makes sense to switch the higher level db interface to take collection names as Strings everywhere.
This also saves the handlers an extra future callback they'd need to make by pushing it down into the db layer
Create a Db trait that works w/ mysql. This will be used for users who want their own local syncstorage instances, and potentially either to speak to Aurora or as a local test suite database.
We'll use diesel to handle connection management and mapping sql results to rust structs.
Let's prefer raw sql query strings vs the diesel query builder for now. Our next db target might be Aurora, a database that could potentially reuse many of the mysql query strings but wouldn't be supported underneath diesel (it's still up in the air, so let's prefer this route while we're figuring it out).
We now match the Python on enforcing the /info/configuration limits. There's room for improvement w/ batches: the batch handler could additionally check the max_total_{bytes/records} limits on the final collection of bsos in a batch before committing.
Currently we only check the user-supplied headers for these values during the batch calls, but they could potentially mismatch the actual totals.
go-syncstorage actually does this in its handler.
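A sketch of the extra check the batch handler could make before committing, using the actual sizes of the accumulated bsos rather than the client-supplied headers (struct and function names hypothetical):

```rust
// Validate the *actual* totals of a batch against the
// /info/configuration limits before committing.
struct Limits {
    max_total_bytes: usize,
    max_total_records: usize,
}

fn check_batch_totals(payload_sizes: &[usize], limits: &Limits) -> Result<(), &'static str> {
    if payload_sizes.len() > limits.max_total_records {
        return Err("size-limit-exceeded: records");
    }
    let total: usize = payload_sizes.iter().sum();
    if total > limits.max_total_bytes {
        return Err("size-limit-exceeded: bytes");
    }
    Ok(())
}

fn main() {
    let limits = Limits { max_total_bytes: 100, max_total_records: 3 };
    assert!(check_batch_totals(&[10, 20], &limits).is_ok());
    assert!(check_batch_totals(&[60, 60], &limits).is_err()); // too many bytes
    assert!(check_batch_totals(&[1, 1, 1, 1], &limits).is_err()); // too many records
}
```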
POSTing multiple BSOs rejects unknown bso fields, but our PUT of a single bso allows them.
This causes test_storage::test_handling_of_invalid_bso_fields to fail (its final assertion):
# Invalid BSO - unknown field
bso = {"id": "TEST", "unexpected": "spanish-inquisition"}
...
res = self.app.put_json(coll_url + "/" + bso["id"], bso, status=400)
In #39, the /info/configuration endpoint is implemented to return some static limits relating to payloads:
const KILOBYTE: u32 = 1024;
const MEGABYTE: u32 = KILOBYTE * KILOBYTE;
const DEFAULT_MAX_POST_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_POST_RECORDS: u32 = 100;
const DEFAULT_MAX_RECORD_PAYLOAD_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_REQUEST_BYTES: u32 = DEFAULT_MAX_POST_BYTES + 4 * KILOBYTE;
const DEFAULT_MAX_TOTAL_BYTES: u32 = 100 * DEFAULT_MAX_POST_BYTES;
const DEFAULT_MAX_TOTAL_RECORDS: u32 = 100 * DEFAULT_MAX_POST_RECORDS;
Most of these need to be enforced manually I guess, except for DEFAULT_MAX_REQUEST_BYTES, which can be set wholesale at the web server level.
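For reference, the derived limits above work out to the following concrete values (a quick arithmetic check):

```rust
// The /info/configuration limits in concrete numbers: 2 MiB posts, with
// the request limit adding 4 KiB of headroom for headers/framing.
const KILOBYTE: u32 = 1024;
const MEGABYTE: u32 = KILOBYTE * KILOBYTE;
const DEFAULT_MAX_POST_BYTES: u32 = 2 * MEGABYTE;
const DEFAULT_MAX_POST_RECORDS: u32 = 100;
const DEFAULT_MAX_REQUEST_BYTES: u32 = DEFAULT_MAX_POST_BYTES + 4 * KILOBYTE;
const DEFAULT_MAX_TOTAL_BYTES: u32 = 100 * DEFAULT_MAX_POST_BYTES;
const DEFAULT_MAX_TOTAL_RECORDS: u32 = 100 * DEFAULT_MAX_POST_RECORDS;

fn main() {
    assert_eq!(DEFAULT_MAX_POST_BYTES, 2_097_152); // 2 MiB
    assert_eq!(DEFAULT_MAX_REQUEST_BYTES, 2_101_248); // 2 MiB + 4 KiB
    assert_eq!(DEFAULT_MAX_TOTAL_BYTES, 209_715_200); // ~200 MiB
    assert_eq!(DEFAULT_MAX_TOTAL_RECORDS, 10_000);
}
```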
Dependent on its extractor (#86)
Originally part of #49: port the extract_batch_state validator from Python to an actix-web extractor.
get_collection should return the items with newline separators instead of JSON if the Accept header indicates application/newlines.
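A sketch of what the application/newlines serialization amounts to: one JSON object per line rather than a single JSON array (helper name hypothetical):

```rust
// Serialize records as newline-separated JSON objects for
// Accept: application/newlines responses.
fn to_newlines(records: &[&str]) -> String {
    let mut out = String::new();
    for record in records {
        out.push_str(record);
        out.push('\n');
    }
    out
}

fn main() {
    let body = to_newlines(&[r#"{"id":"a"}"#, r#"{"id":"b"}"#]);
    assert_eq!(body, "{\"id\":\"a\"}\n{\"id\":\"b\"}\n");
}
```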
The Python sets up a regex to validate 'collection' and 'item' (bso ids) within the path:
This has test_storage::test_handling_of_invalid_bso_fields expecting 404s for these invalid values:
# Invalid ID - too long
bso = {"id": "X" * 65, "payload": "testing"}
...
res = self.app.put_json(coll_url + "/" + bso["id"], bso, status=404)
Maybe we can emulate this behavior w/ actix's custom route predicates?
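Whether or not route predicates pan out, the check itself is small. A sketch approximating the Python regex's rules (my reading: 1-64 characters drawn from [a-zA-Z0-9._-]; treat the exact character set as an assumption to verify against the Python source):

```rust
// Validate a 'collection' or 'item' (bso id) path segment; anything
// failing this should 404, per the tests above.
fn valid_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 64
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
}

fn main() {
    assert!(valid_id("bookmarks"));
    assert!(!valid_id(&"X".repeat(65))); // too long: the 404 case above
    assert!(!valid_id("bad/id"));
    assert!(!valid_id(""));
}
```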
Create an initial Rust project, with travis file/config, and rough architecture layout.
The dbexecutor abstraction was needed for talking to blocking databases. We're going to be working against a database we can talk to with futures, so this abstraction should be removed.
Create a sync server prototype that retains the handling and response characteristics of the existing Python version. All of these components should behave in a similar, compatible manner.
test_storage::test_that_404_responses_have_a_json_body currently fails, expecting a 404 at "/nonexistent/url"
Port the put validators:
PUT_VALIDATORS = DEFAULT_VALIDATORS + (
    parse_single_bso,
    check_for_known_bad_payloads
)
The default validators from Python:
DEFAULT_VALIDATORS = (
    extract_target_resource,
    extract_precondition_headers,
    extract_query_params,
)
Issue #29's last PR brought over extract_query_params and the machinery it needs. Closing this requires implementing the other two default validators.
Clean up the various XXX comments around type signatures in the mysql db code.
To avoid doing queries we shouldn't, the precondition header check should be done in middleware in front of the db middleware, to verify whether the query has any chance of succeeding.
post_bsos returns a mapping of "failed" bsos that failed validation, but the presence of invalid bsos shouldn't prevent other bsos in the post from being written to the database:
The BsoBodies extractor currently produces an error on these which bails out the entire request.
This causes test_storage::test_set_collection to fail
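A sketch of the partitioning behavior we want instead: validate each bso individually, write the valid ones, and report the rest in the "failed" map (types and names hypothetical):

```rust
use std::collections::HashMap;

// Split incoming BSOs into ids to write and a "failed" map of
// id -> validation error for the response body.
fn partition_bsos(
    bsos: Vec<(String, Result<(), String>)>,
) -> (Vec<String>, HashMap<String, String>) {
    let mut valid = Vec::new();
    let mut failed = HashMap::new();
    for (id, validation) in bsos {
        match validation {
            Ok(()) => valid.push(id),
            Err(reason) => {
                failed.insert(id, reason);
            }
        }
    }
    (valid, failed)
}

fn main() {
    let input = vec![
        ("a".to_string(), Ok(())),
        ("b".to_string(), Err("invalid payload".to_string())),
    ];
    let (valid, failed) = partition_bsos(input);
    assert_eq!(valid, vec!["a".to_string()]);
    assert_eq!(failed.get("b").map(String::as_str), Some("invalid payload"));
}
```

The BsoBodies extractor would then never bail out the whole request for a per-bso validation error.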
#20 adds master_token_secret to Settings, giving it type Vec<u8>. Afaik, there is no way to deserialize that without writing a custom deserializer, so we'll need to add one before we can set it from an environment variable or config file.
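Whatever the config format, the custom deserializer mostly reduces to a string-to-bytes parse step. A sketch assuming the secret arrives hex-encoded (the encoding choice is an assumption; it could equally be base64 or a comma-separated byte list):

```rust
// Decode a hex string (e.g. from an env var) into the Vec<u8> a custom
// serde deserializer for master_token_secret could produce.
fn parse_secret(hex: &str) -> Result<Vec<u8>, String> {
    if hex.len() % 2 != 0 {
        return Err("odd-length hex string".to_string());
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    assert_eq!(parse_secret("deadbeef").unwrap(), vec![0xde, 0xad, 0xbe, 0xef]);
    assert!(parse_secret("xyz").is_err());
}
```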
Not all validation errors should be a 400, in this case we want a 415 Unsupported Media Type:
We'll need to figure out the details of hawk authentication and build it via the https://github.com/taskcluster/rust-hawk crate (we have some notes from rfk in handlers::HawkHeader).
(Despite the crate's warning against production use, a couple of different projects in the https://github.com/mozilla/application-services repo have already used it successfully.)
We currently accept this expected failure from test_storage::test_batch_empty_commit:
testEmptyCommit("application/newlines", "\n", status=400)
Build the equivalent of the lock_for_{read/write} context managers and trigger their use for each handler transaction. Read is triggered for GET/HEAD requests, otherwise write.
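The method-to-lock mapping is the easy part; a sketch:

```rust
// Pick the lock type from the HTTP method, as the Python
// lock_for_{read,write} context managers do.
#[derive(Debug, PartialEq)]
enum LockKind {
    Read,
    Write,
}

fn lock_for(method: &str) -> LockKind {
    match method {
        "GET" | "HEAD" => LockKind::Read,
        _ => LockKind::Write,
    }
}

fn main() {
    assert_eq!(lock_for("GET"), LockKind::Read);
    assert_eq!(lock_for("HEAD"), LockKind::Read);
    assert_eq!(lock_for("POST"), LockKind::Write);
}
```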
Right now we silently ignore a delete of a nonexistent bso (somewhat inherited from the go version: its view layer checks for the existence of the bso before attempting the delete in order to cause a 404).
We should follow the python version, emitting a BsoNotFound when the delete affects no rows.
This causes test_storage::test_delete_item to fail
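A sketch of the Python-style behavior: drive the error off the rows-affected count of the DELETE itself, with no existence pre-check (error name borrowed from the description above):

```rust
// Emit BsoNotFound when the DELETE affects zero rows, matching the
// Python version's behavior.
#[derive(Debug, PartialEq)]
enum DbError {
    BsoNotFound,
}

fn delete_bso(rows_affected: u64) -> Result<(), DbError> {
    if rows_affected == 0 {
        Err(DbError::BsoNotFound)
    } else {
        Ok(())
    }
}

fn main() {
    assert_eq!(delete_bso(0), Err(DbError::BsoNotFound)); // becomes a 404
    assert!(delete_bso(1).is_ok());
}
```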
This generally causes a 404 but is special cased for get_collection to return an empty collection instead.
This causes test_storage::test_get_collection to fail
There are a number of places we call into() gratuitously, where, instead, we could refactor the code to make better use of the Into trait. For example, our extractors all set <T as FromRequest>::Error (where T is the implementing type) to be actix_web::Error. While this works, all of our extractors return ValidationErrorKinds, which means we need to call into() to perform the conversion into actix_web::Error. If we were to set <T as FromRequest>::Error to ValidationErrorKind, we could do away with the calls to into().
We can make a similar change in the handlers. Because Responder is implemented for Result<R: Responder, E: Into<actix_web::Error>>, there's no need to return actix_web::Error directly from the handlers (as we currently do), since ApiError implements ResponseError and thus Into<Error>.
We should also explore getting rid of the DbErrorKind, ApiErrorKind, and ValidationErrorKind types. These types are sources of many of the calls to into(), and we should be able to get away with just having DbError and ApiError enums (they should either both implement ResponseError, or ApiError should implement From<DbError>).
A variety of method calls fail to include the X-Last-Modified header that the Python tests expect; they should add it when appropriate.
We're moving to a shared db model, so the sqlite schema isn't needed.
The initial mysql support more closely matches go-syncstorage's handling of what's essentially a per-session 'modified' timestamp.
We should more closely match the python version by creating the timestamp in one place per API call (per SqlStorageSession). Likely living in a similar DbSession struct within the Db impl.
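A sketch of that DbSession idea: the timestamp is computed once when the session is created, and every read of it within the API call returns the same value (names hypothetical):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Carry one 'modified' timestamp per API call, instead of computing a
// fresh one per query.
struct DbSession {
    timestamp_ms: u64,
}

impl DbSession {
    fn new() -> Self {
        let ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before epoch")
            .as_millis() as u64;
        DbSession { timestamp_ms: ms }
    }

    // Every write within this API call sees the same timestamp.
    fn timestamp(&self) -> u64 {
        self.timestamp_ms
    }
}

fn main() {
    let session = DbSession::new();
    assert_eq!(session.timestamp(), session.timestamp());
}
```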
api/handlers.rs and api/auth.rs do their error-marshalling manually. The db code uses failure, and I know it has magic in it for the marshalling stuff; I just never got around to reading about it yet.
Python's {Json,Newlines}Renderer ensures X-Weave-Records are always included, its value being the number of records returned.
This causes test_storage::get_collection and a couple of batch tests to fail
Returns static config data.
Several requests return a 400 instead of 415, or a 400 instead of 404, or a 200 instead of a 400.
We lack the equivalent of the python's sleep_and_retry_on_conflict decorator, which attempts to retry a request once in the face of timestamp ConflictErrors.
This can affect the e2e tests: ConflictErrors are occasionally reported because the tests can potentially hit the server with successive requests in less than 10 milliseconds.
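A sketch of a retry-once wrapper along the lines of the Python decorator (names hypothetical; the 10ms backoff mirrors the timestamp granularity mentioned above):

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ApiError {
    Conflict,
}

// Run op; on a timestamp conflict, back off briefly and retry exactly once.
fn with_conflict_retry<F>(mut op: F) -> Result<(), ApiError>
where
    F: FnMut() -> Result<(), ApiError>,
{
    match op() {
        Err(ApiError::Conflict) => {
            // Wait long enough for the retried request to land on a
            // fresh timestamp.
            sleep(Duration::from_millis(10));
            op()
        }
        other => other,
    }
}

fn main() {
    let mut calls = 0;
    let result = with_conflict_retry(|| {
        calls += 1;
        if calls == 1 { Err(ApiError::Conflict) } else { Ok(()) }
    });
    assert!(result.is_ok());
    assert_eq!(calls, 2); // retried exactly once
}
```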
Sync has a batch of middlewares that populate common response headers and handle other basic parts of the spec. These should be relatively easy to port to actix middleware.
Over in fxa-email-service we have a Travis job that builds the docs and then pushes them to the gh-pages branch. This means we have automagically refreshing docs available here:
https://mozilla.github.io/fxa-email-service/fxa_email_service/
Are you guys interested in something similar for this repo?
This just bit us in fxa-email-service, and the same mistake I made there, I made here:
https://github.com/mozilla-services/syncstorage-rs/blob/master/src/error.rs#L136
We should only send 500 errors to Sentry.
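The fix amounts to a status-code filter in front of the Sentry reporting call; a sketch (predicate name illustrative):

```rust
// Report only 5xx responses to Sentry; 4xx responses are expected
// client errors and just add noise.
fn should_report_to_sentry(status: u16) -> bool {
    status >= 500
}

fn main() {
    assert!(should_report_to_sentry(500));
    assert!(should_report_to_sentry(503));
    assert!(!should_report_to_sentry(400));
    assert!(!should_report_to_sentry(404));
}
```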
According to test_storage::test_set_item_input_formats this is a bw-compat case we need to support:
# Unless we use text/plain, which is a special bw-compat case.
self.app.put(self.root + '/storage/col2/TEST', body, headers={
    "Content-Type": "text/plain"
})
The Handlers should be provided a Db instance created from the pool (probably from an extractor). An associated middleware will ensure an appropriate transaction + lock is established and committed (or rolled back) at the end of the request
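A sketch of the commit-or-rollback decision such a middleware would make at the end of each request (trait and names hypothetical):

```rust
// End-of-request transaction handling: commit on success, roll back on
// any error response.
trait Transaction {
    fn commit(self) -> Result<(), String>;
    fn rollback(self) -> Result<(), String>;
}

struct Txn;

impl Transaction for Txn {
    fn commit(self) -> Result<(), String> {
        Ok(())
    }
    fn rollback(self) -> Result<(), String> {
        Ok(())
    }
}

fn finish_request<T: Transaction>(txn: T, response_status: u16) -> Result<(), String> {
    if response_status < 400 {
        txn.commit()
    } else {
        txn.rollback()
    }
}

fn main() {
    assert!(finish_request(Txn, 200).is_ok());
    assert!(finish_request(Txn, 500).is_ok());
}
```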
A lot of boilerplate exists in the handlers, as they frequently require at least 3 different extractors. These could be combined into a single object with its own extractor, for a cleaner API with less boilerplate.
A TODO item described here: https://github.com/mozilla-services/syncstorage-rs/blob/bea8032/src/db/mysql/models.rs#L242