
entity's Introduction

Entity

Entity is a privacy-aware data layer for defining, caching, and authorizing access to application data models.


Core Features

  • Declarative actor authorization using Privacy Policies
  • Configurable data storage using Database Adapters
  • Configurable, optional full-object caching using Cache Adapters
  • Dataloader in-memory caching
  • Well-typed model declaration

Getting Started

Background

Authorization is the process of determining whether a user has access to a piece of data or a feature.

One could imagine a simple application with users and their photos. The authorization logic is simple: when the user loads their photos, only query photos WHERE user_id = user.id. A more complex authorization system is most likely overkill at this point.

Now, let's add teams to our simple application, where users on the same team can see each other's photos. The authorization logic becomes more complex: WHERE user_id = user.id OR user_id IN (list of users for all teams that user belongs to). While still maintainable, one can see that as requirements are added, this logic becomes increasingly difficult to express in just the query or simple checks in code.
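As a rough illustration of how the same rule looks once it moves out of the query and into application code, here is a minimal sketch. The User and Photo shapes and the canViewPhoto helper are hypothetical and are not part of Entity.

```typescript
// Hypothetical shapes for illustration only; not part of Entity's API.
interface User {
  id: string;
  teamIds: string[];
}

interface Photo {
  id: string;
  ownerId: string;
}

// Mirrors: WHERE user_id = user.id OR user_id IN (members of the viewer's teams)
function canViewPhoto(
  viewer: User,
  photo: Photo,
  usersById: Map<string, User>
): boolean {
  // Owners can always see their own photos.
  if (photo.ownerId === viewer.id) {
    return true;
  }
  const owner = usersById.get(photo.ownerId);
  if (!owner) {
    return false;
  }
  // Otherwise the viewer must share at least one team with the owner.
  return owner.teamIds.some((teamId) => viewer.teamIds.indexOf(teamId) !== -1);
}
```

Every new requirement adds another branch to a function like this, which is the maintainability problem described above.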

A common next step is to add an authorization system on top of the data loading layer. Pundit, Django Rules, and Laravel Policies are examples of excellent libraries that provide a method to authorize a piece of loaded data in the following manner:

PhotoModel
    def authorize_read():
        if rules.is_photo_owner(user, photo):
            return true
        if rules.has_organization_permission(user, photo):
            return true
        return false
    def authorize_create():
        ...

PhotoView
    def render():
        photo = Photo.find(params[:id])
        authorize(photo, 'read')
        render_html(photo)

This works well and is flexible since it allows executing ad-hoc authorization checks. Most libraries also provide hooks into views or controllers such that these authorization checks are performed automatically. This is sufficient for many applications but still has one main drawback: it is prone to error in cases where the authorization check is forgotten or the incorrect check is performed.

The Entity framework solves this by adding an additional property to the system: all data accesses are authorized. Given an object and a viewer, the framework provides a clear and testable mechanism for expressing complex relationships between object and viewer needed to authorize access during CRUD operations, and makes it impossible to perform CRUD operations without performing the authorization checks. This combines the data load and authorization steps from above into a single step:

class PhotoPrivacyPolicy {
  // Rules that authorize reading a photo.
  readonly readRules = [
    new AllowIfOwnerRule(),
    new AllowIfOrganizationPermissionRule(),
  ];
}

// in the view, for example
async function get_photo_page(viewer: ViewerContext, id: string): Promise<string> {
  // Loading through the entity also authorizes the read for this viewer.
  const photo = await PhotoEntity.loader(viewer).loadById(id);
  return render_html(photo);
}
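To make the "load and authorize in one step" idea concrete, the following is a simplified, self-contained sketch. It is not how Entity is implemented: the RuleResult type, the authorize function, and the PhotoLoader class are assumptions made up for illustration, and the sketch is synchronous for brevity.

```typescript
// Hypothetical, simplified types; the real Entity classes are richer.
type RuleResult = "allow" | "deny" | "skip";

interface Viewer {
  userId: string;
}

interface Photo {
  id: string;
  ownerId: string;
}

type Rule<T> = (viewer: Viewer, entity: T) => RuleResult;

const allowIfOwner: Rule<Photo> = (viewer, photo) =>
  photo.ownerId === viewer.userId ? "allow" : "skip";

// Evaluate rules in order; if no rule allows, access is denied.
function authorize<T>(rules: Rule<T>[], viewer: Viewer, entity: T): boolean {
  for (const rule of rules) {
    const result = rule(viewer, entity);
    if (result === "allow") return true;
    if (result === "deny") return false;
  }
  return false;
}

// The loader is the only way to obtain a Photo, so the check cannot be forgotten.
class PhotoLoader {
  private readonly viewer: Viewer;
  private readonly rows: Map<string, Photo>;

  constructor(viewer: Viewer, rows: Map<string, Photo>) {
    this.viewer = viewer;
    this.rows = rows;
  }

  loadById(id: string): Photo {
    const photo = this.rows.get(id);
    if (!photo) {
      throw new Error(`Photo ${id} not found`);
    }
    if (!authorize([allowIfOwner], this.viewer, photo)) {
      throw new Error(`Viewer ${this.viewer.userId} may not read photo ${id}`);
    }
    return photo;
  }
}
```

Because callers never touch the underlying rows directly, there is no code path that returns a photo without running the rules.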

Use Case

Entity is not limited in where it can or should be used, but was designed for use in a Koa-like environment with a request and response. At Expo, we use Entity in the following manner:

  1. A request comes into the Koa router.
  2. Middleware initializes the Entity framework for the request.
  3. A ViewerContext is created identifying the individual making the request.
  4. The request fulfiller uses the Entity framework and the ViewerContext to load or mutate some data and return a response.

Note: The entity framework instance should not be shared across multiple requests since it contains a unique memoized Dataloader. A long-lived instance is prone to data synchronization issues, especially when the application is scaled horizontally and multiple shared caches would exist for the same data.
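The staleness risk of a long-lived instance can be shown with a small sketch. The MemoizingLoader class below is a hypothetical, synchronous stand-in for the per-request Dataloader (the real one is asynchronous and also batches), and the db map simulates a database.

```typescript
// Hypothetical minimal memoizing loader standing in for a per-request Dataloader.
class MemoizingLoader<K, V> {
  private readonly cache = new Map<K, V>();
  private readonly fetch: (key: K) => V;

  constructor(fetch: (key: K) => V) {
    this.fetch = fetch;
  }

  load(key: K): V {
    // Only the first load for a key reaches the underlying fetch function.
    if (!this.cache.has(key)) {
      this.cache.set(key, this.fetch(key));
    }
    return this.cache.get(key) as V;
  }
}

// Simulated database whose contents change between requests.
const db = new Map<string, string>([["photo-1", "v1"]]);

// Create a fresh loader for every request, for example in middleware.
function createRequestLoader(): MemoizingLoader<string, string | undefined> {
  return new MemoizingLoader((key: string) => db.get(key));
}
```

A loader kept across requests would keep returning the value it memoized first, even after another request or another machine has changed the row; a new loader per request does not have that problem.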

Releasing

To release a new version:

  1. git checkout main
  2. yarn lerna publish [patch|minor|major] -- --conventional-commits
  3. In the GitHub release interface, create a new release from the tag and copy the changelog changes into the release description.

License

The Entity source code is made available under the MIT license.

entity's People

Contributors

fiberjw, haydendaly, ide, kgc00, quinlanj, wschurman


entity's Issues

Investigate distributed reader-writer locks for entities

To guarantee cache consistency (and dataloader consistency) across a set of machines or even across requests and keep the read-through caching strategy that we employ, we'll need to implement some sort of distributed reader-writer lock for entities by type and ID.

A temporary mitigation, though not a theoretical fix, is to defer all cache invalidation to the end of the write transaction. This makes a read-through cache inconsistency less likely, though still not impossible.
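A minimal sketch of the deferral idea, assuming a hypothetical DeferredInvalidationTransaction wrapper and a plain Map standing in for the cache (neither is Entity's actual API):

```typescript
// Hypothetical transaction wrapper; names are illustrative, not Entity's API.
class DeferredInvalidationTransaction {
  private readonly pendingKeys = new Set<string>();
  private readonly cache: Map<string, unknown>;

  constructor(cache: Map<string, unknown>) {
    this.cache = cache;
  }

  // Called by each write: record the key instead of invalidating immediately.
  markDirty(cacheKey: string): void {
    this.pendingKeys.add(cacheKey);
  }

  // Called once the database transaction has committed.
  commit(): void {
    this.pendingKeys.forEach((key) => this.cache.delete(key));
    this.pendingKeys.clear();
  }
}
```

Invalidating only at commit shrinks the window in which a concurrent read can repopulate the cache with pre-commit data, but it does not close that window, which is why a lock would still be needed for a real guarantee.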

Entity Relationships

This is a fairly broad task covering:

  • Add ability to express relationships (associations, foreign keys, etc) in EntityFieldDefinition
  • Add ability to query relationships easily. Put another way, add a way to nicely query the associationLoader for the relationships defined in EntityFieldDefinition.
  • Add support for cascading deletes to the relationship definition, and update the delete mutator to start a transaction, topologically sort the graph of everything that depends on the entity being deleted, and delete from the leaves to the root within the transaction.
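The leaves-to-root ordering for cascading deletes could be computed with a post-order traversal, sketched below. The deletionOrder function and the shape of the dependents map are assumptions for illustration, and the sketch assumes the relationship graph has no cycles.

```typescript
// dependents.get(x) lists the nodes that reference x and so must be deleted before x.
// Assumes the relationship graph is acyclic.
function deletionOrder(root: string, dependents: Map<string, string[]>): string[] {
  const order: string[] = [];
  const visited = new Set<string>();

  const visit = (node: string): void => {
    if (visited.has(node)) {
      return;
    }
    visited.add(node);
    for (const child of dependents.get(node) ?? []) {
      visit(child);
    }
    // Post-order: a node is emitted only after everything that depends on it.
    order.push(node);
  };

  visit(root);
  return order;
}
```

Deleting in the returned order inside a single transaction means no row is removed while another row still references it.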

Support non-string IDs in StubDatabaseAdapter

Currently this simulates ID creation by generating a UUID for the entity id field. Instead, we should check the type of field and appropriately generate either a random number, string, boolean(?), etc.

Thanks @quinlanj for the report!

Shared table with disjoint rows example can't loadMany

It isn't possible to issue a loadMany against the shared table example: if one row is of the wrong type, the whole load fails. We should probably have a specific error type for constructor errors and return those in the result rather than throw them.
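One possible shape for this, sketched with hypothetical names (Result, EntityConstructionError, and constructMany are not existing Entity APIs): each row is constructed independently and a failure is recorded as a failed result instead of aborting the batch.

```typescript
// Hypothetical result type; the library's own result helpers may differ.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

// Hypothetical dedicated error type for rows that cannot be constructed.
class EntityConstructionError extends Error {}

interface Row {
  id: string;
  type: string;
}

// Construct each row independently so one bad row does not fail the whole batch.
function constructMany<T>(
  rows: Row[],
  construct: (row: Row) => T
): Map<string, Result<T>> {
  const results = new Map<string, Result<T>>();
  for (const row of rows) {
    try {
      results.set(row.id, { ok: true, value: construct(row) });
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      results.set(row.id, { ok: false, error: new EntityConstructionError(message) });
    }
  }
  return results;
}
```

Callers can then decide per key whether to ignore, log, or rethrow a construction failure.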

SecondaryCacheLoader.fetchObjectsFromDatabaseAsync should return sparse map

Context: https://github.com/expo/universe/pull/7641/files/90dc1071ada997f57c0b8d33b7b0e7d7e5fe2b15#r641291143

Looking at this PR, I wonder if EntitySecondaryCacheLoader should allow this method to return "sparse" maps that are missing some of the provided keys. A missing key would mean the object was missing from the database and be treated the same way as null or undefined. This way, implementers of fetchObjectsFromDatabaseAsync wouldn't need to pad maps with null entries like this.
EntitySecondaryCacheLoader would then fill in null for all of the missing keys so that the caller of the loader still gets a map with every single key in it, potentially with null values.
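The fill-in step described here is small. A sketch, assuming a hypothetical helper named fillMissingKeysWithNull:

```typescript
// Turn a sparse map (some keys absent) into a dense map with null for every missing key.
function fillMissingKeysWithNull<K, V>(
  keys: readonly K[],
  sparse: ReadonlyMap<K, V | null | undefined>
): Map<K, V | null> {
  const dense = new Map<K, V | null>();
  for (const key of keys) {
    // Absent, undefined, and null all mean "not in the database".
    dense.set(key, sparse.get(key) ?? null);
  }
  return dense;
}
```

With this in place, an implementer of fetchObjectsFromDatabaseAsync could return only the rows it found, and callers of the loader would still see every requested key.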

Add nullable loadByID EntityLoader functionality

One common use case we're noticing is wanting to check if an ID exists. Currently the best way to accomplish that is by doing something like:

await TestEntity.load(vc).loadByFieldEqualing('id', id)

We should simplify this to something like:

await TestEntity.load(vc).loadByIDNullable(id);

or maybe even make loadByID nullable by default and add an enforceLoadByID or something.

Add support for underlying database adapter column renaming

There are instances where renaming or gracefully deleting a column in a database is useful. The entity framework should support something like "aliasing" where if it receives a column in either the old name or the new name it puts the data in the field.
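A sketch of what such aliasing might look like when a database row is mapped to entity fields. The rowToFields function and the alias map format are hypothetical, not an existing adapter feature.

```typescript
// aliases maps an entity field name to the column names it may arrive under,
// in priority order (for example new name first, old name second).
function rowToFields(
  row: Record<string, unknown>,
  aliases: Map<string, string[]>
): Record<string, unknown> {
  const fields: Record<string, unknown> = {};
  aliases.forEach((columnNames, fieldName) => {
    // Use the first column name that is actually present in the row.
    const column = columnNames.find((name) => name in row);
    if (column !== undefined) {
      fields[fieldName] = row[column];
    }
  });
  return fields;
}
```

During a rename, rows read before and after the migration would then both populate the same field.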

Add generated JSDoc site

Since all the docblocks in the code are JSDoc formatted, we can autogenerate a documentation site.

Throw error when more than one entity is deleted or updated

While deletes happen by entity ID, there's no way to guarantee at the framework level that the field specified in the entity definition is unique or an ID.

One way we could do this is by ensuring that updates and deletes only affect a single row in the database adapter, and throwing otherwise.
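A sketch of such a check, assuming the database adapter can report how many rows a statement affected. The function name is hypothetical, and the sketch only rejects counts above one, leaving the zero-row case to the caller.

```typescript
// Throw if an update or delete touched more than one row.
// Assumes the adapter exposes the affected row count for the statement.
function assertAtMostOneRowAffected(
  operation: "update" | "delete",
  affectedRowCount: number
): void {
  if (affectedRowCount > 1) {
    throw new Error(
      `Expected ${operation} to affect a single row, but it affected ${affectedRowCount}`
    );
  }
}
```

If the statement runs inside a transaction, throwing here would presumably cause the surrounding transaction to roll back the unintended changes.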

Use @expo/batcher for batch inserts

Documenting this for posterity, but after investigating it looks like we probably won't want to use it for general-purpose mutations in entity.

Background

Let's say we have an entity with one integer field, backed by a postgres DB with a unique constraint on the column. Batch inserting is something that isn't very well supported by entity as it would need to run N inserts to insert N items into the same table.

The following efficient query would be far less efficient expressed in entity code:

INSERT INTO blah_table (num) VALUES (1), (2), (3), (4) RETURNING *;

Entity equivalent:

const numsToInsert = [1, 2, 3, 4];
const results = await Promise.all(
  numsToInsert.map((n) => TEntity.creator(vc).setField('num', n).enforceCreateAsync())
);

which translates to (in parallel):

INSERT INTO blah_table (num) VALUES (1) RETURNING *;
INSERT INTO blah_table (num) VALUES (2) RETURNING *;
INSERT INTO blah_table (num) VALUES (3) RETURNING *;
INSERT INTO blah_table (num) VALUES (4) RETURNING *;

Hypothesis

We can use https://www.npmjs.com/package/@expo/batcher to coalesce inserts into the DB. That way, the example above could translate into the first query automatically if all the entity writes are executed in a short enough period of time for batcher's batch.

Conclusion

Batcher's error handling doesn't quite work for our case. Let's say a row existed in the table with num = 3 and we tried to insert a batch: INSERT INTO blah_table (num) VALUES (1), (2), (3), (4) RETURNING *;. This whole query would fail, so all the batch items would fail. This could cause some unintended consequences in entity if we're inserting multiple entities of the same type from various code locations at the same time (same batch). It would mean that there could be a side effect where a failure to insert one entity in codepath A could cause a failure to insert an entity for codepath B even if A and B are supposed to be independent.

Document entityUtils

Add inline docblocks to entityUtils. One of the few places where documentation is missing.

Field Validation

Entity fields should do basic validation.

  • UUIDField should validate that it's a UUID
  • Primitive fields should validate their types
  • EnumField should validate value being a member of a specified enum
  • Custom validation functions should be possible as well.
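The validators listed above might look roughly like the following. The Validator type and the function names are assumptions for illustration and do not reflect how Entity's field definitions are actually structured.

```typescript
// A validator returns true when the value is acceptable for the field.
type Validator = (value: unknown) => boolean;

const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

const validateUUID: Validator = (value) =>
  typeof value === "string" && UUID_PATTERN.test(value);

const validateString: Validator = (value) => typeof value === "string";

const validateNumber: Validator = (value) =>
  typeof value === "number" && !Number.isNaN(value);

// Build a validator that accepts only members of the given set of enum values.
function enumValidator(allowed: readonly string[]): Validator {
  return (value) => typeof value === "string" && allowed.indexOf(value) !== -1;
}

// Custom validators compose with the built-in ones: every validator must pass.
function allOf(...validators: Validator[]): Validator {
  return (value) => validators.every((validate) => validate(value));
}
```

For example, allOf(validateString, (v) => (v as string).length <= 64) would combine a primitive type check with a custom length rule.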

Add codecov

Once repo is made public, add codecov. This is so that we don't need private repo token.

Add ability to just invalidate an entity's cache

There will probably be times that a manual update or delete raw SQL query is needed, and the entity framework should provide a mechanism to invalidate something manually that is now known to be invalid.

Investigate fan-out join queries for loaders (cache warming)

Entity works on full rows of data (the projection of all queries is *). This is because the logic of a privacy policy is allowed to use the entire entity rather than just a subset of fields, which allows very expressive authorization logic.

Joins are extremely powerful in the sense that they allow combining two or more tables in order to create a third intermediate/temporary "table". The issue is that in the general case, this third table isn't possible to express using an entity (since it's dynamic) and therefore creates an issue with how to authorize access to a row of that "table".

So, the tradeoff that entity makes is to not allow joins and to require doing them in the application in exchange for authorization correctness. While this does incur potentially higher memory and an extra round trip to the DB, the tradeoff is most often worth it since we can now guarantee that all data access is authorized and the authorization is sound.

Now, there is a very narrow join case that could work with entities, and that's the case of doing a fan-out load with the projection limited to the * of the resulting object. For example:

select b.*
from apps ap
join builds b on b.app_id = ap.id
where ap.account_id = 'aaa-bbb-ccc-123'

This seems like an interesting case to investigate since it is theoretically possible to authorize the returned objects since they have entity type "build". The association loader could make use of it as well. Association loader is currently just a set of convenience methods that load the chain of entities specified and currently only supports 1:1 foreign keys but it's definitely possible to extend it to 1:n for fan-outs. We could also potentially change it to construct these join queries, but we'd have to be fairly careful to ensure that it's built in a way that is general enough that both RDBMS and nosql could theoretically implement the "join" logic since entity is built to be database-independent.

Add ability to run update/delete privacy policy outside of mutator

This is useful for situations where putting a sequence of mutations in a transaction isn't possible, and thus a reasonable effort must be made before the mutations run to predict whether they would fail.

For example:

await runInTransaction(async () => {
  const someThirdPartyServiceID = await StripedZebra.makePaymentAsync(info);
  // this could fail due to privacy policy
  await PurchaseEntity.creator(viewerContext).setField('zebra_id', someThirdPartyServiceID).createAsync();
});

In this case, there's no way to undo the third party API call when the entity creation privacy policy fails.

Something like the following seems to work. Would be useful to clean up and formalize into a sensible API.

static async canViewerUpdate<
    TMFields,
    TMID,
    TMViewerContext extends ViewerContext,
    TMEntity extends Entity<TMFields, TMID, TMViewerContext>,
    TMPrivacyPolicy extends EntityPrivacyPolicy<TMFields, TMID, TMViewerContext, TMEntity>
  >(
    this: IEntityClass<TMFields, TMID, TMViewerContext, TMEntity, TMPrivacyPolicy>,
    existingEntity: TMEntity,
    queryContext: EntityQueryContext = existingEntity
      .getViewerContext()
      .getViewerScopedEntityCompanionForClass(this)
      .getQueryContextProvider()
      .getRegularEntityQueryContext()
  ): Promise<boolean> {
    const privacyPolicy = new (this.getCompanionDefinition().privacyPolicyClass)();
    const evaluationResult = await asyncResult(
      privacyPolicy.authorizeUpdateAsync(
        existingEntity.getViewerContext(),
        queryContext,
        existingEntity
      )
    );
    return evaluationResult.ok;
  }

Add a way to record isolated operations

While debugging, it would be useful to be able to get information about entity loads/mutations/etc to see things like:

  • If it hit the DB, what was the query run
  • If it hit or wrote to the cache, what were the cache keys used
  • If it only went to the dataloader, indicate as such
  • etc...

An API for this could look something like:

const [queryContextAuditResult, entityResultFromInnerBlock] = await withIsolatedQueryContext(async (queryContext) => {
  return await BlahEntity.loader(viewerContext, queryContext).load(...);
});
console.log(queryContextAuditResult);
