expo / entity



Entity is a privacy-aware data layer for defining, caching, and authorizing access to application data models.

License: MIT License

Languages: TypeScript 99.85%, JavaScript 0.07%, Shell 0.09%

data privacy authorization

entity's Introduction

Entity


Core Features

Declarative actor authorization using Privacy Policies
Configurable data storage using Database Adapters
Configurable, optional full-object caching using Cache Adapters
Dataloader in-memory caching
Well-typed model declaration

Getting Started

Background

Authorization is the process of determining whether a user has access to a piece of data or a feature.

One could imagine a simple application with users and their photos. The authorization logic is simple: when the user loads their photos, only query photos WHERE user_id = user.id. A more complex authorization system is most likely overkill at this point.

Now, let's add teams to our simple application, where users on the same team can see each other's photos. The authorization logic becomes more complex: WHERE user_id = user.id OR user_id IN (list of users on all teams the user belongs to). While this is still maintainable, one can see that as requirements are added, this logic becomes increasingly difficult to express in just the query or in simple checks in code.

A common next step is to add an authorization system on top of the data loading layer. Pundit, Django Rules, and Laravel Policies are examples of excellent libraries that provide a method to authorize a piece of loaded data in the following manner:

class PhotoModel:
    def authorize_read(self, user, photo):
        if rules.is_photo_owner(user, photo):
            return True
        if rules.has_organization_permission(user, photo):
            return True
        return False

    def authorize_create(self, user):
        ...

class PhotoView:
    def render(self, params):
        photo = Photo.find(params["id"])
        # every view must remember to perform this check
        authorize(photo, "read")
        return render_html(photo)

This works well and is flexible since it allows executing ad-hoc authorization checks. Most libraries also provide hooks into views or controllers such that these authorization checks are performed automatically. This is sufficient for many applications but still has one main drawback: it is prone to error in cases where the authorization check is forgotten or the incorrect check is performed.

The Entity framework solves this by adding an additional property to the system: all data accesses are authorized. Given an object and a viewer, the framework provides a clear and testable mechanism for expressing complex relationships between object and viewer needed to authorize access during CRUD operations, and makes it impossible to perform CRUD operations without performing the authorization checks. This combines the data load and authorization steps from above into a single step:

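class PhotoPrivacyPolicy {
  // read access is granted only if one of these rules allows it
  readonly readRules = [
    new AllowIfOwnerRule(),
    new AllowIfOrganizationPermissionRule(),
  ];
}

// in the view, for example; loading through the entity loader always runs
// the read privacy policy, so there is no unauthorized load path
async function get_photo_page(viewer: ViewerContext, id: string): Promise<string> {
  const photo = await PhotoEntity.loader(viewer).loadById(id);
  return render_html(photo);
}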
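The single-step load-and-authorize idea above can be illustrated with a small self-contained sketch. It assumes rules that return allow, deny, or skip, with the first non-skip result deciding and the load denied when every rule skips. `RuleResult`, `PrivacyRule`, `evaluateRules`, and `AuthorizingPhotoLoader` are hypothetical names for illustration, not Entity's API.

```typescript
// Hypothetical sketch, not Entity's API: each rule may allow, deny, or skip.
type RuleResult = 'allow' | 'deny' | 'skip';

interface Viewer {
  userId: string;
  organizationIds: string[];
}

interface Photo {
  id: string;
  ownerId: string;
  organizationId: string;
}

type PrivacyRule = (viewer: Viewer, photo: Photo) => RuleResult;

const allowIfOwner: PrivacyRule = (viewer, photo) =>
  viewer.userId === photo.ownerId ? 'allow' : 'skip';

const allowIfOrganizationMember: PrivacyRule = (viewer, photo) =>
  viewer.organizationIds.includes(photo.organizationId) ? 'allow' : 'skip';

// The first rule that does not skip decides; if every rule skips, access is denied.
function evaluateRules(rules: PrivacyRule[], viewer: Viewer, photo: Photo): boolean {
  for (const rule of rules) {
    const result = rule(viewer, photo);
    if (result !== 'skip') {
      return result === 'allow';
    }
  }
  return false;
}

// Loading and authorizing are a single step: the loader never returns a photo
// that failed the read rules.
class AuthorizingPhotoLoader {
  private readonly viewer: Viewer;
  private readonly rows: Map<string, Photo>;
  private readonly readRules: PrivacyRule[];

  constructor(viewer: Viewer, rows: Map<string, Photo>, readRules: PrivacyRule[]) {
    this.viewer = viewer;
    this.rows = rows;
    this.readRules = readRules;
  }

  // Returns null for a missing row and throws if the viewer may not read it.
  loadById(id: string): Photo | null {
    const photo = this.rows.get(id);
    if (photo === undefined) {
      return null;
    }
    if (!evaluateRules(this.readRules, this.viewer, photo)) {
      throw new Error(`viewer ${this.viewer.userId} may not read photo ${id}`);
    }
    return photo;
  }
}
```

Because `loadById` is the only way to obtain a `Photo` from this loader, a caller cannot skip the check the way a view can forget an `authorize(photo, "read")` call.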

Use Case

Entity is not limited in where it can or should be used, but was designed for use in a Koa-like environment with a request and response. At Expo, we use Entity in the following manner:

A request comes into the Koa router.
Middleware initializes the Entity framework for the request.
A ViewerContext is created, identifying the individual making the request.
The request fulfiller uses the Entity framework and the ViewerContext to load or mutate data and return a response.

Note: An Entity framework instance should not be shared across multiple requests, since each instance contains its own memoized Dataloader. A long-lived instance is prone to data synchronization issues, especially when the application is scaled horizontally and multiple independent caches would exist for the same data.
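The reason for request scoping can be sketched with a hypothetical `RequestScopedLoader` whose memo plays the role of the Dataloader cache, and a hypothetical `handleRequest` standing in for the middleware and request fulfiller described above. None of these names come from Entity or Koa.

```typescript
// Hypothetical row type standing in for a database table.
type Row = { id: string; title: string };

// Hypothetical request-scoped loader: memoizes reads for the lifetime of one
// request, the way a Dataloader instance does.
class RequestScopedLoader {
  private readonly memo = new Map<string, Row | undefined>();
  private readonly db: Map<string, Row>;
  databaseReads = 0;

  constructor(db: Map<string, Row>) {
    this.db = db;
  }

  load(id: string): Row | undefined {
    if (!this.memo.has(id)) {
      this.databaseReads += 1;
      this.memo.set(id, this.db.get(id));
    }
    return this.memo.get(id);
  }
}

// Hypothetical viewer context: identifies the requester and owns a fresh loader.
class RequestViewerContext {
  readonly userId: string;
  readonly loader: RequestScopedLoader;

  constructor(userId: string, db: Map<string, Row>) {
    this.userId = userId;
    this.loader = new RequestScopedLoader(db);
  }
}

// Stand-in for middleware plus request fulfiller: a new viewer context, and
// therefore a new memo, is created per request and dropped with the response.
function handleRequest(userId: string, db: Map<string, Row>, id: string): string | undefined {
  const viewerContext = new RequestViewerContext(userId, db);
  return viewerContext.loader.load(id)?.title;
}
```

Each call to `handleRequest` builds a new loader, so a request issued after a write reads the new row. A single loader kept across requests would keep returning the value memoized on its first read, and with several server processes each would hold its own stale copy.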

Releasing

To release a new version:

git checkout main
yarn lerna publish [patch|minor|major] -- --conventional-commits
In the GitHub release interface, create a new release from the tag and copy the changelog changes into the release description.

License

The Entity source code is made available under the MIT license.

entity's People

Forkers

doytsujin akshay5995 seeyouin2x5x

entity's Issues

Translate uniqueness constraint violation to EntityError

In entity-database-adapter-knex, an error with code 23505 means a unique constraint was violated in PostgreSQL. It may be worth translating this type of error into an EntityError subclass. Or maybe not.
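A minimal sketch of the translation, assuming the adapter sees the driver's error object with a string `code` property (node-postgres exposes the SQLSTATE and the violated constraint this way, as `code` and `constraint`). `EntityError` here is a hypothetical stand-in for Entity's base error class, and `EntityUniqueConstraintViolationError` and `translateDatabaseError` are hypothetical names.

```typescript
// Hypothetical stand-in for Entity's base error class.
class EntityError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on older compile targets
  }
}

// Hypothetical subclass for PostgreSQL SQLSTATE 23505 (unique_violation).
class EntityUniqueConstraintViolationError extends EntityError {
  readonly constraint: string | undefined;
  readonly originalError: unknown;

  constructor(constraint: string | undefined, originalError: unknown) {
    super(`unique constraint violated${constraint ? `: ${constraint}` : ''}`);
    this.constraint = constraint;
    this.originalError = originalError;
  }
}

const POSTGRES_UNIQUE_VIOLATION = '23505';

// Narrows an unknown thrown value to an object with a string `code`.
function hasSqlState(e: unknown): e is { code: string; constraint?: string } {
  return typeof e === 'object' && e !== null && typeof (e as { code?: unknown }).code === 'string';
}

// Translates the driver errors we recognize; everything else is returned unchanged
// so the caller can rethrow it as-is.
function translateDatabaseError(e: unknown): unknown {
  if (hasSqlState(e) && e.code === POSTGRES_UNIQUE_VIOLATION) {
    return new EntityUniqueConstraintViolationError(e.constraint, e);
  }
  return e;
}
```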

CASCADE_DELETE_INVALIDATE_CACHE -> CASCADE_DELETE_INVALIDATE_CACHE_AND_RUN_TRIGGERS

Investigate distributed reader-writer locks for entities

To guarantee cache consistency (and dataloader consistency) across a set of machines or even across requests and keep the read-through caching strategy that we employ, we'll need to implement some sort of distributed reader-writer lock for entities by type and ID.

Another way to mitigate this temporarily, without fixing it in theory, is to defer all cache invalidation to the end of the write transaction. This makes a read-through cache inconsistency less likely, though obviously still not impossible.
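A minimal sketch of the deferred-invalidation mitigation, assuming a hypothetical in-memory `Map` as the cache and a hypothetical `DeferredInvalidationTransaction` that queues cache keys during the write and deletes them only after commit.

```typescript
// Hypothetical cache: key -> serialized object.
type CacheStore = Map<string, string>;

// Hypothetical transaction wrapper that defers cache invalidation until commit.
class DeferredInvalidationTransaction {
  private readonly pendingKeys = new Set<string>();
  private readonly cache: CacheStore;

  constructor(cache: CacheStore) {
    this.cache = cache;
  }

  // Mutations inside the transaction call this instead of deleting keys immediately.
  queueInvalidation(cacheKey: string): void {
    this.pendingKeys.add(cacheKey);
  }

  // Called once the database transaction has committed.
  commit(): void {
    this.pendingKeys.forEach((cacheKey) => {
      this.cache.delete(cacheKey);
    });
    this.pendingKeys.clear();
  }

  // Assuming the cache only ever holds committed data, a rollback leaves the
  // cached copies valid, so the queued keys are simply dropped.
  rollback(): void {
    this.pendingKeys.clear();
  }
}
```

The window is narrower but not closed: a concurrent read-through that fetched the old row before the commit can still write it into the cache after the invalidation runs, which is why the issue points toward reader-writer locks for a real fix.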

Entity Relationships

This is a fairly broad task covering:

Add ability to express relationships (associations, foreign keys, etc) in EntityFieldDefinition
Add ability to query relationships easily. Put another way, add a way to nicely query the associationLoader for the relationships defined in EntityFieldDefinition.
Add support for cascading deletes to the relationship definition, and update the delete mutator to start a transaction, topologically sort the graph formed by the entity being deleted and everything that depends on it, and delete from the leaves to the root within the transaction.
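The delete ordering in the last item can be sketched as a depth-first traversal that emits every entity after all of the entities that reference it, so deleting in that order goes from the leaves to the root. The graph shape (`DependencyGraph`), the string entity keys, and `cascadeDeleteOrder` are hypothetical, and the sketch assumes the relationship graph has no cycles.

```typescript
// Hypothetical relationship graph: for each entity key, the entities that reference it
// through a foreign key (and therefore must be deleted first).
type DependencyGraph = Map<string, string[]>;

// Returns the root and everything that transitively depends on it, ordered so that
// every entity appears before the entity it references (leaves first, root last).
function cascadeDeleteOrder(root: string, dependentsOf: DependencyGraph): string[] {
  const order: string[] = [];
  const visited = new Set<string>();

  const visit = (key: string): void => {
    if (visited.has(key)) {
      return; // already scheduled through another path
    }
    visited.add(key);
    for (const dependent of dependentsOf.get(key) ?? []) {
      visit(dependent);
    }
    order.push(key); // pushed only after all of its dependents
  };

  visit(root);
  return order;
}
```

Deleting `order[0]`, `order[1]`, and so on inside the transaction never removes a row that something still references.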

Support non-string IDs in StubDatabaseAdapter

Currently, the adapter simulates ID creation by always generating a UUID for the entity ID field. Instead, it should check the field type and generate an appropriate value: a random number, string, boolean(?), etc.
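A minimal sketch of type-aware stub ID generation, assuming a hypothetical `IdFieldKind` tag describing the ID field's type and a hypothetical `generateStubId` helper. It uses a counter for numeric IDs (a choice made here to avoid collisions in tests, rather than the random number the issue mentions) and `Math.random`, which is acceptable for a test stub but not for production IDs.

```typescript
// Hypothetical description of the ID field's type.
type IdFieldKind = 'uuid' | 'string' | 'number';

// Random version-4-shaped UUID; Math.random is acceptable for a test stub only.
function stubUuid(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === 'x' ? r : (r & 0x3) | 0x8; // variant bits: 8, 9, a, or b
    return v.toString(16);
  });
}

let nextNumericId = 1;

// Generates an ID matching the declared field type instead of always returning a UUID.
function generateStubId(kind: IdFieldKind): string | number {
  switch (kind) {
    case 'uuid':
      return stubUuid();
    case 'string':
      return `stub-${Math.random().toString(36).slice(2, 10)}`;
    case 'number':
      return nextNumericId++;
  }
}
```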

Thanks @quinlanj for the report!

Shared table with disjoint rows example can't loadMany

It isn't possible to issue a loadMany against the shared table example, because if any one row is of the wrong type, the whole load fails. We should probably have a specific error type for constructor errors and return those in the result rather than throw them.
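A minimal sketch of the proposed behavior, assuming a hypothetical `EntityConstructionError` and a simple `Result` union: each row is constructed independently, and construction failures are returned in that row's slot instead of being thrown, so one row of the wrong type no longer fails the whole load. The dog/cat shared table is a hypothetical example.

```typescript
// Hypothetical error type reserved for "this row cannot become this entity".
class EntityConstructionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'EntityConstructionError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

// Hypothetical shared table holding two disjoint entity types.
type SharedRow = { id: string; kind: 'dog' | 'cat'; petName: string };
type DogEntity = { id: string; petName: string };

function constructDog(row: SharedRow): DogEntity {
  if (row.kind !== 'dog') {
    throw new EntityConstructionError(`row ${row.id} is a ${row.kind}, not a dog`);
  }
  return { id: row.id, petName: row.petName };
}

// Construction errors become per-row failures; any other error still propagates.
function loadManyDogs(rows: SharedRow[]): Result<DogEntity>[] {
  return rows.map((row): Result<DogEntity> => {
    try {
      return { ok: true, value: constructDog(row) };
    } catch (e) {
      if (e instanceof EntityConstructionError) {
        return { ok: false, error: e };
      }
      throw e;
    }
  });
}
```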

SecondaryCacheLoader.fetchObjectsFromDatabaseAsync should return sparse map

Context: https://github.com/expo/universe/pull/7641/files/90dc1071ada997f57c0b8d33b7b0e7d7e5fe2b15#r641291143

Looking at this PR, I wonder if EntitySecondaryCacheLoader should allow this method to return "sparse" maps that are missing some of the provided keys. A missing key would mean the object was missing from the database and be treated the same way as null or undefined. This way, implementers of fetchObjectsFromDatabaseAsync wouldn't need to pad maps with null entries like this.
EntitySecondaryCacheLoader would then fill in null for all of the missing keys so that the caller of the loader still gets a map with every single key in it, potentially with null values.
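A minimal sketch of the padding step the loader would take over, assuming a hypothetical `fillMissingKeys` helper: implementers return only the keys they found, and the helper guarantees the caller sees every requested key, with `null` for missing objects.

```typescript
// Pads a possibly sparse result map so every requested key is present.
// A missing key, `undefined`, and `null` are all treated as "not in the database".
// Keys are compared with Map's SameValueZero semantics, so object keys would need
// to be the same references (or a stable serialization) for this to work.
function fillMissingKeys<K, V>(
  requestedKeys: readonly K[],
  sparseResults: ReadonlyMap<K, V | null | undefined>,
): Map<K, V | null> {
  const filled = new Map<K, V | null>();
  for (const key of requestedKeys) {
    filled.set(key, sparseResults.get(key) ?? null);
  }
  return filled;
}
```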

Add nullable loadByID EntityLoader functionality

One common use case we're noticing is wanting to check if an ID exists. Currently the best way to accomplish that is by doing something like:

await TestEntity.loader(vc).loadByFieldEqualing('id', id)

We should simplify this to something like:

await TestEntity.loader(vc).loadByIDNullable(id);

or maybe even make loadByID nullable by default and add an enforceLoadByID or something similar.
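A minimal sketch of the two shapes proposed above, assuming a hypothetical in-memory `TestEntityLoader`. `loadByIDNullable` and `enforceLoadByID` are the names floated in this issue, not a shipped API.

```typescript
type TestEntity = { id: string; label: string };

// Hypothetical loader over an in-memory table.
class TestEntityLoader {
  private readonly rows: Map<string, TestEntity>;

  constructor(rows: Map<string, TestEntity>) {
    this.rows = rows;
  }

  // Resolves to null when no row has this ID, so existence checks need no try/catch.
  async loadByIDNullable(id: string): Promise<TestEntity | null> {
    return this.rows.get(id) ?? null;
  }

  // Strict variant: rejects when the row is missing.
  async enforceLoadByID(id: string): Promise<TestEntity> {
    const entity = await this.loadByIDNullable(id);
    if (entity === null) {
      throw new Error(`TestEntity not found: ${id}`);
    }
    return entity;
  }
}
```

With this split, an existence check becomes `(await loader.loadByIDNullable(id)) !== null`.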

Create an ESLint rule to enforce that async methods have an Async suffix in their names

Add support for underlying database adapter column renaming

There are instances where renaming or gracefully deleting a column in a database is useful. The entity framework should support something like "aliasing": if a row contains the column under either the old name or the new name, the framework puts the data in the same field.
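A minimal sketch of the aliasing idea, assuming a hypothetical per-field definition with a current `columnName` and optional `legacyColumnNames`, and a hypothetical `transformRowToFields` mapper. While a column rename is in progress, a row carrying either name fills the same field.

```typescript
// Hypothetical field definition supporting a column rename in progress.
type FieldDefinition = {
  columnName: string;
  legacyColumnNames?: string[];
};

type Row = Record<string, unknown>;

// Maps a raw database row to entity fields, preferring the current column name
// and falling back to any legacy name still present in the row.
function transformRowToFields(
  row: Row,
  fields: Record<string, FieldDefinition>,
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [fieldName, definition] of Object.entries(fields)) {
    const candidates = [definition.columnName, ...(definition.legacyColumnNames ?? [])];
    const column = candidates.find((candidate) => candidate in row);
    if (column !== undefined) {
      result[fieldName] = row[column];
    }
  }
  return result;
}
```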

Add queryContext to params docblock in EntityPrivacyPolicy

This was missing from the generated TS definitions, which means it is also missing from the docblock in the code itself. It shouldn't be hard to add; the comment can be copied from another place that has it.

Move testfixtures into separate package

Conventional commits parser fails when using "!" to signify breaking change

conventional-changelog/conventional-changelog#648

I think the Angular parser is the correct one: https://github.com/conventional-changelog/conventional-changelog/tree/0d7385543cbf14394206c2f739a82d1ccf118586/packages/conventional-changelog-angular

As described in https://www.conventionalcommits.org/en/v1.0.0/.

This is making the changelog incorrect for our project since I've been following that spec.

Redis cache key should use column name instead of field name

This is because field renames should be a safe, code-only change and not need a cache key bump.
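A minimal sketch of the proposed key scheme, assuming a hypothetical `cacheKeyFor` helper and a hypothetical key layout of `prefix:table:v<version>:column:value`. Because the key is built from the database column, renaming the TypeScript field leaves the key unchanged.

```typescript
// Hypothetical field-to-column mapping for one entity.
type FieldToColumn = Record<string, string>;

// Builds a cache key from the column name, so a code-only field rename
// (same column, new field name) keeps hitting the same cache entries.
function cacheKeyFor(
  keyPrefix: string,
  tableName: string,
  cacheVersion: number,
  fieldToColumn: FieldToColumn,
  fieldName: string,
  value: string | number,
): string {
  const columnName = fieldToColumn[fieldName];
  if (columnName === undefined) {
    throw new Error(`unknown field: ${fieldName}`);
  }
  return [keyPrefix, tableName, `v${cacheVersion}`, columnName, String(value)].join(':');
}
```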

Add generated JSDoc site

Since all the docblocks in the code are JSDoc formatted, we can autogenerate a documentation site.

Throw error when more than one entity is deleted or updated

While updates and deletes happen by entity ID, there's no way to guarantee at the framework level that the field specified as the ID in the entity definition is actually unique.

One way we could do this is by ensuring that updates and deletes only affect a single row in the database adapter, and throwing otherwise.
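A minimal sketch of that check, assuming a hypothetical adapter result that reports how many rows the statement touched (node-postgres, for example, exposes this as `rowCount`). `WriteResult` and `assertSingleRowAffected` are hypothetical names.

```typescript
// Hypothetical shape of what the adapter gets back from an UPDATE or DELETE.
type WriteResult = { rowCount: number };

// Throws when the write touched more than one row, which means the column used
// as the ID is not actually unique. Zero rows is left to the caller here, since
// the issue only calls for rejecting multi-row writes.
function assertSingleRowAffected(
  operation: 'update' | 'delete',
  id: string,
  result: WriteResult,
): void {
  if (result.rowCount > 1) {
    throw new Error(`${operation} of ${id} affected ${result.rowCount} rows; expected at most 1`);
  }
}
```

Because the statement has already run when the count comes back, the error only undoes the extra changes if the adapter executes the write inside a transaction that the thrown error rolls back.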

Use @expo/batcher for batch inserts

Documenting this for posterity, but after investigating it looks like we probably won't want to use it for general-purpose mutations in entity.

Background

Let's say we have an entity with one integer field, backed by a postgres DB with a unique constraint on the column. Batch inserting is something that isn't very well supported by entity as it would need to run N inserts to insert N items into the same table.

The following efficient query would be far less efficient expressed in entity code:

INSERT INTO blah_table (num) VALUES (1), (2), (3), (4) RETURNING *;

Entity equivalent:

const numsToInsert = [1, 2, 3, 4];
const results = await Promise.all(numsToInsert.map((n) => TEntity.creator(vc).setField('num', n).enforceCreateAsync()));

which translates to (in parallel):

INSERT INTO blah_table (num) VALUES (1) RETURNING *;
INSERT INTO blah_table (num) VALUES (2) RETURNING *;
INSERT INTO blah_table (num) VALUES (3) RETURNING *;
INSERT INTO blah_table (num) VALUES (4) RETURNING *;

Hypothesis

We can use https://www.npmjs.com/package/@expo/batcher to coalesce inserts into the DB. That way, the example above could translate into the first query automatically if all the entity writes are executed within the same batching window.

Conclusion

Batcher's error handling doesn't quite work for our case. Let's say a row with num = 3 already existed in the table and we tried to insert the batch INSERT INTO blah_table (num) VALUES (1), (2), (3), (4) RETURNING *; in that case the whole query would fail, so every item in the batch would fail. This could cause unintended consequences in entity if we're inserting multiple entities of the same type from various code locations at the same time (in the same batch): a failure to insert one entity in codepath A could cause a failure to insert an entity for codepath B, even if A and B are supposed to be independent.

Document entityUtils

Add inline docblocks to entityUtils. One of the few places where documentation is missing.

feature request: createWithUniqueConstraintRecoveryAsync

https://github.com/expo/universe/pull/10903#pullrequestreview-1192479168

Field Validation

Entity fields should do basic validation.

UUIDField should validate that it's a UUID
Primitive fields should validate their types
EnumField should validate value being a member of a specified enum
Custom validation functions should be possible as well.
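A minimal sketch of the validation rules listed above, assuming hypothetical `FieldValidator` functions rather than Entity's field classes. The UUID pattern accepts only the canonical 8-4-4-4-12 hexadecimal form, and `enumField` expects an enum-like object such as a string enum or an `as const` object (for a numeric TypeScript enum, `Object.values` would also include the reverse-mapped names).

```typescript
// A validator returns true when the value is acceptable for the field.
type FieldValidator = (value: unknown) => boolean;

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

const uuidField: FieldValidator = (value) => typeof value === 'string' && UUID_PATTERN.test(value);
const stringField: FieldValidator = (value) => typeof value === 'string';
const numberField: FieldValidator = (value) => typeof value === 'number';
const booleanField: FieldValidator = (value) => typeof value === 'boolean';

// Accepts only values that are members of the given enum-like object.
function enumField(members: Record<string, string | number>): FieldValidator {
  const allowed = new Set<unknown>(Object.values(members));
  return (value) => allowed.has(value);
}

// Combines a built-in validator with custom checks; all of them must pass.
function withCustomValidation(base: FieldValidator, ...custom: FieldValidator[]): FieldValidator {
  return (value) => base(value) && custom.every((check) => check(value));
}
```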

feature request: loadByCompositeKeyAsync

https://github.com/expo/universe/pull/10823/files#r1020707597

Add codecov

Once the repo is made public, add Codecov, so that we don't need a private repo token.

Add optional queryContext arg to canViewerUpdateAsync

So that this method can be run transactionally.

Incorrect loader types for field selection

It should only be possible to load by fields in the field selection.

Investigate using and enforcing conventional commits

https://www.conventionalcommits.org/en/v1.0.0/

Then use them in Lerna to automatically generate CHANGELOGs and bump versions.

Add ability to just invalidate an entity's cache

There will probably be times when a manual update or delete via a raw SQL query is needed, and the entity framework should provide a mechanism to manually invalidate cached data that is now known to be stale.

Investigate fan-out join queries for loaders (cache warming)

Entity works on full rows of data (the projection of all queries is *). This is because a privacy policy's logic is allowed to use the entire entity rather than just a subset of fields, which makes authorization logic very expressive.

Joins are extremely powerful in the sense that they allow combining two or more tables in order to create a third intermediate/temporary "table". The issue is that in the general case, this third table isn't possible to express using an entity (since it's dynamic) and therefore creates an issue with how to authorize access to a row of that "table".

So, the tradeoff that entity makes is to disallow joins and require doing them in the application in exchange for authorization correctness. While this potentially incurs higher memory usage and an extra round trip to the DB, the tradeoff is usually worth it, since we can guarantee that all data access is authorized and that the authorization is sound.

Now, there is a very narrow join case that could work with entities: a fan-out load where the projection is limited to all columns (*) of the resulting entity's table. For example:

select b.*
from apps ap
join builds b on b.app_id = ap.id
where ap.account_id = 'aaa-bbb-ccc-123'

This seems like an interesting case to investigate, since it is theoretically possible to authorize the returned objects: they have the entity type "build". The association loader could make use of it as well. The association loader is currently just a set of convenience methods that load a specified chain of entities, and it only supports 1:1 foreign keys, but it could be extended to 1:n for fan-outs. We could also potentially change it to construct these join queries, but we'd have to be careful to build it in a way that is general enough that both RDBMS and NoSQL databases could theoretically implement the "join" logic, since entity is built to be database-independent.

Add ability to run update/delete privacy policy outside of mutator

This is useful in situations where putting a sequence of mutations in a transaction isn't possible, so a reasonable effort must be made before running the mutations to predict whether they would fail.

For example:

await runInTransaction(async () => {
  const someThirdPartyServiceID = await StripedZebra.makePaymentAsync(info);
  await PurchaseEntity.creator(viewerContext).setField('zebra_id', someThirdPartyServiceID).createAsync(); // this could fail due to privacy policy
});

In this case, there's no way to undo the third party API call when the entity creation privacy policy fails.

Something like the following seems to work. Would be useful to clean up and formalize into a sensible API.

static async canViewerUpdate<
    TMFields,
    TMID,
    TMViewerContext extends ViewerContext,
    TMEntity extends Entity<TMFields, TMID, TMViewerContext>,
    TMPrivacyPolicy extends EntityPrivacyPolicy<TMFields, TMID, TMViewerContext, TMEntity>
  >(
    this: IEntityClass<TMFields, TMID, TMViewerContext, TMEntity, TMPrivacyPolicy>,
    existingEntity: TMEntity,
    queryContext: EntityQueryContext = existingEntity
      .getViewerContext()
      .getViewerScopedEntityCompanionForClass(this)
      .getQueryContextProvider()
      .getRegularEntityQueryContext()
  ): Promise<boolean> {
    const privacyPolicy = new (this.getCompanionDefinition().privacyPolicyClass)();
    const evaluationResult = await asyncResult(
      privacyPolicy.authorizeUpdateAsync(
        existingEntity.getViewerContext(),
        queryContext,
        existingEntity
      )
    );
    return evaluationResult.ok;
  }

Add a way to record isolated operations

While debugging, it would be useful to be able to get information about entity loads/mutations/etc to see things like:

If it hit the DB, what was the query run
If it hit or wrote to the cache, what were the cache keys used
If it only went to the dataloader, indicate that
etc...

An API for this could look something like:

const [queryContextAuditResult, entityResultFromInnerBlock] = await withIsolatedQueryContext(async (queryContext) => {
  return await BlahEntity.loader(viewerContext, queryContext).load(...);
});
console.log(queryContextAuditResult);
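One way the recording could work is for the query context to carry an append-only log that each layer writes to. The sketch below assumes hypothetical `AuditEvent`, `RecordingQueryContext`, and `withIsolatedQueryContext` definitions shaped after the proposed API; the hypothetical `fakeLoadByIdAsync` stands in for an entity loader and records one dataloader miss, one cache miss, and one database query.

```typescript
// Hypothetical audit events for each layer a load can touch.
type AuditEvent =
  | { kind: 'dataloader'; key: string; hit: boolean }
  | { kind: 'cache'; cacheKey: string; hit: boolean }
  | { kind: 'database'; query: string };

// Hypothetical query context that records everything done through it.
class RecordingQueryContext {
  readonly events: AuditEvent[] = [];

  record(event: AuditEvent): void {
    this.events.push(event);
  }
}

// Runs `fn` with a fresh recording context and returns [audit log, fn's result].
async function withIsolatedQueryContext<T>(
  fn: (queryContext: RecordingQueryContext) => Promise<T>,
): Promise<[AuditEvent[], T]> {
  const queryContext = new RecordingQueryContext();
  const result = await fn(queryContext);
  return [queryContext.events, result];
}

// Hypothetical stand-in for an entity load that misses both the dataloader and the cache.
async function fakeLoadByIdAsync(queryContext: RecordingQueryContext, id: string): Promise<{ id: string }> {
  queryContext.record({ kind: 'dataloader', key: id, hit: false });
  queryContext.record({ kind: 'cache', cacheKey: `blah:${id}`, hit: false });
  queryContext.record({ kind: 'database', query: 'SELECT * FROM blah WHERE id = $1' });
  return { id };
}
```

Scoping the log to a fresh context per call keeps concurrent loads in other query contexts out of the audit result.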
