Thanks for opening this, @fflorent! I'm going to dump in some thoughts a developer prepared about this in 2022. Grist has changed somewhat since then, and this was not a plan, just one person's thoughts (I personally disagree with some of it), but the set of concerns raised could be helpful when thinking about this project.
Externalizing attachments
External store
We need a generic interface for storing and retrieving file data that can be implemented in different ways. One obvious choice is an S3-compatible store. Theoretically, the local filesystem might also work.
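As a rough illustration of what such an interface might look like, here is a minimal TypeScript sketch. The names `AttachmentStore` and `MemoryAttachmentStore`, and the choice of methods, are assumptions made for this example and not anything that exists in Grist. An S3-backed or filesystem-backed class would implement the same methods.

```typescript
// Hypothetical interface: names and methods are illustrative, not Grist's actual API.
interface AttachmentStore {
  // Store `data` under `key`, overwriting any existing object.
  put(key: string, data: Uint8Array): Promise<void>;
  // Return the stored bytes, or undefined if the key is unknown.
  get(key: string): Promise<Uint8Array | undefined>;
  // Remove the object; resolves even if the key does not exist.
  delete(key: string): Promise<void>;
  // Report whether an object is stored under `key`.
  exists(key: string): Promise<boolean>;
}

// In-memory implementation, standing in for S3 or the local filesystem.
class MemoryAttachmentStore implements AttachmentStore {
  private _files = new Map<string, Uint8Array>();

  public async put(key: string, data: Uint8Array): Promise<void> {
    // Copy the bytes so later mutations by the caller do not change what is stored.
    this._files.set(key, data.slice());
  }

  public async get(key: string): Promise<Uint8Array | undefined> {
    const data = this._files.get(key);
    return data ? data.slice() : undefined;
  }

  public async delete(key: string): Promise<void> {
    this._files.delete(key);
  }

  public async exists(key: string): Promise<boolean> {
    return this._files.has(key);
  }
}
```

Keeping the interface this small means that code serving or migrating attachments only depends on four operations, whichever backend is configured.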
Migration
Once at least one store is implemented, _gristsys_Files could be deprecated, and a special migration could move the data from there to the external store. The usual Python migration system won’t be enough on its own because the data engine doesn’t see _gristsys_Files, but maybe it could make an external call to node to deal with that.
Downloading documents
We still want to be able to download a single self-contained .grist sqlite file containing all the attachments. When this happens, we’d need to:
- Make a copy of the database file
- Download all the externalized attachments and put them in the copy, perhaps back in _gristsys_Files
- Give that to the user to download
- When the document is uploaded again, perform the same process as the migration to move data to the external store.
This would also allow using downloaded documents in older versions of Grist.
Serving attachments without the DocWorker
Currently the client uses a special DocWorker API to view and download attachments. To serve the files, the DocWorker retrieves them from _gristsys_Files. In the first iteration of work, this would be changed to retrieving them from the external store instead. But in the long term, it would be nice if the client could bypass the DocWorker and retrieve the files directly from the store. S3 would work well for this, but other types of store may not allow this.
Deleting externalized attachments
Attachments are likely to contain sensitive data, and storing them longer than necessary is a security risk. When a user deletes an attachment, it’s reasonable for them to expect it to actually be deleted eventually, just like any other data, so that it can’t be leaked. This applies whether they deleted a row, a document, or an entire organisation. We can’t actually fully delete the data immediately in the first case because deleted rows still live in the snapshot history, but we should delete them eventually.
This is like the problem of tracking attachments referenced within a document, on a much larger scale. In this case actually tracking the references (or maybe just their counts) from documents to the external store seems essential. These would need to be updated whenever a document is copied or deleted within a Grist installation. We’d need to consider:
- “Duplicate Document”
- “Work on a copy”
- Other ways of ‘forking’ such as from fiddle mode or templates
- Creation and pruning of snapshots
- Deleting a document permanently
Downloading a document ‘disconnects’ it from the Grist installation so it doesn’t need to be counted. It has its own copy of the attachments so it should either delete or ignore the metadata about the externalized data.
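To make the counting idea concrete, here is a small sketch of a reference counter keyed by external file. The class `AttachmentRefCounter` and its methods are hypothetical names for this example, and a real version would need to persist the counts and update them transactionally, which this in-memory sketch does not attempt.

```typescript
// Hypothetical tracker: counts how many documents, forks, or snapshots
// within an installation reference each file in the external store.
class AttachmentRefCounter {
  private _counts = new Map<string, number>();

  // Call when a document, copy, fork, or snapshot referencing these files is created.
  public addRefs(fileKeys: string[]): void {
    for (const key of fileKeys) {
      this._counts.set(key, (this._counts.get(key) ?? 0) + 1);
    }
  }

  // Call when such a document or snapshot is permanently deleted or pruned.
  // Returns the keys that no longer have any referrer, which are then
  // safe to delete from the external store.
  public removeRefs(fileKeys: string[]): string[] {
    const orphaned: string[] = [];
    for (const key of fileKeys) {
      const count = this._counts.get(key);
      if (count === undefined) { continue; }  // Unknown key: nothing to release.
      if (count <= 1) {
        this._counts.delete(key);
        orphaned.push(key);
      } else {
        this._counts.set(key, count - 1);
      }
    }
    return orphaned;
  }

  // Current number of referrers for a key (0 if untracked).
  public count(key: string): number {
    return this._counts.get(key) ?? 0;
  }
}
```

With this shape, "Duplicate Document" would call `addRefs` with the file keys of the original, and pruning a snapshot would call `removeRefs` and then delete whatever keys come back.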
Encryption
An alternative to tracking attachment references so that files can be deleted is to encrypt the attachment data, which avoids the need to delete it at all. Each attachment file would have a unique encryption key stored only in the corresponding row of _grist_Attachments. Once all copies of that row are fully deleted, the encryption key is lost and decrypting the data in the external store becomes impossible. That means we don’t ever have to delete the actual data, so we don’t need to keep track of references to it.
Another security benefit of encryption is that if someone gains access to the data in the external attachments store, they can’t actually read it unless they also have the referencing documents.
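A minimal sketch of the per-file key idea, using Node's built-in crypto module. The cipher (AES-256-GCM), the `EncryptedAttachment` shape, and the function names are assumptions for illustration; the notes above do not specify any of them. Only the returned `key` would be kept in the document row, while everything in `stored` could live in the external store.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Hypothetical shape of what is written to the external store.
interface EncryptedAttachment {
  ciphertext: Buffer;
  iv: Buffer;       // Nonce; not secret, but must be unique per encryption.
  authTag: Buffer;  // GCM authentication tag, checked on decryption.
}

// Encrypt one attachment with a fresh random key (assumed cipher: AES-256-GCM).
function encryptAttachment(plaintext: Buffer): { key: Buffer; stored: EncryptedAttachment } {
  const key = randomBytes(32);  // 256-bit key unique to this file.
  const iv = randomBytes(12);   // 96-bit nonce, the usual size for GCM.
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { key, stored: { ciphertext, iv, authTag: cipher.getAuthTag() } };
}

// Decrypt using the key from the document row; throws if the key is wrong
// or the stored data has been tampered with.
function decryptAttachment(key: Buffer, stored: EncryptedAttachment): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, stored.iv);
  decipher.setAuthTag(stored.authTag);
  return Buffer.concat([decipher.update(stored.ciphertext), decipher.final()]);
}
```

Once every copy of `key` is gone, the ciphertext left in the store can no longer be decrypted, which is the property the scheme relies on.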
One downside is that serving attachments directly from S3 instead of the DocWorker becomes trickier. Decrypting and displaying a single encrypted file in the browser using SubtleCrypto and createObjectURL seems straightforward. But it’s a lot more delicate to handle a user scrolling through a grid filled with thumbnails, displaying them all efficiently and then reclaiming memory after they disappear from view.
Access Control
This would need thinking about. It is important to preserve the property that the existing metadata (particularly fileIdent) is not enough to download the file, so that access is properly revoked even if someone has a past copy of the metadata. It might also be nice if the download URL couldn’t be computed purely from the file content, so that someone with a local copy of a file can’t test whether it exists in the document.
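One possible way to get both properties, offered only as a sketch, is to derive the external store key with an HMAC over the document ID and fileIdent, using a secret that is held server-side and never written into the document. The function name `externalStoreKey` and the idea of a single `serverSecret` are assumptions for this example, not a description of how Grist works.

```typescript
import { createHmac } from "crypto";

// Hypothetical derivation: without `serverSecret`, neither the metadata
// (docId, fileIdent) nor the file content is enough to compute the key.
function externalStoreKey(serverSecret: Buffer, docId: string, fileIdent: string): string {
  return createHmac("sha256", serverSecret)
    // JSON-encode the pair so that different (docId, fileIdent) combinations
    // cannot produce the same HMAC input.
    .update(JSON.stringify([docId, fileIdent]))
    .digest("hex");
}
```

This only covers how the key is named. Deciding who may ask the server to resolve a key is still a matter for the document's access rules.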