Comments (4)
@pudo I’ve added this as an issue here mostly to keep track of it as it’s part of a bigger project in Aleph. I’d be happy to put in a PR to resolve this though!
from followthemoney.
One thing I'd consider before making this guarantee to downstream consumers is how this will intersect with entity fragment aggregation. I've looked at this before a bit in the context of making the pages of a document show up in proper order in the `indexText`/`bodyText` props, and could never quite conceptualise all the places in the pipeline that would need to be updated for this guarantee to really hold. It's also something that will be really hard to do with respect to the statement-based data model I use downstream in nomenklatura, but that's more of a "me" problem :)
from followthemoney.
> One thing I'd consider before making this guarantee to downstream consumers is how this will intersect with entity fragment aggregation. I've looked at this before a bit in the context of making the pages of a document show up in proper order in the `indexText`/`bodyText` props, and could never quite conceptualise all the places in the pipeline that would need to be updated for this guarantee to really hold.

Thanks for mentioning this, I wasn’t aware of the (potentially) incorrect order of `indexText` property values before. My (maybe naive?) assumption though is that preserving the order of property values shouldn’t make anything worse (and might even be necessary in order to tackle those ordering issues in the future), right?
While it won’t automatically solve the potentially incorrect order of `indexText` values (because the values are stored in different fragments), it will solve the issue with previews of multipart emails (because all parts of the original email end up in the same fragment).
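To make the distinction above concrete, here is a minimal sketch. It does not use the real FtM or Aleph aggregation code: fragments are modelled as plain dicts of lists, and `merge_fragments` is a hypothetical helper invented for illustration. Under those assumptions, order within one fragment survives merging, while order across fragments depends on the order in which the fragments are merged.

```python
from typing import Dict, List

# Hypothetical stand-in for an entity fragment: property name -> ordered values.
Fragment = Dict[str, List[str]]


def merge_fragments(fragments: List[Fragment]) -> Fragment:
    """Merge fragments in the given order, keeping first-seen value order."""
    merged: Fragment = {}
    for fragment in fragments:
        for prop, values in fragment.items():
            target = merged.setdefault(prop, [])
            for value in values:
                # Skip duplicates so each value appears once, at its first position.
                if value not in target:
                    target.append(value)
    return merged


# All parts of a multipart email live in one fragment, so their order is kept.
email = {"bodyText": ["part 1", "part 2"]}
merged_email = merge_fragments([email])

# Pages live in separate fragments; if they are merged out of order,
# an order-preserving value container alone cannot fix that.
page_2 = {"indexText": ["page 2"]}
page_1 = {"indexText": ["page 1"]}
merged_pages = merge_fragments([page_2, page_1])
```

In this sketch `merged_email["bodyText"]` keeps the original part order, whereas `merged_pages["indexText"]` simply reflects the merge order of the two page fragments.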
from followthemoney.
@pudo Do you have a preference regarding the alternative data structure?
1. Using dict keys (or a small wrapper around dicts)
2. Using lists (and manually checking if a value already exists when adding new values)
3. Using a third-party ordered set implementation
I have a personal preference for 2 to keep it simple, and I don’t think the perf overhead for O(n) membership tests would be relevant in Aleph. Not sure though about other FtM use cases like OpenSanctions?
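For comparison, a rough sketch of the first two options in plain Python. The helper names `add_with_dict` and `add_with_list` are made up for illustration and are not part of FtM. The dict variant relies on the fact that Python dicts preserve insertion order (guaranteed since Python 3.7). The third option is omitted because it depends on which third-party package would be chosen.

```python
from typing import Dict, List


def add_with_dict(values: Dict[str, None], value: str) -> None:
    """Dict keys as an ordered set: O(1) membership, insertion order kept."""
    # setdefault leaves the position of an existing key untouched.
    values.setdefault(value, None)


def add_with_list(values: List[str], value: str) -> None:
    """Plain list with a manual membership check: O(n) per insert."""
    if value not in values:
        values.append(value)


incoming = ["plain text part", "html part", "plain text part"]

as_dict: Dict[str, None] = {}
as_list: List[str] = []
for item in incoming:
    add_with_dict(as_dict, item)
    add_with_list(as_list, item)

# Both containers now hold the unique values in first-seen order.
dict_values = list(as_dict)
```

Both approaches yield the same ordered, de-duplicated values. The difference is the cost of the membership test, which only matters if a single property holds a large number of values.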
from followthemoney.