Comments (8)
+1 on doing it for all verbs.
We should not combine fields with separators. I think doing so would open us up to escaping issues later on, as soon as a field value can contain the separator itself.
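To make the separator concern concrete, here is a minimal Go sketch. The comment does not say which fields would be combined, so `joinKey`, `sourceKey`, and its `Collector`/`Source` fields are hypothetical. It shows two different field tuples collapsing to the same joined string, and a structured alternative where each field keeps its own slot and no escaping scheme is needed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// joinKey builds a key by concatenating fields with a separator.
// This is the pattern being warned against: it becomes ambiguous
// whenever a field value can itself contain the separator.
func joinKey(fields ...string) string {
	return strings.Join(fields, ":")
}

// sourceKey is a hypothetical structured alternative: each field is
// stored separately, so no separator (and no escaping) is involved.
type sourceKey struct {
	Collector string `json:"collector"`
	Source    string `json:"source"`
}

// encodeKey serializes the struct when a single string is unavoidable;
// the JSON encoder takes care of quoting whatever the values contain.
func encodeKey(k sourceKey) (string, error) {
	b, err := json.Marshal(k)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Two different tuples collapse to the same joined string "file:a:b".
	a := joinKey("file", "a:b")
	b := joinKey("file:a", "b")
	fmt.Println(a == b) // true

	// The structured encoding keeps them distinct.
	ka, _ := encodeKey(sourceKey{Collector: "file", Source: "a:b"})
	kb, _ := encodeKey(sourceKey{Collector: "file:a", Source: "b"})
	fmt.Println(ka == kb) // false
}
```

Where the storage layer allows it, keeping the fields as separate struct fields or columns avoids even the serialization step.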
from guac.
cc @nchelluri
I say we do it for all verbs. It's a bit of extra work, but it helps not just from an evidence perspective but also with debuggability and maintainability for GUAC.
Because the content is saved, we can use the blobs to help debug when something goes wrong, whether they are API responses, documents, or something else. It also helps if we later change how we parse a document, API request/response, etc. and need to re-parse the data.
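A minimal Go sketch of the "keep the blobs, re-parse later" idea, assuming content-addressed storage (keys are SHA-256 digests of the raw bytes). `blobStore`, `reparseAll`, the `parser` type, and the `sha256:<hex>` reference format are all hypothetical, and an in-memory map stands in for real storage. A reference like the one `Put` returns is the kind of value that could be recorded alongside ingested data (for example in SourceInformation) so the original bytes can be found again.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// blobStore is a hypothetical in-memory stand-in for wherever raw documents
// and API responses would be kept. Keys are content digests, so storing
// the same bytes twice keeps a single copy.
type blobStore struct {
	blobs map[string][]byte
}

func newBlobStore() *blobStore {
	return &blobStore{blobs: make(map[string][]byte)}
}

// Put saves the raw bytes and returns a reference of the form "sha256:<hex>".
func (s *blobStore) Put(data []byte) string {
	sum := sha256.Sum256(data)
	ref := "sha256:" + hex.EncodeToString(sum[:])
	if _, ok := s.blobs[ref]; !ok {
		// Copy so later changes to the caller's slice cannot alter the blob.
		s.blobs[ref] = append([]byte(nil), data...)
	}
	return ref
}

// Get returns the original bytes for a reference, if present.
func (s *blobStore) Get(ref string) ([]byte, bool) {
	b, ok := s.blobs[ref]
	return b, ok
}

// Refs lists stored references in a stable (sorted) order.
func (s *blobStore) Refs() []string {
	refs := make([]string, 0, len(s.blobs))
	for r := range s.blobs {
		refs = append(refs, r)
	}
	sort.Strings(refs)
	return refs
}

// parser stands in for any document parser; a newer version can be run
// over every stored blob without re-fetching anything from the source.
type parser func([]byte) string

// reparseAll applies p to every stored blob and returns results by reference.
func reparseAll(s *blobStore, p parser) map[string]string {
	out := make(map[string]string)
	for _, ref := range s.Refs() {
		data, _ := s.Get(ref)
		out[ref] = p(data)
	}
	return out
}

func main() {
	store := newBlobStore()
	ref := store.Put([]byte(`{"name":"example-sbom"}`))
	fmt.Println(ref)

	// A hypothetical newer parser, rerun over everything already stored.
	v2 := func(b []byte) string { return fmt.Sprintf("%d bytes", len(b)) }
	for r, parsed := range reparseAll(store, v2) {
		fmt.Println(r, parsed)
	}
}
```

Keying by digest means a document fetched twice is stored once; a real deployment would more likely back this with object storage or a database table than with a map.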
+1, adding this to SourceInformation looks good.
> This will allow users to find the original document (or re-ingest in the case of failure) if they need.

I would say that this will not directly meet the use case of "re-ingest in the case of failure". That needs a separate solution, something more on the pubsub side with a reprocessing pipeline, but that seems out of scope for this issue.
As an aside, one issue we've run into before on another of our projects with a similar data pipeline is duplicates during reprocessing (@mdeicas has been looking at this).
> > This will allow users to find the original document (or re-ingest in the case of failure) if they need.
>
> I would say that this will not directly meet the use case of "re-ingest in the case of failure". That needs a separate solution, something more on the pubsub side with a reprocessing pipeline, but that seems out of scope for this issue.

Oh yes, this is not the solution to "re-ingest in case of a failure". This is for when your database blows up and you have to start from scratch.
> As an aside, one issue we've run into before on another of our projects with a similar data pipeline is duplicates during reprocessing (@mdeicas has been looking at this).

Interesting. If there are lessons learned we can apply here, that would be great.
Sorry for the late response. I think the lesson learned is that an ingestion pipeline may have been designed on the assumption that each document is ingested only once, or that ingestion is idempotent, so it won't support re-ingesting documents to pick up new parsing features. It might be prudent to document this somewhere for clients.
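One way to reconcile "skip duplicates" with "re-ingest to pick up new parsing features" is to deduplicate on the pair (document digest, parser version) rather than on the document alone. The Go sketch below assumes exactly that; `ingestor`, `ingestKey`, and the version strings are hypothetical and not part of GUAC. Redelivering the same message is a no-op, while a newer parser version is allowed through.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ingestKey identifies one ingestion of one document by one parser version.
// Including the parser version is the (hypothetical) design choice: the same
// bytes parsed by a newer parser are not treated as a duplicate.
type ingestKey struct {
	Digest        string
	ParserVersion string
}

// ingestor is a hypothetical in-memory stand-in for an ingestion pipeline.
type ingestor struct {
	seen map[ingestKey]bool
	// Ingested counts how many times a document was actually processed.
	Ingested int
}

func newIngestor() *ingestor {
	return &ingestor{seen: make(map[ingestKey]bool)}
}

// Ingest processes doc unless this exact (digest, parser version) pair was
// already handled, so repeated delivery of the same message is harmless.
// It reports whether any work was done.
func (in *ingestor) Ingest(doc []byte, parserVersion string) bool {
	sum := sha256.Sum256(doc)
	k := ingestKey{Digest: hex.EncodeToString(sum[:]), ParserVersion: parserVersion}
	if in.seen[k] {
		return false // duplicate: skip
	}
	in.seen[k] = true
	in.Ingested++
	// ... parse doc and write nodes/edges here ...
	return true
}

func main() {
	in := newIngestor()
	doc := []byte(`{"name":"example-sbom"}`)
	fmt.Println(in.Ingest(doc, "v1")) // true: first time
	fmt.Println(in.Ingest(doc, "v1")) // false: duplicate delivery
	fmt.Println(in.Ingest(doc, "v2")) // true: re-ingest with the new parser
	fmt.Println(in.Ingested)          // 2
}
```

The gate only decides whether to process a document. When "v2" does re-ingest, the writes themselves would still need to be upserts (or otherwise idempotent) so that nodes already created by "v1" are not duplicated in the graph.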
from guac.
Hmm, that is an interesting case. I added it to our agenda to discuss.