Comments (8)

mihaimaruseac commented on June 24, 2024

+1 on doing it for all verbs.

We should not combine fields with separators. I think doing so will open us up to issues when we need to escape the separator further down.
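
To illustrate the escaping concern, here is a minimal sketch in Python. The field names (`collector`, `origin`) and the `:` separator are hypothetical and chosen only for the example; they are not taken from GUAC's schema.

```python
from dataclasses import dataclass


def join_with_separator(fields, sep=":"):
    """Combine fields into one string (the approach argued against above)."""
    return sep.join(fields)


# Two different field tuples collapse into the same combined string,
# because one of the values already contains the separator.
combined_a = join_with_separator(["pkg:npm", "left-pad"])
combined_b = join_with_separator(["pkg", "npm:left-pad"])
assert combined_a == combined_b  # ambiguous without escaping


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical record that keeps each field separate."""
    collector: str
    origin: str


# With separate fields the two values stay distinct and nothing needs escaping.
assert SourceRef("pkg:npm", "left-pad") != SourceRef("pkg", "npm:left-pad")
```

Keeping the fields apart means no consumer further down has to know an escaping rule in order to split them again.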

from guac.

pxp928 commented on June 24, 2024

cc @nchelluri


mlieberman85 commented on June 24, 2024

I say we do it for all verbs. I think it's a bit of extra work, but it helps not just from an evidence perspective but also from a debuggability and maintainability perspective for GUAC.

We can use the blobs, whether they are API responses, documents, or something else, to help debug when something goes wrong, since we have the content saved. It also helps if we parse a document, API request/response, etc. differently in the future and need to re-parse the data.
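
As a rough sketch of what keeping the raw content around could look like, the Python below stores each blob under its SHA-256 digest so it can be fetched again for debugging or re-parsing. `BlobStore` and its in-memory dict are assumptions made for illustration, not GUAC's actual storage layer.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory store that keeps raw content by digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The digest doubles as the lookup key that could be recorded
        # alongside the parsed data.
        key = hashlib.sha256(content).hexdigest()
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = BlobStore()
key = store.put(b'{"example": "api response"}')
# Later, the exact bytes can be retrieved and parsed again.
original = store.get(key)
```

Because the key is derived from the content, storing the same blob twice returns the same key rather than creating a second copy.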


lumjjb commented on June 24, 2024

+1, adding this to SourceInformation looks good.

> This will allow users to find the original document (or re-ingest in the case of failure) if they need.

I would say that this will not quite directly meet the use case of "re-ingest in the case of failure". There needs to be another solution for that, something more on the pubsub side with a reprocessing pipeline, but that seems out of scope for this issue.

As an aside, one of the issues we've run into before with one of our other projects that has a similar data pipeline is the issue of duplicates during reprocessing (@mdeicas has been looking at this).


pxp928 commented on June 24, 2024

> > This will allow users to find the original document (or re-ingest in the case of failure) if they need.
>
> I would say that this will not quite directly meet the use case of "re-ingest in the case of failure". There needs to be another solution for that, something more on the pubsub side with a reprocessing pipeline, but that seems out of scope for this issue.

Oh yes, this is not the solution to "re-ingest in case of a failure". This is for when your database blows up and you have to start from scratch.


pxp928 commented on June 24, 2024

> As an aside, one of the issues we've run into before with one of our other projects that has a similar data pipeline is the issue of duplicates during reprocessing (@mdeicas has been looking at this).

Interesting. If there are lessons learned that we can apply here, that would be great.


mdeicas commented on June 24, 2024

Sorry for the late response, but I think the lesson learned is that an ingestion pipeline may have been designed on the assumption that each document is ingested only once, or otherwise to be idempotent, and so it won't support re-ingesting documents to pick up new parsing features. It might be prudent to document this somewhere for clients?
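
A small Python sketch of the trade-off, under stated assumptions: the `ingest` function, the `processed` set, and the integer `parser_version` are all hypothetical and only illustrate one possible way to stay idempotent while still allowing a re-parse. This is not how GUAC or the other project actually handles it.

```python
import hashlib


def ingest(blob: bytes, parser_version: int, processed: set) -> bool:
    """Return True if the blob was processed, False if it was skipped.

    The dedup key includes the (hypothetical) parser version, so a document
    is skipped only when the same parser has already handled it.
    """
    key = (hashlib.sha256(blob).hexdigest(), parser_version)
    if key in processed:
        return False  # same content, same parser: nothing new to learn
    processed.add(key)
    # ... parsing and writing to the graph would happen here ...
    return True


processed = set()
first = ingest(b"sbom document", 1, processed)      # processed
repeat = ingest(b"sbom document", 1, processed)     # skipped as a duplicate
reparse = ingest(b"sbom document", 2, processed)    # processed again by the newer parser
```

If the key were the content digest alone, the third call would be skipped as well, which is the situation described above: the pipeline stays idempotent but cannot pick up new parsing features.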


pxp928 commented on June 24, 2024

Hmm, that is an interesting case. I added it to our agenda to discuss.

