

Rugvip commented on May 30, 2024

There are a couple of things that get wired together for this ingestion, end to end. The entity provider creates a location in the catalog with the wildcard intact. That location is then picked up by the `UrlReaderProcessor` and treated as a "search" target here:

```typescript
const response = await this.options.reader.search(location, { etag });
```
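
The search-vs-read split can be sketched roughly like this. It is a simplified illustration, not the actual `UrlReaderProcessor` code: the `readModeFor` helper, the rule that a `*` in the URL path marks a search target, and the `my-org/my-repo` URLs are all assumptions made for the example.

```typescript
type ReadMode = 'search' | 'readUrl';

// Hypothetical helper, not the real UrlReaderProcessor logic.
// Assumption: a '*' in the URL path marks the target as a glob search.
function readModeFor(target: string): ReadMode {
  // Drop any fragment or query string so only the path is inspected.
  const path = target.split('#')[0].split('?')[0];
  return path.includes('*') ? 'search' : 'readUrl';
}

// Made-up location targets for illustration.
const wildcard =
  'https://github.com/my-org/my-repo/blob/main/**/catalog-info.yaml';
const single =
  'https://github.com/my-org/my-repo/blob/main/catalog-info.yaml';

console.log(readModeFor(wildcard)); // search
console.log(readModeFor(single)); // readUrl
```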

That in turn takes us to the `GithubUrlReader` in this case, which starts by requesting the recursive Git tree for the commit here:

```typescript
const recursiveTree: GhTreeResponse = await this.fetchJson(
  treesUrl.replace('{/sha}', `/${sha}?recursive=true`),
  init,
);
```

If that response is not truncated, it goes on to fetch each matching file individually here:

```typescript
const blob: GhBlobResponse = await this.fetchJson(item.url!, init);
```
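
To make the cost concrete, here is a simplified request-count model. It assumes the blob path costs one tree listing plus one blob request per matching file, and that the read-tree fallback costs the same tree listing plus a single archive download. It ignores the initial repository/commit lookup, pagination, retries, and rate limiting; the function names are hypothetical.

```typescript
// Blob path: one recursive tree listing, then one blob fetch per matching file.
function blobPathRequests(matchingFiles: number): number {
  return 1 + matchingFiles;
}

// Read-tree path: the tree listing already made, plus one archive download.
function readTreePathRequests(): number {
  return 1 + 1;
}

for (const n of [1, 50, 5000]) {
  console.log(
    `${n} matches: blob path ${blobPathRequests(n)}, read-tree path ${readTreePathRequests()}`,
  );
}
// 1 matches: blob path 2, read-tree path 2
// 50 matches: blob path 51, read-tree path 2
// 5000 matches: blob path 5001, read-tree path 2
```

Under this model the blob path grows linearly with the number of matches, which is why a large repo with many matching files is the case worth guarding against.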

I think we need to make sure this fetch doesn't happen in this case. I don't know whether it does right now, but the error message suggests it might. It would be better if the reader instead fell through to a full read of the repo tree, so that all files are available locally, here:

```typescript
const tree = await this.doReadTree(archiveUrl, sha, '', init, {
```

If we end up there, individual file fetches are no longer an issue, since everything is available upfront.
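
Once the archive is local, matching is just an in-memory filter over file paths. The real reader relies on a glob library; the `globToRegExp` converter below is a deliberately tiny stand-in that only supports `**/`, `**`, `*`, and literal characters, and the file list is made up.

```typescript
// Tiny glob-to-RegExp converter (a stand-in for a real glob library).
function globToRegExp(glob: string): RegExp {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === '*') {
      if (glob[i + 1] === '*') {
        if (glob[i + 2] === '/') {
          re += '(?:.*/)?'; // '**/' matches zero or more directories
          i += 2;
        } else {
          re += '.*'; // bare '**' matches anything, including '/'
          i += 1;
        }
      } else {
        re += '[^/]*'; // '*' stays within a single path segment
      }
    } else {
      re += c.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Paths as they might appear after extracting the repository archive (made up).
const localFiles = [
  'catalog-info.yaml',
  'services/api/catalog-info.yaml',
  'services/api/README.md',
  'docs/catalog-info.yaml.bak',
];

const matcher = globToRegExp('**/catalog-info.yaml');
const matches = localFiles.filter((p) => matcher.test(p));
console.log(matches); // the two catalog-info.yaml paths
```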

To move this ahead, we'd appreciate it if someone could validate the hypothesis above that the initial search response is not truncated. If that turns out to be the case, I think we should add an additional check that causes us to fall back to the read tree path for very large repos.
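
One possible shape for that extra check is sketched below. It is only a proposal-style illustration: `chooseStrategy`, `ListingSummary`, and the `MAX_BLOB_FETCHES` threshold are hypothetical names and values, not existing Backstage settings, and the threshold could just as well be based on the total number of tree entries instead of the number of matches.

```typescript
type Strategy = 'fetch-blobs' | 'read-tree';

// Hypothetical cap on per-file blob fetches before preferring one archive download.
// The value is an illustrative assumption, not a Backstage setting.
const MAX_BLOB_FETCHES = 100;

interface ListingSummary {
  truncated: boolean; // from the recursive tree response
  matchingFiles: number; // blobs whose paths match the search glob
}

function chooseStrategy(
  s: ListingSummary,
  maxBlobFetches: number = MAX_BLOB_FETCHES,
): Strategy {
  // An incomplete listing cannot be filtered reliably, so read the whole tree.
  if (s.truncated) return 'read-tree';
  // The proposed extra check: too many matches means too many blob requests.
  if (s.matchingFiles > maxBlobFetches) return 'read-tree';
  return 'fetch-blobs';
}

console.log(chooseStrategy({ truncated: false, matchingFiles: 3 })); // fetch-blobs
console.log(chooseStrategy({ truncated: false, matchingFiles: 2500 })); // read-tree
console.log(chooseStrategy({ truncated: true, matchingFiles: 0 })); // read-tree
```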

from backstage.
