
Comments (10)

zscholl commented on May 18, 2024 2

AWS Config does not have a complete accounting of resources in AWS. IAM access keys are a good example: you can get IAM users, roles, and groups out of Config, but you cannot query access key IDs.

You could use it to get some subset of the data, but it wouldn't be complete.

from cloudquery.

James-Quigley commented on May 18, 2024 1

That assumes you have a CloudTrail log. It might also be challenging to follow the stream: if the long-running process dies, would it know where to pick back up, or would it re-pull everything and then restart following the stream?

I like the idea of parallelizing as much as possible, and having robust retry/backoff logic for provider API calls. I made a separate issue for that: #59
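To make the retry/backoff idea concrete, here is a minimal Python sketch of exponential backoff with full jitter. It is not CloudQuery's implementation: `ThrottledError`, `call_with_backoff`, and all parameter names and defaults are hypothetical, chosen only for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for whatever throttling error the provider raises."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay;
            # sleeping a random amount below it spreads out concurrent retries.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

The `sleep` argument is injectable so the retry loop can be exercised without actually waiting.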


yevgenypats commented on May 18, 2024

So actually, CloudQuery now pulls data concurrently from the same region. It should be pretty easy to add the same logic for accounts. One issue I think we might hit is rate limits if we have too many concurrent API calls.

A few thoughts: maybe we can add a variable that specifies the number of concurrent requests? Another option is to have one long-running job that fetches all the data and then subscribes to CloudTrail logs to pull only the resources that were changed. What do you think?
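A variable that caps the number of concurrent requests could look roughly like the Python sketch below, which uses a thread pool as the limiter. The names `fetch_all`, `fetch_one`, and `max_concurrent_requests` are hypothetical and not part of CloudQuery; `fetch_one` stands in for whatever function pulls one account or region.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(targets, fetch_one, max_concurrent_requests=4):
    """Run fetch_one(target) for every target with at most
    max_concurrent_requests calls in flight at any time."""
    targets = list(targets)
    with ThreadPoolExecutor(max_workers=max_concurrent_requests) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(fetch_one, targets))
    return dict(zip(targets, results))
```

Lowering `max_concurrent_requests` trades throughput for fewer simultaneous API calls, which is the knob one would turn when throttling shows up.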


Rackme commented on May 18, 2024

A rate limit could be a quick fix at first, since even with concurrency the AWS API is rate-limited by IP, not only by access key/role.


yevgenypats commented on May 18, 2024

Yeah, I guess a robust retry/backoff should be part of the solution here. Also, there is AWS V2, which should be faster in general, and I think we need to migrate to it. Another option is to try to pull data from AWS Config in bulk (never tried it, so just an idea).


Rackme commented on May 18, 2024

Hey @yevgenypats, what do you mean by 'AWS Config in bulk'?


yevgenypats commented on May 18, 2024

@Rackme I haven't done enough research yet, but an idea I had in the back of my mind is to try to use the https://docs.aws.amazon.com/config/latest/APIReference/API_SelectResourceConfig.html or https://docs.aws.amazon.com/config/latest/APIReference/API_BatchGetResourceConfig.html API calls to get the data some way other than the standard APIs, which should help with the throttling issue. I'm not sure it's possible, though, and this API might not have all the data we want. Are you familiar with AWS Config? Maybe you can help me shed some light on this one?


Rackme commented on May 18, 2024

@yevgenypats I've never used AWS Config to pull a bunch of data, only for a few checks, sorry...

As you said, some of the services already covered by cloudquery (directconnect, emr, organizations) seem to be missing from their schema:
https://github.com/awslabs/aws-config-resource-schema/tree/master/config/properties/resource-types

If there is a maximum response size, the SelectResourceConfig API documentation is a little worrying about how easily pagination can be handled, don't you think?
LIMIT
Valid Range: Minimum value of 0. Maximum value of 100.

I've tried with select-resource-config; only 25 resources are returned per page.
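For what it's worth, the page-size cap can be worked around by following `NextToken` until it is no longer returned. Below is a minimal Python sketch of that loop. The helper name `select_all_resources` is hypothetical, and `config_client` is assumed to behave like a boto3 AWS Config client (e.g. `boto3.client("config")`), whose `select_resource_config` call accepts `Expression`, `Limit`, and `NextToken` and returns `Results` as a list of JSON strings.

```python
import json


def select_all_resources(config_client, expression, page_size=100):
    """Collect every result of an AWS Config query by following NextToken."""
    resources = []
    next_token = None
    while True:
        kwargs = {"Expression": expression, "Limit": page_size}
        if next_token:
            kwargs["NextToken"] = next_token
        page = config_client.select_resource_config(**kwargs)
        # Each entry in Results is a JSON-encoded string describing one resource.
        resources.extend(json.loads(item) for item in page.get("Results", []))
        next_token = page.get("NextToken")
        if not next_token:
            return resources
```

This only addresses pagination; it does not change the fact that some resource types may be absent from Config altogether.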


James-Quigley commented on May 18, 2024

I think AWS tends to rate limit at the account level. So if you run each account concurrently, they shouldn't step on each other's toes unless AWS also implements a global rate limit based on IP or something like that.


yevgenypats commented on May 18, 2024

Solved with https://github.com/cloudquery/cq-provider-aws/releases/tag/v0.2.5

